One cycle for stack reads is great, as are the CALLD and RETD instructions.
If I read your previous message correctly, then
REP #1,#64
RDLONGC INDA++,PTR++
would load 64 longs from the hub to the cog in eight hub cycles (plus the time for loading INDA, PTR, and REP, and the time to sync to the first hub access).
I will document the REPeat instructions next.
For what you are doing there:
REPS #64,#1 'repeat 1 instruction 64 times
SETINDA startreg 'need one spacer instruction between REPS and what is going to get repeated
RDLONG INDA++,PTRA++
If I read the specification PDF right, REPD would need two spacer instructions - right?
It needs three. The difference between REPS and REPD is that REPS uses an immediate repeat count, while REPD can use a register. REPS can execute early because the immediate repeat count is in the instruction, whereas REPD must wait for the register to be read.
I understand the difference and the requirement that they use different instructions because of the differing pipeline delays.
I could not find a REP instruction where the pipe just stalls.
Might a better way to name the REPS & REPD instructions be
REPD1 #times,#loops
REPD3 times,#loops
At least this way we are forced to remember how many instructions will be executed. Otherwise I can see lots of wasted time debugging because we forgot how many instructions are executed before the loop takes place.
Perhaps the same could be applied for other Delayed instructions?
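To make the spacer counts harder to forget, here is a minimal Python sketch of a helper that lays out a repeat block for you. It is only an illustration, not part of PNUT.EXE: the function name `layout_repeat` is made up, the counts (one spacer for REPS, three for REPD) are the ones stated in this thread, and padding unused slots with NOP is my assumption about what you would want.

```python
# Spacer counts as stated in this thread: REPS needs one instruction between
# it and the repeated block, REPD needs three.
SPACERS = {"REPS": 1, "REPD": 3}


def layout_repeat(variant, operands, setup, body):
    """Return assembly lines: the repeat instruction, exactly the required
    number of spacer instructions, then the repeated body.

    `setup` holds useful instructions to put in the spacer slots; unused
    slots are filled with NOP. Too many setup instructions is an error,
    because the extra ones would land inside the repeated block.
    """
    needed = SPACERS[variant]
    if len(setup) > needed:
        raise ValueError(f"{variant} has only {needed} spacer slot(s)")
    padding = ["NOP"] * (needed - len(setup))
    return [f"{variant} {operands}"] + list(setup) + padding + list(body)


if __name__ == "__main__":
    # `times` is a hypothetical register name holding the repeat count.
    for line in layout_repeat("REPD", "times,#1",
                              ["SETINDA startreg"],
                              ["RDLONG INDA++,PTRA++"]):
        print(line)
```

With REPS and the same single setup instruction, no NOPs are added; with REPD, two NOPs fill the remaining slots.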
I've placed updated documentation, along with a .zip that contains everything anyone needs for DE0-Nano and DE2-115 Prop2 emulation. There's a new PNUT.EXE which supports the new SETINDx/FIXINDx #,# syntax, too:
I've been working on the instruction set documentation and I've completed the parts that cover:
1) Hub memory instructions
2) Hub control instructions
3) Cog RAM indirect instructions - New # syntax for SETINDx/FIXINDx
4) Cog stack RAM instructions
There is a new PNUT.EXE in this .zip which supports the new SETINDx/FIXINDx syntax. Also, all the files anyone needs to use the DE0-Nano or DE2-115 are in here:
I was looking at your DE0_Nano_Hookup.png, and it gave me some ideas...
- It looks like it would be possible to map P64-P89 onto the header that PropPlug is plugged into (JP1).
- JP3 could provide P32-P52
- the LEDs could be mapped to P53-P60
This would allow trying other I/O intensive tests before the SDRAM and VGA out work.
If it would take too much of your time, please forget the above suggestion, as we need the docs more.
Thanks for all your hard work!
Yes, that's possible. I was figuring I'd wait for our board and Sapieha's board to be done, and then map the pins to those, as they provide nice DACs and other hook-ups.
I'll be working on docs for a little while longer, it seems.
I've written the beginnings of a second-stage loader for the P2 and I'm having some trouble getting the program I'm loading to actually start. In case you want to see my code I've attached it to this message. The idea is that the PC sends a bunch of CRC-checked packets to the second-stage loader and it writes them to memory. If it gets a full load with no CRC errors it starts the program. I start writing code at $e80 and I load the COG image at that location after a successful download. I use the following code to start the downloaded image:
CON
BASE = $e80
DAT
' code to load hub memory over the serial link
start setcog #0 'relaunch cog0 with loaded program
coginit base_addr, zero
base_addr long BASE
zero long 0
My assumption is that the coginit instruction will load a COG image starting at $e80. What I'm loading is the 2K image produced by PNut.exe for my "Hello, Propeller II!" program that I know works when loaded directly by PNut.exe. Any idea what might be going wrong?
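For anyone who wants to picture the packet scheme described above, here is a rough Python sketch of both ends. The packet layout (4-byte address, 2-byte length, payload, 4-byte CRC), the 256-byte payload size, and the use of CRC-32 from `zlib` are all assumptions made for illustration; the post does not say which CRC or packet format the actual loader uses. Only the $e80 base address and the "start only if every CRC is good" rule come from the post.

```python
import struct
import zlib

BASE = 0xE80        # load address used in the post
PACKET_SIZE = 256   # hypothetical payload size per packet


def make_packets(image, base=BASE, size=PACKET_SIZE):
    """PC side: split the image into packets of
    [address:4][length:2][payload][crc32:4], all little-endian."""
    packets = []
    for offset in range(0, len(image), size):
        payload = image[offset:offset + size]
        header = struct.pack("<IH", base + offset, len(payload))
        crc = zlib.crc32(header + payload)
        packets.append(header + payload + struct.pack("<I", crc))
    return packets


def receive(packets, memory):
    """Loader side: write each packet into memory. Return True only if
    every packet passes its CRC check, i.e. the program may be started."""
    for packet in packets:
        body = packet[:-4]
        (crc,) = struct.unpack("<I", packet[-4:])
        if zlib.crc32(body) != crc:
            return False
        addr, length = struct.unpack("<IH", body[:6])
        memory[addr:addr + length] = body[6:6 + length]
    return True


if __name__ == "__main__":
    image = bytes(range(256)) * 8          # stand-in for a 2K COG image
    hub = bytearray(32 * 1024)             # 32K of hub memory
    ok = receive(make_packets(image), hub)
    print("start program" if ok else "CRC error, do not start")
```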
Ugh, never mind. There was a bug on the PC side of my program. The two-stage loader is now working for loading a program that is only 2K. Of course, the PNut.exe loader can already do that. Now I need to create a bigger program to make sure that loads correctly as well. If so, it should be possible to load all 32k of hub memory on the DE0-Nano. I don't have a DE2-115 so I can't try loading all 128k.
You don't need to do that 'SETCOG #0', since that register was initialized to 0 when your loader launched.
Thanks! That will save me an instruction.
Can you comment on how programs should be laid out in hub memory? I am currently just writing your .obj files starting at $e80. Does that make sense or should there be something else in low memory like CLKFREQ, etc?
I think $0E80-$0FFF should be reserved for mailboxes and various system pointers, and suggest that programs be loaded starting at $1000.
I've been thinking about this and it might be good to support a scheme where the second-stage loader could load COG images and start the COGs before loading the main program. I think RossH does this in Catalina. The idea would be to have the COGs start up spinning on a mailbox in this $0e80-$0fff area of memory. Then when the main program starts it can pass any initialization parameters to the COGs and start them up. This has the advantage that you don't waste 2K of space for the COG image for every "driver". This would require a slightly more complex executable image format but it could be optional so it is still possible to have programs that work in the traditional way where everything is loaded at once. In fact, some programs may want to use a combination of these methods so that, for instance, a Forth COG image can be linked in with the main program but COG images for "drivers" could be loaded by this second-stage loader. If we do something like this, we could either use a fixed layout in low memory to describe the drivers that are pre-loaded (like Catalina's registry) or we could have a linker bind those addresses statically to avoid any runtime searches.
Beyond that, I suggest that we modify Bill's proposal a bit and have the loader load memory starting at $0e80 but have the address of the COG image for the main program stored at $1000. That way the loader can initialize the data in the $0e80-$0fff area.
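Here is how that layout might look from the PC side, as a small Python sketch. The $0E80-$0FFF reserved area and the pointer at $1000 are from the proposal above; putting the main COG image immediately after the pointer (at $1004), the function name `build_hub_image`, and the 32K default size (the DE0-Nano figure mentioned earlier) are assumptions for illustration only.

```python
import struct

MAILBOX_START = 0x0E80   # start of the reserved mailbox/system-pointer area
PROGRAM_START = 0x1000   # holds the address of the main program's COG image


def build_hub_image(mailbox, program, hub_size=32 * 1024):
    """Return hub memory with mailbox data at $0E80, a little-endian long
    at $1000 pointing at the main COG image, and the image itself placed
    right after that long (this placement is an assumption)."""
    if len(mailbox) > PROGRAM_START - MAILBOX_START:
        raise ValueError("mailbox data overflows $0E80-$0FFF")
    cog_image_addr = PROGRAM_START + 4
    if cog_image_addr + len(program) > hub_size:
        raise ValueError("program does not fit in hub memory")
    hub = bytearray(hub_size)
    hub[MAILBOX_START:MAILBOX_START + len(mailbox)] = mailbox
    hub[PROGRAM_START:PROGRAM_START + 4] = struct.pack("<I", cog_image_addr)
    hub[cog_image_addr:cog_image_addr + len(program)] = program
    return hub


if __name__ == "__main__":
    hub = build_hub_image(b"\x01\x02", b"\xAA" * 2048)
    (pointer,) = struct.unpack("<I", hub[PROGRAM_START:PROGRAM_START + 4])
    print(hex(pointer))
```

A linker that binds driver addresses statically could fill the mailbox area the same way, instead of the loader searching a registry at run time.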
Has anyone started using their DE0-Nano or DE2-115 P2 boards yet? If so, what platforms do you use? I can make my loader available on Windows, Macintosh, or Linux but I'm wondering what platforms people are actually using. I guess this is a silly question to some extent because PNut.exe will only run under Windows and that will still be needed to assemble your PASM code but I suspect that situation will change rapidly once Chip's instruction set document is done and Roy adapts his compiler to generate P2 binaries. My assumption is that Windows will probably be the most common platform once the P2 gets into general use but I wasn't sure if that would also be true of the early adopter population. What platform do you want to use for your early P2 work?
You're able to load the Propeller at that rate? I have never tried anything higher than 115200. I can easily provide support for selecting different baud rates. I can even use one baud rate for the first-phase loader that talks to Chip's ROM loader and a different one for the second-stage loader that loads your program if that would be helpful.
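To get a feel for what a faster second-stage baud rate would buy, here is a quick back-of-the-envelope calculation in Python. It assumes 8N1 framing (10 bits on the wire per byte) and ignores packet and CRC overhead, so treat the numbers as lower bounds; the two rates are the 115200 mentioned here and the 2,500,000 figure reported in this thread.

```python
def transfer_seconds(num_bytes, baud, bits_per_byte=10):
    """Raw serial transfer time, assuming 8N1 framing (start bit, 8 data
    bits, stop bit) and no protocol overhead."""
    return num_bytes * bits_per_byte / baud


if __name__ == "__main__":
    hub_bytes = 32 * 1024   # full hub memory on the DE0-Nano
    for baud in (115200, 2500000):
        print(baud, round(transfer_seconds(hub_bytes, baud), 3), "s")
```

Under those assumptions a full 32K load takes roughly 2.8 seconds at 115200 and about 0.13 seconds at 2,500,000.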
Comments
I don't mind pipeline delays; after all, it does not matter where the initialization/setup code goes.
I just realized that the way it works for deltas is already:
SETINDA ++3
SETINDB --4
SETINDS --6,++5
Do you like that better than using @'s?
Personally, I don't like the use of @ where it is not an indirection/address of.
Therefore, I prefer:
SETINDA #32
SETINDA #cogaddrs
and
SETINDA ++3
SETINDA --4
SETINDS --6,++5
I agree. So, I'll make the assembler insist on # for immediate values and ++/-- for delta values.
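The rule agreed on here (# for immediate values, ++/-- for deltas) can be sketched as a tiny operand classifier in Python. This is purely illustrative and is not how PNUT.EXE parses operands; the function name `classify_operand` is made up, and it only handles decimal numbers and plain symbol names.

```python
import re


def classify_operand(text):
    """Classify a SETINDx operand as ('immediate', value) or ('delta', n).

    '#32' is an immediate number, '#name' an immediate symbol, and
    '++3' / '--4' are positive / negative deltas. Anything else is
    rejected, mirroring an assembler that insists on the prefix.
    """
    text = text.strip()
    match = re.fullmatch(r"(\+\+|--)(\d+)", text)
    if match:
        sign = 1 if match.group(1) == "++" else -1
        return ("delta", sign * int(match.group(2)))
    match = re.fullmatch(r"#(\w+)", text)
    if match:
        value = match.group(1)
        return ("immediate", int(value) if value.isdigit() else value)
    raise ValueError(f"operand needs '#', '++' or '--': {text!r}")


if __name__ == "__main__":
    # SETINDS takes two operands, one for each index register.
    print([classify_operand(part) for part in "--6,++5".split(",")])
```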
jmp #$-1 'Loop back endlessly
You can do 'JMP #$' to loop endlessly, or 'JMP #$-1' to jump back one instruction.
http://forums.parallax.com/showthread.php?144199-Propeller-II-Emulation-of-the-P2-on-DE0-NANO-amp-DE2-115-FPGA-boards&p=1146196&viewfull=1#post1146196
I'm working on the two-stage loader. Hopefully, it will be done soon.
Are there any changes in the Config file for the FPGA?
Not yet. I'll need to add the SDRAM pins a little later, and then the I/O for our various boards.
Thanks.
No need for reprogramming.
Sounds reasonable. I'll modify my loader. I'll try to get an early version posted later today.
I run on Windows XP.
BUT I run at a 2,500,000 baud rate in PuTTY.