Propeller II: Emulation of the P2 on FPGA boards (Prop123-A7/A9, DE0-NANO, DE2-115, etc)

cgracey · 2012-12-04 12:04

Bill Henning wrote: »
Thanks Chip!

One cycle for stack reads is great, as are the CALLD and RETD instructions.

If I read your previous message correctly, then
       REP #1,#64
       RDLONGC      INDA++,PTR++
Would load 64 longs from the hub to the cog in eight hub cycles (plus time for loading INDA, PTR, REP, and time to sync to the first hub access)

I will document the REPeat instructions next.

For what you are doing there:

REPS #64,#1 'repeat 1 instruction 64 times
SETINDA startreg 'need one spacer instruction between REPS and what is going to get repeated
RDLONG INDA++,PTRA++

David Betz · 2012-12-04 12:07

cgracey wrote: »

Okay, but what about the delta modes where a '+' or '-' must be expressed?

SETINDA -#5
SETINDA +#5

I don't really like that.

How about this:

SETINDA @5
SETINDA @-5

I like your second option better.

Bill Henning · 2012-12-04 12:11

Thanks Chip.

If I read the specification PDF right, REPD would need two spacer instructions - right?

cgracey wrote: »

I will document the REPeat instructions next.

For what you are doing there:

REPS #64,#1 'repeat 1 instruction 64 times
SETINDA startreg 'need one spacer instruction between REPS and what is going to get repeated
RDLONG INDA++,PTRA++

cgracey · 2012-12-04 12:24

Bill Henning wrote: »

Thanks Chip.

If I read the specification PDF right, REPD would need two spacer instructions - right?

It needs three. The difference between REPS and REPD is that REPS uses an immediate repeat count, while REPD can use a register. REPS can execute early because the immediate repeat count is in the instruction, whereas REPD must wait for the register to be read.
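
The spacer counts above can be written down as a small checker. This is a toy Python model, not anything PNUT does: the `SPACERS` table encodes only the two numbers given in this thread (one for REPS, three for REPD), and `split_rep_block` and its listing format are hypothetical names made up for illustration.

```python
# Toy model of the spacer rule described in this thread (not PNUT's behavior).
SPACERS = {"REPS": 1, "REPD": 3}  # spacer instructions needed after each REP form

def split_rep_block(listing, block_len):
    """Split a listing that starts with REPS/REPD into (spacers, repeated block).

    listing   -- list of instruction strings; listing[0] is the REPS/REPD line
    block_len -- number of instructions that are meant to be repeated
    """
    mnemonic = listing[0].split()[0].upper()
    need = SPACERS[mnemonic]
    body = listing[1:]
    if len(body) < need + block_len:
        raise ValueError(f"{mnemonic} needs {need} spacer(s) before the repeated block")
    return body[:need], body[need:need + block_len]

# The three-line example from the post above.
chip_example = [
    "REPS #64,#1",
    "SETINDA startreg",
    "RDLONG INDA++,PTRA++",
]
spacers, block = split_rep_block(chip_example, block_len=1)
print("spacers:", spacers)
print("repeated:", block)
```

With the REPS example, `SETINDA startreg` lands in the spacer slot and `RDLONG INDA++,PTRA++` is the repeated instruction; a REPD listing with fewer than three spacers is rejected.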

Bill Henning · 2012-12-04 12:50

Thanks!

I don't mind pipeline delays; after all, it does not matter where initialization/setup code goes.

cgracey · 2012-12-04 14:58

David Betz wrote: »

I like your second option better.

I just realized that the way it works for deltas is already:

SETINDA ++3
SETINDB --4
SETINDS --6,++5

Do you like that better than using @'s?

Cluso99 · 2012-12-04 15:07

cgracey wrote: »

I just realized that the way it works for deltas is already:

SETINDA ++3
SETINDB --4
SETINDS --6,++5

Do you like that better than using @'s?

Personally, I don't like the use of @ where it is not an indirection/address of.

Therefore, I prefer:

SETINDA #32
SETINDA #cogaddrs

and

SETINDA ++3
SETINDA --4
SETINDS --6,++5

cgracey · 2012-12-04 15:10

Cluso99 wrote: »

Personally, I don't like the use of @ where it is not an indirection/address of.

Therefore, I prefer:

SETINDA #32
SETINDA #cogaddrs

and

SETINDA ++3
SETINDA --4
SETINDS --6,++5

I agree. So, I'll make the assembler insist on # for immediate values and ++/-- for delta values.
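
As a rough illustration of that rule, here is a minimal Python sketch of an operand classifier. It is not PNUT's parser: `parse_setind_operand`, the tuple results, and the value bound to `cogaddrs` are all assumptions made for the example. Only the `#` and `++`/`--` prefixes come from the thread.

```python
import re

def parse_setind_operand(text, symbols=None):
    """Classify one SETINDx operand as ('imm', value) or ('delta', signed value).

    '#' introduces an immediate (number or symbol); '++n' / '--n' is a delta.
    Anything else is rejected, mirroring "insist on # ... and ++/--".
    """
    symbols = symbols or {}
    text = text.strip()
    m = re.fullmatch(r"(\+\+|--)(\d+)", text)
    if m:
        sign = 1 if m.group(1) == "++" else -1
        return ("delta", sign * int(m.group(2)))
    m = re.fullmatch(r"#(\w+)", text)
    if m:
        name = m.group(1)
        value = int(name) if name.isdecimal() else symbols[name]
        return ("imm", value)
    raise ValueError(f"operand needs '#', '++' or '--': {text!r}")

symbols = {"cogaddrs": 0x40}  # hypothetical symbol value
print(parse_setind_operand("#32"))
print(parse_setind_operand("#cogaddrs", symbols))
# SETINDS takes two comma-separated deltas in the examples above.
print([parse_setind_operand(p) for p in "--6,++5".split(",")])
```

A bare `5` or the rejected `@5` form raises `ValueError`, which is the behavior the "insist on" wording suggests.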

Cluso99 · 2012-12-04 15:17

Chip:

re REPS & REPD

I understand the difference and the requirement that they use different instructions because of the differing pipeline delays.

I could not find a REP instruction where the pipe just stalls.

Might a better way to name the REPS & REPD instructions be

REPD1 #times,#loops
REPD3 times,#loops

At least this way we are forced to remember how many instructions will be executed. Otherwise I can see lots of wasted time debugging because we forgot how many instructions are executed before the loop takes place.

Perhaps the same could be applied for other Delayed instructions?

Rayman · 2012-12-04 15:31

Can you use something like this syntax?
jmp #$-1 'Loop back endlessly

cgracey wrote: »

I agree. So, I'll make the assembler insist on # for immediate values and ++/-- for delta values.

cgracey · 2012-12-04 16:22

Rayman wrote: »

Can you use something like this syntax?
jmp #$-1 'Loop back endlessly

You can do 'JMP #$' to loop endlessly, or 'JMP #$-1' to jump back one instruction.
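
For readers new to the `$` notation, this small Python sketch shows how such an expression resolves to an address. It assumes `$` stands for the address of the instruction containing it, as in the two examples above; `resolve_here_expr` is a hypothetical helper, not part of any Parallax tool.

```python
import re

def resolve_here_expr(expr, here):
    """Resolve '$', '$+n' or '$-n' to an absolute instruction address.

    here -- address of the instruction containing the expression
    """
    m = re.fullmatch(r"\$\s*(?:([+-])\s*(\d+))?", expr.strip())
    if not m:
        raise ValueError(f"unsupported expression: {expr!r}")
    if m.group(1) is None:
        return here  # plain '$': jump to self, i.e. loop endlessly
    offset = int(m.group(2))
    return here + offset if m.group(1) == "+" else here - offset

here = 0x10  # hypothetical address of the JMP instruction
print(resolve_here_expr("$", here))    # same instruction
print(resolve_here_expr("$-1", here))  # one instruction back
```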

cgracey · 2012-12-04 16:42

I've placed updated documentation, along with a .zip that contains everything anyone needs for DE0-Nano and DE2-115 Prop2 emulation. There's a new PNUT.EXE which supports the new SETINDx/FIXINDx #,# syntax, too:

http://forums.parallax.com/showthread.php?144199-Propeller-II-Emulation-of-the-P2-on-DE0-NANO-amp-DE2-115-FPGA-boards&p=1146196&viewfull=1#post1146196

David Betz · 2012-12-04 16:59

cgracey wrote: »

I've placed updated documentation, along with a .zip that contains everything anyone needs for DE0-Nano and DE2-115 Prop2 emulation. There's a new PNUT.EXE which supports the new SETINDx/FIXINDx #,# syntax, too:

http://forums.parallax.com/showthread.php?144199-Propeller-II-Emulation-of-the-P2-on-DE0-NANO-amp-DE2-115-FPGA-boards&p=1146196&viewfull=1#post1146196

Thanks Chip!!
I'm working on the two-stage loader. Hopefully, it will be done soon.

Sapieha · 2012-12-04 17:12

Hi Chip.

Are there any changes in the Config file for the FPGA?

cgracey wrote: »

I've been working on the instruction set documentation and I've completed the parts that cover:

1) Hub memory instructions
2) Hub control instructions
3) Cog RAM indirect instructions - New # syntax for SETINDx/FIXINDx
4) Cog stack RAM instructions

There is a new PNUT.EXE in this .zip which supports the new SETINDx/FIXINDx syntax. Also, all the files anyone needs to use the DE0-Nano or DE2-115 are in here:

Bill Henning · 2012-12-04 17:18

Hi Chip,

I was looking at your DE0_Nano_Hookup.png, and it gave me some ideas...

- It looks like it would be possible to map P64-P89 onto the header that PropPlug is plugged into (JP1).

- JP3 could provide P32-P52

- the LEDs could be mapped to P53-P60

This would allow trying other I/O intensive tests before the SDRAM and VGA out work.

If it would take too much of your time please forget the above suggestion, as we need the docs more

Thanks for all your hard work!

cgracey wrote: »

I've placed updated documentation, along with a .zip that contains everything anyone needs for DE0-Nano and DE2-115 Prop2 emulation. There's a new PNUT.EXE which supports the new SETINDx/FIXINDx #,# syntax, too:

http://forums.parallax.com/showthread.php?144199-Propeller-II-Emulation-of-the-P2-on-DE0-NANO-amp-DE2-115-FPGA-boards&p=1146196&viewfull=1#post1146196

cgracey · 2012-12-04 17:26

Sapieha wrote: »

Hi Chip.

Are there any changes in the Config file for the FPGA?

Not yet. I'll need to add the SDRAM pins a little later, and then the I/O for our various boards.

cgracey · 2012-12-04 17:30

Bill Henning wrote: »

Hi Chip,

I was looking at your DE0_Nano_Hookup.png, and it gave me some ideas...

- It looks like it would be possible to map P64-P89 onto the header that PropPlug is plugged into (JP1).

- JP3 could provide P32-P52

- the LEDs could be mapped to P53-P60

This would allow trying other I/O intensive tests before the SDRAM and VGA out work.

If it would take too much of your time please forget the above suggestion, as we need the docs more

Thanks for all your hard work!

Yes, that's possible. I was figuring I'd wait for our board and Sapieha's board to be done, and then map the pins to those, as they provide nice DACs and other hook-ups.

I'll be working on docs for a little while longer, it seems.

Sapieha · 2012-12-04 17:31

Hi Chip.

Thanks.

No need for reprogramming.

cgracey wrote: »

Not yet. I'll need to add the SDRAM pins a little later, and then the I/O for our various boards.

David Betz · 2012-12-04 20:53

I've written the beginnings of a second-stage loader for the P2 and I'm having some trouble getting the program I'm loading to actually start. In case you want to see my code I've attached it to this message. The idea is that the PC sends a bunch of CRC-checked packets to the second-stage loader and it writes them to memory. If it gets a full load with no CRC errors it starts the program. I start writing code at $e80 and I load the COG image at that location after a successful download. I use the following code to start the downloaded image:

CON

  BASE = $e80

DAT

' code to load hub memory over the serial link

start                   setcog  #0                      'relaunch cog0 with loaded program
                        coginit base_addr, zero

base_addr       long    BASE
zero            long    0

My assumption is that the coginit instruction will load a COG image starting at $e80. What I'm loading is the 2K image produced by PNut.exe for my "Hello, Propeller II!" program that I know works when loaded directly by PNut.exe. Any idea what might be going wrong?
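
The download scheme described in this post (CRC-checked packets written to memory, with the program started only after an error-free load) can be sketched in Python as follows. The actual packet format is in the attachment and is not shown in the thread, so everything here is an assumption: the `<addr><len><payload><crc32>` layout, the 256-byte payload size, the use of CRC-32 from `zlib`, and the names `make_packets` and `receive`.

```python
import struct
import zlib

PAYLOAD_MAX = 256  # hypothetical packet payload size

def make_packets(image, base):
    """Yield packets of the form: <addr:u32><len:u16><payload><crc32:u32>."""
    for off in range(0, len(image), PAYLOAD_MAX):
        chunk = image[off:off + PAYLOAD_MAX]
        body = struct.pack("<IH", base + off, len(chunk)) + chunk
        yield body + struct.pack("<I", zlib.crc32(body))

def receive(packets, hub):
    """Write each good packet into hub (a bytearray); return False on any CRC error."""
    ok = True
    for pkt in packets:
        body, (crc,) = pkt[:-4], struct.unpack("<I", pkt[-4:])
        if zlib.crc32(body) != crc:
            ok = False  # remember the failure so the program is not started
            continue
        addr, length = struct.unpack("<IH", body[:6])
        hub[addr:addr + length] = body[6:6 + length]
    return ok

image = bytes(range(256)) * 8   # stand-in for a 2K COG image
hub = bytearray(32 * 1024)      # 32K of hub memory, as on the DE0-Nano
ok = receive(make_packets(image, 0xE80), hub)
print("start program" if ok else "CRC error, do not start")
```

Keeping a single `ok` flag over the whole transfer matches the "full load with no CRC errors" condition; a real loader would more likely request a resend of the bad packet.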

David Betz · 2012-12-04 21:08

David Betz wrote: »
I've written the beginnings of a second-stage loader for the P2 and I'm having some trouble getting the program I'm loading to actually start. In case you want to see my code I've attached it to this message. The idea is that the PC sends a bunch of CRC-checked packets to the second-stage loader and it writes them to memory. If it gets a full load with no CRC errors it starts the program. I start writing code at $e80 and I load the COG image at that location after a successful download. I use the following code to start the downloaded image:
CON

  BASE = $e80

DAT

' code to load hub memory over the serial link

start                   setcog  #0                      'relaunch cog0 with loaded program
                        coginit base_addr, zero

base_addr       long    BASE
zero            long    0
My assumption is that the coginit instruction will load a COG image starting at $e80. What I'm loading is the 2K image produced by PNut.exe for my "Hello, Propeller II!" program that I know works when loaded directly by PNut.exe. Any idea what might be going wrong?

Ugh, never mind. There was a bug on the PC side of my program. The two-stage loader is now working for loading a program that is only 2K. Of course, the PNut.exe loader can already do that. Now I need to create a bigger program to make sure that loads correctly as well. If so, it should be possible to load all 32k of hub memory on the DE0-Nano. I don't have a DE2-115 so I can't try loading all 128k.

cgracey · 2012-12-04 21:38

David Betz wrote: »

Ugh, never mind. There was a bug on the PC side of my program. The two-stage loader is now working for loading a program that is only 2K. Of course, the PNut.exe loader can already do that. Now I need to create a bigger program to make sure that loads correctly as well. If so, it should be possible to load all 32k of hub memory on the DE0-Nano. I don't have a DE2-115 so I can't try loading all 128k.

You don't need to do that 'SETCOG #0', since that register was initialized to 0 when your loader launched.

David Betz · 2012-12-04 21:42

cgracey wrote: »

You don't need to do that 'SETCOG #0', since that register was initialized to 0 when your loader launched.

Thanks! That will save me an instruction.

Can you comment on how programs should be laid out in hub memory? I am currently just writing your .obj files starting at $e80. Does that make sense or should there be something else in low memory like CLKFREQ, etc?

Bill Henning · 2012-12-04 21:48

Chip & David,

I think $0E80-$0FFF should be reserved for mailboxes and various system pointers, and suggest that programs be loaded starting at $1000

Cluso99 · 2012-12-05 02:12

Bill Henning wrote: »

Chip & David,

I think $0E80-$0FFF should be reserved for mailboxes and various system pointers, and suggest that programs be loaded starting at $1000

I think this is an excellent suggestion.

David Betz · 2012-12-05 03:03

Bill Henning wrote: »

Chip & David,

I think $0E80-$0FFF should be reserved for mailboxes and various system pointers, and suggest that programs be loaded starting at $1000

Sounds reasonable. I'll modify my loader. I'll try to get an early version posted later today.

David Betz · 2012-12-05 03:57

Bill Henning wrote: »

Chip & David,

I think $0E80-$0FFF should be reserved for mailboxes and various system pointers, and suggest that programs be loaded starting at $1000

I've been thinking about this and it might be good to support a scheme where the second-stage loader could load COG images and start the COGs before loading the main program. I think RossH does this in Catalina. The idea would be to have the COGs start up spinning on a mailbox in this $0e80-$0fff area of memory. Then when the main program starts it can pass any initialization parameters to the COGs and start them up. This has the advantage that you don't waste 2K of space for the COG image for every "driver". This would require a slightly more complex executable image format but it could be optional so it is still possible to have programs that work in the traditional way where everything is loaded at once. In fact, some programs may want to use a combination of these methods so that, for instance, a Forth COG image can be linked in with the main program but COG images for "drivers" could be loaded by this second-stage loader. If we do something like this, we could either use a fixed layout in low memory to describe the drivers that are pre-loaded (like Catalina's registry) or we could have a linker bind those addresses statically to avoid any runtime searches.

Beyond that, I suggest that we modify Bill's proposal a bit and have the loader load memory starting at $0e80 but have the address of the COG image for the main program stored at $1000. That way the loader can initialize the data in the $0e80-$0fff area.
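
To make the proposed layout concrete, here is a minimal Python sketch that builds the bytes a loader would write starting at $0e80. Only the three addresses come from this thread. Placing the COG image immediately after the pointer (at $1004), storing the pointer as a little-endian long, and the name `build_load_image` are assumptions for illustration.

```python
import struct

LOAD_BASE = 0x0E80  # loader starts writing here; mailbox area runs up to $0fff
ENTRY_PTR = 0x1000  # long holding the address of the main program's COG image

def build_load_image(mailbox_init, cog_image, cog_image_addr=ENTRY_PTR + 4):
    """Return the bytes to be written to hub memory starting at LOAD_BASE."""
    reserved_len = ENTRY_PTR - LOAD_BASE
    if len(mailbox_init) > reserved_len:
        raise ValueError("mailbox data overflows the reserved area")
    if cog_image_addr < ENTRY_PTR + 4:
        raise ValueError("COG image would overlap the entry pointer")
    out = bytearray(cog_image_addr + len(cog_image) - LOAD_BASE)
    out[:len(mailbox_init)] = mailbox_init            # initial mailbox contents
    struct.pack_into("<I", out, reserved_len, cog_image_addr)  # pointer at $1000
    start = cog_image_addr - LOAD_BASE
    out[start:start + len(cog_image)] = cog_image
    return bytes(out)

img = build_load_image(b"\x01\x02", b"\xAA" * 8)  # tiny stand-in data
print(len(img), hex(struct.unpack_from("<I", img, ENTRY_PTR - LOAD_BASE)[0]))
```

Because the image begins at $0e80 rather than $1000, the loader can initialize the mailbox area in the same pass, which is the point of the modification suggested above.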

David Betz · 2012-12-05 04:05

Has anyone started using their DE0-Nano or DE2-115 P2 boards yet? If so, what platforms do you use? I can make my loader available on Windows, Macintosh, or Linux but I'm wondering what platforms people are actually using. I guess this is a silly question to some extent because PNut.exe will only run under Windows and that will still be needed to assemble your PASM code but I suspect that situation will change rapidly once Chip's instruction set document is done and Roy adapts his compiler to generate P2 binaries. My assumption is that Windows will probably be the most common platform once the P2 gets into general use but I wasn't sure if that would also be true of the early adopter population. What platform do you want to use for your early P2 work?

Sapieha · 2012-12-05 04:09

Hi David.

I run on win XP.

BUT -- I run at a 2,500,000 baud rate in PuTTY.

David Betz wrote: »

Has anyone started using their DE0-Nano or DE2-115 P2 boards yet? If so, what platforms do you use? I can make my loader available on Windows, Macintosh, or Linux but I'm wondering what platforms people are actually using. I guess this is a silly question to some extent because PNut.exe will only run under Windows and that will still be needed to assemble your PASM code but I suspect that situation will change rapidly once Chip's instruction set document is done and Roy adapts his compiler to generate P2 binaries. My assumption is that Windows will probably be the most common platform once the P2 gets into general use but I wasn't sure if that would also be true of the early adopter population. What platform do you want to use for your early P2 work?

David Betz · 2012-12-05 04:12

Sapieha wrote: »

Hi David.

I run on win XP.

BUT -- I run at a 2,500,000 baud rate in PuTTY.

You're able to load the Propeller at that rate? I have never tried anything higher than 115200. I can easily provide support for selecting different baud rates. I can even use one baud rate for the first-phase loader that talks to Chip's ROM loader and a different one for the second-stage loader that loads your program if that would be helpful.
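
For a sense of what the baud rate means for load time, here is a back-of-the-envelope Python calculation. It assumes 8N1 framing (10 bits on the wire per byte) and ignores packet headers, CRCs, and any acknowledgement delays, so real loads would take somewhat longer. The 32K and 128K sizes are the hub memory sizes mentioned earlier for the DE0-Nano and DE2-115.

```python
def transfer_seconds(num_bytes, baud, bits_per_byte=10):
    """Raw serial time for num_bytes; 10 bits/byte assumes 8N1 framing."""
    return num_bytes * bits_per_byte / baud

for label, size in (("DE0-Nano 32K", 32 * 1024), ("DE2-115 128K", 128 * 1024)):
    for baud in (115_200, 2_500_000):
        print(f"{label} at {baud} baud: {transfer_seconds(size, baud):.2f} s")
```

Under these assumptions a full 128K load drops from roughly 11.4 s at 115200 baud to about 0.5 s at 2,500,000 baud.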

Sapieha · 2012-12-05 04:14

Hi David.

Sounds reasonable.

David Betz wrote: »

You're able to load the Propeller at that rate? I have never tried anything higher than 115200. I can easily provide support for selecting different baud rates. I can even use one baud rate for the first-phase loader that talks to Chip's ROM loader and a different one for the second-stage loader that loads your program if that would be helpful.
