Execution mode questions: HUBEXEC, COGEXEC, LUTEXEC
mindrobots
This has been my little puzzle for today. (Surely I'm missing things in these scenarios.)
Start with Chip's simple blinky demo:
    dat
            orgh    1

    ' launch 15 cogs (cog 0 falls through and runs 'blink', too)
    ' any cogs missing from the FPGA won't blink

            coginit #16,#blink
            coginit #16,#blink
            coginit #16,#blink
            coginit #16,#blink
            coginit #16,#blink
            coginit #16,#blink
            coginit #16,#blink
            coginit #16,#blink
            coginit #16,#blink
            coginit #16,#blink
            coginit #16,#blink
            coginit #16,#blink
            coginit #16,#blink
            coginit #16,#blink
            coginit #16,#blink

    blink   cogid   x               'which cog am I?
            setb    dirb,x          'make that pin an output
            notb    outb,x          'flip its output state
            add     x,#16           'add to my id
            shl     x,#18           'shift it up to make it big
            waitx   x               'wait that many clocks
            jmp     @blink          'do it again

            org
    x       res     1               'variable at cog address 32 (register 8, RAM)
COG0 starts in HUBEXEC mode at what is effectively HUBRAM address 0.
COG0 executes the 15 COGINIT instructions in HUBEXEC mode, telling each new cog to start executing at $0F of HUBRAM.
Then COG0 falls through the COGINITs and runs the blinky code itself.
You end up with some number of COGs all running the same code from HUBRAM, each using one long of its own COGRAM for X. That all makes sense in the P2 universe.
Now, say I have some large piece of HUBRAM code that lives above $1000 and that I want multiple COGs to use (shared code space, separate data space).
I can't do 'COGINIT #16,@entrypoint' because I don't have enough bits in my COGINIT. So, in HUBRAM below $1FF, I put a label I can COGINIT to (COGINIT #16,#jmp_point), followed by a 'JMP @entrypoint'. This leaves me in HUBEXEC, running code in HUBRAM above $1000. That seems workable; I'll just end up with a bunch of little code snippets that do absolute jumps to code in HUBRAM.
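The "not enough bits" point can be checked with a little arithmetic. The following is a minimal Python sketch, not P2 code, and it assumes the COGINIT immediate operand is 9 bits wide, which is where the $1FF ceiling above would come from. The helper names `fits_immediate` and `needs_trampoline` are hypothetical, and $0F and $1000 are simply the addresses quoted in this post.

```python
# Sketch only: models the reach of a 9-bit immediate operand.
# Assumption: COGINIT's "#address" operand is 9 bits wide, which would
# explain the $1FF ceiling mentioned in the post.
IMMEDIATE_BITS = 9
IMMEDIATE_MAX = (1 << IMMEDIATE_BITS) - 1  # 0x1FF


def fits_immediate(address: int) -> bool:
    """True if the address can be written directly as '#address'."""
    return 0 <= address <= IMMEDIATE_MAX


def needs_trampoline(entry_address: int) -> bool:
    """True if a low-memory 'JMP @entry' stub is needed to reach the entry."""
    return not fits_immediate(entry_address)


if __name__ == "__main__":
    # 0x0F is the blink start address quoted in the post; 0x1000 stands in
    # for the hypothetical high-memory entry point.
    for name, address in [("blink", 0x0F), ("entrypoint", 0x1000)]:
        if needs_trampoline(address):
            verdict = "needs a low-memory JMP stub"
        else:
            verdict = "direct COGINIT"
        print(f"{name} at ${address:X}: {verdict}")
```

Under that assumption, anything above $1FF needs one small stub in low HUBRAM per entry point, which matches the "bunch of little code snippets" described above.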
Now, let's say I have a block of COG-sized code that I want to run in COGRAM (or LUTRAM, I believe) on multiple COGs. For this, I need to do a 'COGINIT #16,#loader' to some code that runs in HUBEXEC and loads the COG code into COGRAM (or LUTRAM), and then, when done, switches gears to COGEXEC. How do I switch gears to COGEXEC from HUBEXEC? I potentially have both overlapping $0-$1FF address spaces visible.
I thought at first I could have my original COG, running in HUBEXEC, load the COG code before the COGINIT, but it can't see into another COG's RAM or LUT, so that won't work.
In general, for any code you want to run in COGEXEC mode, do you need to first COGINIT to some loader code in low HUBRAM (new term?), have it load the code into your COGRAM while running in HUBEXEC mode, and then somehow shift from the low HUBEXEC memory space to COGRAM for COGEXEC mode?
What am I missing?
Comments
Close, but I think the orgh 1 detail means the code starts at 0x00001, and that is the hubexec alignment needed for a low-RAM start.
That means the PC on RST exit is HUBEXEC-ready at 0x00001, not 0x0000.
Ok, let's just call it "the first executable memory location in HUBRAM".
So, how do you get a COG to shift from HUBEXEC (HE) mode to COGEXEC mode? If you are in HE and jump to any address, you stay in HE because the address spaces overlap.
I'll have to look into Electrodude's comment about whether COGINIT can take a ## to generate an AUXD override; that could answer one question.
Time to play more when I'm back at my P2.
This is my observation and assumption based on the code examples that I have seen.
The 'orgh 1' does set you to the first executable HUBRAM address.
There almost needs to be a set of directives (orgc0, orgc1, orgc2, orglut0, etc.), maybe, so you can organize your code in chunks that match up with the real address spaces?
My first try at multi cog code using just orgh and org didn't work so well. Need more play time!!
So jump to 0 = cog, jump to 1 = hub
That is what I thought initially but the blink example seems to break that theory. That code is all in HUBRAM and the start address is $0F - I'm pretty sure that example stays in HUBEXEC since no code is ever copied to COGRAM.
Ok, that kind of makes sense (in a nonsensical way). So my HUBRAM instruction longs are at byte addresses 1, 5, 9, 13, etc.
So orgh 1 means start at BYTE 1 and increment by 4 bytes?
1. Starts cog (the next cog will be 1 in this instance) in hubexec
2. Loads hub code into LUT
3. Jumps to LUT for cogexec
I think it also simplifies Chip's ROM loader, which is a Verilog state engine, and means the PC is reset to 0x0001 to start in hubexec, and so can avoid a COG load step for most boot code.
There are other postings that discuss being able to access these 'ROM' routines for more than simple loaders, e.g. Debug and Monitor type apps.
The ROM is 16 KB, so it no longer has to be as squashed as on the first P2 (where it was manually patched RAM).
Rather than a cryptic ORG variant, a more explicit SEG-like syntax may be better: those SEGs can also help with code overflow and alignment checking, which a simpler ORG cannot do.
PASM instructions are always 32 bits (4 bytes), so if the first one is at byte address $0001 (orgh 1 does that), then all the code will be at unaligned addresses.
It's the kluge Chip implemented so he could have hubexec code below $1000. I think we convinced him to change it, but not before the FPGA image was released.
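The byte-address reading described above can be worked through numerically. This is a small Python sketch, not P2 code, and it assumes that orgh 1 places the first instruction at byte address 1 and that every instruction occupies 4 bytes, as stated in the comment above. The function name `instruction_byte_addresses` is hypothetical.

```python
# Sketch only: byte addresses of consecutive 4-byte instructions.
# Assumption: 'orgh 1' puts the first instruction at byte address 1.
LONG_BYTES = 4


def instruction_byte_addresses(start: int, count: int) -> list:
    """Byte address of each of `count` consecutive 4-byte instructions."""
    return [start + LONG_BYTES * index for index in range(count)]


# The blinky demo: 15 COGINITs, then the first instruction at 'blink'.
addresses = instruction_byte_addresses(start=1, count=16)
blink_byte_address = addresses[15]

if __name__ == "__main__":
    print("first four:", addresses[:4])
    print(f"blink byte address: {blink_byte_address} (${blink_byte_address:X})")
    print("all unaligned:", all(a % LONG_BYTES != 0 for a in addresses))
```

Under this reading, blink lands at byte 61 ($3D) and no instruction is long-aligned. The $0F figure in the original post instead corresponds to counting 15 longs from 0, so the two figures only agree if the COGINIT address is counted in longs rather than bytes.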
I think COG code can directly call those ROM routines.
User code can start from above the LUT, or above the ROM image.
It's always active.
You do need to add a jump at the end of the COGINITs so COG0 knows where to go execute; it can no longer just fall through to the blink routine, since that has been moved with the ORGH.
Time to go play more with code placement and program structure.