Execution mode questions: HUBEXEC, COGEXEC, LUTEXEC

mindrobots · 2015-09-28 19:44

This has been my little puzzle for today (surely I'm missing things in these scenarios).

Start with Chip's simple blinky demo:

dat
	orgh	1

' launch 15 cogs (cog 0 falls through and runs 'blink', too)
' any cogs missing from the FPGA won't blink

	coginit	#16,#blink
	coginit	#16,#blink
	coginit	#16,#blink
	coginit	#16,#blink
	coginit	#16,#blink
	coginit	#16,#blink
	coginit	#16,#blink
	coginit	#16,#blink
	coginit	#16,#blink
	coginit	#16,#blink
	coginit	#16,#blink
	coginit	#16,#blink
	coginit	#16,#blink
	coginit	#16,#blink
	coginit	#16,#blink

blink	cogid	x		'which cog am I?
	setb	dirb,x		'make that pin an output
	notb	outb,x		'flip its output state
	add	x,#16		'add to my id
	shl	x,#18		'shift it up to make it big
	waitx	x		'wait that many clocks
	jmp	@blink		'do it again

	org
x	res	1		'variable at cog address 32 (register 8, RAM)
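Each cog blinks at its own rate, and the reason is in the loop's arithmetic. After `cogid x`, the instructions `add x,#16` and `shl x,#18` hand WAITX an operand of (ID+16)<<18. Below is a small Python model of that arithmetic (not PASM). The helper name `waitx_operand` is hypothetical. The sketch only models the operand, not the exact clock count the FPGA image waits for it.

```python
# Model of the WAITX operand in the blink loop (Python sketch, not PASM).
# Assumption: only the value handed to WAITX is modelled, not the exact
# number of clocks the 2015 FPGA image waits for that value.

MASK32 = 0xFFFFFFFF  # cog registers are 32 bits wide


def waitx_operand(cog_id: int) -> int:
    """Value of x after 'cogid x', 'add x,#16' and 'shl x,#18'."""
    x = cog_id                # cogid x
    x = (x + 16) & MASK32     # add x,#16
    x = (x << 18) & MASK32    # shl x,#18
    return x


if __name__ == "__main__":
    for cog in range(16):
        print(f"cog {cog:2d}: toggles bit {cog:2d} of OUTB, "
              f"WAITX operand {waitx_operand(cog):,}")
```

Because the operand grows with the cog ID, higher-numbered cogs wait longer and blink more slowly.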

You start with COG0 in HUBEXEC, effectively at HUBRAM address 0.
COG0 executes the next 15 COGINIT instructions in HUBEXEC mode, telling each new cog to start executing at $0F in HUBRAM.
Then COG0 falls through into the loop and runs the blink code itself.
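The $0F figure reads naturally as a long index. Here is a quick Python sketch of the layout. It assumes every instruction is one 4-byte long, that the first one sits at byte $00001 (orgh 1), and that nothing else comes before the label. Under those assumptions, blink is long $0F, which is byte $3D in HUBRAM. That address still fits the 9-bit #blink immediate.

```python
# Hub layout of the blink demo under "orgh 1" (Python sketch).
# Assumptions: each instruction is one 4-byte long, and nothing sits
# between the 15 COGINITs and the blink label.

ORGH_START = 1      # orgh 1 -> first instruction at byte $00001
LONG_BYTES = 4
NUM_COGINITS = 15


def byte_address(long_index: int, start: int = ORGH_START) -> int:
    """Hub byte address of the long_index-th long after the orgh."""
    return start + long_index * LONG_BYTES


blink_index = NUM_COGINITS              # blink follows the COGINITs
blink_addr = byte_address(blink_index)
print(f"blink: long ${blink_index:02X}, byte ${blink_addr:05X}")
# prints: blink: long $0F, byte $0003D
```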

You end up with some number of COGS all running the same code from HUBRAM, each using one long of its own COGRAM for x. That all makes sense in the P2 universe.

Now say I have a large piece of code that lives in HUBRAM above $1000, and I want multiple COGS to use it (shared code space, separate data space).
I can't do 'COGINIT #16,@entrypoint' because COGINIT's immediate address field is only 9 bits wide. So, in HUBRAM below $1FF, I put a label I can COGINIT to (COGINIT #16,#jmp_point), followed by a 'JMP @entrypoint'. That leaves the new cog in HUBEXEC, running the code above $1000. It seems workable; I'll just end up with a bunch of little code snippets that do absolute jumps to code higher in HUBRAM.
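Here is a minimal Python sketch of that constraint. It assumes COGINIT's immediate address field is 9 bits wide, as described in this thread. Any address above $1FF cannot be written as #addr, so the entry point needs a low-hub stub (or, as Electrodude points out next, a ## operand). `fits_immediate` and `needs_stub` are hypothetical helper names.

```python
# Does a target address fit COGINIT's 9-bit immediate? (Python sketch)
# Assumption: the immediate field is 9 bits, so #addr covers $000-$1FF.

IMM_BITS = 9
IMM_MAX = (1 << IMM_BITS) - 1   # $1FF


def fits_immediate(addr: int) -> bool:
    """True if addr can be written as #addr in COGINIT."""
    return 0 <= addr <= IMM_MAX


def needs_stub(entry: int) -> bool:
    """True if entry needs a low-hub 'JMP @entry' stub (hypothetical helper)."""
    return not fits_immediate(entry)


for addr in (0x03D, 0x1FF, 0x200, 0x1000):
    print(f"${addr:04X}: {'fits #imm' if fits_immediate(addr) else 'needs stub'}")
```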

Now let's say I have a block of cog-sized code that I want to run in COGRAM (or LUTRAM, I believe) on multiple COGs. For this, I need to do a 'COGINIT #16,#loader' to some code that runs in HUBEXEC, loads the cog code into COGRAM (or LUTRAM), and then, when done, switches gears to COGEXEC. How do I switch gears from HUBEXEC to COGEXEC? I potentially have both overlapping $0-$1FF address spaces visible.

At first I thought my original COG, running in HUBEXEC, could load the cog code before the COGINIT, but it can't see into another COG's RAM or LUT, so that won't work.

In general, then: for any code you want to run in COGEXEC mode, you first COGINIT to some loader code in low HUBRAM (new term?). That loader, running in HUBEXEC mode, loads the code into your COGRAM, and then you somehow shift from the low HUBEXEC address space to COGRAM for COGEXEC mode?

What am I missing?

Electrodude · 2015-09-28 20:33

The second field of COGINIT (the S field) doesn't have to be a 9-bit literal: you can write ##any_address instead of #nine_bit_address.

coginit #16, ##pointer_to_code_above_1FF
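The ## form works because the assembler adds a prefix instruction that carries the bits the 9-bit field cannot hold. The Python sketch below assumes the split used on the released P2: an AUGS/AUGD payload holds the upper 23 bits and the instruction's own field holds the lower 9. The 2015 FPGA image may differ in detail. `split_aug` and `join_aug` are hypothetical names.

```python
# How "##value" can carry a full 32-bit address (Python sketch).
# Assumption (as on the released P2): an AUGS/AUGD prefix holds the
# upper 23 bits and the instruction's 9-bit field holds the lower 9.

def split_aug(value: int) -> tuple[int, int]:
    """Split a 32-bit value into (23-bit AUG payload, 9-bit field)."""
    value &= 0xFFFFFFFF
    return value >> 9, value & 0x1FF


def join_aug(upper23: int, lower9: int) -> int:
    """Recombine the two parts into the full 32-bit operand."""
    return ((upper23 & 0x7FFFFF) << 9) | (lower9 & 0x1FF)


hi, lo = split_aug(0x1000)
print(f"AUG payload ${hi:X}, field ${lo:03X}")   # AUG payload $8, field $000
```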

Dave Hein · 2015-09-28 21:32

How does COGINIT work on the P2? Does it copy hub RAM to cog RAM and then begin executing from cog RAM? Clearly it can no longer start execution at address 0, since the cog registers have been moved there.

David Betz · 2015-09-28 21:36

I don't think it does any copying at all. I think it just starts at the address passed to it. So, it always starts in hub exec mode.

jmg · 2015-09-28 21:47

mindrobots wrote: »
dat
	orgh	1
You start with COG0 in HUBEXEC and it's at effectively HUBRAM address 0.

Close, but I think the orgh 1 detail means it starts at 0x00001, which is the hubexec alignment needed to start in low RAM.
That means that on exit from reset the PC is HUBEXEC-ready at 0x00001, not 0x00000.

mindrobots · 2015-09-28 22:12

jmg wrote: »
mindrobots wrote: »
dat
	orgh	1
You start with COG0 in HUBEXEC and it's at effectively HUBRAM address 0.
Close, but I think that orgh 1 detail starts at 0x00001, and that is the Hubexec alignment needed for low-ram start.
That means PC on RST exit, is HUBEXEC ready, at 0x00001, not 0x0000.

OK, let's just call it "the first executable memory location in HUBRAM".

So, how do you get a COG to shift from HUBEXEC (HE) mode to COGEXEC mode? If you are in HE and jump to any address, you stay in HE, because the address spaces overlap.

I'll have to look into Electrodude's comment: if COGINIT can take a ## to generate an AUGD/AUGS override, that could answer one question.

Time to play more when I'm back at my P2.

mindrobots · 2015-09-28 22:20

David Betz wrote: »

I don't think it does any copying at all. I think it just starts at the address passed to it. So, it always starts in hub exec mode.

This is my observation and assumption based on the code examples that I have seen.

The 'orgh 1' does set you to the first executable HUBRAM address.

There almost needs to be a set of orgc0, orgc1, orgc2, orglut0, etc. directives, so you can organize your code in chunks that match up with the real address spaces?

My first try at multi-cog code using just orgh and org didn't work so well. Need more play time!!

Electrodude · 2015-09-28 22:21

To switch to cogexec, I'm pretty sure you just jump to an address less than $1FF (or $7FC, or whatever).

Roy Eltham · 2015-09-28 22:25

With the current FPGA image the kluge setup is such that jumping to long aligned addresses before $1000 = cog/lut exec mode, jumping to unaligned addresses below $1000 = hubexec.
So jump to 0 = cog, jump to 1 = hub
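Roy's rule fits in a few lines of Python, and it also explains the blink puzzle: blink's byte address ($3D) is unaligned, so that code stays in hubexec. This sketch makes two assumptions. Targets at or above $1000 are always hubexec. The cog/LUT boundary at byte $800 follows Cluso's "org $200 << 2" later in the thread. `exec_mode` is a hypothetical name.

```python
# Jump-target rule of the current FPGA image, as Roy describes it (Python sketch).
# Assumptions: targets at or above $1000 are always hubexec, and the
# cog/LUT split at byte $800 follows Cluso's "org $200 << 2".

COG_LUT_END = 0x1000
LUT_START = 0x200 << 2          # $800: LUT register $200 as a byte address


def exec_mode(target: int) -> str:
    """Execution mode selected by jumping to a byte address."""
    if target >= COG_LUT_END:
        return "hubexec"
    if target % 4 != 0:          # unaligned below $1000 -> hubexec
        return "hubexec"
    return "lutexec" if target >= LUT_START else "cogexec"


for t in (0x000, 0x001, 0x03D, 0x800, 0x1000):
    print(f"jmp to ${t:04X} -> {exec_mode(t)}")
```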

mindrobots · 2015-09-28 22:28

Electrodude wrote: »

To switch to cogexec, I'm pretty sure you just jump to an address less than $1FF (or $7FC, or whatever).

That is what I thought initially, but the blink example seems to break that theory. That code is all in HUBRAM and the start address is $0F; I'm pretty sure that example stays in HUBEXEC, since no code is ever copied to COGRAM.

mindrobots · 2015-09-28 22:33

Roy Eltham wrote: »

With the current FPGA image the kluge setup is such that jumping to long aligned addresses before $1000 = cog/lut exec mode, jumping to unaligned addresses below $1000 = hubexec.
So jump to 0 = cog, jump to 1 = hub

Ok, that kind of makes sense (in a nonsensical way). So my HUBRAM instruction longs are at byte addresses 1,5,9,13,etc.

Orgh 1 means start at BYTE 1 and increment by 4 bytes?

Cluso99 · 2015-09-28 23:09

I think this may work...
1. Starts cog (the next cog will be 1 in this instance) in hubexec
2. Loads hub code into LUT
3. Jumps to LUT for cogexec

dat
        orgh    1                       'start in hub-exec at $00001 (non-aligned address below $1000)

        coginit #16,#loadcog1
here    jmp     @here                   'cog0 will loop here indefinitely. cog1 will not execute this instr.

'---------------------------------------------------------------------------------------------------

' cog 1=next-cog starts here in hubexec mode
loadcog1

' run a program of up to $200 (512) instructions in lut


        loc     adra,@code              'load lut (register $200 onward) from the hub copy at 'code'
        setq2   #$200-1                 'causes the rdlong to use LUT, not COG, as the destination
        rdlong  $000,adra

        jmp     #begin                  'lut now holds one contiguous program; jump to it (cogexec from lut)

'---------------------------------------------------------------------------------------------------
' cog code located in hub ready for copy to cog

code                                    'hub address of cog/lut program

        org     $200 << 2, $400 << 2    'set cog/lut org to register $200 (LUT), set limit to end of lut

begin   mov     dira,#$1F               'start of cog/lut program, enable outputs

loop    notb    outa,#4                 'toggle pins in a loop

        long    $F4240400 [250]         'notb outa,#0 (250 instances)

        jmp     @loop
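The loader's numbers can be checked with a rough Python model. It treats hub memory as a list of longs and takes "setq2 #$200-1 / rdlong $000,adra" to copy $200 consecutive longs into LUT entries 0..$1FF. The program (mov, notb, 250 x notb, jmp) is 253 longs, so it fits. Under Roy's rule, `jmp #begin` lands on byte $800, which is long-aligned and below $1000, so execution switches to the LUT. `load_lut` and `program_longs` are hypothetical names.

```python
# Rough model of Cluso's loader (Python sketch, not PASM).
# Assumptions: hub memory is a list of longs, and setq2 #$200-1 followed by
# rdlong $000,adra copies $200 consecutive longs into LUT entries 0..$1FF.

LUT_LONGS = 0x200                  # setq2 #$200-1 -> $200 longs
LUT_BASE = 0x200 << 2              # org $200 << 2 -> byte $800
LUT_LIMIT = 0x400 << 2             # limit $400 << 2 -> byte $1000


def program_longs() -> int:
    """Size of the LUT program: mov + notb + 250 x notb + jmp."""
    return 1 + 1 + 250 + 1


def load_lut(hub: list[int], start: int, count: int = LUT_LONGS) -> list[int]:
    """Copy `count` longs starting at hub[start] into a fresh LUT image."""
    if count > LUT_LONGS:
        raise ValueError("block larger than the LUT")
    block = hub[start:start + count]
    if len(block) != count:
        raise ValueError("hub image too short for the requested block")
    lut = [0] * LUT_LONGS
    lut[:count] = block
    return lut


print(program_longs(), "of", LUT_LONGS, "LUT longs used")
print(f"begin is at byte ${LUT_BASE:X}, window ends at ${LUT_LIMIT:X}")
```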

jmg · 2015-09-28 23:14

mindrobots wrote: »

Roy Eltham wrote: »

With the current FPGA image the kluge setup is such that jumping to long aligned addresses before $1000 = cog/lut exec mode, jumping to unaligned addresses below $1000 = hubexec.
So jump to 0 = cog, jump to 1 = hub

Ok, that kind of makes sense (in a nonsensical way). So my HUBRAM instruction longs are at byte addresses 1,5,9,13,etc.

Orgh 1 means start at BYTE 1 and increment by 4 bytes?

Yes, the offset gives a way to auto-select COG or HUB execution, and it means you can run code in HUB at addresses that overlap COG/LUT.
I think it also simplifies Chip's ROM loader, which is a Verilog state engine: the PC is reset to 0x0001 to start in hubexec, so most boot code can avoid a COG load step.

There are other posts that discuss being able to access these 'ROM' routines for more than simple loaders, e.g. Debug and Monitor type apps.
The ROM is 16 KB, so it no longer has to be as squashed as on the first P2 (where it was manually patched RAM).

Rather than a cryptic ORG variant, a more explicit SEG-like syntax may be better:

    SEG  HubExecPOR     ' < for ROM boot code, RST vector
    SEG  CogExec        ' < just Above COG registers 
    SEG  LutExec        ' < May be needed ?
    SEG  HubExecUser    ' < Above 'ROM' ?
    ... etc

Those SEGs can also help with code-overflow and alignment checking, which a simpler ORG cannot do.
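The sketch below shows the kind of bookkeeping such SEGs would allow. It is Python, it is hypothetical, and it is not an existing assembler feature. The segment names follow jmg's list, and the base/limit values in the example are illustrative only. The alignment checks assume the current FPGA image's rule: long-aligned below $1000 runs as cog/LUT code, and unaligned runs as hub code.

```python
# Hypothetical SEG bookkeeping (Python sketch of jmg's idea, not a real
# assembler feature). Base/limit values in the example are illustrative only.
from dataclasses import dataclass


@dataclass
class Seg:
    name: str
    base: int            # first byte address in the segment
    limit: int           # one past the last usable byte
    cog_exec: bool       # True: cog/LUT code, must be long-aligned
    used: int = 0        # bytes already emitted

    def __post_init__(self) -> None:
        # Alignment checks a plain ORG cannot make, assuming the current
        # FPGA rule: aligned below $1000 = cog/LUT exec, unaligned = hubexec.
        if self.cog_exec and self.base % 4 != 0:
            raise ValueError(f"{self.name}: cog/LUT code must be long-aligned")
        if not self.cog_exec and self.base < 0x1000 and self.base % 4 == 0:
            raise ValueError(f"{self.name}: aligned hub code below $1000 would run as cogexec")

    def emit(self, n_instr: int) -> int:
        """Reserve n_instr longs; return the byte address of the first."""
        size = 4 * n_instr
        if self.used + size > self.limit - self.base:
            raise OverflowError(f"{self.name}: code overflows ${self.limit:X}")
        addr = self.base + self.used
        self.used += size
        return addr


lut = Seg("LutExec", 0x800, 0x1000, cog_exec=True)
print(hex(lut.emit(253)))   # 0x800
```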

Roy Eltham · 2015-09-28 23:16

mindrobots,
PASM instructions are always 32 bits (4 bytes), so if the first one is at byte address $0001 (which is what orgh 1 does), all the code that follows will be at unaligned addresses.

It's the kluge Chip implemented so he could have hubexec code below $1000. I think we convinced him to change it, but not before the FPGA image was released.
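Roy's point about alignment is easy to check in a few lines of Python. The code generates the byte addresses of consecutive 4-byte instructions after an orgh and confirms that, starting from 1, every one below $1000 stays unaligned, while starting from 0 they are all aligned. `instr_addresses` is a hypothetical helper name.

```python
# Byte addresses of consecutive 4-byte instructions after an orgh (Python sketch).

def instr_addresses(start: int, count: int) -> list[int]:
    """Addresses of `count` consecutive longs beginning at byte `start`."""
    return [start + 4 * i for i in range(count)]


print([f"${a:X}" for a in instr_addresses(1, 4)])   # ['$1', '$5', '$9', '$D']

# Every long placed below $1000 after "orgh 1" stays unaligned (addr % 4 == 1).
below = [a for a in instr_addresses(1, 0x400) if a < 0x1000]
print(all(a % 4 == 1 for a in below))   # True
```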

jmg · 2015-09-28 23:23

Roy Eltham wrote: »

It's the kluge Chip implemented so he could have hubexec code below $1000. I think we convinced him to change it, but not before the FPGA image was released.

I do not see it as much of a kludge, as it only affects the ROM loader, and it makes sense to have that placed away from default user code.
I think COG code can directly call those ROM routines.

User code can start above the LUT, or above the ROM image.

Roy Eltham · 2015-09-28 23:30

It doesn't "only affect the ROM loader".

It's always active.

jmg · 2015-09-28 23:41

Roy Eltham wrote: »

It doesn't "only affect the ROM loader".
It's always active.

Well, yes, but my thinking is that users would tend to preserve the ROM loader, and I'd expect them to remove it, or move code into that space, only if they are extremely space-limited, or paranoid.

mindrobots · 2015-09-29 12:38

Electrodude's ## solution works to allow COGINIT with entry points wider than 9 bits. So this code works to run a single shared copy of the blink code from high HUBRAM. I didn't know AUGD/AUGS worked on almost any D or S field.

dat
	orgh	1

' launch 15 cogs (cog 0 falls through and runs 'blink', too)
' any cogs missing from the FPGA won't blink

	coginit	#16,##blink
	coginit	#16,##blink
	coginit	#16,##blink
	coginit	#16,##blink
	coginit	#16,##blink
	coginit	#16,##blink
	coginit	#16,##blink
	coginit	#16,##blink
	coginit	#16,##blink
	coginit	#16,##blink
	coginit	#16,##blink
	coginit	#16,##blink
	coginit	#16,##blink
	coginit	#16,##blink
	coginit	#16,##blink
	jmp	@blink

	orgh	$1000
blink	cogid	x		'which cog am I?
	setb	dirb,x		'make that pin an output
	notb	outb,x		'flip its output state
	add	x,#16		'add to my id
	shl	x,#18		'shift it up to make it big
	waitx	x		'wait that many clocks
	jmp	@blink		'do it again

	org
x	res	1		'variable at cog address 32 (register 8, RAM)

You do need to add a JMP at the end of the COGINITs so COG0 knows where to go execute: it can no longer just fall through to the blink routine, since the ORGH has moved it.

Time to go play more with code placement and program structure.
