Launching LMM C function in another COG

ImageCraft · 2008-04-10 07:55

We are prototyping a library function that launches an LMM C function in another COG, and we have run into some issues. Perhaps the collective wisdom of the forum has some ideas. From the developer:

****
__coginit_lmm is a library routine that launches a new cog with the proper initialization. The HUB chooses which cog to start, because we set bit 3 in the COGINIT operand. __coginit_lmm is itself an LMM function, so the @FRET LMM "instruction" is used at the end.

This routine looks like:
/* Library routine for COGINIT_LMM */
;R0 - address of function to be executed

.area text(rom, rel)
__coginit_lmm::
mov R1,#0
or R1,#8 ;Set bit 3 of the COGINIT operand so the HUB picks the new cog itself
mov TEMP0,#128 ;Load from hub 0x20 (0x20 >> 2 placed in bits 4-17), skipping the first 32 bytes
or R1,TEMP0
mov TEMP0,#$7c ;Hub location where the function address is written
wrlong R0,TEMP0
COGINIT R1 WR
nop
nop
@FRET

How we launch new cogs:

In the test case, we pass one parameter in R0: the address of the function to be executed. In the library function __coginit_lmm, we try to launch the lowest available cog. COGINIT takes one operand, register R1, laid out as follows. Bits 0-2 hold the cog ID. Bit 3 tells the HUB to start the next available cog, which is exactly what we want, so we set bit 3 and clear bits 0-2. Bits 4 to 17 hold the address of the code to start. Here we give the start address of the kernel, since the new cog also needs to run the LMM model. The address of the destination function (passed in R0) is written to HUB memory at $7C. This $7C maps to cog address $17 when the cog copies 496 longs. $17 is used in the 'finit' routine to initialize the PC value.
Bits 18 to 31 hold the address for PAR. This is not used for now and is initialized to zero.
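To make the bit layout described above concrete, here is a minimal C sketch that builds the COGINIT operand from its three fields. The function name coginit_operand and its parameters are hypothetical, chosen only for illustration; they are not part of the ImageCraft library.

```c
#include <assert.h>
#include <stdint.h>

/* Field layout of the COGINIT operand, as described in the post:
 *   bits 0-2    cog id (not used when the NEW bit is set)
 *   bit  3      NEW: let the hub pick the next available cog
 *   bits 4-17   bits 2..15 of the hub address of the code to load
 *   bits 18-31  bits 2..15 of the hub address passed in PAR
 * Both addresses must be long-aligned. */
#define COGINIT_NEW 0x8u

/* Hypothetical helper: cog_id is 0..7, or -1 for "next available cog". */
uint32_t coginit_operand(uint32_t code_addr, uint32_t par_addr, int cog_id)
{
    assert((code_addr & 3u) == 0 && code_addr < 0x10000u);
    assert((par_addr & 3u) == 0 && par_addr < 0x10000u);
    assert(cog_id >= -1 && cog_id <= 7);

    uint32_t op = ((par_addr >> 2) << 18) | ((code_addr >> 2) << 4);
    op |= (cog_id < 0) ? COGINIT_NEW : (uint32_t)cog_id;
    return op;
}
```

With code_addr = 0x20, par_addr = 0 and cog_id = -1 this gives 128 | 8, the same value the routine above builds in R1.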

So COGINIT should launch the lowest available COG, and that cog should copy the first 496 longs from the HUB and start executing the kernel. The kernel in turn calls finit, which sets the PC to the destination function address.
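The $7C to $17 mapping mentioned above is plain arithmetic: subtract the hub address the load starts from and divide by the size of a long. A small C sketch, assuming the kernel image is loaded from hub 0x20 as the post states (the names here are made up for the example):

```c
#include <assert.h>
#include <stdint.h>

#define KERNEL_HUB_BASE 0x20u  /* hub address the kernel image is loaded from */
#define COG_LOAD_LONGS  496u   /* longs copied into a cog at start-up */

/* Cog register index that a long-aligned hub address lands in when the
 * cog is loaded starting at KERNEL_HUB_BASE. */
uint32_t hub_to_cog_index(uint32_t hub_addr)
{
    assert((hub_addr & 3u) == 0);
    assert(hub_addr >= KERNEL_HUB_BASE);
    uint32_t index = (hub_addr - KERNEL_HUB_BASE) / 4u;
    assert(index < COG_LOAD_LONGS);
    return index;
}
```

For hub address 0x7C this yields (0x7C - 0x20) / 4 = 0x17.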

In my tests this design works: I am able to launch two cogs in parallel and light two LEDs. But I am facing two issues.
If you look at the library routine __coginit_lmm, you will see two NOP instructions after 'COGINIT R1 WR'. Without them the behavior is undefined: both cogs start executing the same function because of HUB memory corruption. I added the two NOPs to avoid this, but I am not sure that it is a general solution, or whether it needs semaphore handling. Please let me know what you think.
The user may also want to launch a specific cog by giving its cog ID, i.e. pass the cog ID as another parameter (perhaps in R1), encode it in bits 0-2 of COGINIT's destination register, and clear bit 3. I tried this, but it does not work. When I debug it in the GEAR debugger, no new cogs are launched.

***

Ideas? Suggestions?

Thanks

// richard

stevenmess2004 · 2008-04-10 08:07

Why not use PAR as the initial function pointer? I think that your problem is that if two cogs start a new cog at the same time, the hub memory gets corrupted. If you use PAR then this shouldn't happen. Your other option would be to use the locks so that only one cog can start a new cog at a given time.
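The locking option could look roughly like the C sketch below. It is only an illustration of the idea: a C11 atomic flag stands in for a Propeller hub lock, launch_slot stands in for the shared hub long, and all names are hypothetical. The point is that the lock is released by the new cog after it has read the address, not by the launcher right after COGINIT.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

static atomic_bool launch_busy;  /* stand-in for a hub lock (assumption) */
static uint32_t launch_slot;     /* stand-in for the shared hub long */

/* Launcher side: claim the slot and publish the function address.
 * Returns false if an earlier launch has not been consumed yet. */
bool try_begin_launch(uint32_t func_addr)
{
    if (atomic_exchange(&launch_busy, true))
        return false;            /* another launch is still in flight */
    launch_slot = func_addr;
    /* ...COGINIT would be issued here... */
    return true;
}

/* New-cog side: read the address first, then release the lock. */
uint32_t finish_launch(void)
{
    assert(atomic_load(&launch_busy));
    uint32_t func_addr = launch_slot;
    atomic_store(&launch_busy, false);
    return func_addr;
}
```

A second try_begin_launch() fails until finish_launch() has run, so the slot cannot be overwritten while the first cog still needs it.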

How are you planning on getting arguments into the new cog?

ImageCraft · 2008-04-10 08:41

stevenmess2004 said...
Why not use PAR as the initial function pointer? I think that your problem is that if two cogs start a new cog at the same time, the hub memory gets corrupted. If you use PAR then this shouldn't happen. Your other option would be to use the locks so that only one cog can start a new cog at a given time.

How are you planning on getting arguments into the new cog?

As for arguments, initially, the user will have to use global variables.

Upon reflecting on this, I think what happens is this: since the same HUB location 0x7C is used to pass the function address, 2 (LMM-executed) NOPs just happen to be the fudge factor needed for the new COG to be launched and for the memory copy to the new COG to get up through 0x7C. So if this routine changes, or if the LMM kernel changes, this fudge factor may change as well, meaning that it may be best to use some sort of locking mechanism. PAR doesn't quite do the job, I think, as that involves more work in the LMM kernel.

Thanks

stevenmess2004 · 2008-04-10 08:58

What's the problem with using PAR? Just have some variable space at the start of the cog and use it for the initial setup. Something like this:

DAT
org 0
R0 mov programCounter,PAR 'PAR holds the address of the function to run
R1 jmp #runLoop

Of course, this just shifts the problem to how to pass variables. How about this: the PAR variable is a pointer to a long that contains a pointer to the function and a pointer to the variables in hub memory. So some code like this may do the trick. Don't forget that we can reuse this space as variable space.

DAT
org 0
R0 rdword programCounter,PAR 'low word of the long at PAR: function address
R1 mov R0,PAR
R2 add R0,#2
R3 rdword argPointer,R0 'high word: pointer to the arguments
R4 jmp #runLoop
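On the launching side, the long that PAR points to could be built as in the C sketch below. It assumes the function address sits in the low word and the argument pointer in the high word, which matches the two rdword instructions above on a little-endian hub. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* One launch descriptor: a single hub long whose low word is the function
 * address and whose high word is the argument-block address.
 * Hub addresses fit in 16 bits. */
uint32_t pack_launch_long(uint32_t func_addr, uint32_t arg_addr)
{
    assert(func_addr < 0x10000u && arg_addr < 0x10000u);
    return (arg_addr << 16) | func_addr;
}

/* What "rdword programCounter,PAR" would fetch (word at offset 0). */
uint16_t launch_func(uint32_t launch_long)
{
    return (uint16_t)(launch_long & 0xFFFFu);
}

/* What "rdword argPointer,R0" would fetch once R0 = PAR + 2. */
uint16_t launch_args(uint32_t launch_long)
{
    return (uint16_t)(launch_long >> 16);
}
```

Because PAR only carries bits 2..15 of the address, the descriptor itself has to be long-aligned in hub memory.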

mirror · 2008-04-10 10:08

ImageCraft said...
If I debug this in GEAR debugger, No new cogs are being launched.

***

Ideas? Suggestions?

It could be a limitation in GEAR. Are you using my variation of GEAR (http://forums.parallax.com/showthread.php?p=701256)? It has a number of fixes for the simulation model. I seem to recall making some change to a HUB operation a while ago, but the version posted is as I'm currently using it, except for a minor bug discovered yesterday: the display of the rdlong/wrlong, rdword/wrword and rdbyte/wrbyte instructions is swapped. The operation is correct, but the instructions are displayed incorrectly.

It could also be that coginit is not yet handled as part of the PASM simulation - I'll have a look.

stevenmess2004 · 2008-04-10 10:13

coginit does work in GEAR. Otherwise most programs that use the tv or VGA plugins wouldn't work.

Edit: actually, maybe not, because they could be using the Spin cognew or coginit.

mirror · 2008-04-10 10:18

stevenmess2004 said...
coginit does work in GEAR. Otherwise most programs that use the tv or VGA plugins wouldn't work.

Edit: actually, maybe not, because they could be using the Spin cognew or coginit.

It definitely works from Spin! The question is about from PASM - believe it or not you don't see too many cogs started from PASM!

The code is all in place - I guess it might just need a little bit of testing!

ImageCraft · 2008-04-10 10:28

Another question just came up. On reset, SPIN loads 496 long words to COG0, and the stub code we have in the first 0x20 bytes tells SPIN to copy code from 0x20 to 0 at COG1 and then start there, so the C LMM kernel really starts at COG1. We don't stop COG0 per se, but experimenting with our coginit_lmm code with GEAR, it looks like COG0 will be used. Why is that so? Shouldn't we have to manually do a COGSTOP at COG0 for it to be reused? Is this how the real HW works, or just an issue with GEAR?

mirror · 2008-04-10 10:35

ImageCraft said...
Another question just came up. On reset, SPIN loads 496 long words to COG0, and the stub code we have in the first 0x20 bytes tells SPIN to copy code from 0x20 to 0 at COG1 and then start there, so the C LMM kernel really starts at COG1. We don't stop COG0 per se, but experimenting with our coginit_lmm code with GEAR, it looks like COG0 will be used. Why is that so? Shouldn't we have to manually do a COGSTOP at COG0 for it to be reused? Is this how the real HW works, or just an issue with GEAR?

What you see is probably correct!! The stub code (as I understand it) re-uses COG0. There is some historical significance to this, although someone else may need to correct me on the exact details of what follows:

The original "stub" was written by Cliff Biffle for Propeller Forth. As Propeller Forth doesn't "play nice" with Spin (as with ImageCraft C), he chose to overwrite the Spin interpreter with his own Forth interpreter.

Ariba · 2008-04-10 18:12

The Spin Interpreter Cog stops itself, when the end of the Spin code is reached (if you don't have an endless repeat loop at the End).

hippy · 2008-04-10 23:05

I don't know if this helps at all but here's some code which starts an LMM ( not a C-LMM but could be ) whose LMM code launches three more Cogs each running the same LMM interpreter but executing different LMM code. This example toggles Leds on P0, P1 and P2 at different rates.

Note that each LMM sub-program has its own access to LMM_xxx variables ( what would be C's R0, R1 etc I presume ) as they are unique / local for each Cog.

LmmCogs_001 - LMM CogNew calls an LMM Kernel routine run in Cog
LmmCogs_002 - LMM CogNew calls Library Code executed as LMM itself

Version 002 is preferable in most cases because it minimises the amount of code held in Cog. It's also extensible for any other library routines which may need to be added.

The 'add Lmm_Pc,#$10' adjustments for the object base are necessary because 'long @label' addresses compile as offsets from the start of the object ($10), not the start of memory ($0). This is required for Spin, but I expect the C-LMM assembler uses offsets from $0.

mirror · 2008-04-10 23:36

Thanks hippy,

I just downloaded LmmCogs_002.spin and was able to confirm that the CogInit functionality is working fine for GEAR.

I changed the LED toggle rates (using shr Lmm_1sTick,#15 and other similar shifts), because otherwise GEAR is about as exciting as watching continental drift.


hippy · 2008-04-11 01:27

@ mirror : Glad it was useful. I've been re-trying GEAR and have had a lot more success with it than before, thanks.

@ ImageCraft : I see your problem regarding shared Hub memory. One solution could be to reserve 8 Hub longs as parameters one per Cog. CogInit will indicate which Cog was invoked and the parameter can then be put there after CogInit. The Cog may have to delay / check the parameter is there but it would avoid shared contention or needing locking.
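A rough C sketch of the one-long-per-cog idea follows. It models the eight hub longs as an array indexed by cog id and treats 0 as "nothing posted yet"; that convention and the function names are assumptions made for the example, not something taken from an existing kernel.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_COGS 8

/* One parameter long per cog; 0 means "nothing posted yet" (assumption).
 * volatile because a different cog reads it than writes it. */
static volatile uint32_t cog_mailbox[NUM_COGS];

/* Launcher side: called with the cog id that COGINIT reported. */
void post_function(int cog_id, uint32_t func_addr)
{
    assert(cog_id >= 0 && cog_id < NUM_COGS);
    assert(func_addr != 0);
    cog_mailbox[cog_id] = func_addr;
}

/* New-cog side: one polling step. Returns 0 while the slot is empty,
 * otherwise the function address, clearing the slot for the next launch. */
uint32_t take_function(int my_cog_id)
{
    assert(my_cog_id >= 0 && my_cog_id < NUM_COGS);
    uint32_t func_addr = cog_mailbox[my_cog_id];
    if (func_addr != 0)
        cog_mailbox[my_cog_id] = 0;
    return func_addr;
}
```

The new cog would call take_function() with its own id in a loop until it returns a nonzero address. Two launches never touch the same long, so no lock is involved.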

mirror · 2008-04-11 01:38

hippy said...
@ ImageCraft : I see your problem regarding shared Hub memory. One solution could be to reserve 8 Hub longs as parameters one per Cog. CogInit will indicate which Cog was invoked and the parameter can then be put there after CogInit. The Cog may have to delay / check the parameter is there but it would avoid shared contention or needing locking.

This solution won't even need any locking. A cog takes 496 hub accesses to load before it starts, so you have an age in which to save any variables to a global shared hub area.

ImageCraft · 2008-04-11 06:06

mirror said...

hippy said...

@ ImageCraft : I see your problem regarding shared Hub memory. One solution could be to reserve 8 Hub longs as parameters one per Cog. CogInit will indicate which Cog was invoked and the parameter can then be put there after CogInit. The Cog may have to delay / check the parameter is there but it would avoid shared contention or needing locking.
This solution won't even need any locking. A cog takes 496 hub accesses to load before it starts, so you have an age in which to save any variables to a global shared hub area.

Actually, I think that's the problem: COGINIT returns in 7-22 cycles, but the actual copying of the LMM kernel in the HUB RAM to the COG RAM, including the shared variable in question, has not yet completed. So when the second COGINIT is called, the shared variable may be overwritten before the first instance has finished copying it.
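The timing argument can be put into a toy model. The C sketch below assumes the loader copies one long per hub window of 16 clocks; that figure is an assumption used for illustration, not a measured value, and the function names are invented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CLOCKS_PER_LONG 16u  /* ASSUMPTION: one long copied per hub window */

/* Clock, counted from the start of the load, by which cog register
 * `index` has been copied under the assumption above. */
uint32_t copied_by(uint32_t index)
{
    assert(index < 496u);
    return (index + 1u) * CLOCKS_PER_LONG;
}

/* True if a second write to the shared long, made `delay` clocks after the
 * first cog's load began, lands before that cog has copied `index`. */
bool second_write_clobbers(uint32_t index, uint32_t delay)
{
    return delay < copied_by(index);
}
```

In this model the long that lands at cog address $17 is safe only after 24 * 16 = 384 clocks, well after COGINIT itself has returned, which is why a short delay after COGINIT happens to hide the problem.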

ImageCraft · 2008-04-12 09:37

Steven's solution is the simplest - using PAR works great. Now we can send off any LMM C function to another COG. Bwahahahahahha!!! Now to support launching another COG with native code. This will make the driver writers happy.

I also added a (hopefully) simple to use interface to initialize CLKMODE, CLKFREQ to the Project->Options->Target. Things are falling into place...

stevenmess2004 · 2008-04-12 09:46

You shouldn't need to do anything to launch another assembly cog except to wrap a function around it. It's all done in a single instruction (well, actually two: you need to get the pointer, say using a mov, and then do the coginit).

How did you solve the argument problem?

ImageCraft · 2008-04-12 09:55

stevenmess2004 said...
You shouldn't need to do anything to launch another assembly cog except to wrap a function around it. It's all done in a single instruction (well, actually two: you need to get the pointer, say using a mov, and then do the coginit).

How did you solve the argument problem?

The "big" problem is the asm/linker syntax. Remember, it has to work with the rest of the LMM C and asm program! The actual function is, as you said, just COGINIT.

For arguments, as said earlier, just use global variables for now.
