CogInit dark magic

Surac · 2021-04-10 05:47

While spoon-feeding myself P2 knowledge every day, I stumbled across the COGINIT instruction.

According to the documentation, COGINIT can find an unused cog, copy $1F8 longs from HUB RAM to the COG RAM and then jump to COG address $0 of the newly initialized cog. No problems understanding that.

BUT:
In the Google Sheets documentation of the P2 instructions, COGINIT seems to achieve all this in 2..9 clock cycles. What kind of dark magic is built into the P2 to copy $1F8 longs this fast?
Or is the instruction table simply wrong here?

cgracey · 2021-04-10 06:24

It takes 2..9 cycles to execute the instruction. The instruction initiates the self-loading procedure in the target cog.

Surac · 2021-04-10 06:47

Ahh, so there is a self-loading + jump to $0 built into the target cog.

So the calling cog uses 2..9 clocks and can then proceed. But it then takes some time for the new cog to come to life. So the calling cog has to wait some time until it can communicate with or signal the new cog.

How long does the self loading mechanism take?

evanh · 2021-04-10 07:48

For hubexec, very little, but most examples are cogexec. For cogexec, it'll be something like $1f8 longwords (clocks) plus a few extra, so somewhere around 512 clocks.

It'll be quite easy to measure by using GETCT in both cogs, and comparing.

evanh · 2021-04-10 08:02

529 to 536 ticks, not including the COGINIT.
531 to 542 ticks, including the COGINIT.

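        waitx   #1
        getct   pb                  ' timestamp before COGINIT (measurement includes it)
        coginit #2, launchptr       ' start cog 2 in cogexec mode
'       coginit #$22, launchptr     ' alternative: start cog 2 in hubexec mode
'       getct   pb                  ' alternative: timestamp after COGINIT (excludes it)

        waitatn                     ' wait for the launched cog's COGATN
        rdlong  pa, #0              ' fetch the launched cog's timestamp from hub address 0
        sub     pa, pb              ' elapsed ticks
        call    #itod               ' print the result (itod/putnl helpers not shown)
        call    #putnl

        jmp     #$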


launchptr   long    launch


orgh  $400
        long    0[7]
launch
        getct   pa
        wrlong  pa, #0
        cogatn  #1      'cog0

        cogid   pa
        cogstop pa

evanh · 2021-04-10 08:30

28 to 35 ticks for hubexec, not including the COGINIT.
30 to 44 ticks for hubexec, including the COGINIT.

        waitx   #7
        getct   pb
'       coginit #2, launchptr
        coginit #$22, launchptr
'       getct   pb

        waitatn
        rdlong  pa, #0
        sub pa, pb
        call    #itod
        call    #putnl

        jmp #$


launchptr   long    launch


orgh  $400
        long    0[8]
launch
        getct   pa
        wrlong  pa, #0
        cogatn  #1      'cog0

        cogid   pa
        cogstop pa

evanh · 2021-04-10 08:57

Hmm, I've still missed 3 ticks from the 542. It should be 545. In hindsight, the simple way is to measure the post-COGINIT times and then just add the execute range of COGINIT to them.
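
That shortcut is plain interval addition. A minimal Python sketch of it, using the post-COGINIT ranges measured above and the 2..9 clock execution range quoted for COGINIT (the function name is only illustrative):

```python
def add_ranges(a, b):
    """Add two inclusive (min, max) tick ranges."""
    return (a[0] + b[0], a[1] + b[1])

COGINIT_TICKS = (2, 9)       # execution range quoted for COGINIT
cogexec_after = (529, 536)   # measured after COGINIT, cogexec launch
hubexec_after = (28, 35)     # measured after COGINIT, hubexec launch

print(add_ranges(cogexec_after, COGINIT_TICKS))  # (531, 545)
print(add_ranges(hubexec_after, COGINIT_TICKS))  # (30, 44)
```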

Surac · 2021-04-10 09:45

Using this program:

con _clkfreq = 10_000_000

dat org
    asmclk
    dirh #0
    coginit #16,##$400

    rep #1,#0
    drvnot #0   

dat
    orgh $400
    org

    rep #1,#0       
    drvnot #2

I measure 534 clocks with my logic analyzer.
This kind of knowledge seems to belong in the documentation, I guess. I will add some later.

Cluso99 · 2021-04-10 09:50

And the way to sync, i.e. to know the new cog is running, is to clear/set a hub location.

Surac · 2021-04-10 10:00

What baffles me about COGINIT is that the self-loading of the started cog seems to be able to copy its data OUT of the COG and HUB RAM of the cog that initiates the COGINIT instruction.
It is clearly mentioned in the documentation that the target cog copies its data from the hub.

con _clkfreq = 10_000_000

dat org
    asmclk
    coginit #$1,##@cog2 'do not use a label, to make sure of copy+run from 0

    rep #1,#0
    drvnot #0   

dat org

cog2    rep #1,#0       'blink pin 57 takes 53,4 us
    drvnot #2

This works perfectly well, even though the cog2 label is clearly NOT in the HUB RAM but in the COG RAM of cog 0.

So does cog1 copy out of cog0's COG RAM here? How is it working? Is there a hidden third port on the COG RAM?
How much does it copy?
Does it also copy bytes out of the HUB RAM of cog0? Or does it stop at $1f8?
And btw, what does ##@ do?

evanh · 2021-04-10 10:03

@Surac said:
using this program
...
I measure 534 clocks with my logic analyzer.

In that case there is a range of 8 ticks, depending on the hubRAM fetch alignment of the launched cog. So if you, say, change the ORGH from $400 to $404, then the data transfer is delayed by a tick, making it 535. That increases up to 536, then wraps back to 529 ticks with ORGH $40c. The alignment phase differs between cogs, for both launcher and launchee.
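
As a toy model of that wrap-around, here is a Python sketch fitted only to the four data points above. The phase offset of 5 is an assumption that matches this particular launcher/launchee pair and will differ for other cogs; the function name is illustrative.

```python
def load_ticks(orgh_addr, base=529, phase=5):
    """Toy model: ticks the launched cog needs, excluding the COGINIT itself.

    Each extra long of ORGH offset shifts the hub fetch alignment by one
    tick, and the pattern wraps every 8 longs.
    """
    return base + ((orgh_addr >> 2) + phase) % 8

for addr in (0x400, 0x404, 0x408, 0x40C):
    print(hex(addr), load_ticks(addr))
```

With these assumed parameters the model reproduces 534, 535, 536 and 529 ticks for ORGH $400, $404, $408 and $40c.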

evanh · 2021-04-10 10:06

@Surac said:
This works perfectly well, even though the cog2 label is clearly NOT in the HUB RAM but in the COG RAM of cog 0.

Ah, another illusion. Even cog0 is copied from hubRAM. So all program code is sitting in hubRAM and is repeatedly copied to each cog as directed by the COGINITs.

evanh · 2021-04-10 10:11

@Surac said:

and btw. what does ##@ do?

The ## means it's a 32-bit immediate data value embedded in the program code. ie: A hidden AUGx instruction is prefixed.
The @ means a program label is forced to hubRAM byte scaled addressing.
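
To make the ## prefix concrete, here is a small Python sketch that reassembles a 32-bit immediate from an AUGx/instruction pair. It assumes the usual PASM2 layout: the AUGx long carries the upper 23 bits of the value in its low 23 bits, and the instruction's 9-bit D field (bits 17..9) or S field (bits 8..0) carries the rest. The two longs are the bytes that a listing later in this thread shows for waitx ##20_000_000/100; the helper name is made up for illustration.

```python
def aug_immediate(aug_long, instr_long, field_shift):
    """Combine an AUGx prefix with the 9-bit field of the next instruction.

    field_shift is 9 for the D field and 0 for the S field.
    """
    upper23 = aug_long & 0x7FFFFF
    low9 = (instr_long >> field_shift) & 0x1FF
    return (upper23 << 9) | low9

# Listing bytes are little-endian longs.
aug = int.from_bytes(bytes.fromhex("86 01 80 FF"), "little")
waitx = int.from_bytes(bytes.fromhex("1F 80 66 FD"), "little")

print(aug_immediate(aug, waitx, 9))  # 200000
```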

evanh · 2021-04-10 10:22

@evanh said:
... So all program code is sitting in hubRAM and is repeatedly copied to each cog as directed by the COGINITs.

A detail: Once a cogexec cog is launched, the cog runs with the content of cogRAM. It no longer needs the code in hubRAM unless there is a need to launch further copies later on. So that space in hubRAM can be reused after launch.

Surac · 2021-04-10 10:33

No

have a look here

00000 000             | dat org
00000 000 04 80 80 FF 
00004 001 00 30 67 FD |     hubset  ##clkmode_ & !%11
00008 002 86 01 80 FF 
0000c 003 1F 80 66 FD |     waitx   ##20_000_000/100
00010 004 04 80 80 FF 
00014 005 00 36 67 FD |     hubset  ##clkmode_
00018 006             | 
00018 006 24 02 EC FC |     coginit #$1,#$24    'do not use a label, to make sure of copy+run from 0
0001c 007             |     
0001c 007 00 02 DC FC |     rep #1,#0
00020 008 5F 00 64 FD |     drvnot #0   
00024 009             | 
00024 009             | 
00024 009             | cog2    rep #1,#0       'blink pin 57 takes 53,4 us
00024 009 00 02 DC FC 
00028 00a 5F 04 64 FD |     drvnot #2

The coginit instruction clearly points to byte address $24. That is inside the cog RAM, but counted as bytes. How on earth would coginit know where to look for this "shadow copy of cog0" you mentioned?

evanh · 2021-04-10 10:34

@evanh said:
The @ means a program label is forced to hubRAM byte scaled addressing.

This is useful for something like a COGINIT code reference because sometimes the referenced code is not specified as living in hubRAM.

EDIT: Especially if the referenced code is meant to be cogexec. The @ then provides the hubRAM address of the code rather than the assembled ORGin.

Surac · 2021-04-10 10:41

We are not talking about the cogexec spin instruction but the coginit assembler instruction

evanh · 2021-04-10 10:41

E.g. this doesn't work:

dat
org
        coginit #1, #cogexec_reference
        jmp #$

orgh
hubexec_reference
org
cogexec_reference
        dirh    #0
.loop
        outnot  #0
        jmp #.loop

This works:

dat
org
        coginit #1, #hubexec_reference
        jmp #$

orgh
hubexec_reference
org
cogexec_reference
        dirh    #0
.loop
        outnot  #0
        jmp #.loop

And so does this:

dat
org
        coginit #1, #@cogexec_reference
        jmp #$

orgh
hubexec_reference
org
cogexec_reference
        dirh    #0
.loop
        outnot  #0
        jmp #.loop

EDIT: Doh! Forgot the jmp #$ ... and the dat ... and the first case doesn't work. Pays to test these things.

evanh · 2021-04-10 10:43

@Surac said:
We are not talking about the cogexec spin instruction but the coginit assembler instruction

Correct, I don't write Spin at all. Never got the hang of it.

evanh · 2021-04-10 11:02

@Surac said:
We are not talking about the cogexec spin instruction but the coginit assembler instruction

Oh, yeah, the @ has a similar but different meaning in Spin. I haven't tried to work it out.

evanh · 2021-04-10 11:16

@Surac said:
...
00024 009             | cog2  rep #1,#0       'blink pin 57 takes 53,4 us
00024 009 00 02 DC FC 
00028 00a 5F 04 64 FD |   drvnot #2
The coginit instruction clearly points to byte address $24. That is inside the cog RAM, but counted as bytes. How on earth would coginit know where to look for this "shadow copy of cog0" you mentioned?

See the addresses on the very left. That's the hubRAM, byte scaled, addresses where the loader will place the program in hubRAM before launching cog0. The second column is the cogRAM addresses for ORG'd code.

COGINIT only accepts a hubRAM address. So that $24 is hubRAM address $24. So the program code is copied from a hubRAM range of $24 to $24+$1f8*4-1 or $803.
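
The same arithmetic as a Python sketch, starting from the COGINIT long in the listing (24 02 EC FC). It assumes the S operand sits in bits 8..0 and the D operand in bits 17..9 of the instruction; the function name is illustrative.

```python
IMAGE_LONGS = 0x1F8  # longs copied into cogRAM for a cogexec launch

def coginit_source_range(instr_long):
    """Return (target cog field, first hub byte, last hub byte) for an
    immediate-operand COGINIT."""
    src = instr_long & 0x1FF            # S: hubRAM byte address of the image
    cog = (instr_long >> 9) & 0x1FF     # D: target cog / mode bits
    return cog, src, src + IMAGE_LONGS * 4 - 1

coginit = int.from_bytes(bytes.fromhex("24 02 EC FC"), "little")
cog, first, last = coginit_source_range(coginit)
print(cog, hex(first), hex(last))  # 1 0x24 0x803
```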

evanh · 2021-04-10 11:26

PS: hubRAM below $400 exists as regular addressable RAM. It's only accessible as data, code execution doesn't happen there, but it's very much still there.

Rayman · 2021-04-10 11:26

So I guess there is some hidden code somewhere that does the dark magic of copying the data from hubram to the new cog's RAM for cogexec mode.

If this hidden code could be treated as hubexec, seems to me that the new cog could load and jump to cog address #0 itself and save the cog that did the coginit several hundred clocks...

Could one write their own version of coginit that does this? I.e., start the new cog in hubexec mode, pointed to some new code that loads the new cog's local RAM and then jumps to address #0 to switch to cogexec mode?

evanh · 2021-04-10 11:34

@Rayman said:
If this hidden code could be treated as hubexec, seems to me that the new cog could load and jump to cog address #0 itself and save the cog that did the coginit several hundred clocks...

That's how it works already. The COGINIT itself only takes a few clocks. We're measuring how long the newly launched cog takes.

evanh · 2021-04-10 11:46

The second-last page of the Prop2 hardware doc has the Verilog for the hard-wired code that a newly COGINIT'd cog runs. The comments say:

mov outa,#0    (clear port shadow registers)
mov outb,#0
mov ina,#$1f8  (point ina/ijmp0 to cog's initial int0 handler)
setq #$1f7     (if !hubs, load $1f8 longs from ptrb)
rdlong 0,ptrb
jmp dirb/ptrb  (if !hubs, jump to $000 (dirb=0), else ptrb)

The last three instructions have a dynamic encoding that changes depending on "hubs", which indicates whether the launch is hubexec or not.
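
Read as pseudo-code, that start-up sequence behaves roughly like the following Python model. It is only a sketch: hubRAM is modelled as a byte array, the port and interrupt-vector setup is left out, and the function name and return shape are invented for illustration. The sample bytes are the two cog2 instructions from Surac's listing at hub address $24.

```python
import struct

IMAGE_LONGS = 0x1F8

def start_cog(hub, ptrb, hubs):
    """Model the hard-wired start-up of a launched cog.

    Returns (cog_image, start_pc). For hubexec nothing is loaded and the
    cog starts at ptrb; otherwise $1F8 longs are copied and it starts at 0.
    """
    if hubs:
        return None, ptrb
    image = list(struct.unpack_from("<%dI" % IMAGE_LONGS, hub, ptrb))
    return image, 0

hub = bytearray(0x1000)
hub[0x24:0x2C] = bytes.fromhex("00 02 DC FC 5F 04 64 FD")

image, pc = start_cog(hub, 0x24, hubs=False)
print(hex(image[0]), hex(image[1]), pc)  # 0xfcdc0200 0xfd64045f 0
```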

Surac · 2021-04-10 13:12

Sorry @evanh, nothing you wrote here helps me to understand.

I think I'll wait till @cgracey clears this up for us.

evanh · 2021-04-10 13:30

Surac,
Here's the .lst from the middle example just above. I've changed the final JMP to an absolute encoded branch so you can see it is jumping to address $1 in cogRAM.

00000                 | dat
00000 000             | org
00000 000 08 02 EC FC |         coginit #1, #hubexec_reference
00004 001 FC FF 9F FD |         jmp #$
00008 002             | 
00008                 | orgh
00008                 | hubexec_reference
00008 000             | org
00008 000             | cogexec_reference
00008 000 41 00 64 FD |         dirh    #0
0000c 001             | .loop
0000c 001 4F 00 64 FD |         outnot  #0
00010 002 01 00 80 FD |         jmp #\.loop

It shows the cog addresses in the second column. You can see those addresses reset back to $0 at the second ORG. With the ORG I'm informing the assembler of my intention to use that section of code as if it were located at address $0 (in cogRAM). And the COGINIT causes the newly launched cog to copy that code into cogRAM at destination address $0.

But it is copied from hubRAM. And in hubRAM it is located at address $8. So the COGINIT needs to be told to pass the source address of $8 to the newly launched cog.
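
The relationship between the two address columns in that listing can be written out as a small Python sketch. The helper is illustrative and not part of any assembler: it maps a cog (long) address inside an ORG section to the hubRAM (byte) address where that long is stored, given the hub address at which the section starts.

```python
def hub_address(section_hub_base, cog_addr, org_base=0):
    """Hub byte address of the long that will land at cog_addr after loading."""
    return section_hub_base + 4 * (cog_addr - org_base)

SECTION_BASE = 0x8  # hub address of hubexec_reference / cogexec_reference

print(hex(hub_address(SECTION_BASE, 0x0)))  # 0x8   dirh #0
print(hex(hub_address(SECTION_BASE, 0x1)))  # 0xc   .loop / outnot #0
print(hex(hub_address(SECTION_BASE, 0x2)))  # 0x10  jmp #\.loop
```

So a plain reference to cogexec_reference yields cog address $0, while the hub-side address the COGINIT needs is $8.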

evanh · 2021-04-10 13:41

When Pnut or Loadp2 or Proptool or any other loader loads a pure pasm2 program into the Prop2 it places it in hubRAM from address $0. From there the loader, which itself will be running in cog0, will typically perform a COGINIT #0,#0 to load cog0 with the first $1f8 longwords from hubRAM into cog0 and restart from $0.

Spin programs load differently.

Surac · 2021-04-10 13:54

@evanh said:
When Pnut or Loadp2 or Proptool or any other loader loads a pure pasm2 program into the Prop2 it places it in hubRAM from address $0. From there the loader, which itself will be running in cog0, will typically perform a COGINIT #0,#0 to load cog0 with the first $1f8 longwords from hubRAM into cog0 and restart from $0.

Spin programs load differently.

Ok if it works this way, this information will help

evanh · 2021-04-10 14:05

Err, I was slightly inaccurate. That's how the initial program must load after a reset. It must be machine code and it is always deposited at hubRAM address $0 and launched with COGINIT #0,#0. For a pure pasm2 program that may be all there is.

An actual staged loader might provide more flexible options.

evanh · 2021-04-10 15:33

And of course the full story has more to it. A cold boot, or hard reset, actually loads 16 kB of code from mask ROM into hubRAM and then performs a COGINIT #0,#0. That ROM code then performs a number of actions, including attempting to load code from EEPROM and SD card. It is sort of a collection of programs that has a simple command-over-serial mechanism, the TAQOZ interpreter and a monitor/debugger program as well.

But the end result is that, when just plain booting, it also does a COGINIT #0,#0, or loads the next stage into the cogRAM of cog0 and does a JMP #0. So it is still equivalent to the description above.
