CogInit dark magic
Surac
Posts: 176
in Propeller 2
Spoon-feeding myself P2 knowledge every day, I have now stumbled across the COGINIT instruction.
Reading the documentation, COGINIT can find an unused cog, copy $1F8 longs from HUB-RAM to the COG-RAM and then jump to COG address $0 of the newly initialized cog. No problems understanding this.
BUT:
In the Google Sheets documentation of the P2 instructions, COGINIT seems to achieve all this in 2..9 clock cycles. What kind of dark magic is built into the P2 to copy $1F8 longs this fast?
Or is the instruction table simply wrong here?
Comments
It takes 2..9 cycles to execute the instruction. The instruction initiates the self-loading procedure in the target cog.
Ahh, so there is a self-loading + jump to $0 built into the target cog.
So the calling cog uses 2..9 clocks and can then proceed. But it then takes some time for the new cog to come to life, so the calling cog has to wait a while before it can communicate with or signal the new cog.
How long does the self loading mechanism take?
For hubexec, very little, but most examples are cogexec. For cogexec, it'll be something like $1f8 longwords (clocks) plus a few extra, so somewhere around 512 clocks.
It'll be quite easy to measure by using GETCT in both cogs, and comparing.
529 to 536 ticks, not including the COGINIT.
531 to 542 ticks, including the COGINIT.
28 to 35 ticks for hubexec, not including the COGINIT.
30 to 44 ticks for hubexec, including the COGINIT.
Hmm, I've still missed 3 ticks from the 542. It should be 545. In hindsight, the simple way is to measure the post-COGINIT times and then just add the execute range of COGINIT to them.
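The corrected totals can be checked by adding the two ranges end to end. The sketch below uses plain Python purely as a calculator; the helper name `add_ranges` is made up for illustration, and the inputs are the tick counts quoted in this thread.

```python
def add_ranges(a, b):
    """Add two inclusive (min, max) tick ranges end to end."""
    return (a[0] + b[0], a[1] + b[1])

COGINIT_TICKS = (2, 9)            # execute range of COGINIT from the instruction table
COGEXEC_LOAD_TICKS = (529, 536)   # measured above, not including the COGINIT
HUBEXEC_START_TICKS = (28, 35)    # measured above, not including the COGINIT

cogexec_total = add_ranges(COGEXEC_LOAD_TICKS, COGINIT_TICKS)
hubexec_total = add_ranges(HUBEXEC_START_TICKS, COGINIT_TICKS)

print(cogexec_total)  # (531, 545)
print(hubexec_total)  # (30, 44)
```

The upper bound for cogexec comes out at 545, which matches the hindsight correction.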
using this program
I measure 534 clocks with my logic analyzer.
This kind of knowledge seems to be in the documentation, I guess. I will add some later.
And the way to sync, i.e. to know the new cog is running, is to clear/set a hub location.
What baffles me with COGINIT is that the self-loading of the started cog seems to be able to copy its data OUT of the COG and HUB RAM of the COG that initiates the COGINIT instruction.
It is clearly mentioned in the documentation that the target cog copies its data from the hub.
This works perfectly well, even though the cog2 label is clearly NOT in the HUB RAM but in the COG RAM of cog 0.
In that case there is a range of 8 ticks, depending on the hubRAM fetch alignment of the launched cog. So if you, say, change the ORGH from $400 to $404 then the time taken to transfer the data is delayed by a tick, making it 535. That increases up to 536 then wraps back to 529 ticks with ORGH $40c. The alignment has a different phase for different cogs, both launcher and launchee.
Ah, another illusion. Even cog0 is copied from hubRAM. So all program code is sitting in hubRAM and is repeatedly copied to each cog as directed by the COGINITs.
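The wrap-around described here can be pictured as an eight-long window. The Python sketch below is only a fit to the numbers in this thread (534 ticks at ORGH $400, 536 at $408, back to 529 at $40c); the function name `load_ticks` and the `phase` constant are assumptions made for illustration, and the phase will differ for other launcher and launchee cogs.

```python
def load_ticks(orgh, phase=5, base=529):
    """Estimate cogexec load ticks from the hub byte address of the code.

    phase=5 is fitted to the single measurement quoted in the thread;
    it is not a documented constant and changes with the cogs involved.
    """
    long_index = orgh >> 2              # hub byte address -> long index
    return base + (long_index + phase) % 8

for orgh in (0x400, 0x404, 0x408, 0x40C):
    print(f"ORGH ${orgh:X}: {load_ticks(orgh)} ticks")
```

Moving the code by one long shifts the result by one tick, and the pattern repeats every eight longs ($20 bytes).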
The ## means it's a 32-bit immediate data value embedded in the program code. ie: A hidden AUGx instruction is prefixed.
The @ means a program label is forced to hubRAM byte scaled addressing.
A detail: Once a cogexec cog is launched, the cog runs with the content of cogRAM. It no longer needs the code in hubRAM unless there is a need to launch further copies later on. So that space in hubRAM can be reused after launch.
No
have a look here
The coginit instruction clearly points to byte address $24, which is inside the cog RAM but counted in bytes. How on earth would coginit know where to look for this "shadow copy of cog0" you mentioned?
This is useful for something like a COGINIT code reference because sometimes the referenced code is not specified as living in hubRAM.
EDIT: Especially if the referenced code is meant to be cogexec. The @ then provides the hubRAM address of the code rather than the assembled ORGin.
We are not talking about the cogexec spin instruction but the coginit assembler instruction
eg: This doesn't work:
This works:
And so does this:
EDIT: Doh! Forgot the jmp #$ ... and the dat ... and the first case doesn't work. Pays to test these things.
Correct, I don't write Spin at all. Never got the hang of it.
Oh, yeah, the @ has a similar but different meaning in Spin. I haven't tried to work it out.
See the addresses on the very left. That's the hubRAM, byte scaled, addresses where the loader will place the program in hubRAM before launching cog0. The second column is the cogRAM addresses for ORG'd code.
COGINIT only accepts a hubRAM address. So that $24 is hubRAM address $24. So the program code is copied from a hubRAM range of $24 to $24+$1f8*4-1 or $803.
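The end address quoted above is easy to verify. This is a small Python sketch of the arithmetic only; the names `COG_IMAGE_LONGS` and `cog_image_range` are made up for illustration.

```python
COG_IMAGE_LONGS = 0x1F8   # longs copied into cogRAM by a cogexec launch
BYTES_PER_LONG = 4        # hubRAM addresses are byte scaled

def cog_image_range(hub_start):
    """Return the first and last hubRAM byte address read for a cog image."""
    last = hub_start + COG_IMAGE_LONGS * BYTES_PER_LONG - 1
    return hub_start, last

first, last = cog_image_range(0x24)
print(f"${first:X}..${last:X}")  # $24..$803
```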
PS: hubRAM below $400 exists as regular addressable RAM. It's only accessible as data, code execution doesn't happen there, but it's very much still there.
So I guess there is some hidden code somewhere that does the dark magic of copying the data from hubram to the new cog's RAM for cogexec mode.
If this hidden code could be treated as hubexec, seems to me that the new cog could load and jump to cog address #0 itself and save the cog that did the coginit several hundred clocks...
Could one write their own version of coginit that does this? I.e., start the new cog in hubexec mode, pointed to some new code that loads the new cog's local RAM and then jumps to address #0 to switch to cogexec mode?
That's how it works already. The COGINIT itself only takes a few clocks. We're measuring how long the newly launched cog takes.
Second last page of the prop2 hardware doc has the verilog wired code for a newly coginit'd cog. The comments say:
The last three instructions have a dynamic encoding that changes depending on "hubs". This will be hubexec or not.
Sorry @evanh, nothing you wrote here helps me to understand.
I think I'll wait till @cgracey clears this up for us.
Surac,
Here's the .lst from the middle example just above. I've changed the final JMP to an absolute encoded branch so you can see it is jumping to address $1 in cogRAM.
It shows the cog addresses in the second column. You can see those addresses reset back to $0 at the second ORG. With the ORG I'm informing the assembler of my intention to use that section of code as if it were located at address $0 (in cogRAM). And the COGINIT causes the newly launched cog to copy that code into cogRAM at destination address $0.
But it is copied from hubRAM. And in hubRAM it is located at address $8. So the COGINIT needs to be told to pass the source address of $8 to the newly launched cog.
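The relationship between the two address columns can be written down as a one-line conversion. The Python sketch below is only an illustration of that arithmetic; `hub_address` is a hypothetical helper, and the last example assumes a label sitting at cog long $9 in a first ORG section that was loaded from hubRAM $0.

```python
def hub_address(section_hub_base, cog_addr):
    """Hub byte address of the long that ends up at cog_addr in cogRAM.

    section_hub_base is where the ORG'd section sits in hubRAM;
    cogRAM is long addressed, hubRAM is byte addressed, hence the * 4.
    """
    return section_hub_base + cog_addr * 4

# Second ORG section from the listing: hub $8 holds cog address $0.
print(f"${hub_address(0x8, 0x0):X}")   # $8, the value COGINIT must be given
print(f"${hub_address(0x8, 0x1):X}")   # $C, hub copy of the long at cog address $1

# A label at cog long $9 in a section loaded from hub $0 has hub address $24.
print(f"${hub_address(0x0, 0x9):X}")   # $24
```

Seen this way, a byte address that looks like "cogRAM counted in bytes" is simply the hubRAM location the cog image was loaded from.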
When Pnut or Loadp2 or Proptool or any other loader loads a pure pasm2 program into the Prop2 it places it in hubRAM from address $0. From there the loader, which itself will be running in cog0, will typically perform a COGINIT #0,#0 to load cog0 with the first $1f8 longwords from hubRAM into cog0 and restart from $0.
Spin programs load differently.
Ok, if it works this way, this information will help.
Err, I was slightly inaccurate. That's how the initial program must load after a reset. It must be machine code and it is always deposited at hubRAM address $0 and launched with COGINIT #0,#0. For a pure pasm2 program that may be all there is.
An actual staged loader might provide more flexible options.
And of course the full story has more to it. A cold boot, hard reset, actually loads 16 kB of code from mask ROM into hubRAM and then performs a COGINIT #0,#0. That ROM code then performs a number of actions including attempting to load code from EEPROM and SD card. It is sort of a collection of programs that has a simple command over serial mechanism, the Taqoz interpreter and a monitor/debugger program as well.
But the end result is, when just plain booting, it also does a COGINIT #0,#0 or loads the next stage into cogRAM of cog0 and JMP #0. So still equivalent to above declaration.