Questions about PASM2 and register allocation (flexspin specific, maybe?)

Kaio · 2025-11-09 00:02

@TMM said:
Does COGINIT always copy 504 longwords? The documentation just says it'll "start", so I was vaguely assuming there was some hidden magic to tell the cog how much data to copy. But the COGINIT page in the PASM2 doc suggests you only give it a start address. I figured it "couldn't be that" because the docs say it completes in 2-9 instructions! How can copying 504 longwords take so few instructions? I'm guessing, then, that this is only the cost on the calling cog, and there will be some delay (presumably 63-70 clocks?) before the new cog starts executing code?

Hi @TMM, you were interested in the exact delay for initializing a cog. I wrote a little program to measure it. You can use the FlexProp app to run it on your board.
First enable the debug option (P2 only) and, for convenience, the ANSI or PST terminal in the Options menu. Then use "Compile and run" from the Commands menu.

The delay is longer than you would think. Unfortunately, I cannot explain why.

evanh · 2025-11-09 01:35

That's quite a lot. Certainly many times more than what it takes to read 504 longwords. I was expecting around 700-1000 ticks. Something to keep in mind, I guess.

EDIT: Hang-on, I'm getting a consistent 42167 ticks with my Eval board. That's a massive difference from 300k!

Cog0  INIT $0000_0000 $0000_0000 load
Cog1  INIT $0000_0010 $0000_0000 load
Cog1  cog1 initialization: cog1:delay = 42_167

Ah, the debugger setup for cog1 is probably stealing a lot of time before the user code runs...

Wuerfel_21 · 2025-11-09 02:17

Yes, when a cog starts up the debugger is first executed if loaded. This prints the "Cog N INIT" message. Printing to the serial console, in the grand scheme of things, is very slow. Only after that can the user code run.

TMM · 2025-11-09 04:41

I got curious, so I wrote something to get as close as I know how (which is probably not very close) to the correct number.

Based on that, it looks like a COGINIT takes around 544 cycles in absolute terms (+/- 2-9 for the COGINIT itself and 2 for the GETCT, so probably about 540?).
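The adjustment above is simple subtraction, and writing it out shows the range it implies. A minimal sketch, assuming both overheads (2-9 clocks for COGINIT, 2 for GETCT, as stated in the post) are simply subtracted from the raw reading; the function name `corrected_range` is hypothetical.

```python
def corrected_range(raw_ticks, coginit_clocks=(2, 9), getct_clocks=2):
    """Subtract the measurement overheads from a raw tick count.

    Returns (low, high): the largest COGINIT overhead gives the low end,
    the smallest gives the high end. Overhead figures come from the post.
    """
    min_coginit, max_coginit = coginit_clocks
    return (raw_ticks - max_coginit - getct_clocks,
            raw_ticks - min_coginit - getct_clocks)


low, high = corrected_range(544)
print(f"cog start cost: {low}..{high} clocks")
```

For a 544-tick reading this gives 533 to 540 clocks, which brackets the 537-tick measurement reported later in the thread.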

I've attached the code I wrote; it's probably not very good. I'm still getting to grips with this, and it's a bit of a weird assembly compared to what I'm used to! (Loads of fun though!)

EDIT: I do realize that in my code, coginit #1, ##@main2 kind of works 'accidentally', or rather, that it isn't guaranteed to work. This is assembled to sit at cog address 0, where address 0 is actually entry. So just schlepping main2 to cog address 0 and having it work is probably not the most elegant approach. But I just wanted to copy the print_hex stuff into both cogs so I wouldn't have to do some kind of dance through hub RAM. I verified the listing and it looked okay.

I guess a better way would have been to run print_hex from hub RAM, so I wouldn't have had this problem at all?

EDIT2: I just read about ALTGN, so I guess I could do something clever with ALTGN, REP and the auto-indexer. I'll try that tomorrow.

evanh · 2025-11-09 06:13

Cool, fast work! The code doesn't crash because the local branches are PC-relative and local variable references are all beyond the code.

I dug up some very old routines of my own and polished them up slightly and patched them into Kaio's program and got a measurement of 537 ticks. That's a relief. Most of the startup time is indeed the fast copy to load cogRAM.
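A back-of-the-envelope check of why 537 ticks points to the copy dominating. This sketch assumes 504 longs are loaded (the figure used in the opening question) and compares two hypothetical transfer rates, 8 longs per clock and 1 long per clock; neither rate is taken from the P2 documentation, and everything not spent copying is lumped together as "remainder".

```python
LONGS_LOADED = 504   # figure from the thread; check against the COGINIT docs
MEASURED_TICKS = 537 # evanh's measurement, in system clocks


def copy_clocks(longs, longs_per_clock):
    """Clocks needed to move `longs` at a hypothetical fixed rate."""
    return longs / longs_per_clock


def remainder(longs_per_clock):
    """Clocks of the measurement not explained by the copy at that rate."""
    return MEASURED_TICKS - copy_clocks(LONGS_LOADED, longs_per_clock)


for rate in (8, 1):
    est = copy_clocks(LONGS_LOADED, rate)
    print(f"{rate} long(s)/clock: copy {est:.0f}, remainder {remainder(rate):.0f}")
```

At 8 longs per clock the copy would explain only 63 of the 537 ticks; at 1 long per clock it explains 504, leaving about 33 clocks of other startup overhead.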

Kaio · 2025-11-09 10:22

@evanh said:
EDIT: Hang-on, I'm getting a consistent 42167 ticks with my Eval board. That's a massive difference from 300k!
Cog0  INIT $0000_0000 $0000_0000 load
Cog1  INIT $0000_0010 $0000_0000 load
Cog1  cog1 initialization: cog1:delay = 42_167

I was using the default baud rate of 230400; at 2 Mbit/s I see the same as you.
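The baud rate matters because the debugger's "Cog1 INIT ..." line has to go out over serial before cog1's user code runs, and the time to send it scales with 1/baud. A rough sketch, assuming 8N1 framing (10 bits per character), a hypothetical 200 MHz system clock (substitute your own), and that the whole line is transmitted before user code starts; none of these are confirmed in the thread.

```python
def uart_clocks_per_char(sysclk_hz, baud, bits_per_char=10):
    """System clocks to shift out one character.

    8N1 framing is assumed: 1 start + 8 data + 1 stop = 10 bits.
    """
    return sysclk_hz * bits_per_char / baud


SYSCLK_HZ = 200_000_000  # hypothetical system clock frequency
# One debugger line as shown in the thread, plus an assumed CR/LF.
line = "Cog1  INIT $0000_0010 $0000_0000 load\r\n"

for baud in (230_400, 2_000_000):
    ticks = len(line) * uart_clocks_per_char(SYSCLK_HZ, baud)
    print(f"{baud:>9} baud: ~{ticks:,.0f} clocks for {len(line)} chars")
```

The two baud rates differ by a factor of about 8.7, so whatever part of the startup delay is spent transmitting debugger output shrinks by the same factor at 2 Mbit/s.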

@evanh said:
I dug up some very old routines of my own and polished them up slightly and patched them into Kaio's program and got a measurement of 537 ticks. That's a relief. Most of the startup time is indeed the fast copy to load cogRAM.

Nice work, @evanh. I thought there must be a lot of overhead from the debugger.
