Cog Launch Procedure and Timing

Ken Peterson · 2008-06-27 14:23

I just want to make sure I fully understand how cogs are launched in the Prop. Here's my take:

For the sake of simplicity, let's leave SPIN out of it. Assume I'm launching a PASM routine into a cog from another PASM routine.

1. PASM coginit instruction takes 7-22 clocks like any hub instruction and returns immediately
2. HUB system loads cog with 496 longs, each long transferred during that cog's hub window
3. Cog starts executing at first instruction in cog memory

If this is true, then according to my calculations it should take 7936 (496*16) system clock cycles to load the cog, plus 7-22 to execute the coginit instruction, and at 80MHz this should take about 100 microseconds. Is this accurate?
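The arithmetic in that estimate can be checked with a short sketch. It assumes, as the post does, that exactly one long is transferred per 16-clock hub window; the constant and function names are illustrative only.

```python
# Rough cog load-time estimate under the poster's assumption of
# one long transferred per hub access window (not a documented figure).
LONGS_LOADED = 496            # cog RAM locations $000..$1EF
CLOCKS_PER_HUB_WINDOW = 16    # a cog's hub window comes around every 16 clocks
COGINIT_CLOCKS = (7, 22)      # best and worst case for a hub instruction
CLKFREQ_HZ = 80_000_000       # 80 MHz system clock


def load_time_us(coginit_clocks):
    """Return the estimated launch time in microseconds."""
    total_clocks = LONGS_LOADED * CLOCKS_PER_HUB_WINDOW + coginit_clocks
    return total_clocks / CLKFREQ_HZ * 1e6


best_us, worst_us = (load_time_us(c) for c in COGINIT_CLOCKS)
print(f"{best_us:.2f} us to {worst_us:.2f} us")
```

Under these assumptions the estimate comes out just under 100 microseconds in both the best and worst case.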

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔

Post Edited (Ken Peterson) : 6/27/2008 4:28:16 PM GMT

Ken Peterson · 2008-06-28 02:15

Anybody??

Just hoping for someone to offer some guesses before this falls off page 1. I didn't find anything in the docs that clearly describes what happens and how long it takes.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔

Mike Green · 2008-06-28 03:05

Sounds right. It's a little over 100us at 80MHz.

Ken Peterson · 2008-06-28 03:18

Thanks, Mike. I suppose I should just test it, shouldn't I? If my experiments don't support my theory, then I'll be back with more info and/or questions.

Regards,
Ken

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔

Bob Lawrence (VE1RLL) · 2008-06-28 16:30

Ken said...
I didn't find anything in the docs that clearly describes what happens and how long it takes.

Reference:
www.parallax.com/Portals/0/Downloads/docs/prod/prop/PropellerDatasheet-v1.0.pdf

Item 1:

Hub instructions, the Propeller Assembly instructions that access mutually exclusive resources, require 7 cycles to execute but they first need to be synchronized to the start of the Hub Access Window. It takes up to 15 cycles (16 minus 1, if we just missed it) to synchronize to the Hub Access Window plus 7 cycles to execute the hub instruction, so hub instructions take from 7 to 22 cycles to complete.
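The 7 to 22 cycle range quoted above follows from adding the wait for the window to the fixed execution time. A minimal sketch of that sum, with hypothetical names:

```python
# Hub instruction cost = clocks spent waiting for the hub access window
# plus the 7 clocks needed to execute once synchronized.
HUB_PERIOD = 16   # clocks between one cog's hub access windows
HUB_EXEC = 7      # clocks to execute the hub instruction itself


def hub_instruction_clocks(clocks_until_window):
    """clocks_until_window: 0 (window is open now) to 15 (just missed it)."""
    if not 0 <= clocks_until_window < HUB_PERIOD:
        raise ValueError("wait must be between 0 and 15 clocks")
    return clocks_until_window + HUB_EXEC


costs = [hub_instruction_clocks(wait) for wait in range(HUB_PERIOD)]
print(min(costs), max(costs))
```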

Propeller Assembly Instruction Table (COGINIT entry)

Opcode: 000011 0001 1111 ddddddddd, with 010 in the source field
Instruction: COGINIT D
Description: Initialize a cog according to D
Z Result: Result = 0
C Result: No cog free
R: 0
Clocks: 7..22

========================================================================================================================================
Item 2. - Page 15:

When a cog is booted up, locations 0 ($000) through 495 ($1EF) are loaded sequentially from Main RAM / ROM and its special purpose locations, 496 ($1F0) through 511 ($1FF), are cleared to zero.

Each Special Purpose register may be accessed via its physical address, its predefined name, or indirectly in Spin via a register array variable SPR with an index of 0 to 15, the last four bits of the register's address.
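The "last four bits of the register's address" rule amounts to a simple mask. A small sketch follows; the function name is made up for illustration.

```python
# Map a special purpose register's cog address ($1F0..$1FF) to its
# SPR array index (0..15) by keeping the low four address bits.
FIRST_SPECIAL = 0x1F0
LAST_SPECIAL = 0x1FF


def spr_index(register_address):
    if not FIRST_SPECIAL <= register_address <= LAST_SPECIAL:
        raise ValueError("not a special purpose register address")
    return register_address & 0xF


print(spr_index(0x1F0), spr_index(0x1F1), spr_index(0x1FF))
```

For example, CNT sits at $1F1, so it would be reached as index 1.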

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Aka: CosmicBob

Ale · 2008-06-28 16:58

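You just do this in the launcher:

mov val1,CNT                 ' capture the system counter just before the launch
wrlong val1,ptr_1st_long     ' store it in hub RAM

and this in the launched cog:

mov val1,CNT                 ' capture the system counter as the first instruction
wrlong val1,ptr_2nd_long     ' store it in hub RAM

Later, the difference between the two values tells you how long it took. But you knew that. So how much?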
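One detail when taking that difference off-chip: CNT is a free-running 32-bit counter, so the subtraction should be done modulo 2^32 in case the counter wrapped between the two samples. A minimal sketch with made-up sample values:

```python
# Elapsed clocks between two CNT samples, tolerant of one 32-bit wraparound.
def elapsed_clocks(cnt_start, cnt_end):
    return (cnt_end - cnt_start) & 0xFFFFFFFF


# Hypothetical samples: the counter wrapped between the two reads.
print(elapsed_clocks(0xFFFFFFF0, 0x00000010))
# No wrap: plain difference.
print(elapsed_clocks(1000, 9235))
```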

Ken Peterson · 2008-06-29 12:58

I don't have my Propeller Demo board with me, it's in the office. I'll experiment and report my results here when I get back to the office.

@Bob: I saw the part you posted in the manual, but it doesn't explicitly describe the mechanism whereby the 496 longs get copied. I assume it happens during normal hub access windows to avoid screwing up the timing for the rest of the cogs, but that's only a guess on my part.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔

Mike Green · 2008-06-29 14:33

From the descriptions that have been posted, I'm assuming that the internal cog program counter gets set to zero and some internal cog register used for hub access is used for tracking the hub address. The longs are copied using the same logic that's used for RDLONG except that both registers are incremented after each transfer. When the program counter overflows, the copy is finished. There are "shadow" memory locations for each of the 16 special locations and I'm pretty sure that 512 longs get copied with the last 16 being copied into the "shadow" memory locations rather than the actual registers which are zeroed.
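The model described in that post can be written out as a small simulation. This is only a sketch of the guess above (512 transfers, the last 16 landing in shadow locations while the registers stay zero), not confirmed hardware behavior; all names are hypothetical and one transfer per hub window is assumed.

```python
# Simulation of the guessed boot copy: 512 longs are read from hub memory,
# the first 496 go to cog RAM, the last 16 go to "shadow" locations, and
# the special purpose registers themselves are left at zero.
COG_LONGS = 512          # total cog RAM locations
FIRST_SPECIAL = 496      # $1F0, first special purpose register
CLOCKS_PER_WINDOW = 16   # assumed: one long per hub access window


def boot_cog(hub_longs, hub_start):
    cog_ram = [0] * COG_LONGS
    shadow = [0] * (COG_LONGS - FIRST_SPECIAL)
    hub_index = hub_start
    pc = 0
    while pc < COG_LONGS:            # copy ends when the counter overflows
        value = hub_longs[hub_index]
        if pc < FIRST_SPECIAL:
            cog_ram[pc] = value
        else:
            shadow[pc - FIRST_SPECIAL] = value
        pc += 1                      # both counters advance per transfer
        hub_index += 1
    return cog_ram, shadow, pc * CLOCKS_PER_WINDOW


hub = list(range(1000, 2000))        # stand-in for hub memory contents
cog_ram, shadow, clocks = boot_cog(hub, hub_start=100)
print(cog_ram[0], cog_ram[495], cog_ram[496], shadow[0], clocks)
```

If all 512 longs really are transferred, the copy alone would take 8192 clocks under this model rather than 7936.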

Bob Lawrence (VE1RLL) · 2008-06-29 15:59

Ken said...
3. Cog starts executing at first instruction in cog memory

From Cogs (Processors) : the cog begins executing instructions, starting at location 0 of Cog RAM.

"When a cog is booted up, locations 0 ($000) through 495 ($1EF) are loaded sequentially from Main RAM / ROM and its special purpose locations, 496 ($1F0) through 511 ($1FF) are cleared to zero. After loading, the cog begins executing instructions, starting at location 0 of Cog RAM. It will continue to execute code until it is stopped or rebooted by either itself or another cog, or a reset occurs."

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Aka: CosmicBob

Ken Peterson · 2008-06-29 21:36

All good information. I guess my main focus with the question was to find out exactly how long the process takes. I'll do some testing to find that out.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔

Ken Peterson · 2008-06-30 15:24

I get $202B (8235) clocks. That's about right considering there's probably a bit of overhead in addition to the copy, and I had some extra code in there to do the subtraction and put the data back out so I could send it to the screen.
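For comparison, here is a quick sketch that sets the measured figure against the two copy sizes mentioned earlier in the thread (496 longs and 512 longs), still assuming one long per 16-clock hub window. The leftover in each case would include the measurement code and any launch overhead.

```python
# Compare the measured launch time with the one-long-per-window estimates.
MEASURED_CLOCKS = 0x202B     # value reported above
CLOCKS_PER_WINDOW = 16       # assumed transfer rate
CLKFREQ_MHZ = 80

leftover = {}
for longs in (496, 512):
    estimate = longs * CLOCKS_PER_WINDOW
    leftover[longs] = MEASURED_CLOCKS - estimate
    print(f"{longs} longs: estimate {estimate}, leftover {leftover[longs]} clocks")

measured_us = MEASURED_CLOCKS / CLKFREQ_MHZ
print(f"measured: {measured_us:.2f} us at {CLKFREQ_MHZ} MHz")
```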

I suppose where I'm going with this is that I'm wondering if it makes sense to separate a large assembly program into chunks and page it via subroutine calls rather than using LMM. When calling a subroutine, you would save all of your important data on a "stack" in hub memory, then call the subroutine by launching it in the same cog. I'm just not sure about an easy mechanism for getting the main program back into the cog and picking up at the execution point right after where the call was made. Perhaps with a jump table at the beginning...

Anyways, instead of having a 5X to 8X performance hit like with LMM, you would have a subroutine call overhead of about 8200 clocks, with another hit of 8200 clocks upon return.

200us doesn't seem like a bad price to pay for many applications.
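A rough way to see when that trade pays off: a paged subroutine costs its native run time plus about 16400 clocks of reloading, while the same code under LMM costs its native run time multiplied by the slowdown. The sketch below solves for the break-even point using the 5X and 8X figures quoted above; the function name and the simple cost model are assumptions.

```python
# Break-even native run time between cog-reload paging and LMM.
# Paging cost:  native + 2 * RELOAD_CLOCKS   (call and return)
# LMM cost:     native * slowdown
RELOAD_CLOCKS = 8200   # approximate cost of one cog reload, from the post


def break_even_native_clocks(lmm_slowdown):
    """Native clocks above which paging is cheaper than LMM."""
    return 2 * RELOAD_CLOCKS / (lmm_slowdown - 1)


for slowdown in (5, 8):
    clocks = break_even_native_clocks(slowdown)
    print(f"{slowdown}X LMM: paging wins above about {clocks:.0f} native clocks")
```

So under this model, paging only comes out ahead for subroutines that run for a few thousand clocks or more per call.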

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔

Post Edited (Ken Peterson) : 6/30/2008 3:29:49 PM GMT

hippy · 2008-06-30 19:17

As you're effectively doing paging, why not just copy the paged PASM into the cog? That way
you only have to copy what you need and you have persistent variables and I/O pins ready to
go without having to worry about the stack and loading / saving them.

LMM doesn't have to be single instruction at a time, there's no reason paging cannot also be
used. In fact it's probably better for any LMM code which loops or needs to be high speed. Put
a "call #overlaythis" at the start of the LMM code which is paged and LMM code sections
can easily be changed from instruction at a time to paged operation.
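To put a number on "only copy what you need", here is a sketch of the overlay load cost as a function of overlay size. The per-long cost is a parameter because it depends on the copy loop: 16 clocks per long is the best case of one long per hub window, and a simple software loop may well miss every other window, in which case 32 would be the figure to try. Names and the example overlay size are hypothetical.

```python
# Estimated clocks to copy an overlay of n longs from hub RAM into the cog.
FULL_RELOAD_CLOCKS = 8235    # measured full cog launch, from earlier in the thread


def overlay_load_clocks(n_longs, clocks_per_long=16):
    return n_longs * clocks_per_long


for per_long in (16, 32):
    cost = overlay_load_clocks(64, per_long)      # hypothetical 64-long overlay
    share = cost / FULL_RELOAD_CLOCKS
    print(f"{per_long} clocks/long: {cost} clocks ({share:.0%} of a full reload)")
```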

Ken Peterson · 2008-06-30 19:57

That's true. I hadn't thought through all the ramifications (no pun intended..well...maybe) of launching a cog in the middle of execution. If the pins are all reset, that could have undesirable effects.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
