Time for another silly question about the propeller, spin and memory access

4x5n · 2011-09-26 20:13

I had to stop my work on the propeller for a week or so due to a lack of time, and when I picked it back up I came across a couple of questions.

I understand that with spin the actual spin code is stored in hub memory (right term?) and is processed by an interpreter that runs in the cogs. I understand that memory access is controlled by the hub and each cog has to wait its turn to access the memory. My question is this: if a spin instruction finishes and the cog doesn't currently have access to hub ram, what happens? Does it stop and wait until it can access the memory?

I'm also looking for a good reference on how to use the counters in the cog. I'm working my way through the PE book and have a couple other books that I got on the propeller and not a lot is said about most of the counter modes.

Rick_H · 2011-09-26 21:49

Yes, it waits until the hub's rotation brings around that cog's turn for memory access.

potatohead · 2011-09-26 21:56

Hub is the right term.

Anything running in a COG waits for its HUB access window. A very coarse rule of thumb is this: The fastest is two instructions between hub-operations. The next fastest is six instructions, and the next after that is ten. If one can't hit the window in the shorter period of time, the default will be the next longer period of time.

It's always the same, because of how the COG gets round robin access to the HUB. It's common for people to write code to hit those sweet spots too. Some of the techniques include:

1. Placing instructions out of "easy" order, such that the mandatory instruction wait times are actually spent doing something. These are essentially "free" instructions, in the context of operating with HUB memory. Cost is program complexity, benefit is speed.

2. Organizing data such that fewer HUB ops are needed. Fetching a long, vs a byte, for example. Cost is often program size and complexity, benefit is program speed.

3. "Unrolling" loops such that the lack of program control instructions makes it possible to hit the fast window. Cost is program size, benefit is program speed.

The SPIN interpreter will wait like any program does. At the SPIN level, this really can't be seen on an individual instruction basis, though there are clearly faster and shorter ways to do similar things in SPIN. We've had many threads about that in the past. Ask on that, and 'ye will quite likely receive. One easy thing is to simply note the system counter, perform a sequence of operations many times, then subtract the starting value from the counter value at completion to determine overall speed.

Substitute instructions and do it again, or use different routines and do it again. This can help you find optimal SPIN code without a lot of difficulty.

roedj · 2011-09-26 23:05

potatohead wrote: »

Hub is the right term.

Anything running in a COG waits for its HUB access window.


WOW! That just threw my understanding of 'things' right out the window. I thought that instructions continued running in a COG but could only execute Main Memory access instructions during its turn. Non-Main Memory access instructions could always run.

Please explain,

Dan

Duane Degn · 2011-09-26 23:46

Dan,

Spin code has to wait for the hub access since that's where the program is stored. With PASM, execution of the code can continue just fine independent of the hub. PASM only needs to access the hub if it's reading or writing to hub memory.

Duane

ElectricAye · 2011-09-27 04:10

4x5n wrote: »

.....

I'm also looking for a good reference on how to use the counters in the cog. I'm working my way through the PE book and have a couple other books that I got on the propeller and not a lot is said about most of the counter modes.

For Propeller Counters, have a look at this page and download AN001. I had thought they had updated this, but maybe I'm wrong.

ericball · 2011-09-27 04:17

roedj wrote: »

WOW! That just threw my understanding of 'things' right out the window. I thought that instructions continued running in a COG but could only execute Main Memory access instructions during its turn. Non-Main Memory access instructions could always run.

Each COG has 496 longs of local storage. This may be used for code or data and is accessed at full speed (typically 4 cycles per instruction). However, when accessing HUB RAM, instruction execution will pause (up to 15 cycles) until its designated access window, which occurs every 16 cycles. (8 COGS, 2 cycles per COG = 16 cycles.) A clever PASM coder will structure their code (and sometimes data) to minimize these access delays (kinda like the branch delay slots which were common in some RISC designs).

But at the SPIN level you don't have that much control. The SPIN interpreter is the PASM code and all of the SPIN code, data, and variables are stored in HUB RAM. So the optimizations aren't about minimizing access delays but minimizing accesses, i.e. figuring out how to express a function using the fewest SPIN tokens by making use of built-in operators.
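To put numbers on the "pause up to 15 cycles" point, here is a small Python sketch (not Propeller code) that lists how long a hub instruction stalls depending on when it starts relative to the cog's window. It assumes only the 16-cycle rotation described above; the names `HUB_PERIOD` and `hub_wait` are made up for illustration, and this is a toy model rather than a cycle-accurate one.

```python
HUB_PERIOD = 16  # clocks between one cog's hub windows (8 cogs x 2 clocks each)

def hub_wait(phase):
    """Clocks a hub instruction stalls if it starts `phase` clocks
    after this cog's window opened (phase 0 = exactly on the window)."""
    return (-phase) % HUB_PERIOD

# Stall for every possible starting phase within one rotation.
waits = [hub_wait(p) for p in range(HUB_PERIOD)]
for phase, wait in enumerate(waits):
    print(f"start {phase:2d} clocks after window -> stall {wait:2d} clocks")

print("worst case stall:", max(waits), "clocks")
```

Hitting the window exactly costs no stall, and just missing it costs the full 15 clocks, which is why PASM code is often arranged so the next hub instruction lands right on the window.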

4x5n · 2011-09-27 07:07

ElectricAye wrote: »

For Propeller Counters, have a look at this page and download AN001. I had thought they had updated this, but maybe I'm wrong.

Interestingly enough firefox didn't present it as a link but I was able to download it. Won't have time to look at it until later today but it looks exactly like what I'm looking for.

4x5n · 2011-09-27 08:58

Duane Degn wrote: »

Dan,

Spin code has to wait for the hub access since that's where the program is stored. With PASM, execution of the code can continue just fine independent of the hub. PASM only needs to access the hub if it's reading or writing to hub memory.

Duane

That's what I was afraid of!! Allowing the individual cogs to access hub memory for method instructions independently of the hub is on my wishlist for the prop2 or prop3! :-)

It looks like I'm going to be getting good at pasm soon!

jazzed · 2011-09-27 09:31

4x5n wrote: »

That's what I was afraid of!! Allowing the individual cogs to access hub memory for method instructions independently of the hub is on my wishlist for the prop2 or prop3! :-)

Prop2 will not be any different except that 1) it will be able to access more data in a HUB cycle, and 2) it will allow more instructions between HUB accesses.

4x5n · 2011-09-27 09:58

jazzed wrote: »

Prop2 will not be any different except that 1) it will be able to access more data in a HUB cycle, and 2) it will allow more instructions between HUB accesses.

I know but I can dream! :-) It sounds like the prop2 will have some significant improvements over the prop1. I can't wait!

Mike Green · 2011-09-27 10:47

The only way to decouple cog execution from hub access for data would be to treat the hub like an I/O device. For reading, you'd have an "initiate hub read" instruction that would supply the hub address and initiate the hub operation. The hub would either put the data into a register for access later or there would be another instruction that would stall the cog until the data was available, then copy the data to some specified cog location. In the former case, there would have to be some way for the hub to signal to the cog that the data was available with a variety of possibilities there. For writing, the "initiate hub write" instruction would supply both the hub address and the data. The hub would have to have some way to signal to the cog that the operation was done and that the cog could initiate another hub operation.

Would you prefer the above scenario to the current RDxxxx / WRxxxx? If we were to have 16 cogs instead of 8, there might be a better case for decoupling the cogs from the hub a bit more since there would be more instructions between hub accesses. With only 8 cogs, it's hard to justify that since there's less that can be accomplished between hub operations.

4x5n · 2011-09-27 11:02

Mike,

I was thinking that it would be nice to bypass the hub when fetching spin commands. The user/programmer wouldn't have that ability and it wouldn't be possible to bypass in pasm. An instruction pipeline or something like it for the cogs. It causes me physical pain to think of a cog sitting doing nothing waiting to get its next instruction to execute. :-)

4x5n · 2011-09-27 11:07

4x5n wrote: »

Mike,

I was thinking that it would be nice to bypass the hub when fetching spin commands. The user/programmer wouldn't have that ability and it wouldn't be possible to bypass in pasm. An instruction pipeline or something like it for the cogs. It causes me physical pain to think of a cog sitting doing nothing waiting to get its next instruction to execute. :-)

For the record I understand this isn't going to happen. Just dreaming. :-)

ericball · 2011-09-27 11:28

Really, the "continue execution while waiting for the HUB" becomes a form of out-of-order execution. This requires a lot of additional transistors to create the logic to "look ahead" in the instruction stream, determine if the instruction has any dependencies on the blocked instruction(s), execute the instruction, then hold the result until the blocked instructions execute.

However, that logic exists right now - in the form of the "clever programmer", who can look at the instructions (or even the algorithm) and see that certain instructions can be executed before the HUBOP, then do the necessary re-ordering.

I don't know enough about the SPIN interpreter to say whether out-of-order HUBOP execution would have any benefit. I suspect not, but it depends on what instructions immediately follow each HUBOP.

roedj · 2011-09-27 14:56

Duane Degn wrote: »

Dan,

Spin code has to wait for the hub access since that's where the program is stored. With PASM, execution of the code can continue just fine independent of the hub. PASM only needs to access the hub if it's reading or writing to hub memory.

Duane

Duane,

Thanks for the answer. I was thinking of PASM code all along. Good to see I'm not totally off the rails.

Dan

potatohead · 2011-09-27 23:44

Yeah, sorry I was unclear.

This:

hub-op
hub-op

is equal to:

hub-op
nop
nop
hub-op

where the "nop" can be other, ordinary 4 cycle instructions, such as add, shl, etc.

And this:

hub-op
nop
nop
nop
hub-op

is equal to:

hub-op
nop
nop
nop
nop
nop
nop
hub-op

as it's the next larger window size, 6 instructions vs 2.
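For anyone who wants to check these equivalences, a toy timing model in Python (not Propeller code) reproduces them. It is only a sketch under stated assumptions: a hub window every 16 clocks, ordinary instructions at 4 clocks, and a hub-op costing 8 clocks when it hits its window exactly. That 8-clock figure is my assumption from the usual 8 to 23 clock range quoted for hub instructions, not something stated in this thread. The names `run` and `between` are made up for the example.

```python
HUB_PERIOD = 16  # clocks between this cog's hub windows (from the thread)
HUB_COST = 8     # ASSUMED clocks for a hub-op that hits its window exactly
NOP_COST = 4     # ordinary instruction such as add or shl (from the thread)

def run(sequence):
    """Return the clock at which `sequence` finishes.

    `sequence` is a list of "hub" / "nop" strings; clock 0 is a hub window.
    """
    t = 0
    for op in sequence:
        if op == "hub":
            t += (-t) % HUB_PERIOD  # stall until the next window, if needed
            t += HUB_COST
        else:
            t += NOP_COST
    return t

def between(n):
    """Two hub-ops with n ordinary instructions between them."""
    return ["hub"] + ["nop"] * n + ["hub"]

# Total clocks for 0..10 instructions between the two hub-ops.
for n in range(11):
    print(f"{n:2d} instructions between hub-ops -> {run(between(n))} clocks")
```

In this model 0, 1 and 2 instructions between hub-ops all finish at the same clock, 3 through 6 finish one rotation later, and 7 through 10 one rotation after that, which lines up with the 2 / 6 / 10 rule of thumb.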

Many SPIN instructions require a few PASM instructions to interpret, meaning the HUB access window probably doesn't have all that big an impact on SPIN program execution speed.
