Time for another silly question about the propeller, spin and memory access
4x5n
Posts: 745
I had to stop my work on the propeller for a week or so due to lack of time, and when I picked it back up I came across a couple of questions.
I understand that with spin the actual spin code is stored in hub memory (right term?) and is processed via an interpreter that runs in the cogs. I understand that memory access is controlled by the hub and each cog has to wait its turn to access the memory. My question is this: if a spin instruction finishes and the cog doesn't currently have access to hub ram, what happens? Does it stop and wait until it can access the memory?
I'm also looking for a good reference on how to use the counters in the cog. I'm working my way through the PE book and have a couple other books that I got on the propeller and not a lot is said about most of the counter modes.
Comments
Anything running in a COG waits for its HUB access window. A very coarse rule of thumb is this: The fastest is two instructions between hub-operations. The next fastest is six instructions, and the next after that is ten. If one can't hit the window in the shorter period of time, the default will be the next longer period of time.
It's always the same, because of how the COG gets round robin access to the HUB. It's common for people to write code to hit those sweet spots too. Some of the techniques include:
1. Placing instructions out of "easy" order, such that the mandatory instruction wait times are actually spent doing something. These are essentially "free" instructions, in the context of operating with HUB memory. Cost is program complexity, benefit is speed.
2. Organizing data such that fewer HUB ops are needed. Fetching a long, vs a byte, for example. Cost is often program size and complexity, benefit is program speed.
3. "Unrolling" loops such that the lack of program control instructions makes it possible to hit the fast window. Cost is program size, benefit is program speed.
The SPIN interpreter will wait like any program does. At the SPIN level, this really can't be seen on an individual instruction basis, though there are clearly faster and shorter ways to do similar things in SPIN. We've had many threads about that in the past. Ask on that, and 'ye will quite likely receive. One easy thing is to simply note the system counter, perform a sequence of operations many times, then subtract the starting value from the system counter value at completion to determine overall speed.
Substitute instructions and do it again, or use different routines and do it again. This can help you find optimal SPIN code without a lot of difficulty.
WOW! That just threw my understanding of 'things' right out the window. I thought that instructions continued running in a COG but could only execute Main Memory access instructions during its turn. Non-Main Memory access instructions could always run.
Please explain,
Dan
Spin code has to wait for the hub access since that's where the program is stored. With PASM, execution of the code can continue just fine independent of the hub. PASM just needs to access the hub if it's reading or writing to hub memory.
Duane
For Propeller Counters, have a look at this page and download AN001. I had thought they had updated this, but maybe I'm wrong.
Each COG has 496 longs of local storage. This may be used for code or data and is accessed at full speed (typically 4 cycles per instruction). However, when accessing HUB RAM, instruction execution will pause (up to 15 cycles) until its designated access window, which occurs every 16 cycles. (8 COGS, 2 cycles per COG = 16 cycles.) A clever PASM coder will structure their code (and sometimes data) to minimize these access delays (kinda like the branch delay slots which were common in some RISC designs).
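To put numbers on that wait, here is a minimal Python sketch (a model of the timing, not Propeller code). It assumes cog N's window opens whenever the clock count modulo 16 equals 2*N. That slot assignment and the names hub_wait, HUB_PERIOD and CLOCKS_PER_COG are made up for illustration and are not taken from the Propeller documentation.

```python
# Simplified model of round-robin hub access (assumed slot layout).
HUB_PERIOD = 16       # clocks between one cog's windows (8 cogs x 2 clocks)
CLOCKS_PER_COG = 2    # clocks of hub time given to each cog per rotation


def hub_wait(clock, cog_id):
    """Clocks a cog stalls if it asks for the hub at the given clock count."""
    slot = cog_id * CLOCKS_PER_COG           # assumed: cog N owns clock 2*N
    return (slot - clock) % HUB_PERIOD       # 0 when it arrives right on time


if __name__ == "__main__":
    for clock in range(HUB_PERIOD):
        print(clock, hub_wait(clock, cog_id=0))
```

In this model the wait is zero when the request lands exactly on the window and 15 clocks when it arrives one clock too late, which matches the "up to 15 cycles" figure above.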
But at the SPIN level you don't have that much control. The SPIN interpreter is the PASM code and all of the SPIN code, data, and variables are stored in HUB RAM. So the optimizations aren't about minimizing access delays but minimizing accesses, i.e. figuring out how to express a function using the fewest SPIN tokens by making use of built-in operators.
Interestingly enough firefox didn't present it as a link but I was able to download it. Won't have time to look at it until later today but it looks exactly like what I'm looking for.
That's what I was afraid of!! Allowing the individual cogs to access hub memory for method instructions independently of the hub is on my wishlist for the prop2 or prop3! :-)
It looks like I'm going to be getting good at pasm soon!
I know but I can dream! :-) It sounds like the prop2 will have some significant improvements over the prop1. I can't wait!
Would you prefer the above scenario to the current RDxxxx / WRxxxx? If we were to have 16 cogs instead of 8, there might be a better case for decoupling the cogs from the hub a bit more since there would be more instructions between hub accesses. With only 8 cogs, it's hard to justify that since there's less that can be accomplished between hub operations.
I was thinking that it would be nice to bypass the hub when fetching spin commands. The user/programmer wouldn't have that ability and it wouldn't be possible to bypass in pasm. An instruction pipeline or something like it for the cogs. It causes me physical pain to think of a cog sitting doing nothing waiting to get its next instruction to execute. :-)
For the record I understand this isn't going to happen. Just dreaming. :-)
However, that logic exists right now - in the form of the "clever programmer", who can look at the instructions (or even the algorithm) and see that certain instructions can be executed before the HUBOP, then do the necessary re-ordering.
I don't know enough about the SPIN interpreter to say whether out-of-order HUBOP execution would have any benefit. I suspect not, but it depends on what instructions immediately follow each HUBOP.
Duane,
Thanks for the answer. I was thinking of PASM code all along. Good to see I'm not totally off the rails.
Dan
This:
hub-op
hub-op
is equal to:
hub-op
nop
nop
hub-op
;where the "nop" can be other, ordinary 4 cycle instructions, such as add, shl, etc...
, and
hub-op
nop
nop
nop
hub-op
is equal to:
hub-op
nop
nop
nop
nop
nop
nop
hub-op
, as it's the next larger window size, 6 instructions vs 2.
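The same equivalence can be checked with a small Python model (a sketch, not PASM). It assumes a hub instruction occupies 8 clocks once it is aligned with the window, which is consistent with two 4-cycle instructions fitting between back-to-back hub-ops. The names loop_period and HUB_OP_CLOCKS are hypothetical.

```python
# Simplified model: clocks from one hub-op to the next, given how many
# ordinary 4-cycle instructions sit between them.
HUB_PERIOD = 16      # the cog's hub window comes around every 16 clocks
HUB_OP_CLOCKS = 8    # assumed cost of a hub-op that hits its window exactly
INSTR_CLOCKS = 4     # ordinary instruction such as add, shl, nop


def loop_period(instructions_between):
    """Clocks between the starts of two consecutive hub-ops."""
    busy = HUB_OP_CLOCKS + instructions_between * INSTR_CLOCKS
    windows = -(-busy // HUB_PERIOD)     # ceiling division: windows consumed
    return windows * HUB_PERIOD


if __name__ == "__main__":
    for n in range(12):
        print(n, loop_period(n))
```

Under these assumptions 0, 1 or 2 instructions between hub-ops all cost one 16-clock rotation, 3 through 6 cost two rotations, and 7 through 10 cost three, which is the 2 / 6 / 10 rule of thumb.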
Many SPIN instructions require a few PASM instructions to interpret, meaning the HUB access window probably isn't all that big of an impact on SPIN program execution speed.