Time for another silly question about the propeller, spin and memory access — Parallax Forums

4x5n Posts: 745
edited 2011-09-27 23:44 in Propeller 1
I had to stop my work on the Propeller for a week or so due to lack of time, and when I picked it back up I came across a couple of questions.

I understand that with Spin, the actual Spin code is stored in hub memory (right term?) and is processed by an interpreter that runs in the cogs. I understand that memory access is controlled by the hub and each cog has to wait its turn to access the memory. My question is this: if a Spin instruction finishes and the cog doesn't currently have access to hub RAM, what happens? Does it stop and wait until it can access the memory?

I'm also looking for a good reference on how to use the counters in the cog. I'm working my way through the PE book and have a couple of other books on the Propeller, and not a lot is said about most of the counter modes.

Comments

  • Rick_H Posts: 116
    edited 2011-09-26 21:49
    Yes, it waits until the hub's rotation comes back around to that cog for memory access.
  • potatohead Posts: 10,261
    edited 2011-09-26 21:56
    Hub is the right term.

    Anything running in a COG waits for its HUB access window. A very coarse rule of thumb is this: the fastest case is two ordinary instructions between hub operations. The next fastest is six instructions, and the next after that is ten. If the code can't hit the window in the shorter period of time, it falls through to the next longer period.
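The 2/6/10 rule of thumb can be reproduced with a tiny cycle model. This is a simplified sketch, not a cycle-exact emulator: it assumes ordinary instructions take 4 clocks, a hub operation issued exactly on the cog's window takes 8 clocks, the window repeats every 16 clocks, and a hub operation issued off-window simply waits for the next one. The function names are made up for illustration.

```python
# Simplified model of one cog's hub timing (assumed numbers, not cycle-exact):
HUB_PERIOD = 16  # clocks between successive hub windows for one cog
HUB_OP = 8       # clocks for a hub op issued exactly on its window
INSTR = 4        # clocks for an ordinary instruction (add, shl, ...)


def hub_to_hub_clocks(n_between: int) -> int:
    """Clocks from the start of one hub op to the start of the next one,
    with n_between ordinary instructions in between. The first hub op is
    assumed to start on a window at clock 0."""
    ready = HUB_OP + INSTR * n_between            # clock at which the cog asks again
    return -(-ready // HUB_PERIOD) * HUB_PERIOD   # round up to the next window


def stall_clocks(n_between: int) -> int:
    """Clocks the cog spends waiting for its window before the next hub op."""
    return hub_to_hub_clocks(n_between) - (HUB_OP + INSTR * n_between)


if __name__ == "__main__":
    for n in range(12):
        print(f"{n:2d} instructions: {hub_to_hub_clocks(n):2d} clocks, "
              f"{stall_clocks(n):2d} stalled")
    print("zero-stall counts:", [n for n in range(12) if stall_clocks(n) == 0])
```

In this model 0 to 2 instructions give a 16-clock loop, 3 to 6 give 32, and 7 to 10 give 48, so the zero-stall counts come out as 2, 6 and 10, matching the rule of thumb.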

    It's always the same because each COG gets round-robin access to the HUB. It's common for people to write code to hit those sweet spots, too. Some of the techniques include:

    1. Placing instructions out of "easy" order, such that the mandatory instruction wait times are actually spent doing something. These are essentially "free" instructions, in the context of operating with HUB memory. Cost is program complexity, benefit is speed.

    2. Organizing data such that fewer HUB ops are needed. Fetching a long, vs a byte, for example. Cost is often program size and complexity, benefit is program speed.

    3. "Unrolling" loops such that the lack of program control instructions makes it possible to hit the fast window. Cost is program size, benefit is program speed.
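To put rough numbers on techniques 2 and 3, here is a sketch using the same kind of simplified model (4-clock ordinary instructions, an 8-clock aligned hub op, one window every 16 clocks). The loop shapes are hypothetical: four byte reads versus one long read, and a loop whose DJNZ pushes it past the 2-instruction window versus the same work unrolled.

```python
# Simplified timing model: 4-clock instructions, 8-clock aligned hub op,
# one hub window every 16 clocks (assumptions, not cycle-exact).
HUB_PERIOD, HUB_OP, INSTR = 16, 8, 4


def hub_to_hub(n_between: int) -> int:
    """Clocks from one hub op to the next with n_between ordinary instructions."""
    ready = HUB_OP + INSTR * n_between
    return -(-ready // HUB_PERIOD) * HUB_PERIOD


def run_clocks(hub_ops: int, n_between: int) -> int:
    """Clocks for hub_ops hub operations, each followed by n_between
    ordinary instructions before the next one."""
    return hub_ops * hub_to_hub(n_between)


# Technique 2: four byte reads vs. one long read (hypothetical loop shape:
# one pointer-advance instruction after each read). Unpacking the long into
# bytes costs extra cog instructions that this model ignores.
bytes_way = run_clocks(hub_ops=4, n_between=1)   # 64 clocks
long_way = run_clocks(hub_ops=1, n_between=1)    # 16 clocks

# Technique 3: 2 useful instructions plus a DJNZ per iteration (3 total)
# misses the 2-instruction window; unrolling drops the DJNZ from each copy.
looped = run_clocks(hub_ops=8, n_between=3)      # 256 clocks
unrolled = run_clocks(hub_ops=8, n_between=2)    # 128 clocks

if __name__ == "__main__":
    print(bytes_way, long_way, looped, unrolled)
```

The unrolled version ignores whatever ends the run and pays for its speed with eight copies of the loop body in cog RAM, which is the size-for-speed trade described in technique 3.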

    The SPIN interpreter will wait like any program does. At the SPIN level this really can't be seen on an individual instruction basis, though there are clearly faster and shorter ways to do similar things in SPIN. We've had many threads about that in the past. Ask on that, and ye will quite likely receive. One easy approach is to note the system counter, perform a sequence of operations many times, then read the counter again and subtract the starting value to determine overall speed.

    Substitute instructions and do it again, or use different routines and do it again. This can help you find optimal SPIN code without a lot of difficulty.
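The measure-and-compare approach looks like this in outline. In Spin you would read the system counter before and after the loop; this sketch uses Python's `time.perf_counter_ns()` as a stand-in, and `variant_a`/`variant_b` are placeholders for the two routines being compared. The `elapsed32` helper shows why subtracting two readings of a 32-bit free-running counter still works when the counter wraps between them.

```python
import time

MASK32 = 0xFFFFFFFF


def elapsed32(start: int, end: int) -> int:
    """Difference between two readings of a 32-bit free-running counter;
    correct across at most one wraparound."""
    return (end - start) & MASK32


def time_it(fn, repeats: int = 10_000) -> float:
    """Average nanoseconds per call of fn(): read the clock, run the code
    many times, read the clock again and divide."""
    start = time.perf_counter_ns()
    for _ in range(repeats):
        fn()
    return (time.perf_counter_ns() - start) / repeats


# Two placeholder ways of computing the same value; substitute your own.
def variant_a() -> int:
    total = 0
    for i in range(16):
        total += i * 2
    return total


def variant_b() -> int:
    return sum(range(16)) * 2


if __name__ == "__main__":
    assert variant_a() == variant_b()  # only compare equivalent code
    print("a:", time_it(variant_a), "ns/call")
    print("b:", time_it(variant_b), "ns/call")
```

Running each variant many times averages out the cost of reading the clock itself; timing an empty loop the same way and subtracting it is a further refinement.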
  • roedj Posts: 6
    edited 2011-09-26 23:05
    potatohead wrote: »
    Hub is the right term.

    Anything running in a COG waits for its HUB access window.


    WOW! That just threw my understanding of 'things' right out the window. I thought that instructions continued running in a COG but could only execute main-memory access instructions during its turn, and that non-main-memory instructions could always run.

    Please explain,

    Dan
  • Duane Degn Posts: 10,588
    edited 2011-09-26 23:46
    Dan,

    Spin code has to wait for hub access since that's where the program is stored. With PASM, execution of the code can continue just fine independent of the hub. PASM only needs to access the hub if it's reading or writing hub memory.

    Duane
  • ElectricAye Posts: 4,561
    edited 2011-09-27 04:10
    4x5n wrote: »
    .....

    I'm also looking for a good reference on how to use the counters in the cog. I'm working my way through the PE book and have a couple other books that I got on the propeller and not a lot is said about most of the counter modes.


    For Propeller Counters, have a look at this page and download AN001. I had thought they had updated this, but maybe I'm wrong.
  • ericball Posts: 774
    edited 2011-09-27 04:17
    roedj wrote: »
    WOW! That just threw my understanding of 'things' right out the window. I thought that instructions continued running in a COG but could only execute main-memory access instructions during its turn, and that non-main-memory instructions could always run.

    Each COG has 496 longs of local storage. This may be used for code or data and is accessed at full speed (typically 4 cycles per instruction). However, when accessing HUB RAM, instruction execution will pause (up to 15 cycles) until its designated access window, which occurs every 16 cycles (8 COGs, 2 cycles per COG = 16 cycles). A clever PASM coder will structure their code (and sometimes data) to minimize these access delays (kinda like the branch delay slots that were common in some RISC designs).
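The round-robin arithmetic above can be sketched in a few lines. The sketch assumes cog N's slot sits at offset 2*N within the 16-clock rotation; the exact phase doesn't matter for the point being made, which is that every cog sees one slot per rotation and therefore waits between 0 and 15 clocks depending on when it asks.

```python
N_COGS = 8
CLOCKS_PER_SLOT = 2                    # the hub moves to the next cog every 2 clocks
HUB_PERIOD = N_COGS * CLOCKS_PER_SLOT  # 16 clocks for a full rotation


def window_offset(cog: int) -> int:
    """Offset of `cog`'s slot within the rotation (assumed to be 2 * cog)."""
    return cog * CLOCKS_PER_SLOT


def wait_clocks(cog: int, t: int) -> int:
    """Clocks a cog that asks for the hub at clock t waits for its slot."""
    return (window_offset(cog) - t) % HUB_PERIOD


if __name__ == "__main__":
    for cog in range(N_COGS):
        waits = [wait_clocks(cog, t) for t in range(HUB_PERIOD)]
        print(f"cog {cog}: min wait {min(waits)}, max wait {max(waits)}")
```

Asking one clock after the slot has passed is the worst case: the cog waits out the remaining 15 clocks of the rotation.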

    But at the SPIN level you don't have that much control. The SPIN interpreter is itself PASM code, and all of the SPIN code, data, and variables are stored in HUB RAM. So the optimizations aren't about minimizing access delays but about minimizing accesses, i.e. figuring out how to express a function using the fewest SPIN tokens by making use of built-in operators.
  • 4x5n Posts: 745
    edited 2011-09-27 07:07
    ElectricAye wrote: »
    For Propeller Counters, have a look at this page and download AN001. I had thought they had updated this, but maybe I'm wrong.

    Interestingly enough, Firefox didn't present it as a link, but I was able to download it. I won't have time to look at it until later today, but it looks like exactly what I'm looking for.
  • 4x5n Posts: 745
    edited 2011-09-27 08:58
    Duane Degn wrote: »
    Dan,

    Spin code has to wait for hub access since that's where the program is stored. With PASM, execution of the code can continue just fine independent of the hub. PASM only needs to access the hub if it's reading or writing hub memory.

    Duane

    That's what I was afraid of!! Letting the individual cogs fetch Spin method instructions from hub memory without waiting for the hub rotation is on my wishlist for the Prop2 or Prop3! :-)

    It looks like I'm going to be getting good at PASM soon!
  • jazzed Posts: 11,803
    edited 2011-09-27 09:31
    4x5n wrote: »
    That's what I was afraid of!! Letting the individual cogs fetch Spin method instructions from hub memory without waiting for the hub rotation is on my wishlist for the Prop2 or Prop3! :-)
    Prop2 will not be any different except that 1) it will be able to access more data in a HUB cycle, and 2) it will allow more instructions between HUB accesses.
  • 4x5n Posts: 745
    edited 2011-09-27 09:58
    jazzed wrote: »
    Prop2 will not be any different except that 1) it will be able to access more data in a HUB cycle, and 2) it will allow more instructions between HUB accesses.

    I know but I can dream! :-) It sounds like the prop2 will have some significant improvements over the prop1. I can't wait!
  • Mike Green Posts: 23,101
    edited 2011-09-27 10:47
    The only way to decouple cog execution from hub access for data would be to treat the hub like an I/O device. For reading, you'd have an "initiate hub read" instruction that would supply the hub address and initiate the hub operation. The hub would either put the data into a register for access later or there would be another instruction that would stall the cog until the data was available, then copy the data to some specified cog location. In the former case, there would have to be some way for the hub to signal to the cog that the data was available with a variety of possibilities there. For writing, the "initiate hub write" instruction would supply both the hub address and the data. The hub would have to have some way to signal to the cog that the operation was done and that the cog could initiate another hub operation.
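The read path Mike describes can be modeled as a split-transaction port: one call starts the read and returns at once, a second call collects the data and stalls only if it hasn't arrived, and a polling method stands in for "the hub signals the cog". Everything here is hypothetical (the class and method names, the 8-clock latency, the one-read-in-flight limit); the Propeller 1 only has the blocking RDxxxx/WRxxxx instructions. A write would be symmetric, with the port reporting when the cog may start another operation.

```python
HUB_PERIOD = 16   # one window per cog every 16 clocks
READ_LATENCY = 8  # assumed clocks from the window to data available


class DecoupledHubPort:
    """Hypothetical split-transaction hub port for one cog. Nothing like this
    exists on the Propeller 1, whose RDxxxx/WRxxxx block the cog instead."""

    def __init__(self, memory, window_offset=0):
        self.memory = memory               # dict standing in for hub RAM
        self.window_offset = window_offset
        self.pending = None                # (address, clock the data is ready)

    def initiate_read(self, addr, now):
        """Start a read at clock `now` and return immediately."""
        if self.pending is not None:
            raise RuntimeError("only one read in flight in this sketch")
        wait = (self.window_offset - now) % HUB_PERIOD
        self.pending = (addr, now + wait + READ_LATENCY)

    def data_ready(self, now):
        """The 'hub signals the cog' option: poll instead of stalling."""
        return self.pending is not None and now >= self.pending[1]

    def collect(self, now):
        """Stall only if needed; return (value, clock after any stall)."""
        if self.pending is None:
            raise RuntimeError("no read in flight")
        addr, ready = self.pending
        self.pending = None
        return self.memory[addr], max(now, ready)
```

A cog that finds enough independent work between `initiate_read()` and `collect()` pays no stall; one that collects right away waits about as long as a blocking read would, which ties into Mike's point that with only 8 cogs there is little room for work between hub operations.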

    Would you prefer the above scenario to the current RDxxxx / WRxxxx? If we were to have 16 cogs instead of 8, there might be a better case for decoupling the cogs from the hub a bit more since there would be more instructions between hub accesses. With only 8 cogs, it's hard to justify that since there's less that can be accomplished between hub operations.
  • 4x5n Posts: 745
    edited 2011-09-27 11:02
    Mike,

    I was thinking that it would be nice to bypass the hub when fetching Spin commands. The user/programmer wouldn't have that ability, and it wouldn't be possible to bypass it in PASM. An instruction pipeline or something like it for the cogs. It causes me physical pain to think of a cog sitting doing nothing, waiting to get its next instruction to execute. :-)
  • 4x5n Posts: 745
    edited 2011-09-27 11:07
    4x5n wrote: »
    Mike,

    I was thinking that it would be nice to bypass the hub when fetching Spin commands. The user/programmer wouldn't have that ability, and it wouldn't be possible to bypass it in PASM. An instruction pipeline or something like it for the cogs. It causes me physical pain to think of a cog sitting doing nothing, waiting to get its next instruction to execute. :-)

    For the record I understand this isn't going to happen. Just dreaming. :-)
  • ericball Posts: 774
    edited 2011-09-27 11:28
    Really, the "continue execution while waiting for the HUB" becomes a form of out-of-order execution. This requires a lot of additional transistors to create the logic to "look ahead" in the instruction stream, determine if the instruction has any dependencies on the blocked instruction(s), execute the instruction, then hold the result until the blocked instructions execute.

    However, that logic exists right now - in the form of the "clever programmer", who can look at the instructions (or even the algorithm) and see that certain instructions can be executed before the HUBOP, then do the necessary re-ordering.
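The "clever programmer" reordering can be sketched as a dependency check over straight-line code: after each hub op, pull later instructions forward into the gap before the next hub op, as long as they don't read, write or overwrite anything the instructions they jump over depend on. The instruction representation, the greedy policy and the 2-slot target (taken from the 2/6/10 rule earlier in the thread) are assumptions for illustration; branches, C/Z flag effects and self-modifying code are ignored.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Instr:
    text: str
    dest: Optional[str]         # register written (None if none)
    srcs: Tuple[str, ...] = ()  # registers read
    hub: bool = False           # True for RDxxxx/WRxxxx-style hub ops


def conflicts(a: Instr, b: Instr) -> bool:
    """True if a and b must keep their relative order: read-after-write,
    write-after-read or write-after-write on a register, or two hub ops
    (kept in order conservatively)."""
    if a.hub and b.hub:
        return True
    a_w = {a.dest} - {None}
    b_w = {b.dest} - {None}
    return bool(a_w & set(b.srcs) or b_w & set(a.srcs) or a_w & b_w)


def fill_hub_gaps(prog: List[Instr], slots: int = 2) -> List[Instr]:
    """Greedy reordering of straight-line code: after each hub op, hoist later
    independent instructions until `slots` ordinary instructions sit before
    the next hub op."""
    prog = list(prog)
    for i in range(len(prog)):
        if not prog[i].hub:
            continue
        j = next((k for k in range(i + 1, len(prog)) if prog[k].hub), None)
        if j is None:
            continue
        while j - i - 1 < slots:
            cand = next((k for k in range(j + 1, len(prog))
                         if not prog[k].hub
                         and not any(conflicts(prog[m], prog[k]) for m in range(j, k))),
                        None)
            if cand is None:
                break
            prog.insert(j, prog.pop(cand))  # place it just before the next hub op
            j += 1
    return prog


example = [
    Instr("rdlong a, pa", "a", ("pa",), hub=True),
    Instr("rdlong b, pb", "b", ("pb",), hub=True),
    Instr("add a, b", "a", ("a", "b")),
    Instr("add pa, #4", "pa", ("pa",)),
    Instr("shl c, #1", "c", ("c",)),
    Instr("add pb, #4", "pb", ("pb",)),
]

if __name__ == "__main__":
    for ins in fill_hub_gaps(example):
        print(ins.text)
```

In the example the two reads start out back to back; afterwards `add pa, #4` and `shl c, #1` sit between them, in time the cog would otherwise spend waiting for its window, while `add a, b` and `add pb, #4` stay after the second read because they depend on it.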

    I don't know enough about the SPIN interpreter to say whether out-of-order HUBOP execution would have any benefit. I suspect not, but it depends on what instructions immediately follow each HUBOP.
  • roedj Posts: 6
    edited 2011-09-27 14:56
    Duane Degn wrote: »
    Dan,

    Spin code has to wait for hub access since that's where the program is stored. With PASM, execution of the code can continue just fine independent of the hub. PASM only needs to access the hub if it's reading or writing hub memory.

    Duane

    Duane,

    Thanks for the answer. I was thinking of PASM code all along. Good to see I'm not totally off the rails.

    Dan
  • potatohead Posts: 10,261
    edited 2011-09-27 23:44
    Yeah, sorry I was unclear.

    This:

        hub-op
        hub-op

    is equal to:

        hub-op
        nop
        nop
        hub-op

    where the "nop" can be other, ordinary 4-cycle instructions, such as add, shl, etc., and

        hub-op
        nop
        nop
        nop
        hub-op

    is equal to:

        hub-op
        nop
        nop
        nop
        nop
        nop
        nop
        hub-op

    as it's the next larger window size, 6 instructions vs 2.

    Many SPIN instructions require a few PASM instructions to interpret, meaning the HUB access window probably isn't all that big of an impact on SPIN program execution speed.

