Trouble with concurrency/timing of multiple cogs running PASM code — Parallax Forums

Trouble with concurrency/timing of multiple cogs running PASM code

ags Posts: 386
edited 2011-09-06 21:59 in Propeller 1
I am stuck - but it could be user error - on passing values between cogs at high speed. I would provide actual code but it is very large and I've not yet been able to create a tiny test case. However, I am now wondering if I have a fundamental misunderstanding of Prop programming using shared resources, how the Prop functions, or the proper techniques to be used.

I'm hoping that there is some seemingly insignificant but important issue that the experts know how to avoid or deal with, but I don't. Or, as I said, it could be user error. (Example: be sure there is at least one instruction between a mov[s|d|i] instruction and execution of the modified instruction, due to pipelining/prefetch.)

So... I do have a somewhat simplified test case where a "master cog" running SPIN is driving a "slave cog" running PASM as fast as it can. The slave is passed the hub address for a "command word" and it loops like this:
:getCmd  rdword myReg, par wz
       if_z  jmp #:getCmd
...process command

until a non-zero command word value is found. It then reads the 16 words following the "command" word, sets the command word value to zero, processes the 16 words then returns to looping until another non-zero command word is found.

This works really well with slow SPIN code providing the command and 16 data words (words, not longs or bytes). When assembled into a larger system, things broke, so I went back to debugging and found that if I drive the slave/PASM cog quickly, like this:
repeat
  repeat while cmdWord<>0
  cmdWord:=newValue
  newValue++

then things go very wrong. I realize that there's a lot of the implementation that is left out, and I'm working on further reduction of the test case. I'm wondering if there is some nuance that I'm unaware of, such as:

- operations that appear atomic in SPIN are not; writing a word value (not a byte) could cause one byte to be detected by the PASM loop before the second byte of the word has a valid value written
- some kind of strange pipelining of read/write access to hub memory that would allow interleaving of operations and conflict; this seems unlikely, in that the manual seems to be pretty clear that there is no contention on atomic memory operations
- I really need to use locks here, my design is flawed

Is there something tricky I should be aware of that I can check? The fact that the drivers function perfectly when they have plenty of time to finish before the next command is sent has me very suspicious of this aspect of the design. I saw this very same behavior in the precursor that led me to this project, and I never did understand what the problem was. Now I have no choice but to understand.

Any suggestions, ideas, design patterns, code examples, etc are very, very much appreciated. Thanks.

Comments

  • Phil Pilgrim (PhiPi) Posts: 23,514
    edited 2011-09-06 18:51
    You realize, of course, that newValue++ eventually reaches zero, which will freeze processing -- right?

    -Phil
  • kuroneko Posts: 3,623
    edited 2011-09-06 18:55
    The following test case uses your suggested SPIN loop (drive quickly). It also adapts the PASM command fetch loop. Instead of reading the following 16 words it simply displays the command value and delays for a second. Then it indicates that the command area is ready to be refilled. And I get what I asked for, the LED bank on the demoboard shows an incrementing binary pattern (the command) with 1Hz update frequency. So the bug must hide somewhere else.
    VAR
      long  newValue
      word  cmdWord[17]
      
    PUB null
    
      cognew(@entry, @cmdWord)
    
      repeat
        repeat while cmdWord
        cmdWord:=newValue
        newValue++
        
    DAT             org     0
    
    entry           mov     dira, mask
    
    :getCmd         rdword  myReg, par wz
            if_z    jmp     #:getCmd
    
    ' read all the other data (delay instead)
    
                    mov     outa, myReg
                    rev     outa, #{32 -}8          ' display command
                    
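                    rdlong  cnt, #0                 ' shadow cnt := clkfreq (hub long 0)
                    add     cnt, cnt                ' shadow cnt += current system counter
                    waitcnt cnt, #0                 ' wait about one second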
    
                    wrword  par, par                ' indicate ready
                    jmp     #:getCmd
    
    mask            long    $00FF0000
    
    myReg           res     1
    
                    fit
                    
    DAT
    
  • kuroneko Posts: 3,623
    edited 2011-09-06 18:57
    Phil Pilgrim (PhiPi) wrote: »
    You realize, of course, that newValue++ eventually reaches zero, which will freeze processing -- right?
    Why would you say that? Writing 0 as a command will simply skip the inner repeat loop.
  • ags Posts: 386
    edited 2011-09-06 18:57
    Yes, that was a bad example. I was trying to simplify the problem. I am always writing out a non-zero, word-length command. In operation, I don't see freezing, I see garbage being output by the slave cog.
  • kuroneko Posts: 3,623
    edited 2011-09-06 19:05
    @ags: You do update the command word atomically? Not - as we had some time ago - first setting a base command then adding/or'ing flags ... Also, you use words, is the command word 4n aligned?
  • ags Posts: 386
    edited 2011-09-06 19:09
    Does this work?
    VAR
      long  newValue
      word  cmdWord[17]

    PUB null

      cognew(@entry, @cmdWord)

      repeat
        repeat while cmdWord
        cmdWord:=newValue
        newValue++

    DAT             org     0

    entry           mov     dira, mask

    :getCmd         rdword  myReg, par wz
            if_z    jmp     #:getCmd

    ' **** clear the command word here, since I have what I need from hub RAM (in myReg)

                    wrword  par, par                ' indicate ready

    ' read all the other data (delay instead)

                    mov     outa, myReg
                    rev     outa, #{32 -}8          ' display command

                    rdlong  cnt, #0
                    add     cnt, cnt
                    waitcnt cnt, #0
                    jmp     #:getCmd

    mask            long    $00FF0000

    myReg           res     1

                    fit

    DAT
    

    This is more like what I'm doing: my thinking is that I've already read the value I need (or actually, I read the next 16 word values, then do this) so there is no need to keep the "watchdog" command at a non-zero value. This allows the "master cog" in my design time to first load the 16 words, then set the command value <> 0 while I'm doing other processing. If I've already read the values I need, I'm looking to interleave the operations.
  • kuroneko Posts: 3,623
    edited 2011-09-06 19:17
    Sure, the delay in my example was used to indicate parameter fetching. As long as you read everything important from hub before clearing the command word it's fine, i.e. moving the wrword immediately before outa handling doesn't change behaviour.
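
    A minimal Spin sketch of the master side of this handshake, assuming the layout from the test case (cmdWord[0] is the command, cmdWord[1..16] are the data words). The method name send and its parameters are hypothetical, and cmd is assumed to be non-zero:

    ```spin
    VAR
      word  cmdWord[17]                   ' [0] = command, [1..16] = data words

    PUB send(cmd, dataPtr)                ' hypothetical helper, not from the thread
      repeat while cmdWord[0]             ' wait until the PASM cog has cleared the command
      wordmove(@cmdWord[1], dataPtr, 16)  ' fill the 16 data words first
      cmdWord[0] := cmd                   ' publish the non-zero command last
    ```

    Writing the command word last means the PASM cog should never see a non-zero command while the data is still being filled in. Clearing it only after the 16 reads is what hands the buffer back to the master.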
  • ags Posts: 386
    edited 2011-09-06 19:19
    kuroneko wrote: »
    @ags: You do update the command word atomically? Not - as we had some time ago - first setting a base command then adding/or'ing flags ... Also, you use words, is the command word 4n aligned?

    I'm not sure I understand what you are asking/saying.

    The slave cog running PASM uses rdword at the address provided by par, and if it is not zero I escape the loop waiting for a command.

    In the master cog running SPIN, I simply use an assignment operator: cmdWord := someValueNotZero.
    I'm not sure if that is atomic in SPIN, that is an area I am wondering is the problem.

    I thought that the Propeller Tool arranged VARs so that they are always properly aligned by ordering long, then word, then byte sized symbols together. The command word is actually one element of a word-sized array variable.

    If there is value in going through the exercise of changing my code to break up the command word into a command byte, and just querying one byte using rdbyte in my PASM loop, I can do that. I thought that was a long shot or I would have done it. (of course, I'll have to be sure to write the byte that I'm querying in PASM last in SPIN)

    Thanks for the response.
  • ags Posts: 386
    edited 2011-09-06 19:23
    kuroneko wrote: »
    Sure, the delay in my example was used to indicate parameter fetching. As long as you read everything important from hub before clearing the command word it's fine, i.e. moving the wrword immediately before outa handling doesn't change behaviour.

    Yes, that's what I would expect. On the one hand, I'm glad it works as expected. On the other hand, I am concerned that I have some fundamental flaw in how I'm approaching this problem. It works at slow speed (no chance of the master cog trying to update the slave cog before it has processed the previous data set) but not at full speed (the master is always ready to slam the next dataset into the slave cog as soon as it is ready to accept it).

    Of course, that's how it seems, with my understanding of what I intend the code to do. That understanding may well be what is keeping me from really identifying what is actually happening, or what I've coded it to do...
  • kuroneko Posts: 3,623
    edited 2011-09-06 19:27
    ags wrote: »
    I'm not sure I understand what you are asking/saying.
    cmdWord := base
    cmdWord |= flags
    
    would be wrong as PASM may get in between both assignments and read base instead of base | flags. A single assignment is atomic (wrword). Just checking, we've seen this before and it was hard to track down (courtesy of Phil, IIRC) as we didn't have that part of the source.
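
    For contrast, a sketch of the single-assignment form, assuming base and flags are the same illustrative values as in the two-line snippet. The method name sendCommand is hypothetical:

    ```spin
    VAR
      word  cmdWord[17]                   ' [0] = command, [1..16] = data words

    PUB sendCommand(base, flags) | cmd    ' hypothetical helper, not from the thread
      cmd := base | flags                 ' build the complete command in a local first
      repeat while cmdWord                ' wait until the PASM cog has cleared the previous command
      cmdWord := cmd                      ' one assignment, so one hub write of the finished value
    ```

    The PASM loop can then only ever read zero or the complete command, never the intermediate base value.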
    ags wrote: »
    I thought that the Propeller Tool arranged VARs so that they are always properly aligned by ordering long, then word, then byte sized symbols together. The command word is actually one element of a word-sized array variable.
    Yes, but consider this:
    VAR
      long  first
      word  dummy
      word  cmdWord[17]
    
    In this case cmdWord{0} is located at 4n+2 which breaks when used as par (you'll get the address of dummy instead).
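
    One way to sidestep this, sketched under the assumption that the Propeller Tool places all longs first and then the words in declaration order: declare the array before any other word, and check the low two address bits before starting the cog. The PASM body is only a placeholder:

    ```spin
    VAR
      long  first
      word  cmdWord[17]                   ' first word declared, so it starts right after the longs (4n)
      word  dummy                         ' moved behind the array

    PUB null
      if @cmdWord & %11                   ' low two bits set: par would drop them
        return                            ' refuse to start the cog with a misaligned address
      cognew(@entry, @cmdWord)

    DAT             org     0
    entry           rdword  myReg, par wz           ' placeholder for the command fetch loop
                    jmp     #entry

    myReg           res     1
    ```

    The check is cheap insurance: if a later edit adds another word in front of the array, the cog simply does not start instead of silently polling the wrong address.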
  • ags Posts: 386
    edited 2011-09-06 19:42
    cmdWord := base
    cmdWord |= flags
    

    Yes, I can see how that would fail. In between different assignments is not atomic. I'm writing the entire value of the command word at once, in one SPIN assignment operation.

    kuroneko wrote: »
    Yes, but consider this:
    VAR
      long  first
      word  dummy
      word  cmdWord[17]

    In this case cmdWord{0} is located at 4n+2 which breaks when used as par (you'll get the address of dummy instead).

    Oh oh, this may be the root of my fundamental misunderstanding and the problem. I assure you I've read the pages on WR*/RD* in the Propeller Manual v1.1.pdf and that's not how I understood it (and I'm not claiming that my understanding is correct - far from it).

    I see that with a 9-bit literal value, only long-aligned addresses are possible. However, isn't par a long-word, so I thought it would contain 32 address bits (double what is needed for addressing the Prop RAM)?

    I think I need much more understanding here, can you point me at a good resource, or help with further details?

    Thanks! This may be on the path to a solution to a problem that has been vexing me for a while...
  • kuroneko Posts: 3,623
    edited 2011-09-06 19:50
    The problem(?) with par is that it has to fit into 14bit (PASM coginit parameter: 14bit par, 14bit address, 4bit cog ID handling). Which means the lower 2bits of the parameter value passed into coginit/cognew get lost (bits 31..16 as well). Which still lets you cover the whole 64K address range (RAM/ROM) but only in 4n steps.

    You can avoid the alignment issue by storing the value in a long variable and do a rdlong addr, par instead of using par directly. This way you'll get the whole 32bit value.
  • ags Posts: 386
    edited 2011-09-06 20:03
    kuroneko wrote: »
    The problem(?) with par is that it has to fit into 14bit (PASM coginit parameter: 14bit par, 14bit address, 4bit cog ID handling). Which means the lower 2bits of the parameter value passed into coginit/cognew get lost (%-00). Which still lets you cover the whole 64K address range (RAM/ROM) but only in 4n steps.

    Oh, good grief! I bet that's it. I see some discussion about that in the Manual on the PAR register page.

    So PAR contains the address of the long value that is guaranteed to contain the actual value I want, but it may not be the low byte or word, so with RDWORD/RDBYTE I may get garbage? Yikes! What techniques are used to avoid that (other than only using long values?)

    While I'm on the topic (roughly) - when using WRBYTE/WRWORD, the Manual says the long-register values in cog RAM are truncated to word and byte sizes, and zero-extended. However, does that mean that the entire long-size register value is altered? I thought about it and decided the answer must be "no". Even with this new knowledge about PAR, once in PASM-land, I can still manipulate all 16 bits of the source register, and increment it by 1 (one byte addressing - I've done this and it sure *looks* like it works) so I presume that I can precisely address and surgically alter single-byte values of hub RAM from PASM code. Is this correct?

    Thanks!!
  • kuroneko Posts: 3,623
    edited 2011-09-06 20:22
    ags wrote: »
    So PAR contains the address of the long value that is guaranteed to contain the actual value I want, but it may not be the low byte or word, so with RDWORD/RDBYTE I may get garbage?
    Let's take a step back. The 14bit limitation is a fact. Period. You can get any value into the cog you just have to know how to do it.
    cognew(@entry, user)
    
    In the SPIN example above you pass an opaque user value (doesn't have to be an address) to the PASM cog. par will contain user[15..2] in par[15..2], everything else is zero. Which means ordinary 4n aligned addresses can be passed as is. And you simply do a rdlong temp, par. If your address is not 4n aligned (or it's an arbitrary value, e.g. $12345678) then you can either align it or pass it indirectly.
    VAR
      long  storage
    
    PUB null
    
      storage := arbitrary_value
      cognew(@entry, @storage)
    
    DAT     org     0
    
    entry   rdlong  addr, par          ' get arbitrary_value into addr
            rdbyte  temp, addr         ' read e.g. from a byte aligned address
    
    ags wrote: »
    While I'm on the topic (roughly) - when using WRBYTE/WRWORD, the Manual says the long-register values in cog RAM are truncated to word and byte sizes, and zero-extended. However, does that mean that the entire long-size register value is altered?
    For wrbyte/wrword only that particular portion of the hub long is altered. Using rdbyte/rdword is where the zero-extension comes in (cog locations are longs). Which means you can construct a hub long by using 4 wrbytes but this doesn't work the other way around, a cog long can't be assembled directly using 4 rdbytes. Also, a wrbyte doesn't affect the long being written from.
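
    As a sketch of what "not directly" means in practice: each rdbyte overwrites the whole destination register, so the bytes have to be merged with shift/or. The register names and the test value below are illustrative, and par is assumed to hold a 4n aligned address (where a single rdlong would of course do the same job):

    ```spin
    VAR
      long  source

    PUB null
      source := $12345678                 ' illustrative test value
      cognew(@entry, @source)

    DAT             org     0
    entry           mov     addr, par               ' r/w copy of the hub address
                    add     addr, #3                ' start at the most significant byte (hub is little-endian)
                    mov     value, #0
                    mov     count, #4
    :next           rdbyte  temp, addr              ' temp := byte, zero-extended to a long
                    shl     value, #8               ' make room for it
                    or      value, temp             ' merge into the accumulated long
                    sub     addr, #1
                    djnz    count, #:next
    :done           jmp     #:done                  ' park the cog; value now holds the hub long

    addr            res     1
    value           res     1
    count           res     1
    temp            res     1
    ```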
    ags wrote: »
    Even with this new knowledge about PAR, once in PASM-land, I can still manipulate all 16 bits of the source register, and increment it by 1 (one byte addressing - I've done this and it sure *looks* like it works) so I presume that I can precisely address and surgically alter single-byte values of hub RAM from PASM code. Is this correct?
    par itself is read-only. While you can do an add par, #1 this will only affect its shadow register. Adjusting addresses based on par has to be done by taking a r/w copy and working with this instead, e.g.
    mov     addr, par
    rdlong  v1, addr
    add     addr, #4
    rdlong  v2, addr
    
    Normal registers can be manipulated in any way you want. And - to finally answer the question - every single byte in hub RAM can be modified.
  • ags Posts: 386
    edited 2011-09-06 21:59
    The thrill of victory... and the agony of defeat.

    I thought I had a lead on the problem. Alas, while it has been a good education for me, this was not the problem. As it turns out, through some combination of luck and idiot savant, I always pass parameters to a PASM cog through PAR based on a long variable. That is, similar to the previous example, I establish a long variable, pass the address of that through PAR, and then read the value of that address, or the sequentially following longs. In other words, this is what I do:
    VAR
      long  g_paramRoot
      long  g_nextParam0
      long  g_nextParam1
    ...
    PUB null
      cognew(@entryPoint, @g_paramRoot)
    DAT
    entryPoint
                    mov     tmpReg, par

    :loop           rdlong  paramReg, tmpReg wz
            if_z    jmp     #:loop

    'do something with g_paramRoot value now in paramReg

                    add     tmpReg, #4              'increment to the next param value address (longs)
                    rdlong  paramReg, tmpReg

    'do something with g_nextParam0 value now in paramReg

                    add     tmpReg, #4
                    rdlong  paramReg, tmpReg

    'do something with g_nextParam1 value now in paramReg
    ...
    tmpReg          res     1
    paramReg        res     1
    

    I would be suspecting a memory problem now, but the fact that the symptoms occur when processing is sped up makes me wonder.

    Ugh.