Can a cog suddenly hang? — Parallax Forums

Ronny D'Hoore Posts: 6
edited 2008-04-01 16:51 in Propeller 1
Hi all,

I'm struggling with a project which detects and times pulses on input pins in one cog, and then reports them, along with their timestamps, to a host PC through a serial connection in another cog. The timing is in milliseconds, and I'm keeping track of milliseconds in a third cog, as an object used by the main program (running in the first two cogs).

Now something weird happened. After running fine for over one day, suddenly the clock stopped ticking... My program, which calls Tickcount.Get (which returns the number of milliseconds that have passed since the program started) in the first two cogs, suddenly kept getting the same value, 121263066 (indicating the program had been running for over 33 hours). This made me quite worried. I don't see any cause in the source why it would stop at that value. And when I changed it to start at 121260000 instead of at zero, just to see what would happen, it ticked happily past the previous "sticky" value. Also, I don't call Tickcount.Stop anywhere, so I can't have stopped it from my program.

Therefore, I can only conclude that somehow the third cog had halted / crashed... while the first two happily kept running fine (I still had a serial connection, and it still sensed pulses, but reported them all with the same time stamp).

Is that possible? Has anyone ever had such an experience?

I'm attaching here the object I made to keep track of milliseconds, in case anyone thinks the problem must be there.

Ronny

Comments

  • stevenmess2004 Posts: 1,102
    edited 2008-03-31 10:25
    It is only possible for a cog to stop if you tell it to. However, it is possible to do this unintentionally :)

    I can't see anything wrong in this code but what is your other code? It sounds like it could be a problem with the stack.
  • hippy Posts: 1,981
    edited 2008-03-31 10:31
    if (cnt => nextcnt)

    I can visualise a situation where nextcnt has increased to $7FFF_FFFF ( or close to it, a large positive value ) but by the time you come to this test, CNT has passed $7FFF_FFFF, become negative, and there will be a long delay before the test passes as true.

    I cannot remember the exact workaround ... "if (cnt - nextcnt) => 0" ?
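
    A minimal Spin sketch of why the subtraction form behaves differently, using made-up counter values on either side of the $7FFF_FFFF boundary. The method names NaiveDue, WrapSafeDue and Demo are illustrative and not from Ronny's object. It relies on Spin comparing longs as signed values, and it only holds while the deadline is less than 2^31 counts away:

    ```spin
    PUB NaiveDue(now, deadline) : due
      ' Signed compare of the raw counter values.
      due := now => deadline

    PUB WrapSafeDue(now, deadline) : due
      ' Signed compare of the 32-bit difference, which wraps the same way CNT does.
      due := (now - deadline) => 0

    PUB Demo : result
      ' deadline = $7FFF_FFFF (largest positive long)
      ' now      = $8000_0010 (17 counts later, but negative as a signed long)
      ' NaiveDue is FALSE here, WrapSafeDue is TRUE, so result is TRUE.
      result := WrapSafeDue($8000_0010, $7FFF_FFFF) and not NaiveDue($8000_0010, $7FFF_FFFF)
    ```

    In a real loop, now would simply be cnt; the arguments are only there so the two tests can be tried with fixed values.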

    Alternatively, you could re-implement and simplify UpdTickcount ...

    PRI UpdTickcount | msec
      msec := CNT
      repeat
        waitcnt( msec += CLKFREQ / 1000 )
        ++tickCount
    
    



    If you really wanted to synchronise your timing to PUB Start rather than when UpdTickCount first runs ...

    PUB Start : okay
      Tickcount := 0
      okay := cogon := (cog := CogNew(UpdTickcount(CNT), @Stack)) > 0
    
    PRI UpdTickcount( msec )
      repeat
        waitcnt( msec += CLKFREQ / 1000 )
        ++tickCount
    
    
  • Ronny D'Hoore Posts: 6
    edited 2008-03-31 11:18
    Steven,
    Here is the main program. It's still in full development...
    Could I have corrupted the stack of the Tickcount object somehow from my main program?

    Hippy,
    What you are saying makes sense :) I thought it was safer not to rely on waitcnt, as one can miss the exact clock cycle, but the way I was doing it, I probably have a much bigger chance of missing it.

    But the "long delay" you mention would be around 28 seconds approx. with a clock speed of 80 MHz, as far as I can see. While I have certainly waited much longer to see if my tickcount would spring back to life... So my problem is still not solved. But I will use your first suggestion for UpdTickcount, thanks!

    Thanks to both of you for your prompt help!

    Ronny
  • hippy Posts: 1,981
    edited 2008-03-31 12:29
    Ronny D'Hoore said...
    But the "long delay" you mention would be around 28 seconds approx. with a clock speed of 80 MHz, as far as I can see.

    I had to think about that, but no, it could potentially jam up forever. If nextcnt does reach $7FFF_FFFF, you would have to hit the IF at exactly the right CNT time to satisfy the condition, one CNT too low and it fails, one CNT too high and it fails.

    In practice there's a range of nextcnt's which could jam up in the same way, this example is the extreme case.
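
    One way to put a number on that range, as a rough sketch only: for a non-negative nextcnt, the test "cnt => nextcnt" can pass only while CNT is still at or below $7FFF_FFFF. The hypothetical Headroom method below (not part of Ronny's object) returns how many counts that leaves after nextcnt itself:

    ```spin
    PUB Headroom(nextcnt) : ticks
      ' Assumes nextcnt is non-negative as a signed long (0..$7FFF_FFFF).
      ' "cnt => nextcnt" can only be true while cnt is in nextcnt..$7FFF_FFFF,
      ' so this is how many counts remain after nextcnt before CNT goes negative.
      ticks := $7FFF_FFFF - nextcnt

    PUB Demo : ticks
      ticks := Headroom($7FFF_FFFF - 80_000)   ' 80_000 counts, i.e. 1 ms at 80 MHz
    ```

    If the polling loop takes longer than that headroom to come back to the IF, it can miss the window, and the next chance only comes one full lap of CNT later (2^32 counts, roughly 53.7 seconds at 80 MHz).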
  • Ale Posts: 2,363
    edited 2008-03-31 20:27
    I think you can hang a cog if you access a memory location outside the memory area; I do not remember if it is with rdlong or wrlong. That was when I had just started with the prop, so maybe I am mistaken...
  • Mike Green Posts: 23,101
    edited 2008-03-31 20:32
    You can't hang a cog except with a WAITPNE or WAITPEQ with a condition that never occurs or with a WAITVID where the video generator isn't running. WAITCNT will eventually be successful. RDxxxx and WRxxxx will ignore invalid address bits and can't hang.
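
    For illustration, a sketch of the WAITPEQ case in Spin. SENSE_PIN is a hypothetical pin number chosen for the example, not something from Ronny's program:

    ```spin
    CON
      SENSE_PIN = 0                              ' hypothetical input pin

    PUB WaitForHigh
      dira[SENSE_PIN]~                           ' make the pin an input
      waitpeq(|< SENSE_PIN, |< SENSE_PIN, 0)     ' blocks until the pin reads high
    ```

    If the pin never goes high, the cog running WaitForHigh stays parked on the waitpeq, which looks like a hang from the outside even though nothing has crashed.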
  • stevenmess2004 Posts: 1,102
    edited 2008-04-01 05:48
    I agree, Mike, but a couple of points:
    1. If something else changes the stack (i.e. out of bounds array) or the stack gets too big all kinds of bad things will happen. This could stop the cog causing it to hang.
    2. In Hippy's example the cog will not hang but it could take a very, very long time to get back to the $7fff_ffff. Probably a similar amount of time to what it took to generate it.

    Ronny, I probably won't get a chance to look at the rest of your code today but I'll try and look at it tomorrow.
  • Ronny D'Hoore Posts: 6
    edited 2008-04-01 11:28
    Steven,

    Thanks very much, but you don't need to spend time on it. Since you all agree the problem could have been my tickcount, it should be solved now. If it ever happens again, you will certainly hear from me :)

    Thanks very much to all of you!

    Ronny
  • Javalin Posts: 892
    edited 2008-04-01 12:32
    You can get the appearance of a hang, for example if you forget the @ on addressing and your code goes off somewhere else... Also, I've found that my code (obviously) does weird things when you get the >= and => operators confused in an IF (in Spin, => is the "greater than or equal" comparison, while >= is the assignment form of >).
    PRI UpdTickcount | msec, nextCnt
        msec := clkfreq / 1000
        nextCnt := cnt + msec
        repeat
            if (cnt - nextCnt) => 0        ' wrap-safe test: has cnt reached nextCnt?
                tickCount++
                nextCnt += msec            ' step from the previous deadline so ticks do not drift
    

    or even; (update)

    PRI UpdTickcount | msec, nextCnt
        nextcnt := cnt
        msec := clkfreq / 1000
        repeat
            waitcnt(nextCnt += mSec)
            tickCount++
    
    


    James

    Post Edited (Javalin) : 4/1/2008 12:45:41 PM GMT
  • jazzed Posts: 11,803
    edited 2008-04-01 14:57
    What if two cogs try to write the same memory address at the same time?
    This usually results in a deadlock in most systems. Does wrlong ... protect against this?

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    jazzed ... about living in http://en.wikipedia.org/wiki/Silicon_Valley

    Traffic is slow at times, but Parallax orders always get here fast 8)
  • Javalin Posts: 892
    edited 2008-04-01 15:38
    Jazzed,

    Not possible. Each cog gets a 1/8th timeslot to access the hub; the manual explains this. A hub access always completes before the next cog gets access.

    J
  • jazzed Posts: 11,803
    edited 2008-04-01 16:51
    Never did rtfm :)
    My favorite hang command is "stop" which doesn't get highlighted in spin.
