Can a cog suddenly hang?
Ronny D'Hoore
Posts: 6
Hi all,
I'm struggling with a project which detects and times pulses on input pins in one cog, and then reports them to a host pc through a serial connection in another cog, along with their timestamps. The timing is in milliseconds, and I'm keeping track of milliseconds in a third cog, as an object used by the main program (running in the first two cogs).
Now something weird happened. After running fine for over a day, the clock suddenly stopped ticking... My program, which calls Tickcount.Get (which returns the number of milliseconds that have passed since the program started) in the first two cogs, suddenly kept getting the same value, 121263066 (indicating the program had been running for over 33 hours). This made me quite worried. I don't see anything in the source that would make it stop at that value. And when I changed it to start at 121260000 instead of at zero, just to see what would happen, it ticked happily past the previous "sticky" value. Also, I don't call Tickcount.Stop anywhere, so I can't have stopped it from my program.
Therefore, I can only conclude that somehow the third cog had halted / crashed... while the first two happily kept running fine (I still had a serial connection, and it still sensed pulses, but reported them all with the same time stamp).
Is that possible? Has anyone ever had such experience?
I'm attaching the object I made to keep track of milliseconds, in case anyone thinks the problem is in there.
Ronny
Comments
I can't see anything wrong in this code but what is your other code? It sounds like it could be a problem with the stack.
I can visualise a situation where nextcnt has increased to $7FFF_FFFF (or close to it, a large positive value) but by the time you come to this test, CNT has passed $7FFF_FFFF, become negative, and there will be a long delay before the test passes as true.
I cannot remember the exact workaround ... "if (cnt - nextcnt) => 0" ?
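The difference between the two tests can be modelled outside the Propeller. The following is a minimal Python sketch, not the original Spin code: it assumes the failing test was a signed comparison of the form `cnt => nextcnt` (the attached object is not shown in the thread), and the helper names `to_signed`, `naive_due` and `wrap_safe_due` are made up for illustration.

```python
MASK = 0xFFFF_FFFF  # CNT and Spin longs are 32 bits wide

def to_signed(x):
    """Interpret a 32-bit value as a signed long, as Spin comparisons do."""
    x &= MASK
    return x - 0x1_0000_0000 if x & 0x8000_0000 else x

def naive_due(cnt, nextcnt):
    # Assumed form of the original test: cnt => nextcnt (signed, absolute values)
    return to_signed(cnt) >= to_signed(nextcnt)

def wrap_safe_due(cnt, nextcnt):
    # Suggested form: (cnt - nextcnt) => 0 (signed test on the difference)
    return to_signed(cnt - nextcnt) >= 0

nextcnt = 0x7FFF_FFF0            # deadline just below the sign flip
cnt = (nextcnt + 100) & MASK     # CNT is 100 cycles past the deadline

print(naive_due(cnt, nextcnt))      # False: CNT now looks negative
print(wrap_safe_due(cnt, nextcnt))  # True: the difference is +100
```

The subtraction wraps in the same way CNT does, so the sign of the difference stays meaningful as long as the two values are less than half the counter range apart.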
Alternatively, you could re-implement and simplify UpdTickcount ...
If you really wanted to synchronise your timing to PUB Start rather than when UpdTickCount first runs ...
Here is the main program. It's still in full development...
Could I have corrupted the stack of the Tickcount object somehow from my main program?
Hippy,
What you are saying makes sense :) I thought it was safer not to rely on waitcnt, as one can miss the exact clock cycle, but the way I was doing it, I probably have a much bigger chance of missing it.
But the "long delay" you mention would be around 28 seconds at a clock speed of 80 MHz, as far as I can see, while I have certainly waited much longer than that to see if my tickcount would spring back to life... So my problem is still not solved. But I will use your first suggestion for UpdTickcount, thanks!
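As a rough check of that estimate, the arithmetic can be written out. This is only a sketch, assuming a 32-bit CNT that increments once per clock at the 80 MHz mentioned above; the variable names are illustrative.

```python
CLKFREQ = 80_000_000  # system clock in Hz, as stated in the post

# Time for CNT to cover half its range (the span over which it reads as negative)
half_wrap_s = 2**31 / CLKFREQ
# Time for CNT to wrap all the way around to the same value
full_wrap_s = 2**32 / CLKFREQ

print(f"half range: {half_wrap_s:.1f} s")  # half range: 26.8 s
print(f"full wrap:  {full_wrap_s:.1f} s")  # full wrap:  53.7 s
```

So a one-off delay caused by the sign flip would be a little under 27 seconds, and CNT returns to any given value roughly every 54 seconds.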
Thanks to both of you for your prompt help!
Ronny
I had to think about that, but no, it could potentially jam up forever. If nextcnt does reach $7FFF_FFFF, you would have to hit the IF at exactly the right CNT time to satisfy the condition, one CNT too low and it fails, one CNT too high and it fails.
In practice there's a range of nextcnt's which could jam up in the same way, this example is the extreme case.
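The size of the window in which such a test can pass is easy to count in a small model. This Python sketch again assumes the original test was a signed `cnt => nextcnt`; `to_signed`, `naive_due` and `passing_values` are hypothetical helpers, not part of the Tickcount object.

```python
MASK = 0xFFFF_FFFF

def to_signed(x):
    """Interpret a 32-bit value as a signed long."""
    x &= MASK
    return x - 0x1_0000_0000 if x & 0x8000_0000 else x

def naive_due(cnt, nextcnt):
    # Assumed form of the original test: signed cnt => nextcnt
    return to_signed(cnt) >= to_signed(nextcnt)

def passing_values(nextcnt, lo, hi):
    """Count the CNT values in [lo, hi] for which the naive test passes."""
    return sum(naive_due(c, nextcnt) for c in range(lo, hi + 1))

# Scan 512 consecutive CNT values straddling the flip from positive to negative
lo, hi = 0x7FFF_FF00, 0x8000_00FF

print(passing_values(0x7FFF_FFFF, lo, hi))  # 1
print(passing_values(0x7FFF_FFF0, lo, hi))  # 16
```

A polling loop only samples CNT once per pass, many clocks apart, so a window of 1 or 16 counts can be stepped over on every wrap. That would be consistent with the clock staying stuck for far longer than one wrap period.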
1. If something else changes the stack (e.g. an out-of-bounds array write) or the stack gets too big, all kinds of bad things will happen. This could stop the cog, causing it to hang.
2. In Hippy's example the cog will not hang but it could take a very, very long time to get back to $7FFF_FFFF. Probably a similar amount of time to what it took to generate it.
Ronny, I probably won't get a chance to look at the rest of your code today but I'll try and look at it tomorrow.
Thanks very much, but you don't need to spend time on it. Since you all agree the problem could have been my tickcount, it should be solved now. If it ever happens again, you will certainly hear from me :)
Thanks very much to all of you!
Ronny
or even; (update)
James
Post Edited (Javalin) : 4/1/2008 12:45:41 PM GMT
This usually results in a deadlock in most systems. Does wrlong ... protect against this?
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
jazzed·... about·living in·http://en.wikipedia.org/wiki/Silicon_Valley
Traffic is slow at times, but Parallax orders·always get here fast 8)
Not possible. Each cog gets a 1/8th time slot to access the hub; the manual explains this. A hub access always completes before the next cog gets access.
J
My favorite hang command is "stop" which doesn't get highlighted in spin.