watch dog timer — Parallax Forums

watch dog timer

Zap-o Posts: 452
edited 2011-09-14 10:36 in Propeller 1
Do any of you use a watchdog timer with the Propeller? I ask because I am reading a book, and it mentions that all good engineers use a WDT with microprocessors. It states that the WDT helps with devices that are run in noisy environments, etc. I have never used one with the Propeller, so I ask: what do you think?

Comments

  • tonyp12 Posts: 1,951
    edited 2011-09-12 19:05
    One that monitors the power conditions or one that monitors that the code did not get stuck?
  • Zap-o Posts: 452
    edited 2011-09-13 10:30
    A watch dog timer for code.
  • Leon Posts: 7,620
    edited 2011-09-13 10:50
    They are essential in many systems. A cog could be devoted to the WDT function by providing it with an independent external oscillator.

    With ARM chips the WDT is the best way to implement a software reset. One sets it for a very short time to initiate the reset.
  • Zap-o Posts: 452
    edited 2011-09-13 10:52
    Leon wrote: »
    With ARM chips, the WDT is the best way to implement a software reset.

    So you're saying a Propeller will never need a WDT?
  • Leon Posts: 7,620
    edited 2011-09-13 10:59
    Of course it will. I suggested a way to implement it.
  • Mike Green Posts: 23,101
    edited 2011-09-13 11:34
    Leon,
    An external oscillator is not going to help you with a WDT since the Prop uses a single system clock source for all of its cogs and the hub functions. If the system clock stops working, the whole chip stops. An errant program could change the clock mode register to something that won't work and it all would stop. You would need an independent WDT. Something as simple as a CMOS 555-type timer that's reset by the Prop would work.
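
    As a rough illustration of the external-timer idea, here is a toy Python model (not Propeller code). It assumes a retriggerable timer with a fixed timeout whose output drives the reset pin; the class name, function name, and millisecond values are made up for the sketch.

```python
class ExternalWatchdog:
    """Toy model of a retriggerable external timer wired to the reset pin."""

    def __init__(self, timeout_ms: int) -> None:
        self.timeout_ms = timeout_ms
        self.last_kick_ms = 0

    def kick(self, now_ms: int) -> None:
        # The Prop pulses a pin; each pulse restarts the timer.
        self.last_kick_ms = now_ms

    def reset_asserted(self, now_ms: int) -> bool:
        # Reset fires once no pulse has arrived for a full timeout period.
        return now_ms - self.last_kick_ms >= self.timeout_ms


def run(kick_times_ms, timeout_ms, end_ms):
    """Return the time at which reset fires, or None if it never does."""
    wdt = ExternalWatchdog(timeout_ms)
    kicks = set(kick_times_ms)
    for now in range(end_ms + 1):
        if now in kicks:
            wdt.kick(now)
        if wdt.reset_asserted(now):
            return now
    return None
```

    With a kick every 50 ms and a 100 ms timeout, run(range(0, 1000, 50), 100, 999) returns None. If the kicks stop after 100 ms, run([0, 50, 100], 100, 999) returns 200, the moment the timer would pull reset.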
  • Zap-o Posts: 452
    edited 2011-09-13 11:44
  • Phil Pilgrim (PhiPi) Posts: 23,514
    edited 2011-09-13 12:45
    The difficulty with a software watchdog timer is, "Who watches the watchdog?" But, even with an external watchdog timer, how do you make sure that every cog is running properly? You'd almost have to have a watchdog timer for each cog with a wired-OR connection to /RST.

    I think that, for the Propeller, a statistical software watchdog system, where everybody watches everybody else, would be effective. The idea is that each cog would watch all the others and do a software reset if any fail to meet their deadlines. This could be done without any pins, wherein each cog would periodically write the value of cnt to its own hub location. At the same time it would check the other seven locations to make sure each of the other cogs met its deadline. This would entail always running eight cogs, but the unused ones could idle in waitcnt without drawing excessive power.

    Of course, this is not 100.00000% reliable, but from a statistical standpoint the probability of missing errant behavior is vanishingly low.
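
    A minimal sketch of the scheme above, simulated in Python rather than Spin or PASM. The eight-entry list stands in for the hub locations, `now` stands in for cnt, and the period and deadline values are arbitrary assumptions. A real implementation would also have to cope with cnt wrapping around, which this sketch ignores.

```python
NUM_COGS = 8


def stale_cogs(heartbeats, now, deadline, me):
    """Cogs other than `me` whose last heartbeat is older than `deadline` ticks."""
    return [cog for cog, stamp in enumerate(heartbeats)
            if cog != me and now - stamp > deadline]


def simulate(hung_cog, hang_at, period=10, deadline=25, end=200):
    """Return (time, detecting_cog, stale_list) for the first detection, or None."""
    heartbeats = [0] * NUM_COGS  # one hub location per cog
    for now in range(0, end, period):
        for cog in range(NUM_COGS):
            if cog == hung_cog and now >= hang_at:
                continue  # a hung cog neither writes nor checks
            heartbeats[cog] = now  # "write the value of cnt to its own location"
            stale = stale_cogs(heartbeats, now, deadline, cog)
            if stale:
                return now, cog, stale  # a real system would reboot here
    return None
```

    With no hung cog, simulate(None, 0) returns None. If cog 3 hangs at tick 50, simulate(3, 50) returns (70, 0, [3]): cog 0 is the first to notice, once cog 3's last stamp is more than 25 ticks old.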

    -Phil
  • Leon Posts: 7,620
    edited 2011-09-13 12:53
    Mike Green wrote: »
    Leon,
    An external oscillator is not going to help you with a WDT since the Prop uses a single system clock source for all of its cogs and the hub functions. If the system clock stops working, the whole chip stops. An errant program could change the clock mode register to something that won't work and it all would stop. You would need an independent WDT. Something as simple as a CMOS 555-type timer that's reset by the Prop would work.

    That was the sort of thing I envisaged when I suggested an oscillator - something providing a regular pulse.
  • localroger Posts: 3,452
    edited 2011-09-13 15:57
    I have a great story, which was much less funny at the time, about a watchdog timer that didn't. We used to sell a scale display that was designed around 1980 using the TMS9995 CPU and the 9901 controller chip. The discrete ADC used the 9901 controller's counter as the accumulator for a delta-sigma ADC.

    One fine day we sold a local chemical plant a very expensive drumfilling system based on this indicator. It was explosion proof (mounted in a heavy airtight steel enclosure) and arrayed with safety devices because the chemical it was dispensing into drums was highly toxic and flammable. Of course we were quite glad that the scale had a watchdog timer, which could be noted on the schematic diagram included with the service manual. (In those days we still did a lot of component level repair.)

    Then one day it overfilled a drum, which exploded from the pressure, and there was much massing of HAZMAT gear and evacuations and other expensive results. We were called on the carpet and asked to figure out how it happened.

    As it happens there was a rare failure mode where due to bus contention or outright failure of the 9901, it would become impossible to read the lower 14 bits of the counter. In this state the 16-bit ADC would become a 2-bit ADC capable of reporting only the raw count values 0, $1000, $2000, and $3000. Otherwise the scale would work fine though. I saw this happen four or five times in the 15 years we regularly serviced this model. And needless to say, of all the possible ways it could have failed, this is how the expensive explosion proof drumfiller decided to croak.

    Once I figured out what had happened, they demanded that we GUARANTEE that this would NEVER HAPPEN AGAIN before they were willing to restart the line. Now this same scale had an output from the setpoint card for MOTION, in those days generally used to prevent BCD printers from printing if the weight wasn't stable. It occurred to me that during the process of filling a drum the weight is always changing, so I added a relay that interlocked the fill operation to the motion output; after a few seconds to get things started, if the weight stopped changing it would close the valve.

    They're still using it today.
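
    For illustration only, here is the same interlock logic expressed in software (the original was a relay on the MOTION output, not code). The function name, sample counts, and threshold are hypothetical.

```python
def valve_close_index(weights, startup_samples, stall_samples, min_delta):
    """Return the sample index at which the valve would be closed, or None.

    weights         -- successive weight readings taken during a fill
    startup_samples -- grace period, in samples, while the fill gets going
    stall_samples   -- consecutive unchanged samples that count as "no motion"
    min_delta       -- smallest change between samples that counts as motion
    """
    stalled = 0
    for i in range(1, len(weights)):
        if i < startup_samples:
            continue  # still inside the start-up grace period
        if abs(weights[i] - weights[i - 1]) < min_delta:
            stalled += 1
            if stalled >= stall_samples:
                return i  # weight stopped changing mid-fill: close the valve
        else:
            stalled = 0
    return None
```

    A steadily rising reading such as [0, 0, 1, 2, 3, 4, 5, 6, 7, 8] never trips it, while a reading that freezes, such as [0, 0, 1, 2, 3, 3, 3, 3, 3, 3] with startup_samples=3, stall_samples=3, and min_delta=0.5, closes the valve at index 7.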
  • Heater. Posts: 21,230
    edited 2011-09-14 04:39
    What is a watchdog for?

    1) You suspect your code might crash under some as yet unforeseen circumstance.
    Perhaps your design is not thorough or your testing not extensive. Or you
    don't trust your compiler or whatever.

    2) You suspect that the hardware might fail in a recoverable way. Could be a
    glitch on the power supply. Could be other noise getting in and latching things
    up. Could be that cosmic ray flipping a bit in your RAM.

    These are very much the same. The assumption is that either you want the thing
    brought to a dead halt when anything goes wrong to stop it doing more damage.
    Or perhaps you are willing to bet that a reset will start it up again in a
    sensible way.

    Either way an external watchdog device could be a good idea.

    Phil raises an interesting point. With a multi-core device like the Prop
    perhaps you need a watchdog on each core to try to be sure they are all
    operating.

    Then of course you could have a single hardware watchdog on one core
    and have that be the watch dog for functionality on all the others.

    Or perhaps one core has a hardware watchdog and the others all supervise each
    other in a chain. That might fit with the flow of processing and data in your
    application.

    I would be loath to get rid of the hardware watchdog though.

    Still there are many things that can go wrong; localroger has a good example
    above.

    I once saw a system where the background loop had crashed causing the
    system to fail badly. However the watchdog was oblivious because it was
    still being kicked from an interrupt routine, triggered from a timer, that was
    still running OK. Note, this is similar to having one dead COG.

    Recently a system I looked at would fail at random when nearby equipment was
    powered up even though this system was opto-isolated on all its I/O and
    running from its own batteries. Turned out that the parallel I/O outputs were
    getting reset to inputs by some EMI. The system had no idea this had happened
    and continued on its merry way.

    A watchdog then is a means to ensure that a system recovers from a
    temporary faulty situation or perhaps shuts itself down totally. The idea is
    that it does what you want it to do over the long haul or gives up. The system
    should not end up doing things it is not designed to do. A watch dog is a crude way of trying to build a "fault tolerant" system.

    In general though, determining all possible failure modes turns out to be
    rather hard and full of surprises, as we have seen above. It's a question of how far you want to
    pursue it.

    Ultimately you might get into the world of multiple-redundant systems with
    multiple processors and multiple power supplies etc. All working on the same
    task such that if one node fails (produces a wrong result) the others can continue correctly (perhaps shutting down or restarting the failed node).

    That turns out to be rather hard too. You might think that it is sufficient that
    3 processors work on the job in a kind of democracy. In the event of one
    failure the other working nodes have a majority vote on what to do and the right
    thing gets done. Turns out that even then it can fail. You need 4 such nodes
    to detect a failure in any one of them in some cases.
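
    For reference, this is the bound from the Byzantine generals result for the case where messages are not signed. With n nodes, of which up to f may behave arbitrarily, agreement is only guaranteed when:

```latex
n \ge 3f + 1
```

    With f = 1 this gives n >= 4, which is where the four nodes come from.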

    I refer you to http://en.wikipedia.org/wiki/Byzantine_fault_tolerance (be sure to check the paper on the Microsoft Research site linked from there).

    You will be pleased to hear that even the Primary Flight Computers of the fly-by-wire Boeing 777 do not meet the Byzantine Generals criteria :)
  • ericball Posts: 774
    edited 2011-09-14 06:37
    Putting the watchdog reset in an ISR is really dumb IMHO, as two of the primary reasons to implement a watchdog timer are to detect infinite loops and to detect cases where the PC no longer points to code (e.g. in PASM, JMP label instead of JMP #label), so the processor is executing nonsense. My two biggest worries with a watchdog timer are false triggers (because the code takes a little too long between watchdog resets) and the need for the initialization routine to be able to recover from all possible initialization states (since the watchdog reset could have occurred at any point). That being said, I don't design critical hardware like localroger.
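
    A toy model of why the kick has to come from the code being watched. This is a simulation sketch, not driver code; the function name, tick counts, and timeout are assumptions.

```python
def watchdog_fires(main_alive_until, kick_from_isr, timeout=3, end=20):
    """Return the tick at which the watchdog fires, or None.

    The timer interrupt runs on every tick regardless of what the main
    loop is doing. The main loop stops making progress at tick
    `main_alive_until` (it is stuck, or the PC has wandered off).
    """
    last_kick = 0
    for tick in range(end):
        main_ran = tick < main_alive_until
        # Kicking from the ISR succeeds even when the main loop is dead.
        if kick_from_isr or main_ran:
            last_kick = tick
        if tick - last_kick >= timeout:
            return tick
    return None
```

    watchdog_fires(5, kick_from_isr=True) returns None: the main loop died at tick 5 and nothing noticed. watchdog_fires(5, kick_from_isr=False) returns 7, three ticks after the last kick from the main loop.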
  • Bobb Fwed Posts: 1,119
    edited 2011-09-14 10:29
    It seems a bit wasteful to use a discrete watchdog timer with the propeller. I could see it under some circumstances (like if all eight cogs are being used to the absolute maximum capacity). But for (I'd imagine) 99% of applications, there is at least one cog that is waiting periodically. This wasted time could be used to implement a system-wide watch dog check, or pin oscillation or whatever is needed.
    In my programming, if halts are theoretically possible, I generally have each individual cog change the state of something (I've used HUB memory or pin states), and have a "master" cog checking for that change. But if you are running into halts or freezes, except in the most speed-intensive applications, it is probably more about sussing out coding problems, or it's just lazy programming.
    It seems that as long as you aren't doing extremely fast communication, or the like, with a little bit of work, you could get away with not using a traditional watchdog at all on the Propeller. It has a system-wide clock that can be used as a simpler WDT when needed.
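
    A sketch of the master-cog check described above, assuming each worker cog increments its own counter in hub memory. It is plain Python standing in for Spin; the function names are hypothetical, and each snapshot represents what the master reads on one poll.

```python
def find_stalled(counters, previous):
    """Return indices of workers whose counter has not changed since the last poll."""
    return [i for i, (now, before) in enumerate(zip(counters, previous))
            if now == before]


def master_poll(snapshots):
    """Walk successive counter snapshots; return (poll_number, stalled) or None."""
    previous = list(snapshots[0])
    for n, current in enumerate(snapshots[1:], start=1):
        stalled = find_stalled(current, previous)
        if stalled:
            return n, stalled  # a real master cog would reboot or restart that cog
        previous = list(current)
    return None
```

    For example, master_poll([[0, 0, 0], [1, 1, 1], [2, 1, 2], [3, 1, 3]]) returns (2, [1]), because worker 1 stopped updating after the first poll. The poll interval has to be longer than the slowest worker's update period, or a healthy cog will look stalled.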
  • Heater. Posts: 21,230
    edited 2011-09-14 10:36
    Bob,
    Perhaps a cog as watchdog is sufficient in 99.999% of errant code cases. One might worry that an errant cog could kill off any other cog including the watch dog cog.
    Also it does not cater for all the odd hardware glitches that can occur as described in previous posts.