Providing a watchdog timer and a millisecond and microsecond clock on one cog
LoopyByteloose
Posts: 12,537
Well, the one thing the Propeller has never had is a formal hardware watchdog timer. I am trying to port Arduino code that uses one. I hate to waste a whole cog on just a watchdog timer, so the obvious thing is to combine it with the millisecond and microsecond clock ticks, which I also need.
I am wondering if I should use one of the cog's two counters to service the watchdog while the clock service just works through the normal cog resources. Is there anything to be gained?
And since I am a bit tight on i/o pins, I wonder how best to share the clock ticks with other cogs that are timing dependent?
Would a general scheme of using a counter for a watchdog timer allow any cog with unused counters to take on that additional task without interfering with the cog's primary tasking?
Comments
Maybe set up a 555 and have a Prop pin re-trigger it, and if it doesn't get re-triggered within the threshold time then the output pin of 555 would reset the prop....
Interval.h
Interval.cpp
Here's the unit test:
I need some time this weekend to finish this,
Tachyon Forth dedicates a cog to timing functions including virtual RTC and chained timers with alarm on timeouts, so it is not unusual to use these as watchdogs too. Originally I had eight count-down timers but a while ago I upgraded that to dynamically linked timers so that simply supplying the variable to the TIMEOUT function links it into the chain. So normally these timers just decrement down to zero every millisecond and there is a runtime counter too. In an app this is what it looks like:
DOUBLE mytimer --- create room for a 32-bit timer plus 16-bit link plus 16-bit alarm action
#3,000 mytimer TIMEOUT --- link mytimer and load with a 3 second timeout (3,000 ms)
' REBOOT mytimer ALARM --- set the timeout action to reboot (default = none)
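For readers porting from Arduino-style C, the chained-timer idea described above might be sketched roughly as follows. This is not Tachyon's actual data layout or API; the names `soft_timer`, `timer_timeout`, `timer_tick_1ms`, and `request_reboot` are hypothetical, and the sketch assumes something calls `timer_tick_1ms` once per millisecond.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout; Tachyon packs its timers differently. */
typedef struct soft_timer {
    uint32_t ms_left;            /* counts down to zero, then stays there */
    void (*alarm)(void);         /* optional timeout action, may be NULL */
    struct soft_timer *next;     /* link to the next timer in the chain */
    int linked;                  /* nonzero once the timer is in the chain */
} soft_timer;

static soft_timer *timer_chain = NULL;
static uint32_t runtime_ms = 0;          /* free-running millisecond count */
static volatile int reboot_requested = 0;

/* Example alarm action: flag a reboot instead of performing one. */
void request_reboot(void) { reboot_requested = 1; }

/* Load a timer and link it into the chain on first use. */
void timer_timeout(soft_timer *t, uint32_t ms)
{
    t->ms_left = ms;
    if (!t->linked) {
        t->linked = 1;
        t->next = timer_chain;
        timer_chain = t;
    }
}

/* Called once per millisecond by whatever owns the time base. */
void timer_tick_1ms(void)
{
    runtime_ms++;
    for (soft_timer *t = timer_chain; t != NULL; t = t->next) {
        /* The alarm fires once, on the transition to zero. */
        if (t->ms_left > 0 && --t->ms_left == 0 && t->alarm != NULL) {
            t->alarm();
        }
    }
}
```

Used as a watchdog, the application reloads its timer with `timer_timeout` on every pass of its main loop; if the loop stalls, the timer runs out and the alarm action fires.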
I suspect not. Why add all those parts to a build when a watchdog might be inserted within other code for a given cog? It is really a tiny task, and the part count, board space, and added cost of a 555 are not at all necessary.
Looking at the code that wants one, a 10ms cycle before reset allows plenty of time to do other things. Peter J's mention of how he does this in Tachyon is clear to me. And after a bit of thought and investigation, using CTRA is entirely unnecessary. I suppose it could be done, but nothing would be gained.
@ Martin H
Thanks for providing the code for use of Propellerino library.
This issue may actually be moot in 3D printer controllers because a freeze situation pretty much stops all stepper motion. The heaters being stuck on is the only real hazard to justify a reset to a standby condition.
+++++++++++
It would be even more wonderful to include the watchdog functions and the timer functions in the same cog as something else: a hybrid solution. I doubt it would be compatible with asynchronous serial functions, but a cog that runs an infrequent synchronous serial task might include this. I have to have SPI read an ADC about every 250ms, so I am thinking that doing all the ADC reading AND the watchdog AND providing other timer functions might consolidate nicely.
Of course, that gets away from just porting the Arduino code.
With a watchdog timer run by a cog, it is possible that some added code could take control of the timer or the I/O pin and stop it from working. With a hardware watchdog timer, nothing can physically remove its function.
We could say it's unlikely, but that's just another way of saying it will happen. I would not want to trust my health and wealth to it.
There is also the complexity that, even if the watchdog is some external hardware device, in the Propeller we have up to 8 processes going on. Each one of those can hang itself up independently. Ideally they should all be subject to a watchdog.
Aside: I have seen this happen in interrupt driven systems where, for example, the background loop can go AWOL but everything appears to be OK because all the interrupt handlers are still working. Some time later you find out that important work that should have been done in the background was not. This can happen the other way around of course.
The watchdog needs to be tied into every process.
Is there no non-maskable interrupt on an AVR?
It's a long time since I looked at them.
Wow I haven't heard of "NMI" since I used to program the Zilog Z80 CPU.... good times..
Your PC still has an NMI.
More likely but I don't program or design hardware for my 8 core PC....
'Correct' watchdog design usually has a separate low power Oscillator, and a direct pathway to the RESET logic, once enabled.
Once enabled, they usually cannot be disabled.
The ATtiny441/841 I just looked at, does all of this, and has further Fuse options on WDOG handling.
http://www.maximintegrated.com/products/supervisors/watchdog/
Seems this is borderline RTOS stuff, no?
The main process would be running on a cog and that process would spawn everything else. As such, it would be the watchdog over the system. It would monitor all children and figure out how to deal with problems. To do so, when it launched a cog / process it would give the process a number that the process would use to indicate the process was running. The main process / cog would periodically check that each assigned process had updated its status and if the process had not updated within the period expected the main process would react accordingly.
The idea consists of the main process establishing at least one variable which could be manipulated by itself and the children. Given there are 8 cogs and the main process will run on one of those, something like a uint16_t variable could: So the WD spawns cogs, and depending on the code in a cog it could set its "WD_EN" bit and its "STATUS" bit using a logical "OR".
The WD would then see that "WD_EN" is set and would then check the "STATUS" bit for that cog and act accordingly. My thoughts would be that once "WD_EN" for that cog is set, it can't be un-set unless the WD changes it, so a cog can't change its imperative nature on its own. And the WD would "un set" any "WD_STAT" / status bit after polling it.
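A minimal sketch of that scheme in C follows. The bit layout is an assumption, since the post does not spell it out: bits 0-7 hold one WD_EN flag per cog and bits 8-15 hold one STATUS flag per cog. The function names are hypothetical.

```c
#include <stdint.h>

/* Assumed layout: bits 0-7 = WD_EN per cog, bits 8-15 = STATUS per cog. */
#define WD_EN_BIT(cog)   ((uint16_t)(1u << (cog)))
#define WD_STAT_BIT(cog) ((uint16_t)(1u << ((cog) + 8)))

static volatile uint16_t wd_word = 0;   /* shared between all processes */

/* Child side: opt in to supervision, and check in. Both only OR bits in,
   so a child has no call that un-sets its own WD_EN bit. */
void wd_child_enable(unsigned cog)  { wd_word |= WD_EN_BIT(cog); }
void wd_child_checkin(unsigned cog) { wd_word |= WD_STAT_BIT(cog); }

/* Supervisor side: returns a bitmask of enabled cogs that did not check in
   since the last poll, then un-sets every STATUS bit and keeps WD_EN. */
uint8_t wd_poll(void)
{
    uint16_t snapshot = wd_word;
    uint8_t enabled = (uint8_t)(snapshot & 0x00FFu);
    uint8_t status  = (uint8_t)(snapshot >> 8);
    wd_word = (uint16_t)(snapshot & 0x00FFu);
    return (uint8_t)(enabled & ~status);
}
```

One caveat with a single shared word: a read-modify-write from several cogs is not atomic, so a real Propeller version would probably need a hub lock around the updates, or one byte per cog so each writer owns its own location.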
I've a lot more to think about but I thought I'd share my thoughts on this.
That cog would monitor tasks in the rest of the cogs, and would use one pin connected to external hardware that would act as the watchdog for the software watchdog. The hardware could be a simple 555 timer set up as a missing pulse detector or a watchdog chip.
With two long variables (status and enabled) there could be up to 32 watchdog timers. More would be possible by adding more variables. The enabled variable would be read from hub and OR'ed into the enabled register in the cog, so once set it cannot be reset by software running in the other cogs.
Each watchdog timer would be set up by assigning it a bit in the status and enabled variables, and a time between status checks. How the cog goes about determining when to check each timer's status bit is undecided right now, so I am open to suggestions. Current thinking is to use a bit mask for each time slot.
PS - No reason why this cog could not be used to provide things like clock/calendar and timing pulses as well, although resolution would probably be limited to the millisecond range.
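As a rough sketch of the latched-enable and time-slot idea, assuming four time slots and hypothetical names throughout (`hub_status`, `hub_enabled`, `slot_mask`, `wd_slot_tick`), the watchdog cog's loop body might look like this in C:

```c
#include <stdint.h>

#define WD_SLOTS 4   /* hypothetical number of time slots */

/* Shared ("hub") variables written by the monitored tasks. */
static volatile uint32_t hub_status  = 0;
static volatile uint32_t hub_enabled = 0;

/* Private to the watchdog cog; other cogs cannot reach these. */
static uint32_t latched_enabled = 0;
static uint32_t slot_mask[WD_SLOTS];   /* which timers are due in each slot */
static unsigned slot = 0;

/* Run once per time slot. Returns the timers that were due but silent. */
uint32_t wd_slot_tick(void)
{
    latched_enabled |= hub_enabled;    /* sticky: latched bits never clear */
    uint32_t due    = slot_mask[slot] & latched_enabled;
    uint32_t failed = due & ~hub_status;
    /* Re-arm only the timers just checked. This update is not atomic
       against other cogs; real code would need a lock around it. */
    hub_status &= ~due;
    slot = (slot + 1) % WD_SLOTS;
    return failed;
}
```

A timer that must be checked often gets its bit set in every slot mask; a slow one gets its bit in a single slot, so its period is the full cycle of slots.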
Speaking of monitoring a cog, I don't know if you can do that independently of the cog itself. While you may be seeing activity in that cog via whatever it should be doing with output pins or writes to hub ram (grey area), unless you have the cog waving an I/O pin or writing a specifically generated value to a specific hub ram location, you cannot be certain that a cog has not come off the rails. If I had to think seriously about a watchdog timer, that would key me to looking at hardware for it. 555s are cheap enough, and one I/O pin could reset it so that it doesn't time out unless the cog goes dark.
Just my thoughts colored by lots of medical imaging systems experience.............google VCT gantry to see one of my current toys.....
Frank
That interval is quicker than I'd like to have to deal with in the main program. I want to make it autonomous, sort of like Loopy's original question about the counters. So, I have a CTRA configured to generate a one second toggle, and CTRB is configured to mask the first counter. (See object ncoburst.spin for the logic of this.) The upshot is that the program has at least 26 seconds to get around to feeding the dog. The counters are autonomous and run on their own except for the main loop updates of the PHSB register to start another 26 seconds of protection. If the 26 seconds run out, that's it. Reset.
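A figure near 26 seconds is consistent with simple counter arithmetic under assumptions the post does not state: an 80 MHz system clock, a frequency register of 1, and a phase register restarted from zero, so that bit 31 of the 32-bit accumulator takes 2^31 clocks to set. Whether ncoburst.spin is configured exactly this way is not confirmed here; the helper below (a hypothetical name) only shows the arithmetic.

```c
#include <stdint.h>

/* Seconds until bit 31 of a 32-bit phase accumulator sets, when it starts
   at phs_start and frq is added on every system clock.
   Returns 0.0 if bit 31 is already set, -1.0 if it can never be reached. */
double nco_seconds_to_bit31(uint32_t phs_start, uint32_t frq,
                            uint32_t clkfreq_hz)
{
    if (phs_start & 0x80000000u) return 0.0;
    if (frq == 0 || clkfreq_hz == 0) return -1.0;
    uint64_t remaining = 0x80000000ull - phs_start;
    uint64_t clocks = (remaining + frq - 1) / frq;   /* round up */
    return (double)clocks / (double)clkfreq_hz;
}
```

With the assumed values, 2^31 / 80,000,000 works out to roughly 26.8 seconds. Starting the phase register above zero shortens the window proportionally.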
My advice is not to over-think the watchdog itself. It should be a mechanism of last resort, and it should be unequivocal. It should not be a cover for sloppy programming. For example, methods should not have waitpxx functions that could lock up if something goes bump in the night. Those sorts of things should always be covered with timeouts. In our applications, there are several modalities of communications, modem, XBee, smart sensors, lots to go wrong. The onus is on the programmer to anticipate the risky segments as much as possible. The parent method should know when those tasks are being executed correctly and have preplanned fallbacks.
Of course, there's the rub. Murphy, always present; firmware, perpetually updated; lurking, the flaw that locks up the system only on Shrove Tuesdays. Or there is a power brownout or ESD surge. An unexpected problem with a cold solder joint at -20°C. A wobble in the Force. That is when the watchdog kicks in. Service call avoided, event logged to status file. (Unless of course Murphy is feeling particularly ornery.)
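To illustrate the point about timeouts, here is a minimal C sketch of a polled wait that gives up after a set number of counter ticks instead of blocking forever. The function name and the two hook types are hypothetical; on real hardware the hooks would read the free-running system counter and the input pins. The `fake_` functions are simulation stand-ins only.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical hooks for the system counter and the input pins. */
typedef uint32_t (*read_counter_fn)(void);
typedef uint32_t (*read_pins_fn)(void);

/* Poll until (pins & mask) == state, or give up after timeout_ticks.
   Returns true on a match, false on timeout. */
bool wait_pins_eq_timeout(read_pins_fn read_pins, read_counter_fn read_counter,
                          uint32_t state, uint32_t mask, uint32_t timeout_ticks)
{
    uint32_t start = read_counter();
    for (;;) {
        if ((read_pins() & mask) == state) return true;
        /* Unsigned subtraction stays correct across counter rollover. */
        if ((uint32_t)(read_counter() - start) >= timeout_ticks) return false;
    }
}

/* Simulation stand-ins: a counter that advances 100 ticks per read,
   and a pin register the caller can set directly. */
static uint32_t fake_cnt = 0;
static uint32_t fake_pins = 0;
uint32_t fake_counter(void)   { fake_cnt += 100; return fake_cnt; }
uint32_t fake_read_pins(void) { return fake_pins; }
```

The caller then decides what a `false` return means: retry, log and fall back, or stop feeding the watchdog.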
I agree that high risk situations should go with a hardware based solution but there are also good reasons for having a hardware solution in lower risk situations. One of those reasons I am encountering currently is a tendency for some systems to drop internet communications after bad weather or a power outage. This entails a drive to that location to reset the system, which is something a watchdog timer could prevent.
As far as I know there is no way to monitor a cog independently of the cog itself which is why I planned on having each task running in a cog set a bit in a hub location to indicate that the task is still running. The software watchdog would check that bit to verify it was set and then clear it for the next time around the loop. If the bit was not set it would use its output pin to make the hardware watchdog reset the propeller. If the software watchdog should fail the hardware watchdog would time out and reset the propeller. My plan was to use a 555 timer as a missing pulse detector for the hardware watchdog but there is no reason another chip could not be used.
I googled VCT scanner and watched the video of the GE CT scanner spinning at high RPM. Is that the one you meant? I'm assuming that was some sort of diagnostic test since I can't see taking an image at that speed.
BTW, I got my start in instrumentation at Nuclear Chicago/Searle way back when, working on their gamma and liquid scintillation systems before moving on to the planar and tomographic nuclear imaging systems. Also worked on some of the GE systems after Siemens bought up Searle's nuclear med division. Interesting and challenging field to work in.
Yep, that is the animal. In cardiac mode, 800 kg rotating at 2.85 RPS. And yes, it does take images at that speed so that it can do an effective stop image of the heart (64 slices, and newer units will do up to 256 slices). The image acquisition is timed to the ECG trigger from the patient. The latest Siemens is even faster. Different exams require different speeds, dose and slice thicknesses. Started these last year after a few years of angiography systems for cardiac and neuro applications. Since they are old enough for the bathtub curve to start rising again, they have provided me much experience in the last year and change. Big accurate motorized positioners, voltage and tube current regulation, the works. And lots of watchdog timers overseeing the functions. I rarely run the thing at higher speeds with the cover off, but have a time or two.
Interesting about the Searle connection. That may be what became Siemens Gammasonics. Gammasonics was the company that made the earlier cardiac image processing systems, the Digitron and HICOR. Still have one HICOR running. In-house service engineer, I get to lay on all the OEMs stuff.
As to monitoring a cog, I called writing to a HUB ram location "grey" because if I did it that way, I would have to use a known good value to say the cog was functioning and, say, a non-zero value containing error information giving status. Of course if zero was last written and then the hub goes off the rails, so goes reliability. A single pin can be go/nogo, but again if the last write to the pin....... So there is always a possibility of failure even with hardware. Missing pulse is good in that it doesn't care whether the pin was high or low, but only that it changes. I guess that should provide the best reliability. But that's why we have risk analysis.
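The missing-pulse idea can be modelled in a few lines of C. This is a software model with hypothetical names, not a description of any particular 555 circuit: it samples a heartbeat line once per tick and trips when the level has not changed for `limit` samples, whether the line is stuck high or stuck low.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool     last_level;    /* level seen on the previous sample */
    uint32_t ticks_quiet;   /* samples since the last edge */
    uint32_t limit;         /* samples of silence tolerated before tripping */
} pulse_monitor;

void pulse_monitor_init(pulse_monitor *m, bool level, uint32_t limit)
{
    m->last_level = level;
    m->ticks_quiet = 0;
    m->limit = limit;
}

/* Sample the heartbeat once per tick. Returns true when the line has been
   static for 'limit' samples, meaning the source should be treated as dead. */
bool pulse_monitor_sample(pulse_monitor *m, bool level)
{
    if (level != m->last_level) {
        m->last_level = level;      /* any edge counts as a kick */
        m->ticks_quiet = 0;
    } else if (m->ticks_quiet < m->limit) {
        m->ticks_quiet++;
    }
    return m->ticks_quiet >= m->limit;
}
```

Because only edges reset the count, a cog that dies with its pin left high is caught the same way as one that dies with it low.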
Your mention of monitoring internet connectivity was good for a flashback. Used to use the open source version of Big Brother for that and other things. Now they want big bucks for the thing........
Taking images at that speed is pretty impressive, but then again X-ray imaging is inherently faster than nuclear, where you have to take the time to accumulate enough decay events to produce an image.
Searle bought Nuclear Chicago, and I think the intention was to tie the sale of equipment at reduced prices to contracts for purchasing the diagnostic kits. After a few years the management looked at the profit margins for instruments (~30%) vs pharmaceuticals (~100-200%) and decided to get rid of the instrumentation business. The analytic division was sold to Tracor Instruments and the nuclear imaging to Siemens. Typical behavior for pharmaceutical companies.
As far as monitoring cogs, processes, and computers, there is no perfect 100% reliable method to do so at this time. The best we can do is try to cover as many bases as possible.