Serious security issue with ENC28J60 driver
ManAtWork
Posts: 2,178
in Propeller 1
I recently found out that the propeller can crash really awfully when used together with a ENC28J60 ethernet interface and a "D-Link" network switch (currently sold at "Media Markt", a german electronics discounter). This switch sometimes emits a garbage packet with invalid header but (seemingly) correct CRC.
The function get_frame() in driver_enc28j60.spin contains the following code
You really don't want to see what happens if this propeller does not only control a toy robot but a full size industrial robot or a cnc machine with hydraulics and powerful motors in it. Well, Smile happens... and although this took me a month to debug it is relatively easy to fix.
What I worry more about is that the propeller is somehow prone to such runaway crashes because
1) all outputs from all cogs are OR-wired and all DIR registers are also OR-wired. A single cog crashing can take over all pins and there's no way stopping it.
2) there is no (hardware) watchdog or memory protection
3) there is no invalid instruction exception or something that could stop a cog from executing garbage code
The function get_frame() in driver_enc28j60.spin contains the following code
' protect from oversized packet if pktptr AND (rxlen =< MAXFRAME) rd_block(pktptr, rxlen-4)This reliably protects from a buffer overrun if the packet size was too large. But it does not protect against undersized packets! If rxlen is between 0 and 3 it literally opens the gate to hell. rd_block() interprets negative sizes as infinity and overwrites all the hub memory with the contents of the PHY buffer. This does not crash (freeze) the propeller which was an annoying but somehow acceptable reaction to "invalid data". But instead the still running spin interpreters in some of the cogs eat the garbage and run amok in a really uncontrolled way. They randomly toggle output pins, send random characters over the serial interface and other disastrous stuff.
You really don't want to see what happens if this propeller does not only control a toy robot but a full size industrial robot or a cnc machine with hydraulics and powerful motors in it. Well, Smile happens... and although this took me a month to debug it is relatively easy to fix.
What I worry more about is that the propeller is somehow prone to such runaway crashes because
1) all outputs from all cogs are OR-wired and all DIR registers are also OR-wired. A single cog crashing can take over all pins and there's no way stopping it.
2) there is no (hardware) watchdog or memory protection
3) there is no invalid instruction exception or something that could stop a cog from executing garbage code
Comments
Oh, man, can I relate to this.....and it usually happens when you have just handed-off the equipment to the operators and your client starts to look at you sideways....been there myself Nicolas.
When I come across this pontificating about pre-written libraries, I shake my head...no thanks.
So we thought it was a thermal problem. We cooled down the boards with ice spray or heated them with a heat gun - no success. We changed boards, power supplies and everything else (except the switch!). And even more frustrating, every single board worked perfectly well on the workbench. The spooky effects came back as soon as we put the board back into that "cursed" machine.
Then we thought it was an EMI problem, some power glitch that corrupted the code while the propeller was loading it from the EEPROM. We even suspected the mains switch (a really big one) to cause glitches due to contact bounce. We tested all possible variations of voltage ramp up slopes and reset pulses. We removed all possible sources of noise in the machine (lube pump, transformers...) and cut all connections to loads. No effect. Once the controller worked it continued to work reliably no matter how hard it was tortured and how many times it was power-cycled. But leaving it untouched and powered down for several days ... bang, gone berserk on first power-on.
The enlighment came when I (rather accidentally) power cycled the network switch while desperately plugging out and in some connectors, again. It came out that a special power on delay between the controller board and the power supply of the network switch was required to provoke the fault. The exact delay was only met when both power supplies where completely discharged and cold.
Does anybody know anything about the whereabouts of the original author, Harrison Pham? The link to his web site in OBEX doesn't work.
The client has seen this thing running for more than 2-weeks but I have refused to hand it over until I am convinced that it's absolutely bullet proof.
A flaky app on a desktop is one thing but potentially lethal machinery is a whole different issue. One can't assume that an operator is going to operate one's system the way it's supposed to be.
Every bit of my code is authored by me and I spend 90%, of my time testing, testing, testing.
You and I could share many experiences, I'm sure.
So, a bad driver wouldn't be able to take down the whole system...
Mike
1. If pins are active when output low, then then a misbehaving cog can only force outputs to inactive.
2. Cog ram is protected. So the spin interpreter can keep running. Data and code are in the hub ram, vulnerable.
2, 3. Would have to be done in software.
This is unfortunate as having multiple independent cogs could provide isolation between code. To be fair, most microcontrollers don't have any memory or IO pin protection.
I'm using the prop to pre-charge an electric vehicle drive system. My solution was to write the critical sequencing code in PASM. The code lives in a separate file so it's less likely to be modified accidentally. It doesn't solve every issue.
A classic example of this: Therac 25. Although industrial and automotive malfunctions can be equally dangerous.
Why not
The version in the obex does not check pktptr. Nor does it do the -4. Is there another version somewhere else?
It looks to me that this bug was from an optimization to avoid transferring the 4 byte FCS at the end of the packet.
His website looks blank now. May he still has email running:
Harrison Pham <harrison@harrisonpham.com>