Propeller doesn't boot when hot
ManAtWork
Posts: 2,178
Hello,
I just discovered a strange problem with the last batch of my Propeller-powered stepper drivers. About 10% of them hang during boot when powered up hot (>60°C). When I let them cool down and then cycle power they come back to life normally. Also, when I boot the same drivers cold and let them heat up afterwards I can run them for hours at their max temperature (80°C) without problems. So it's only booting that fails.
I checked the EEPROM SCL and SDA lines. There is ~150ms silence then ~400ms activity. The crystal oscillator starts up normally (4.91MHz), so at least the XTAL/PLL config word is loaded correctly. Then there is no more activity on any pin. Supply voltage is rock solid at 3.3V. The board is a four layer full SMT multilayer. Every critical component (propeller, crystal, decoupling caps) is located within an area of <1in². The EEPROM is an Atmel AT24LC256.
Does anybody have an idea what is going wrong? The propeller is rated for 125°C max ambient and the EEPROM for at least 70°C if I remember correctly. The next thing I'll do tomorrow is exchange the EEPROM to see if that's the fault or if the propeller is to blame.
Comments
-Phil
I have never had a Propeller get hotter than 60°C while running, even with a 14.31818 MHz crystal and PLL8.
My guess (sorry for that) is a layout problem on your PCBs.
sorry for the misunderstanding. Of course, the propeller itself doesn't heat up that much. The whole device is a high power stepper motor driver with large MOSFETs and a heatsink below the PCB. The power stage delivers 10A @ 80V and this is the reason why this beast is getting quite hot. To clarify I attached a picture.
And no, this is not a PCB layout problem. As I said, this is a 4 layer board. It has a solid signal ground plane on the left and a solid power ground plane on the right side which share only one single star point. There is also a solid 3.3V power plane and multiple 4.7uF and 0.1uF decoupling caps. Crystal and EEPROM are as close as possible to the propeller. The power stage is disabled during start-up with an external flipflop and only enabled after booting. All external signals are isolated with optocouplers so I'm sure there is no noise during booting.
Hello Mike and Phil,
thanks for the comment. But as I understand the propeller data sheet, having a 4.91MHz crystal and running the x16 PLL mode is fully legal for a range from -55 to +125°C junction temperature. The propeller draws less than 50mA at 3.3V, so assuming <100K/W thermal resistance, junction temperature should be well below 125°C (80°C + 0.165W*100K/W = 96.5°C). Or am I wrong somewhere?
Best regards
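The junction-temperature estimate in the post above can be reproduced with a few lines of Python. This is only a sketch of that arithmetic: the function name is made up for illustration, and the 50 mA supply current and 100 K/W thermal resistance are the upper bounds assumed in the post, not measured values.

```python
def junction_temp_c(ambient_c, supply_v, supply_a, theta_ja_k_per_w):
    """Estimate junction temperature from ambient temperature and dissipated power."""
    power_w = supply_v * supply_a          # power dissipated in the chip
    return ambient_c + power_w * theta_ja_k_per_w

# Values taken from the post: 80 degC board temperature, 3.3 V, <50 mA, <100 K/W (assumed bounds)
tj = junction_temp_c(ambient_c=80.0, supply_v=3.3, supply_a=0.050, theta_ja_k_per_w=100.0)
print(f"estimated junction temperature: {tj:.1f} degC")
```

With these bounds the estimate stays below the 125°C limit quoted in the post, which matches the poster's conclusion.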
As I wrote in many of my threads/posts on decoupling the Propeller:
In most situations it needs at least a 10uF tantalum bulk capacitor on the supply pins near the crystal. Optimal is 33uF with fast recovery, which can compensate for voltage drops on the internal crystal driver. These drops come from cogs starting, stopping and restarting and from the pin output stages, which are power-consuming devices with fast load changes.
Would you be willing to tell us a little about your stepper driver? Is the Propeller just acting as a step translator or does it also perform some other auxiliary functions?
Bruce
I had no real luck with the scope. I can't find a difference in the timing between hot and cold. The rising edges on the data line are quite slow (~1µs) but that should be ok for I²C operation. The propeller outputs approx. 250kHz on SCL. The bi-directional data flow makes it somewhat difficult to analyze the timing. Because of the start/stop bits there are state changes at half the clock duration.
However, the symptoms are clear:
* added a 22µF tantalum and a 330µF aluminium low-ESR cap - no change
* power on with finger on the EEPROM (cooling to ~40°C) - successful boot
* frost spray to the propeller - problems persist
* re-programmed the EEPROM - problems persist
* exchanged EEPROM with new one - problems persist ??? :-(
Especially the last one worries me. It looks like there is something wrong with the PCB (micro cracks?) or with the propeller timing in general, so that the EEPROM is just at its limit and fails when hot.
From the picture you sent: very nice device.
-Phil
the propeller performs all tasks of the stepper driver except for the safety functions (power stage disable if over/undervoltage or short circuit detected). Two cogs manage the current control loops (PWM), two generate the reference signal wave (interpolated sine/cosine wave), one processes the step/dir input signals and the main cog (Spin) takes care of the user interface (DIP switches, status LEDs, error handling etc.). Additionally there's an ADC (simple sawtooth comparator) that measures motor load for resonance damping.
Are you sure about the part number on that EEPROM? Atmel does not list an "LC" version.
-Phil
I just added a 2k2 pullup to the SDA line. Now, the propeller doesn't boot at all, even when cold.
Now I'm really confused! I removed the pullup and the booting is successful again, so I didn't damage anything.
If it was just one unit I'd say I zapped the output driver of pin P29 with static discharge or something, which would then be too weak to pull SDA low reliably. But I think the chance is rather low that I managed to zap 5 devices in the exact same way (5 out of 50 failed). What also makes me wonder is that I can't see a difference with the scope. If the pin was damaged by static I should see a raised logic low level.
Sorry about that. I looked it up and the marking on the part is
ATMLH930
2EB 1
Z9G0314A
"930" is the date code, September 2009
The data sheet for the AT24C256C claims that the marking should be 2EC instead of 2EB. So this might be an older version. I bought it from a trustworthy European distributor (Arrow), so I hope these are no Chinese fakes. I have to look up the bill for the exact type description.
-Phil
I use 4.7K pull ups typically ... 3.3K would work. 2.2K might be too strong.
The rise time is the key factor and could be influenced by your finger capacitance.
As Phil suggests, a new EEPROM is worth trying at some point.
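To put rough numbers on the rise-time remark, here is a small Python sketch that estimates the 30%-to-70% rise time of an open-drain line for the pull-up values that come up in this thread. The 100 pF bus capacitance is purely an assumption for illustration (nobody in the thread measured it), and the simple RC model ignores probe and finger capacitance.

```python
import math

def rise_time_s(r_pullup_ohm, c_bus_f, lo=0.3, hi=0.7):
    """Time for an RC-charged line to rise from lo*Vdd to hi*Vdd."""
    # V(t) = Vdd * (1 - exp(-t / RC))  ->  t(f) = -RC * ln(1 - f)
    return r_pullup_ohm * c_bus_f * math.log((1 - lo) / (1 - hi))

C_BUS_F = 100e-12  # ASSUMED bus capacitance, not a value from the thread

for r in (10_000, 4_700, 3_300, 2_200):
    print(f"{r:>6} ohm -> {rise_time_s(r, C_BUS_F) * 1e9:7.0f} ns")
```

Under that assumption a 10k pull-up gives a rise time of a bit under 1µs, the same order of magnitude as the edges reported on the scope, and stronger pull-ups shorten it proportionally.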
If I had the money to buy one of the units that fail, I would test it intensively. With all my experience overclocking the Propeller, there is a good chance I could find what the problem is.
But as I can only see pictures, without a schematic and without knowing how the parts are placed relative to each other, I can only guess at what happens.
My suspicion is still that some flatbeds are missing somewhere on the PCB, or else some correctly dimensioned capacitor values. And if you use parts with 10%-20% tolerance, that can give big differences from one assembled PCB to the next.
Good luck finding the problem.
Sounds like a nice driver. Do you have any plans to start marketing these drivers? Anyhow, I hope you determine the source of your problems. Good luck with it.
Bruce
I already do. I sold ~200 of them last year and 300 more of a similar type but with 3 motor outputs, also propeller powered. Of course, if somebody would do serious marketing I could probably sell 1000s. But it would be a good idea to fix the problems first. ;-)
Hello Jazzed,
I normally use 10k. I added the 2k2 in parallel. If I add 2k2 to a good board it doesn't hang. Now I've put the 2k2 resistor back on the bad drive and ... it doesn't hang either. Even more confusion :-(
Hello Sapieha,
sorry, I don't understand what you mean. I think overclocking has nothing to do with my problem. Cog startup, power stage driving, output pin enabling, all that happens after the hub memory is loaded from EEPROM. Once booted there is no problem at all even if the board gets really hot, even if there are massive spikes from shorting the motor leads. If my layout was bad I should have problems when the device is active.
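For the pull-up experiment described above (2k2 added in parallel with the existing 10k), the effective resistance and the current the SDA driver has to sink can be estimated as follows. This is a sketch with made-up helper names; it assumes the line is pulled all the way to 0 V, which slightly overestimates the current.

```python
def parallel(r1_ohm, r2_ohm):
    """Equivalent resistance of two resistors in parallel."""
    return r1_ohm * r2_ohm / (r1_ohm + r2_ohm)

def sink_current_a(vdd, v_low, r_pullup_ohm):
    """Current a driver must sink to hold the line at v_low."""
    return (vdd - v_low) / r_pullup_ohm

r_eff = parallel(10_000.0, 2_200.0)
i_low = sink_current_a(vdd=3.3, v_low=0.0, r_pullup_ohm=r_eff)  # v_low = 0 V is a worst-case assumption

print(f"effective pull-up: {r_eff:.0f} ohm")
print(f"sink current:      {i_low * 1e3:.2f} mA")
```

The result stays below the 3 mA that standard- and fast-mode I²C devices are required to sink, so on paper the combined pull-up should not be too strong for either the propeller or the EEPROM.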
When I experimented with overclocking I had many problems similar to the ones you describe. But if you say you don't understand, I don't think I have much more to say.
Sorry
just to be sure I tried to replace the propeller chip. It didn't help, the problem sticks to the board. I think I have to try out a different brand of EEPROM. Unfortunately, the only different one I have at the moment is on the demo board and that has a TSSOP package that doesn't fit. I'll have to order some.
Sapieha, don't get me wrong. If I don't believe you it doesn't mean that you're wrong. I'll also try out a different crystal (4.0 MHz maybe). An interesting thing to mention is that I managed to revive a bad board by hitting one of the crystal pins with the scope tip. But that happened only once and I couldn't reproduce it.
For today I have to give up. I'll be back on Friday.
Thanks for your help
-Phil
Some more thoughts... Is there a possibility of reading/verifying the EEPROM with the propeller tool without re-writing it? I could do a verify (compare against known good image) in the hot state. Is there a tool to read out the hub RAM after a crash?
- Stuff expands
- caps de-rate
- Internal clocks run fast
- Vreg's turn off
It's hard to say without a schematic (and that might not help, anyway), but my list would be either something mechanical (something expands and opens under heat), something on the power stage, or something with the xtal. What is the Vreg rated for and how hot does it get? What are the specs for the xtal? Does the Prop work when hot without using the xtal? What's the Vin of the Vreg? The boards were e-tested before assembly, right?
I've never had a problem when shorting a propeller output pin. My hunch is something wrong with the Vreg that causes it to drop out when the xtal switches on, or an out of spec xtal.
But I found out something different. When I touch one of the crystal pins with the probe tip of my multimeter the voltage changes from 0.3V (non oscillating) to around 1.5V (oscillating). When I hold the tip on the crystal all the time while switching power on, any of the bad boards boots successfully and works perfectly even when I remove the tip later. So it seems to be a crystal problem. As the crystal is located directly next to the EEPROM, cooling the EEPROM also affected the crystal.
I don't use real quartz crystals but ceramic resonators from Murata (CSTCC 4.91MHz). Maybe they need a different load capacitance. Or they require some noise to start up properly. I like the ceramic resonators because they are smaller than cheap standard crystals at 5MHz. From my experience they are much more forgiving than standard quartz crystals and start up more quickly even with only a 74HC00 gate as "amplifier". I never had problems with them and sold 1000s of Altera and AVR based products with ceramic resonators. But the propeller seems to be different. I tried different clock modes (XTAL1 and XTAL3) but it makes no difference.
I think that there have been discussions about suitable crystals on this forum.
I just exchanged 3 of the crystals on the bad boards with new ones. They immediately worked without problems. The bad ones need remarkably longer to start oscillating (200..500µs) whereas the good ones start up in <150µs. And I had a very strange phenomenon, again. One of the bad boards was magically cured after I pressed the probe tip on the pins several times. Maybe it has to do with mechanical stress (sometimes due to thermal expansion)?