Confirmed. This is what I have so far:
- Following a bad PLL setting
- with pin#62 (Tx) driven, either high or low
- and Prop2 is DTR reset
- the subsequent program download has a small chance of failing to complete due to the USB bus shutting down every device on the bus at the end of that download.
EDIT: In addition, although a USB shutdown (lock-up) can occur without a preceding port open error, the intermittent singular port open errors always foretell an impending USB shutdown. They are clearly another symptom, and that symptom always comes with a bump in the PC-USB 5 Volt trace.
Does that mean a floating P62(tx) does not see any issues ?
These tests mean the state of P62 across the Crazy-PLL-Lock time, right ?
So the previous download completes, then your test code runs from that download, but in doing so it disturbs the FTDI part enough, that the NEXT download fails ?
Yes.
No, just the pin drive at reset seems to matter. I've moved the pin float code to before the HUBSET.
Hmm, so the P62 setting matters at the moment DTR resets the Prop, but after download it does not care ?
Does the FTDI part handshake during the download ?
Well, when the Prop2 is reset the pin gets reset too. For the pin, it's a new sequence from then on. Only the FTDI chip can hold a state beyond the DTR pulse.
Here's a cleaner look at the power rails. This one is the "singular port open fail" case:
Blue trace hasn't changed other than equalising its filter.
Green trace is scaled up to 100 mA/div and 0 mA is right at bottom of display.
PC-USB 5 Volt rail is now on the Orange trace (this probe is missing its ground clip so may exhibit extra overshoot).
Pink trace shows 3v3 VIO switcher rail. It is same 200 mV/div but 3.3 Volts is at middle of display instead of two div from top.
Note the voltage bump on PC-USB occurs following the download of the cycle prior to port open failure.
At the same time there is a slight dip, maybe 10 mV, in the benchtop voltage without any measurable increase in its current. Which can probably only make sense by having a change in current in the ground leads of the scope. The pink trace has the same step down and back up over the duration of the PC-USB voltage bump.
Here's a compare of two identically aligned snapshots: First one is of regular non-faulting sequence. Second one is that same bump above. By flipping between them you can compare where there is deviation.
.. probably only make sense by having a change in current in the ground leads of the scope.
This would be the case simply by having a higher load on the PC-USB ground wire, which is probably to be expected from a raised 5 Volt supply voltage like that.
Here's a cleaner look at the power rails. This one is the "singular port open fail" case:
PC-USB 5 Volt rail is now on the Orange trace (this probe is missing its ground clip so may exhibit extra overshoot).
So the 5V pulse rises some ms after download completes, was that with an OK reply ?
What makes it then go low - some PC retry ?
Does this "singular port open fail" case eventually recover, without cable-removal intervention, as the trace seems to resume ok ?
Why are there two current levels during run ? Is that two PLL settings ? One is flat, one has a slight ramp ?
I'm taking the 3 narrow pulses going down, just visible on the zoomed plot as reset-signatures at start of download, then the big-spike tags the PLL/VCO lock moment.
Of course it recovers, it wouldn't be singular otherwise. The actual open fail is during that subsequent gap in cycling. Not when the bump occurs. And the cause of the bump will be the crazy PLL action from the cycle before the bump. The resumption must be due to some automatic recovery in the USB system.
When the USB bus does shut down for good it never produces that bump. But the shutdown looks to be at the same phase in the cycle as that bump.
The ramp is the stairs I've mentioned. There's three pieces and it depends on the sysclock rate as to how pronounced the stairs look. The last ramping piece is when I/O clocking gets set. In this case I'm using 100 MHz.
The later higher and longer current level is when all eight cogs are running and also the cordic is 90%+. These get started after the crazy PLL operation.
I'm taking the 3 narrow pulses going down, just visible on the zoomed plot as reset-signatures at start of download, then the big-spike tags the PLL/VCO lock moment.
Correct. That's the normal boot up leading into three stage pin testing code (the stairs) that is still present at the start of the test code.
Here's some more with pin#62 (Pink trace) and pin#63 (Blue trace) included. Of note is there is no handshake. Data is all one way for the download.
A whole normal case:
Last four bytes of download at a normal case: (Notable attenuation forming at 2 Mbps. Will be the 3k9 resistor.)
Download burst at a normal case: (Maybe 3 ms data continuation after PC-USB 5 Volt returns to normal. Presumed to be from buffering in FTDI chip. Which would mean the voltage dip is only due to USB transactions. It is about 17 metres of cable with other devices.)
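As a rough cross-check of the buffering guess above, here is a minimal Python sketch of how many bytes correspond to a 3 ms tail at 2 Mbps. The 10 bits per byte (8N1 framing) is an assumption, and the function name is made up for illustration.

```python
def bytes_in_window(duration_us: int, baud: int, bits_per_byte: int = 10) -> int:
    """Whole bytes a UART can shift out in duration_us microseconds.

    bits_per_byte = 10 assumes 8N1 framing (start + 8 data + stop).
    """
    return duration_us * baud // (bits_per_byte * 1_000_000)


# 3 ms of continued data at 2 Mbps, figures taken from the scope notes above
print(bytes_in_window(3000, 2_000_000))
```

That works out to a few hundred bytes still in flight after the USB side has gone quiet, which is at least plausible for a bridge chip's transmit buffer.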
Yes, 3k9 looks too high for 2Mbd - I think it is only there as anti-contention and anti-phantom power, but P2 needs more power than P1, so lower values may be fine ?
I've just updated to latest loadp2, which uses a checksum so there is now a reply appearing back from the Prop2 upon download completion. There is a chance of receiving the reply before the USB bus shuts down ... or not! I can't get any problems now! I have to go back to trying to find a bad combination ...
Yes, 3k9 looks too high for 2Mbd - I think it is only there as anti-contention and anti-phantom power, but P2 needs more power than P1, so lower values may be fine ?
Yes, mainly for anti-phantom power- to prevent the FTDI being powered from P2 Tx.
It's likely those series resistors will be replaced with a pair of buffers in the next version, to ensure isolation and a clean signal at high speed.
.. There is a chance of receiving the reply before the USB bus shuts down ... or not! I can't get any problems now! I have to go back to trying to find a bad combination ...
Maybe the echo forces the SW to wait until the P2 replies, otherwise you are not sure where the buffers are ? Last byte loaded at PC end, is not last byte sent.
It's likely those series resistors will be replaced with a pair of buffers in the next version, to ensure isolation and a clean signal at high speed.
That sounds a good idea.
Also, allow for external UARTs, not just the on-board FTDI, so someone can attach e.g. an FT232H or FT2232H (or another Prop, which could be running quite fast).
I've seen other Eval boards use solder-paste jumpers, where they have 2 versions a NC_JMPR and NO_JMPR - someone can change with just a soldering iron.
Uh-oh, just discovered something not nice that may affect the finished Prop2. Chip has done some work in the PLL but I doubt this issue will be covered.
I think there is a problem with switching between clock sources. In particular, any HUBSET #0 by itself can crash the chip. Once running on the crystal any attempt to return to RCFAST or RCSLOW is a potential crash. The behaviour of the crash varies from just a glitch to apparent lock up to running rampant.
I'm guessing the root cause will be glitching in the clock source muxing since a workaround of two consecutive HUBSETs appears to be reliable. The catch is that one needs to know the prior clock mode config to do a clean switch back to RCFAST.
That reserved space in hubRAM for indicating the sysclock frequency maybe will need to include the whole clock mode config word. And we just extract the frequency from that if needed.
So I'm currently running a loop that jumps back and forth between a random PLL frequency and RCFAST. Here's the reliable fix for switching back to RCFAST, where "set_freq" contains the prior clock mode config but with the %SS clock source select bits set to %00 (RCFAST):
hubset set_freq 'switch to RCFAST before dismantling the PLL
hubset #0 'shut down the oscillator/PLL
Yep, doing that when setting up the PLL has always been the norm. A 10 ms pause on RCFAST before the switchover is the de facto practice.
No one has been doing this going back to RCFAST though. It isn't meant to be needed either. In reality a single instruction barely counts as a pause but that is what's needed, albeit unintended.
Surprisingly, it seems reliable with only the final divider (%PPPP) matching the prior config. The remaining clock mode config bits can all be zeros. E.g. if that is a known fixed value then
hubset #_XPPPP<<4
hubset #0
Hmm, that suggests the MUX that is related to the PPPP divider is not de-glitched ?
I have noticed that too-soon enable of Xtal (no PLL) as source, can hang the P2, so it seems narrow/partial clocks are not recovered from.
I think Chip has some forms of hand-over clock edge management in there, but maybe not in all the required places ?
.. but you also said 'Once running on the crystal any attempt to return to RCFAST or RCSLOW is a potential crash. ' while Xtal alone would not usually change PPPP ?
Does this mean software resets will/could fail ?
An external reset (as in external watchdog) should always recover, right ?
We had one chip once that needed a power-removal watchdog, but hopefully P2 does not require that.
Some may like to do that anyway, as it does catch effects a reset pin alone cannot, like latch-up of any element in the design.
I'm not sure if that is verilog-accessible - HUBSET may pass straight to the ring PLL/VCO registers ? ie a more selective write may not be possible ?
Maybe a delay-line based glitch filter is possible, in the verilog part ?
So all those programs that start with a HUBSET #0 are possibly at risk.
Could this explain the need for cold resets while testing SD cards? It seemed like the chip wasn't fully resetting, although I wrote it up to SD card lockup. For example, when trying to upload another program, while still running a previous program, the new program would never function correctly but would run fine from a freshly reset P2.
So, if we're running off xtal+pll and then reload a program that begins with hubset #0 it could crash?
I think evanh is saying if the PPPP changes (VCO post divider), yes.
Did your test programs change the VCO divider away from the default /1 ? ie what was the VCO set to run at before/after your own system reload ?
I've just updated to latest loadp2, which uses a checksum so there is now a reply appearing back from the Prop2 upon download completion. There is a chance of receiving the reply before the USB bus shuts down ... or not! I can't get any problems now! I have to go back to trying to find a bad combination ...
Is this 'reply back from p2' flow proving to be solid in operation ?
If that reply is not a checksum OK, but instead the user asked for an additional Prop_Ver echo after a non-checksum download, is that also fully reliable ?
Is the simplest advice to just always use checksum download, in order to get some echo ?
There is a clock switch-over circuit in the clock pad. It needs a final transition from the source it is switching away from and then an initial transition from the source it is switching to. So, killing a clock source while simultaneously switching away from it could cause the P2 to hang.
If you are switching away from the crystal+PLL to the internal RCFAST oscillator, you'd want to first change the two LSBs to %00 via HUBSET to switch from the PLL to the RCFAST oscillator, then you could do a HUBSET #0 to shut off the crystal+PLL, while remaining in RCFAST.
On reset, RCFAST is always selected without any switch-over contingency.
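For anyone wanting to tabulate the values, the two-step switch described above can be sketched host-side in Python. This only computes the two words you would hand to the two HUBSETs; the function names are hypothetical. The only layout facts used are the ones stated in this thread: %SS is the two LSBs, and %PPPP sits at bit 4 (as in `#_XPPPP<<4`).

```python
SS_MASK = 0b11            # clock source select, the two LSBs (%00 = RCFAST)
PPPP_MASK = 0xF << 4      # final divider field, positioned as in "#_XPPPP<<4"


def rcfast_handover(prior_mode: int) -> list:
    """Words for the two consecutive HUBSETs: first keep the prior config
    but select RCFAST, then write zero to shut down the oscillator/PLL."""
    return [prior_mode & ~SS_MASK & 0xFFFFFFFF, 0]


def rcfast_handover_minimal(prior_mode: int) -> list:
    """Variant reported above as also reliable: keep only %PPPP from the
    prior config, all other bits zero, then write zero."""
    return [prior_mode & PPPP_MASK, 0]


prior = 0x01000EFB  # example clock mode word, taken from the TAQOZ session in this thread
print([hex(v) for v in rcfast_handover(prior)])
print([hex(v) for v in rcfast_handover_minimal(prior)])
```

Either way the point is the same: the first write changes only the source select while the old source is still toggling, and the second write is the one that actually stops it.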
This is a really long thread but just picking up on the tail end mostly, I did some checks in TAQOZ V2 (booted from SD) in switching to RCFAST back and forth (pun).
TAQOZ# CRUISE RCFAST TURBO RCSLOW COAST RCFAST CRUISE --- ok
TAQOZ# 300 40 DO I P2MHZ 50 ms RCFAST 50 ms 20 +LOOP CRUISE --- ok
Obviously no problem there but my clock words do have a delay. What happens if we hubset directly?
My clock config is stored in the _clk variable so I check it first and then use it.
TAQOZ# _clk @ .L --- $0100_0EFB ok
TAQOZ# 0 HUBSET _clk @ HUBSET --- ���
TAQOZ# --- ok
There's the garbage characters from when the baud rate shifted with the frequency but then it is all restored again.
Let's play a little more.
TAQOZ# .CLK --- 180MHz ok
TAQOZ# 240 P2MHZ _clk @ .L --- $0100_13FB ok
TAQOZ# 0 HUBSET _clk @ HUBSET --- �᳷�
TAQOZ# .CLK --- 240MHz ok
Still all good. How about a torture test and put it into a loop.
TAQOZ# 1000 FOR 0 HUBSET 10 ms _clk @ HUBSET NEXT --- �ٷ�
TAQOZ# .CLK --- 240MHz ok
Then also without any delay, plus measure the elapsed time it took to set and restore.
TAQOZ# 100 FOR 0 HUBSET _clk @ HUBSET NEXT --- �ٷ�
TAQOZ# LAP 0 HUBSET _clk @ HUBSET LAP .LAP --- 338 cycles= 1,408ns @240MHz ok
All good, no problems. Now admittedly I just did a quick check and it happened to be on my P2D2 so I will make sure I check this on the eval board too but I don't think it will be any different.
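To make the `_clk` values in the session above easier to read, here is a small Python decoder. Treat it as a sketch under assumptions: the field layout (PLL enable at bit 24, 6-bit input divider at bit 18, 10-bit multiplier at bit 8, %PPPP at bit 4, %CC then %SS at the bottom) is my reading of the clock mode word, and the 12 MHz crystal is assumed because it is the value that reproduces the 180 MHz and 240 MHz that `.CLK` reports.

```python
XTAL_HZ = 12_000_000  # assumed crystal frequency, not stated in the thread


def decode_clock_mode(mode: int, xtal_hz: int = XTAL_HZ) -> dict:
    """Split a clock mode word into fields (assumed layout, see lead-in)."""
    pppp = (mode >> 4) & 0xF
    div = ((mode >> 18) & 0x3F) + 1       # crystal input divider
    mul = ((mode >> 8) & 0x3FF) + 1       # VCO multiplier
    post = 1 if pppp == 15 else 2 * (pppp + 1)  # %1111 taken as divide-by-1
    return {
        "pll_enabled": bool((mode >> 24) & 1),
        "source_select": mode & 0b11,     # %SS, %00 = RCFAST
        "pll_hz": xtal_hz * mul // div // post,
    }


def cycles_to_ns(cycles: int, sysclock_hz: int) -> int:
    """Elapsed time in whole nanoseconds for a cycle count."""
    return cycles * 1_000_000_000 // sysclock_hz


print(decode_clock_mode(0x01000EFB)["pll_hz"])   # word shown at 180MHz
print(decode_clock_mode(0x010013FB)["pll_hz"])   # word shown at 240MHz
print(cycles_to_ns(338, 240_000_000))            # the LAP measurement
```

Under those assumptions both words decode to the frequencies TAQOZ printed, and the 338-cycle LAP figure matches the 1,408 ns shown.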
I don't know if this issue has been brought up but I can't emphasize enough how important it is to have a proper USB "charger" cable. Most USB cables are not designed for high currents, so they will drop a lot of voltage (over a volt) and of course will glitch on current peaks. Get one from your tablet, or a short thick cable etc, and use that rather than that long USB cable from 2009 that you happen to have sitting around.
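To put a number on the cable point, here is a quick Ohm's law sketch in Python. The per-metre resistances are approximate textbook figures for copper wire, and the cable length and load current are illustrative assumptions, not measurements from anyone's setup in this thread.

```python
# Approximate copper resistance in ohms per metre for common wire gauges
OHMS_PER_M = {28: 0.213, 24: 0.0842, 20: 0.0333}


def cable_drop_v(length_m: float, current_a: float, awg: int) -> float:
    """Voltage lost in the cable: the current flows out on the 5 V wire
    and back on ground, so the conductor length counts twice."""
    return 2 * length_m * OHMS_PER_M[awg] * current_a


# Same 2 m length and 1 A load, thin versus thick power conductors
for awg in (28, 24, 20):
    print(awg, "AWG:", round(cable_drop_v(2.0, 1.0, awg), 3), "V")
```

With thin 28 AWG power wires the drop is already most of a volt at 1 A over 2 m, which is why a short cable with thick conductors makes such a difference on current peaks.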
Comments
And I've tried it both with and without the 10 ms wait. Makes no diff.
Download burst at a bump case:
Hmm, I thought P2 always could reply (eg if you send "?"), but the DOCs are worded so that only applies to checksum sends ? Seems an oversight ?
That means the host has no way of knowing when the P2 is ready from a simple download, and there is quite variable elasticity in download paths.
Maybe you have to append a Prop_Chk after send, and confirm that, for a no-checksum download proper complete verify ?
(You might be doing that, and I'm not an expert on this... just came to mind though!)