P2 Reset - possible problem

During my SD testing I noticed that I needed to repower my P2-EVAL board quite often between tests.
I realised it happened when I had loaded the wrong code, which would cause the P2 to execute unknown code.
The reset done by pnut and pst doesn't reset the P2 properly. It's effectively locked up, and only repowering will unlock it.
Originally I just put it down to my SD testing. Having thought further, it cannot be the SD card, even though that could also be locked up. You see, I cannot perform a pnut find P2. So it has to be the P2 which is failing to perform the reset.
Any thoughts???
My Prop boards: P8XBlade2, RamBlade, CpuBlade, TriBlade
Prop OS (also see Sphinx, PropDos, PropCmd, Spinix)
Website: www.clusos.com
Prop Tools (Index) , Emulators (Index) , ZiCog (Z80)

Comments

  • jmg Posts: 13,901
    What sort of failure rates ?

    Checking Icc should show if RST is working, and there should also be a 'signature' in the reset currents, as it progresses.

    I've used mainly the RESET button, and not noticed that not work.

    I did notice that P2 freezes if you select Xtal/PLL too early, so there is some intolerance there to out-of-spec clocks.
    Usually, you would expect too-soon select of XTAL, to simply wait 1~3ms until the Osc starts.
  • Cluso,
    The problem will be USB. USB, on PC based hardware at least, is generally an unreliable system ... and always has been. It seems to be the bus controllers get themselves in a bind. Mostly unplugging the USB end device will recover the lockup, but sometimes the PC itself has to be powered down.

    Heh, we could promote the Prop2 as the first ever reliable USB host system.

    "We suspect that ALMA will allow us to observe this rare form of CO in many other discs.
    By doing that, we can more accurately measure their mass, and determine whether
    scientists have systematically been underestimating how much matter they contain."
  • jmg Posts: 13,901
    evanh wrote: »
    Cluso,
    The problem will be USB. USB, on PC based hardware at least, is generally an unreliable system ... and always has been. It seems to be the bus controllers get themselves in a bind. Mostly unplugging the USB end device will recover the lockup, but sometimes the PC itself has to be powered down.
    In that case, the reset button should still work ? Not clear from above, if Cluso tried that ?

    I have seen one part in the past, where reset was more of a 'reset request' and failed in some cases, but I think the folly there was the designers made reset polarity fuse-defined, which itself has a ramp-read rule.
    We solved that with a power-removal watchdog.
    P2 has no reset polarity option, and I think reset is quite global in P2, and should? always revert to RCFAST as a clock source, on exit.
    I think P2 already has Hysteresis and a spike filter on the RST pin, which are other areas where vendors have found reset issues in the past.

  • evanh Posts: 7,841
    edited 2019-02-25 - 01:11:02
    DTR is unable to toggle, and comms will be out of the picture too. Reset button still works but wouldn't really help here unless it's booting from the SD and not bothering using the FTDI comport.
  • jmg Posts: 13,901
    evanh wrote: »
    DTR is unable to toggle, and comms will be out of the picture too. Reset button still works but wouldn't really help here unless it's booting from the SD and not bothering using the FTDI comport.

    That's not quite what #1 is saying, so more testing and info will be needed, to isolate if it is an actual P2-RESET issue, or an USB-Bridge/FTDI link side effect.
  • I'm confident. USB is so commonly prone to this.
  • I don't think this is a USB problem.

    There is a correlation to executing random code and the lockup. If I don't execute random code (was due to bugs) then all seems to work fine.
    I know we shouldn't be executing random code, but I would expect the P2 to reset out of this.

    But, I didn't try the reset button. So when next this occurs, I'll try the reset button, and report back.

    The reason for posting this was to see if anyone else is experiencing a lockup that cannot be corrected with reset.
  • jmg Posts: 13,901
    Cluso99 wrote: »
    But, I didn't try the reset button. So when next this occurs, I'll try the reset button, and report back.

    The reason for posting this was to see if anyone else is experiencing a lockup that cannot be corrected with reset.

    I have seen lockups, but I don't think any of them needed more than a reset-button to fix.
    They mostly were around deliberately too-quick hand-over of RCFAST to Xtal, or RCFAST to PLL, and there, P2 can freeze.
  • Cluso99 wrote: »
    The reason for posting this was to see if anyone else is experiencing a lockup that cannot be corrected with reset.

    Yep, I get unresponsive Prop2 every time I spend a decent amount of time testing without re-socketing the USB at the EVAL board. Perfectly normal behaviour for USB.
  • It's just like lithium-cobalt battery chemistry is the one type that can be counted on to self-ignite. And pretty much no others do this.
  • cgracey Posts: 11,694
    edited 2019-02-25 - 07:53:25
    I believe this has to do with your software getting involved with uninitialized memory. On powerup, memory goes to some cleared state, but after download and program execution it can be in some other state which may not be copacetic to new downloads.

    The problem, I'm quite sure, is that your code is doing something that you don't understand and involving memory outside of what you have defined or initialized.

    Note that on reset, every single flop in the chip is cleared. The only things that are not cleared are the memories. They are unaffected by reset. An SD card would not be affected by reset, either.
  • Cluso99 Posts: 15,396
    edited 2019-02-25 - 07:29:50
    cgracey wrote: »
    I believe this has to do with your software getting involved with uninitialized memory. On reset, memory goes to some cleared state, but after download and program execution it can be in some other state which may not be copacetic to new downloads.

    The problem, I'm quite sure, is that your code is doing something that you don't understand and involving memory outside of what you have defined or initialized.

    Note that on reset, every single flop in the chip is cleared. The only things that are not cleared are the memories. They are unaffected by reset. An SD card would not be affected by reset, either.

    I know that the code being executed is indeed garbage, not real code, when this problem happens. It is the direct result of buggy code.

    But I expected that a reset should restart the P2 and bring it out of an unknown locked state. So it should be reloading the ROM code into top of hub and executing it. This restart will not access the SD card as I have P59 switch pulled up. The P2 will not identify itself to pnut.

    As I said before, this is related to running garbage code. It's not a USB problem.
  • cgracey Posts: 11,694
    edited 2019-02-25 - 07:58:49
    I made a mistake in my post a few back.

    It is on powerup that memory is cleared with some consistency. On subsequent resets, though, the memory retains what was written before the last reset. I think this is why power cycling is having some positive effect on your code.
  • jmg Posts: 13,901
    cgracey wrote: »
    I made a mistake in my post a few back.

    It is on powerup that memory is cleared with some consistency. On subsequent resets, though, the memory retains what was written before the last reset. I think this is why power cycling is having some positive effect on your code.

    Which memory do you mean ?
    The config registers for the clock, for example, would not be memory, as such, but would reset every reset pulse, right ?

    Every reset also re-loads the serial ROM too, right ?
  • cgracey Posts: 11,694
    edited 2019-02-25 - 09:10:07
    jmg wrote: »
    cgracey wrote: »
    I made a mistake in my post a few back.

    It is on powerup that memory is cleared with some consistency. On subsequent resets, though, the memory retains what was written before the last reset. I think this is why power cycling is having some positive effect on your code.

    Which memory do you mean ?
    The config registers for the clock, for example, would not be memory, as such, but would reset every reset pulse, right ?

    Every reset also re-loads the serial ROM too, right ?

    All flops get cleared on reset. Cog RAM gets reloaded in cog 0 on boot, but what trails the defined code on subsequent downloads could be random hub contents. LUT RAM gets reloaded in cog 0 on boot, but only because the cog program from boot ROM does it. And the boot ROM gets reloaded on each reset.
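
As a rough illustration of the reset behaviour described in this post, here is a toy Python model (nothing to do with real P2 tooling). The class name `ToyP2`, the 16-long hub size, the `BOOT_ROM` stand-in values and the all-zero power-up state are assumptions made purely for the example. The only behaviour taken from the thread is that reset clears the flops and reloads the boot ROM, while the rest of memory keeps whatever was last written.

```python
# Toy model only: sizes and values are made up for illustration.
HUB_LONGS = 16                    # hypothetical, far smaller than real hub RAM
BOOT_ROM = [0xB007, 0xC0DE]       # stand-in for the boot ROM image


class ToyP2:
    def __init__(self):
        self.power_up()

    def power_up(self):
        # Assumption: RAM comes up in a consistent state, modelled as zeros.
        self.hub = [0] * HUB_LONGS
        self.reset()

    def reset(self):
        # Every flop is cleared; one register stands in for all of them.
        self.clock_mode = 0
        # The boot ROM is reloaded into the top of hub RAM.
        self.hub[-len(BOOT_ROM):] = BOOT_ROM
        # Nothing else in hub RAM is touched by reset.


chip = ToyP2()
chip.hub[0:4] = [0xDE, 0xAD, 0xBE, 0xEF]   # leftovers from a previous program
chip.clock_mode = 0x1B                      # pretend the program changed the clock

chip.reset()
after_reset = (chip.hub[0:4], chip.clock_mode, chip.hub[-2:])

chip.power_up()
after_power_cycle = chip.hub[0:4]

print(after_reset)
print(after_power_cycle)
```

In this sketch the leftover longs survive `reset()` but not `power_up()`, which is the difference between a reset and a power cycle that the post points to.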
  • It would have to be something quite specific in the code to be such a bricking like behaviour. I've tried a random fill of cog/lutRAM for cog#0 and jumping to #0. Repeatedly doing this hasn't generated any unresetable lockups. Not that I've had one from anything else today either.

    I've got a loop running on the desktop reloading a test program every second. The test program reports back a bunch of text as it runs, then does the random fill and jump of cog#0. So I can see by the rx/tx LEDs if it stops doing so.

    So far, out of hundreds, maybe thousands, of runs, I've had a single port open fail (not a symptom) and it recovered on the very next attempt.


    The PC's USB controller crashing still seems it to me. It's a universally known problem with USB and Cluso's reported symptoms match up well.
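
The reload loop described in this post could be sketched on the host side roughly as follows. This is only a sketch in Python: `soak` and `fake_load` are hypothetical names, and `fake_load` is a stand-in for whatever actually opens the comport and downloads the test program. The loop keeps the longest run of consecutive failures separate from the total, so a single port-open failure that recovers on the next attempt can be told apart from a port that stays dead.

```python
import time


def soak(load_once, runs, pause_s=1.0, sleep=time.sleep):
    """Call load_once() repeatedly; return (failures, longest_failure_streak)."""
    failures = 0
    streak = 0
    longest = 0
    for _ in range(runs):
        try:
            ok = bool(load_once())
        except OSError:
            # Treat a failed port open as a failed run rather than aborting.
            ok = False
        if ok:
            streak = 0
        else:
            failures += 1
            streak += 1
            longest = max(longest, streak)
        sleep(pause_s)
    return failures, longest


# Stand-in loader for demonstration only: it fails on the third call.
calls = {"n": 0}


def fake_load():
    calls["n"] += 1
    return calls["n"] != 3


result = soak(fake_load, runs=5, pause_s=0.0)
print(result)
```

A streak that keeps growing would correspond to the "needs unplugging" case, while isolated failures look like the port-open hiccups reported here.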
  • jmg Posts: 13,901
    cgracey wrote: »
    jmg wrote: »
    cgracey wrote: »
    I made a mistake in my post a few back.

    It is on powerup that memory is cleared with some consistency. On subsequent resets, though, the memory retains what was written before the last reset. I think this is why power cycling is having some positive effect on your code.

    Which memory do you mean ?
    The config registers for the clock, for example, would not be memory, as such, but would reset every reset pulse, right ?

    Every reset also re-loads the serial ROM too, right ?

    All flops get cleared on reset. Cog RAM gets reloaded in cog 0 on boot, but what trails the defined code on subsequent downloads could be random hub contents. LUT RAM gets reloaded in cog 0 on boot, but only because the cog program from boot ROM does it. And the boot ROM gets reloaded on each reset.

    Perhaps an uninitialized ROM variable could change between the types of reset. (ROM Is now unzipped, right?)
    I guess the test is to see if the next lock-out Cluso sees, clears via reset pin, or needs a power cycle ?
  • Two port open fails now. :)
  • jmg wrote: »
    cgracey wrote: »
    jmg wrote: »
    cgracey wrote: »
    I made a mistake in my post a few back.

    It is on powerup that memory is cleared with some consistency. On subsequent resets, though, the memory retains what was written before the last reset. I think this is why power cycling is having some positive effect on your code.

    Which memory do you mean ?
    The config registers for the clock, for example, would not be memory, as such, but would reset every reset pulse, right ?

    Every reset also re-loads the serial ROM too, right ?

    All flops get cleared on reset. Cog RAM gets reloaded in cog 0 on boot, but what trails the defined code on subsequent downloads could be random hub contents. LUT RAM gets reloaded in cog 0 on boot, but only because the cog program from boot ROM does it. And the boot ROM gets reloaded on each reset.

    Perhaps an uninitialized ROM variable could change between the types of reset. (ROM Is now unzipped, right?)
    I guess the test is to see if the next lock-out Cluso sees, clears via reset pin, or needs a power cycle ?

    Peter unzips data for TAQOZ, but before that, my and Cluso99's booters and Peter's TAQOZ just transfer from ROM to $FC000..$FFFFF on reset.
  • cgracey wrote: »
    I made a mistake in my post a few back.

    It is on powerup that memory is cleared with some consistency. On subsequent resets, though, the memory retains what was written before the last reset. I think this is why power cycling is having some positive effect on your code.

    No, it is NOT the random code in hub ram. I am loading unknown garbage from the SD and then executing it. This was caused by bugs in my SD test code. That's how I discovered the lockup :(

    BTW I actually like that hub stays consistent through resets. We were using that to advantage in the FPGA a loooong time ago ;)
  • Cluso99 wrote: »
    ... I am loading unknown garbage from the SD and then executing it. This was caused by bugs in my SD test code. That's how I discovered the lockup :(
    But not repeatable, right?
  • Could be something like accidentally executing a hubset command and not giving sufficient time before cutting over the clock source.

    I've been trapped before where the boot has failed and is in fact sitting in 'serial'. But that wouldn't explain the lack of control-G operation in P-nut.

    Which edge of DTR does your board reset on? Is it still set how it comes from Parallax? Or did you invert it using FT_Prog?
  • Cluso99 Posts: 15,396
    edited 2019-02-25 - 10:16:28
    It was, with different bugs too. Don't know if I can repeat it now though. I do keep lots of backups of my code but I'm not sure which ones were the problems.
  • Wah-hoo! Something finally happened! 3.5 hours of cycling, ~12000 odd runs. Although maybe only the last 1000 runs count from when I made a change to the test code. The behaviour is different this time in that I am seeing repeated comport open fails. I'm pretty certain that wasn't the case last time.

    The PC-USB power LED has gone out! Reset button is working but it doesn't recover the USB. Power cycling the AUX-USB supply doesn't recover the USB either ...

    Only fix is unplugging the PC-USB plug. Also seemed to need turning off the AUX supply so as to remove all power sources, but this might just be a give-it-more-time case.
  • Right, there is still 3.0 volts on the 5V_USB rail with the PC_USB cable unplugged. So that explains why all power has to be removed to reset the FTDI device.
  • evanh Posts: 7,841
    edited 2019-02-25 - 12:51:17
    So I've duplicated the USB lockup now. Here's the recent code change as per the idea of engaging the PLL before lock is gained:
    		wrpin	#0, #56
    		drvl	#56
    		hubset	#0
    		waitx	##20_000_000/10		'~100ms
    		hubset	##_SETFREQ		'setup oscillator
    		waitx	#500
    		hubset	##_ENAFREQ		'enable oscillator (expected lock-up)
    
    		waitx	#500
    		wrpin	#0, #57
    		drvl	#57
    		loc	ptra, #\@zeros
    		rep	@.fill, ##$200
    
    		getrnd	pa
    		wrlong	pa, ptra++
    .fill
    		setq	#$1ff
    		rdlong	#0, ##@zeros
    		setq2	#$1ff
    		rdlong	#0, ##@zeros
    
    		wrpin	#0, #58
    		drvl	#58
    		jmp	#\0
    
    
    zeros		res	$800
    
    
    Amusingly, without the two WAITX #500's, the LED for pin#58 would light up on most runs. But with both WAITX's only #56 and #57 light up, mostly. Sometimes still get #58 as well. The first WAITX #500 seemed to make the biggest difference.

    PS: That's just a snippet of a larger test located in hubRAM.
    PPS: It takes a lot of tries for this code to trip things up but it definitely made a difference at knocking out the FT231 FTDI chip.
    PPPS: Target sysclock I had set was 350 MHz. AUX 5 volt peaks at over 500 mA.
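
For a sense of scale, the WAITX delays in the snippet above can be worked out with some quick arithmetic. This is a sketch in Python that assumes the chip is still running from RCFAST at roughly 20 MHz when those waits execute (the figure implied by the `##20_000_000/10  '~100ms` comment). The real RC oscillator frequency varies, and the few clocks of instruction overhead are ignored. `RCFAST_HZ` and `waitx_seconds` are names made up for the example.

```python
# Assumption: sysclock is the internal RCFAST oscillator at about 20 MHz,
# the figure implied by the "~100ms" comment in the snippet.
RCFAST_HZ = 20_000_000


def waitx_seconds(clocks, sysclk_hz=RCFAST_HZ):
    """Approximate WAITX delay for a clock count (instruction overhead ignored)."""
    return clocks / sysclk_hz


long_wait = waitx_seconds(20_000_000 // 10)   # the first WAITX in the snippet
short_wait = waitx_seconds(500)               # each of the two WAITX #500 delays

print(f"WAITX ##20_000_000/10 -> {long_wait * 1e3:.1f} ms")
print(f"WAITX #500            -> {short_wait * 1e6:.1f} us")
```

Under that assumption each WAITX #500 is only about 25 microseconds, far short of the 1~3 ms oscillator start-up time mentioned earlier in the thread, which fits the intent of engaging the PLL before lock is gained.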
  • Oh, how interesting, this second USB lockup has taken the PC's USB controller with it. I've moved to a different controller and it's back without needing to power down the PC just yet.
  • Yes, I've seen the PC's port lock, and changing to another port fixes it. Changing back to the original port doesn't fix it; only a Windows restart fixes the locked USB ports.

    USB still sucks compared to the old reliable serial.
  • evanh wrote: »
    Wah-hoo! Something finally happened! 3.5 hours of cycling, ~12000 odd runs. Although maybe only the last 1000 runs count from when I made a change to the test code. The behaviour is different this time in that I am seeing repeated comport open fails. I'm pretty certain that wasn't the case last time.

    The PC-USB power LED has gone out! Reset button is working but it doesn't recover the USB. Power cycling the AUX-USB supply doesn't recover the USB either ...

    Only fix is unplugging the PC-USB plug. Also seemed to need turning off the AUX supply so as to remove all power sources, but this might just be a give-it-more-time case.

    " The PC-USB power LED has gone out! "

    At that point, the PC has suspended the USB port. That is the only reason the PWR LED can go off like this.

    I wondered if the issue is simply due to code that has the P2 pulling over-current from the PC USB port; perhaps spikes that are faster than the USB circuit can protect against, or just some heavy load that forces too much voltage drop. But then you mentioned the board was powered by AUX at the same time, so the above doesn't make sense, as no power would be sourced from PC-USB whilst AUX is active.

    Still, the PC is surely suspending the port for some reason. Maybe you've found a weakness with all those test repetitions.
    You may be able to disable/re-enable the HUB in device manager, rather than needing to power-cycle the PC.
  • evanh:
    Do you have a data-only cable or could you DIY one and run that same kind of test?

    VonSzarvas, would that be okay to only power the P2 Eval board from the P2 USB (aux) plug, and only have a data connection (snip the +5V, leave Data+ Data- and GND) on the PC USB plug?
    Cluso99 wrote: »
    USB still sucks compared to the old reliable serial.

    Getting nostalgic looking through rose-tinted glasses?
    Yeah it's really nice powering from a separate wall wart transformer because RS232 doesn't supply power.
    Yeah it's great having to put charge pumps in to swing the voltage and also receive a swing as high as ±15 V.
    It's great how flow control is a crapshoot, and probably is ignored by the master sending the data anyway.
    Dial everything down to 9600 baud because "at least that works". So the download takes longer than a laptop battery can last?
    Requiring gender benders and crossover adapters and crossover cables?