Propscope hardware crashing — Parallax Forums

BradC Posts: 2,601
edited 2010-06-02 05:23 in Accessories
I've been chasing an obscure bug which I can't seem to reproduce on other hardware.

This program sends messages to the serial terminal at 38,400 baud. I never get more than about 10 seconds in before the scope crashes.

CON
  _clkmode      = xtal1 + pll16x
  _xinfreq      = 6_250_000


OBJ
    Term  :       "FullDuplexSerial"

PUB Start | X, Y
  X~
  Term.Start(31,30,0,38400)
  Y := cognew(@fred, 0)
  repeat
    term.str(String("survived "))
    term.dec(X++)
    term.str(string(" loops",13))
    waitcnt(cnt+clkfreq)

DAT
              org       0
fred
              mov       da, #$40
              shl       da, #24
              mov       frqb, da              ' Set output freq
              mov       ctrb, ctrb_mode       ' Enable counter
              mov       dira, ctrb_mask       ' Enable outputs
              mov       outa, pin             ' Set pin 27 high
              mov       da, cnt
              add       da, frq
              waitcnt   da, frq               ' Wait a second
              mov       dira, #0              ' Here is where it will crash
              waitcnt   da, frq               ' If it didn't crash this time, go around until it does.
              jmp       #fred

ctrb_mode     long      %00101 << 26 + 26
ctrb_mask     long      1<<26 | 1<<27
frq           long      80_000_000
pin           long      1<<27
da            res       1





The bit that is causing the crash is clearing dira while ctrb is still running at full tilt toggling the CLK net. I can't reproduce it on any other combination of pins, and if I disable the counter first (clear ctrb) then it does not happen.

It appears that this is the only way I can reproduce it:
pins 26 & 27 set as outputs
pin 27 high
counter toggling pin 26 faster than 6 MHz

Now clear dira and it will crash hard but only intermittently.

The code I posted above is guaranteed to crash on my PropScope within 10 seconds.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
"Are you suggesting coconuts migrate?"

Comments

  • David Carrier Posts: 294
    edited 2010-05-28 20:19
    Brad,
    I've been out of town all week, so I wasn't able to look at this until now. I've tried running the code multiple times on my PropScope, and it works fine every time, but it does occasionally freeze on a coworker's PropScope. We are closing early today, so I don't have much time to look at it, but I'll check it out more in depth when we return on Tuesday.

    — David Carrier
    Parallax Inc.
  • BradC Posts: 2,601
    edited 2010-05-29 01:21
    No worries. I've worked around it by disabling ctrb before I zero dira. I'm just glad you guys can reproduce it (even if it's intermittent compared to mine). I'm baffled.

    On my scope it will crash within 10 seconds, _every_ time.

    I'll be interested to hear if you manage to figure out what's causing it.

  • hover1 Posts: 1,929
    edited 2010-05-29 01:41
    Brad,

    Using your code, I ran the PropScope for 6 hours with no lockups. I reloaded 6 times at 1 hour intervals with no lockups.

    The only manufacturing code is what was on the outside of the box "23225". I think I was the first or second order of this product. It's probably all the same manufacturing run.

    Jim
  • BradC Posts: 2,601
    edited 2010-05-29 12:14
    I'm baffled myself, but the thing nagging me is... the PropScope is running a Propeller outside of the specifications on the datasheet. Is it some very strange artifact that only appears on randomly unlucky chips (like mine)? I certainly can't repeat the behaviour on any of my other Propellers, which run at 80 MHz.

  • BradC Posts: 2,601
    edited 2010-05-29 12:41
    Further to the last post, I clocked it down by changing the clock line to

      _clkmode      = xtal1 + pll8x

    and kept the counter pin toggle rate the same with

                  mov       da, #$80

    and it still locks up reproducibly in under 10 seconds, so it's not overclocking that's causing it.
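
The "same toggle rate" claim can be checked with a little arithmetic, sketched here in Python on a PC rather than in Spin. In NCO mode the counter output follows bit 31 of the phase register, so the output frequency is frq × clkfreq / 2^32. The helper name `nco_hz` is made up for this illustration; the crystal frequency and the two frqb values are the ones from the posts above.

```python
# NCO output frequency of a Propeller counter: f = frq * clkfreq / 2**32.
# nco_hz is a hypothetical helper for this check, not part of any library.

XIN_HZ = 6_250_000  # _xinfreq from the test program


def nco_hz(frq, pll_mult, xin_hz=XIN_HZ):
    """Frequency of PHS[31] for a given FRQ value and PLL multiplier."""
    clkfreq = xin_hz * pll_mult
    return frq * clkfreq / 2**32


print(nco_hz(0x4000_0000, 16))  # pll16x, frqb = $40 << 24
print(nco_hz(0x8000_0000, 8))   # pll8x,  frqb = $80 << 24
```

Both settings come out at 25 MHz on pin 26, so halving the system clock while doubling frqb does leave the generated clock unchanged.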

  • BradC Posts: 2,601
    edited 2010-05-29 14:30
    More data points.

    Tried thermal variations (both inside and outside the metal tin)
    16 Deg C / 25 Deg C / 50 Deg C / 70 Deg C.. no change

    Tried with and without signal sources, with and without the -2V supply switched on...

    Directly connected to the USB port on my Mac Mini and via an externally powered 7 port USB hub.

    The only difference I've seen is with the timer set to $40_00_00_00: on pll16x it locks up in ~5 seconds, and with pll8x it can take up to 8. Set the timer to $80_00_00_00 at pll8x and it's back to ~5 seconds.
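
For the $40_00_00_00 case, those lock-up times are roughly what the wait lengths alone would predict. The Python sketch below (run on a PC, not on the Propeller) ignores the few dozen clocks of loop overhead and assumes that the third `mov dira, #0` is the one that hits the bad phase, which is taken from kuroneko's phsb table later in this thread. `switch_off_seconds` is a made-up helper name.

```python
# Wall-clock time of the n-th "mov dira, #0", ignoring loop overhead.
# Each waitcnt in the test program waits 80_000_000 system clocks.

WAIT_CLOCKS = 80_000_000
XIN_HZ = 6_250_000          # _xinfreq from the test program


def switch_off_seconds(n, pll_mult):
    """Seconds from cog start until switch-off number n (0-based)."""
    clkfreq = XIN_HZ * pll_mult
    waits = 2 * n + 1       # one wait before the first, two per later pass
    return waits * WAIT_CLOCKS / clkfreq


# Assumption: the third switch-off (n = 2) is the critical one.
print(switch_off_seconds(2, 16))  # pll16x
print(switch_off_seconds(2, 8))   # pll8x
```

Under those assumptions the critical switch-off lands at about 4 seconds on pll16x and 8 seconds on pll8x, in the same range as the observed ~5 and "up to 8". The $80_00_00_00 case is not covered by this sketch.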

  • kuroneko Posts: 3,623
    edited 2010-05-30 00:14
    BradC said...
    The only difference I've seen is with the timer set to $40_00_00_00: on pll16x it locks up in ~5 seconds, and with pll8x it can take up to 8. Set the timer to $80_00_00_00 at pll8x and it's back to ~5 seconds.

    This may be a red herring but you never know if you don't try :) Can you change the first waitcnt to advance by frq1, which should be set to 80_000_003, i.e.

    DAT
                  org       0
    fred
                  mov       da, #$40
                  shl       da, #24
                  mov       frqb, da              ' Set output freq
                  mov       ctrb, ctrb_mode       ' Enable counter
                  mov       dira, ctrb_mask       ' Enable outputs
                  mov       outa, pin             ' set pin 27 high
                  mov       da, cnt
                  add       da, frq
                  waitcnt   da, frq1              ' wait a second             (##)
                  mov       dira, #0              ' Here is where it will crash
                  waitcnt   da, frq               ' If it didn't crash this time, go around until it does.
                  jmp       #fred
    
    ctrb_mode     long      %00101 << 26 + 26
    ctrb_mask     long      1<<26 | 1<<27
    frq           long      80_000_000
    frq1          long      80_000_003            '                           (##)
    pin           long      1<<27
    da            res       1
    


    The original code wasn't sync'd between the generated clock and switching off dira. With the modified delay the switch off happens at a specific point which I believe should be safe (as opposed to the one happening after 5 sec with the original setup). Thanks.
  • BradC Posts: 2,601
    edited 2010-05-30 04:06
    kuroneko said...
    This may be a red herring but you never know if you don't try :) Can you change the first waitcnt to advance by frq1, which should be set to 80_000_003

    That's no red herring. Making that change stops it crashing entirely.

    80_000_001 & 80_000_002 make it crash faster.

    Now the question is: why?

  • kuroneko Posts: 3,623
    edited 2010-05-30 04:26
    BradC said...
    That's no red herring. Making that change stops it crashing entirely.

    80_000_001 & 80_000_002 make it crash faster.

    Now the question is: why?

    Those two values bring it out of sync again, i.e. the relation is not constant. It cycles, so sooner or later you'll get critical behaviour again. For the original case, after about 5 sec you get switch-off behaviour like this:

    CON
    {
      |             mov dira, #0              |
      |    S         D         e         R    |
    --+    +----+    +----+    +----+    +----+
      |    |    |    |    |    |    |    |    |
      +----+    +----+    +----+    +----+    +--
    
    --------+                   +---------+         high
     26     |                   |         +------   floating
            +-------------------+                   low
    
    --------------------------------------+         high
     27                                   +------   floating
                                                    low
    }
    


    That is, the low cycle of the ctrb wave is still intact, but the high cycle is only half completed when it gets cut off by dira being cleared. The data sheet for the analog circuit states a minimum of 5 ns, but that's only in a specific mode. One could speculate that this mutilated clock somehow affects this circuit, which in turn would affect the prop. If it's just a simple reset, one would assume that whatever is in EEPROM would run again. Are you running from RAM only? I can't think of a reason that code like this would affect the prop on its own (after all, it's only two outputs which are being switched off).

    FWIW, the 80_000_003 syncs the clock so the cut off happens at 50% low cycle.

    Post Edited (kuroneko) : 5/30/2010 9:14:09 AM GMT
  • BradC Posts: 2,601
    edited 2010-05-30 09:25
    kuroneko said...

    Are you running from RAM only?

    I put a test program into the prop to report if it has been reset. Yes, when it occurs the prop is being reset.

    Thanks for the explanation. I get it, but I still don't understand *why* it's a problem.

  • kuroneko Posts: 3,623
    edited 2010-05-30 09:31
    I was wondering whether the PLL in the analog chip (assuming it's used ... to guarantee 50% duty on CLKA/CLKB) draws too much power occasionally which would cause the reset watchdog (NCP301LSN27T1) to bark. Also, the prop only seems to be fed with 3V, not sure if a minor fluctuation would cause it to reset.
  • BradC Posts: 2,601
    edited 2010-05-30 10:05
    kuroneko said...
    I was wondering whether the PLL in the analog chip (assuming it's used ... to guarantee 50% duty on CLKA/CLKB)

    I'm pretty sure it's not. For the PLL to be enabled the MODE line needs to be pulled to an odd intermediate level using pull-up and pull-down resistors.
    kuroneko said...

    draws too much power occasionally which would cause the reset watchdog (NCP301LSN27T1) to bark. Also, the prop only seems to be fed with 3V, not sure if a minor fluctuation would cause it to reset.

    Interesting theory. I might have to dig out the DSO and see if I can find something peculiar happening.

    I note on the schematic that all the decoupling caps around the prop are 1.0 µF.

  • David Carrier Posts: 294
    edited 2010-06-02 00:09
    Brad,
    I'm having problems reproducing it again. The PropScope that I could occasionally reproduce the problem on is now always behaving properly. When you say that the program crashes on your PropScope every time, does that involve power cycling it, or can you just press 'F10' and have it crash every time? Also, if you load it to the EEPROM does it crash just as readily?

    Thank you,
    David Carrier
  • BradC Posts: 2,601
    edited 2010-06-02 04:45
    David Carrier (Parallax) said...
    does that involve power cycling it

    Nope.
    David Carrier (Parallax) said...
    or can you just press 'F10' and have it crash every time?

    Every time, like clockwork. I can go from having it running nicely sampling to the PC with working firmware, and load the test program directly to crash it.
    David Carrier (Parallax) said...
    Also, if you load it to the EEPROM does it crash just as readily?

    Yep.



    If I use kuroneko's mod to the test program and change the 80_000_003 to 80_000_001 it will crash faster (usually within 3 seconds).

  • kuroneko Posts: 3,623
    edited 2010-06-02 05:23
    For the record (phsb at each switch-off; * marks the critical value):

    frq1       outcome                             ~1s    ~3s    ~5s    ~7s
    80000000   critical switch-off after ~5 sec    $C0    $00    $40*   $80
    80000001   critical switch-off after ~3 sec    $C0    $40*   $C0    $40*
    80000002   critical switch-off after ~5 sec    $C0    $80    $40*   $00
    80000003   stable                              $C0    $C0    $C0    $C0
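
The table above can be reproduced with a small model, sketched here in Python on a PC rather than in Spin. With frqb = $4000_0000 the top byte of phsb advances by $40 per system clock and wraps every 4 clocks, so only the loop length modulo 4 matters. Two values are assumptions read off the table, not measured: the starting phase of $C0, and a loop that is 1 clock longer than a multiple of 4 when both waits are 80_000_000. The function names are made up for this illustration.

```python
# Host-side model of the top byte of phsb at each "mov dira, #0".
# START_PHASE and BASE_LOOP_MOD4 are inferred from the table, not measured.

FRQB_TOP_BYTE = 0x40      # frqb = $4000_0000 adds $40 to the top byte per clock
START_PHASE = 0xC0        # top byte of phsb at the first switch-off (assumed)
BASE_LOOP_MOD4 = 1        # loop length mod 4 with frq1 = 80_000_000 (assumed)
CRITICAL_PHASE = 0x40     # the phase associated with the critical switch-off


def phase_sequence(frq1, passes=4):
    """Top byte of phsb at each successive switch-off."""
    extra = frq1 - 80_000_000
    step = (FRQB_TOP_BYTE * (BASE_LOOP_MOD4 + extra)) % 0x100
    return [(START_PHASE + n * step) % 0x100 for n in range(passes)]


def first_critical_time(frq1, passes=4):
    """Nominal seconds (1, 3, 5, 7) of the first critical switch-off, or None."""
    for n, phase in enumerate(phase_sequence(frq1, passes)):
        if phase == CRITICAL_PHASE:
            return 2 * n + 1
    return None


for frq1 in (80_000_000, 80_000_001, 80_000_002, 80_000_003):
    seq = ".".join(f"${p:02X}" for p in phase_sequence(frq1))
    print(frq1, seq, first_critical_time(frq1))
```

With those two assumptions the model gives the same four sequences as the table, and 80_000_003 is the only delay of the four that makes the loop an exact multiple of the 4-clock waveform period, which is why its phase never moves.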
    

    Post Edited (kuroneko) : 6/2/2010 6:01:00 AM GMT