Thread: Propscope hardware crashing

  1. #1

    I've been chasing an obscure bug which I can't seem to reproduce on other hardware.

    This program prints messages to the serial terminal at 38400 baud. I never get more than about 10 seconds in before the scope crashes.

    Code:
    CON
      _clkmode      = xtal1 + pll16x
      _xinfreq      = 6_250_000
    
    
    OBJ
        Term  :       "FullDuplexSerial"
    
    PUB Start | X, Y
      X~
      Term.Start(31,30,0,38400)
      Y := cognew(@fred, 0)
      repeat
        term.str(String("survived "))
        term.dec(X++)
        term.str(string(" loops",13))
        waitcnt(cnt+clkfreq)
    
    DAT
                  org       0
    fred
                  mov       da, #$40
                  shl       da, #24
                  mov       frqb, da              ' Set output freq
                  mov       ctrb, ctrb_mode       ' Enable counter
                  mov       dira, ctrb_mask       ' Enable outputs
                  mov       outa, pin             ' set pin 27 high
                  mov       da, cnt
                  add       da, frq
                  waitcnt   da, frq               ' wait a second
                  mov       dira, #0              ' Here is where it will crash
                  waitcnt   da, frq               ' If it didn't crash this time, go around until it does.
                  jmp       #fred
    
    ctrb_mode     long      %00101 << 26 + 26
    ctrb_mask     long      1<<26 | 1<<27
    frq           long      80_000_000
    pin           long      1<<27
    da            res       1




    The part that causes the crash is clearing dira while ctrb is still running at full tilt, toggling the CLK net. I can't reproduce it on any other combination of pins, and if I disable the counter first (clear ctrb), it does not happen.

    This appears to be the only combination that reproduces it:
    pins 26 and 27 set as outputs
    pin 27 high
    counter toggling pin 26 faster than 6 MHz

    Now clear dira and it will crash hard, but only intermittently.

    The code I posted above is guaranteed to crash on my PropScope within 10 seconds.

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    "Are you suggesting coconuts migrate?"
    Last edited by ForumTools; 09-30-2010 at 09:51 AM. Reason: Forum Migration

  2. #2

    Brad,
    I've been out of town all week, so I wasn't able to look at this until now. I've tried running the code multiple times on my PropScope, and it works fine every time, but it does occasionally freeze on a coworker's PropScope. We are closing early today, so I don't have much time to look at it, but I'll check it out more in depth when we return on Tuesday.

    — David Carrier
    Parallax Inc.

  3. #3

    David Carrier (Parallax) said...
    Brad,
    I've been out of town all week, so I wasn't able to look at this until now. I've tried running the code multiple times on my PropScope, and it works fine every time, but it does occasionally freeze on a coworker's PropScope. We are closing early today, so I don't have much time to look at it, but I'll check it out more in depth when we return on Tuesday.

    — David Carrier
    Parallax Inc.
    No worries. I've worked around it by disabling ctrb before I zero dira. I'm just glad you guys can reproduce it (even if it's intermittent compared to mine). I'm baffled.

    On my scope it will crash within 10 seconds, _every_ time.

    I'll be interested to hear if you manage to figure out what's causing it.

  4. #4

    Brad,

    Using your code, I ran the PropScope for 6 hours with no lockups. I reloaded 6 times at 1 hour intervals with no lockups.

    The only manufacturing code is what was on the outside of the box "23225". I think I was the first or second order of this product. It's probably all the same manufacturing run.

    Jim


  5. #5

    hover1 said...
    Brad,

    Using your code, I ran the PropScope for 6 hours with no lockups. I reloaded 6 times at 1 hour intervals with no lockups.

    The only manufacturing code is what was on the outside of the box "23225". I think I was the first or second order of this product. It's probably all the same manufacturing run.
    I'm baffled myself, but the thing nagging me is that the PropScope runs its Propeller outside the specifications in the datasheet. Is it some very strange artifact that only appears on randomly unlucky chips (like mine)? I certainly can't reproduce the behaviour on any of my other Propellers, which run at 80 MHz.

  6. #6

    Further to the last post, I clocked it down by changing the clock line to

    Code:
      _clkmode      = xtal1 + pll8x



    And kept the counter pin toggle rate the same with
    Code:
                  mov       da, #$80



    and it still locks up reproducibly in under 10 seconds, so it's not the overclocking that's causing it.
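    As a sanity check on "kept the counter pin toggle rate the same", here is a rough host-side sketch (plain Python, not Propeller code) of the arithmetic. It assumes ctrb is in an NCO mode, which is what the %00101 mode field selects on the Propeller, so the pin follows bit 31 of phsb and phsb gains frqb once per system clock. The function and variable names are made up for illustration.

```python
# Host-side sketch (not Propeller code): NCO output frequency for the two
# configurations described in the thread.

XIN_HZ = 6_250_000  # crystal frequency, from _xinfreq in the test program


def nco_output_hz(pll_multiplier: int, frq_top_byte: int) -> float:
    """Frequency of phsb bit 31 when frqb = frq_top_byte << 24."""
    clk_hz = XIN_HZ * pll_multiplier
    frq = frq_top_byte << 24
    # phsb accumulates frq once per system clock, so bit 31 completes one
    # full cycle every 2**32 / frq clocks.
    return clk_hz * frq / 2**32


original = nco_output_hz(16, 0x40)      # pll16x with mov da, #$40
clocked_down = nco_output_hz(8, 0x80)   # pll8x with mov da, #$80
print(f"pll16x, $40 << 24: {original / 1e6:.2f} MHz")
print(f"pll8x,  $80 << 24: {clocked_down / 1e6:.2f} MHz")
```

    Under those assumptions both settings come out at the same pin frequency, so halving the system clock changes the Propeller's speed without changing what the CLK net sees.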

  7. #7

    More data points.

    Tried thermal variations (both inside and outside the metal tin):
    16 °C / 25 °C / 50 °C / 70 °C. No change.

    Tried with and without signal sources, and with and without the -2V supply switched on.

    Tried directly connected to the USB port on my Mac Mini and via an externally powered 7-port USB hub.

    The only difference I've seen is with the timer set to $40_00_00_00: on pll16x it locks up in ~5 seconds, and with pll8x it can take up to 8. Set the timer to $80_00_00_00 at pll8x and it's back to ~5 seconds.
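    For reference, a rough host-side sketch (plain Python, not Propeller code) of when each `mov dira, #0` happens in wall-clock terms. It assumes the loop time is dominated by the two 80_000_000-clock waits and ignores the few clocks spent on instructions. The function name is made up for illustration.

```python
# Host-side sketch (not Propeller code): approximate wall-clock time of each
# `mov dira, #0`, ignoring the handful of clocks spent on instructions.

XIN_HZ = 6_250_000        # from _xinfreq in the test program
WAIT_CLOCKS = 80_000_000  # the `frq` constant used by both waitcnt instructions


def switch_off_times(pll_multiplier: int, count: int = 4) -> list[float]:
    """Seconds from cog start to the 1st, 2nd, ... clearing of dira."""
    wait_s = WAIT_CLOCKS / (XIN_HZ * pll_multiplier)
    # One wait precedes the first switch-off; each later pass adds two waits.
    return [round((2 * n + 1) * wait_s, 3) for n in range(count)]


for pll in (16, 8):
    print(f"pll{pll}x: {switch_off_times(pll)}")
```

    Note that 80_000_000 clocks is 0.8 s at the PropScope's 100 MHz rather than a full second. On these numbers the third switch-off lands near 4 s on pll16x and near 8 s on pll8x, which is in the same ballpark as the figures above, though that correspondence is only a guess.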

  8. #8

    BradC said...
    The only difference I've seen is with the timer set to $40_00_00_00: on pll16x it locks up in ~5 seconds, and with pll8x it can take up to 8. Set the timer to $80_00_00_00 at pll8x and it's back to ~5 seconds.
    This may be a red herring but you never know if you don't try :) Can you change the first waitcnt to advance by frq1 which should be set to 80_000_003, i.e.

    Code:
    DAT
                  org       0
    fred
                  mov       da, #$40
                  shl       da, #24
                  mov       frqb, da              ' Set output freq
                  mov       ctrb, ctrb_mode       ' Enable counter
                  mov       dira, ctrb_mask       ' Enable outputs
                  mov       outa, pin             ' set pin 27 high
                  mov       da, cnt
                  add       da, frq
                  waitcnt   da, frq1              ' wait a second             (##)
                  mov       dira, #0              ' Here is where it will crash
                  waitcnt   da, frq               ' If it didn't crash this time, go around until it does.
                  jmp       #fred
    
    ctrb_mode     long      %00101 << 26 + 26
    ctrb_mask     long      1<<26 | 1<<27
    frq           long      80_000_000
    frq1          long      80_000_003            '                           (##)
    pin           long      1<<27
    da            res       1


    The original code wasn't synchronised between the generated clock and the switching off of dira. With the modified delay, the switch-off happens at a fixed point in the clock's cycle, which I believe should be safe (as opposed to the one happening after 5 sec with the original setup). Thanks.

  9. #9

    kuroneko said...
    BradC said...
    The only difference I've seen is with the timer set to $40_00_00_00: on pll16x it locks up in ~5 seconds, and with pll8x it can take up to 8. Set the timer to $80_00_00_00 at pll8x and it's back to ~5 seconds.
    This may be a red herring but you never know if you don't try :) Can you change the first waitcnt to advance by frq1 which should be set to 80_000_003, i.e.
    That's no red herring. Making that change stops it crashing entirely.

    80_000_001 and 80_000_002 make it crash faster.

    Now the question is... why?

  10. #10

    BradC said...
    That's no red herring. Making that change stops it crashing entirely.

    80_000_001 and 80_000_002 make it crash faster.

    Now the question is... why?
    Those two values bring it out of sync again, i.e. the phase relation is not constant. It cycles, and sooner or later you'll hit the critical behaviour again. For the original case, after about 5 sec you get switch-off behaviour like this:

    Code:
    CON
    {
      |             mov dira, #0              |
      |    S         D         e         R    |
    --+    +----+    +----+    +----+    +----+
      |    |    |    |    |    |    |    |    |
      +----+    +----+    +----+    +----+    +--
    
    --------+                   +---------+         high
     26     |                   |         +------   floating
            +-------------------+                   low
    
    --------------------------------------+         high
     27                                   +------   floating
                                                    low
    }


    I.e. the low cycle of the ctrb wave is still intact, but the high cycle is only half completed when it gets cut off by the clearing of dira. The data sheet for the analog circuit states a minimum of 5ns, but that's only in a specific mode. One could speculate that this mutilated clock somehow affects that circuit, which in turn would affect the prop. If it's just a simple reset, one would assume that whatever is in EEPROM would run again. Are you running from RAM only? I can't think of a reason that code like this would affect the prop on its own (after all, it's only two outputs being switched off).

    FWIW, the 80_000_003 value syncs the clock so that the cut-off happens at 50% of the low cycle.

    Post Edited (kuroneko) : 5/30/2010 9:14:09 AM GMT

  11. #11

    kuroneko said...

    Are you running from RAM only?
    I put a test program into the prop to report if it has been reset. Yes, when it occurs the prop is being reset.

    Thanks for the explanation. I get it, but I still don't understand *why* it's a problem.

  12. #12

    I was wondering whether the PLL in the analog chip (assuming it's used ... to guarantee 50% duty on CLKA/CLKB) draws too much power occasionally which would cause the reset watchdog (NCP301LSN27T1) to bark. Also, the prop only seems to be fed with 3V, not sure if a minor fluctuation would cause it to reset.

  13. #13

    kuroneko said...
    I was wondering whether the PLL in the analog chip (assuming it's used ... to guarantee 50% duty on CLKA/CLKB)
    I'm pretty sure it's not. For the PLL to be enabled the MODE line needs to be pulled to an odd intermediate level using pull-up and pull-down resistors.

    kuroneko said...

    draws too much power occasionally which would cause the reset watchdog (NCP301LSN27T1) to bark. Also, the prop only seems to be fed with 3V, not sure if a minor fluctuation would cause it to reset.
    Interesting theory. I might have to dig out the DSO and see if I can find something peculiar happening.

    I note that on the schematic, all the decoupling caps around the prop are 1.0 µF.

  14. #14

    Brad,
    I'm having problems reproducing it again. The PropScope that I could occasionally reproduce the problem on is now always behaving properly. When you say that the program crashes on your PropScope every time, does that involve power cycling it, or can you just press 'F10' and have it crash every time? Also, if you load it to the EEPROM does it crash just as readily?

    Thank you,
    David Carrier

  15. #15

    David Carrier (Parallax) said...
    does that involve power cycling it
    Nope.

    David Carrier (Parallax) said...
    or can you just press 'F10' and have it crash every time?
    Every time, like clockwork. I can go from having it running nicely sampling to the PC with working firmware, and load the test program directly to crash it.

    David Carrier (Parallax) said...
    Also, if you load it to the EEPROM does it crash just as readily?
    Yep.



    If I use Kuroneko's mod to the test program and change the 80_000_003 to 80_000_001, it will crash faster (usually within 3 seconds).

  16. #16

    For the record:

    Code:
    80000000: critical switch-off after ~5sec  phsb sequence: $C0.$00.$40.$80
    80000001: critical switch-off after ~3sec                 $C0.$40.$C0.$40
    80000002: critical switch-off after ~5sec                 $C0.$80.$40.$00
    80000003: stable                                          $C0.$C0.$C0.$C0
                                                              ~1s ~3s ~5s ~7s
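    The table above is consistent with a simple drift model, sketched here in plain Python (not Propeller code). Two things are assumptions read off the table rather than measurements: that the loop with 80000000 slips by one system clock per pass relative to the 4-clock counter period, and that $40 is the critical value, since it is the entry under every "critical" time. The names are made up for illustration.

```python
# Host-side sketch (not Propeller code): reproduce the top byte of phsb at
# successive switch-offs from a simple drift model.

START = 0xC0            # top byte at the first switch-off, taken from the table
STEP_PER_CLOCK = 0x40   # frqb = $40 << 24 adds $40 to the top byte every clock
BASE_DRIFT_CLOCKS = 1   # ASSUMPTION: inferred from the 80000000 row, not measured
CRITICAL = 0x40         # ASSUMPTION: the value under every "critical" time


def phsb_sequence(frq1: int, count: int = 4) -> list[int]:
    """Top byte of phsb at each of the first `count` switch-offs."""
    drift = BASE_DRIFT_CLOCKS + (frq1 - 80_000_000)
    return [(START + n * drift * STEP_PER_CLOCK) & 0xFF for n in range(count)]


for frq1 in range(80_000_000, 80_000_004):
    seq = phsb_sequence(frq1)
    row = ".".join(f"${b:02X}" for b in seq)
    verdict = "critical" if CRITICAL in seq else "stable"
    print(f"{frq1}: {row}  {verdict}")
```

    With 80000003 the total slip is a whole counter period (4 clocks), so the phase never moves off $C0 and the critical value is never reached. The other three values walk through $40 sooner or later.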

    Post Edited (kuroneko) : 6/2/2010 6:01:00 AM GMT