Propeller Chip - Apparent Cog Instability — Parallax Forums

Paul Voss Posts: 13
edited 2008-03-19 07:32 in Propeller 1
I am using a Propeller chip to control small research balloons in the Arctic next month - until recently, everything was going great - the parallel processors are a dream to work with.

However, I have found a very strange instability that I am worried could be a flaw in the chip (hopefully I've just made an idiotic mistake and someone will straighten me out).

I reduced the offending code to a short and simple program (attached). The problem is very fussy, depending on the exact timing of two coginits - change one little thing and the problem goes away. However, as written the problem reliably occurs on both raw and board-integrated Propsticks with differing power supplies and external connections. Although the symptom is garbled text on HyperTerminal, all the serial code can be removed and the problem still persists - in this case, the LED (if enabled) will flash about 9-15 times and then go out - the main and LED cogs both lock up, so it appears to be a cog interaction. Also note that the debug cog is not specified here and could be stopped by a subsequent coginit - in other tests, I have specified the debug cog and got the same lockup problem.

If some of the smart people on this forum could take a look at the attached code excerpt, I would greatly appreciate it! The code is a bit unusual due to the complexity of the parent program it came from. Note that I am not looking for just a fix (there are many simple changes that miraculously fix the problem) - rather, I need to understand what is going on so that I don't fly unstable code. The balloons need to ship very soon - this problem was an unfortunate last-minute surprise.

Thanks

Paul

Comments

  • Graham Stabler Posts: 2,510
    edited 2008-02-18 02:21
    The c := cnt should come before the waitcnt, or use waitcnt(clkfreq*2+cnt). As written, the wait becomes a function of how long the other instructions take; in other words, if you expected a delay of 2 seconds you get 2 seconds minus the time taken by the other commands.

    Might not be the problem but it's true :)

    Graham
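Graham's point in numbers: a small Python model (not Spin; the 10 MHz clock and the 300_000 ticks of "other work" are assumptions for illustration) of how long WAITCNT actually waits once it is reached, depending on whether the target is built from a `c` sampled before the other work or from `cnt` sampled at the call.

```python
CLKFREQ = 10_000_000  # assumed system clock: 5 MHz crystal x pll2x, as in Paul's CON block


def ticks_inside_waitcnt(period, work_ticks, sample_before_work):
    """How long WAITCNT waits, measured from the moment it is reached.

    sample_before_work=True  models:  c := cnt, <work>, waitcnt(period + c)
    sample_before_work=False models:  <work>, waitcnt(period + cnt)
    Time 0 is when the work starts. The model assumes work_ticks < period
    and ignores the few ticks spent evaluating the WAITCNT expression.
    """
    reached_at = work_ticks              # WAITCNT is reached once the work is done
    if sample_before_work:
        target = 0 + period              # c was sampled before the work began
    else:
        target = reached_at + period     # cnt sampled right at the WAITCNT
    return target - reached_at


# 30 ms of work (300_000 ticks at 10 MHz) between sampling c and the WAITCNT
print(ticks_inside_waitcnt(2 * CLKFREQ, 300_000, True) / CLKFREQ)   # seconds actually waited
print(ticks_inside_waitcnt(2 * CLKFREQ, 300_000, False) / CLKFREQ)
```

With the first form the loop period stays fixed at 2 s and the wait absorbs the work; with the second the wait is always the full 2 s and the period grows by the work time.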
  • Mike Green Posts: 23,101
    edited 2008-02-18 03:39
    I would echo Graham's concern. In your MAIN method, you save the system clock, then perform several complex operations and expect to be able to do a WAITCNT for a time 2 seconds later (which works) or 1 second later (which doesn't). You didn't say, but if you miss the 1-second absolute time mark, your program will wait for several minutes until the 32-bit system counter wraps around. That's quite possible.

    Another thing is that, by reinitializing the LED and GPS cogs, you're stopping them at an arbitrary place, then restarting them. Also, a COGINIT will take several milliseconds to perform at 10MHz.
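To put a number on "several minutes": WAITCNT waits until the 32-bit CNT register equals the target, so a target that has already slipped past is only matched again after the counter wraps. The Python model below assumes Paul's 10 MHz clock (5 MHz crystal with pll2x); at that rate a full wrap is 2^32 ticks, roughly 429.5 s, a bit over 7 minutes.

```python
CLKFREQ = 10_000_000   # assumed 10 MHz system clock (5 MHz crystal x pll2x)
MASK = 0xFFFF_FFFF     # CNT is a free-running 32-bit counter


def waitcnt_ticks(target, now):
    """Ticks until a 32-bit counter currently at `now` next equals `target`.

    Models WAITCNT's match-on-equality behaviour: a target that has
    already passed is not matched again until the counter wraps.
    """
    return (target - now) & MASK


def seconds(ticks):
    return ticks / CLKFREQ


print(seconds(waitcnt_ticks(CLKFREQ, CLKFREQ - 1000)))  # on time: 1000 ticks left
print(seconds(waitcnt_ticks(CLKFREQ, CLKFREQ + 1)))     # missed by 1 tick: ~429.5 s
```

The masking also shows why an ordinary wrap is harmless: a target just past 0xFFFF_FFFF is still reached a few ticks later.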
  • mirror Posts: 322
    edited 2008-02-18 04:11
    Mike,
    according to Paul the problem occurs at 2 seconds, but not at 1 second - which is opposite to your understanding - and somewhat changes the timing issues.

    Paul,
    I had a similar garbling of serial characters a while ago (8 to 10 months ago). It seemed to be dependent on how the code was laid out. Swapping lines of code and changing delays made the problem change - also I only seemed to have a problem when sending non-printable characters.
    The horrible answer is that as I wrote more code - a whole lot more - the problem mysteriously disappeared. I never did find out what the problem was, but it hasn't bugged me since. I wasn't starting/stopping/interrupting the cogs like you are, but I did have 7 of the cogs occupied.
    I wish I could shed more light. I'm reasonably sure it's not a problem with the chip as when the problem occurred I'd only been using the chip for a very short amount of time (<1 month).

    Just one thing out of curiosity, is this what you mean by your case statement?
    if (N == i) or (N == 1) 
      coginit(2,GPS_COGCODE,@GpsStack) 'COMMENT OUT COGINIT (NOT I,1) AND CODE RUNS 
    elseif (N == i+1) or (N == 2) 
      ' DO NOTHING
    
  • Paul Voss Posts: 13
    edited 2008-02-18 05:00
    Thanks for the quick response. I'll take them in order.

    First, on the c:=cnt being reversed - I agree it is awkward; however, it was a deliberate reversal due to the multiple possible threads after c is initialized - there is simply no place to put the waitcnt at the end of the (real) program that works for all scenarios. Putting the waitcnt at the very beginning works well, with the one exception that I need to be careful on the first pass through (before the c:=cnt line is executed).

    Second, on starting and stopping the cogs at arbitrary places - I thought this was OK - it's only flashing an LED and all pins revert to 0 when the cog is stopped (LED off). Perhaps not the ideal way to do it, but it seems it should not be causing the lockup I am seeing.

    Mike - I think you and I may have had the same issues. The "mysteriously disappear" thing would normally work - however, with the flight issues, I need to be 100% certain what is going on. The code I posted seems very simple and should not be causing lockup problems. Hopefully, someone will prove me wrong - show me my error. This is what I am hoping for. And yes, the logic in your if-then statement is what is in the case I believe.

    Thanks again - please let me know if you have further thoughts - this remains a deep (and urgent) mystery.
  • Phil Pilgrim (PhiPi) Posts: 23,514
    edited 2008-02-18 05:05
    Paul Voss said...
    ... and all pins revert to 0 when the cog is stopped (led off).
    'Probably not relevant, but in need of correction just the same: all pins revert to inputs when the cog is stopped.

    -Phil
  • Mike Green Posts: 23,101
    edited 2008-02-18 05:18
    Paul,
    It's impossible to be 100% sure with your program as written. You've written it with the potential for a fault ... missing a WAITCNT time. You need to do a timing analysis of the delays introduced by the COGINITs and the debug output or you need to rewrite the code to be independent of the delays introduced by them. Unfortunately, there's no execution time chart for the Spin operators to make it easy. You'll need to do some testing to determine the actual time involved. I'm a firm believer in designing programs to do what they need to do rather than relying on testing. You can't always do that or do it completely, but, to the extent you can do it, it improves your program's reliability and your faith in it.
  • Paul Voss Posts: 13
    edited 2008-02-18 05:28
    Hi Phil,

    I agree waitcnt timing is critical. In this case, though, the cog inits take a few tens of milliseconds against the 2 seconds of the waitcnt, so they are not even close to causing a problem in this instance. Also, the symptom I see (random characters printed continuously to HyperTerminal, or all the cogs freezing forever) is completely different from what a hanging waitcnt would cause. Thanks for your reply.

    Paul
  • Phil Pilgrim (PhiPi) Posts: 23,514
    edited 2008-02-18 05:55
    Paul,

    I'm not a huge fan of using COGINIT in Spin. In my mind, it's too dangerous, presupposes too much, and ought to be banned from the language. My recommendation: just don't use it! Here's a version of your code that uses COGNEW and COGSTOP and seems not to suffer the hangups that occur in the original code:

    ''PROGRAM TO DEMONSTRATE COG CONFLICTS
    
    CON
      _clkmode = xtal1 + pll2x                               ' phase lock loop multiplier for clock
      _xinfreq = 5_000_000                                   ' base clock frequency is 5 MHz
      led      = 16
    
    OBJ
      dbug : "FullDuplexSerial"                        'include serial object for hyperterm
    
    VAR
      long  LedStack[500]                                   'led cog memory
      long  GpsStack[500]                                   'gps cog memory
    
    PUB MAIN | c,i,N,iGPS,iLED
    
     'PROGRAM COUNTS 1,2,3,4,5,6 AND THEN SPITS GARBAGE TO SCREEN OR FREEZES (RUN SEVERAL TIMES)
      
      dbug.start(31,30,%0000,19200)                    'start dbug serial on its own cog
      
      N:=6
      c:=cnt
      iLED := -1
      iGPS := -1
      repeat
        repeat i from 1 to N
          waitcnt(clkfreq*2+c)                         'PROB OCCURS FOR 2 SEC  DELAY, BUT NOT 1 SEC??
          c := cnt                                     'WAITCNT MUST BE BEFORE C:=CNT FOR PROB TO OCCUR
          dbug.dec(i)                                  'prints loop counter
          dbug.tx(10)
          dbug.tx(13)
          case N
            i,1    :
              if (iGPS => 0)
                cogstop(iGPS)
              iGPS := cognew(GPS_COGCODE, @GpsStack)
            i+1,2  :                                   'does nothing, but the delay is critical to problem
          if (iLED => 0)
            cogstop(iLED)
          iLED := cognew(LED_COGCODE, @LedStack)
    
    PRI GPS_COGCODE
      repeat
      'nothing needed here
      
    PRI LED_COGCODE | c                                'FLASH LED ON PIN 27 (DISABLED - SEE BELOW)
      outa[led] := 0
      dira[led] := 1
      repeat
        c := cnt
        outa[led] := 1                                  'outa=0, led off to eliminate power issues
        waitcnt(clkfreq/10+c)
        outa[led] := 0
        waitcnt(clkfreq+c)
    
    
    


    -Phil
  • deSilva Posts: 2,967
    edited 2008-02-18 08:11
    Paul, I have not looked into any of your code; others already have.....

    However the things you describe GENERALLY have one cause only - stack (or other memory) overflows.
    I notice in Phil's posting that you allocate 500 LONGs for them. This is curious. Either you need so much.... so are you sure you do not need even more???

    On the other hand this is considerable space... Are you sure the main COG has still got enough memory? You have no _STACK safety belt instruction in it...


    A second remark: it is generally a better technique to use WAITCNT as it is intended, as "waiting up to a deadline", and to refer to CNT only once, rather than twist it into a "delay" instruction....
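A Python sketch of the difference deSilva describes (time in abstract ticks; the numbers are made up, and Python integers do not wrap the way the real 32-bit CNT does). In Spin the deadline style is usually written as `c := cnt` once before the loop and `waitcnt(c += period)` inside it, so the time spent on the loop body never accumulates as drift.

```python
def deadline_wakeups(period, work, iterations, start=0):
    """Wake-up times for: c := cnt once, then repeat { c += period; waitcnt(c); <work> }.

    CNT is read only once, so each deadline is the previous one plus period
    and the time spent on <work> does not accumulate as drift.
    """
    if work >= period:
        # the next deadline would already have passed; a real WAITCNT would
        # then sit until the 32-bit counter wrapped around
        raise ValueError("work does not fit in the period")
    c = start
    wakeups = []
    for _ in range(iterations):
        c += period
        wakeups.append(c)
    return wakeups


def delay_wakeups(period, work, iterations, start=0):
    """Wake-up times for: repeat { waitcnt(period + cnt); <work> } (a pure delay)."""
    now = start
    wakeups = []
    for _ in range(iterations):
        now += period          # wakes `period` ticks after the call
        wakeups.append(now)
        now += work            # the work pushes the next call later
    return wakeups


print(deadline_wakeups(100, 30, 3))
print(delay_wakeups(100, 30, 3))
```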
  • mirror Posts: 322
    edited 2008-02-18 10:19
    Phil,
    Did you try Paul's first version of code before writing your own? Were you able to reproduce the problem, or did you just write the most likely workaround?
    Why not use coginit - if it is part of the language then why not use it?

  • hippy Posts: 1,981
    edited 2008-02-18 11:02
    mirror said...
    Why not use coginit - if it is part of the language then why not use it?

    It's more error prone than CogNew, requiring that the Cog to be used isn't already in use. With
    the right precautions and checks it is okay. To know which Cogs are available is not always easy
    when using sub-objects, and, once working, hard-wired CogInits may cause program failure if sub-
    objects are changed or more sub-objects are added, or where the code is included as a sub-
    object itself.

    A CogInit stops whatever may be running in that Cog even if essential to the program, and there's
    no feedback on whether the Cog was previously in use or not.

    I wouldn't ban CogInit but would recommend CogNew in preference unless there were compelling
    reasons to use CogInit.

    Post Edited (hippy) : 2/18/2008 11:10:22 AM GMT
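A toy Python model of the hazard hippy describes (not Parallax code; it assumes COGNEW hands out the lowest-numbered free cog, and that cog 0 runs the top-level object at boot). A driver object that grabbed its cog with COGNEW can be silently replaced by a hard-wired COGINIT elsewhere, with no error returned.

```python
class CogTable:
    """Toy model of the Propeller's eight cogs.

    cognew  -> takes the lowest-numbered free cog, returns its id or -1
    coginit -> (re)starts a specific cog, silently replacing whatever ran there
    cogstop -> frees a cog
    """
    NUM_COGS = 8

    def __init__(self):
        self.owner = [None] * self.NUM_COGS
        self.owner[0] = "main"          # cog 0 runs the top object at boot

    def cognew(self, task):
        for cog, current in enumerate(self.owner):
            if current is None:
                self.owner[cog] = task
                return cog
        return -1                       # no free cog

    def coginit(self, cog, task):
        self.owner[cog] = task          # no check, no feedback: the previous task is gone

    def cogstop(self, cog):
        self.owner[cog] = None


cogs = CogTable()
serial = cogs.cognew("serial")          # e.g. dbug.start launching FullDuplexSerial
cogs.coginit(serial, "gps")             # a hard-wired COGINIT that happens to hit the same cog
print(cogs.owner)
```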
  • Martin Hebel Posts: 1,239
    edited 2008-02-18 11:08
    I had tested/recommended issuing at a minimum a cogstop prior to the Coginit (re-init) to Paul, while recommending he post here also. Hippy/all, would that make it a more stable use by ensuring it is shut down first prior to another coginit? Or is there a possibility counters may still be running leading to possible instability?

    -Martin

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    SelmaWare Solutions - StampPlot GUI for controllers, XBee and Propeller Application Boards

    Southern Illinois University Carbondale, Electronic Systems Technologies

    American Technical Educator's Assoc. Conference - April, Biloxi, MS. -- PROPELLER WORKSHOP!
  • hippy Posts: 1,981
    edited 2008-02-18 11:14
    @ Martin : I don't really see any advantage in a CogStop before a CogInit, and wouldn't expect that
    to result in any different stability.

    I personally wouldn't CogStop then CogNew/CogInit unless I had to. I prefer to get the Cogs running
    at the start of the program and then control them by updating 'shared variables', but appreciate that
    may not always be possible.
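A sketch of the "start once, steer via shared variables" pattern hippy prefers, with Python threads standing in for cogs (an analogy only: on the Propeller the shared variable would be a hub VAR that the running cog polls, and the names here are hypothetical).

```python
import threading
import time


class LedWorker:
    """Start the worker once, then change its behaviour through a shared
    variable instead of stopping and relaunching it."""

    def __init__(self):
        self.mode = "off"                  # shared variable: main writes, worker reads
        self.flashes = 0
        self._quit = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        while not self._quit.is_set():
            if self.mode == "blink":
                self.flashes += 1          # stands in for toggling the LED pin
            time.sleep(0.001)

    def shutdown(self):
        self._quit.set()
        self._thread.join()


worker = LedWorker()
worker.mode = "blink"                      # main just updates the variable
time.sleep(0.05)
worker.mode = "off"
worker.shutdown()
```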
  • Phil Pilgrim (PhiPi) Posts: 23,514
    edited 2008-02-18 17:07
    Mirror,

    Yes, I tried his original code and was able to reproduce the problem. I made the least modification possible to exorcize the COGINITs, and that seems to have fixed things. (I added the CON for the LED pin, only because pin 27 isn't available on my proto board.)

    Just because a feature is available is no reason to use it. Many otherwise structured languages provide GOTO, too, but good coding practice disdains its use.
    ___

    Hippy,

    I agree: stopping and restarting a cog is not the best practice. If the interval between stopping and restarting were long enough, I could see it as a way to save power, but that's about it.

    -Phil
  • deSilva Posts: 2,967
    edited 2008-02-18 20:44
    This COGNEW/COGSTOP/COGINIT discussion is not terribly helpful, and a lot of prejudices shine through.
    It is most unlikely that anything discussed above is the cause for the problems.

    Stack-Overflow...
    DeSilva now took a deeper look at the program.. No memory usage of any kind at all...
    Other funny things.. DeSilva had never seen such a CASE construct.... but there is nothing against it in the Manual... However:
    i+1,2  :                                   'does nothing, but the delay is critical
    


    No, that cannot be!!
    Notice that CASE is not used very often... many programmers shy away from this construct, and there have been reports from time to time that it needs an unclear amount of STACK...

    Some experimenting shows that a problem occurs ONLY with a most specific CASE match pattern...
    As soon as you omit ",1" or ",2" everything works fine.
    It also seems to run when ordering the values, i.e. "1,i:" and "2,i+1:"

    Conclusion: There is something weird with complex match patterns, disturbing the stack. This might or might not self-repair in a normal program, but COGINIT after some case labels seems to be very susceptible to it...

    Post Edited (deSilva) : 2/18/2008 8:50:42 PM GMT
  • mirror Posts: 322
    edited 2008-02-18 22:15
    deSilva, the case statement is a little strange in any case, as half the conditions will never be true. N is a constant during program execution, with a value of 6, which will never be equal to 1 or 2. So the original case statement is logically equivalent to the following pseudocode:

    case i
      5 :     ' Do nothing
      6 : CogInit(GPS)
    CogInit(LED)
    
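mirror's reduction can be checked mechanically. Spin's CASE tests its match lists top to bottom and runs only the first arm whose list contains the value; this Python model of Paul's two arms (the branch labels are hypothetical) lists which arm fires for each pass of the loop when N = 6.

```python
def case_branch(N, i):
    """Which arm of Paul's CASE runs for a given N and loop index i.

        case N
          i,1   : coginit(...GPS...)
          i+1,2 : ' nothing
    """
    if N in (i, 1):
        return "gps"
    if N in (i + 1, 2):
        return "nothing"
    return None              # no arm matched; execution continues after the CASE


print([case_branch(6, i) for i in range(1, 7)])
```

Only i = 5 (empty arm) and i = 6 (GPS launch) ever match, exactly as in the pseudocode.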

    When I mentioned that I had a problem some months ago, there was no stopping and starting of cogs but there was a huge case statement used to parse incoming comms messages. So maybe there is something with case statements!?

    Hippy, there is a reason to start and stop cogs, and Paul has identified the situation where it is of most benefit. He is trying to save power by running a slow clock (10 MHz) and by having the minimal number of tasks (cogs) running at a time.

    Phil, I'm not convinced of the error of using coginit. Once again Paul has identified the exact situation where it's needed, for complete control of the cog allocation. Maybe it's possible that there's a problem if a cog is forcibly stopped while in the middle of a wait instruction or some other specific situation!?

    What Paul has so kindly given us is a minimal piece of code which demonstrates the problem. His logic and deduction are superior to have been able to give us such a concise piece of code with which even Phil was able to reproduce the problem (thanks Phil).

    Unfortunately I'm without usable hardware right at this moment. I'm interested in this thread because I've seen the instability Paul has spoken of but in a different set of circumstances. It's not the sort of thing that's going to stop me from enjoying my work with the Propeller, but if we can identify the BEWARE then we will all write better code!
  • Paul Voss Posts: 13
    edited 2008-02-18 22:31
    Thanks all for the good discussion - all your good comments have convinced me that clobbering running cogs with a coginit is not good practice (perhaps allowed, but not a good idea). I have changed my code so that all cogs stop on their own before they are ever hit with a coginit. The code appears to be 100% stable now. Thank you!!

    I will also be careful using the case statement - it is efficient in my real code because there are many possible values of N and hence all checked cases do see action. Nonetheless, if I continue to have any problems, it would be simple enough to replace case with some if-then lines.

    If the code behaves well over the next 24 hours, I will be able to ship the balloons. In a couple weeks, live flight data will be posted on www.science.smith.edu/cmet.

    Thanks to all.

    Paul
  • deSilva Posts: 2,967
    edited 2008-02-18 23:01
    Good to hear!
    But please, mirror et al.: listen to what I posted, not to your prejudices wrt COGINIT, which is a very fine and reliable instruction.

    And yes, there is something with case match patterns :)

    Post Edited (deSilva) : 2/18/2008 11:10:36 PM GMT
  • Phil Pilgrim (PhiPi) Posts: 23,514
    edited 2008-02-18 23:09
    Mirror,

    'Sorry to disagree, but there's no good reason ever to have or want complete control over cog allocation. All the cogs are alike. Why insist on picking one over another? It doesn't save any time, and doing so can be a recipe for disaster, particularly when using third-party objects that spawn their own cogs. The Propeller provides a completely transparent cog allocation mechanism via COGNEW, which returns the cog number for those rare occasions when you need to know it. It's simply the right tool for the job.

    Of course, the question of why Paul's original program fails is still open, and those into Propeller program pathology can perhaps dig up a cause. My approach is more like the doctor in the following conversation:

        Patient: "Doc, it hurts when I use COGINIT."
        Doctor: "Then don't use COGINIT."

    -Phil
  • Paul Baker Posts: 6,351
    edited 2008-02-18 23:16
    While we don't specifically check for it, we strongly recommend that objects on the exchange never use COGINIT, for the very reasons Phil has given. And in general, unless you have a very specific reason for using it, it should be entirely avoided.

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    Paul Baker
    Propeller Applications Engineer

    Parallax, Inc.
  • deSilva Posts: 2,967
    edited 2008-02-18 23:24
    Patient: "Doc, I live under the misconception of being a dog and moving on my four limbs amazes my family."
    Doctor: "Just sit up and beg."
  • deSilva Posts: 2,967
    edited 2008-02-19 01:42
    Sorry, I lost all of my optimism.

    After some more testing it becomes curiouser and curiouser. It HAS TO DO with COGINIT and WAITCNT, but with many things more..... Maybe I shall start reading the bytecode....
    However I still do not think it is an issue of COGINIT but a compiler bug... I can produce the issue with IF as well, not needing a CASE.... But there is NEVER an issue with linear code and simple loops....

    I give up for the moment...
  • OzStamp Posts: 377
    edited 2008-02-19 01:52
    Hi

    It would be good if the creator " herr Chip " could step in here and sniff it out..

    cheers Ron
  • deSilva Posts: 2,967
    edited 2008-02-19 02:27
    This is the smallest piece of code I can produce the problem with. Note that when changing ANYTHING here it will work fine.
    E.g. removing the unused local variable in CHECK or changing the WAIT intervals...
    ''PROGRAM TO DEMONSTRATE COG CONFLICTS
    ' changed by deSilva
    
    CON
      _clkmode = xtal1 + pll8x                
      _xinfreq = 10_000_000     'attention: HYDRA settings                       
    
    OBJ
      dbug : "FullDuplexSerial"                      
    
    VAR                                                     
      long  aStack[100]
      long  bStack[100]
    
    PUB MAIN | c,i ,N
     
      dbug.start(31,30,%0000,19200)         
    
      c:=cnt
      repeat
        repeat i from 1 to 2   
           waitcnt(clkfreq/2+c)                    
           c := cnt
           dbug.dec(N++)                                      
           dbug.tx(10)                                            
    
     ' Remove the following two lines and it works fine for a longtime...
          IF N==10    
               cognew(check,@bStack) 
    
          repeat 3000 ' this will give enough time to settle COGNEW
          coginit(3, check, @aStack) 
          
    
    PRI CHECK | d
      outa[0] := 0
      dira[0] := 1
      repeat                             
        waitcnt(clkfreq/2+cnt)
    

    Post Edited (deSilva) : 2/19/2008 2:33:20 AM GMT
  • mirror Posts: 322
    edited 2008-02-19 02:43
    deSilva said...
    Sorry, I lost all of my optimism.

    After some more testing it becomes curiouser and curiouser. It HAS TO DO with COGINIT and WAITCNT, but with many things more..... Maybe I shall start reading the bytecode....
    However I still do not think it is an issue of COGINIT but a compiler bug... I can produce the issue with IF as well, not needing a CASE.... But there is NEVER an issue with linear code and simple loops....

    I give up for the moment...
    We all hope that it might be a compiler bug . . .

    The other possibility is a bug in the interpreter - but I really really hope not.

    Chip is pretty amazing, but to err is human. It wouldn't be the first chip with errata, so that in itself doesn't bother me. What bothers me is that I possibly stumbled into and back out of it months ago without being able to extract a sufficiently compact piece of code to post to the forum at the time.

  • hippy Posts: 1,981
    edited 2008-02-19 03:15
    I cannot see anything obviously wrong with the bytecode ...

    0020                         PINIT    ALIGN    SPIN 
    
    ====                                  ; PUB MAIN | c,i ,N
    ====                                  ;   dbug.start(31,30,%0000,19200)
    ====                                  ;   c:=cnt
    ====                                  ;   repeat
    ====                                  ;     repeat i from 1 to 2
    ====                                  ;        waitcnt(clkfreq/2+c)
    ====                                  ;        c := cnt
    ====                                  ;        dbug.dec(N++)
    ====                                  ;        dbug.tx(10)
    ====                                  ;  ' Remove the following two lines and it works fine for a longtime...
    ====                                  ;       IF N==10
    ====                                  ;            cognew(check,@bStack)
    ====                                  ;       repeat 3000 ' this will give enough time to settle COGNEW
    ====                                  ;       coginit(3, check, @aStack)
    
                                          ALIGN    STACK          ; For S5 
    
          +0000                           LONG     0              ; Unused Result Variable
          +0004                  VL1      LONG     0
          +0008                  VL2      LONG     0
          +000C                  VL3      LONG     0
    
                                          ALIGN    SPIN 
    
    0020         01              S5       FRAME    CALL WITHOUT RETURN VALUE
    0021         37 24                    PUSH     #$1F
    0023         38 1E                    PUSH     #$1E
    0025         35                       PUSH     #0
    0026         39 4B 00                 PUSH     #19200
    0029         06 03 01                 CALLOBJ  O11, +1
    002C         3F 91                    PUSH     CNT
    002E         65                       POP      VL1
    002F         36              J6       PUSH     #1
    0030         69                       POP      VL2
    0031         35              J7       PUSH     #0
    0032         C0                       PUSH     MEM[]
    0033         37 00                    PUSH     #2
    0035         F6                       DIV
    0036         64                       PUSH     VL1
    0037         EC                       ADD
    0038         23                       WAITCNT
    0039         3F 91                    PUSH     CNT
    003B         65                       POP      VL1
    003C         01                       FRAME    CALL WITHOUT RETURN VALUE
    003D         6E AE                    USING    VL3 PUSH POSTINC
    003F         06 03 09                 CALLOBJ  O11, +9
    0042         01                       FRAME    CALL WITHOUT RETURN VALUE
    0043         38 0A                    PUSH     #10
    0045         06 03 07                 CALLOBJ  O11, +7
    0048         6C                       PUSH     VL3
    0049         38 0A                    PUSH     #10
    004B         FC                       EQ
    004C         0A 07                    JPF      N8
    004E         37 00                    PUSH     #2
    0050         CB 81 90                 PUSH     #L43
    0053         15                       MARK
    0054         2C                       COGISUB
    0055         39 0B B8        N8       PUSH     #3000
    0058         08 02                    LOOPJPF  N10
    005A         09 7E           J9       LOOPRPT  J9
    005C         37 00           N10      PUSH     #2
    005E         43                       PUSH     #L42
    005F         15                       MARK
    0060         37 21                    PUSH     #3
    0062         3F 8F                    PUSH     MEM+15
    0064         37 61                    PUSH     #$FFFFFFFC
    0066         D1                       POP      MEM[][]
    0067         2C                       COGISUB
    0068         36                       PUSH     #1
    0069         37 00                    PUSH     #2
    006B         6A 02 43                 USING    VL2 RPTINCJ J7
    006E         04 FF BE                 GOTO     J6
    0071         32                       RETURN   
    
    
    
  • cgracey Posts: 14,244
    edited 2008-02-19 04:00
    deSilva said...
    This is the smallest piece of code I can produce the problem with. Note that when changing ANYTHING here it will work fine.
    E.g. removing the unused local variable in CHECK or changing the WAIT intervals...
    ''PROGRAM TO DEMONSTRATE COG CONFLICTS
    ' changed by deSilva
    
    CON
      _clkmode = xtal1 + pll8x                
      _xinfreq = 10_000_000     'attention: HYDRA settings                       
    
    OBJ
      dbug : "FullDuplexSerial"                      
    
    VAR                                                     
      long  aStack[100]
      long  bStack[100]
    
    PUB MAIN | c,i ,N
     
      dbug.start(31,30,%0000,19200)         
    
      c:=cnt
      repeat
        repeat i from 1 to 2   
           waitcnt(clkfreq/2+c)                    
           c := cnt
           dbug.dec(N++)                                      
           dbug.tx(10)                                            
    
     ' Remove the following two lines and it works fine for a longtime...
          IF N==10    
               cognew(check,@bStack) 
    
          repeat 3000 ' this will give enough time to settle COGNEW
          coginit(3, check, @aStack) 
          
    
    PRI CHECK | d
      outa[0] := 0
      dira[0] := 1
      repeat                             
        waitcnt(clkfreq/2+cnt)
    

    I believe the problem is due to the (undocumented) fact that when the Spin instructions COGNEW and COGINIT are used to launch other Spin routines (as opposed to just assembly code), a special sequence of Spin bytecodes in ROM is called to build the initial stack frame for the soon-to-be-launched Spin routine. This process takes a little time. After it is completed, the actual COGINIT is executed to kick off the cog with the Spin interpreter pointed to the newly initialized stack frame.

    The blow-up occurs when this new stack frame being built is in the same area that a Spin cog is already working in. This can cause nasty problems, but may not always, making it all the more dangerous.

    I'm confident that if the above example were modified so that the COGINIT used alternating stack areas (not always "@aStack"), there would be no problem, as the new stack frame being built wouldn't already be in active use.

    Also, a COGSTOP before the COGINIT, in this case, would solve this problem. However, it would introduce a new possible problem of allowing other cogs to grab that temporarily stopped cog by a COGNEW of their own, before your own COGINIT would actually execute.

    Perhaps this would be the simplest solution: have the Spin routine that you are referencing in the COGINIT ("check" in this case) consist of nothing but a call to another Spin routine, which would then form the loop, and never return. This would build the stack to a height that would exceed the top-most long being modified by the re-launching COGINIT.

    Basically, relaunching Spin code into an already-being-used stack area is like playing Russian Roulette. You need to either do a COGSTOP first, use a different stack area, or know that the already-active stack is currently at a height which won't mind its bottom being modified.
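A Python sketch of Chip's "alternating stack areas" suggestion (hypothetical names; it models only which buffer gets used, not the frame contents). Each relaunch builds its new frame in the buffer the running instance is not using; in Spin that would roughly mean alternating between coginit(3, check, @aStack) and coginit(3, check, @bStack), which deSilva's example never does.

```python
class Relauncher:
    """Pick the stack buffer for each relaunch of the same Spin routine.

    The new initial frame is always built in the buffer that the instance
    currently running in the cog is NOT using, so the frame-building step
    never writes into live stack memory.
    """

    def __init__(self, stacks=("aStack", "bStack")):
        self.stacks = stacks
        self.live = None               # buffer used by the instance now running

    def relaunch(self):
        new = self.stacks[0] if self.live != self.stacks[0] else self.stacks[1]
        assert new != self.live        # never build a frame on top of live data
        self.live = new                # after COGINIT the cog runs on the new buffer
        return new


r = Relauncher()
print([r.relaunch() for _ in range(4)])
```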

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔


    Chip Gracey
    Parallax, Inc.

    Post Edited (Chip Gracey (Parallax)) : 2/19/2008 4:17:03 AM GMT
  • OzStamp Posts: 377
    edited 2008-02-19 04:08
    Thanks Chip.

    So for all those accusations... there is no compiler error.. no hardware bug..

    No mystery.. just plain facts.. just what was needed
    Thanks Chip.. take care..

    Ron Nollet Mel OZ
  • cgracey Posts: 14,244
    edited 2008-02-19 04:41
    OzStamp said...
    Thanks Chip.

    So for all those accusations... there is no compiler error.. no hardware bug..

    No mystery.. just plain facts.. just what was needed
    Thanks Chip.. take care..

    Ron Nollet Mel OZ
    Ron,

    I hope this is it. DeSilva's example made it easy for me to see.

    Perhaps one of you can confirm the theory. I've just got Propeller II stuff in front of me.

    I could modify the compiler to generate roughly this sequence in response to a COGINIT(cognum, spinroutine, @stack):

      COGINIT(cognum, @asmloop, 0)            'sort of like COGSTOP, but keeps the cog tied up
      COGINIT(cognum, spinroutine, @stack)    'do COGINIT as usual

    DAT

    asmloop   jmp   #0                        'an assembly-language infinite loop

    Can any of you think of any related pitfall scenarios that might still be out there? Would this compiler modification be a good idea? It would only apply to the case of COGINIT being used to launch a Spin routine, and would always burden that sequence with perhaps 10 bytes of code.
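Why "park" rather than COGSTOP: a toy Python model (not Parallax code; it assumes COGNEW takes the lowest free cog and fixes one interleaving of a racing COGNEW from another cog) comparing a plain COGSTOP with the parking sequence Chip proposes. This models the suggested change only, not what the compiler currently emits.

```python
class Cogs:
    """Toy cog table: cognew takes the lowest-numbered free cog."""

    def __init__(self, busy):
        self.owner = {cog: None for cog in range(8)}
        self.owner.update(busy)

    def cognew(self, task):
        for cog in sorted(self.owner):
            if self.owner[cog] is None:
                self.owner[cog] = task
                return cog
        return -1

    def coginit(self, cog, task):
        self.owner[cog] = task

    def cogstop(self, cog):
        self.owner[cog] = None


def relaunch(cogs, cog, task, park, racing_task):
    """Relaunch `task` in `cog` while another cog's COGNEW lands mid-sequence."""
    if park:
        cogs.coginit(cog, "asmloop")   # jmp #0: cog stays allocated, stops touching the stack
    else:
        cogs.cogstop(cog)              # cog is briefly free
    other = cogs.cognew(racing_task)   # the racing COGNEW from elsewhere
    cogs.coginit(cog, task)            # the real Spin launch
    return other


busy = {0: "main", 1: "serial", 2: "gps", 3: "check"}
stopped, parked = Cogs(busy), Cogs(busy)
print(relaunch(stopped, 3, "check", park=False, racing_task="driver"))  # driver lands in cog 3, then is clobbered
print(relaunch(parked, 3, "check", park=True, racing_task="driver"))    # driver gets cog 4 and survives
```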

    And, if any of you can provide examples of problems with CASE, I'm very interested in addressing this. I don't know of any trouble, myself, but a few of you mentioned there might be some issues.

    Thanks.



    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔


    Chip Gracey
    Parallax, Inc.

    Post Edited (Chip Gracey (Parallax)) : 2/19/2008 4:49:14 AM GMT
  • Mike Green Posts: 23,101
    edited 2008-02-19 05:07
    Chip,
    Please leave the compiler as it is (in regard to this case). This is an issue for documentation. This goes under Tricks and Traps and hopefully becomes one of many examples in an "Introduction to Multiprocessing with the Propeller" tutorial.