Help with suspected memory corruption / pointer mishandling?

Thomee · 2011-12-10 12:22

I'm building a board to control several strings of GE ColorEffects Christmas lights. These are LED based lights, with 3 LEDs (RGB) in each bulb, with each bulb individually addressable to control the brightness of each LED. The controller that comes with them from the factory does some nice patterns, but there's tons more potential if you build your own controller, which is what I'm attempting.

The communications protocol for the lights uses bits which are 30us wide, with the first 10us low, the middle 10us high or low depending on the value of the bit, and the last 10us high. This means my code needs to change state every 10us, so driving the pins needs to be done from PASM rather than SPIN code. Each message is 26 bits long, plus a 10us start pulse, plus 30-40us or more between messages, giving a message time of >800us. This is enough time for me to generate the messages in SPIN code, keeping the logic a lot prettier. Based on this, my design is to have a cog running SPIN code which generates messages, putting them in a ring buffer in hub ram. Another cog, running PASM code, monitors that ring buffer and handles actually sending the message when there's one available.

My first stab at things supported driving only a single string of lights, and I got it working fine. However, I want to be able to control several strings of lights from a single Propeller, and don't want to be overly constrained by the number of cogs. The 30-40us inter-message time is plenty for checking several ring buffers for new messages, and the 10us state change time is plenty to update the state of several output pins. So my real plan is to have a single PASM cog which can drive up to 8 strings of lights. This required a significant re-work of my output driver cog, as I could no longer block waiting for new messages to be available in the message buffer.

My new implementation works, sort of. It properly drives a single output pin (haven't yet tested the rest), but I think I'm just getting lucky, and it's not really quite correct. Trivial changes, like adding or removing diagnostic output sent over the serial port, or adding a new SPIN subroutine which is never called, or adding or removing unused variables, tend to break things. I suspect that I'm mishandling a pointer somewhere, and overwriting parts of memory I shouldn't, or reading from uninitialized memory. That would explain why my seemingly irrelevant changes break things; they move where things are laid out in ram, and I no longer get lucky.

Can anyone see anything obviously incorrect in this code? Or can someone suggest some good tools/approaches to trying to find and fix this bug? Alternatively, if I'm doing things the hard or strange way, and there's a more standard way of doing things, I'd be open to suggestions of better approaches to the problem and rewriting. A lot of my internal mental dialog and thought process are in comments at the end of the code; feel free to ignore or tell me I'm crazy ;-) .

''***** 
''*  Color Effects v0.7 
''*  Author: Thomee Wright 
''*  Copyright (c) 2011 
''*  All rights reserved 
''***** 
 
' v0.1 - 2011/09/22 - Initial version 
' v0.2 - 2011/09/26 - Basic light comms and reading from hub mem 
' v0.5 - 2011/09/30 - Full control of a single string running a single pattern 
' v0.7 - 2011/12/06 - Initial implementation of multi-string support sort of working 
 
 
CON 
 
  _clkmode = xtal1 + pll16x 
  _xinfreq = 5_000_000 
 
  ' P0..15 available; P16..23 VGA; P24..27 kb/mouse; P28..31 boot eeprom/serial 
  LightPin = 0 
  MsgBufLen = 100 
  StringCount = 8 
 
 
VAR 
 
  long MsgTmp 
  byte Buf, Bulb, Phase, Color 
  byte Blue, Green, Red 
  byte i 
 
  long t1, t2, t3 
 
 
' 
' 
' Arrays for passing data to ColorEffects driver 
' 
  long BufAddrs[8], BufEnds[8], BufHeads[8], BufTails[8], Messages[8] 
  long Buf0[MsgBufLen], Buf1[MsgBufLen], Buf2[MsgBufLen], Buf3[MsgBufLen] 
  long Buf4[MsgBufLen], Buf5[MsgBufLen], Buf6[MsgBufLen], Buf7[MsgBufLen] 
  'long MsgBuf[MsgBufLen * StringCount] 
    
 
OBJ 
  term : "Parallax Serial Terminal" 
 
 
PUB start 
 
  ' start serial terminal 
  term.start(115200) 
  waitcnt(clkfreq * 5 + cnt) 
  term.str(string("Start", $0D)) 
 
  ' we need to set our pin as an output and output a 0 
  ' so we pull it low when the driver is idling on it 
  dira := $0FF                  ' ouput on first 8 pins 
  outa := $00                   ' set them all low 
 
  term.str(string("Prepping for messages", $0D)) 
  BufAddrs[0] := @Buf0 
  BufAddrs[1] := @Buf1 
  BufAddrs[2] := @Buf2 
  BufAddrs[3] := @Buf3 
  BufAddrs[4] := @Buf4 
  BufAddrs[5] := @Buf5 
  BufAddrs[6] := @Buf6 
  BufAddrs[7] := @Buf7 
  repeat Buf from 0 to 7 
    'BufAddrs[Buf] := @MsgBuf[MsgBufLen * Buf] 
    Messages[Buf] := 0 
    repeat Bulb from 0 to 49 
      MsgTmp := 0 
      BYTE[@MsgTmp][3] := Bulb 
      BYTE[@MsgTmp][2] := $CC 
      BYTE[@MsgTmp][1] := $F0 
      LONG[BufAddrs[Buf]][Bulb] := MsgTmp 
    BufHeads[Buf] := BufAddrs[Buf] + 200 ' 50*4bytes (next space to write) 
    BufTails[Buf] := BufAddrs[Buf] + MsgBufLen * 4 
    BufEnds[Buf] := BufAddrs[Buf] + MsgBufLen * 4 
    term.hex(Buf, 1) 
    term.char(" ") 
    term.hex(BufAddrs[Buf], 8) 
    term.char(":") 
    term.hex(BufEnds[Buf], 8) 
    term.char($0D) 
 
{ 
  Pads[0] := @Pad0   
  Pads[1] := @Pad1   
  Pads[2] := @Pad2   
  Pads[3] := @Pad3   
  Pads[4] := @Pad4   
  Pads[5] := @Pad5   
  Pads[6] := @Pad6 
  }   
      
  term.str(string("Prepping DAT area", $0D)) 
  t_bit := 800                  ' length of bit part in clock counts 
  t_msg := 3200                 ' inter-message delay in clock counts 
  start_pin := 0                ' which output pin to start at                                 
  buf_addrs := @BufAddrs        ' Addresses of message queues 
  buf_ends := @BufEnds          ' Ends of message queues 
  buf_hptrs := @BufHeads        ' Array of head pointers (1 per queue) 
  buf_tptrs := @BufTails        ' Array of tail pointers (1 per queue) 
  buf_cnt := 1                  ' Count of queues (1-8) 
  msg_ptrs := @Messages         ' Array of buffers for driver (1 per queue); we never read or write here 
 
  term.str(string("Starting lights cog", $0D)) 
  cognew(@lights, 0) 
 
  term.str(string("Starting pattern generator", $0D)) 
  Walk1 
  repeat 
 
 
PRI Walk1 
  i := 0 
  repeat 
    repeat Phase from 0 to 7 
      repeat Bulb from 0 to 49 
        MsgTmp := 0 
        BYTE[@MsgTmp][3] := Bulb 
        BYTE[@MsgTmp][2] := $CC 
        Color := Bulb + Phase 
        if Color & $01 
          Byte[@MsgTmp][1] |= $F0 
        if Color & $02 
          Byte[@MsgTmp][1] |= $0F 
        if Color & $04 
          Byte[@MsgTmp][0] |= $F0 
        repeat while BufHeads[0] + 4 == BufTails[0] 
          { 
          term.hex(BufHeads[0], 8) 
          term.char("-") 
          term.hex(BufTails[0], 8) 
          term.char("(") 
          term.hex(i++, 2) 
          term.char(")") 
          term.char($0D) 
          } 
        LONG[BufHeads[0]] := MsgTmp 
        MsgTmp := BufHeads[0] + 4 
        if (MsgTmp == BufEnds[0]) 
          MsgTmp := BufAddrs[0] 
        BufHeads[0] := MsgTmp 
      repeat Red from 1 to 8 
        waitcnt(clkfreq / 10 + cnt)         
          'term.hex(BufHeads[0], 8) 
          'term.char(":") 
          'term.hex(BufTails[0], 8) 
          { 
          repeat i from 0 to 6 
            if LONG[Pads[i]] <> 0 
              term.str(string("Pad")) 
              term.dec(i) 
              term.char(":") 
              term.hex(LONG[Pads[i]], 8) 
              } 
          
 
DAT 
 
' 
' 
' Data tables for pattern generation 
' 
Gamma                   byte    $00, $00, $01, $01, $02, $02, $03, $03,  $04, $05, $06, $07, $09, $0B, $0D, $0F                             
GammaHigh               byte    $00, $00, $10, $10, $20, $20, $30, $30, $40, $50, $60, $70, $90, $B0, $D0, $F0 
 
 
DAT 
 
'******************************************************* 
'* Assembly language GE ColorEffects G-35 light driver * 
'******************************************************* 
 
                        org     0 
lights                  ' first, one time setup 
                        mov     time, cnt               ' get current sys counter 
                        add     time, t_msg             ' make sure we  have at least one full inter-message time before we start 
 
' inter message time; prep work for each string 
' future: handle pending control messages like wait 
' fetch head and tail, see if there's work to do 
' fetch next message if relevant 
' save message - for control messages 
' turn each string's output on or off as appropriate 
:sendmsg 
                        mov     strcnt, buf_cnt         ' we'll do prep work for each string 
                        mov     shiftbuf_addr, #shiftbuf' get address of first shiftbuf 
                        mov     str_bit, #$01           ' first string is at lowest order bit 
                        mov     buf_addr_ptr, buf_addrs 
                        mov     buf_end_ptr, buf_ends 
                        mov     buf_hptr, buf_hptrs 
                        mov     buf_tptr, buf_tptrs 
                        mov     msg_ptr, msg_ptrs 
                        mov     dirb, #0                ' we stage in dirb; disable all 
:prep_string 
                        rdlong  message, msg_ptr        ' fetch old message (for future use with control messages like wait) 
                        test    message, ctlmsg_flag wc ' if C flag is set, it's a control message 
              if_c      jmp     #:ctl_msg               ' if it's a control message, handle it differently 
                        rdlong  buf_addr, buf_addr_ptr  ' get the address of the buffer for the current string 
                        add     buf_addr_ptr, #4        ' increment for the next string 
                        movd    :mov_shiftbuf, shiftbuf_addr ' modify instruction below to store to appropriate shiftbuf 
                        rdlong  buf_end, buf_end_ptr    ' get the end ptr 
                        add     buf_end_ptr, #4         ' increment end_ptr to next string           
                        add     shiftbuf_addr, #1       ' point to next string's shiftbuf 
                        rdlong  buf_tail, buf_tptr      ' get the tail ptr 
                        cmp     buf_tail, buf_end   wz  ' check if we need to wrap (See Note 1) 
              if_z      mov     buf_tail, buf_addr      ' wrap to start if necessary 
                        rdlong  buf_head, buf_hptr      ' get the head ptr for the current string 
                        cmp     buf_head, buf_tail  wz  ' check if there's a message ready; if z flag is set, there isn't 
              if_z      jmp     #:msgready              ' output is  disabled by default, just don't enable                         
                        rdlong  message, buf_tail       ' fetch next message 
                        add     buf_tail, #4            ' increment our ring buffer pointer 
                        test    message, ctlmsg_flag wc ' if C flag is set, it's a control message 
                        wrlong  buf_tail, buf_tptr      ' update tail ptr in hub RAM so generator can write more (See Note 1) 
              if_nc     or      dirb, str_bit           ' if not control, enable output for this string 
              if_nc     jmp     #:msgready              ' if not control, jump below 
:ctl_msg 
                        ' stuff to do if it's a control message                         
:msgready 
                        wrlong  message, msg_ptr        ' write our message back (for future use; see above) 
                        add     msg_ptr, #4             ' update msg_ptr for next string 
                        add     buf_tptr, #4            ' update tptr for next string 
                        add     buf_hptr, #4            ' update hptr for next string 
                        shl     str_bit, #1 
:mov_shiftbuf           mov     shiftbuf, message       ' move the  message to send into correct shift buffer (address was set above) 
                        djnz    strcnt, #:prep_string   ' prep the next string, if any 
 
                        shl     dirb, start_pin         ' shift output mask over to appropriate pin 
                        mov     dira, dirb              ' set output pins 
                         
                        shl     shiftbuf+0, #2          ' shift out unused top 2 bits of address 
                        shl     shiftbuf+1, #2 
                        shl     shiftbuf+2, #2 
                        shl     shiftbuf+3, #2 
                        shl     shiftbuf+4, #2 
                        shl     shiftbuf+5, #2 
                        shl     shiftbuf+6, #2 
                        shl     shiftbuf+7, #2 
                        waitcnt time, t_bit             ' wait till the specified time, then add delay to our counter 
                        nop                             ' for consistent  timing, always one instruction between waitcnt and mov outa, ... 
                        mov     outa, #$01              ' message start bitpart - set pin 0 high 
 
' message start time 
' prep to loop through all bits 
                        mov     bitcnt, #26             ' 26 bits in a message                         
:sendbit                waitcnt time, t_bit             ' wait (either inter-message time or end bit part of last bit) 
                        nop                             ' for consistent  timing, always one instruction between waitcnt and mov outa, ... 
                        mov     outa, #$00              ' bit start bitpart - set pin 0 low 
 
' start of bit time 
' shift top bit out of each shiftbuf 
' stage output pattern in outb 
                        mov     outb, #0                ' default to all outputs low 
                        shl     shiftbuf+0, #1  wc      ' shift out top bit, put it in the carry flag                         
              if_nc     or      outb, #$001             ' if we're  sending a 0 bit, set our pin high  now                                        
                        shl     shiftbuf+1, #1  wc                         
              if_nc     or      outb, #$002                                        
                        shl     shiftbuf+2, #1  wc      ' looping  through these takes 9 instructions plus                          
              if_nc     or      outb, #$004             ' 1 register.  Hard coding it takes 16  instructions                                        
                        shl     shiftbuf+3, #1  wc      ' and no  registers. But the hard coded version takes                          
              if_nc     or      outb, #$008             ' 16*4 clock  cycles to run, while the loop  version                                        
                        shl     shiftbuf+4, #1  wc      ' takes  (4+strings*6)*4 clock cycles to run. For more                         
              if_nc     or      outb, #$010             ' than 2  strings, the hard coded version is  faster.                                        
                        shl     shiftbuf+5, #1  wc      ' The loop version is also self-modifying code                         
              if_nc     or      outb, #$020                                        
                        shl     shiftbuf+6, #1  wc                         
              if_nc     or      outb, #$040                                        
                        shl     shiftbuf+7, #1  wc 
              if_nc     or      outb, #$080                                        
                        waitcnt time, t_bit 
                        shl     outb, start_pin         ' shift output bits over to appropriate pin 
                        mov     outa, outb              ' middle bitpart - output our pattern 
 
{ 
' loop version of calculating output pattern 
                        mov     foo, #shiftbuf 
                        mov     str_bit, #$001 
                        mov     strcnt, buf_cnt 
:x                      movd    :y, foo 
                        add     foo, #1 
:y                      shl     shiftbuf, #1    wc 
              if_nc     or      outb, bar 
                        shl     str_bit, #1 
                        djnz    buf_cnt, :x 
}                           
 
' 
' middle of bit time 
' stage output (all high) in outb 
                        mov     outb, $0FF              ' output high on all 8 pins (as enabled) 
                        shl     outb, start_pin         ' shift output bits over to appropriate pin                         
                        waitcnt time, t_bit 
                        nop                             ' always one instruction, yadda, yadda 
                        mov     outa, outb              ' bit end bitpart - set pins high 
 
' end of bit time 
' check if there are more bits to send                         
                        djnz    bitcnt, #:sendbit       ' if there are still more bits, go back up and do the next one 
                        waitcnt time, t_msg             ' finish out our last bit, prep for inter-message delay 
                        nop                             ' 1 inst, blah, blah 
                        mov     outa, #$00              ' inter-message delay - set pins low 
                        jmp     #:sendmsg               ' get ready for the next message 
 
:end                    jmp     #:end                   ' shouldn't ever get here, but this keeps us from running wild 
 
 
 
' 
' 
' Initialized data 
' 
t_bit                   long    0 
t_msg                   long    0 
start_pin               long    0 
buf_cnt                 long    0 
buf_addrs               long    0 
buf_ends                long    0 
buf_hptrs               long    0 
buf_tptrs               long    0 
msg_ptrs                long    0 
 
' 
' 
' Constants 
' 
ctlmsg_flag             long    $80_00_00_00 
 
 
' 
' 
' Uninitialized data 
' 
time                    res     1 
strcnt                  res     1 
bitcnt                  res     1 
str_bit                 res     1 
buf_tail                res     1 
buf_head                res     1 
message                 res     1 
 
shiftbuf                res     8 
shiftbuf_addr           res     1 
 
buf_addr_ptr            res     1 
buf_end_ptr             res     1 
buf_addr                res     1 
buf_end                 res     1 
buf_hptr                res     1 
buf_tptr                res     1 
msg_ptr                 res     1 
 
 
' 
' 
' End of everything; make sure we're not too big 
' 
                        fit 
 
 
{ 
Note 1: 
We'll compare to buf_end to see if we need to wrap later. This means when it's 
time to wrap, we'll write buf_end, rather than buf_addr, to the tail pointer. 
This isn't a big deal as long as the generator knows to handle things the same way, 
and this makes the generator easier to write, as it can compare head+4 to tail without 
worrying about wrapping to see if it's safe to write more. 
} 

  
{ 
  multi string strategy 
  Unfortunately, we can't do things like buf_tptr[idx] or message[idx] in assembly, 
  as there's no way to do that sort of indirection. This makes looping through all 
  our strings to fetch the next message challenging. Luckily, if we handle shiftbuf 
  differently, there's only one section of code where we need to handle this: from 
  :sendmsg to the mov at :msgready. If we load temp versions of buf_addr, buf_end, 
  buf_hptr, buf_tptr, and message indexed from the arrays somehow, and write back 
  message from its temp copy at the end of that block, then we can loop over that 
  block using the temps and all will be good. For sendbit, we can hard code for all 
  eight at the point we shift and or into our output byte. This just leaves the 
  question of how to copy from our arrays into the temps. There are two pieces to 
  accomplishing this: 
  First, write a subroutine callable by call or jmpret which copies the first 
  element of each array into the temps, and label each line. 
  Second, write another subroutine which uses an index variable as a parameter 
    For each array, this routine stores the address of the array into a temp var 
    Next it adds the index parameter to the temp var, giving the address of the 
    appropriate value in the array. 
    Finally, the modified address is written into the appropriate instruction 
    in the first subroutine so it loads from the correct index, rather than 0. 
    This can be accomplished with MOVS and all the labels in the first subroutine 
  Now, we loop through the block which needs indexes, and at the start of each loop, 
  first call the rewriter subroutine passing the index as a parameter, then call the 
  copy to temp subroutine which will fetch the indexed values into the temp fields, 
  then operate on the temps. Finally, the temp of message can be written back in 
  a similar manner. 
  Actually, this could probably all be rolled into the block without the use of 
  subroutines: the block would have 4 sections: 
    1 - calculate indexed addresses and write to instructions in block 2 and 4 
    2 - copy from indexed addresses to temp variables 
    3 - current block contents, operating on temps 
    4 - write back necessary temps 
  REMEMBER!!: if an instruction modifies the instruction which immediately follows 
  it, then when the modified instruction executes, the old version will be run, because 
  it was already fetched into the pipeline before the new version was written back. 
  This is why parts 1 and 2 in the outline above aren't interleaved 
 
  While looping through that block, the value to write to DIRA should be staged in 
  DIRB, which is unused on our chip, then shifted approprieately and copied to DIRA 
  at the end.  The same shoudl be done when shifting out of our shiftbufs: the pattern 
  to output should be staged in OUTB, then shifted and copied to OUTA at the end 
 
  Another alternative: fetching from main mem can be accomplished in the time of 2 
  instructions, once we're sync'd with the hub. This is as fast as the self-modifying 
  code idea above, saves cog memory, and is probably more readable, and incrementing 
  an address through main RAM is easy.    
} 
{ 
Thoughts: 
One cog will drive up to 8 contiguous pins 
Driver cog will pull messages from hub ram, one message to a word 
  address in top top byte, brightness in second byte, color in bottom word 
  2 free bits in address allow for special message types; ideas: 
    turn off output for that pin for the next n message times 
      either need logic to constantly check whether the pin is disabled, 
      or disable output on that pin, but this requries another cog to have the 
      pins enabled and set low, or a pull down resistor 
        possible overall design: 
          cog 0: manager 
            spin 
            keeps all color effects pins pulled low 
            handles status output, command input, etc 
            in charge of starting/stopping pattern generators, color effects drivers 
          cog 1-4: drivers 
          this leaves 3 cogs as generators or  
    halt progression through the fifo 
      pattern generator could use this as a stop: fill the fifo past here, then rewrite 
      this location to allow progress again 
    noop 
    jump to new location 
    auto increment address - requires saving prev address 
    using jmpret can allow ongoing processing while looping through bits to 
      handle some of this overhead 
    wait - waits until the driver is signalled through some other means 
      then all waiting ringbufs resume 
Message words in main mem will be in ring buffers 
messages to lights can terminate after brightness when sending broadcast 
  allow driver to send this message type? 
  would only work if all strings being controlled were doing a truncated broadcast 
  at the same time - for coordinated strings being generated by a single generator, 
  this may not be unrealistic 
 
timing: bitpart means the time for 1/3 of a bit, 10us on factory controller 
as long as timing is accurate and consistent, bitpart times between 5us and 
15us have been reported to work 
10us gives 800 clock cycles, or ~200 instructions, per bitpart 
we need at least 3 bitparts between messages, during which time we're outputting 
low whether there's another message coming or not. This gives us time to fetch the 
next message from each of our ring buffers and disable output if necessary. Timing 
here may not be as critical. It's certainly safe to stretch this at a bitpart of 
10us, but may be more important at other timescales. 
Start bit lasts for 1 bitpart and should give time to prep messages as necessary 
each bit lasts for 3 bitparts: 
  first bitpart is always low 
    output 0 
    examine bit for each output pin, construct output byte for second bitpart 
  second bitpart depends on bit 
    output byte constructed during first bitpart 
    shift message words 
  third bitpart is always high 
    output 1 
    determine if message is done 
    jump back to first bitpart or 
    fall through to inter-message time 
 
launching: 
  spin code accessing data in dat blocks modifies hub memory 
  launching assembly copies hub memory to cog memory 
  hub memory is effectively a template for cog memory 
  spin code can modify this template prior to launching 
    newly launched cogs will receive the modifications 
    already running cogs are unaffected 
  spin launch code can fill in bitpart time, ringbuf addresses, output 
    masks (or number of output pins and start pin?), etc, then launch 
    a new assembly cog with the modified code filled in 
 
mechanics: 
  shl can put shifted out bit in C flag 
  this can be used with optional execution for putting the bit into the output byte 
  maybe use muxc to conditionally set bit in output byte? 
  can use outb and dirb to build values before writing 
 
communication: 
  driver needs to know whether continuing through ring buf is valid; pattern 
  generator needs to know when it's safe to overwrite old data in the buffer 
  I don't think I can get away from needing a head and tail pointer for each 
  buffer. Data is read from the tail and written to the head. Headptr is only 
  ever written by generator cog, so it doesn't need to read it, just write it 
  every time. It only needs to read tail when it catches up to the last value 
  it read, so just go into a spin lock until the tail moves; if it'd already 
  moved since the last time we read it, we just hit the loop once and move on 
  with the up to date value. The converse goes for the driver cog reading from 
  and moving the tail.  
}