Help with suspected memory corruption / pointer mishandling?
Thomee
Posts: 1
I'm building a board to control several strings of GE ColorEffects Christmas lights. These are LED based lights, with 3 LEDs (RGB) in each bulb, with each bulb individually addressable to control the brightness of each LED. The controller that comes with them from the factory does some nice patterns, but there's tons more potential if you build your own controller, which is what I'm attempting.
The communications protocol for the lights uses bits which are 30us wide, with the first 10us low, the middle 10us high or low depending on the value of the bit, and the last 10us high. This means my code needs to change state every 10us, so driving the pins needs to be done from PASM rather than SPIN code. Each message is 26 bits long, plus a 10us start pulse, plus 30-40us or more between messages, giving a message time of >800us. This is enough time for me to generate the messages in SPIN code, keeping the logic a lot prettier. Based on this, my design is to have a cog running SPIN code which generates messages, putting them in a ring buffer in hub ram. Another cog, running PASM code, monitors that ring buffer and handles actually sending the message when there's one available.
My first stab at things supported driving only a single string of lights, and I got it working fine. However, I want to be able to control several strings of lights from a single Propeller, and don't want to be overly constrained by the number of cogs. The 30-40us inter-message time is plenty for checking several ring buffers for new messages, and the 10us state change time is plenty to update the state of several output pins. So my real plan is to have a single PASM cog which can drive up to 8 strings of lights. This required a significant re-work of my output driver cog, as I could no longer block waiting for new messages to be available in the message buffer.
My new implementation works, sort of. It properly drives a single output pin (haven't yet tested the rest), but I think I'm just getting lucky, and it's not really quite correct. Trivial changes, like adding or removing diagnostic output sent over the serial port, or adding a new SPIN subroutine which is never called, or adding or removing unused variables, tend to break things. I suspect that I'm mishandling a pointer somewhere, and overwriting parts of memory I shouldn't, or reading from uninitialized memory. That would explain why my seemingly irrelevant changes break things; they move where things are laid out in ram, and I no longer get lucky.
Can anyone see anything obviously incorrect in this code? Or can someone suggest some good tools/approaches to trying to find and fix this bug? Alternatively, if I'm doing things the hard or strange way, and there's a more standard way of doing things, I'd be open to suggestions of better approaches to the problem and rewriting. A lot of my internal mental dialog and thought process are in comments at the end of the code; feel free to ignore or tell me I'm crazy ;-) .
The communications protocol for the lights uses bits which are 30us wide, with the first 10us low, the middle 10us high or low depending on the value of the bit, and the last 10us high. This means my code needs to change state every 10us, so driving the pins needs to be done from PASM rather than SPIN code. Each message is 26 bits long, plus a 10us start pulse, plus 30-40us or more between messages, giving a message time of >800us. This is enough time for me to generate the messages in SPIN code, keeping the logic a lot prettier. Based on this, my design is to have a cog running SPIN code which generates messages, putting them in a ring buffer in hub ram. Another cog, running PASM code, monitors that ring buffer and handles actually sending the message when there's one available.
My first stab at things supported driving only a single string of lights, and I got it working fine. However, I want to be able to control several strings of lights from a single Propeller, and don't want to be overly constrained by the number of cogs. The 30-40us inter-message time is plenty for checking several ring buffers for new messages, and the 10us state change time is plenty to update the state of several output pins. So my real plan is to have a single PASM cog which can drive up to 8 strings of lights. This required a significant re-work of my output driver cog, as I could no longer block waiting for new messages to be available in the message buffer.
My new implementation works, sort of. It properly drives a single output pin (haven't yet tested the rest), but I think I'm just getting lucky, and it's not really quite correct. Trivial changes, like adding or removing diagnostic output sent over the serial port, or adding a new SPIN subroutine which is never called, or adding or removing unused variables, tend to break things. I suspect that I'm mishandling a pointer somewhere, and overwriting parts of memory I shouldn't, or reading from uninitialized memory. That would explain why my seemingly irrelevant changes break things; they move where things are laid out in ram, and I no longer get lucky.
Can anyone see anything obviously incorrect in this code? Or can someone suggest some good tools/approaches to trying to find and fix this bug? Alternatively, if I'm doing things the hard or strange way, and there's a more standard way of doing things, I'd be open to suggestions of better approaches to the problem and rewriting. A lot of my internal mental dialog and thought process are in comments at the end of the code; feel free to ignore or tell me I'm crazy ;-) .
''***** ''* Color Effects v0.7 ''* Author: Thomee Wright ''* Copyright (c) 2011 ''* All rights reserved ''***** ' v0.1 - 2011/09/22 - Initial version ' v0.2 - 2011/09/26 - Basic light comms and reading from hub mem ' v0.5 - 2011/09/30 - Full control of a single string running a single pattern ' v0.7 - 2011/12/06 - Initial implementation of multi-string support sort of working CON _clkmode = xtal1 + pll16x _xinfreq = 5_000_000 ' P0..15 available; P16..23 VGA; P24..27 kb/mouse; P28..31 boot eeprom/serial LightPin = 0 MsgBufLen = 100 StringCount = 8 VAR long MsgTmp byte Buf, Bulb, Phase, Color byte Blue, Green, Red byte i long t1, t2, t3 ' ' ' Arrays for passing data to ColorEffects driver ' long BufAddrs[8], BufEnds[8], BufHeads[8], BufTails[8], Messages[8] long Buf0[MsgBufLen], Buf1[MsgBufLen], Buf2[MsgBufLen], Buf3[MsgBufLen] long Buf4[MsgBufLen], Buf5[MsgBufLen], Buf6[MsgBufLen], Buf7[MsgBufLen] 'long MsgBuf[MsgBufLen * StringCount] OBJ term : "Parallax Serial Terminal" PUB start ' start serial terminal term.start(115200) waitcnt(clkfreq * 5 + cnt) term.str(string("Start", $0D)) ' we need to set our pin as an output and output a 0 ' so we pull it low when the driver is idling on it dira := $0FF ' ouput on first 8 pins outa := $00 ' set them all low term.str(string("Prepping for messages", $0D)) BufAddrs[0] := @Buf0 BufAddrs[1] := @Buf1 BufAddrs[2] := @Buf2 BufAddrs[3] := @Buf3 BufAddrs[4] := @Buf4 BufAddrs[5] := @Buf5 BufAddrs[6] := @Buf6 BufAddrs[7] := @Buf7 repeat Buf from 0 to 7 'BufAddrs[Buf] := @MsgBuf[MsgBufLen * Buf] Messages[Buf] := 0 repeat Bulb from 0 to 49 MsgTmp := 0 BYTE[@MsgTmp][3] := Bulb BYTE[@MsgTmp][2] := $CC BYTE[@MsgTmp][1] := $F0 LONG[BufAddrs[Buf]][Bulb] := MsgTmp BufHeads[Buf] := BufAddrs[Buf] + 200 ' 50*4bytes (next space to write) BufTails[Buf] := BufAddrs[Buf] + MsgBufLen * 4 BufEnds[Buf] := BufAddrs[Buf] + MsgBufLen * 4 term.hex(Buf, 1) term.char(" ") term.hex(BufAddrs[Buf], 8) term.char(":") term.hex(BufEnds[Buf], 8) term.char($0D) { Pads[0] := @Pad0 Pads[1] := @Pad1 Pads[2] := @Pad2 Pads[3] := @Pad3 Pads[4] := @Pad4 Pads[5] := @Pad5 Pads[6] := @Pad6 } term.str(string("Prepping DAT area", $0D)) t_bit := 800 ' length of bit part in clock counts t_msg := 3200 ' inter-message delay in clock counts start_pin := 0 ' which output pin to start at buf_addrs := @BufAddrs ' Addresses of message queues buf_ends := @BufEnds ' Ends of message queues buf_hptrs := @BufHeads ' Array of head pointers (1 per queue) buf_tptrs := @BufTails ' Array of tail pointers (1 per queue) buf_cnt := 1 ' Count of queues (1-8) msg_ptrs := @Messages ' Array of buffers for driver (1 per queue); we never read or write here term.str(string("Starting lights cog", $0D)) cognew(@lights, 0) term.str(string("Starting pattern generator", $0D)) Walk1 repeat PRI Walk1 i := 0 repeat repeat Phase from 0 to 7 repeat Bulb from 0 to 49 MsgTmp := 0 BYTE[@MsgTmp][3] := Bulb BYTE[@MsgTmp][2] := $CC Color := Bulb + Phase if Color & $01 Byte[@MsgTmp][1] |= $F0 if Color & $02 Byte[@MsgTmp][1] |= $0F if Color & $04 Byte[@MsgTmp][0] |= $F0 repeat while BufHeads[0] + 4 == BufTails[0] { term.hex(BufHeads[0], 8) term.char("-") term.hex(BufTails[0], 8) term.char("(") term.hex(i++, 2) term.char(")") term.char($0D) } LONG[BufHeads[0]] := MsgTmp MsgTmp := BufHeads[0] + 4 if (MsgTmp == BufEnds[0]) MsgTmp := BufAddrs[0] BufHeads[0] := MsgTmp repeat Red from 1 to 8 waitcnt(clkfreq / 10 + cnt) 'term.hex(BufHeads[0], 8) 'term.char(":") 'term.hex(BufTails[0], 8) { repeat i from 0 to 6 if LONG[Pads[i]] <> 0 term.str(string("Pad")) term.dec(i) term.char(":") term.hex(LONG[Pads[i]], 8) } DAT ' ' ' Data tables for pattern generation ' Gamma byte $00, $00, $01, $01, $02, $02, $03, $03, $04, $05, $06, $07, $09, $0B, $0D, $0F GammaHigh byte $00, $00, $10, $10, $20, $20, $30, $30, $40, $50, $60, $70, $90, $B0, $D0, $F0 DAT '******************************************************* '* Assembly language GE ColorEffects G-35 light driver * '******************************************************* org 0 lights ' first, one time setup mov time, cnt ' get current sys counter add time, t_msg ' make sure we have at least one full inter-message time before we start ' inter message time; prep work for each string ' future: handle pending control messages like wait ' fetch head and tail, see if there's work to do ' fetch next message if relevant ' save message - for control messages ' turn each string's output on or off as appropriate :sendmsg mov strcnt, buf_cnt ' we'll do prep work for each string mov shiftbuf_addr, #shiftbuf' get address of first shiftbuf mov str_bit, #$01 ' first string is at lowest order bit mov buf_addr_ptr, buf_addrs mov buf_end_ptr, buf_ends mov buf_hptr, buf_hptrs mov buf_tptr, buf_tptrs mov msg_ptr, msg_ptrs mov dirb, #0 ' we stage in dirb; disable all :prep_string rdlong message, msg_ptr ' fetch old message (for future use with control messages like wait) test message, ctlmsg_flag wc ' if C flag is set, it's a control message if_c jmp #:ctl_msg ' if it's a control message, handle it differently rdlong buf_addr, buf_addr_ptr ' get the address of the buffer for the current string add buf_addr_ptr, #4 ' increment for the next string movd :mov_shiftbuf, shiftbuf_addr ' modify instruction below to store to appropriate shiftbuf rdlong buf_end, buf_end_ptr ' get the end ptr add buf_end_ptr, #4 ' increment end_ptr to next string add shiftbuf_addr, #1 ' point to next string's shiftbuf rdlong buf_tail, buf_tptr ' get the tail ptr cmp buf_tail, buf_end wz ' check if we need to wrap (See Note 1) if_z mov buf_tail, buf_addr ' wrap to start if necessary rdlong buf_head, buf_hptr ' get the head ptr for the current string cmp buf_head, buf_tail wz ' check if there's a message ready; if z flag is set, there isn't if_z jmp #:msgready ' output is disabled by default, just don't enable rdlong message, buf_tail ' fetch next message add buf_tail, #4 ' increment our ring buffer pointer test message, ctlmsg_flag wc ' if C flag is set, it's a control message wrlong buf_tail, buf_tptr ' update tail ptr in hub RAM so generator can write more (See Note 1) if_nc or dirb, str_bit ' if not control, enable output for this string if_nc jmp #:msgready ' if not control, jump below :ctl_msg ' stuff to do if it's a control message :msgready wrlong message, msg_ptr ' write our message back (for future use; see above) add msg_ptr, #4 ' update msg_ptr for next string add buf_tptr, #4 ' update tptr for next string add buf_hptr, #4 ' update hptr for next string shl str_bit, #1 :mov_shiftbuf mov shiftbuf, message ' move the message to send into correct shift buffer (address was set above) djnz strcnt, #:prep_string ' prep the next string, if any shl dirb, start_pin ' shift output mask over to appropriate pin mov dira, dirb ' set output pins shl shiftbuf+0, #2 ' shift out unused top 2 bits of address shl shiftbuf+1, #2 shl shiftbuf+2, #2 shl shiftbuf+3, #2 shl shiftbuf+4, #2 shl shiftbuf+5, #2 shl shiftbuf+6, #2 shl shiftbuf+7, #2 waitcnt time, t_bit ' wait till the specified time, then add delay to our counter nop ' for consistent timing, always one instruction between waitcnt and mov outa, ... mov outa, #$01 ' message start bitpart - set pin 0 high ' message start time ' prep to loop through all bits mov bitcnt, #26 ' 26 bits in a message :sendbit waitcnt time, t_bit ' wait (either inter-message time or end bit part of last bit) nop ' for consistent timing, always one instruction between waitcnt and mov outa, ... mov outa, #$00 ' bit start bitpart - set pin 0 low ' start of bit time ' shift top bit out of each shiftbuf ' stage output pattern in outb mov outb, #0 ' default to all outputs low shl shiftbuf+0, #1 wc ' shift out top bit, put it in the carry flag if_nc or outb, #$001 ' if we're sending a 0 bit, set our pin high now shl shiftbuf+1, #1 wc if_nc or outb, #$002 shl shiftbuf+2, #1 wc ' looping through these takes 9 instructions plus if_nc or outb, #$004 ' 1 register. Hard coding it takes 16 instructions shl shiftbuf+3, #1 wc ' and no registers. But the hard coded version takes if_nc or outb, #$008 ' 16*4 clock cycles to run, while the loop version shl shiftbuf+4, #1 wc ' takes (4+strings*6)*4 clock cycles to run. For more if_nc or outb, #$010 ' than 2 strings, the hard coded version is faster. shl shiftbuf+5, #1 wc ' The loop version is also self-modifying code if_nc or outb, #$020 shl shiftbuf+6, #1 wc if_nc or outb, #$040 shl shiftbuf+7, #1 wc if_nc or outb, #$080 waitcnt time, t_bit shl outb, start_pin ' shift output bits over to appropriate pin mov outa, outb ' middle bitpart - output our pattern { ' loop version of calculating output pattern mov foo, #shiftbuf mov str_bit, #$001 mov strcnt, buf_cnt :x movd :y, foo add foo, #1 :y shl shiftbuf, #1 wc if_nc or outb, bar shl str_bit, #1 djnz buf_cnt, :x } ' ' middle of bit time ' stage output (all high) in outb mov outb, $0FF ' output high on all 8 pins (as enabled) shl outb, start_pin ' shift output bits over to appropriate pin waitcnt time, t_bit nop ' always one instruction, yadda, yadda mov outa, outb ' bit end bitpart - set pins high ' end of bit time ' check if there are more bits to send djnz bitcnt, #:sendbit ' if there are still more bits, go back up and do the next one waitcnt time, t_msg ' finish out our last bit, prep for inter-message delay nop ' 1 inst, blah, blah mov outa, #$00 ' inter-message delay - set pins low jmp #:sendmsg ' get ready for the next message :end jmp #:end ' shouldn't ever get here, but this keeps us from running wild ' ' ' Initialized data ' t_bit long 0 t_msg long 0 start_pin long 0 buf_cnt long 0 buf_addrs long 0 buf_ends long 0 buf_hptrs long 0 buf_tptrs long 0 msg_ptrs long 0 ' ' ' Constants ' ctlmsg_flag long $80_00_00_00 ' ' ' Uninitialized data ' time res 1 strcnt res 1 bitcnt res 1 str_bit res 1 buf_tail res 1 buf_head res 1 message res 1 shiftbuf res 8 shiftbuf_addr res 1 buf_addr_ptr res 1 buf_end_ptr res 1 buf_addr res 1 buf_end res 1 buf_hptr res 1 buf_tptr res 1 msg_ptr res 1 ' ' ' End of everything; make sure we're not too big ' fit { Note 1: We'll compare to buf_end to see if we need to wrap later. This means when it's time to wrap, we'll write buf_end, rather than buf_addr, to the tail pointer. This isn't a big deal as long as the generator knows to handle things the same way, and this makes the generator easier to write, as it can compare head+4 to tail without worrying about wrapping to see if it's safe to write more. } { multi string strategy Unfortunately, we can't do things like buf_tptr[idx] or message[idx] in assembly, as there's no way to do that sort of indirection. This makes looping through all our strings to fetch the next message challenging. Luckily, if we handle shiftbuf differently, there's only one section of code where we need to handle this: from :sendmsg to the mov at :msgready. If we load temp versions of buf_addr, buf_end, buf_hptr, buf_tptr, and message indexed from the arrays somehow, and write back message from its temp copy at the end of that block, then we can loop over that block using the temps and all will be good. For sendbit, we can hard code for all eight at the point we shift and or into our output byte. This just leaves the question of how to copy from our arrays into the temps. There are two pieces to accomplishing this: First, write a subroutine callable by call or jmpret which copies the first element of each array into the temps, and label each line. Second, write another subroutine which uses an index variable as a parameter For each array, this routine stores the address of the array into a temp var Next it adds the index parameter to the temp var, giving the address of the appropriate value in the array. Finally, the modified address is written into the appropriate instruction in the first subroutine so it loads from the correct index, rather than 0. This can be accomplished with MOVS and all the labels in the first subroutine Now, we loop through the block which needs indexes, and at the start of each loop, first call the rewriter subroutine passing the index as a parameter, then call the copy to temp subroutine which will fetch the indexed values into the temp fields, then operate on the temps. Finally, the temp of message can be written back in a similar manner. Actually, this could probably all be rolled into the block without the use of subroutines: the block would have 4 sections: 1 - calculate indexed addresses and write to instructions in block 2 and 4 2 - copy from indexed addresses to temp variables 3 - current block contents, operating on temps 4 - write back necessary temps REMEMBER!!: if an instruction modifies the instruction which immediately follows it, then when the modified instruction executes, the old version will be run, because it was already fetched into the pipeline before the new version was written back. This is why parts 1 and 2 in the outline above aren't interleaved While looping through that block, the value to write to DIRA should be staged in DIRB, which is unused on our chip, then shifted approprieately and copied to DIRA at the end. The same shoudl be done when shifting out of our shiftbufs: the pattern to output should be staged in OUTB, then shifted and copied to OUTA at the end Another alternative: fetching from main mem can be accomplished in the time of 2 instructions, once we're sync'd with the hub. This is as fast as the self-modifying code idea above, saves cog memory, and is probably more readable, and incrementing an address through main RAM is easy. } { Thoughts: One cog will drive up to 8 contiguous pins Driver cog will pull messages from hub ram, one message to a word address in top top byte, brightness in second byte, color in bottom word 2 free bits in address allow for special message types; ideas: turn off output for that pin for the next n message times either need logic to constantly check whether the pin is disabled, or disable output on that pin, but this requries another cog to have the pins enabled and set low, or a pull down resistor possible overall design: cog 0: manager spin keeps all color effects pins pulled low handles status output, command input, etc in charge of starting/stopping pattern generators, color effects drivers cog 1-4: drivers this leaves 3 cogs as generators or halt progression through the fifo pattern generator could use this as a stop: fill the fifo past here, then rewrite this location to allow progress again noop jump to new location auto increment address - requires saving prev address using jmpret can allow ongoing processing while looping through bits to handle some of this overhead wait - waits until the driver is signalled through some other means then all waiting ringbufs resume Message words in main mem will be in ring buffers messages to lights can terminate after brightness when sending broadcast allow driver to send this message type? would only work if all strings being controlled were doing a truncated broadcast at the same time - for coordinated strings being generated by a single generator, this may not be unrealistic timing: bitpart means the time for 1/3 of a bit, 10us on factory controller as long as timing is accurate and consistent, bitpart times between 5us and 15us have been reported to work 10us gives 800 clock cycles, or ~200 instructions, per bitpart we need at least 3 bitparts between messages, during which time we're outputting low whether there's another message coming or not. This gives us time to fetch the next message from each of our ring buffers and disable output if necessary. Timing here may not be as critical. It's certainly safe to stretch this at a bitpart of 10us, but may be more important at other timescales. Start bit lasts for 1 bitpart and should give time to prep messages as necessary each bit lasts for 3 bitparts: first bitpart is always low output 0 examine bit for each output pin, construct output byte for second bitpart second bitpart depends on bit output byte constructed during first bitpart shift message words third bitpart is always high output 1 determine if message is done jump back to first bitpart or fall through to inter-message time launching: spin code accessing data in dat blocks modifies hub memory launching assembly copies hub memory to cog memory hub memory is effectively a template for cog memory spin code can modify this template prior to launching newly launched cogs will receive the modifications already running cogs are unaffected spin launch code can fill in bitpart time, ringbuf addresses, output masks (or number of output pins and start pin?), etc, then launch a new assembly cog with the modified code filled in mechanics: shl can put shifted out bit in C flag this can be used with optional execution for putting the bit into the output byte maybe use muxc to conditionally set bit in output byte? can use outb and dirb to build values before writing communication: driver needs to know whether continuing through ring buf is valid; pattern generator needs to know when it's safe to overwrite old data in the buffer I don't think I can get away from needing a head and tail pointer for each buffer. Data is read from the tail and written to the head. Headptr is only ever written by generator cog, so it doesn't need to read it, just write it every time. It only needs to read tail when it catches up to the last value it read, so just go into a spin lock until the tail moves; if it'd already moved since the last time we read it, we just hit the loop once and move on with the up to date value. The converse goes for the driver cog reading from and moving the tail. }