Help with suspected memory corruption / pointer mishandling?
I'm building a board to control several strings of GE ColorEffects Christmas lights. These are LED based lights, with 3 LEDs (RGB) in each bulb, with each bulb individually addressable to control the brightness of each LED. The controller that comes with them from the factory does some nice patterns, but there's tons more potential if you build your own controller, which is what I'm attempting.
The communications protocol for the lights uses bits which are 30us wide, with the first 10us low, the middle 10us high or low depending on the value of the bit, and the last 10us high. This means my code needs to change state every 10us, so driving the pins needs to be done from PASM rather than SPIN code. Each message is 26 bits long, plus a 10us start pulse, plus 30-40us or more between messages, giving a message time of >800us. This is enough time for me to generate the messages in SPIN code, keeping the logic a lot prettier. Based on this, my design is to have a cog running SPIN code which generates messages, putting them in a ring buffer in hub ram. Another cog, running PASM code, monitors that ring buffer and handles actually sending the message when there's one available.
My first stab at things supported driving only a single string of lights, and I got it working fine. However, I want to be able to control several strings of lights from a single Propeller, and don't want to be overly constrained by the number of cogs. The 30-40us inter-message time is plenty for checking several ring buffers for new messages, and the 10us state change time is plenty to update the state of several output pins. So my real plan is to have a single PASM cog which can drive up to 8 strings of lights. This required a significant re-work of my output driver cog, as I could no longer block waiting for new messages to be available in the message buffer.
My new implementation works, sort of. It properly drives a single output pin (haven't yet tested the rest), but I think I'm just getting lucky, and it's not really quite correct. Trivial changes, like adding or removing diagnostic output sent over the serial port, or adding a new SPIN subroutine which is never called, or adding or removing unused variables, tend to break things. I suspect that I'm mishandling a pointer somewhere, and overwriting parts of memory I shouldn't, or reading from uninitialized memory. That would explain why my seemingly irrelevant changes break things; they move where things are laid out in ram, and I no longer get lucky.
Can anyone see anything obviously incorrect in this code? Or can someone suggest some good tools/approaches to trying to find and fix this bug? Alternatively, if I'm doing things the hard or strange way, and there's a more standard way of doing things, I'd be open to suggestions of better approaches to the problem and rewriting. A lot of my internal mental dialog and thought process are in comments at the end of the code; feel free to ignore or tell me I'm crazy ;-) .
The communications protocol for the lights uses bits which are 30us wide, with the first 10us low, the middle 10us high or low depending on the value of the bit, and the last 10us high. This means my code needs to change state every 10us, so driving the pins needs to be done from PASM rather than SPIN code. Each message is 26 bits long, plus a 10us start pulse, plus 30-40us or more between messages, giving a message time of >800us. This is enough time for me to generate the messages in SPIN code, keeping the logic a lot prettier. Based on this, my design is to have a cog running SPIN code which generates messages, putting them in a ring buffer in hub ram. Another cog, running PASM code, monitors that ring buffer and handles actually sending the message when there's one available.
My first stab at things supported driving only a single string of lights, and I got it working fine. However, I want to be able to control several strings of lights from a single Propeller, and don't want to be overly constrained by the number of cogs. The 30-40us inter-message time is plenty for checking several ring buffers for new messages, and the 10us state change time is plenty to update the state of several output pins. So my real plan is to have a single PASM cog which can drive up to 8 strings of lights. This required a significant re-work of my output driver cog, as I could no longer block waiting for new messages to be available in the message buffer.
My new implementation works, sort of. It properly drives a single output pin (haven't yet tested the rest), but I think I'm just getting lucky, and it's not really quite correct. Trivial changes, like adding or removing diagnostic output sent over the serial port, or adding a new SPIN subroutine which is never called, or adding or removing unused variables, tend to break things. I suspect that I'm mishandling a pointer somewhere, and overwriting parts of memory I shouldn't, or reading from uninitialized memory. That would explain why my seemingly irrelevant changes break things; they move where things are laid out in ram, and I no longer get lucky.
Can anyone see anything obviously incorrect in this code? Or can someone suggest some good tools/approaches to trying to find and fix this bug? Alternatively, if I'm doing things the hard or strange way, and there's a more standard way of doing things, I'd be open to suggestions of better approaches to the problem and rewriting. A lot of my internal mental dialog and thought process are in comments at the end of the code; feel free to ignore or tell me I'm crazy ;-) .
''*****
''* Color Effects v0.7
''* Author: Thomee Wright
''* Copyright (c) 2011
''* All rights reserved
''*****
' v0.1 - 2011/09/22 - Initial version
' v0.2 - 2011/09/26 - Basic light comms and reading from hub mem
' v0.5 - 2011/09/30 - Full control of a single string running a single pattern
' v0.7 - 2011/12/06 - Initial implementation of multi-string support sort of working
CON
_clkmode = xtal1 + pll16x
_xinfreq = 5_000_000
' P0..15 available; P16..23 VGA; P24..27 kb/mouse; P28..31 boot eeprom/serial
LightPin = 0
MsgBufLen = 100
StringCount = 8
VAR
long MsgTmp
byte Buf, Bulb, Phase, Color
byte Blue, Green, Red
byte i
long t1, t2, t3
'
'
' Arrays for passing data to ColorEffects driver
'
long BufAddrs[8], BufEnds[8], BufHeads[8], BufTails[8], Messages[8]
long Buf0[MsgBufLen], Buf1[MsgBufLen], Buf2[MsgBufLen], Buf3[MsgBufLen]
long Buf4[MsgBufLen], Buf5[MsgBufLen], Buf6[MsgBufLen], Buf7[MsgBufLen]
'long MsgBuf[MsgBufLen * StringCount]
OBJ
term : "Parallax Serial Terminal"
PUB start
' start serial terminal
term.start(115200)
waitcnt(clkfreq * 5 + cnt)
term.str(string("Start", $0D))
' we need to set our pin as an output and output a 0
' so we pull it low when the driver is idling on it
dira := $0FF ' ouput on first 8 pins
outa := $00 ' set them all low
term.str(string("Prepping for messages", $0D))
BufAddrs[0] := @Buf0
BufAddrs[1] := @Buf1
BufAddrs[2] := @Buf2
BufAddrs[3] := @Buf3
BufAddrs[4] := @Buf4
BufAddrs[5] := @Buf5
BufAddrs[6] := @Buf6
BufAddrs[7] := @Buf7
repeat Buf from 0 to 7
'BufAddrs[Buf] := @MsgBuf[MsgBufLen * Buf]
Messages[Buf] := 0
repeat Bulb from 0 to 49
MsgTmp := 0
BYTE[@MsgTmp][3] := Bulb
BYTE[@MsgTmp][2] := $CC
BYTE[@MsgTmp][1] := $F0
LONG[BufAddrs[Buf]][Bulb] := MsgTmp
BufHeads[Buf] := BufAddrs[Buf] + 200 ' 50*4bytes (next space to write)
BufTails[Buf] := BufAddrs[Buf] + MsgBufLen * 4
BufEnds[Buf] := BufAddrs[Buf] + MsgBufLen * 4
term.hex(Buf, 1)
term.char(" ")
term.hex(BufAddrs[Buf], 8)
term.char(":")
term.hex(BufEnds[Buf], 8)
term.char($0D)
{
Pads[0] := @Pad0
Pads[1] := @Pad1
Pads[2] := @Pad2
Pads[3] := @Pad3
Pads[4] := @Pad4
Pads[5] := @Pad5
Pads[6] := @Pad6
}
term.str(string("Prepping DAT area", $0D))
t_bit := 800 ' length of bit part in clock counts
t_msg := 3200 ' inter-message delay in clock counts
start_pin := 0 ' which output pin to start at
buf_addrs := @BufAddrs ' Addresses of message queues
buf_ends := @BufEnds ' Ends of message queues
buf_hptrs := @BufHeads ' Array of head pointers (1 per queue)
buf_tptrs := @BufTails ' Array of tail pointers (1 per queue)
buf_cnt := 1 ' Count of queues (1-8)
msg_ptrs := @Messages ' Array of buffers for driver (1 per queue); we never read or write here
term.str(string("Starting lights cog", $0D))
cognew(@lights, 0)
term.str(string("Starting pattern generator", $0D))
Walk1
repeat
PRI Walk1
i := 0
repeat
repeat Phase from 0 to 7
repeat Bulb from 0 to 49
MsgTmp := 0
BYTE[@MsgTmp][3] := Bulb
BYTE[@MsgTmp][2] := $CC
Color := Bulb + Phase
if Color & $01
Byte[@MsgTmp][1] |= $F0
if Color & $02
Byte[@MsgTmp][1] |= $0F
if Color & $04
Byte[@MsgTmp][0] |= $F0
repeat while BufHeads[0] + 4 == BufTails[0]
{
term.hex(BufHeads[0], 8)
term.char("-")
term.hex(BufTails[0], 8)
term.char("(")
term.hex(i++, 2)
term.char(")")
term.char($0D)
}
LONG[BufHeads[0]] := MsgTmp
MsgTmp := BufHeads[0] + 4
if (MsgTmp == BufEnds[0])
MsgTmp := BufAddrs[0]
BufHeads[0] := MsgTmp
repeat Red from 1 to 8
waitcnt(clkfreq / 10 + cnt)
'term.hex(BufHeads[0], 8)
'term.char(":")
'term.hex(BufTails[0], 8)
{
repeat i from 0 to 6
if LONG[Pads[i]] <> 0
term.str(string("Pad"))
term.dec(i)
term.char(":")
term.hex(LONG[Pads[i]], 8)
}
DAT
'
'
' Data tables for pattern generation
'
Gamma byte $00, $00, $01, $01, $02, $02, $03, $03, $04, $05, $06, $07, $09, $0B, $0D, $0F
GammaHigh byte $00, $00, $10, $10, $20, $20, $30, $30, $40, $50, $60, $70, $90, $B0, $D0, $F0
DAT
'*******************************************************
'* Assembly language GE ColorEffects G-35 light driver *
'*******************************************************
org 0
lights ' first, one time setup
mov time, cnt ' get current sys counter
add time, t_msg ' make sure we have at least one full inter-message time before we start
' inter message time; prep work for each string
' future: handle pending control messages like wait
' fetch head and tail, see if there's work to do
' fetch next message if relevant
' save message - for control messages
' turn each string's output on or off as appropriate
:sendmsg
mov strcnt, buf_cnt ' we'll do prep work for each string
mov shiftbuf_addr, #shiftbuf' get address of first shiftbuf
mov str_bit, #$01 ' first string is at lowest order bit
mov buf_addr_ptr, buf_addrs
mov buf_end_ptr, buf_ends
mov buf_hptr, buf_hptrs
mov buf_tptr, buf_tptrs
mov msg_ptr, msg_ptrs
mov dirb, #0 ' we stage in dirb; disable all
:prep_string
rdlong message, msg_ptr ' fetch old message (for future use with control messages like wait)
test message, ctlmsg_flag wc ' if C flag is set, it's a control message
if_c jmp #:ctl_msg ' if it's a control message, handle it differently
rdlong buf_addr, buf_addr_ptr ' get the address of the buffer for the current string
add buf_addr_ptr, #4 ' increment for the next string
movd :mov_shiftbuf, shiftbuf_addr ' modify instruction below to store to appropriate shiftbuf
rdlong buf_end, buf_end_ptr ' get the end ptr
add buf_end_ptr, #4 ' increment end_ptr to next string
add shiftbuf_addr, #1 ' point to next string's shiftbuf
rdlong buf_tail, buf_tptr ' get the tail ptr
cmp buf_tail, buf_end wz ' check if we need to wrap (See Note 1)
if_z mov buf_tail, buf_addr ' wrap to start if necessary
rdlong buf_head, buf_hptr ' get the head ptr for the current string
cmp buf_head, buf_tail wz ' check if there's a message ready; if z flag is set, there isn't
if_z jmp #:msgready ' output is disabled by default, just don't enable
rdlong message, buf_tail ' fetch next message
add buf_tail, #4 ' increment our ring buffer pointer
test message, ctlmsg_flag wc ' if C flag is set, it's a control message
wrlong buf_tail, buf_tptr ' update tail ptr in hub RAM so generator can write more (See Note 1)
if_nc or dirb, str_bit ' if not control, enable output for this string
if_nc jmp #:msgready ' if not control, jump below
:ctl_msg
' stuff to do if it's a control message
:msgready
wrlong message, msg_ptr ' write our message back (for future use; see above)
add msg_ptr, #4 ' update msg_ptr for next string
add buf_tptr, #4 ' update tptr for next string
add buf_hptr, #4 ' update hptr for next string
shl str_bit, #1
:mov_shiftbuf mov shiftbuf, message ' move the message to send into correct shift buffer (address was set above)
djnz strcnt, #:prep_string ' prep the next string, if any
shl dirb, start_pin ' shift output mask over to appropriate pin
mov dira, dirb ' set output pins
shl shiftbuf+0, #2 ' shift out unused top 2 bits of address
shl shiftbuf+1, #2
shl shiftbuf+2, #2
shl shiftbuf+3, #2
shl shiftbuf+4, #2
shl shiftbuf+5, #2
shl shiftbuf+6, #2
shl shiftbuf+7, #2
waitcnt time, t_bit ' wait till the specified time, then add delay to our counter
nop ' for consistent timing, always one instruction between waitcnt and mov outa, ...
mov outa, #$01 ' message start bitpart - set pin 0 high
' message start time
' prep to loop through all bits
mov bitcnt, #26 ' 26 bits in a message
:sendbit waitcnt time, t_bit ' wait (either inter-message time or end bit part of last bit)
nop ' for consistent timing, always one instruction between waitcnt and mov outa, ...
mov outa, #$00 ' bit start bitpart - set pin 0 low
' start of bit time
' shift top bit out of each shiftbuf
' stage output pattern in outb
mov outb, #0 ' default to all outputs low
shl shiftbuf+0, #1 wc ' shift out top bit, put it in the carry flag
if_nc or outb, #$001 ' if we're sending a 0 bit, set our pin high now
shl shiftbuf+1, #1 wc
if_nc or outb, #$002
shl shiftbuf+2, #1 wc ' looping through these takes 9 instructions plus
if_nc or outb, #$004 ' 1 register. Hard coding it takes 16 instructions
shl shiftbuf+3, #1 wc ' and no registers. But the hard coded version takes
if_nc or outb, #$008 ' 16*4 clock cycles to run, while the loop version
shl shiftbuf+4, #1 wc ' takes (4+strings*6)*4 clock cycles to run. For more
if_nc or outb, #$010 ' than 2 strings, the hard coded version is faster.
shl shiftbuf+5, #1 wc ' The loop version is also self-modifying code
if_nc or outb, #$020
shl shiftbuf+6, #1 wc
if_nc or outb, #$040
shl shiftbuf+7, #1 wc
if_nc or outb, #$080
waitcnt time, t_bit
shl outb, start_pin ' shift output bits over to appropriate pin
mov outa, outb ' middle bitpart - output our pattern
{
' loop version of calculating output pattern
mov foo, #shiftbuf
mov str_bit, #$001
mov strcnt, buf_cnt
:x movd :y, foo
add foo, #1
:y shl shiftbuf, #1 wc
if_nc or outb, bar
shl str_bit, #1
djnz buf_cnt, :x
}
'
' middle of bit time
' stage output (all high) in outb
mov outb, $0FF ' output high on all 8 pins (as enabled)
shl outb, start_pin ' shift output bits over to appropriate pin
waitcnt time, t_bit
nop ' always one instruction, yadda, yadda
mov outa, outb ' bit end bitpart - set pins high
' end of bit time
' check if there are more bits to send
djnz bitcnt, #:sendbit ' if there are still more bits, go back up and do the next one
waitcnt time, t_msg ' finish out our last bit, prep for inter-message delay
nop ' 1 inst, blah, blah
mov outa, #$00 ' inter-message delay - set pins low
jmp #:sendmsg ' get ready for the next message
:end jmp #:end ' shouldn't ever get here, but this keeps us from running wild
'
'
' Initialized data
'
t_bit long 0
t_msg long 0
start_pin long 0
buf_cnt long 0
buf_addrs long 0
buf_ends long 0
buf_hptrs long 0
buf_tptrs long 0
msg_ptrs long 0
'
'
' Constants
'
ctlmsg_flag long $80_00_00_00
'
'
' Uninitialized data
'
time res 1
strcnt res 1
bitcnt res 1
str_bit res 1
buf_tail res 1
buf_head res 1
message res 1
shiftbuf res 8
shiftbuf_addr res 1
buf_addr_ptr res 1
buf_end_ptr res 1
buf_addr res 1
buf_end res 1
buf_hptr res 1
buf_tptr res 1
msg_ptr res 1
'
'
' End of everything; make sure we're not too big
'
fit
{
Note 1:
We'll compare to buf_end to see if we need to wrap later. This means when it's
time to wrap, we'll write buf_end, rather than buf_addr, to the tail pointer.
This isn't a big deal as long as the generator knows to handle things the same way,
and this makes the generator easier to write, as it can compare head+4 to tail without
worrying about wrapping to see if it's safe to write more.
}
{
multi string strategy
Unfortunately, we can't do things like buf_tptr[idx] or message[idx] in assembly,
as there's no way to do that sort of indirection. This makes looping through all
our strings to fetch the next message challenging. Luckily, if we handle shiftbuf
differently, there's only one section of code where we need to handle this: from
:sendmsg to the mov at :msgready. If we load temp versions of buf_addr, buf_end,
buf_hptr, buf_tptr, and message indexed from the arrays somehow, and write back
message from its temp copy at the end of that block, then we can loop over that
block using the temps and all will be good. For sendbit, we can hard code for all
eight at the point we shift and or into our output byte. This just leaves the
question of how to copy from our arrays into the temps. There are two pieces to
accomplishing this:
First, write a subroutine callable by call or jmpret which copies the first
element of each array into the temps, and label each line.
Second, write another subroutine which uses an index variable as a parameter
For each array, this routine stores the address of the array into a temp var
Next it adds the index parameter to the temp var, giving the address of the
appropriate value in the array.
Finally, the modified address is written into the appropriate instruction
in the first subroutine so it loads from the correct index, rather than 0.
This can be accomplished with MOVS and all the labels in the first subroutine
Now, we loop through the block which needs indexes, and at the start of each loop,
first call the rewriter subroutine passing the index as a parameter, then call the
copy to temp subroutine which will fetch the indexed values into the temp fields,
then operate on the temps. Finally, the temp of message can be written back in
a similar manner.
Actually, this could probably all be rolled into the block without the use of
subroutines: the block would have 4 sections:
1 - calculate indexed addresses and write to instructions in block 2 and 4
2 - copy from indexed addresses to temp variables
3 - current block contents, operating on temps
4 - write back necessary temps
REMEMBER!!: if an instruction modifies the instruction which immediately follows
it, then when the modified instruction executes, the old version will be run, because
it was already fetched into the pipeline before the new version was written back.
This is why parts 1 and 2 in the outline above aren't interleaved
While looping through that block, the value to write to DIRA should be staged in
DIRB, which is unused on our chip, then shifted approprieately and copied to DIRA
at the end. The same shoudl be done when shifting out of our shiftbufs: the pattern
to output should be staged in OUTB, then shifted and copied to OUTA at the end
Another alternative: fetching from main mem can be accomplished in the time of 2
instructions, once we're sync'd with the hub. This is as fast as the self-modifying
code idea above, saves cog memory, and is probably more readable, and incrementing
an address through main RAM is easy.
}
{
Thoughts:
One cog will drive up to 8 contiguous pins
Driver cog will pull messages from hub ram, one message to a word
address in top top byte, brightness in second byte, color in bottom word
2 free bits in address allow for special message types; ideas:
turn off output for that pin for the next n message times
either need logic to constantly check whether the pin is disabled,
or disable output on that pin, but this requries another cog to have the
pins enabled and set low, or a pull down resistor
possible overall design:
cog 0: manager
spin
keeps all color effects pins pulled low
handles status output, command input, etc
in charge of starting/stopping pattern generators, color effects drivers
cog 1-4: drivers
this leaves 3 cogs as generators or
halt progression through the fifo
pattern generator could use this as a stop: fill the fifo past here, then rewrite
this location to allow progress again
noop
jump to new location
auto increment address - requires saving prev address
using jmpret can allow ongoing processing while looping through bits to
handle some of this overhead
wait - waits until the driver is signalled through some other means
then all waiting ringbufs resume
Message words in main mem will be in ring buffers
messages to lights can terminate after brightness when sending broadcast
allow driver to send this message type?
would only work if all strings being controlled were doing a truncated broadcast
at the same time - for coordinated strings being generated by a single generator,
this may not be unrealistic
timing: bitpart means the time for 1/3 of a bit, 10us on factory controller
as long as timing is accurate and consistent, bitpart times between 5us and
15us have been reported to work
10us gives 800 clock cycles, or ~200 instructions, per bitpart
we need at least 3 bitparts between messages, during which time we're outputting
low whether there's another message coming or not. This gives us time to fetch the
next message from each of our ring buffers and disable output if necessary. Timing
here may not be as critical. It's certainly safe to stretch this at a bitpart of
10us, but may be more important at other timescales.
Start bit lasts for 1 bitpart and should give time to prep messages as necessary
each bit lasts for 3 bitparts:
first bitpart is always low
output 0
examine bit for each output pin, construct output byte for second bitpart
second bitpart depends on bit
output byte constructed during first bitpart
shift message words
third bitpart is always high
output 1
determine if message is done
jump back to first bitpart or
fall through to inter-message time
launching:
spin code accessing data in dat blocks modifies hub memory
launching assembly copies hub memory to cog memory
hub memory is effectively a template for cog memory
spin code can modify this template prior to launching
newly launched cogs will receive the modifications
already running cogs are unaffected
spin launch code can fill in bitpart time, ringbuf addresses, output
masks (or number of output pins and start pin?), etc, then launch
a new assembly cog with the modified code filled in
mechanics:
shl can put shifted out bit in C flag
this can be used with optional execution for putting the bit into the output byte
maybe use muxc to conditionally set bit in output byte?
can use outb and dirb to build values before writing
communication:
driver needs to know whether continuing through ring buf is valid; pattern
generator needs to know when it's safe to overwrite old data in the buffer
I don't think I can get away from needing a head and tail pointer for each
buffer. Data is read from the tail and written to the head. Headptr is only
ever written by generator cog, so it doesn't need to read it, just write it
every time. It only needs to read tail when it catches up to the last value
it read, so just go into a spin lock until the tail moves; if it'd already
moved since the last time we read it, we just hit the loop once and move on
with the up to date value. The converse goes for the driver cog reading from
and moving the tail.
}
