A WS2812 array driver in one line of code - no PASM required
Peter Jakacki
I thought I'd share this because I like the simplicity of a driver that fits in one line of code. I already have a PASM module that can be loaded as required in Tachyon, so there is probably no real need for a one-liner to replace it, but it is still a thing of beauty due to its simplicity.
: XLEDS ( array leds pin -- ) DUP PIN L MASK -ROT 0 -ROT 3 * ADO IC@ 8 FOR H SHROUT L NEXT LOOP 2DROP 50 us ;
The heart of this code is the fast pin operations H, L, and SHROUT. H and L set the pin specified by PIN high or low, and SHROUT outputs the next lsb of data via an I/O mask, preserving both parameters but shifting the data right. At a 96MHz clock each of these operations executes in 333ns, and the FOR NEXT loop executes just as fast. IC@ fetches the next byte using the loop index, but instead of pushing it onto the data stack it simply replaces what was there: initially the dummy data 0 (from -ROT 0 -ROT), and after that the used-up data left at the end of each eight shifts.
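To make the bit timing concrete, here is a rough host-side model in Python (not Tachyon code) of the inner 8 FOR H SHROUT L NEXT loop. It assumes that each of the four words takes exactly one 333ns slot and that the pin simply holds its last level during NEXT; both are simplifications of the figures quoted above, and the function names are made up for illustration.

```python
SLOT_NS = 333  # per-operation time quoted in the post for a 96MHz clock

def send_byte(data):
    """Model 8 FOR H SHROUT L NEXT. Returns (pin level per slot, leftover data)."""
    levels = []
    for _ in range(8):
        levels.append(1)          # H: pin driven high
        levels.append(data & 1)   # SHROUT: pin follows the lsb...
        data >>= 1                # ...and the data is shifted right
        levels.append(0)          # L: pin driven low
        levels.append(0)          # NEXT: assumed to leave the pin low
    return levels, data

def high_times_ns(levels):
    """Total high time within each 4-slot bit cell."""
    return [sum(levels[i:i + 4]) * SLOT_NS for i in range(0, len(levels), 4)]

levels, leftover = send_byte(0b10100101)
print(high_times_ns(levels))  # [666, 333, 666, 333, 333, 666, 333, 666]
print(leftover)               # 0, the used-up data that IC@ then overwrites
```

Under these assumptions a 0 bit is high for about 333ns and a 1 bit for about 666ns within a bit cell of roughly 1.33us, which is close to the nominal 0.35us and 0.7us high times in the WS2812 datasheet. Note that the bits leave lsb first.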
Setup sets the fast PIN and makes sure the pin is low with L. Then a mask is created for SHROUT, so there are two ways of setting the pin: with H or L, or with SHROUT ( iomask data -- iomask data/2 ). Since I want to insert dummy data into the parameter list I need the -ROT 0 -ROT, where -ROT does ( a b c -- c a b ), which rotates in the opposite direction to a normal ROT and is equivalent to ROT ROT. The 3 * is just so we can pass the function an LED count and convert that to the number of bytes required, so that ( array bytes ) are passed to ADO, which takes the starting address and count and also pushes the branch address for LOOP.
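The stack shuffling is easier to follow when stepped through. The sketch below is a Python model (not Tachyon code) of the data stack from just after MASK up to ADO, with the top of stack at the end of the list. The address 0x4000, the 16 LEDs and pin 7 are hypothetical values chosen only for the example.

```python
def rot(stack):
    """ROT ( a b c -- b c a ): the third item moves to the top."""
    a, b, c = stack[-3:]
    return stack[:-3] + [b, c, a]

def minus_rot(stack):
    """-ROT ( a b c -- c a b ): the top item moves down to third place."""
    a, b, c = stack[-3:]
    return stack[:-3] + [c, a, b]

def xleds_setup(array, leds, pinmask):
    """Model the stack from just after MASK up to (not including) ADO."""
    stack = [array, leds, pinmask]
    stack = minus_rot(stack)      # pinmask array leds
    stack = stack + [0]           # pinmask array leds 0   (dummy data)
    stack = minus_rot(stack)      # pinmask 0 array leds
    leds = stack.pop()
    stack.append(leds * 3)        # 3 * : pinmask 0 array bytes
    return stack

# Hypothetical arguments: buffer at 0x4000, 16 LEDs, mask for pin 7
print(xleds_setup(0x4000, 16, 1 << 7))   # [128, 0, 16384, 48]

# -ROT really is the same as ROT ROT
print(rot(rot([1, 2, 3])) == minus_rot([1, 2, 3]))   # True
```

ADO then consumes the top two items ( array bytes ), leaving ( pinmask 0 ) in place for SHROUT and IC@ to work on inside the loop.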
: XLEDS ( array leds pin -- )
OP        <---- stack top            DESC
------------------------------------------------------------------------------------
DUP       array leds pin pin
PIN       array leds pin             set fast pin
L         array leds pin             fast pin low
MASK      array leds pinmask         create mask for SHROUT ops
-ROT      pinmask array leds         work dummy data into the correct position
0         pinmask array leds 0
-ROT      pinmask 0 array leds
3         pinmask 0 array leds 3     convert leds to byte count
*         pinmask 0 array bytes
ADO       pinmask 0                  start a loop using start address and count
IC@       pinmask data               replace data with the next byte pointed to by loop index
8         pinmask data 8             for 8 times
FOR       pinmask data
H         pinmask data               output a high
SHROUT    pinmask data>>             output a data bit
L         pinmask data>>             output a low
NEXT      " "                        next bit
LOOP      " "                        loop for another byte
2DROP                                discard parameters
50        50                         delay for 50 us
us
;                                    return
What's the problem with this driver? SHROUT sends the lsb first whereas the WS2812 expects the msb first, so either the bytes need to be reversed beforehand or you just work with them as is. There's no time to do an 8 REV for each byte without losing the timing.
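One way to handle the bit order is to reverse each byte once, when the buffer is filled, rather than inside the timed loop. The following is a Python illustration of that idea only; it is not part of the driver, and the helper names and pixel values are hypothetical.

```python
def rev8(b):
    """Reverse the bit order of one byte (the job a per-byte 8 REV would do)."""
    r = 0
    for _ in range(8):
        r = (r << 1) | (b & 1)
        b >>= 1
    return r

def prereverse(buf):
    """Reverse every byte of a pixel buffer ahead of time."""
    return bytes(rev8(b) for b in buf)

def lsb_first_bits(b):
    """Bit order produced by an lsb-first shifter such as SHROUT."""
    return [(b >> i) & 1 for i in range(8)]

def msb_first_bits(b):
    """Bit order the WS2812 expects."""
    return [(b >> i) & 1 for i in range(7, -1, -1)]

pixel = bytes([0x10, 0x80, 0x01])    # hypothetical colour bytes
print(prereverse(pixel).hex())       # 080180
```

Sending a pre-reversed byte lsb first puts the same bits on the wire as sending the original byte msb first, so the reversal cost is paid when the array is written rather than during transmission.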
EDIT: Here's a decompilation listing - 21 instructions or 42 bytes - I've also added a column of near-equivalent Spin code as a guide
( 0084 $3002 ok ) SEE XLEDS      --- Spin near equivalent
2D0E  DUP
2D10  PIN             pinreg := pin
2D12  L               outa[pinreg] := 0   dira[pinreg] := 1
2D14  |<              pinmask := 1 << pin
2D16  -ROT
2D18  $0000 (0)       data := 0
2D1A  -ROT
2D1C  $0003 (3)       bytes := leds*3
2D1E  *
2D20  ADO             repeat i from array for bytes
2D22  IC@             data := rdbyte[i]
2D24  $0008 (8)       repeat 8
2D26  FOR
2D28  H               outa[pinreg] := 1   dira[pinreg] := 1
2D2A  SHROUT          outa[pin] := data&1   data := data >> 1
2D2C  L               outa[pin] := 0   dira[pin] := 1
2D2E  NEXT
2D30  LOOP
2D32  2DROP
2D34  $0032 (50)      delayus(50)
2D36  GOTO us
Comments
I always seem to find new words I haven't used in my code yet!
IC@ has existed before but V3 didn't have room for it and many other functions. This version though does a replace rather than a push, as it is more useful and faster like that in a loop. Besides, it required only one extra PASM instruction to implement before falling through into C@.
Just by way of illustrating how useful it is to have words that use the stack as fixed registers rather than pushing or popping, especially in a loop, this is an SPI block write method written both with I C@ and IC@. Have a look at the timings.