WS2812 Fast Multi - Driving ~10,000 Neopixels using an $8 Micro
Tubular
I've been working on an upgrade to the "laundry stargate" and looking for ways to drive more WS2812 LED strips with the Prop.
I've come up with some code that comes tantalisingly close to driving 5 WS2811/12/12B strips at full rate, so I thought I'd ask more experienced eyes to see if there's a way to finish this, or whether I should be less ambitious and settle for 4 strips per cog.
What I want to do is find a way to "advance" the 5 pointers (ptr1..ptr5) to the next LED's data every 24 bits rather than every 32. I need an alternative to the 'test' instruction that sets the zero flag once every 24 bits, so that the pointers advance to the next LED's data. This would allow data to be stored in the simpler form $00 Gg Rr Bb (one LED per long); otherwise I will have to pack 4 LEDs into 3 longs, which will work but perhaps isn't so user friendly.
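For reference, the following C model (not Propeller code) shows what one strip's bit stream looks like with one LED per long in $00GGRRBB form: bits 23 down to 0 of each long go out MSB first, and the pointer moves on after every 24th bit. It is only a sketch for checking the data layout on a PC; `serialize_grb24` and the one-byte-per-bit output buffer are hypothetical. It also shows why a single 'test' is awkward here: `test bitctr,#31` works because 32 is a power of two, whereas no single AND mask is zero on exactly the multiples of 24, so a 24-bit boundary needs a modulo or a separate counter.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one strip's bit stream when each long holds one LED
   as $00GGRRBB. Writes one byte (0 or 1) per output bit into `bits`, in
   transmission order, and returns the number of bits written. */
size_t serialize_grb24(const uint32_t *leds, size_t n_leds, uint8_t *bits)
{
    const uint32_t *ptr = leds;               /* plays the role of ptr1..ptr5 */
    size_t total = 24 * n_leds;

    for (size_t i = 0; i < total; i++) {
        unsigned pos = 23u - (unsigned)(i % 24);        /* bit 23 first, bit 0 last */
        bits[i] = (uint8_t)((*ptr >> pos) & 1u);        /* like test datN, bmask wc */
        if (i % 24 == 23)                               /* last bit of this LED */
            ptr++;                                      /* advance to the next LED's long */
    }
    return total;
}
```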
No matter what we end up with, we should be able to use virtually all of hub RAM for data: something like 7,500 to 10,000 LEDs, split across 6 or 7 cogs and 28 output pins, at a refresh rate of around 130 Hz. I think that'd be spectacular for an $8 micro in a DIP package.
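The refresh rate is set by the longest strip: LEDs per pin, times 24 bits, times the bit period, plus a latch gap at the end of each frame. Here is a minimal C sketch, assuming LEDs are spread as evenly as possible across the pins and a 50 usec latch (the original WS2812 datasheet minimum; some newer parts need longer). `refresh_hz` is a hypothetical helper name.

```c
#include <stddef.h>

/* Rough refresh-rate estimate for WS2812 strips driven in parallel.
   Assumptions: LEDs split as evenly as possible across pins, 24 bits per LED,
   and a latch (reset) gap of latch_us microseconds after each frame. */
double refresh_hz(size_t total_leds, size_t pins, double bit_us, double latch_us)
{
    size_t leds_per_pin = (total_leds + pins - 1) / pins;   /* longest strip sets the pace */
    double frame_us = (double)leds_per_pin * 24.0 * bit_us + latch_us;
    return 1e6 / frame_us;
}
```

For example, `refresh_hz(7500, 28, 1.28, 50.0)` comes out at roughly 121 Hz and `refresh_hz(10000, 28, 1.28, 50.0)` at roughly 91 Hz, so the achievable rate depends on where in the 7,500 to 10,000 range the design ends up.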
The clock rate is 100 MHz (10 ns per clock), and the bit loop below is 128 clocks long, so each bit takes 1.28 usec.
'Main tight bit loop starts here.
'Bit output order looks like this:
'  state: data  low   high
'  delay: 0.36  0.56  0.36 usec
:loop         rdlong  dat1, ptr1        '008 get first strip data, $00GGRRBB format
              test    dat1, bmask wc    '012 extract single data bit and copy it to C flag
        if_nc andn    opmask, led1mask  '016 if led1 data bit was 0, zero the mask at the led1 pin position
              rdlong  dat2, ptr2        '024 get second strip data
              and     outa, out0mask    '028 all LED outputs low <- near 0.36 usec mark after data output
              test    dat2, bmask wc    '032 get data bit into C
        if_nc andn    opmask, led2mask  '036 zero led2 pin if data bit was 0
              test    bitctr, #31 wz    '040 check whether in last bit; store result in Z flag
        if_z  add     ptr1, inc1        '044 move on to next LED if last bit
        if_z  add     ptr2, inc2        '048 move on to next LED
              rdlong  dat3, ptr3        '056 get third strip data
              test    dat3, bmask wc    '060 get data bit into C
        if_nc andn    opmask, led3mask  '064 if led3 data bit was 0, zero the mask at the led3 pin position
              rdlong  dat4, ptr4        '072 get fourth strip data
              test    dat4, bmask wc    '076 get data bit into C
        if_nc andn    opmask, led4mask  '080 zero led4 pin if data bit was 0
              or      outa, out1mask    '084 all LED outputs high <- near 0.56 usec after previous transition, hence slightly out of order
        if_z  add     ptr3, inc3        '088 move on to next LED
        if_z  add     ptr4, inc4        '092 move on to next LED
              ror     bmask, #1         '096 ready to extract next bit
              rdlong  dat5, ptr5        '104 get fifth strip data
              test    dat5, bmask wc    '108 get data bit into C (NB: bmask was already rotated at '096, so strip 5 samples the next bit)
        if_nc andn    opmask, led5mask  '112 zero led5 pin only if data bit was 0
        if_z  add     ptr5, inc5        '116 move on to next led5 data
              and     outa, opmask      '120 clear the pin bits that must be zero; the others (1s) are cleared later when all go low <- 0.36 usec after previous transition
              mov     opmask, #$1ff     '124 reset the output mask for the next pass (#$1ff sets bits 0..8 only)
              djnz    bitctr, #:loop    '128 jump back and do another 24/32 bits for the next LED
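The waveform timing can be read straight off the cycle-count comments: outputs go high at '084, pins carrying a 0 bit are pulled low at '120, and everything goes low at '028 of the next pass, with 128 cycles per pass. This small C sketch (constant and helper names are hypothetical) does that arithmetic; at 100 MHz it gives T0H 0.36, T0L 0.92, T1H 0.72 and T1L 0.56 usec, close to the nominal WS2812 values of T0H 0.35 usec and T1H 0.7 usec.

```c
/* Cycle positions taken from the comments in the loop above (names are hypothetical). */
enum {
    CYC_ALL_HIGH = 84,   /* '084 or outa, out1mask  : every LED output goes high */
    CYC_DATA     = 120,  /* '120 and outa, opmask   : pins with a 0 bit go low   */
    CYC_ALL_LOW  = 28,   /* '028 and outa, out0mask : every LED output goes low  */
    CYC_LOOP     = 128   /* '128 djnz: cycles per pass, i.e. per bit             */
};

/* Convert a cycle count to microseconds (clock in MHz = cycles per usec). */
double cycles_to_us(int cycles, double clock_mhz) { return cycles / clock_mhz; }

/* A 0 bit is high only from '084 to '120. */
int t0h_cycles(void) { return CYC_DATA - CYC_ALL_HIGH; }
/* A 1 bit stays high from '084 until '028 of the next pass. */
int t1h_cycles(void) { return CYC_LOOP - CYC_ALL_HIGH + CYC_ALL_LOW; }
/* The rest of each 128-cycle bit period is low time. */
int t0l_cycles(void) { return CYC_LOOP - t0h_cycles(); }
int t1l_cycles(void) { return CYC_LOOP - t1h_cycles(); }
```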
Comments
I settled on the 5-strips-per-cog format, which means the data for 4 LEDs is packed into 3 longs. The upside is that we should be able to drive around 10,000 LEDs from a single Prop.
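One way to pack 4 LEDs into 3 longs is to treat the 3 longs as a continuous 96-bit stream of GRB data. The sketch below assumes the driver shifts each long out from bit 31 down to bit 0 and then moves straight on to the next long, as the ror-based loop does with 32-bit steps; `pack4to3` and `unpack3to4` are hypothetical helper names, and the real driver's layout may differ.

```c
#include <stdint.h>

/* Hypothetical packer: 4 LEDs given as $00GGRRBB longs -> 3 longs of
   continuous 24-bit GRB data, sent MSB first across long boundaries. */
void pack4to3(const uint32_t in[4], uint32_t out[3])
{
    uint32_t a = in[0] & 0xFFFFFFu, b = in[1] & 0xFFFFFFu;
    uint32_t c = in[2] & 0xFFFFFFu, d = in[3] & 0xFFFFFFu;
    out[0] = (a << 8)  | (b >> 16);   /* LED0 GRB, then LED1 G  */
    out[1] = (b << 16) | (c >> 8);    /* LED1 RB,  then LED2 GR */
    out[2] = (c << 24) | d;           /* LED2 B,   then LED3 GRB */
}

/* Inverse of pack4to3, handy for checking a packed buffer. */
void unpack3to4(const uint32_t in[3], uint32_t out[4])
{
    out[0] = in[0] >> 8;
    out[1] = ((in[0] & 0xFFu) << 16) | (in[1] >> 16);
    out[2] = ((in[1] & 0xFFFFu) << 8) | (in[2] >> 24);
    out[3] = in[2] & 0xFFFFFFu;
}
```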
Weekend testing consisted of successfully driving a long string of 576 LEDs from one output, then moving that string from output pin to output pin. The cog driver keeps up with hub memory, so the only limit on the number of strings, or the number of LEDs per string, is the 32 KB of hub RAM.
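A quick capacity check shows why the packed format matters for the ~10,000 figure. Assuming (hypothetically) that `free_bytes` of hub RAM are left for LED data, packed storage costs 12 bytes per group of 4 LEDs, against 4 bytes per LED with one LED per long:

```c
#include <stddef.h>

/* Packed format: 4 LEDs in 3 longs = 12 bytes per group; only whole groups count. */
size_t max_leds_packed(size_t free_bytes) { return (free_bytes / 12) * 4; }

/* One LED per long ($00GGRRBB): 4 bytes per LED. */
size_t max_leds_one_per_long(size_t free_bytes) { return free_bytes / 4; }
```

Even with the whole 32,768 bytes, that is 10,920 LEDs packed versus 8,192 unpacked; the real figure is lower by whatever the program itself occupies.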
Will post some photos of the setup soon.
Here are some ideas:
- Instead of testing part of bitctr, you can use bmask to detect the wraparound from bit 0 to bit 23/31.
- Copy the bit directly into opmask with muxc, instead of setting all the bits and clearing them one by one with if_nc andn. Because every pin bit is then written on each pass, opmask no longer needs resetting, which saves one instruction.
It would then look like this:
Andy
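Andy's two suggestions can be checked with a C model before committing them to PASM. This is not his code (which is not reproduced here), and the helper names are hypothetical. It models MUXC as copying the C flag into the destination bits selected by the mask, and `ror ... wc` as putting the value's original bit 0 into C, following the Propeller 1 instruction descriptions. When the mask wraps out of bit 0 it is reloaded with bit 23 (24-bit GRB data) or bit 31 (32-bit data).

```c
#include <stdbool.h>
#include <stdint.h>

/* Model of PASM "muxc d, mask": copy the C flag into every bit of d selected by mask. */
static uint32_t muxc(uint32_t d, uint32_t mask, bool c)
{
    return c ? (d | mask) : (d & ~mask);
}

/* Model of PASM "ror v, #1 wc": rotate right one bit; C gets the original bit 0. */
static uint32_t ror1_wc(uint32_t v, bool *c)
{
    *c = (v & 1u) != 0;
    return (v >> 1) | (v << 31);
}

/* Hypothetical single pass over n strips. Writes each strip's current data bit
   into opmask at that strip's pin, then advances bmask. *wrapped reports that
   the bit just sampled was bit 0, i.e. the pointers should move to the next long. */
uint32_t bit_pass(const uint32_t *dat, const uint32_t *ledmask, int n,
                  uint32_t opmask, uint32_t *bmask, unsigned top_bit, bool *wrapped)
{
    for (int i = 0; i < n; i++)
        opmask = muxc(opmask, ledmask[i], (dat[i] & *bmask) != 0);  /* test datN,bmask wc + muxc */
    *bmask = ror1_wc(*bmask, wrapped);
    if (*wrapped)
        *bmask = 1u << top_bit;   /* 23 for GRB, 31 for 32-bit data */
    return opmask;
}
```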
That 24/32-bit flexibility is good for RGBW strips.
Thanks & regards
tubular