Manipulating Bits in a LONG

JonnyMac · 2013-07-16 12:37

For an RGB LED driver I'd like to keep the colors in the high-level interface as $RR_GG_BB but the device itself wants the order as $GG_RR_BB. In the world of Captain Obvioso (that would be me), I'm doing this:

' adjust colorbits
                        ' -- starts as $RR_GG_BB
                        ' -- ends as $GG_RR_BB

                        mov     t1, colorbits                   ' make copies
                        mov     t2, colorbits
                        and     t1, RED_BITS                    ' isolate colors
                        and     t2, GRN_BITS
                        and     colorbits, BLU_BITS
                        shr     t1, #8                          ' reposition red
                        shl     t2, #8                          ' reposition green
                        or      colorbits, t2                   ' reconstruct
                        or      colorbits, t1

                        ' shift out bits here
                        
                        
RED_BITS                long    $FF_00_00
GRN_BITS                long    $00_FF_00
BLU_BITS                long    $00_00_FF

I know there are some with wild-eyed assembly tricks for bit manipulations -- and I'm always game to learn. Timing is not a problem here, but faster tends to be better.
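For anyone who wants to check the arithmetic off-chip, the mask-and-shift sequence above can be modelled in a few lines of Python. This is only a sketch of the logic, not PASM; the function name `rgb_to_grb` is made up for illustration, and the three constants mirror the RED_BITS, GRN_BITS and BLU_BITS longs.

```python
RED_BITS = 0xFF0000
GRN_BITS = 0x00FF00
BLU_BITS = 0x0000FF

def rgb_to_grb(colorbits):
    """Model of the mask-and-shift sequence: $RR_GG_BB in, $GG_RR_BB out."""
    t1 = (colorbits & RED_BITS) >> 8   # red drops to the middle byte
    t2 = (colorbits & GRN_BITS) << 8   # green rises to the top byte
    return (colorbits & BLU_BITS) | t2 | t1

print(hex(rgb_to_grb(0x112233)))  # 0x221133
```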

tonyp12 · 2013-07-16 15:01

The movi trick works on up to 31-bit data, but not on 32-bit data unless you handle the fact that the last movi will trash bit 23 (movi writes nine bits, 31..23).
step1: get the long into a position where the lower byte is correct, using a shift or rol
step2: get the byte you want to merge into bits 8..1 of the source; bit 0 is a don't-care
step3: use movi
step4: rol the long 8 bits
step5: repeat from step2 one more time for 24-bit data

If you could adjust shiftout to do a 24-bit, MSB-first left shift-out, you could make the last instruction rol colorbits,#16 and delete the and.

                        mov     t1, colorbits                   ' 00_RR_GG_BB   copy
                        shr     colorbits,#8                    ' 00_00_RR_GG   step1
                        ror     t1,#15                          ' gG_BB_00_RRg  step2
                        movi    colorbits,t1                    ' RRg_00_RR_GG  step3
                        rol     colorbits,#8                    ' g00_RR_GG_RR  step4
                        ror     t1,#16                          ' 00_RR_GG_BB0  step2 (blue into bits 8..1)
                        movi    colorbits,t1                    ' BB0_RR_GG_RR  step3
                        rol     colorbits,#8                    ' 0RR_GG_RR_BB  step4
                        and     colorbits,_$FFFFFF              ' only needed if shiftout doesn't ignore the top byte

                        ' shift out bits here
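The movi steps are easy to get wrong by one rotation, so here is a rough Python model for checking them. It assumes movi copies source bits 8..0 into destination bits 31..23; the helper names (`ror`, `rol`, `movi`, `rgb_to_grb_movi`) are invented for this sketch. The model rotates t1 right by 15 and then by a further 16, which is what places red and then blue in bits 8..1.

```python
MASK32 = 0xFFFFFFFF

def ror(value, n):
    """32-bit rotate right."""
    n %= 32
    return ((value >> n) | (value << (32 - n))) & MASK32

def rol(value, n):
    """32-bit rotate left."""
    return ror(value, 32 - (n % 32))

def movi(dest, src):
    """Model of movi: copy src[8..0] into dest[31..23], keep dest[22..0]."""
    return (dest & 0x007FFFFF) | ((src & 0x1FF) << 23)

def rgb_to_grb_movi(colorbits):
    t1 = colorbits                    # 00_RR_GG_BB copy
    colorbits >>= 8                   # 00_00_RR_GG
    t1 = ror(t1, 15)                  # red now sits in bits 8..1
    colorbits = movi(colorbits, t1)   # red into the top byte
    colorbits = rol(colorbits, 8)     # low three bytes are now RR_GG_RR
    t1 = ror(t1, 16)                  # blue now sits in bits 8..1
    colorbits = movi(colorbits, t1)   # blue into the top byte (bit 23 is trashed)
    colorbits = rol(colorbits, 8)     # xx_GG_RR_BB, xx holds the trashed byte
    return colorbits & 0xFFFFFF       # drop the trashed top byte

print(hex(rgb_to_grb_movi(0x112233)))  # 0x221133
```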

kwinn · 2013-07-16 15:10

You're probably thinking of how data in memory can be swapped without using temporary locations by using XOR instructions. XOR a, b then XOR b, a followed by XOR a, b will swap the data in those two locations. What you have looks like it will be as fast as it gets.
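The three-XOR swap is easy to confirm with a tiny Python model (the function name `xor_swap` is just for illustration):

```python
def xor_swap(a, b):
    """Swap two values with three XORs and no temporary."""
    a ^= b   # a now holds a ^ b
    b ^= a   # b becomes the original a
    a ^= b   # a becomes the original b
    return a, b

print([hex(v) for v in xor_swap(0x112233, 0xAABBCC)])  # ['0xaabbcc', '0x112233']
```

Note that this swaps two whole locations; swapping two bytes inside one long needs a shifted and masked variant.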

skylight · 2013-07-16 15:23

I'm probably being daft here, usually am

but what about keeping the code straightforward and rewiring by swapping the red and green outputs? ie wire green output to red and vice versa.

kwinn · 2013-07-16 16:15

skylight wrote: »

I'm probably being daft here, usually am but what about keeping the code straightforward and rewiring by swapping the red and green outputs? ie wire green output to red and vice versa.

It's a lot simpler to write the 9 instructions to shift the data around than it would be to change a board layout or rewire a board/connector.

JonnyMac · 2013-07-16 16:44

I'm probably being daft here, usually am but what about keeping the code straightforward and rewiring by swapping the red and green outputs? ie wire green output to red and vice versa.

That is not an option. The device in question is a WS2812 RGB 5050 LED (with built-in driver) and it wants the data as $GG_RR_BB. Why? I have no idea. As I try to keep my code obvious I want to pass colors as $RR_GG_BB. In the end, the easy code works, the driver works, and I'm going to put a fork in it.

jazzed · 2013-07-16 17:12

Essentially the same as tonyp's except no need for mask.

                        ' adjust colorbits
                        ' -- starts as $RR_GG_BB
                        ' -- ends as $GG_RR_BB

                        ' dots are 0's ...
                        mov    t1, colorbits  ' ........_rrrrrrrr_gggggggg_bbbbbbbb
                        ror    colorbits, #17 ' rggggggg_gbbbbbbb_b......._.rrrrrrr
                        ror    t1, #16        ' gggggggg_bbbbbbbb_........_rrrrrrrr
                        movi   colorbits, t1  ' .rrrrrrr_rbbbbbbb_b......._.rrrrrrr
                        shr    colorbits, #8  ' ........_.rrrrrrr_rbbbbbbb_b.......
                        shr    t1, #24        ' ........_........_........_gggggggg
                        movi   colorbits, t1  ' .ggggggg_grrrrrrr_rbbbbbbb_b.......
                        shr    colorbits,#7   ' ........_gggggggg_rrrrrrrr_bbbbbbbb

Alternative method with a HUB long variable.

                        mov    addr, someaddr
                        rdlong colorbits, addr ' ........_rrrrrrrr_gggggggg_bbbbbbbb
                        add    addr, #1        ' next byte 
                        ror    colorbits, #16  ' gggggggg_bbbbbbbb_........_rrrrrrrr
                        wrbyte colorbits, addr ' write RR to addr + 1
                        add    addr, #1        ' next byte 
                        rol    colorbits, #8   ' bbbbbbbb_........_rrrrrrrr_gggggggg
                        wrbyte colorbits, addr ' write GG to addr + 2
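The hub variant relies on longs being stored little-endian, so blue is at addr, green at addr + 1 and red at addr + 2. A minimal Python sketch of that byte shuffle, using a bytearray as a stand-in for hub RAM (the name `swap_in_hub` is hypothetical):

```python
def swap_in_hub(hub, addr):
    """Model of the rdlong / wrbyte sequence on little-endian hub RAM."""
    colorbits = int.from_bytes(hub[addr:addr + 4], "little")  # 00_RR_GG_BB
    hub[addr + 1] = (colorbits >> 16) & 0xFF   # write RR to addr + 1
    hub[addr + 2] = (colorbits >> 8) & 0xFF    # write GG to addr + 2
    return int.from_bytes(hub[addr:addr + 4], "little")

hub = bytearray((0x112233).to_bytes(4, "little"))
print(hex(swap_in_hub(hub, 0)))  # 0x221133
```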

Cluso99 · 2013-07-16 17:53

Here is another solution... (8 instructions and 1 extra temp register)

                                ' x = 00000000 rrrrrrrr gggggggg bbbbbbbb
        mov   t, x              ' t = 00000000 rrrrrrrr gggggggg bbbbbbbb
        shl   x, #24            ' x = bbbbbbbb 00000000 00000000 00000000
        ror   t, #16            ' t = gggggggg bbbbbbbb 00000000 rrrrrrrr
        movs  x, t              ' x = bbbbbbbb 00000000 00000000 rrrrrrrr
        ror   x, #8             ' x = rrrrrrrr bbbbbbbb 00000000 00000000
        shr   t, #24            ' t = 00000000 00000000 00000000 gggggggg
        or    x, t              ' x = rrrrrrrr bbbbbbbb 00000000 gggggggg
        ror   x, #16            ' x = 00000000 gggggggg rrrrrrrr bbbbbbbb
x       long  0                 ' orig & final value
t       long  0                 ' temporary reg

tonyp12 · 2013-07-16 18:10

If you assume the long is stored in hub RAM in the first place, the first 2 instructions are needed anyway, so this adds 8 longs:

mov    t1,par
rdbyte colorbits,t1  'get blue, gives you free "AND $FF"
add    t1,#1
rdbyte t2,t1         'get green
shl    t2,#16
or     colorbits,t2  'merge in green
add    t1,#1
rdbyte t2,t1         'get red
shl    t2,#8
or     colorbits,t2  'merge in red

Beau Schwabe · 2013-07-16 21:07

I almost feel as if this should be one of Phil's Golf Challenges... :-)

Here is a method that uses 7 instructions

              mov       t1,   Data            '00_RR_GG_BB    copy
              shl       t1,   #8              'RR_GG_BB_00    shift t1 left one byte 
              xor       t1,   Data            'RR_^^_^^_BB    xor t1 and data
              and       t1,   Mask            '00_^^_00_00    Mask off only what we are interested in
              xor       Data, t1              '00_GG_GG_BB    GG moves into RR
              shr       t1,   #8              '00_00_^^_00    shift t1 right one byte
              xor       Data, t1              '00_GG_RR_BB    RR moves into GG's place

"Mask" is preset to = $00_FF_00_00

"Data" is both your input and output

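The XOR method works because the masked byte holds RR ^ GG, and XORing that difference into either byte turns it into the other. A short Python model of the seven steps, as a sketch only (`rgb_to_grb_xor` is a made-up name; MASK matches the preset $00_FF_00_00):

```python
MASK = 0x00FF0000

def rgb_to_grb_xor(data):
    """Model of the 7-instruction XOR swap of the red and green bytes."""
    t1 = (data << 8) & 0xFFFFFFFF   # RR_GG_BB_00
    t1 ^= data                      # byte 2 now holds RR ^ GG
    t1 &= MASK                      # keep only that difference byte
    data ^= t1                      # byte 2: RR ^ (RR ^ GG) = GG
    t1 >>= 8                        # move the difference down to byte 1
    data ^= t1                      # byte 1: GG ^ (RR ^ GG) = RR
    return data

print(hex(rgb_to_grb_xor(0x112233)))  # 0x221133
```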
Cluso99 · 2013-07-16 23:42

Nice work Beau!

Rayman · 2013-07-17 07:37

Was just looking at the datasheet for that device... I think maybe I'd do this in the shift out driver... Maybe create a shift out function that shifts out one byte. Then, just do a preshift before each call to that function...

JonnyMac · 2013-07-17 14:15

There's enough timing tolerance in the spec that you could probably get away with that. I rearrange the bits and then send the entire 24-bit packet. I've attached my "Captain Obvioso" code for the driver (will appear in my N&V column).

[Edit] Updated driver in post #25. Will move to ObEx shortly.

RS_Jim · 2013-07-18 07:27

Johnny,
I believe the color sequence comes from the video industry where they use a lot of color subtraction and it is from the green channel.
Jim

JonnyMac wrote: »

That is not an option. The device in question is a WS2812 RGB 5050 LED (with built-in driver) and it wants the data as $GG_RR_BB. Why? I have no idea. As I try to keep my code obvious I want to pass colors as $RR_GG_BB. In the end, the easy code works, the driver works, and I'm going to put a fork in it.

kwinn · 2013-07-18 13:37

Rayman wrote: »

Was just looking at the datasheet for that device... I think maybe I'd do this in the shift out driver... Maybe create a shift out function that shifts out one byte. Then, just do a preshift before each call to that function...

Now that is the perfect answer when the bits are shifted out on a single pin, and it only adds 2 or 3 shift instructions to the driver.

Tubular · 2013-07-18 14:10

Hi Jonny

Good to see you're onto these. The prop is well suited to driving a bunch of these (several "universes")

I've attached some working code based on Gavin T Garner's obex code. The GRB vs RGB swap is a bit of a pain, I did it in the shift out driver. That way we can just change the driver according to the exact led

'new G-R swap required for Neopixel.   Delete this block for other drivers that use RGB format
              mov       GRBvalue, RGBvalue  'make copy to work on - swap RGB to GRB format for NeoPixels
              mov       BLUvalue, RGBvalue
              and       BLUvalue,#255
              shl       GRBvalue,#16        'GB00
              and       GRBvalue,mask1      'G000
              and       RGBvalue,mask3      '0R00
              or        GRBvalue,RGBvalue   'GR00
              shr       GRBvalue,#8         '0GR0
              or        GRBvalue,BLUvalue   '0GRB

Btw my kids have had an absolute ball programming these, choosing the color sequences of a little "train" that runs along the strip, as well as how many leds in the "train" etc. Part of the fun is the flexible strip itself, that can be bent around the place to run up walls, loop the loop, weave between toys, etc. It has been good practice spelling colors (including "Chartreuse") and they asked me to add "brown" and "black" to the range of colors (I think we also added maroon and crimson too)

One of the next projects we're going to try is a light version of a slot car set, with a red, green and blue "car", that can overtake each other, (you'll get yellow while red is passing green, for instance). Need to 3d print some hand throttles!

Will post some pics/vids of proto board soon

BTW the B version of the WS2812 (WS2812B) is extremely easy to solder, because it has 4 pins rather than 6 (one pin in each corner, spread relatively far apart). It's more like a 0.1" pitch. I have 1000 on order, hope to see them soon. I can send you some when they arrive if you like (and anyone else that wants a few samples)

regards
Lachlan

EDIT: Beware; I messed around with some of the color constants in the attached code - turned down the brightness by editing the constants, most are at "half brightness"

JonnyMac · 2013-07-18 14:50

Lachlan,

Thanks for the tip on the WS2812B -- I have a couple projects that they will be great for. I've already created a component and pattern for DipTrace. And, yes, I would love samples. Thanks!

Will have a look at the adjusted constants. As I admit in the code, I liberated those from another program. Will do likewise with yours!

Cluso99 · 2013-07-18 18:48

The WS2812B specs look great! I will email you Lachlan.

tonyp12 · 2013-07-18 19:51

I think I've got everyone beat: 2 extra longs in the shift routine vs. the 12 longs used now.

00 rr gg bb  'starting out with
bb 00 rr gg  'ror 8
gg bb 00 rr  'rol nbits (24)
bb 00 rr gg  'after 8 rol colorbits, #1
rr gg bb 00  'rol nbits (16)
gg bb 00 rr  'after 8 rol colorbits, #1
bb 00 rr gg  'rol nbits (8)

shiftout                ror     colorbits, #8                   ' (new) to offset the rol 24 below
                        mov     nbits, #24                      ' shift 24 bits (3 x 8)

:loop                   test    nbits, #7               wz      ' (new) at 24, 16 or 8?
        if_z            rol     colorbits, nbits                ' (new)
                        rol     colorbits, #1           wc      ' (was rcl) msb --> C
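The rotation table above can be replayed in Python to see which bits actually leave the cog. This is a sketch of the loop logic only, with no timing; `shiftout_bits` and the rotate helpers are names invented for the model. It collects the 24 bits that would land in C and packs them MSB-first.

```python
def rol(value, n):
    """32-bit rotate left."""
    n %= 32
    return ((value << n) | (value >> (32 - n))) & 0xFFFFFFFF

def ror(value, n):
    """32-bit rotate right."""
    return rol(value, 32 - (n % 32))

def shiftout_bits(colorbits):
    """Return the 24 bits in transmission order for a $RR_GG_BB input."""
    bits = []
    colorbits = ror(colorbits, 8)              # offsets the rol 24 below
    for nbits in range(24, 0, -1):             # djnz counts 24 down to 1
        if nbits & 7 == 0:                     # nbits is 24, 16 or 8
            colorbits = rol(colorbits, nbits)  # bring the next colour byte to the top
        colorbits = rol(colorbits, 1)          # old bit 31 goes to C (and to bit 0)
        bits.append(colorbits & 1)
    return bits

sent = 0
for bit in shiftout_bits(0x112233):
    sent = (sent << 1) | bit
print(hex(sent))  # 0x221133
```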

JonnyMac · 2013-07-18 20:37

Tony,

That looked intriguing so I plugged it into my code and.... no dice. It seems like it should work, but it doesn't.

tonyp12 · 2013-07-18 20:49

You have to change the RCL to a ROL; there is no need to rotate C into the value anyway.

ROL D,S Rotate D left by S: C= D[31]
RCL D,S Rotate carry left into D by S: C= D[31]

:loop                   rcl     colorbits, #1           wc      ' before: C --> lsb, result msb --> C
:loop                   rol     colorbits, #1           wc      ' after: result msb --> C

I updated the code on page1
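Why the rcl version fails with the rotation trick can be seen in a small Python model. It assumes rol feeds the old bit 31 back into bit 0, while rcl feeds the previous C flag into bit 0 instead, so bits leaving the top arrive one step late at the bottom and the later rol by nbits no longer finds the bytes where it expects them. The function names and the initial C value of 0 are assumptions of this sketch.

```python
def rol1(d):
    """rol d,#1 wc: bit 31 wraps around to bit 0 and is copied to C."""
    c = d >> 31
    return ((d << 1) | c) & 0xFFFFFFFF, c

def rcl1(d, c_in):
    """rcl d,#1 wc: the old C enters bit 0; bit 31 goes to C."""
    c_out = d >> 31
    return ((d << 1) | c_in) & 0xFFFFFFFF, c_out

def eight_rol(d):
    for _ in range(8):
        d, _c = rol1(d)
    return d

def eight_rcl(d, c=0):
    for _ in range(8):
        d, c = rcl1(d, c)
    return d

start = 0x33001122                       # bb_00_rr_gg after the initial ror
print(hex(eight_rol(start)))             # 0x112233, a clean rotate by one byte
print(eight_rcl(start) == eight_rol(start))  # False
```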

JonnyMac · 2013-07-18 21:31

That worked. I'm going to go ahead and leave it in the code and document it in my column -- you'll get full credit for the trick, of course!

RobertDyer · 2013-07-19 16:44

Glad these posts are here. I'm struggling to write my first PASM code, trying to drive a stick of 8 WS2812s from Adafruit. The code appears to be working correctly as I step through the debugger, but doesn't manipulate the output pin when I run it at full speed. Hopefully I'll find my problem after sifting through some of the code here.

However, that's not what I'm writing about. Just wanted to kick in a correction for the timing on the chips. All the different timing numbers for the high and low times for the high and low bits made me suspicious. It also looked weird to me that the 0-bit and 1-bit total times were different. So I checked with Adafruit and they agreed that the datasheet probably "lost something in translation". The telling numbers are the "800kHz operation frequency" and the "1.25uS data transfer time" (which happen to match).

They suggest keeping the 1.25uS bit-time for both 0 and 1, but keeping closer to 1/3-2/3 and 2/3-1/3 timing ratios. So more like .4uS high with .85uS low for a 0-bit, and .8uS high with .45uS low for a 1-bit. Obviously you guys are successfully using the times in the errant datasheet, but maybe this'll help reliability???

Robert
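To turn the suggested times into waitcnt values, they have to be converted to system clock ticks. The sketch below assumes an 80 MHz system clock, which is a common Propeller setup but is not stated in this thread; `CLKFREQ_HZ` and `ns_to_ticks` are names made up for the example.

```python
CLKFREQ_HZ = 80_000_000  # assumption: 5 MHz crystal with 16x PLL

def ns_to_ticks(ns):
    """Convert nanoseconds to system clock ticks (integer math, rounds down)."""
    return ns * CLKFREQ_HZ // 1_000_000_000

# (high ns, low ns) for each bit value, from the suggested timing
BIT0 = (400, 850)
BIT1 = (800, 450)

for name, (hi, lo) in (("0-bit", BIT0), ("1-bit", BIT1)):
    print(name, "high", ns_to_ticks(hi), "low", ns_to_ticks(lo),
          "total ns", hi + lo)
# 0-bit high 32 low 68 total ns 1250
# 1-bit high 64 low 36 total ns 1250
```

Both bit types add up to 1250 ns, which is one period of the 800 kHz figure quoted from the datasheet.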

tonyp12 · 2013-07-19 17:08

I also think it's some type of misprint.
It would simplify the code a little if you only needed to supply two longs for timing: one for the total bit length of 1.25us and one for 0.35us.
The code below keeps the rising edge at the same interval, independent of code length.

shiftout                ror     colorbits, #8                   ' to off set the rol 24 below
                        mov     nbits,#24                       ' shift 24 bits (3 x 8) 
                        mov     bittimer,_125us                 ' total bittime  
                        add     bittimer,cnt                    ' init the first one

:loop                   test    nbits,#7                wz      ' at 24, 16 or 8?
               if_z     rol     colorbits,nbits                 ' shuffle rrggbb to ggrrbb
                        rol     colorbits, #1           wc      ' msb --> C
                        mov     cnt,_35us                       ' sync bit timer for a 0
               if_c     add     cnt,_35us                       ' double it for a 1
                        add     cnt,cnt
                        or      outa, txmask                    ' tx line 1
                        waitcnt cnt,#0
                        andn    outa, txmask                    ' tx line 0 
                        waitcnt bittimer,_125us                 ' total bitime
                        djnz    nbits, #:loop                   ' next bit

JonnyMac · 2013-07-19 18:33

Robert,

My final code does in fact use values right out of the WS2812 data sheet. I tried other values but it stops the driver from working correctly. This could be due to 3.3v from the Propeller going into the module (technically, 3.3v is below the Vih spec for the chip). I've attached my final driver in the event it might be useful to you.

As you can see, I've got three Adafruit NeoPixels on an Activity Board and they're doing what I want them to do.

Tubular · 2013-07-19 19:46

Hi Robert,

I think your suspicions are right about the datasheet. However we don't know how they decode the pulses - there's a good chance it involves some kind of voltage averaging (comparing the data-in waveform against its own longer-term average), and it may not even involve any type of clock (it's a self-clocking signal after all).

I'm in the process of testing the WS2812B one parameter at a time ("WS2812B, the missing manual"). I've been looking at the V-I characteristics and thermal performance first (had Adafruit's 40 neopixel shield up at 108 C so far). Next up I want to look at switching thresholds on the Din, and then I'll get onto timing. The prop is plenty fast enough to explore all the timing limits (I'm sure we can make a multi string controller in one cog, in fact).

As John says we all have slightly different code working fine, so happy to look at yours and offer suggestions if you get really stuck. Otherwise keep going!
Here's a vid of a propeller proto board driving a few forms http://youtu.be/z6vhASDaWIw

Let me get to those timing tests. I changed the way the "timing ticks" are calculated from Gavin's original way. I have been able to successfully run faster and slower than recommended, but I didn't record the limits. I think from memory 0.2us and 0.6us worked, but I'll revisit that soon and document it.

RobertDyer · 2013-07-22 14:54

Hey guys,

Still plugging away at my coding problems. Want to make sure I exhaust my efforts before sending it to you guys. My problem may be that I don't understand how the WS2812s actually work. For synchronous timing purposes I thought I needed to read the cnt register once at the beginning of sending ALL the bits for ALL the LEDs. But if I understand some of the code I'm looking at, that's not what everyone is doing. I see multiple captures of the cnt register in the loops. Where am I going wrong in my thinking?

Also, how do I insert a snippet of code into a forum post? I tried to "cut and paste" both out of notepad and the Propeller tool, and also tried typing it directly into the editing window, but every time I preview the post, the column spacing is gone whether I use tabs or spaces. How do you guys do it?

Thanks.
Robert

tonyp12 · 2013-07-22 15:25

Use [code] and [/code] tags around the snippet.

The WS2812 has different high times for a 0 bit and a 1 bit, and (supposedly) also a different total length depending on whether the bit is a 0 or a 1.
That is why you need multiple waitcnt instructions with different values.

What you have to learn about waitcnt is that it waits until cnt equals the target, and then adds a value to the target, ready for the next wait (use #0 if that is not needed).
Using cnt on the destination side (add cnt,cnt) is just a way to use the otherwise free shadow register; the real cnt always has to be on the source side.
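The wait-then-add behaviour is what lets a loop keep a fixed bit period no matter how long the instructions in between take. A very simplified Python model, ignoring the 32-bit wrap of cnt and using invented names (`waitcnt`, `rising_edges`) and made-up overhead figures:

```python
def waitcnt(now, target, delta):
    """Model of waitcnt target, delta: block until the counter reaches
    target, then add delta to target. Returns (new_now, new_target).
    Counter wrap-around is ignored in this sketch."""
    if now > target:
        raise ValueError("target already passed; real hardware waits for wrap")
    return target, target + delta

def rising_edges(period, overheads):
    """Times at which each bit starts when one running target is reused."""
    now = 0
    bittimer = now + period
    edges = []
    for overhead in overheads:
        edges.append(now)       # line goes high at the start of the bit
        now += overhead         # ticks spent executing the loop body
        now, bittimer = waitcnt(now, bittimer, period)
    return edges

print(rising_edges(100, [20, 35, 28]))  # [0, 100, 200]
```

As long as each loop pass finishes before the target, the edges stay exactly one period apart; reloading the target from cnt in mid-stream would instead add the loop overhead to every bit.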

RobertDyer · 2013-07-22 15:58

Sorry Tony, I wasn't clear with my question. I understand about using waitcnt with a future value in the target register, and then adding values to the target register for the next waitcnt instruction. (I use the label CntSave to save cnt as my target register, others use bittimer or counter as their labels). But what I noticed is that there are multiple times when people move cnt into their target register DURING the bit-streams for each LED. Unless you allow for it, doesn't reloading the target register with cnt (anywhere in the bit-stream other than the very beginning) throw off the synchronization of the subsequent bits for the LED? Also, doesn't the WS2812 require that there be no extra time between the 24-bits of each LED? Robert

JonnyMac · 2013-07-22 17:38

Robert,

Below you'll find the latest code from my driver. This uses Tony's register manipulation trick so that the $RR_GG_BB value passed to the routine gets transmitted as $GG_RR_BB.

The high side pulse timing is most critical (it establishes the bit value), so I load the timer, take the TX line high, then add in the cnt register; in my mind this helps the code keep the pulse timing as close as possible. Sure there's a little slop on the low side, but this is well within the tolerances of the spec.

shiftout                ror     colorbits, #8                   ' {1} to offset the rol 24 below
                        mov     nbits, #24                      ' shift 24 bits (3 x 8) 
                        
:loop                   test    nbits, #%111            wz      ' {2} nbits at 24, 16 or 8?
        if_z            rol     colorbits, nbits                ' if yes, modify colorbits 
                        rol     colorbits, #1           wc      ' msb --> C
        if_c            mov     bittimer, bit1hi                ' set bit timing  
        if_nc           mov     bittimer, bit0hi                
                        or      outa, txmask                    ' tx line 1  
                        add     bittimer, cnt                   ' sync bit timer  
        if_c            waitcnt bittimer, bit1lo                
        if_nc           waitcnt bittimer, bit0lo 
                        andn    outa, txmask                    ' tx line 0             
                        waitcnt bittimer, #0                    ' hold while low
                        djnz    nbits, #:loop                   ' next bit

tonyp12 · 2013-07-22 18:36

> doesn't reloading the target register with cnt (anywhere in the bit-stream other than the very beginning) throw off the synchronization
Yes, it would, as the length of the code would add around 0.6us; in my code in post #24 I use a fixed-interval timer.
The length of the bit is more important, as the LED syncs itself to the rising edge of each bit it gets.
