Two pieces of code which should do the same, but they don't ?!?

Patrick1ab · 2010-11-07 16:32

Hi everyone,

I'm scratching my head over why these two pieces of PASM code behave differently, although they are supposed to do exactly the same thing. My intention is to read the color information from a 960-byte array located at address t1 and convert it into another format, so I can send it to my LCD.

For example:
source: bbbbbbbb_gggggggg_rrrrrrrr ...
result: rrrrrrgg_ggggbbbb_bb000000 ...
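As a cross-check of the conversion, here is a minimal Python sketch of the mask-and-shift pattern used in the first PASM loop. The function name `pack_rgb666` is made up for illustration; only the `$FC` masks and the shift amounts 12, 18 and 24 come from the post.

```python
def pack_rgb666(b, g, r):
    """Model of the rdbyte loop: keep the top 6 bits of each component
    and place them at bits 19..14 (blue), 25..20 (green), 31..26 (red)."""
    return ((b & 0xFC) << 12) | ((g & 0xFC) << 18) | ((r & 0xFC) << 24)


# Full red and full blue, no green (arbitrary test values).
value = pack_rgb666(b=0xFF, g=0x00, r=0xFF)
print(f"{value:032b}")
```

Printing the value in binary makes it easy to see where each 6-bit field ends up in the long before it is handed to the output routine.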

1.

:loop         rdbyte arg0, t1                   ' create a long out of 3 bytes of valuechain...
              and    arg0, #$FC                 ' ...using the following pattern:
              shl    arg0, #12                  '((valuechain[x*3+0]&$FC)<<12)|((valuechain[x*3+1]&$FC)<<18)|((valuechain[x*3+2]&$FC)<<24)
              mov    rgb, arg0                  ' where x is the current pixel
              add    t1, #1
              rdbyte arg0, t1
              and    arg0, #$FC
              shl    arg0, #18
              or     rgb, arg0
              add    t1, #1
              rdbyte arg0, t1
              and    arg0, #$FC
              shl    arg0, #24
              or     rgb, arg0
              mov    arg0, rgb                  ' write this long sized value into arg0
                        
              call   #fast_data_out_            ' and pass it on to fast_data_out
              add    t1, #1                     ' increment t1 to get to the first byte of the next pixel
              djnz   t3, #:loop                 ' decrement the number of pixels by one and jump if there are further pixels

This one works fine, but it's not very efficient, since it needs three rdbytes (worst case 3*22 clock cycles) for every pixel. So I wrote this one instead:

2.

:loop         rdlong rgb, t1
              and    rgb, mask24bit
              shr    rgb, #7
              movi   arg0, rgb
              shr    rgb, #8
              rol    arg0,#6
              movi   arg0, rgb
              shr    rgb, #8
              rol    arg0,#6
              movi   arg0, rgb
              ror    arg0, #12
              
              call   #fast_data_out_
              add    t1, #3
              djnz   t3, #:loop

...

mask24bit          long    $FCFCFC00

When I try to use this code, the picture gets a grayish look and there are some ugly vertical stripes in it. As if something was missing or shifted to the wrong position.

I already tried to figure it out by writing down the register contents after every step. I had no success so far.

Maybe it's because I'm cheating several times... reading long values from a byte array, reading one byte twice all the time?

kuroneko · 2010-11-07 16:40

rdlong always reads from a long aligned address, i.e. addr & -4. You increment by 3 for every loop cycle which doesn't add up for long based access. So if you start at 4n the second cycle reads from (4n+3) & -4 == 4n (again).
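To make the alignment effect concrete, here is a small Python model of the address that rdlong ends up using when the pointer is advanced by 3 each time. The start address `0x100` and the helper name `rdlong_address` are assumptions for illustration only.

```python
def rdlong_address(addr):
    """Address actually read by rdlong: the low two bits are ignored."""
    return addr & -4


start = 0x100  # hypothetical long-aligned buffer address
for i in range(5):
    requested = start + 3 * i
    print(hex(requested), "->", hex(rdlong_address(requested)))
```

The first two iterations read the same long, and later ones never start on a pixel boundary, which fits the striped, washed-out picture described above.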

Re: hub sync, you could arrange your code so that the second and third rdbyte are in sync (it uses an extra arg1):

:loop         rdbyte rgb, t1                    ' create a long out of 3 bytes of valuechain...
              add    t1, #1
              and    rgb, #$FC                  ' ...using the following pattern:

              rdbyte arg1, t1                   ' second byte
              add    t1, #1
              shl    rgb, #12                   '((valuechain[x*3+0]&$FC)<<12)|((valuechain[x*3+1]&$FC)<<18)|((valuechain[x*3+2]&$FC)<<24)

              rdbyte arg0, t1                   ' third byte
              and    arg1, #$FC
              shl    arg1, #18

              or     rgb, arg1
              and    arg0, #$FC
              shl    arg0, #24
              or     arg0, rgb                  ' combined result ends up in arg0

Patrick1ab · 2010-11-07 16:56

Hi kuroneko,

thanks for your reply.

I think I understand... So, now I'm in trouble, I guess.
I could do a rdword and a rdbyte afterwards, but that reduces the efficiency again.
Another possibility would be to keep reading longs and store the remainder of each long in case another pixel follows. Since this remainder contains a different number of bytes in each loop cycle, this would become a real mess.

kuroneko · 2010-11-07 17:00

Patrick1ab wrote: »

I think I understand... So, now I'm in trouble, I guess.

Not necessarily. Either you sync your code similar to what I listed above or you could unroll your loop a bit (space permitting) in that you use 3 rdlongs in a row with slightly different extraction code which would give you 4 pixels per loop cycle (as you pointed out, it doesn't have to be messy though).
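One way to picture the "3 rdlongs, 4 pixels" idea is the Python sketch below. It is only a model, not PASM, and it assumes the bytes are packed as B, G, R in memory and that the byte at the lowest address lands in the lowest byte of each long. The function name and the test bytes are made up.

```python
def unpack_four_pixels(long0, long1, long2):
    """Split three 32-bit longs (12 packed bytes) into four 0xRRGGBB values."""
    p0 = long0 & 0xFFFFFF                         # bytes 0..2
    p1 = (long0 >> 24) | ((long1 & 0xFFFF) << 8)  # byte 3 plus bytes 4..5
    p2 = (long1 >> 16) | ((long2 & 0xFF) << 16)   # bytes 6..7 plus byte 8
    p3 = long2 >> 8                               # bytes 9..11
    return [p0, p1, p2, p3]


data = bytes(range(1, 13))  # made-up test bytes 01 02 ... 0C
longs = [int.from_bytes(data[i:i + 4], "little") for i in (0, 4, 8)]
for pixel in unpack_four_pixels(*longs):
    print(f"{pixel:06X}")
```

Each pixel needs a different combination of shifts, but the pattern repeats every 12 bytes, so the extraction code stays fixed from one loop cycle to the next.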

Patrick1ab · 2010-11-07 17:10

kuroneko wrote: »

... you could unroll your loop a bit (space permitting) in that you use 3 rdlongs in a row with slightly different extraction code which would give you 4 pixels per loop cycle (as you pointed out, it doesn't have to be messy though).

The problem with this solution is that if the number of pixels in the picture is not divisible by 4, I will get several lines with "random" colors. So I would have to insert three additional abort mechanisms.

kuroneko · 2010-11-07 17:17

Patrick1ab wrote: »

The problem with this solution is that if the number of pixels in the picture is not divisible by 4, I will get several lines with "random" colors. So I would have to insert three additional abort mechanisms.

You could also load the picture with padding (line by line). Your call.

Patrick1ab · 2010-11-07 17:35

Damn, the solution to this last problem is quite simple:
One loop reading 4 pixels at once and another loop calling fast_data_out_ according to the number of pixels left. So maybe I read too many values, but they are not being sent to the display ;-)
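The two-loop idea can be sketched in Python as a count of how many pixels get sent from each group of four that was read. This is a rough model with a hypothetical helper name, not the actual PASM.

```python
def send_counts(pixel_count, group=4):
    """How many pixels to send from each group of `group` pixels read."""
    counts = []
    remaining = pixel_count
    while remaining > 0:
        counts.append(min(group, remaining))  # last group may be partial
        remaining -= group
    return counts


print(send_counts(10))
```

For 10 pixels this gives three reads but only two pixels sent from the last one, so the surplus values are read and then simply dropped.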

Patrick1ab · 2010-11-08 16:48

Edit:

I wrote a third piece of code which isn't working correctly. Address calculation should be fine now, because I'm incrementing the pointer by 4 and I'm only reading longs.
My idea was to change the order of the bits while pushing them out, instead of doing these "and, shl/r , rol/r, movi" register operations all the time.

:loop1        rdlong    arg0, t1
              call      #color_data_out_
              shl       arg0, #24
              add       t1, #4
              rdlong    arg2, t1
              mov       arg1, arg2
              shr       arg1, #8
              or        arg0, arg1
              call      #color_data_out_
              mov       arg1, arg2
              shl       arg1, #16
              add       t1, #4
              rdlong    arg2, t1
              mov       arg0, arg2
              shr       arg0, #16
              or        arg0, arg1
              call      #color_data_out_
              mov       arg0, arg2
              shl       arg0, #8
              call      #color_data_out_
              add       t1, #4 

              sub       t3, #4 wz
        if_nz jmp       #:loop1

color_data_out_  andn      outa, clkpin            ' used to send color data
              mov       phsb, arg0              ' data to write
              rol       phsb, #16
              mov       frqa, freqw             ' 20MHz write frequency
              mov       phsa, #0                ' start clocking at 0
              
              mov       ctra, ctramode          ' send 18 bits of data @ 20MHz
              rol       phsb, #1
              rol       phsb, #1
              rol       phsb, #1 
              rol       phsb, #1
              rol       phsb, #1
              rol       phsb, #19
              rol       phsb, #1
              rol       phsb, #1
              rol       phsb, #1
              rol       phsb, #1
              rol       phsb, #1
              rol       phsb, #19
              rol       phsb, #1
              rol       phsb, #1
              rol       phsb, #1
              rol       phsb, #1
              rol       phsb, #1
              mov       ctra, #0                ' disable
              andn      outa, clkpin 

color_data_out__ret    ret

Now I'm getting some kind of ghost images, where red, green and blue are on different lines instead of being unified into one pixel.

kuroneko · 2010-11-08 17:41

How is your data stored? When you read your first long you expect the pixel data in the top 3 bytes. That looks suspiciously like big-endian. Is that correct?

I'm just checking because it looks slightly different from your initial rdbyte example, where you read the first pixel's components from addr+0, addr+1, addr+2. Now (rdlong) you assume the pixel components are stored at addr+1, addr+2, addr+3.

Patrick1ab · 2010-11-08 18:04

The data is stored in the little-endian format:
bbbbbbbb gggggggg rrrrrrrr 00000000

This data is converted into big-endian (with some other modifications like cutting off the lowest bits, etc.). In the initial version it was converted into a format like this, before sending it:
rrrrrrgg ggggbbbb bb000000

Now I'm not doing the conversion anymore, but I'm changing the bit order while sending the data. This is why I wrote a new "color_data_out" method.
The output should still be rrrrrrgg ggggbbbb bb.

kuroneko · 2010-11-08 18:16

Patrick1ab wrote: »

The data is stored in the little-endian format:
bbbbbbbb gggggggg rrrrrrrr 00000000

Packed I assume, i.e. the 00000000 should be the blue component of the next pixel (and addresses are increasing from left to right)?

Anyway, in your rdbyte implementation you read blue, green and red in that order (first pixel). Now with rdlong you end up with a value like this (using above data):

$B1_R0_G0_B0

And in your code, after sending it out you shift everything left by #24, i.e. you keep the blue component of pixel 0 and throw away blue of pixel 1.
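The effect of the shift can be checked with a few lines of Python. The component values are made up so that each byte is easy to recognise, and `int.from_bytes(..., "little")` stands in for a rdlong from a long-aligned address.

```python
pixels = bytes([0xB0, 0x60, 0x40, 0xB1])  # made-up values: B0, G0, R0, then B1
long0 = int.from_bytes(pixels, "little")  # model of: rdlong arg0, t1
print(f"{long0:08X}")

shifted = (long0 << 24) & 0xFFFFFFFF      # model of: shl arg0, #24
print(f"{shifted:08X}")

next_blue = long0 >> 24                   # a right shift keeps B1 instead
print(f"{next_blue:02X}")
```

After the left shift the only byte that survives is B0, while the blue component of the next pixel sits in the top byte and would have to be moved down with a right shift.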

Patrick1ab · 2010-11-08 18:32

kuroneko wrote:

Packed I assume, i.e. the 00000000 should be the blue component of the next pixel (and addresses are increasing from left to right)?

Oops, yes, the 00000000 should be bbbbbbbb of the next pixel.

kuroneko wrote:

Anyway, in your rdbyte implementation you read blue, green and red in that order (first pixel). Now with rdlong you end up with a value like this (using above data):
$B1_R0_G0_B0
And in your code, after sending it out you shift everything left by #24, i.e. you keep the blue component of pixel 0 and throw away blue of pixel 1.

I think I'm getting it now. The Propeller is working in big-endian. This has no effect if I just read byte by byte, but when it comes to reading longs I have to swap most and least significant bytes. Okay, I'll try to modify the code tomorrow.
Thanks for your help ;-)

kuroneko · 2010-11-08 18:36

Patrick1ab wrote: »

I think I'm getting it now. The Propeller is working in big-endian. This has no effect if I just read byte by byte, but when it comes to reading longs I have to swap most and least significant bytes. Okay, I'll try to modify the code tomorrow.

The prop is little-endian. The register format is bit 31 left, bit 0 right (byte 3 (MSB) left, byte 0 (LSB) right), and the byte read from the lowest address (addr+0) ends up in the lowest byte (bits 7..0).
