Two pieces of code which should do the same, but they don't ?!?



Patrick1ab
11-08-2010, 12:32 AM
Hi everyone,

I'm scratching my head over why these two pieces of PASM code behave differently, although they are supposed to do exactly the same thing. My intention is to read the color information from a 960-byte array located at address t1 and convert it into another format, so I can send it to my LCD.

For example:
source: bbbbbbbb_gggggggg_rrrrrrrr ...
result: rrrrrrgg_ggggbbbb_bb000000 ...
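
To check the bit layout off-chip, the packing formula from the comments in code 1 can be modelled in a few lines of Python. This is only a sketch: the helper name pack_pixel is made up for illustration, and it assumes the byte order b, g, r of the source array.

```python
def pack_pixel(b, g, r):
    """Pack three 8-bit components into a left-aligned 18-bit word (6 bits each)."""
    # Same pattern as the PASM comment: drop the two lowest bits of each
    # component, then shift blue, green and red into place.
    word = ((b & 0xFC) << 12) | ((g & 0xFC) << 18) | ((r & 0xFC) << 24)
    return word & 0xFFFFFFFF  # keep it inside a 32-bit register


# Full red and full blue, no green: rrrrrr=111111, gggggg=000000, bbbbbb=111111
print(f"{pack_pixel(0xFF, 0x00, 0xFF):032b}")
```

The top 18 bits of the printed value are red, green and blue (6 bits each); the remaining 14 bits stay zero.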

1.


:loop   rdbyte  arg0, t1        ' create a long out of 3 bytes of valuechain...
        and     arg0, #$FC      ' ...using the following pattern:
        shl     arg0, #12       ' ((valuechain[x*3+0]&$FC)<<12)|((valuechain[x*3+1]&$FC)<<18)|((valuechain[x*3+2]&$FC)<<24)
        mov     rgb, arg0       ' where x is the current pixel
        add     t1, #1
        rdbyte  arg0, t1
        and     arg0, #$FC
        shl     arg0, #18
        or      rgb, arg0
        add     t1, #1
        rdbyte  arg0, t1
        and     arg0, #$FC
        shl     arg0, #24
        or      rgb, arg0
        mov     arg0, rgb       ' write this long sized value into arg0

        call    #fast_data_out_ ' and pass it on to fast_data_out
        add     t1, #1          ' increment t1 to get to the first byte of the next pixel
        djnz    t3, #:loop      ' decrement the number of pixels by one and jump if there are further pixels


This one is working fine, but it's not very efficient since I have three rdbytes (worst case 3*22 clock cycles) for every pixel. So I created this one instead:


2.


:loop   rdlong  rgb, t1
        and     rgb, mask24bit
        shr     rgb, #7
        movi    arg0, rgb
        shr     rgb, #8
        rol     arg0, #6
        movi    arg0, rgb
        shr     rgb, #8
        rol     arg0, #6
        movi    arg0, rgb
        ror     arg0, #12

        call    #fast_data_out_
        add     t1, #3
        djnz    t3, #:loop

...

mask24bit       long    $FCFCFC00


When I try to use this code, the picture gets a grayish look and there are some ugly vertical stripes in it. As if something was missing or shifted to the wrong position.

I already tried to figure it out by writing down the register contents after every step. I had no success so far.

Maybe it's because I'm cheating several times... reading long values from a byte array, reading one byte twice all the time?

kuroneko
11-08-2010, 12:40 AM
rdlong always reads from a long aligned address, i.e. addr & -4. You increment by 3 for every loop cycle which doesn't add up for long based access. So if you start at 4n the second cycle reads from (4n+3) & -4 == 4n (again).
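
The effect is easy to see in a small model. This is a Python sketch, not Propeller code: the function rdlong below only imitates the address masking, and the hub contents are made up so that every byte shows its own address.

```python
def rdlong(hub, addr):
    """Model of a hub long read: the two low address bits are ignored."""
    base = addr & -4                      # long-aligned address, same as addr & ~3
    return int.from_bytes(hub[base:base + 4], "little")


hub = bytes(range(16))                    # hub[i] == i
t1 = 4
for _ in range(4):
    print(t1, hex(rdlong(hub, t1)))       # t1 = 4 and t1 = 7 give the same long
    t1 += 3                               # the increment used in code 2
```

With a step of 3 the pointer only crosses a long boundary now and then, so some longs are read twice and the bytes never line up with the pixels.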

Re: hub sync, you could arrange your code so that the second and third rdbyte are in sync (it uses an extra arg1):

:loop   rdbyte  rgb, t1         ' create a long out of 3 bytes of valuechain...
        add     t1, #1
        and     rgb, #$FC       ' ...using the following pattern:

        rdbyte  arg1, t1
        add     t1, #1
        shl     rgb, #12        ' ((valuechain[x*3+0]&$FC)<<12)|((valuechain[x*3+1]&$FC)<<18)|((valuechain[x*3+2]&$FC)<<24)

        rdbyte  arg0, t1
        and     arg1, #$FC
        shl     arg1, #18

        or      rgb, arg1
        and     arg0, #$FC
        shl     arg0, #24
        or      arg0, rgb

Patrick1ab
11-08-2010, 12:56 AM
Hi kuroneko,

thanks for your reply.

I think I understand... So, now I'm in trouble, I guess.
I could do a rdword and a rdbyte afterwards, but that reduces the efficiency again.
Another possibility would be to keep reading longs and store the remainder of the long in case another pixel follows. Since this remainder contains a different number of bytes each loop cycle, this will become a real mess.

kuroneko
11-08-2010, 01:00 AM
Patrick1ab wrote: "I think I understand... So, now I'm in trouble, I guess."

Not necessarily. Either you sync your code similar to what I listed above or you could unroll your loop a bit (space permitting) in that you use 3 rdlongs in a row with slightly different extraction code which would give you 4 pixels per loop cycle (as you pointed out, it doesn't have to be messy though).
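
The extraction for three longs could look roughly like this. It is a Python model rather than PASM, the name split_four and the sample bytes are made up, and it assumes packed B, G, R bytes at increasing addresses read as little-endian longs.

```python
def split_four(l0, l1, l2):
    """Split three 32-bit longs (12 packed bytes) into four 24-bit pixels.

    Each result holds blue in bits 7..0, green in 15..8 and red in 23..16.
    """
    p0 = l0 & 0xFFFFFF                        # B0 G0 R0 are the low 3 bytes of l0
    p1 = (l0 >> 24) | ((l1 & 0xFFFF) << 8)    # B1 from l0, G1 R1 from l1
    p2 = (l1 >> 16) | ((l2 & 0xFF) << 16)     # B2 G2 from l1, R2 from l2
    p3 = l2 >> 8                              # B3 G3 R3 are the high 3 bytes of l2
    return [p0, p1, p2, p3]


# Sample data: pixel n has blue $Bn, green $6n, red $1n
data = bytes([0xB0, 0x60, 0x10, 0xB1, 0x61, 0x11,
              0xB2, 0x62, 0x12, 0xB3, 0x63, 0x13])
longs = [int.from_bytes(data[i:i + 4], "little") for i in (0, 4, 8)]
for pixel in split_four(*longs):
    print(f"${pixel:06X}")
```

Each of the four pixels needs a different shift and mask, but the pattern repeats every 12 bytes, so nothing has to be carried over between loop cycles.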

Patrick1ab
11-08-2010, 01:10 AM
kuroneko wrote: "... you could unroll your loop a bit (space permitting) in that you use 3 rdlongs in a row with slightly different extraction code which would give you 4 pixels per loop cycle (as you pointed out, it doesn't have to be messy though)."

The problem with this solution is that if the number of pixels in the picture is not divisible by 4, I will get several lines with "random" color. So I would have to insert three more abort mechanisms.

kuroneko
11-08-2010, 01:17 AM
Patrick1ab wrote: "The problem with this solution is that if the number of pixels in the picture is not divisible by 4, I will get several lines with "random" color. So I would have to insert three more abort mechanisms."

You also could load the picture with padding (line by line). Your call ;)

Patrick1ab
11-08-2010, 01:35 AM
Damn, the solution to this last problem is quite simple:
One loop reading 4 pixels at once and another loop calling fast_data_out_ according to the number of pixels left. So maybe I read too many values, but they are not being sent to the display ;-)
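
Roughly what I have in mind, as a Python model instead of PASM. The names send_pixels and send are made up, and the zero padding only stands in for whatever happens to follow the array in hub memory.

```python
def send_pixels(data, count, send):
    """Read packed B,G,R bytes in groups of 4 pixels, but send only `count` pixels."""
    padded = data + bytes(-len(data) % 12)    # model of reading past the end of the array
    sent = 0
    for base in range(0, len(padded), 12):    # outer loop: 12 bytes = 4 pixels per pass
        if sent >= count:
            break
        group = [int.from_bytes(padded[base + 3 * i:base + 3 * i + 3], "little")
                 for i in range(4)]
        for pixel in group[:min(4, count - sent)]:   # inner loop: only the pixels left
            send(pixel)
            sent += 1


out = []
send_pixels(bytes(range(1, 16)), 5, out.append)      # 5 pixels = 15 bytes
print([f"${p:06X}" for p in out])
```

The last pass still reads a full group, but the three surplus pixels are never passed to the output routine.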

Patrick1ab
11-09-2010, 12:48 AM
Edit:

I wrote a third piece of code which isn't working correctly. Address calculation should be fine now, because I'm incrementing the pointer by 4 and I'm only reading longs.
My idea was to change the order of the bits while pushing them out, instead of doing these "and, shl/r , rol/r, movi" register operations all the time.



:loop1  rdlong  arg0, t1
        call    #color_data_out_
        shl     arg0, #24
        add     t1, #4
        rdlong  arg2, t1
        mov     arg1, arg2
        shr     arg1, #8
        or      arg0, arg1
        call    #color_data_out_
        mov     arg1, arg2
        shl     arg1, #16
        add     t1, #4
        rdlong  arg2, t1
        mov     arg0, arg2
        shr     arg0, #16
        or      arg0, arg1
        call    #color_data_out_
        mov     arg0, arg2
        shl     arg0, #8
        call    #color_data_out_
        add     t1, #4

        sub     t3, #4 wz
        if_nz   jmp     #:loop1



color_data_out_         andn    outa, clkpin    ' used to send color data
                        mov     phsb, arg0      ' data to write
                        rol     phsb, #16
                        mov     frqa, freqw     ' 20MHz write frequency
                        mov     phsa, #0        ' start at clocking at 0

                        mov     ctra, ctramode  ' send 18 bits of data @ 20MHz
                        rol     phsb, #1
                        rol     phsb, #1
                        rol     phsb, #1
                        rol     phsb, #1
                        rol     phsb, #1
                        rol     phsb, #19
                        rol     phsb, #1
                        rol     phsb, #1
                        rol     phsb, #1
                        rol     phsb, #1
                        rol     phsb, #1
                        rol     phsb, #19
                        rol     phsb, #1
                        rol     phsb, #1
                        rol     phsb, #1
                        rol     phsb, #1
                        rol     phsb, #1
                        mov     ctra, #0        ' disable
                        andn    outa, clkpin

color_data_out__ret     ret


Now I'm getting some kind of ghost images, where red, green and blue are on different lines instead of being unified into one pixel.

kuroneko
11-09-2010, 01:41 AM
How is your data stored? When you read your first long you expect the pixel data in the top 3 bytes. That looks suspiciously like big-endian. Is that correct?

I'm just checking because it looks slightly different from your initial rdbyte example where you read the first pixel components from addr+0, addr+1, addr+2, now (rdlong) you assume the pixel components are stored at addr+1, addr+2, addr+3.

Patrick1ab
11-09-2010, 02:04 AM
The data is stored in the little-endian format:
bbbbbbbb gggggggg rrrrrrrr 00000000

This data is converted into big-endian (with some other modifications like cutting off the lowest bits, etc.). In the initial version it was converted into a format like this, before sending it:
rrrrrrgg ggggbbbb bb000000

Now I'm not doing the conversion anymore, but I'm changing the bit order while sending the data. This is why I wrote a new "color_data_out" method.
The output should still be rrrrrrgg ggggbbbb bb.

kuroneko
11-09-2010, 02:16 AM
Patrick1ab wrote:
"The data is stored in the little-endian format:
bbbbbbbb gggggggg rrrrrrrr 00000000"

Packed I assume, i.e. the 00000000 should be the blue component of the next pixel (and addresses are increasing from left to right)?

Anyway, in your rdbyte implementation you read blue, green and red in that order (first pixel). Now with rdlong you end up with a value like this (using above data):

$B1_R0_G0_B0
And in your code, after sending it out you shift everything left by #24, i.e. you keep the blue component of pixel 0 and throw away blue of pixel 1.

Patrick1ab
11-09-2010, 02:32 AM
kuroneko wrote: "Packed I assume, i.e. the 00000000 should be the blue component of the next pixel (and addresses are increasing from left to right)?"

Oops, yes, the 00000000 should be bbbbbbbb of the next pixel.



kuroneko wrote:
"Anyway, in your rdbyte implementation you read blue, green and red in that order (first pixel). Now with rdlong you end up with a value like this (using above data):
$B1_R0_G0_B0
And in your code, after sending it out you shift everything left by #24, i.e. you keep the blue component of pixel 0 and throw away blue of pixel 1."

I think I'm getting it now. The Propeller is working in big-endian. This has no effect if I just read byte by byte, but when it comes to reading longs I have to swap most and least significant bytes. Okay, I'll try to modify the code tomorrow.
Thanks for your help ;-)

kuroneko
11-09-2010, 02:36 AM
Patrick1ab wrote: "I think I'm getting it now. The Propeller is working in big-endian. This has no effect if I just read byte by byte, but when it comes to reading longs I have to swap most and least significant bytes. Okay, I'll try to modify the code tomorrow."

The prop is little endian, the register format is bit 31 left, bit 0 right (byte 3 (MSB) left, byte 0 (LSB) right). And the byte read from the lowest address (addr+0) ends up in the lowest byte (bits 7..0).
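
A quick way to convince yourself is to model the read on a PC. This is a Python sketch: the sample bytes stand for B0, G0, R0, B1 at addr+0..addr+3 and are made up for illustration.

```python
import struct

hub = bytes([0xB0, 0x60, 0x10, 0xB1])      # B0, G0, R0, B1 at addr+0 .. addr+3
(value,) = struct.unpack("<I", hub)        # little-endian long, like rdlong sees it
print(f"${value:08X}")                     # byte from addr+0 sits in bits 7..0

shifted = (value << 24) & 0xFFFFFFFF       # the shl arg0, #24 from the third listing
print(f"${shifted:08X}")                   # only B0 is left, B1 has been shifted out
```

So no byte swap is needed, only shifts and masks that pick the components from the right end of the register.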