Any shorter way to reverse bytes in long using only 1 extra long (in pasm)?

Cluso99 · 2013-04-24 17:34

Here is how I am reversing the byte order in a long.
I only use 1 additional cog long.

Can anyone find a shorter way?

                                                        ' x: b0  b1  b2  b3
                movs    lmm_w, lmm_x                    '    x   x   x   b3
                rol     lmm_x, #8                       ' x: b1  b2  b3  b0 
                ror     lmm_w, #8                       '    b3  x   x   x
                movs    lmm_w, lmm_x                    '    b3  x   x   b0
                rol     lmm_x, #8                       ' x: b2  b3  b0  b1
                ror     lmm_w, #8                       '    b0  b3  x   x
                movs    lmm_w, lmm_x                    '    b0  b3  x   b1
                rol     lmm_x, #8                       ' x: b3  b0  b1  b2
                and     lmm_x, #$FF                     ' x: 0   0   0   b2
                ror     lmm_w, #8                       '    b1  b0  b3  x
                andn    lmm_w, #$FF                     '    b1  b0  b3  0
                or      lmm_x, lmm_w                    ' x: b1  b0  b3  b2
                ror     lmm_x, #16                      ' x: b3  b2  b1  b0
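
The byte-lane comments above can be checked off-chip. This is a minimal Python sketch, not PASM: `rol`, `ror` and `movs` are hypothetical helpers that model 32-bit rotates and the MOVS copy of the low 9 bits, and the initial garbage value in `w` is arbitrary.

```python
MASK32 = 0xFFFFFFFF

def rol(v, n):
    """Rotate a 32-bit value left by n bits."""
    n %= 32
    return ((v << n) | (v >> (32 - n))) & MASK32

def ror(v, n):
    """Rotate a 32-bit value right by n bits."""
    return rol(v, (32 - n) % 32)

def movs(d, s):
    """Model of PASM MOVS: copy the low 9 bits of s into the low 9 bits of d."""
    return (d & ~0x1FF & MASK32) | (s & 0x1FF)

def reverse_movs(x, w=0xDEADBEEF):
    """Follow the 13-instruction sequence above; w starts as garbage."""
    w = movs(w, x)
    x = rol(x, 8)
    w = ror(w, 8)
    w = movs(w, x)
    x = rol(x, 8)
    w = ror(w, 8)
    w = movs(w, x)
    x = rol(x, 8)
    x &= 0xFF
    w = ror(w, 8)
    w &= ~0xFF & MASK32      # andn w, #$FF
    x |= w
    return ror(x, 16)

print(hex(reverse_movs(0x11223344)))
```

The stray ninth bit that each MOVS copies always lands in a byte lane that is overwritten or masked off later, so it does not leak into the result in this model.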

kuroneko · 2013-04-24 17:43

I believe 6 insn is the official solution (not counting counter nightmares).

http://forums.parallax.com/showthread.php/89910-Endian-Puzzle

Cluso99 · 2013-04-25 06:26

Thanks for the link kuroneko. I will take a good look at the solutions in the morning.

Perhaps I should explain my restrictions some more. It's for my debugger, which runs in LMM mode. It can only use normal P1 instructions that are also on the P2. I am only allowing 2 actual cog locations, one of which holds both the original and final value. If 3 cog locations are required, then I may incur a 4 instruction penalty because I have to push and pop on my hub stack. This also means that a constant wider than 9 bits (the limit for an immediate source value) cannot be used without a large penalty.

Mark_T · 2013-04-25 07:41

Cluso99 wrote: »

Thanks for the link kuroneko. I will take a good look at the solutions in the morning.

Perhaps I should explain my restrictions some more. It's for my debugger, which runs in LMM mode. It can only use normal P1 instructions that are also on the P2. I am only allowing 2 actual cog locations, one of which holds both the original and final value. If 3 cog locations are required, then I may incur a 4 instruction penalty because I have to push and pop on my hub stack. This also means that a constant wider than 9 bits (the limit for an immediate source value) cannot be used without a large penalty.

Well if you want to avoid constants the 00FF00FF mask can be built into the temporary register thus:

                mov   t, #$FF   ' build $00FF00FF
                shl   t, #16
                or    t, #$FF
                ror   x, #8      ' do the byte swap
                and   t, x
                xor   x, t
                rol   t, #16
                or    x, t   ' result in x
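
A quick way to convince yourself that the sequence above needs no second copy of `x` is to model it off-chip. This is a hedged Python sketch rather than PASM; `rol`, `ror` and `swap_built_mask` are hypothetical names, and each statement stands for one instruction.

```python
MASK32 = 0xFFFFFFFF

def rol(v, n):
    """Rotate a 32-bit value left by n bits."""
    n %= 32
    return ((v << n) | (v >> (32 - n))) & MASK32

def ror(v, n):
    """Rotate a 32-bit value right by n bits."""
    return rol(v, (32 - n) % 32)

def swap_built_mask(x):
    t = 0xFF                   # mov t, #$FF
    t = (t << 16) & MASK32     # shl t, #16
    t |= 0xFF                  # or  t, #$FF   -> $00FF00FF
    x = ror(x, 8)              # ror x, #8
    t &= x                     # and t, x      -> t holds two of the bytes
    x ^= t                     # xor x, t      -> x keeps the other two
    t = rol(t, 16)             # rol t, #16
    return x | t               # or  x, t

print(hex(swap_built_mask(0x11223344)))
```

Only 9-bit immediates appear, which matches the stated restriction on constants.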

Cluso99 · 2013-04-25 08:31

mark_t, you missed the 2nd copy.

I think the following will work (after looking on the other thread) in 8 instructions plus 2 cog locations x & y.

                                'x = b3 b2 b1 b0
              mov       y,x     'y = b3 b2 b1 b0
              shr       y,#8    'y = 0  b3 b2 b1
              ror       y,#8    'y = b1 0  b3 b2
              shr       y,#8    'y = 0  b1 0  b3
              rol       x,#8    'x = b2 b1 b0 b3
              xor       x,y     'x = b2 0  b0 0
              ror       x,#16   'x = b0 0  b2 0
              or        x,y     'x = b0 b1 b2 b3

Mark_T · 2013-04-25 11:53

Cluso99 wrote: »

mark_t, you missed the 2nd copy.

2nd copy? My code works on Prop 1...

Cluso99 · 2013-04-25 17:59

My apologies mark_t. I was looking for a duplicate copy in another register. But you nicely obtained that straight into t using the AND.
Here is your code with bytes shown...

                                'x = b3 b2 b1 b0
                mov   t, #$FF                           ' build $00FF00FF
                shl   t, #16
                or    t, #$FF   't = 00 FF 00 FF
                ror   x, #8     'x = b0 b3 b2 b1        ' do the byte swap
                and   t, x      't = 0  b3 0  b1
                xor   x, t      'x = b0 0  b2 0
                rol   t, #16    't = 0  b1 0  b3
                or    x, t      'x = b0 b1 b2 b3        ' result in x

Cluso99 · 2013-04-25 19:24

Here is my working LMM version...

'----- on entry...                                      ' x= b0 b1 b2 b3
                rdlong  lmm_w, lmm_pc                   '\ lmm_w = next long
                long    $FF00FF00                       '/ (executes as 'nop' on both P1 & P2)
                                                        ' w: FF 0  FF 0
                and     lmm_w, lmm_x                    '    b0 0  b2 0
                xor     lmm_x, lmm_w                    ' x: 0  b1 0  b3
                ror     lmm_x, #8                       '    b3 0  b1 0
                rol     lmm_w, #8                       ' w: 0  b2 0  b0
                or      lmm_x, lmm_w                    ' x: b3 b2 b1 b0
'----- on exit...                                       ' x= b3 b2 b1 b0

kuroneko · 2013-04-25 19:28

Cluso99 wrote: »

Here is my working LMM version...

'----- on entry...                                      ' x= b0 b1 b2 b3
                rdlong  lmm_w, lmm_pc                   '\ lmm_w = next long
                long    $FF00FF00                       '/ (executes as 'nop' on both P1 & P2)
                                                        ' w: FF 0  FF 0
                and     lmm_w, lmm_x                    '    b0 0  b2 0
                xor     lmm_x, lmm_w                    ' x: 0  b1 0  b3
                ror     lmm_x, #8                       '    b3 0  b1 0
                rol     lmm_w, #8                       ' w: 0  b2 0  b0
                or      lmm_x, lmm_w                    ' x: b3 b2 b1 b0
'----- on exit...                                       ' x= b3 b2 b1 b0

Where do you initialise lmm_x (for the and)? Ah, scratch that, it seems to be the input parameter (was hard to read initially).

vic20man · 2013-04-26 09:14

Hi Cluso99,
I have written a small Spin program that reverses the order of the bytes. You don't need any shifts or masks, you should be able to code it in 6 PASM instructions, and it only uses 1 long.

Vic20Man

CON
_clkmode = xtal1 + pll8x 'Standard clock mode * crystal frequency = 80 MHz
_xinfreq = 10_000_000

VAR
long lng
long lng1

OBJ

pst : "Parallax Serial Terminal"

PUB main
pst.start(250_000)
pst.clear
' pst.str(string("test"))
lng:=%00000011_00000010_00000001_00000000
pst.dec(lng.byte[0])
pst.Tab
pst.dec(lng.byte[1])
pst.tab
pst.dec(lng.byte[2])
pst.tab
pst.dec(lng.byte[3])

'********* code below does the byte swapping *************
lng1.byte[0]:=lng.byte[1]
lng1.byte[1]:=lng.byte[0]
lng1.byte[2]:=lng.byte[3]
lng1.byte[3]:=lng.byte[2]
lng.word[0]:=lng1.word[1]
lng.word[1]:=lng1.word[0]
'********************* end of code ******************************

pst.newline
pst.dec(lng.byte[0])
pst.Tab
pst.dec(lng.byte[1])
pst.tab
pst.dec(lng.byte[2])
pst.tab
pst.dec(lng.byte[3])
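
The byte and word assignments in the Spin program can be modelled off-chip. This is a minimal Python sketch under the assumption that `lng.byte[0]` is the least significant byte (little-endian storage); `spin_swap` is a hypothetical name, and the two bytearrays stand in for `lng` and `lng1`.

```python
import struct

def spin_swap(lng):
    """Mirror the six Spin assignments on a 32-bit value."""
    src = bytearray(struct.pack("<I", lng))   # lng, byte[0] = least significant
    tmp = bytearray(4)                        # lng1
    tmp[0] = src[1]                           # lng1.byte[0] := lng.byte[1]
    tmp[1] = src[0]                           # lng1.byte[1] := lng.byte[0]
    tmp[2] = src[3]                           # lng1.byte[2] := lng.byte[3]
    tmp[3] = src[2]                           # lng1.byte[3] := lng.byte[2]
    src[0:2] = tmp[2:4]                       # lng.word[0] := lng1.word[1]
    src[2:4] = tmp[0:2]                       # lng.word[1] := lng1.word[0]
    return struct.unpack("<I", bytes(src))[0]

print(hex(spin_swap(0x03020100)))
```

The first four assignments swap the bytes inside each word, and the last two swap the words, which together reverse all four bytes.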

Mark_T · 2013-04-26 09:38

vic20man wrote: »

Hi Cluso99,
I have written a small Spin program that reverses the order of the bytes. You don't need any shifts or masks, you should be able to code it in 6 PASM instructions, and it only uses 1 long.

Alas consecutive hub ops take 16 cycles each, rather than 4 for normal cog ops, so your 6 PASM instructions will take as long as 21 non-hub ops (on Prop 1 that is) - and you'd need to store a hub address constant in a register too.

vic20man · 2013-04-26 09:46

Hi Mark_T,

Thank you for pointing that out about the extra clocks taken to write to the hub.

vic20man

Cluso99 · 2013-04-26 16:13

vic20man wrote: »

Hi Cluso99,
I have written a small Spin program that reverses the order of the bytes. You don't need any shifts or masks, you should be able to code it in 6 PASM instructions, and it only uses 1 long.

It is actually much worse than Mark_T stated. You are running spin which is an interpreted language. Each bytecode (you may have a number of them for each instruction) takes more than 25 pasm instructions, and more likely ~50 or more. So we say spin is typically 50-100 times slower than pasm.

Another point is that pasm and spin cannot really be mixed in the same cog. In spin, the cog is running pasm code that is the interpreter, and it fills the cog. It is not impossible because a few of us have done this.

When we run LMM, the code is actually in hub, and copied instruction by instruction to cog for execution. Therefore in reality a rdlong takes no more than a mov in LMM mode because each LMM instruction takes at least 4 (in my case 5) instructions in a loop. That is why LMM mode runs at least 4-8 times slower than real pasm (i.e. pasm running within the cog).

vic20man · 2013-04-26 19:56

Hi Cluso99,

Thank you for pointing out a few things about Spin and LMM code. Below is a PASM program that should work.

vic20man

CON
_clkmode = xtal1 + pll8x 'Standard clock mode * crystal frequency = 80 MHz
_xinfreq = 10_000_000

VAR
long cogReady

OBJ

pst : "Parallax Serial Terminal"

PUB main
pst.start(250_000)
pst.clear
cognew(@reverseBytesLng,@cogReady)
repeat while cogReady== 0
pst.hex(cogReady,8)

dat

org $0000
reverseBytesLng cogid idCog
mov addrssCogReady,par
'*****************code performs the long byte(s) reversal******************
rol lngValue,#16
mov lngValueCpy,lngValue
shl lngValue,#8
and lngValue,bitMsk1
shr lngValueCpy,#8
and lngValueCpy,bitMsk2
or lngValue,lngValueCpy
'****************************end of code********************************
wrlong lngValue,addrssCogReady
cogstop idCog
lngValue long $01_02_03_04
bitMsk1 long %11111111_00000000_11111111_00000000
bitMsk2 long %00000000_11111111_00000000_11111111
idCog res 1
addrssCogReady res 1
parCpy res 1
lngValueCpy res 1

Cluso99 · 2013-04-26 22:27

The bottom one is quite nice. These were all basically posted in parts by others.

'-------------------------------------------------------
' on entry                      'x = b3 b2 b1 b0
              mov       y,x     'y = b3 b2 b1 b0
              shr       y,#8    'y = 0  b3 b2 b1
              ror       y,#8    'y = b1 0  b3 b2
              shr       y,#8    'y = 0  b1 0  b3
              rol       x,#8    'x = b2 b1 b0 b3
              xor       x,y     'x = b2 0  b0 0
              ror       x,#16   'x = b0 0  b2 0
              or        x,y     'x = b0 b1 b2 b3
'-------------------------------------------------------
' on entry                      'x = b3 b2 b1 b0
              mov       y,#$FF  'y = 0  0  0  FF        '\ preset y=$00FF00FF
              shl       y,#16   'y = 0  FF 0  0         '|  (y can be DIRB or OUTB on P1? - untested)
              or        y,#$FF  'y = 0  FF 0  FF        '/

              and       y,x     'y = 0  b2 0  b0
              xor       x,y     'x = b3 0  b1 0 
              ror       y,#8    'y = b0 0  b2 0 
              rol       x,#8    'x = 0  b1 0  b3
              or        x,y     'x = b0 b1 b2 b3
'-------------------------------------------------------
' on entry                      'x = b3 b2 b1 b0
              mov       y,mask  'y = 00 FF 00 FF
              and       y,x     'y = 0  b2 0  b0
              xor       x,y     'x = b3 0  b1 0 
              ror       y,#8    'y = b0 0  b2 0 
              rol       x,#8    'x = 0  b1 0  b3
              or        x,y     'x = b0 b1 b2 b3
' exit
mask          long      $00FF00FF
'-------------------------------------------------------
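
The three variants above can be compared against a reference byte reversal off-chip. This is a hedged Python sketch, not PASM: the function names are hypothetical, each statement stands for one instruction, and the reference uses Python's own `int.to_bytes` / `int.from_bytes`.

```python
MASK32 = 0xFFFFFFFF

def rol(v, n):
    """Rotate a 32-bit value left by n bits."""
    n %= 32
    return ((v << n) | (v >> (32 - n))) & MASK32

def ror(v, n):
    """Rotate a 32-bit value right by n bits."""
    return rol(v, (32 - n) % 32)

def variant_shifts(x):
    """8 instructions, no mask: mov/shr/ror/shr, then rol/xor/ror/or."""
    y = x
    y >>= 8
    y = ror(y, 8)
    y >>= 8
    x = rol(x, 8)
    x ^= y
    x = ror(x, 16)
    return x | y

def variant_masked(x, y):
    """Shared 5-instruction tail; y must hold $00FF00FF on entry."""
    y &= x
    x ^= y
    y = ror(y, 8)
    x = rol(x, 8)
    return x | y

def variant_built_mask(x):
    """8 instructions: build the mask from 9-bit immediates first."""
    y = ((0xFF << 16) & MASK32) | 0xFF
    return variant_masked(x, y)

def variant_stored_mask(x, mask=0x00FF00FF):
    """6 instructions: mov y,mask plus the shared tail."""
    return variant_masked(x, mask)

def reference(x):
    return int.from_bytes(x.to_bytes(4, "little"), "big")

for value in (0x11223344, 0x01020304, 0x80000001):
    results = {f(value) for f in (variant_shifts, variant_built_mask, variant_stored_mask)}
    print(hex(value), results == {reference(value)})
```

The three differ only in how the second register gets its starting contents; the masked pair trades a stored constant for two fewer instructions.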

Beau Schwabe · 2013-04-27 10:45

Would this be cheating? ... it reduces the code by two longs that would otherwise store 'y' and the 'Mask' ... the caveat is that it destroys the mask, which would be ok if you called the PASM routine every time you needed to swap the BYTES. In this way the mask would be restored upon calling the cog containing the routine below.

'-------------------------------------------------------
' on entry                      'x = b3 b2 b1 b0
y          long      $00FF00FF  'y = 00 FF 00 FF
           and       y,x        'y = 0  b2 0  b0
           xor       x,y        'x = b3 0  b1 0 
           ror       y,#8       'y = b0 0  b2 0 
           rol       x,#8       'x = 0  b1 0  b3
           or        x,y        'x = b0 b1 b2 b3
'-------------------------------------------------------

EDIT: Reworked as mentioned in post #17

'-------------------------------------------------------
' on entry                      'x = b3 b2 b1 b0
y          long      $FF00FF00  'y = FF 00 FF 00
           and       y,x        'y = b3 0  b1 0
           xor       x,y        'x = 0  b2 0  b0 
           ror       x,#8       'x = b0 0  b2 0 
           rol       y,#8       'y = 0  b1 0  b3
           or        x,y        'x = b0 b1 b2 b3
'-------------------------------------------------------

Cluso99 · 2013-04-27 15:48

Beau Schwabe (Parallax) wrote: »
Would this be cheating? ... it reduces the code by two longs that would otherwise store 'y' and the 'Mask' ... the caveat is that it destroys the mask, which would be ok if you called the PASM routine every time you needed to swap the BYTES. In this way the mask would be restored upon calling the cog containing the routine below.
'-------------------------------------------------------
' on entry                      'x = b3 b2 b1 b0
y          long      $00FF00FF  'y = 00 FF 00 FF
           and       y,x        'y = 0  b2 0  b0
           xor       x,y        'x = b3 0  b1 0 
           ror       y,#8       'y = b0 0  b2 0 
           rol       x,#8       'x = 0  b1 0  b3
           or        x,y        'x = b0 b1 b2 b3
'-------------------------------------------------------

The problem here, Beau, is that the long is decoded to a
rdbyte $180,#$FF

However, if you rework the code to use
y long $FF00FF00
this instruction becomes a nop because cccc=0000, which never executes (on both P1 & P2: it's a waitvid on P1 and a waitxxx on P2).
This is what I ultimately did with my LMM version posted in #8.

I presume you mean by "called the pasm routine" a cognew/coginit to reload the cog. Otherwise, as you said, y is corrupted from the previous use.
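
Splitting the two mask values into instruction fields makes the difference concrete. This is a minimal Python sketch that assumes the Propeller 1 layout of a 6-bit opcode, 4-bit ZCRI flags, 4-bit condition, 9-bit destination and 9-bit source; `decode_p1` and `describe` are hypothetical helpers, and the opcode name table only covers the two opcodes that occur here.

```python
def decode_p1(word):
    """Split a 32-bit long into Propeller 1 instruction fields."""
    return {
        "opcode": (word >> 26) & 0x3F,
        "zcri":   (word >> 22) & 0xF,
        "cond":   (word >> 18) & 0xF,
        "dest":   (word >> 9) & 0x1FF,
        "src":    word & 0x1FF,
    }

def describe(word):
    f = decode_p1(word)
    names = {0x00: "rdbyte/wrbyte", 0x3F: "waitvid"}   # only the opcodes needed here
    name = names.get(f["opcode"], "?")
    # A condition field of 0000 means "never", so the long acts as a nop.
    runs = "executes" if f["cond"] else "never executes (nop)"
    return (f"${word:08X}: {name} dest=${f['dest']:03X} src=${f['src']:03X} "
            f"zcri={f['zcri']:04b} cond={f['cond']:04b} -> {runs}")

for mask in (0x00FF00FF, 0xFF00FF00):
    print(describe(mask))
```

Under this layout $00FF00FF has cond=1111 with the R and I bits set, so it is a live hub read with an immediate source, while $FF00FF00 has cond=0000 and is skipped.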

Beau Schwabe · 2013-04-27 16:09

Cluso99,

Post #16 reworked
