Any shorter way to reverse bytes in long using only 1 extra long (in pasm)? — Parallax Forums

Cluso99 Posts: 18,069
edited 2013-04-27 16:09 in Propeller 1
Here is how I am reversing the byte order in a long.
I only use 1 additional cog long.

Can anyone find a shorter way?
                                                        ' x: b0  b1  b2  b3
                movs    lmm_w, lmm_x                    '    x   x   x   b3
                rol     lmm_x, #8                       ' x: b1  b2  b3  b0 
                ror     lmm_w, #8                       '    b3  x   x   x
                movs    lmm_w, lmm_x                    '    b3  x   x   b0
                rol     lmm_x, #8                       ' x: b2  b3  b0  b1
                ror     lmm_w, #8                       '    b0  b3  x   x
                movs    lmm_w, lmm_x                    '    b0  b3  x   b1
                rol     lmm_x, #8                       ' x: b3  b0  b1  b2
                and     lmm_x, #$FF                     ' x: 0   0   0   b2
                ror     lmm_w, #8                       '    b1  b0  b3  x
                andn    lmm_w, #$FF                     '    b1  b0  b3  0
                or      lmm_x, lmm_w                    ' x: b1  b0  b3  b2
                ror     lmm_x, #16                      ' x: b3  b2  b1  b0
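
The listing above can be checked off-target. What follows is a minimal Python sketch, not PASM, that models the instructions as operations on 32-bit integers. It assumes the documented P1 behaviour of MOVS (copy the low 9 bits of the source into the low 9 bits of the destination); the helper names `rol`, `ror`, `movs` and `reverse_bytes_movs` are made up for illustration.

```python
MASK32 = 0xFFFFFFFF

def rol(v, n):
    """Rotate a 32-bit value left by n bits."""
    n %= 32
    return ((v << n) | (v >> (32 - n))) & MASK32

def ror(v, n):
    """Rotate a 32-bit value right by n bits."""
    return rol(v, 32 - (n % 32))

def movs(dest, src):
    """Model of MOVS: replace bits 8..0 of dest with bits 8..0 of src."""
    return (dest & ~0x1FF & MASK32) | (src & 0x1FF)

def reverse_bytes_movs(x, w=0xDEADBEEF):
    """The 13-instruction routine; w starts as junk, as a scratch long would."""
    w = movs(w, x)          # movs lmm_w, lmm_x
    x = rol(x, 8)           # rol  lmm_x, #8
    w = ror(w, 8)           # ror  lmm_w, #8
    w = movs(w, x)          # movs lmm_w, lmm_x
    x = rol(x, 8)           # rol  lmm_x, #8
    w = ror(w, 8)           # ror  lmm_w, #8
    w = movs(w, x)          # movs lmm_w, lmm_x
    x = rol(x, 8)           # rol  lmm_x, #8
    x &= 0xFF               # and  lmm_x, #$FF
    w = ror(w, 8)           # ror  lmm_w, #8
    w &= ~0xFF & MASK32     # andn lmm_w, #$FF
    x |= w                  # or   lmm_x, lmm_w
    return ror(x, 16)       # ror  lmm_x, #16
```

Because MOVS moves 9 bits rather than 8, one stray bit follows each copied byte. In this model the next MOVS overwrites it, or the final ANDN clears it, so the result does not depend on the initial contents of `w`.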

Comments

  • kuroneko Posts: 3,623
    edited 2013-04-24 17:43
    I believe 6 insn is the official solution (not counting counter nightmares).

    http://forums.parallax.com/showthread.php/89910-Endian-Puzzle
  • Cluso99 Posts: 18,069
    edited 2013-04-25 06:26
    Thanks for the link kuroneko. I will take a good look at the solutions in the morning.

    Perhaps I should explain my restrictions some more. It's for my debugger, which runs in LMM mode. It can only use normal P1 instructions that are also on the P2. I am only allowing 2 actual cog locations, one of which holds both the original and the final value. If 3 cog locations are required, then I may incur a 4 instruction penalty because I have to push and pop on my hub stack. This also means that a constant greater than 9 bits (i.e. one too large for an immediate source value) cannot be used without a large penalty.
  • Mark_T Posts: 1,981
    edited 2013-04-25 07:41
    Cluso99 wrote: »
    Thanks for the link kuroneko. I will take a good look at the solutions in the morning.

    Perhaps I should explain my restrictions some more. It's for my debugger, which runs in LMM mode. It can only use normal P1 instructions that are also on the P2. I am only allowing 2 actual cog locations, one of which holds both the original and the final value. If 3 cog locations are required, then I may incur a 4 instruction penalty because I have to push and pop on my hub stack. This also means that a constant greater than 9 bits (i.e. one too large for an immediate source value) cannot be used without a large penalty.

    Well if you want to avoid constants the 00FF00FF mask can be built into the temporary register thus:
                    mov   t, #$FF   ' build $00FF00FF
                    shl   t, #16
                    or    t, #$FF
                    ror   x, #8      ' do the byte swap
                    and   t, x
                    xor   x, t
                    rol   t, #16
                    or    x, t   ' result in x
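
The eight instructions above can be traced off-target. This is a rough Python sketch, not PASM: the first three steps build $00FF00FF from 9-bit immediates only, and the remaining five do the swap. The function names are invented for illustration.

```python
MASK32 = 0xFFFFFFFF

def rol(v, n):
    """Rotate a 32-bit value left by n bits."""
    n %= 32
    return ((v << n) | (v >> (32 - n))) & MASK32

def ror(v, n):
    """Rotate a 32-bit value right by n bits."""
    return rol(v, 32 - (n % 32))

def build_mask():
    """Build $00FF00FF using only values that fit a 9-bit immediate."""
    t = 0xFF                    # mov t, #$FF
    t = (t << 16) & MASK32      # shl t, #16
    t |= 0xFF                   # or  t, #$FF
    return t

def swap_with_built_mask(x):
    """Byte reversal using the mask held in the temporary register t."""
    t = build_mask()
    x = ror(x, 8)               # ror x, #8
    t &= x                      # and t, x   (t now holds two of the bytes)
    x ^= t                      # xor x, t   (x keeps the other two)
    t = rol(t, 16)              # rol t, #16
    return x | t                # or  x, t
```

The AND both applies the mask and makes the second copy in one step, which is why no separate `mov` of x is needed.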
    
  • Cluso99 Posts: 18,069
    edited 2013-04-25 08:31
    mark_t, you missed the 2nd copy.

    I think the following will work (after looking on the other thread) in 8 instructions plus 2 cog locations x & y.
                                    'x = b3 b2 b1 b0
                  mov       y,x     'y = b3 b2 b1 b0
                  shr       y,#8    'y = 0  b3 b2 b1
                  ror       y,#8    'y = b1 0  b3 b2
                  shr       y,#8    'y = 0  b1 0  b3
                  rol       x,#8    'x = b2 b1 b0 b3
                  xor       x,y     'x = b2 0  b0 0
                  ror       x,#16   'x = b0 0  b2 0
                  or        x,y     'x = b0 b1 b2 b3
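
A short Python sketch of the maskless sequence above, offered as a way to check the byte comments rather than as a reference implementation. The helper names are assumptions. The sequence relies on SHR shifting in zeros: two shifts and one rotate leave y holding bytes b1 and b3 with zero bytes between them, and the XOR then cancels those same two bytes out of the rotated x.

```python
MASK32 = 0xFFFFFFFF

def rol(v, n):
    """Rotate a 32-bit value left by n bits."""
    n %= 32
    return ((v << n) | (v >> (32 - n))) & MASK32

def ror(v, n):
    """Rotate a 32-bit value right by n bits."""
    return rol(v, 32 - (n % 32))

def swap_no_mask(x):
    """Byte reversal with one scratch long and no mask constant."""
    y = x               # mov y,x     y = b3 b2 b1 b0
    y >>= 8             # shr y,#8    y = 0  b3 b2 b1
    y = ror(y, 8)       # ror y,#8    y = b1 0  b3 b2
    y >>= 8             # shr y,#8    y = 0  b1 0  b3
    x = rol(x, 8)       # rol x,#8    x = b2 b1 b0 b3
    x ^= y              # xor x,y     x = b2 0  b0 0
    x = ror(x, 16)      # ror x,#16   x = b0 0  b2 0
    return x | y        # or  x,y     x = b0 b1 b2 b3
```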
    
  • Mark_T Posts: 1,981
    edited 2013-04-25 11:53
    Cluso99 wrote: »
    mark_t, you missed the 2nd copy.

    2nd copy? My code works on Prop 1...
  • Cluso99 Posts: 18,069
    edited 2013-04-25 17:59
    My apologies mark_t. I was looking for a duplicate copy in another register. But you nicely obtained that straight into t using the AND.
    Here is your code with bytes shown...
                                    'x = b3 b2 b1 b0
                    mov   t, #$FF                           ' build $00FF00FF
                    shl   t, #16
                    or    t, #$FF   't = 00 FF 00 FF
                    ror   x, #8     'x = b0 b3 b2 b1        ' do the byte swap
                    and   t, x      't = 0  b3 0  b1
                    xor   x, t      'x = b0 0  b2 0
                    rol   t, #16    't = 0  b1 0  b3
                    or    x, t      'x = b0 b1 b2 b3        ' result in x              
    
  • Cluso99 Posts: 18,069
    edited 2013-04-25 19:24
    Here is my working LMM version...
    '----- on entry...                                      ' x= b0 b1 b2 b3
                    rdlong  lmm_w, lmm_pc                   '\ lmm_w = next long
                    long    $FF00FF00                       '/ (executes as 'nop' on both P1 & P2)
                                                            ' w: FF 0  FF 0
                    and     lmm_w, lmm_x                    '    b0 0  b2 0
                    xor     lmm_x, lmm_w                    ' x: 0  b1 0  b3
                    ror     lmm_x, #8                       '    b3 0  b1 0
                    rol     lmm_w, #8                       ' w: 0  b2 0  b0
                    or      lmm_x, lmm_w                    ' x: b3 b2 b1 b0
    '----- on exit...                                       ' x= b3 b2 b1 b0
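
The inline-constant trick above can be illustrated with a toy model. This is a hedged Python sketch, not the author's LMM loop: it assumes the program counter has already advanced past the current instruction when that instruction executes (as the "next long" comment suggests), it uses list indices instead of hub byte addresses, and every name in it is hypothetical. The only P1 fact it relies on is that bits 21..18 of an instruction are the condition field and that %0000 means "never execute".

```python
MASK32 = 0xFFFFFFFF

def rol(v, n):
    """Rotate a 32-bit value left by n bits."""
    n %= 32
    return ((v << n) | (v >> (32 - n))) & MASK32

def ror(v, n):
    """Rotate a 32-bit value right by n bits."""
    return rol(v, 32 - (n % 32))

def cond_field(long_value):
    """Condition field of a P1 instruction (bits 21..18)."""
    return (long_value >> 18) & 0xF

def rdlong_w_pc(regs, program):
    """Model of 'rdlong lmm_w, lmm_pc': read the long that follows."""
    regs["w"] = program[regs["pc"]]

def run_lmm(program, x):
    """Toy fetch/execute loop; raw integers stand for data longs."""
    regs = {"x": x, "w": 0, "pc": 0}
    while regs["pc"] < len(program):
        entry = program[regs["pc"]]     # fetch
        regs["pc"] += 1                 # pc now points at the following long
        if isinstance(entry, int):      # a data long reached as an instruction
            if cond_field(entry) != 0:
                raise RuntimeError("inline constant would execute")
            continue                    # condition 'never': acts as a nop
        entry(regs, program)
    return regs["x"]

def make_program(mask):
    """The seven longs of the routine, with the mask stored inline."""
    return [
        rdlong_w_pc,                                    # rdlong lmm_w, lmm_pc
        mask,                                           # long   mask
        lambda r, p: r.update(w=r["w"] & r["x"]),       # and    lmm_w, lmm_x
        lambda r, p: r.update(x=r["x"] ^ r["w"]),       # xor    lmm_x, lmm_w
        lambda r, p: r.update(x=ror(r["x"], 8)),        # ror    lmm_x, #8
        lambda r, p: r.update(w=rol(r["w"], 8)),        # rol    lmm_w, #8
        lambda r, p: r.update(x=r["x"] | r["w"]),       # or     lmm_x, lmm_w
    ]
```

In this model `run_lmm(make_program(0xFF00FF00), value)` returns the byte-reversed value, while `make_program(0x00FF00FF)` is rejected because that long has a non-zero condition field and would run as a real instruction.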
    
  • kuroneko Posts: 3,623
    edited 2013-04-25 19:28
    Cluso99 wrote: »
    Here is my working LMM version...
    '----- on entry...                                      ' x= b0 b1 b2 b3
                    rdlong  lmm_w, lmm_pc                   '\ lmm_w = next long
                    long    $FF00FF00                       '/ (executes as 'nop' on both P1 & P2)
                                                            ' w: FF 0  FF 0
                    and     lmm_w, lmm_x                    '    b0 0  b2 0
                    xor     lmm_x, lmm_w                    ' x: 0  b1 0  b3
                    ror     lmm_x, #8                       '    b3 0  b1 0
                    rol     lmm_w, #8                       ' w: 0  b2 0  b0
                    or      lmm_x, lmm_w                    ' x: b3 b2 b1 b0
    '----- on exit...                                       ' x= b3 b2 b1 b0
    
    Where do you initialise lmm_x (for the and)? Ah, scratch that, it seems to be the input parameter (was hard to read initially).
  • vic20man Posts: 3
    edited 2013-04-26 09:14
    Hi Cluso99,
    I have written a small Spin program that reverses the order of the bytes. You don't need any shifts or masks, you should be able to code it in 6 PASM instructions, and it only uses 1 long.

    Vic20Man


    CON
    _clkmode = xtal1 + pll8x 'Standard clock mode * crystal frequency = 80 MHz
    _xinfreq = 10_000_000

    VAR
    long lng
    long lng1

    OBJ

    pst : "Parallax Serial Terminal"

    PUB main
    pst.start(250_000)
    pst.clear
    ' pst.str(string("test"))
    lng:=%00000011_00000010_00000001_00000000
    pst.dec(lng.byte[0])
    pst.Tab
    pst.dec(lng.byte[1])
    pst.tab
    pst.dec(lng.byte[2])
    pst.tab
    pst.dec(lng.byte[3])

    '********* code below does the byte swapping *************
    lng1.byte[0]:=lng.byte[1]
    lng1.byte[1]:=lng.byte[0]
    lng1.byte[2]:=lng.byte[3]
    lng1.byte[3]:=lng.byte[2]
    lng.word[0]:=lng1.word[1]
    lng.word[1]:=lng1.word[0]
    '********************* end of code ******************************

    pst.newline
    pst.dec(lng.byte[0])
    pst.Tab
    pst.dec(lng.byte[1])
    pst.tab
    pst.dec(lng.byte[2])
    pst.tab
    pst.dec(lng.byte[3])
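
The six assignments in the Spin program can be mirrored in Python to see why they reverse the long. This is a sketch under one assumption, that `lng.byte[0]` is the least significant byte (the Propeller is little-endian); the function name is invented for illustration.

```python
def spin_style_reverse(lng):
    """Mirror of the six Spin assignments using little-endian byte views."""
    src = lng.to_bytes(4, "little")     # src[i] plays the role of lng.byte[i]
    tmp = bytearray(4)                  # plays the role of lng1
    tmp[0] = src[1]                     # lng1.byte[0] := lng.byte[1]
    tmp[1] = src[0]                     # lng1.byte[1] := lng.byte[0]
    tmp[2] = src[3]                     # lng1.byte[2] := lng.byte[3]
    tmp[3] = src[2]                     # lng1.byte[3] := lng.byte[2]
    out = bytearray(4)
    out[0:2] = tmp[2:4]                 # lng.word[0] := lng1.word[1]
    out[2:4] = tmp[0:2]                 # lng.word[1] := lng1.word[0]
    return int.from_bytes(out, "little")
```

The first four assignments swap the bytes inside each word, and the last two swap the words, so together they reverse all four bytes.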
  • Mark_T Posts: 1,981
    edited 2013-04-26 09:38
    vic20man wrote: »
    Hi Cluso99,
    I have written a small Spin program that reverses the order of the bytes. You don't need any shifts or masks, you should be able to code it in 6 PASM instructions, and it only uses 1 long.

    Alas consecutive hub ops take 16 cycles each, rather than 4 for normal cog ops, so your 6 PASM instructions will take as long as 21 non-hub ops (on Prop 1 that is) - and you'd need to store a hub address constant in a register too.
  • vic20man Posts: 3
    edited 2013-04-26 09:46
    Hi Mark_T,

    Thank you for pointing that out about the extra clocks taken to write to the hub.

    vic20man
  • Cluso99 Posts: 18,069
    edited 2013-04-26 16:13
    vic20man wrote: »
    Hi Cluso99,
    I have written a small Spin program that reverses the order of the bytes. You don't need any shifts or masks, you should be able to code it in 6 PASM instructions, and it only uses 1 long.

    It is actually much worse than Mark_T stated. You are running spin which is an interpreted language. Each bytecode (you may have a number of them for each instruction) takes more than 25 pasm instructions, and more likely ~50 or more. So we say spin is typically 50-100 times slower than pasm.

    Another point is that pasm and spin cannot really be mixed in the same cog. In spin, the cog is running pasm code that is the interpreter, and it fills the cog. It is not impossible because a few of us have done this.

    When we run LMM, the code is actually in hub, and copied instruction by instruction to cog for execution. Therefore in reality a rdlong takes no more than a mov in LMM mode because each LMM instruction takes at least 4 (in my case 5) instructions in a loop. That is why LMM mode runs at least 4-8 times slower than real pasm (i.e. pasm running within the cog).
  • vic20man Posts: 3
    edited 2013-04-26 19:56
    Hi Cluso99,

    Thank you for pointing out a few things about Spin and LMM code. Below is a PASM program that should work.

    vic20man

    CON
    _clkmode = xtal1 + pll8x 'Standard clock mode * crystal frequency = 80 MHz
    _xinfreq = 10_000_000

    VAR
    long cogReady

    OBJ

    pst : "Parallax Serial Terminal"

    PUB main
    pst.start(250_000)
    pst.clear
    cognew(@reverseBytesLng,@cogReady)
    repeat while cogReady== 0
    pst.hex(cogReady,8)

    dat

    org $0000
    reverseBytesLng cogid   idCog
                    mov     addrssCogReady, par
    '*****************code performs the long byte(s) reversal******************
                    rol     lngValue, #16               ' $03_04_01_02  swap the two words
                    mov     lngValueCpy, lngValue       ' $03_04_01_02  working copy
                    shl     lngValue, #8                ' $04_01_02_00
                    and     lngValue, bitMsk1           ' $04_00_02_00
                    shr     lngValueCpy, #8             ' $00_03_04_01
                    and     lngValueCpy, bitMsk2        ' $00_03_00_01
                    or      lngValue, lngValueCpy       ' $04_03_02_01  bytes reversed
    '****************************end of code********************************
                    wrlong  lngValue, addrssCogReady    ' report the result through cogReady
                    cogstop idCog
    lngValue        long    $01_02_03_04
    bitMsk1         long    %11111111_00000000_11111111_00000000
    bitMsk2         long    %00000000_11111111_00000000_11111111
    idCog           res     1
    addrssCogReady  res     1
    parCpy          res     1
    lngValueCpy     res     1
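
The seven-instruction core of the PASM program above can be traced with a small Python sketch. It is only a model of the arithmetic, with made-up function names; the two mask values are the ones declared in the DAT section.

```python
MASK32 = 0xFFFFFFFF

def rol(v, n):
    """Rotate a 32-bit value left by n bits."""
    n %= 32
    return ((v << n) | (v >> (32 - n))) & MASK32

def swap_words_then_bytes(value):
    """Swap the two words, then swap the bytes inside each word."""
    value = rol(value, 16)              # rol lngValue, #16
    copy = value                        # mov lngValueCpy, lngValue
    value = (value << 8) & MASK32       # shl lngValue, #8
    value &= 0xFF00FF00                 # and lngValue, bitMsk1
    copy >>= 8                          # shr lngValueCpy, #8
    copy &= 0x00FF00FF                  # and lngValueCpy, bitMsk2
    return value | copy                 # or  lngValue, lngValueCpy
```

Compared with the rotate-based versions earlier in the thread, this one needs two mask longs in addition to the copy, which matters under the two-cog-location restriction described above.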
  • Cluso99 Posts: 18,069
    edited 2013-04-26 22:27
    The bottom one is quite nice. These were all basically posted in parts by others.
    '-------------------------------------------------------
    ' on entry                      'x = b3 b2 b1 b0
                  mov       y,x     'y = b3 b2 b1 b0
                  shr       y,#8    'y = 0  b3 b2 b1
                  ror       y,#8    'y = b1 0  b3 b2
                  shr       y,#8    'y = 0  b1 0  b3
                  rol       x,#8    'x = b2 b1 b0 b3
                  xor       x,y     'x = b2 0  b0 0
                  ror       x,#16   'x = b0 0  b2 0
                  or        x,y     'x = b0 b1 b2 b3
    '-------------------------------------------------------
    ' on entry                      'x = b3 b2 b1 b0
                  mov       y,#$FF  'y = 0  0  0  FF        '\ preset y=$00FF00FF
                  shl       y,#16   'y = 0  FF 0  0         '|  (y can be DIRB or OUTB on P1? - untested)
                  or        y,#$FF  'y = 0  FF 0  FF        '/
    
                  and       y,x     'y = 0  b2 0  b0
                  xor       x,y     'x = b3 0  b1 0 
                  ror       y,#8    'y = b0 0  b2 0 
                  rol       x,#8    'x = 0  b1 0  b3
                  or        x,y     'x = b0 b1 b2 b3
    '-------------------------------------------------------
    ' on entry                      'x = b3 b2 b1 b0
                  mov       y,mask  'y = 00 FF 00 FF
                  and       y,x     'y = 0  b2 0  b0
                  xor       x,y     'x = b3 0  b1 0 
                  ror       y,#8    'y = b0 0  b2 0 
                  rol       x,#8    'x = 0  b1 0  b3
                  or        x,y     'x = b0 b1 b2 b3
    ' exit
    mask          long      $00FF00FF
    '-------------------------------------------------------
    
    
  • Beau Schwabe Posts: 6,566
    edited 2013-04-27 10:45
    Would this be cheating? ... it reduces the code by two longs that would otherwise store 'y' and the 'Mask' ... the caveat is that it destroys the mask, which would be ok if you called the PASM routine every time you needed to swap the BYTES. In this way the mask would be restored upon calling the cog containing the routine below.
    '-------------------------------------------------------
    ' on entry                      'x = b3 b2 b1 b0
    y          long      $00FF00FF  'y = 00 FF 00 FF
               and       y,x        'y = 0  b2 0  b0
               xor       x,y        'x = b3 0  b1 0 
               ror       y,#8       'y = b0 0  b2 0 
               rol       x,#8       'x = 0  b1 0  b3
               or        x,y        'x = b0 b1 b2 b3
    '-------------------------------------------------------
    
    EDIT: Reworked as mentioned in post #17
    '-------------------------------------------------------
    ' on entry                      'x = b3 b2 b1 b0
    y          long      $FF00FF00  'y = FF 00 FF 00
               and       y,x        'y = b3 0  b1 0
               xor       x,y        'x = 0  b2 0  b0 
               ror       x,#8       'x = b0 0  b2 0 
               rol       y,#8       'y = 0  b1 0  b3
               or        x,y        'x = b0 b1 b2 b3
    '-------------------------------------------------------
    
  • Cluso99 Posts: 18,069
    edited 2013-04-27 15:48
    Beau Schwabe wrote: »
    Would this be cheating? ... it reduces the code by two longs that would otherwise store 'y' and the 'Mask' ... the caveat is that it destroys the mask, which would be ok if you called the PASM routine every time you needed to swap the BYTES. In this way the mask would be restored upon calling the cog containing the routine below.
    '-------------------------------------------------------
    ' on entry                      'x = b3 b2 b1 b0
    y          long      $00FF00FF  'y = 00 FF 00 FF
               and       y,x        'y = 0  b2 0  b0
               xor       x,y        'x = b3 0  b1 0 
               ror       y,#8       'y = b0 0  b2 0 
               rol       x,#8       'x = 0  b1 0  b3
               or        x,y        'x = b0 b1 b2 b3
    '-------------------------------------------------------
    
    The problem here, Beau, is that the long $00FF00FF is decoded as
    rdbyte $180,#$FF

    However, if you rework the code to use
    y long $FF00FF00
    this instruction becomes a nop because cccc=0000, which never executes (on both P1 and P2; it's a waitvid on P1 and a waitxxx on P2).
    This is what I ultimately did with my LMM version posted in #8.

    I presume you mean by "called the pasm routine" a cognew/coginit to reload the cog. Otherwise, as you said, y is corrupted from the previous use.
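
To see why one mask is safe to place in the instruction stream and the other is not, the two longs can be split into P1 instruction fields. The sketch below assumes the standard P1 layout (6-bit opcode, 4-bit ZCRI flags, 4-bit condition, 9-bit destination, 9-bit source); the function name `decode_p1` is made up for illustration.

```python
def decode_p1(long_value):
    """Split a 32-bit long into the P1 instruction fields."""
    return {
        "opcode": (long_value >> 26) & 0x3F,   # bits 31..26
        "zcri":   (long_value >> 22) & 0xF,    # bits 25..22
        "cond":   (long_value >> 18) & 0xF,    # bits 21..18
        "dest":   (long_value >> 9) & 0x1FF,   # bits 17..9
        "src":    long_value & 0x1FF,          # bits 8..0
    }

def is_never_executed(long_value):
    """True when the condition field is %0000, i.e. the long acts as a nop."""
    return decode_p1(long_value)["cond"] == 0
```

With this model, $00FF00FF has condition %1111 (always) and destination $180, so it would run as a real hub instruction, while $FF00FF00 has condition %0000 and is skipped.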
  • Beau Schwabe Posts: 6,566
    edited 2013-04-27 16:09
    Cluso99,

    Post #16 reworked