Easier way to big-endian longs out of assembly? — Parallax Forums

Rayman Posts: 14,793
edited 2007-12-07 23:56 in Propeller 1
I'm not terribly happy with the way I'm exporting 3 longs from assembly to HUB RAM... I need to have them in big-endian order in HUB RAM, but wrlong puts them out little-endian... Here's what I have now:

        'store 3 longs to output buffer
        rol data1,#8
        wrbyte data1,y
        add y,#1
        rol data1,#8
        wrbyte data1,y
        add y,#1
        rol data1,#8
        wrbyte data1,y
        add y,#1
        rol data1,#8
        wrbyte data1,y
        add y,#1

        rol data2,#8
        wrbyte data2,y
        add y,#1
        rol data2,#8
        wrbyte data2,y
        add y,#1
        rol data2,#8
        wrbyte data2,y
        add y,#1
        rol data2,#8
        wrbyte data2,y
        add y,#1

        rol data3,#8
        wrbyte data3,y
        add y,#1
        rol data3,#8
        wrbyte data3,y
        add y,#1
        rol data3,#8
        wrbyte data3,y
        add y,#1
        rol data3,#8
        wrbyte data3,y
        add y,#1

Anybody know a better way?
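
For anyone who wants to check the byte order on a PC, here is a minimal Python model of the rol/wrbyte/add sequence above. It is only a sketch, not Propeller code: `rol` and `store_big_endian` are hypothetical helper names, and hub RAM is modelled as a plain byte buffer.

```python
MASK32 = 0xFFFFFFFF

def rol(value, bits):
    """Rotate a 32-bit value left, like PASM's rol."""
    bits %= 32
    return ((value << bits) | (value >> (32 - bits))) & MASK32

def store_big_endian(longs):
    """Model four rol/wrbyte/add steps per long; return the bytes written."""
    hub = bytearray()
    for data in longs:
        for _ in range(4):
            data = rol(data, 8)      # rol data,#8: top byte moves into the low byte
            hub.append(data & 0xFF)  # wrbyte data,y writes the low byte; add y,#1
    return bytes(hub)

out = store_big_endian([0x11223344, 0xAABBCCDD, 0x01020304])
print(out.hex())  # 11223344aabbccdd01020304
```

Because each rol brings the current top byte down before the wrbyte, the most significant byte lands at the lowest address, which is the big-endian layout asked for.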

Comments

  • Mark Swann Posts: 124
    edited 2007-12-07 16:31
    Rayman,

    You should still be able to use wrlong, if you pre-build the long.
    This should give you fewer instructions and fewer hub writes.

    Lucidman
  • Mark Swann Posts: 124
    edited 2007-12-07 16:33
    Rayman,

    You may be able to take advantage of the "movi" instructions to facilitate this.
    Edit: (actually, "movs" may be better. I haven't thought this through.)

    Lucidman
  • CardboardGuru Posts: 443
    edited 2007-12-07 16:51
    As a matter of interest, why do you want a long in Big Endian format in Hub RAM?

    I can't think of a better way than what you are doing, other than putting the whole thing in a loop to make it shorter.

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    Help to build the Propeller wiki - propeller.wikispaces.com
    Play Defender - Propeller version of the classic game
    Prop Room Robotics - my web store for Roomba spare parts in the UK
  • Rayman Posts: 14,793
    edited 2007-12-07 16:57
    I'm creating a bitmap, pixel-by-pixel. Each pixel takes 3 nibbles. Maybe I could construct the data differently, but it hurts my brain to think about it...
  • Baggers Posts: 3,019
    edited 2007-12-07 17:32
    Hi Rayman,
    Can you give more details on what format the bitmap is?
    Is it a word per pixel? RRRRGGGGBBBB???? or ????BBBBGGGGRRRR
    If we know a little more about what your input and output data are, we can help you rearrange it. ;)

    Baggers.
  • Nick Mueller Posts: 815
    edited 2007-12-07 17:34
    wrbyte is the most expensive HUB-access you can have.
    It's getting even more expensive if you don't interleave it with 2 OpCodes (you did that).

    You have to assemble longs in COG-RAM (shift, mask and OR to a scratch register) and then send it out.


    Nick

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    Never use force, just go for a bigger hammer!

    The DIY Digital-Readout for mills, lathes etc.:
    YADRO
  • Rayman Posts: 14,793
    edited 2007-12-07 18:52
    The format is (Juicebox .jbp) 12-bits per pixel RRRRGGGG_BBBB. So, I'm using 3 longs because that makes 8 pixels.
    I was hoping somebody knew an easy way to do it... But, this has absolutely no impact on the performance of what I'm doing, because the bitmap is being output to an SD card while being generated, and the SD card is slower than this assembly...

    So, I guess I won't worry about it...
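
The "8 pixels make 3 longs" arithmetic (8 × 12 bits = 96 bits = 3 × 32 bits) can be sketched in Python as below. This is a rough illustration only: `pack_pixels` is a hypothetical helper, and it assumes the pixels are packed back to back with the first pixel in the most significant bits, which the post does not spell out.

```python
def pack_pixels(pixels):
    """Pack eight 12-bit pixels into three 32-bit longs, first pixel on top."""
    if len(pixels) != 8:
        raise ValueError("expected exactly 8 pixels")
    bits = 0
    for p in pixels:
        bits = (bits << 12) | (p & 0xFFF)  # append 12 bits per pixel
    # 96 bits total: split into three 32-bit longs, most significant first
    return [(bits >> shift) & 0xFFFFFFFF for shift in (64, 32, 0)]

longs = pack_pixels([0xF00, 0x0F0, 0x00F, 0xFFF, 0x123, 0x456, 0x789, 0xABC])
# Writing each long big-endian keeps the nibbles in pixel order in the file.
stream = b"".join(v.to_bytes(4, "big") for v in longs)
print([hex(v) for v in longs])
print(stream.hex())
```

Under that assumption, a little-endian write of the same longs would scramble nibbles across pixel boundaries, which is why the byte order matters here.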
  • Mark Swann Posts: 124
    edited 2007-12-07 19:04
    Nick Mueller said...
    wrbyte is the most expensive HUB-access you can have.
    It's getting even more expensive if you don't interleave it with 2 OpCodes (you did that).

    You have to assemble longs in COG-RAM (shift, mask and OR to a scratch register) and then send it out.


    Nick

    I just said that.
  • Phil Pilgrim (PhiPi) Posts: 23,514
    edited 2007-12-07 19:59
    I couldn't help seeing this as a golf challenge: Given a long in little endian form, create from it another long in big endian form (i.e. reverse the byte order), leaving the original long intact. Constrained to just the original and result longs, with no other temporaries, this is the best I've come up with so far:

                               '               temp                                 data                carry
                               'xxxxxxxx|xxxxxxxx|xxxxxxxx|xxxxxxxx  AAAAAAAA|BBBBBBBB|CCCCCCCC|DDDDDDDD  x
            ror    data,#31 wc 'xxxxxxxx|xxxxxxxx|xxxxxxxx|xxxxxxxx  AAAAAAAB|BBBBBBBC|CCCCCCCD|DDDDDDDA  D
            movi   temp,data   'DDDDDDDD|Axxxxxxx|xxxxxxxx|xxxxxxxx  AAAAAAAB|BBBBBBBC|CCCCCCCD|DDDDDDDA  D
            rol    data,#7     'DDDDDDDD|Axxxxxxx|xxxxxxxx|xxxxxxxx  BBBBBBBB|CCCCCCCC|DDDDDDDD|AAAAAAAA  D
            movs   temp,data   'DDDDDDDD|Axxxxxxx|xxxxxxxD|AAAAAAAA  BBBBBBBB|CCCCCCCC|DDDDDDDD|AAAAAAAA  D
            ror    temp,#8     'AAAAAAAA|DDDDDDDD|Axxxxxxx|xxxxxxxD  BBBBBBBB|CCCCCCCC|DDDDDDDD|AAAAAAAA  D
            rol    data,#8     'AAAAAAAA|DDDDDDDD|Axxxxxxx|xxxxxxxD  CCCCCCCC|DDDDDDDD|AAAAAAAA|BBBBBBBB  D
            movs   temp,data   'AAAAAAAA|DDDDDDDD|AxxxxxxA|BBBBBBBB  CCCCCCCC|DDDDDDDD|AAAAAAAA|BBBBBBBB  D
            ror    temp,#8     'BBBBBBBB|AAAAAAAA|DDDDDDDD|AxxxxxxA  CCCCCCCC|DDDDDDDD|AAAAAAAA|BBBBBBBB  D
            rol    data,#8     'BBBBBBBB|AAAAAAAA|DDDDDDDD|AxxxxxxA  DDDDDDDD|AAAAAAAA|BBBBBBBB|CCCCCCCC  D
            movs   temp,data   'BBBBBBBB|AAAAAAAA|DDDDDDDB|CCCCCCCC  DDDDDDDD|AAAAAAAA|BBBBBBBB|CCCCCCCC  D
            muxc   temp,#$100  'BBBBBBBB|AAAAAAAA|DDDDDDDD|CCCCCCCC  DDDDDDDD|AAAAAAAA|BBBBBBBB|CCCCCCCC  D
            rol    temp,#16    'DDDDDDDD|CCCCCCCC|BBBBBBBB|AAAAAAAA  DDDDDDDD|AAAAAAAA|BBBBBBBB|CCCCCCCC  D
            rol    data,#8     'DDDDDDDD|CCCCCCCC|BBBBBBBB|AAAAAAAA  AAAAAAAA|BBBBBBBB|CCCCCCCC|DDDDDDDD  D
    
    
    


    Instruction count: 13.
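
The 13-instruction trace above can be replayed in Python as a sanity check. This is a sketch under stated assumptions: `movi` is modelled as copying the source's low 9 bits into bits 31..23, `movs` as copying them into bits 8..0, `muxc` as forcing the masked bits to the carry, and `ror ... wc` as latching the original bit 0. All function names are hypothetical helpers.

```python
MASK32 = 0xFFFFFFFF

def rol(v, n):
    n %= 32
    return ((v << n) | (v >> (32 - n))) & MASK32

def ror(v, n):
    return rol(v, 32 - (n % 32))

def movi(d, s):
    return (d & 0x007FFFFF) | ((s & 0x1FF) << 23)  # low 9 bits of s -> bits 31..23

def movs(d, s):
    return (d & 0xFFFFFE00) | (s & 0x1FF)          # low 9 bits of s -> bits 8..0

def muxc(d, mask, carry):
    return (d | mask) if carry else (d & ~mask & MASK32)

def swap_13(data, temp=0):
    carry = data & 1                 # ror data,#31 wc: C = original bit 0
    data = ror(data, 31)
    temp = movi(temp, data)
    data = rol(data, 7)
    temp = movs(temp, data)
    temp = ror(temp, 8)
    data = rol(data, 8)
    temp = movs(temp, data)
    temp = ror(temp, 8)
    data = rol(data, 8)
    temp = movs(temp, data)
    temp = muxc(temp, 0x100, carry)  # repair bit 8, clobbered by the last movs
    temp = rol(temp, 16)
    data = rol(data, 8)              # data is back to its original value
    return temp, data

print([hex(v) for v in swap_13(0xAABBCCDD)])  # ['0xddccbbaa', '0xaabbccdd']
```

The result does not depend on what temp held beforehand, which matches the x bits disappearing from the trace.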

    -Phil
  • Baggers Posts: 3,019
    edited 2007-12-07 20:41
      mov     tmp1,data1      '4
      and     tmp1,#$ff       '8
      ror     tmp1,#8         '12

      mov     tmp2,data1      '16
      and     tmp2,byte2      '20
      rol     tmp2,#8         '24
      or      tmp1,tmp2       '28

      mov     tmp2,data1      '32
      and     tmp2,byte3      '36
      ror     tmp2,#8         '40
      or      tmp1,tmp2       '44

      and     data1,byte4     '48
      rol     data1,#8        '52
      or      tmp1,data1      '56

      wrlong  tmp1,y          '64
      add     y,#4            '68
    
    byte2 long $0000ff00
    byte3 long $00ff0000
    byte4 long $ff000000
    
    
    



    As you can see, that's 68 clocks per long, compared to your original, which is also 68 clocks yet doesn't take up as many longs in cog RAM. So it's totally up to you which one you use, but I'd use the wrbyte version to save cog space :)

    rol data1,#8 '4
    wrbyte data1,y '16
    add y,#1 '20
    rol data1,#8 '24
    wrbyte data1,y '32
    add y,#1 '36
    rol data1,#8 '40
    wrbyte data1,y '48
    add y,#1 '52
    rol data1,#8 '56
    wrbyte data1,y '64
    add y,#1 '68



    So again, although wrbyte can be slow, it can be fast when you compare it to shifting and ORing ;)
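
A minimal Python model of the mask, rotate and OR version, for checking that a single wrlong of the rebuilt long leaves the same bytes in hub RAM as the four wrbytes. `swap_masks` is a hypothetical name, and the little-endian `to_bytes` call stands in for wrlong.

```python
MASK32 = 0xFFFFFFFF

def rol(v, n):
    n %= 32
    return ((v << n) | (v >> (32 - n))) & MASK32

def ror(v, n):
    return rol(v, 32 - (n % 32))

def swap_masks(data1):
    """Isolate each byte with a mask, rotate it to its mirrored slot, OR together."""
    tmp1 = ror(data1 & 0x000000FF, 8)   # byte 0 -> byte 3
    tmp1 |= rol(data1 & 0x0000FF00, 8)  # byte 1 -> byte 2 (byte2 mask)
    tmp1 |= ror(data1 & 0x00FF0000, 8)  # byte 2 -> byte 1 (byte3 mask)
    tmp1 |= rol(data1 & 0xFF000000, 8)  # byte 3 -> byte 0 (byte4 mask)
    return tmp1

value = 0x11223344
swapped = swap_masks(value)
# wrlong stores little-endian, so the swapped long lands in big-endian order
hub_bytes = swapped.to_bytes(4, "little")
print(hex(swapped), hub_bytes.hex())  # 0x44332211 11223344
```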

    Also, on a side note, can't you just generate the data in its correct orientation initially? (i.e. is the propeller generating the data initially?)

    Hope this helps,
    Baggers.

    edit: Just saw Phil's usage of movs, forgot about using that instruction, :D well done Phil.

    Post Edited (Baggers) : 12/7/2007 8:46:43 PM GMT
  • Beau Schwabe Posts: 6,568
    edited 2007-12-07 21:34
    Phil,
    Now you know I can't resist a Golf Challenge... ;) ... 6 instructions
     
                  mov       Original, Data          '<--- Optional (to preserve original data)
                  
                                                    'Data          Temp
                                                    'AA_BB_CC_DD   00_00_00_00
                  ror       Data,   #8              'DD_AA_BB_CC   00_00_00_00
                  mov       temp,   Data            'DD_AA_BB_CC   DD_AA_BB_CC
                  andn      temp,   Mask            'DD_AA_BB_CC   DD_00_BB_00
                  ror       Data,   #16             'BB_CC_DD_AA   DD_00_BB_00
                  and       Data,   Mask            '00_CC_00_AA   DD_00_BB_00
                  or        Data,   temp            'DD_CC_BB_AA   DD_00_BB_00
     
     
    Data          long      $AA_BB_CC_DD
    Original      long      0
    temp          long      0
    Mask          long      $FF_00_FF
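
The six-instruction shuffle above can be checked with a small Python model. It is a sketch only: `swap_6` is a hypothetical name, `andn` is modelled as AND with the inverted mask, and Mask is the $00FF00FF constant from the listing.

```python
MASK32 = 0xFFFFFFFF

def ror(v, n):
    n %= 32
    return ((v >> n) | (v << (32 - n))) & MASK32

def swap_6(data, mask=0x00FF00FF):
    data = ror(data, 8)             # AA_BB_CC_DD -> DD_AA_BB_CC
    temp = data                     # mov  temp, Data
    temp &= ~mask & MASK32          # andn temp, Mask -> DD_00_BB_00
    data = ror(data, 16)            # ror  Data, #16  -> BB_CC_DD_AA
    data &= mask                    # and  Data, Mask -> 00_CC_00_AA
    data |= temp                    # or   Data, temp -> DD_CC_BB_AA
    return data

print(hex(swap_6(0xAABBCCDD)))  # 0xddccbbaa
```

The trick is that one mask serves both halves: it keeps the odd bytes of one rotation and, inverted, the even bytes of the other.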
     
    


    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    Beau Schwabe

    IC Layout Engineer
    Parallax, Inc.

    Post Edited (Beau Schwabe (Parallax)) : 12/7/2007 11:51:52 PM GMT
  • rokicki Posts: 1,000
    edited 2007-12-07 21:47
    This challenge has already been presented, and some of the solutions are very interesting:

    http://forums.parallax.com/forums/default.aspx?f=25&m=156267
  • Phil Pilgrim (PhiPi) Posts: 23,514
    edited 2007-12-07 21:51
    Beau, that's truly inspired! :)

    -Phil
  • Phil Pilgrim (PhiPi) Posts: 23,514
    edited 2007-12-07 22:00
    rokicki,

    Wow, I'd forgotten about that, and it was only a year ago, too! Well, I guess the best thing about memory loss is that you can keep rehashing stuff and it still seems fresh!

    -Phil
  • Beau Schwabe Posts: 6,568
    edited 2007-12-07 22:09
    Phil,

    lol - I forgot about that thread also... It seems as though the magic number of instructions for several of the given solutions was six.
  • OzStamp Posts: 377
    edited 2007-12-07 22:19
    Hey guys,
    Impressive bit bashing..
    You all crack me up sometimes with your wicked sense of humor, or is it humour..
    Keep those asm snippets coming.
    Can somebody make these code samples a sticky or get them in a library.. that would be awesome..

    Okay, I logged on to find out about.. what do you guys mean by a "GOLF" challenge..
    Please explain..

    Ron Mel OZ
  • Beau Schwabe Posts: 6,568
    edited 2007-12-07 22:24
    GOLF Challenge - It's just a reference to a competitive challenge to one-up or gain the lead over your opponent, usually under friendly terms. ... errr right Phil? ;)
  • Phil Pilgrim (PhiPi) Posts: 23,514
    edited 2007-12-07 22:24
    Ron,

    In real golf, the lowest score wins. In computer golf ... well, I'm sure you get it now! :)

    -Phil
  • OzStamp Posts: 377
    edited 2007-12-07 22:46
    Phil + Beau.. yes, understood...

    A bit like being down under (lower... less).. does that mean we in OZ are better...
    Only joking...

    Really enjoying that asm bit stuff... it reminds me.. of way back.. Z80 stuff.
    ... I see people that totally resort to ASM.. is that odd?
    No C compiler... you know what I mean...

    Cheers guys

    Ron Mel Oz
  • deSilva Posts: 2,967
    edited 2007-12-07 23:17
    There is also a discussion of that thing in deSilva's tutorial (ex08)...
    deSilva got bored of these ever repeating topics ... He's waiting for the C-Compiler
  • Rayman Posts: 14,793
    edited 2007-12-07 23:37
    Thanks guys! It turns out I really do need a faster method because the assembly is slowing down the refresh rate.... So, I'll give the 6-line shuffle a try...
  • Phil Pilgrim (PhiPi) Posts: 23,514
    edited 2007-12-07 23:43
    deSilva,

    I've been wondering ... this other "deSilva" that you refer to in the third person: is he a brother? Cousin, perhaps? :)

    Well, at least he has a good memory. Would that my alter-ego were as retentive!

    -Phil

    (Sorry, just couldn't resist. Saying "ich" ("I") is not forbidden here.)
  • Beau Schwabe Posts: 6,568
    edited 2007-12-07 23:44
    Rayman,

    How many extra COGs do you have?

    With one extra COG, you could interleave the longs, where one COG swaps the ODD_Longs while the other COG swaps the EVEN_Longs.
    Unless you wanted or needed to, the COGs wouldn't even need to be synced 180° out of phase. Then at least you would have the throughput of doing it in 3 instructions.
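
A rough Python sketch of the even/odd split, to make the "3 instructions" figure concrete: each cog still spends 6 instructions per long, but with two cogs working side by side the pair finishes one long per 3 instruction times on average. The names are hypothetical, the two "cogs" run one after the other here, and hub access waits are ignored.

```python
def byteswap(v):
    """Reverse the byte order of a 32-bit value."""
    return int.from_bytes(v.to_bytes(4, "big"), "little")

def swap_interleaved(longs):
    out = [0] * len(longs)
    out[0::2] = [byteswap(v) for v in longs[0::2]]  # share of the first cog
    out[1::2] = [byteswap(v) for v in longs[1::2]]  # share of the second cog
    return out

INSTRUCTIONS_PER_LONG = 6   # the six-instruction swap
COGS = 2
effective = INSTRUCTIONS_PER_LONG / COGS

result = swap_interleaved([0x11223344, 0xAABBCCDD, 0x01020304, 0xDEADBEEF])
print([hex(v) for v in result], effective)
```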
  • Rayman Posts: 14,793
    edited 2007-12-07 23:50
    Beau,

    My assembly routine uses one cog to output one horizontal line of 240 pixels in groups of 8 pixels to make 3 longs.
    I suppose I could use another cog to do every other group of 8 pixels.
    But, I think I just need a bit more speed to make it so that the SD card writing is the choke point...
    I think all those wrbytes were slowing me down just a hair too much...

    Ray
  • Rayman Posts: 14,793
    edited 2007-12-07 23:56
    Actually, what I have already is overkill for my target app (photo frame weather station...) I'm planning on a 5-second slideshow mode.... I just need to be able to write out the file in between photo frame reads... Right now, it works with as fast as 2 second updates...

    On the other hand, my new Coby photo frames have more pixels and maybe I will have to use more cogs...

    I'm using 1 cog for rokicki's SD card interface
    1 cog to generate bitmap lines in assembly
    1 cog in SPIN to monitor the SD card CLK signal and control the writing to the SD card
    1 cog for main SPIN routine
    1 or 2 cogs (I can't remember) for VGA (even though I'm not really using it)...
    1 cog for fullduplexserial

    So, if I don't start VGA, I should have 3 cogs left....

    Post Edited (Rayman) : 12/8/2007 12:03:08 AM GMT