Easier way to big-endian longs out of assembly?
Rayman
I'm not terribly happy with the way I'm exporting 3 longs from assembly to HUB RAM... I need to have them in big-endian order in HUB RAM, but wrlong puts them out little-endian... Here's what I have now:
        'store 3 longs to output buffer, most significant byte first
        'each rol #8 brings the next byte down to bits 7..0, which wrbyte writes;
        'after four rols the register is back to its original value
        rol data1,#8
        wrbyte data1,y
        add y,#1
        rol data1,#8
        wrbyte data1,y
        add y,#1
        rol data1,#8
        wrbyte data1,y
        add y,#1
        rol data1,#8
        wrbyte data1,y
        add y,#1

        rol data2,#8
        wrbyte data2,y
        add y,#1
        rol data2,#8
        wrbyte data2,y
        add y,#1
        rol data2,#8
        wrbyte data2,y
        add y,#1
        rol data2,#8
        wrbyte data2,y
        add y,#1

        rol data3,#8
        wrbyte data3,y
        add y,#1
        rol data3,#8
        wrbyte data3,y
        add y,#1
        rol data3,#8
        wrbyte data3,y
        add y,#1
        rol data3,#8
        wrbyte data3,y
        add y,#1
Anybody know a better way?
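For readers without a Propeller handy, here is a C model of what the loop above does (a sketch, not PASM). `rol32` stands in for `rol`, and the hypothetical `hub` array stands in for HUB RAM, with `y` as the byte address. Each 8-bit left rotate brings the next most significant byte down to bits 7..0, the part `wrbyte` stores, so the bytes land most significant first.

```c
#include <stdint.h>
#include <stddef.h>

/* Rotate a 32-bit value left by n bits (0 < n < 32), like PASM "rol". */
static uint32_t rol32(uint32_t x, unsigned n)
{
    return (x << n) | (x >> (32u - n));
}

/* Model of the rol/wrbyte/add sequence. hub is a hypothetical stand-in
   for HUB RAM; y is the starting byte address. Returns the next free
   byte address, as y would be after the last "add y,#1". */
static size_t store_longs_be(uint8_t *hub, size_t y,
                             const uint32_t *data, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        uint32_t d = data[i];
        for (int b = 0; b < 4; b++) {
            d = rol32(d, 8);        /* rol    dataN,#8              */
            hub[y] = (uint8_t)d;    /* wrbyte dataN,y (low 8 bits)  */
            y += 1;                 /* add    y,#1                  */
        }
        /* after four 8-bit rotates, d equals data[i] again */
    }
    return y;
}
```

After four rotates each register holds its original value again, which is why data1..data3 are still usable after the sequence.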
Comments
You should still be able to use wrlong if you pre-build the long.
This should give you fewer instructions and fewer hub writes.
Lucidman
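A minimal C sketch of the pre-build suggestion, under two assumptions: `wrlong_model` is a hypothetical stand-in for `wrlong` that stores the low byte at the lowest address (the little-endian behavior described in the question), and the target address is long-aligned, since `wrlong` writes whole longs. The long is byte-reversed in a scratch variable first, so one little-endian long write leaves the bytes in big-endian order.

```c
#include <stdint.h>

/* Hypothetical model of wrlong: little-endian, low byte at lowest address. */
static void wrlong_model(uint8_t *hub, uint32_t addr, uint32_t v)
{
    hub[addr + 0] = (uint8_t)(v);
    hub[addr + 1] = (uint8_t)(v >> 8);
    hub[addr + 2] = (uint8_t)(v >> 16);
    hub[addr + 3] = (uint8_t)(v >> 24);
}

/* Byte-reverse a long: swap adjacent bytes within each 16-bit half,
   then rotate by 16 to swap the halves. [b3 b2 b1 b0] -> [b0 b1 b2 b3] */
static uint32_t swap_bytes(uint32_t x)
{
    x = ((x << 8) & 0xFF00FF00u) | ((x >> 8) & 0x00FF00FFu);
    return (x << 16) | (x >> 16);
}

/* Pre-build the swapped long, then do one long write
   instead of four byte writes. */
static void store_long_be(uint8_t *hub, uint32_t addr, uint32_t v)
{
    wrlong_model(hub, addr, swap_bytes(v));
}
```

In PASM the two masks would have to live in cog registers, because they don't fit in a 9-bit immediate, so the saving in hub writes is partly paid back in instructions and cog longs.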
You may be able to take advantage of the "movi" instruction to facilitate this.
Edit: (actually, "movs" may be better. I haven't thought this through.)
Lucidman
I can't think of a better way than what you are doing, other than putting the whole thing in a loop to make it shorter.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Help to build the Propeller wiki - propeller.wikispaces.com
Play Defender - Propeller version of the classic game
Prop Room Robotics - my web store for Roomba spare parts in the UK
Can you give more details on what format the bitmap is in?
Is it a word per pixel? RRRRGGGGBBBB???? or ????BBBBGGGGRRRR?
If we knew a little more about what your input and output data are, we could help you rearrange it. ;)
Baggers.
It gets even more expensive if you don't interleave 2 opcodes between the hub writes (you did that).
You would have to assemble the longs in COG RAM (shift, mask and OR into a scratch register) and then send them out.
Nick
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Never use force, just go for a bigger hammer!
The DIY Digital-Readout for mills, lathes etc.:
YADRO
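Nick's shift, mask and OR approach, sketched in C: each pass takes the low byte of the source and ORs it into a scratch value that is shifted left to make room. The result is the byte-reversed long, ready for a single `wrlong`. In PASM each byte costs several instructions (shift the scratch register, mask a copy of the source, OR, shift the source), so this is not automatically faster than four `wrbyte`s.

```c
#include <stdint.h>

/* Build the byte-reversed image of v in a scratch variable,
   one byte at a time: shift scratch, mask low byte of v, OR it in. */
static uint32_t assemble_swapped(uint32_t v)
{
    uint32_t scratch = 0;
    for (int i = 0; i < 4; i++) {
        scratch = (scratch << 8) | (v & 0xFFu);  /* append low byte of v */
        v >>= 8;                                  /* move to next byte    */
    }
    return scratch;
}
```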
I was hoping somebody knew an easy way to do it... But this has absolutely no impact on the performance of what I'm doing, because the bitmap is being output to an SD card while it is being generated, and the SD card is slower than this assembly...
So, I guess I won't worry about it...
Instruction count: 13.
-Phil
As you can see, that's 68 clocks per long, compared to your original, which is also 68 clocks (cumulative clock counts for one long of your original are in the comments below). Yours doesn't take up as many longs in cog RAM, though, so it's totally up to you which one you use, but I'd use the wrbyte version to save cog space :)
rol data1,#8 '4
wrbyte data1,y '16
add y,#1 '20
rol data1,#8 '24
wrbyte data1,y '32
add y,#1 '36
rol data1,#8 '40
wrbyte data1,y '48
add y,#1 '52
rol data1,#8 '56
wrbyte data1,y '64
add y,#1 '68
So again, although wrbyte can be slow, it can be fast compared to shifting and ORing ;)
Also, on a side note, can't you just generate the data in its correct orientation initially? (i.e., is the Propeller generating the data in the first place?)
Hope this helps,
Baggers.
Edit: Just saw Phil's use of movs. I forgot about that instruction :D Well done, Phil.
Post Edited (Baggers) : 12/7/2007 8:46:43 PM GMT
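Baggers' cumulative clock counts can be reproduced with a small C model. The assumptions are labeled: `rol` and `add` take 4 clocks; `wrbyte` takes at least 8 clocks and completes on the cog's hub window, which recurs every 16 clocks and is assumed to line up with clock 16 as in the listing. Under that model the first long costs 68 clocks and each further long 64, because once in step each byte takes exactly one 16-clock hub rotation and the `rol`/`add` pair fits in the gap.

```c
/* Hub instruction starting at clock t: at least 8 clocks, then it
   finishes on the next hub window (assumed at multiples of 16). */
static unsigned hub_op(unsigned t)
{
    unsigned end = t + 8;             /* minimum hub instruction time */
    return (end + 15u) / 16u * 16u;   /* round up to next window      */
}

/* Cumulative clocks after writing n longs with rol/wrbyte/add per byte. */
static unsigned clocks_for_longs(unsigned n)
{
    unsigned t = 0;
    for (unsigned i = 0; i < 4u * n; i++) {
        t += 4;          /* rol    data,#8 */
        t = hub_op(t);   /* wrbyte data,y  */
        t += 4;          /* add    y,#1    */
    }
    return t;
}
```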
Now you know I can't resist a Golf Challenge... 6 instructions
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Beau Schwabe
IC Layout Engineer
Parallax, Inc.
Post Edited (Beau Schwabe (Parallax)) : 12/7/2007 11:51:52 PM GMT
http://forums.parallax.com/forums/default.aspx?f=25&m=156267
-Phil
Wow, I'd forgotten about that, and it was only a year ago, too! Well, I guess the best thing about memory loss is that you can keep rehashing stuff and it still seems fresh!
-Phil
lol - I forgot about that thread also... It seems as though the magic number of instructions for several of the given solutions was six.
Impressive bit bashing.
You all crack me up sometimes with your wicked sense of humor (or is it humour?).
Keep those asm snippets coming.
Can somebody make these code samples a sticky or get them into a library? That would be awesome.
Okay, I logged on to find out: what do you guys mean by a "GOLF" challenge?
Please explain.
Ron Mel OZ
In real golf, the lowest score wins. In computer golf ... well, I'm sure you get it now!
-Phil
A bit like being down under (lower... less). Does that mean we in OZ are better?
Only joking...
Really enjoying the asm bit stuff... it reminds me of way back, Z80 stuff.
...I see people who resort entirely to ASM. Is that odd?
No C compiler... you know what I mean...
Cheers, guys
Ron Mel Oz
deSilva got bored of these ever-repeating topics... He's waiting for the C compiler.
I've been wondering ... this other "deSilva" that you refer to in the third person: is he a brother? Cousin, perhaps?
Well, at least he has a good memory. Would that my alter-ego were as retentive!
-Phil
(Sorry, just couldn't resist. "Ich" ("I") is not forbidden here.)
How many extra COGs do you have?
With one extra COG, you could interleave the longs: one COG swaps the odd longs while the other swaps the even longs.
Unless you wanted or needed to, the COGs wouldn't even need to be synced 180° out of phase... Then at least you would have the throughput of doing it in 3 instructions.
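Beau's two-cog split, modeled sequentially in C. `cog_worker` is a hypothetical name; `phase` 0 handles the even longs and 1 the odd longs. The two calls here run one after the other, whereas on the Propeller they would run in two cogs at once. Because the two halves touch disjoint longs, neither worker needs to wait on the other, which fits Beau's point that the cogs need not be synced out of phase.

```c
#include <stdint.h>
#include <stddef.h>

/* Byte-reverse one long: swap adjacent bytes, then swap 16-bit halves. */
static uint32_t swap32(uint32_t x)
{
    x = ((x << 8) & 0xFF00FF00u) | ((x >> 8) & 0x00FF00FFu);
    return (x << 16) | (x >> 16);
}

/* One "cog" of the scheme: handles every second long from `phase`. */
static void cog_worker(const uint32_t *in, uint32_t *out, size_t n,
                       size_t phase)
{
    for (size_t i = phase; i < n; i += 2)
        out[i] = swap32(in[i]);
}
```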
My assembly routine uses one cog to output one horizontal line of 240 pixels in groups of 8 pixels to make 3 longs.
I suppose I could use another cog to do every other group of 8 pixels.
But, I think I just need a bit more speed to make it so that the SD card writing is the choke point...
I think all those wrbytes were slowing me down just a hair too much...
Ray
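The figures in this post pin down the per-line workload. Here is a quick check in C using only the stated numbers (240 pixels per line, 8 pixels per group, 3 longs per group). The 12 bits per pixel is implied by 96 bits per 8 pixels; it is not stated in the post.

```c
/* Per-line arithmetic from the stated figures. */
enum {
    PIXELS_PER_LINE  = 240,
    PIXELS_PER_GROUP = 8,
    LONGS_PER_GROUP  = 3,
    GROUPS_PER_LINE  = PIXELS_PER_LINE / PIXELS_PER_GROUP,     /* 30  */
    LONGS_PER_LINE   = GROUPS_PER_LINE * LONGS_PER_GROUP,      /* 90  */
    BYTES_PER_LINE   = LONGS_PER_LINE * 4,                     /* 360 */
    BITS_PER_PIXEL   = LONGS_PER_GROUP * 32 / PIXELS_PER_GROUP /* 12  */
};
```

At one `wrbyte` per byte that is 360 hub writes per line, against 90 if each long were pre-built and written with `wrlong`. At the steady-state 16 clocks per byte shown in Baggers' listing, the byte-wise version spends roughly 5,760 clocks per line on output.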
On the other hand, my new Coby photo frames have more pixels and maybe I will have to use more cogs...
I'm using 1 cog for Rockiki's SD card interface
1 cog to generate bitmap lines in assembly
1 cog in SPIN to monitor the SD card CLK signal and control the writing to the SD card
1 cog for main SPIN routine
1 or 2 cogs (I can't remember) for VGA (even though I'm not really using it)...
1 cog for fullduplexserial
So, if I don't start VGA, I should have 3 cogs left....
Post Edited (Rayman) : 12/8/2007 12:03:08 AM GMT
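The cog tally above can be written as arithmetic. The P1 has 8 cogs, and VGA is listed as 1 or 2 cogs, so `free_cogs` (a hypothetical helper) takes the VGA count as a parameter; passing 0 corresponds to not starting VGA.

```c
/* Cog budget from the list above; the P1 has 8 cogs. */
enum { TOTAL_COGS = 8 };

static int free_cogs(int vga_cogs)
{
    int used = 1   /* SD card interface                     */
             + 1   /* bitmap line generator (assembly)      */
             + 1   /* Spin: SD CLK monitor / write control  */
             + 1   /* main Spin routine                     */
             + 1   /* fullduplexserial                      */
             + vga_cogs;
    return TOTAL_COGS - used;
}
```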