What is the fastest way to convert 16 "parallel" bytes into 8 "serial" words?
ags
(in PASM, of course).
In other words, starting with 16 bytes, ordered as byte0..byte15, I want to take bit0 from each byte and assemble a word. The output word would be:
word0,bit0 = byte0,bit0
word0,bit1 = byte1,bit0
...
word0,bit15 = byte15,bit0
The next output word would be:
word1,bit0 = byte0,bit1
word1,bit1 = byte1,bit1
...
word1,bit15 = byte15,bit1
and so on, until
word7,bit0 = byte0,bit7
word7,bit1 = byte1,bit7
...
word7,bit15 = byte15,bit7
I prototyped some code to do this some months ago and it seemed to have timing issues. I am new to PASM. I found the MUXC instruction and ended up using that, but perhaps there is a better design pattern to use?
Thanks for any help that is offered.
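For checking any PASM implementation against known-good output, the mapping described above can be written as a small host-side reference model. This is only a sketch in Python, not PASM and not tuned for speed; the function name `transpose_bytes_to_words` is made up for illustration and does not come from the thread.

```python
def transpose_bytes_to_words(data):
    """Reference model of the mapping: word k, bit i = byte i, bit k.

    Takes 16 byte values (byte0..byte15) and returns 8 words (word0..word7).
    The name and interface are hypothetical, for illustration only.
    """
    if len(data) != 16:
        raise ValueError("expected exactly 16 bytes")
    words = []
    for k in range(8):                 # one output word per bit position
        word = 0
        for i, byte in enumerate(data):
            bit = (byte >> k) & 1      # bit k of byte i
            word |= bit << i           # becomes bit i of word k
        words.append(word)
    return words
```

Feeding it a single set bit makes the mapping easy to see: with only byte15 set to 0x80, the only non-zero output is word7, with bit15 set.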
Comments
1 - Are your 16 bytes already stored in the cog RAM?
2 - If so, are they stored 1 byte per long, or 4 bytes in each long?
3 - Are the byte variables destructible, or do you need to keep their independent values around?
4 - What kind of time budget do you have (in clocks)?
Jonathan
The longs are filled byte-by-byte to convert big-endian to little-endian, in a different cog's RAM, then sent to the "slave cog" through hub RAM.
I do not need to preserve the long values other than to get them shifted out. They can be modified-in-place.
I'm running an 80MHz clock. I need to convert (from the original format) and shift each "serial word" (either 8 or 16 bits long, taken from bit<n> of each original word) out a single Propeller output pin in under 4 microseconds.
Jonathan
I wasn't clear in my response. I have 4 microseconds to perform an entire operation, of which the "parallel-to-serial" step is just one small piece. I haven't done the rest of the code yet so can't estimate very closely, but just getting 8 longs from hub RAM to cog RAM is (roughly) 128 clocks or 1.6 microseconds of the budget. Time flies when you're having fun...
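The budget arithmetic in this post can be laid out explicitly. The sketch below is in Python purely as a calculator. The 16 clocks per hub read is an assumption back-computed from the "roughly 128 clocks for 8 longs" figure, and the 4 clocks per instruction is the usual cost of an ordinary cog instruction; all names are illustrative.

```python
CLOCK_HZ = 80_000_000     # 80 MHz system clock (from the post)
BUDGET_US = 4             # total time allowed for the whole operation
HUB_READ_CLOCKS = 16      # assumption: 128 clocks / 8 longs, as quoted above
INSTR_CLOCKS = 4          # assumption: cost of one ordinary cog instruction


def clocks_to_us(clocks, clock_hz=CLOCK_HZ):
    """Convert a clock count to microseconds at the given clock rate."""
    return clocks * 1_000_000 / clock_hz


budget_clocks = BUDGET_US * CLOCK_HZ // 1_000_000         # total clocks available
hub_clocks = 8 * HUB_READ_CLOCKS                          # fetching 8 longs from hub
remaining_clocks = budget_clocks - hub_clocks             # left for everything else
remaining_instructions = remaining_clocks // INSTR_CLOCKS
```

Under these assumptions the 4 microsecond budget is 320 clocks, the hub transfer takes 128 of them, and about 192 clocks (roughly 48 ordinary instructions) remain for the conversion and the shifting.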
-Phil
But I guess you first want to prepare the data in a specific way.
But if you edit above code to something like this, and store each parallel byte in a long
you would not have to pre-arrange the data first.
With a few changes you could make it store the data in new serial words instead. Not tested.
Ouch, how embarrassing. I need to find the "mu" character somewhere. I have 4 microseconds, not milliseconds.
-Phil
Phil:
All good questions, right on target. I've tried to simplify the question so as not to add too much clutter to the forum.
I need to perform bit-banging to encode the simple 1 and 0 values into the proper self-timed sequences the receivers require (as well as adding start/stop sequences to each instruction packet (26 bits)).
I also need to use shift registers to expand the number of available output pins (drivers) since I have to use the WIZnet module in Indirect Addressing mode to get the input data rate required.
OK, that would work. Not sure about the timing until I code it and count clocks.
What I'm asking/hoping is based on my experience in reading really good PASM code. There seem to be folks that have a combination of experience & talent that can see a problem and create just the tightest, fastest code. When I see it, I think "Oh! of course. That's the way to do it". And I can then tailor it to my needs. I used the MUXC instruction in my prototype, but there were address increments and loops and other time wasters that I presume a very skilled PASM coder could avoid with a different design/patterns/instructions. I've seen really great code that uses instructions in combination with other instructions in combination with side-effects in combination with conditional jumps that takes much less time than my simple, unskilled brute-force method.
I hope that helps clarify what I'm looking for.
Thanks.
is probably the fastest way.
Start a counter to output PWM on the clk pin, timed to match the code below:
nop
test bit_test,mybyte wz 'test the selected bit of byte 0
muxnz outa,#SER 'result is in Z; use it to set the serial pin
test bit_test,mybyte+1 wz 'same bit, byte 1
muxnz outa,#SER 'result is in Z; use it to set the serial pin
test bit_test,mybyte+2 wz 'same bit, byte 2
muxnz outa,#SER 'result is in Z; use it to set the serial pin
test bit_test,mybyte+3 wz 'same bit, byte 3
muxnz outa,#SER 'result is in Z; use it to set the serial pin
...
... (and so on until all 16 bytes have been done; an 8-round outer loop to change the bit is not too bad)
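To sanity-check what the unrolled test/muxnz sequence puts on the pin, here is a rough host-side model in Python. It is not PASM; `serial_stream`, `stream_to_word` and `pass_clocks` are made-up names, and the 8 clocks per test/muxnz pair assumes 4 clocks per instruction with no loop overhead counted.

```python
CLOCKS_PER_PAIR = 8   # assumption: test + muxnz at 4 clocks each


def serial_stream(data, bit):
    """Pin levels for one pass over the bytes, testing the same bit in each."""
    bit_test = 1 << bit                      # plays the role of bit_test
    # Each element models one test/muxnz pair: pin = 1 if the bit is set.
    return [1 if (byte & bit_test) else 0 for byte in data]


def stream_to_word(levels):
    """Reassemble a pass into a word, first level shifted out = bit 0."""
    word = 0
    for i, level in enumerate(levels):
        word |= level << i
    return word


def pass_clocks(n_bytes=16):
    """Clocks for one unrolled pass over n_bytes bytes (no loop overhead)."""
    return n_bytes * CLOCKS_PER_PAIR
```

With these assumptions one pass over 16 bytes costs 128 clocks (1.6 microseconds at 80 MHz), and all 8 passes cost 1024 clocks before any outer-loop overhead, which is worth comparing against the 4 microsecond budget mentioned earlier in the thread.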