What is the fastest way to convert 16 "parallel" bytes into 8 "serial" words? — Parallax Forums

ags Posts: 386
edited 2011-03-14 22:08 in Propeller 1
(in PASM, of course).

In other words, starting with 16 bytes, ordered as byte0..byte15, I want to take bit0 from each byte and assemble a word. The output word would be:

word0,bit0 = byte0,bit0
word0,bit1 = byte1,bit0
...
word0,bit15 = byte15,bit0

The next output word would be:

word1,bit0 = byte0,bit1
word1,bit1 = byte1,bit1
...
word1,bit15 = byte15,bit1

and so on, until

word7,bit0 = byte0,bit7
word7,bit1 = byte1,bit7
...
word7,bit15 = byte15,bit7
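
Stated compactly, the mapping above is a 16×8 bit-matrix transpose. As a sketch, with b_i standing for byte i and w_n for output word n (symbols introduced here for illustration, not taken from the post):

```latex
w_n \;=\; \sum_{i=0}^{15} \left( \left\lfloor \frac{b_i}{2^{n}} \right\rfloor \bmod 2 \right) 2^{i},
\qquad n = 0, 1, \dots, 7
```

For example, if byte3 = 1 and every other byte is 0, then w_0 = 2^3 = 8 and all other words are zero.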

I prototyped some code to do this some months ago and it seemed to have timing issues. I am new to PASM. I found the MUXC instruction and ended up using that, but perhaps there is a better design pattern to use?

Thanks for any help that is offered.

Comments

  • lonesock Posts: 917
    edited 2011-03-09 09:54
    Quick questions:

    1 - Are your 16 bytes already stored in the cog RAM?
    2 - If so, are they stored 1 byte per long, or 4 bytes in each long?
    3 - are the byte variables destructible, or do you need to keep their independent values around?
    4 - what kind of time budget do you have (in clocks)?

    Jonathan
  • ags Posts: 386
    edited 2011-03-09 14:16
    lonesock wrote: »
    Quick questions:

    1 - Are your 16 bytes already stored in the cog RAM?
    2 - If so, are they stored 1 byte per long, or 4 bytes in each long?

    I tried to simplify the question to make it easier to explain and answer. Depending on timing, I will have either 8 or 16 longs (16 is preferred) that need to be shifted out serially: first bit0 of all longs, then bit1 of all longs, and so on.
    The longs are filled byte-by-byte to convert big-endian to little-endian, in a different cog's RAM, then sent to the "slave cog" through hub RAM.

    lonesock wrote: »
    3 - Are the byte variables destructible, or do you need to keep their independent values around?

    I do not need to preserve the long values other than to get them shifted out. They can be modified in place.

    lonesock wrote: »
    4 - What kind of time budget do you have (in clocks)?

    I'm running an 80 MHz clock. I need to convert (from the original format) and shift each "serial word" (either 8 or 16 bits long, taken from bit<n> of each original word) out of a single Propeller output pin in under 4 microseconds.
  • lonesock Posts: 917
    edited 2011-03-10 09:47
    4ms is plenty of time...I think you could do the mixing in Spin. If you still want to do it in PASM it is certainly doable...I would look into the ROL and ROR commands, coupled with the wc flag. That lets you shuffle the bits around inside a variable, and store the bit that was bit 0 (for ROR) or bit 31 (for ROL) into the C flag. Then you would use a RCL command to shift that bit into your word variable. Let me know if that gets you started, otherwise I should have a bit of time this evening to dummy something up.

    Jonathan
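
    The shift-into-carry idea above might look something like the following. This is only a sketch under stated assumptions: the 16 bytes sit one per long in cog RAM at `bytes`, each output word is stored in its own long at `words`, the source values may be destroyed, and all labels (`transpose`, `acc`, `d_one`, and so on) are hypothetical. It uses SHR rather than ROR because the data does not need to be preserved; ROR with WC would pick up the same bit.

    ```spin
    DAT
                  org     0
    transpose     movd    :store, #words          ' point the store at words[0]
                  mov     wcount, #8              ' 8 output words, one per bit position
    :nextword     movd    :shift, #bytes+15       ' start at byte15 so that byte0 lands in bit 0
                  mov     bcount, #16             ' 16 bytes feed each word
    :shift        shr     0-0, #1 wc              ' C := bit 0 of this byte; the byte moves down one bit
                  rcl     acc, #1                 ' shift C into bit 0 of the accumulator
                  sub     :shift, d_one           ' self-modify: step to the previous byte
                  djnz    bcount, #:shift
                  and     acc, word_mask          ' drop bits left over from the previous word
    :store        mov     0-0, acc                ' words[n] := acc
                  add     :store, d_one           ' self-modify: step to the next output word
                  djnz    wcount, #:nextword
    transpose_ret ret                             ' intended to be entered with: call #transpose

    d_one         long    1 << 9                  ' a 1 in the destination field
    word_mask     long    $FFFF
    acc           long    0
    bytes         res     16                      ' input: one byte per long (assumption)
    words         res     8                       ' output: one 16-bit word per long (assumption)
    wcount        res     1
    bcount        res     1
    ```

    In this looped form the inner loop is four instructions, so roughly 16 clocks per bit and about 256 clocks per word. Unrolling it into 16 SHR/RCL pairs would remove the SUB and DJNZ and bring that down to about 128 clocks per word.
    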
  • ags Posts: 386
    edited 2011-03-13 10:16
    lonesock wrote: »
    4ms is plenty of time

    I wasn't clear in my response. I have 4 microseconds to perform an entire operation, of which the "parallel-to-serial" step is just one small piece. I haven't done the rest of the code yet, so I can't estimate very closely, but just getting 8 longs from hub RAM to cog RAM is (roughly) 128 clocks, or 1.6 microseconds of the budget. Time flies when you're having fun...
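
    The arithmetic behind that estimate, as a rough sketch. It assumes each of the 8 RDLONG instructions lands on consecutive hub access windows, 16 clocks apart, and that ordinary instructions take 4 clocks each:

    ```latex
    t_{\text{hub}} = \frac{8 \times 16\ \text{clocks}}{80\ \text{MHz}} = \frac{128}{80 \times 10^{6}}\ \text{s} = 1.6\ \mu\text{s}
    \\
    N_{\text{budget}} = 4\ \mu\text{s} \times 80\ \text{MHz} = 320\ \text{clocks},
    \qquad
    N_{\text{left}} = 320 - 128 = 192\ \text{clocks} \approx 48\ \text{instructions}
    ```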
  • Phil Pilgrim (PhiPi) Posts: 23,514
    edited 2011-03-13 10:22
    128 clocks @80 MHz is 1.6us, not 1.6ms. See? you've got all kinds of time left! :)

    -Phil
  • tonyp12 Posts: 1,951
    edited 2011-03-13 10:32
    This is how I shifted out 24bits from a long (MSB first), without destroying the data.
    But I guess you first want to prepare the data in a specific way.
                  mov       bit_test,#1                     'reset bit mask
                  shl       bit_test,#23                    'bit 23 is the MSB of the 24 bits to shift out
    
    loop2         test      serial,bit_test wz              'test serial data bitwise
                  muxnz     outa,#SER                       'result is in z; use it to set the Serial pin
                  or        outa,#CLK                       'turn pin on  (clock)
                  andn      outa,#CLK                       'turn pin off (clock)
                  sar       bit_test,#1 wz                  'shift right
            if_nz jmp       #loop2                          'if not zero, jmp
    


    But if you edit the above code to something like this, and store each parallel byte in its own long,
    you would not have to pre-arrange the data first.
    With a few changes you could make it store the data in new serial words instead.
                  
              
                 mov    bit_test,#1              'reset bit mask (LSB first)
                 mov    counter1,#8              'do 8 bits
    loop1        movs   loop2,#mybyte            'reset to mybyte, overwriting the 0-0 placeholder
                 mov    bytecnt,#16              'we have 16 parallel bytes
    
    loop2        test   bit_test,0-0 wz          'test the byte bitwise
                 muxnz  outa,#SER                'result is in z; use it to set the Serial pin
                 or     outa,#CLK                'turn pin on  (clock)
                 andn   outa,#CLK                'turn pin off (clock)
                 add    loop2,#1                 'point at the next byte by self-modifying the code
                 djnz   bytecnt,#loop2           'we have 16 bytes to check, so jmp
                 shl    bit_test,#1              'shift left
                 djnz   counter1, #loop1         '8 bits to check, if not zero jmp
    
    mybyte res 16
    bit_test res 1
    counter1 res 1
    bytecnt res 1
    
    not tested.
  • ags Posts: 386
    edited 2011-03-14 07:26
    Phil Pilgrim (PhiPi) wrote: »
    128 clocks @80 MHz is 1.6us, not 1.6ms. See? you've got all kinds of time left! :)

    -Phil

    Ouch, how embarrassing. I need to find the "mu" character somewhere. I have 4 microseconds, not milliseconds.
  • Phil Pilgrim (PhiPi) Posts: 23,514
    edited 2011-03-14 07:32
    You mentioned that the rearranged data gets output to a pin. Is there any reason that it needs to be saved to memory first, or could it be output during the repacking process? Other than writing it to a pin, one bit at a time, are there other protocol considerations, such as start and stop bits, timing, separate clock, that have to be considered?

    -Phil
  • Mike G Posts: 2,702
    edited 2011-03-14 07:43
    I would logically AND the source byte with a mask (%0000_0001) using the wz flag, then shift the z flag into a register. When you have all 16 bits, shift the word out. Then shift the mask one to the left and do the whole thing over again. This is not standard serial, right?
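
    A minimal sketch of the mask-and-test approach described above, under stated assumptions. PASM has no instruction that shifts Z into a register directly, so this version uses MUXNZ to copy the tested bit into a walking position in the accumulator. The labels (`gather`, `mask`, `outbit`, `acc`) are hypothetical, the bytes are assumed to sit one per long in cog RAM, and the caller is assumed to set `mask` to 1 << n before building word n.

    ```spin
    DAT
                  org     0
    gather        mov     acc, #0                 ' start with an empty word
                  mov     outbit, #1              ' fill word bit 0 first
                  movs    :test, #bytes           ' start at byte0
                  mov     bcount, #16             ' 16 bytes feed one word
    :test         test    mask, 0-0 wz            ' Z is clear when the selected bit of this byte is 1
                  muxnz   acc, outbit             ' copy that bit into the current word position
                  shl     outbit, #1              ' next word bit
                  add     :test, #1               ' self-modify: next byte (source field)
                  djnz    bcount, #:test
    gather_ret    ret                             ' intended to be entered with: call #gather

    mask          long    1                       ' caller sets this to 1 << n for word n (assumption)
    acc           long    0                       ' result: word n in the low 16 bits
    bytes         res     16                      ' input: one byte per long (assumption)
    outbit        res     1
    bcount        res     1
    ```

    This leaves the source bytes untouched, but the loop is five instructions per bit, so about 20 clocks per bit and roughly 320 clocks per word at 4 clocks per instruction. That is the whole 4-microsecond budget at 80 MHz, so this form would need unrolling or a cheaper bit transfer to fit.
    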
  • ags Posts: 386
    edited 2011-03-14 20:04
    Phil Pilgrim (PhiPi) wrote: »
    You mentioned that the rearranged data gets output to a pin. Is there any reason that it needs to be saved to memory first, or could it be output during the repacking process? Other than writing it to a pin, one bit at a time, are there other protocol considerations, such as start and stop bits, timing, separate clock, that have to be considered?

    -Phil

    Phil:

    All good questions, right on target. I've tried to simplify the question so as not to add too much clutter to the forum.

    I need to perform bit-banging to encode the simple 1 and 0 values into the proper self-timed sequences the receivers require (as well as adding start/stop sequences to each instruction packet (26 bits)).

    I also need to use shift registers to expand the number of available output pins (drivers) since I have to use the WIZnet module in Indirect Addressing mode to get the input data rate required.
  • ags Posts: 386
    edited 2011-03-14 20:14
    Mike G wrote: »
    I would logically AND the source byte with a mask (%0000_0001) using the wz flag, then shift the z flag into a register. When you have all 16 bits, shift the word out. Then shift the mask one to the left and do the whole thing over again. This is not standard serial, right?

    OK, that would work. Not sure about the timing until I code it and count clocks.

    What I'm asking/hoping is based on my experience in reading really good PASM code. There seem to be folks that have a combination of experience & talent that can see a problem and create just the tightest, fastest code. When I see it, I think "Oh! of course. That's the way to do it". And I can then tailor it to my needs. I used the MUXC instruction in my prototype, but there were address increments and loops and other time wasters that I presume a very skilled PASM coder could avoid with a different design/patterns/instructions. I've seen really great code that uses instructions in combination with other instructions in combination with side-effects in combination with conditional jumps that takes much less time than my simple, unskilled brute-force method.

    I hope that helps clarify what I'm looking for.

    Thanks.
  • tonyp12 Posts: 1,951
    edited 2011-03-14 22:08
    Unrolling the inner loop and starting a counter to generate the clock
    is probably the fastest way.


    ' start a counter to output PWM on the clk pin, matched to the timing of the code below
    nop
    test bit_test,mybyte wz       'test the byte bitwise
    muxnz outa,#SER               'result is in z; use it to set the Serial pin
    test bit_test,mybyte+1 wz     'test the next byte bitwise
    muxnz outa,#SER               'result is in z; use it to set the Serial pin
    test bit_test,mybyte+2 wz     'test the next byte bitwise
    muxnz outa,#SER               'result is in z; use it to set the Serial pin
    test bit_test,mybyte+3 wz     'test the next byte bitwise
    muxnz outa,#SER               'result is in z; use it to set the Serial pin
    ...
    ... (and so on until all 16 bytes have been done; an outer loop of 8 rounds to change the bit is not too bad)
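
    One way the counter mentioned above could be set up is sketched below. It uses NCO mode rather than a true PWM mode, which is a substitution made here for simplicity: each TEST/MUXNZ pair takes 8 clocks, so the counter is configured to give the clock pin an 8-clock period. `CLK_PIN` and the labels are hypothetical, and the phase of the clock relative to the data edges is not worked out here; it would need to be aligned (for example by presetting PHSA or adjusting the NOP padding) and checked on real hardware.

    ```spin
    CON
      CLK_PIN = 1                                 ' hypothetical pin number for the clock output

    DAT
                  org     0
    clk_start     mov     frqa, frq_div8          ' PHSA gains 2^32/8 per clock, so bit 31 has an 8-clock period
                  mov     phsa, #0                ' start from a known phase
                  movs    ctra, #CLK_PIN          ' APIN := clock pin
                  movi    ctra, #%0_00100_000     ' NCO single-ended mode: APIN follows PHSA[31]
                  or      dira, clk_mask          ' make the clock pin an output
    clk_start_ret ret                             ' intended to be entered with: call #clk_start

    frq_div8      long    $2000_0000              ' 2^32 / 8
    clk_mask      long    1 << CLK_PIN
    ```

    Writing 0 to CTRA after the last bit stops the clock. With the clock handled by the counter, the unrolled code spends 8 clocks per bit, or about 128 clocks for a 16-bit serial word.
    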