Better way to tranpose an array?
in ASM, I need to convert an array of 16 26-bit values (in consecutive longs, bits 25:0) to an array of 26 16-bit values (in consecutive longs bits 15:0).
To explain it another way, the first long of the output array would consist of the 16 bit0 bits of the input array. The 2nd output long would be the 16 bit1 bits, etc.
I can't really come up with an elegant way to do this, it just seems like it will have to be done bit by bit by bit. Just wondering if someone has a more clever solution.
To explain it another way, the first long of the output array would consist of the 16 bit0 bits of the input array. The 2nd output long would be the 16 bit1 bits, etc.
I can't really come up with an elegant way to do this, it just seems like it will have to be done bit by bit by bit. Just wondering if someone has a more clever solution.

Comments
shr tableA0,#1 wc rcl tableB,#1 shr tableA1,#1 wc rcl tableB,#1 ' and so on through tableA15 rev tableB,#16 ' correct bit orderThis would handle 16 longs and produce a 16 bit result. You'd put this in a loop that would copy tableB to successive entries in your 26 long table.This is what I have now, not yet tested, but it looks like it's going to take more time than I have. I think I'd be better off eliminating the inner loop and just coding that inline to speed things up per your suggestion. (Next post shows it re-coded in that manner)
'now transpose the input array (16 elements of 26 bits in bits 31:6) to the oputput array (26 elements of 16 bits in bits 15:0) 'inner loop = 20 clocks * 26 passes = 520 clocks outer loop = 28 clocks = 548 clocks * 16 elements = '8786 clocks = prox 110us??? '*16 words = prox 5000 clocks = 65us 'inner loop copies all bits from the current input element to the next bit of every output element 'outer loop steps through all 16 input elements transpose mov iindex,#inwordcnt 'outer loop index, count of input elements mov ipntr,#iarray 'set pointer to 1st element of input array movd mod1,ipntr 'initlz MOD1 instruction to point to input [0] mod1loop mov oindex,#outwordcnt 'inner loop index, count of output elements mov opntr,#oarray 'pointer to 1st element of output array movd mod2,opntr 'modify MOD2 instruction to point to output [0] mod1 shl iarray,#1 WC 'shift current input word left 1, get rslt in C mod2 rol oarray,#1 WC 'put this bit in the current output word add opntr,#1 'modify mod2 instruction to point to next output word movd mod2,opntr djnz oindex,#mod1 'loop add ipntr,#1 'modify mod1 instruction to point to next input word movd mod1,ipntr djnz iindex,#mod1loop temp RES 1 temp2 RES 1 temp3 RES 1 temp4 RES 1 iindex RES 1 'outer loop index for transpose oindex RES 1 'inner loop index for transpose ipntr RES 1 'pointer into input array for transpose operation opntr RES 0 'pointer into output array for transpose operation oarray RES outwordcnt 'output array that will be copied to hubram iarray RES inwordcnt 'input array copied from hubram'new transpose eliminates inner loop in favor of inline code for speed transpose2 mov oindex,#outwordcnt 'loop index = number of output elements mov opntr,#oarray 'pointer to 1st output array element 'copy the next bit from every input element to a temp :loop shl iarray+0,#1 WC 'loop = 144 clocks * 26 = 3744 clocks + 8 = 3752 clocks = 47us rol otemp,#1 WC shl iarray+1,#1 WC rol otemp,#1 WC shl iarray+2,#1 WC rol otemp,#1 WC shl iarray+3,#1 WC rol otemp,#1 WC shl iarray+4,#1 WC rol otemp,#1 WC shl iarray+5,#1 WC rol otemp,#1 WC shl iarray+6,#1 WC rol otemp,#1 WC shl iarray+7,#1 WC rol otemp,#1 WC shl iarray+8,#1 WC rol otemp,#1 WC shl iarray+9,#1 WC rol otemp,#1 WC shl iarray+10,#1 WC rol otemp,#1 WC shl iarray+11,#1 WC rol otemp,#1 WC shl iarray+12,#1 WC rol otemp,#1 WC shl iarray+13,#1 WC rol otemp,#1 WC shl iarray+14,#1 WC rol otemp,#1 WC shl iarray+15,#1 WC rol otemp,#1 WC movd mod1,opntr 'modify the instruction at mod1 to mov the temp to the current output array element add opntr,#1 'then incr the output array pointer mod1 mov temp,otemp 'save this element djnz oindex,#:loop 'loop for the rest, loop executes 26 times temp RES 1 temp2 RES 1 temp3 RES 1 temp4 RES 1 otemp RES 1 'where we build the output long when transposing the array iindex RES 1 'outer loop index for transpose oindex RES 1 'inner loop index for transpose ipntr RES 1 'pointer into input array for transpose operation opntr RES 0 'pointer into output array for transpose operation oarray RES outwordcnt 'output array that will be copied to hubram iarray RES inwordcnt 'input array copied from hubram