help with PASM loop rollup
Here is a challenge - any suggestions on how to roll up the following code to make it smaller?

ShiftRows       MOV     in, istate
                CALL    #SubByte
                MOV     ostate, SBvar
                CALL    #SubByte0
                SHL     SBvar, #8
                MOV     ostate+3, SBvar
                CALL    #SubByte0
                SHL     SBvar, #16
                MOV     ostate+2, SBvar
                CALL    #SubByte0
                SHL     SBvar, #24
                MOV     ostate+1, SBvar

                MOV     in, istate+1
                CALL    #SubByte
                OR      ostate+1, SBvar
                CALL    #SubByte0
                SHL     SBvar, #8
                OR      ostate, SBvar
                CALL    #SubByte0
                SHL     SBvar, #16
                OR      ostate+3, SBvar
                CALL    #SubByte0
                SHL     SBvar, #24
                OR      ostate+2, SBvar

                MOV     in, istate+2
                CALL    #SubByte
                OR      ostate+2, SBvar
                CALL    #SubByte0
                SHL     SBvar, #8
                OR      ostate+1, SBvar
                CALL    #SubByte0
                SHL     SBvar, #16
                OR      ostate, SBvar
                CALL    #SubByte0
                SHL     SBvar, #24
                OR      ostate+3, SBvar

                MOV     in, istate+3
                CALL    #SubByte
                OR      ostate+3, SBvar
                CALL    #SubByte0
                SHL     SBvar, #8
                OR      ostate+2, SBvar
                CALL    #SubByte0
                SHL     SBvar, #16
                OR      ostate+1, SBvar
                CALL    #SubByte0
                SHL     SBvar, #24
                OR      ostate, SBvar

<other stuff>

SubByte0        SHR     in, #8
SubByte         MOV     SBvar, in               ' SBvar = Sboxptr[in]
                AND     SBvar, #$FF
                ADD     SBvar, SBoxptr
                RDBYTE  SBvar, SBvar
SubByte0_ret
SubByte_ret     RET

Obviously there are many pieces not shown here. The important criterion is to make ShiftRows require the fewest longs overall (including any temporary storage and initialization; note that istate does not need to be preserved). Speed of execution is not important. Normally this could be done in a loop, but because the ostate indexes wrap around, I don't think that's possible.
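For checking any rolled-up version against the unrolled listing, a small reference model may help. The sketch below is not PASM; it is a Python model that assumes istate and ostate are four 32-bit longs each and that the S-box is a 256-entry table (the function name and the identity S-box in the usage note are made up for illustration). It reproduces the destinations and shifts of the listing and shows that the wrapping ostate index is just (row - k) masked to two bits.

```python
def shift_rows_sub_bytes(istate, sbox):
    """Model of the unrolled ShiftRows listing: four longs in, four longs out."""
    ostate = [0, 0, 0, 0]
    for row in range(4):                  # MOV in, istate+row
        word = istate[row]
        for k in range(4):                # SubByte for k = 0, SubByte0 for k = 1..3
            byte = sbox[(word >> (8 * k)) & 0xFF]
            dest = (row - k) & 3          # wrapping ostate index
            ostate[dest] |= byte << (8 * k)   # SHL SBvar, #8*k then MOV/OR ostate+dest
    return ostate
```

With an identity S-box, shift_rows_sub_bytes([0x03020100, 0x13121110, 0x23222120, 0x33323130], list(range(256))) returns [0x33221100, 0x03322110, 0x13023120, 0x23120130], which makes the byte movement easy to read off.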

Comments
CON
  insn_or = %011010_001

DAT             org     0

entry           mov     SubByteTarget, move
                movs    loop, #istate
                mov     par, #4

loop            mov     indx, istate
                call    #SubByte
                call    #SubByte0
                call    #SubByte0
                call    #SubByte0

                movi    SubByteTarget, #insn_or         ' mov -> or
                add     dstn, #1                        ' next row (%%)
                add     loop, #1                        ' scan istate[0..3]
                djnz    par, #loop                      ' all 4 rows

'<other stuff>

SubByte0        ror     indx, #8
SubByte         mov     SBvar, indx                     ' SBvar = Sboxptr[indx]
                and     SBvar, #$FF
                add     SBvar, SBoxptr
                rdbyte  SBvar, SBvar

' dstn starts as ostate +1
'
' cycle 1st: ostate +0 +3 +2 +1      increment +2 (%%)
'       2nd: ostate +1 +0 +3 +2                +3
'       3rd: ostate +2 +1 +0 +3                +4
'       4th: ostate +3 +2 +1 +0      ostate +1 (ready for next call)

                sub     dstn, #1
                cmps    dstn, #ostate wc
        if_c    add     dstn, #4
                movd    SubByteTarget, dstn

' if e.g. outb is unused the following 3 (active) lines could be written as
'
'SubByteShift   shl     SBvar, outb                     ' 0 -> 8 -> 16 -> 24 -> 0
'               add     outb, #8

SubByteShift    shl     SBvar, #0                       ' 0 -> 8 -> 16 -> 24 -> 0
                add     SubByteShift, #8
                andn    SubByteShift, #%1111_00_111

SubByteTarget   mov     0-0, SBvar
SubByte0_ret
SubByte_ret     ret

' initialised data and/or presets

move            mov     ostate, SBvar
dstn            long    ostate + 1

' uninitialised data and/or temporaries

SBoxptr         res     1
SBvar           res     1
indx            res     1

ostate          res     4
istate          res     4

                fit

Good catch. I meant to CALL #SubByte0 to preshift the input so the second byte gets done first and the LSByte done last. And I completely forgot to include the ROL after the CALL to shift the byte into position.
Thanks to "eagle eye" kuroneko for spotting that the output of my revised code still didn't work correctly. I have now reversed the order in which ostate is updated so the bytes are shifted left rather than right. (Note: the output ends up in istate, which is okay.) Now I need to extend the courtesy and look at (and understand) his code.
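As an aid to reading kuroneko's listing, the destination and shift bookkeeping can be traced outside the Propeller. The following is only a Python sketch of my reading of that code, with dstn expressed as an offset from ostate and the shift standing in for the source field of SubByteShift; the function name is invented for illustration. It suggests that the sub/cmps/add sequence yields the same wrapping destinations as the unrolled version, and that dstn and the shift are back at their presets after 16 calls.

```python
def trace_dstn_and_shift():
    """Trace (ostate offset, shift) for each SubByte call in the rolled-up loop."""
    dstn = 1                  # preset: dstn long ostate + 1 (as an offset)
    shift = 0                 # preset: shl SBvar, #0
    trace = []
    for _row in range(4):     # djnz par, #loop
        for _call in range(4):            # SubByte, then three SubByte0 calls
            dstn -= 1                     # sub dstn, #1
            if dstn < 0:                  # cmps dstn, #ostate wc
                dstn += 4                 # if_c add dstn, #4
            trace.append((dstn, shift))   # store to ostate+dstn, shifted by shift
            shift = (shift + 8) & 0b11000 # add #8, andn clears the overflow: 24 -> 0
        dstn += 1                         # add dstn, #1 (next row)
    return trace, dstn, shift
```

The first four entries of the trace are (0, 0), (3, 8), (2, 16), (1, 24), matching the first row of the unrolled code.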