help with PASM loop rollup — Parallax Forums

ericball Posts: 774
edited 2010-08-27 07:59 in Propeller 1
Here is a challenge - any suggestions on how to roll up the following code to make it smaller?
ShiftRows
		MOV	in, istate
		CALL	#SubByte
		MOV	ostate, SBvar
		CALL	#SubByte0
		SHL	SBvar, #8
		MOV	ostate+3, SBvar
		CALL	#SubByte0
		SHL	SBvar, #16
		MOV	ostate+2, SBvar
		CALL	#SubByte0
		SHL	SBvar, #24
		MOV	ostate+1, SBvar

		MOV	in, istate+1
		CALL	#SubByte
		OR	ostate+1, SBvar
		CALL	#SubByte0
		SHL	SBvar, #8
		OR	ostate, SBvar
		CALL	#SubByte0
		SHL	SBvar, #16
		OR	ostate+3, SBvar
		CALL	#SubByte0
		SHL	SBvar, #24
		OR	ostate+2, SBvar

		MOV	in, istate+2
		CALL	#SubByte
		OR	ostate+2, SBvar
		CALL	#SubByte0
		SHL	SBvar, #8
		OR	ostate+1, SBvar
		CALL	#SubByte0
		SHL	SBvar, #16
		OR	ostate, SBvar
		CALL	#SubByte0
		SHL	SBvar, #24
		OR	ostate+3, SBvar

		MOV	in, istate+3
		CALL	#SubByte
		OR	ostate+3, SBvar
		CALL	#SubByte0
		SHL	SBvar, #8
		OR	ostate+2, SBvar
		CALL	#SubByte0
		SHL	SBvar, #16
		OR	ostate+1, SBvar
		CALL	#SubByte0
		SHL	SBvar, #24
		OR	ostate, SBvar
<other stuff>
SubByte0	SHR	in, #8
SubByte	MOV	SBvar, in		' SBvar = Sboxptr[in]
		AND	SBvar, #$FF
		ADD	SBvar, SBoxptr
		RDBYTE	SBvar, SBvar
SubByte0_ret
SubByte_ret	RET


Obviously there are many pieces not shown here. The important criterion is that ShiftRows take the fewest longs overall, including any temporary storage and initialization (note: istate does not need to be preserved); speed of execution is not important. Normally this could be done in a loop, but because the ostate indexes wrap around, I don't think that's possible.
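
The wrap-around in the listing above is regular: byte k of istate[r], counting from the least significant byte, lands in byte k of ostate[(r - k) mod 4]. A minimal Python model of the unrolled routine may help when checking a candidate roll-up on a PC. It is only a sketch, not PASM; the `sbox` argument and the identity table in the usage lines are illustrative assumptions standing in for the table at SBoxptr.

```python
def shift_rows(istate, sbox):
    """Model of the unrolled ShiftRows: substitute each byte, then scatter it."""
    ostate = [0, 0, 0, 0]
    for r in range(4):                # one pass per istate long
        for k in range(4):            # byte k, least significant first
            byte = (istate[r] >> (8 * k)) & 0xFF
            # Same shift as the PASM (8 * k), destination index wraps mod 4.
            ostate[(r - k) % 4] |= sbox[byte] << (8 * k)
    return ostate


identity = list(range(256))           # placeholder S-box (assumption)
state = [0x03020100, 0x13121110, 0x23222120, 0x33323130]
result = shift_rows(state, identity)
print([hex(word) for word in result])
```

With an identity table the substitution drops out, so the printed longs show only where each input byte ends up.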

Comments

  • ericball Posts: 774
    edited 2010-08-25 17:09
    After some thinking around corners, I realized that I could squeeze the routine into two loops. By putting istate after ostate and doing the LSbyte last, I would overlap the output of the later words with the output of the initial words. It's about half the size of the above routine:
    ShiftRows
    		MOV	ostate+1, #0
    		MOV	ostate+2, #0
    		MOV	ostate+3, #0
    		MOVS	SR1, #istate
    		MOVS	SR3, #8
    		MOVD	SR4, #ostate+3
    		MOVD	SR5, #istate
    
    		MOV	SRloop1, #4
    SR1		MOV	in, istate+0
    		ADD	SR1, #1
    		MOV	SRloop2, #4
    SR2		CALL	#SubByte0
    SR3		ROL     SBVar, #8+0
    		ADD	SR3, #8
    SR4		OR	ostate+3, SBvar
    		SUB	SR4, D1
    		DJNZ	SRloop2, #SR2
    SR5		MOV	istate+0, SBvar
    		ADD	SR5, D1
    		ADD	SR4, D3			' reset & advance 1
    		DJNZ	SRloop1, #SR1
    		OR	istate+1, ostate+1
    		OR	istate+2, ostate+2
    		OR	istate+3, ostate+3
    
    
    <other stuff>
    SubByte0	ROR	in, #8
    SubByte	MOV	SBvar, in		' SBvar = Sboxptr[in]
    		AND	SBvar, #$FF
    		ADD	SBvar, SBoxptr
    		RDBYTE	SBvar, SBvar
    SubByte0_ret
    SubByte_ret	RET
    
    D1		LONG	1<<9
    D3		LONG	1<<9
    ostate		RES	4
    istate		RES	4
    SRloop1		RES	1
    SRloop2		RES	1
    
  • kuroneko Posts: 3,623
    edited 2010-08-25 18:59
    My (minor) headache version requires re-thinking.
  • kuroneko Posts: 3,623
    edited 2010-08-25 21:39
    updated working version (still with headache)
    CON
      insn_or = %011010_001
      
    DAT             org     0
    
    entry           mov     SubByteTarget, move
                    movs    loop, #istate
                    
                    mov     par, #4
    loop            mov     indx, istate
                    
                    call    #SubByte
                    call    #SubByte0
                    call    #SubByte0
                    call    #SubByte0
    
                    movi    SubByteTarget, #insn_or ' mov -> or
                    add     dstn, #1                ' next row              (%%)
                    add     loop, #1                ' scan istate[0..3]
                    
                    djnz    par, #loop              ' all 4 rows
    
    '<other stuff>
    
    SubByte0        ror     indx, #8
    SubByte         mov     SBvar, indx             ' SBvar = Sboxptr[indx]
                    and     SBvar, #$FF
                    add     SBvar, SBoxptr
                    rdbyte  SBvar, SBvar
    
    ' dstn starts as ostate +1
    '
    ' cycle 1st:    ostate +0 +3 +2 +1   increment +2                       (%%)
    '       2nd:    ostate +1 +0 +3 +2             +3
    '       3rd:    ostate +2 +1 +0 +3             +4
    '       4th:    ostate +3 +2 +1 +0      ostate +1 (ready for next call)
    
                    sub     dstn, #1
                    cmps    dstn, #ostate wc
            if_c    add     dstn, #4
                    movd    SubByteTarget, dstn
    
    ' if e.g. outb is unused the following 3 (active) lines could be written as
    '
    'SubByteShift   shl     SBvar, outb             ' 0 -> 8 -> 16 -> 24 -> 0
    '               add     outb, #8
    
    SubByteShift    shl     SBvar, #0               ' 0 -> 8 -> 16 -> 24 -> 0
                    add     SubByteShift, #8
                    andn    SubByteShift, #%1111_00_111
    
    SubByteTarget   mov     0-0, SBvar
    SubByte0_ret
    SubByte_ret     ret
    
    ' initialised data and/or presets
    
    move            mov     ostate, SBvar
    dstn            long    ostate + 1
    
    ' uninitialised data and/or temporaries
    
    SBoxptr         res     1
    SBvar           res     1
    
    indx            res     1
    ostate          res     4
    istate          res     4
    
                    fit
    
  • kuroneko Posts: 3,623
    edited 2010-08-25 22:03
    ericball wrote: »
    After some thinking around corners, I realized that I could squeeze the routine into two loops. By putting istate after ostate and doing the LSbyte last, I would overlap the output of the later words with the output of the initial words.
    Where did you hide all the shifting (#8/#16/#24) in this version?
  • ericball Posts: 774
    edited 2010-08-26 05:09
    kuroneko wrote: »
    Where did you hide all the shifting (#8/#16/#24) in this version?

    Good catch. I meant to CALL #SubByte0 to preshift the input so the second byte gets done first and the LSByte done last. And I completely forgot to include the ROL after the CALL to shift the byte into position.
  • ericball Posts: 774
    edited 2010-08-27 07:59
    kuroneko wrote:
    I just modified my example slightly (I forgot a blasted wc). It now works the same way your original code worked. However, I can't really see your second example doing the same, especially as you don't seem to handle wrapping at all (the initial row is supposed to be stored in ostate+0, +3, +2, +1).

    Thanks to "eagle eye" kuroneko for spotting that the output of my revised code still didn't work correctly. I have now reversed the order in which ostate is updated so the bytes are shifted left rather than right. (Note: the output ends up in istate, which is okay.) Now I need to extend the same courtesy and look at (and understand) his code.
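
For anyone else working through kuroneko's version, its two pieces of moving state (the dstn pointer that steps down and wraps, and the shift count kept in the source field of SubByteShift) can be modelled on a PC and compared with the plain index rule from the original unrolled routine. This is a rough Python sketch, not a cycle-accurate emulation: the function names and the test S-box are made up for illustration, and ostate is modelled as starting at zero so that the first-row `mov` and the later `or` behave alike.

```python
MASK32 = 0xFFFFFFFF


def shift_rows_direct(istate, sbox):
    """Index rule of the unrolled routine: byte k of row r -> ostate[(r - k) % 4]."""
    ostate = [0, 0, 0, 0]
    for r in range(4):
        for k in range(4):
            byte = (istate[r] >> (8 * k)) & 0xFF
            ostate[(r - k) % 4] |= sbox[byte] << (8 * k)
    return ostate


def shift_rows_rolled(istate, sbox):
    """Model of the rolled-up loop: wrapping pointer plus cycling shift count."""
    ostate = [0, 0, 0, 0]
    dstn = 1                          # offset from ostate; preset is ostate + 1
    shift = 0                         # source field of SubByteShift
    for row in range(4):
        indx = istate[row]
        for call in range(4):
            if call:                  # SubByte0 entry: ror indx, #8
                indx = ((indx >> 8) | (indx << 24)) & MASK32
            sbvar = sbox[indx & 0xFF]
            dstn -= 1                 # sub dstn, #1
            if dstn < 0:              # cmps dstn, #ostate wc / if_c add dstn, #4
                dstn += 4
            sbvar = (sbvar << shift) & MASK32
            shift = (shift + 8) & 0b11000   # add #8, then andn #%1111_00_111
            ostate[dstn] |= sbvar     # mov on the first row, or afterwards
        dstn += 1                     # add dstn, #1: start of the next row
    return ostate, dstn, shift


test_sbox = [(7 * i + 3) & 0xFF for i in range(256)]   # arbitrary table (assumption)
state = [0x03020100, 0x13121110, 0x23222120, 0x33323130]
rolled, final_dstn, final_shift = shift_rows_rolled(state, test_sbox)
print(rolled == shift_rows_direct(state, test_sbox), final_dstn, final_shift)
```

The returned pointer and shift count are included so it is easy to see whether both end up back at their presets (ostate + 1 and 0), which is what the "ready for next call" comment in the listing relies on.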