Array of Longs in PASM2 — Parallax Forums


JonnyMac Posts: 9,602
edited 2026-01-12 21:04 in PASM2/Spin2 (P2)

While making a suggestion in another post about putting an array into inline PASM (which works), I started some personal experiments and ran into a snag. In short, I'd like to access an array in PASM2 with a variable index, but code like I used to use in the P1 isn't working. Please -- gently -- show me where I've gone wrong. Thanks.

pub method(p_in, p_out)

  org

                        setq #(32-1)
                        rdlong array, p_in

                        mov       i, #0

loop                    mov       pntr, #array
                        add       pntr, i
                        setd      update, pntr
                        nop             
update                  shr       0-0, #1                       ' NOT WORKING       
                        incmod    i, #31                wc
        if_c            jmp       #loop

                        setq #(32-1)   
                        wrlong array, p_out

done                    ret                     

array                   res       32
pntr                    res       1
i                       res       1

  end

Comments

  • TonyB_ Posts: 2,265
    edited 2026-01-12 22:11

    Add another nop or
    altd is an alternative that doesn't require nop

    loop                    altd      i, #array
    update                  shr       0-0, #1        
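Why a second NOP (or ALTD) fixes the original loop can be pictured with a toy model. The Python sketch below is not the real P2 pipeline. It simply assumes, consistent with the two-NOP fix reported in this thread, that the two instructions after the executing one are already fetched, so a SETD only reaches an instruction at least three addresses later. The `run` helper and its `(op, d, s)` tuples are hypothetical.

```python
from collections import deque

def run(program, regs, prefetch=2):
    """Toy model: `prefetch` instructions after the executing one are already latched.

    Each instruction is (op, d, s). SETD writes s into the D field of the
    instruction at cog address d; SHR shifts register d right by s; NOP does nothing.
    """
    ram = [list(ins) for ins in program]               # cog RAM, can be rewritten
    pipe = deque(tuple(ins) for ins in ram[:prefetch + 1])
    next_fetch = len(pipe)
    while pipe:
        op, d, s = pipe.popleft()                      # this latched copy is what executes
        if op == "setd":
            ram[d][1] = s                              # changes RAM, not the latched copies
        elif op == "shr":
            regs[d] >>= s
        if next_fetch < len(ram):                      # in this model, fetch follows execute
            pipe.append(tuple(ram[next_fetch]))
            next_fetch += 1
    return regs

# Address 0 plays the role of "0-0"; address 10 is the array element we want to shift.
one_nop = [("setd", 2, 10), ("nop", 0, 0), ("shr", 0, 1)]
two_nops = [("setd", 3, 10), ("nop", 0, 0), ("nop", 0, 0), ("shr", 0, 1)]

print(run(one_nop, {0: 100, 10: 64}))    # {0: 50, 10: 64}  stale D field, wrong register
print(run(two_nops, {0: 100, 10: 64}))   # {0: 100, 10: 32} D field updated in time
```

ALTD sidesteps the problem because it alters the next instruction inside the pipeline instead of rewriting cog RAM.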
    
  • JonnyMac Posts: 9,602

    That did it. Thank you, Tony.

  • TonyB_ Posts: 2,265
    edited 2026-01-12 22:27

    @JonnyMac said:
    That did it. Thank you, Tony.

    setd is an instruction I've never used but altd and alts I use a lot.

  • Rayman Posts: 15,953

    RES in inline assembly? Surprised that works...

    One thing that trips me up a lot is only adding 1 to long pointers when need to add 4...
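The off-by-four trap comes from the two address spaces: hub RAM is byte-addressed, so consecutive longs are 4 addresses apart, while cog registers are long-addressed, so consecutive longs are 1 apart. This Python sketch (hypothetical helper name, a 32-long buffer assumed) counts how many bytes of a hub buffer a 32-pass loop reads with each step size, treating every read as 4 consecutive bytes.

```python
LONGS = 32
LONG_BYTES = 4

def hub_bytes_touched(base, step, count=LONGS):
    """Byte addresses covered by `count` 4-byte reads starting at base, advancing by step."""
    touched = set()
    for i in range(count):
        addr = base + i * step
        touched.update(range(addr, addr + LONG_BYTES))
    return touched

buffer = set(range(LONGS * LONG_BYTES))         # 128 bytes of hub RAM
good = hub_bytes_touched(0, step=4)             # pointer advanced by 4 per long
bad = hub_bytes_touched(0, step=1)              # pointer advanced by 1 per long
print(len(good & buffer), len(bad & buffer))    # 128 35
```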

  • JonnyMac Posts: 9,602

    @Rayman said:
    RES in inline assembly? Surprised that works...

    I was surprised, too. Remember, we have about $120 (288) instructions available for inline assembly, so we do need to be mindful of that, and that variables defined with RES are going to end up with whatever is in the cog RAM at those locations.

    One thing that trips me up a lot is only adding 1 to long pointers when need to add 4...

    Yeah, that sometimes gets me, too.

  • JonnyMac Posts: 9,602

    @TonyB_ said:
    setd is an instruction I've never used but altd and alts I use a lot.

    I was so excited about fixing with another nop that I jumped right to that. Then I popped back and saw the altd -- much nicer and saves a bit of code. This is the adjustment.

    pub method(p_in, p_out)
    
      org
    
                            setq #(32-1)
                            rdlong array, p_in
    
                            mov       i, #0
    
    loop                    altd      i, #array         
    update                  shl       0-0, #2
                            incmod    i, #31                wc
            if_nc           jmp       #loop
    
                            setq #(32-1)   
                            wrlong array, p_out
    
    done                    ret                     
    
    array                   res       32
    i                       res       1
    
      end
    

    Thanks, again!
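The switch from `if_c` to `if_nc` matters because of how INCMOD sets C: if D equals S, D becomes 0 and C is set; otherwise D is incremented and C is cleared. A small Python model of that rule (function names are hypothetical) counts how many passes each form of the loop makes.

```python
def incmod(d, s):
    """Model of INCMOD D,S WC: returns (new_d, c)."""
    if d == s:
        return 0, 1
    return d + 1, 0

def passes(loop_on_carry):
    """Count loop bodies run for `incmod i,#31 wc` followed by a conditional jump."""
    i, count = 0, 0
    while True:
        count += 1                       # the loop body (one shifted element) runs
        i, c = incmod(i, 31)
        jump = (c == 1) if loop_on_carry else (c == 0)
        if not jump:
            return count

print(passes(loop_on_carry=True))    # 1   (if_c jmp #loop)
print(passes(loop_on_carry=False))   # 32  (if_nc jmp #loop)
```

With `if_c` the loop falls through after the first element; with `if_nc` it keeps going until i wraps from 31 to 0, which covers all 32 longs.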

  • TonyB_ Posts: 2,265

    @JonnyMac said:

    @TonyB_ said:
    setd is an instruction I've never used but altd and alts I use a lot.

    I was so excited about fixing with another nop that I jumped right to that. Then I popped back and saw the altd -- much nicer and saves a bit of code.

    Jon, I think you are interested in PASM2 tips and the (untested) code below is optimal for speed. Two improvements: altd/alts/altr adds sign-extended S[17:9] to D (which leaves D unchanged for 9-bit #S) and rep repeats an instruction block by jumping back to the start in zero cycles.

    pub method(p_in, p_out)
    
      org
    
                            setq    #(32-1)
                            rdlong  array, p_in
    
                            mov     i, #0
                            rep     #2,#32
                            altd    i, s         
                            shl     0-0, #2
    
                            setq    #(32-1)   
                            wrlong  array, p_out
    
    done                    ret                     
    
    s                       long    1 << 9 + array  'add +1 to i in altd
    array                   res     32
    i                       res     1
    
      end
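Tony's description of ALTD can be written out as a Python model: the next instruction's D field becomes D[8:0] + S[8:0] (kept to 9 bits), and the sign-extended S[17:9] is added to the D register. This is a sketch of the documented behavior, not Parallax code; the array base of $010 is an arbitrary assumption.

```python
def sign_extend_9(v):
    """Interpret a 9-bit field as a signed value."""
    v &= 0x1FF
    return v - 0x200 if v & 0x100 else v

def altd(d_reg, s_val):
    """Model of ALTD D,S: returns (substituted D field, updated D register)."""
    field = (d_reg + s_val) & 0x1FF                   # D[8:0] + S[8:0], 9-bit result
    d_reg = (d_reg + sign_extend_9(s_val >> 9)) & 0xFFFFFFFF
    return field, d_reg

ARRAY = 0x010                  # hypothetical cog address of `array`

# altd i, #array : 9-bit immediate, so S[17:9] = 0 and i is left alone
field, i = altd(3, ARRAY)
print(hex(field), i)           # 0x13 3

# altd i, s  with  s = (1 << 9) + array : i advances by one on every use
s = (1 << 9) + ARRAY
i, fields = 0, []
for _ in range(32):
    field, i = altd(i, s)
    fields.append(field)
print(hex(fields[0]), hex(fields[-1]), i)   # 0x10 0x2f 32
```

That self-incrementing index is what lets the REP block above run without a separate ADD or INCMOD.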
    
  • JonnyMac Posts: 9,602
    edited 2026-01-13 02:19

    That's pretty nifty, Tony. I'll keep both in a recipe book as the first allows random access of array elements.

    I'm still learning how to decode Chip's notes in the PASM instruction list, although Ada's docs do mention the auto-incrementing value. By using a traditional loop and debug I was able to see this working. Based on your comment, I did change the increment value. One thing... I was a little confused that # is not required with array in the line that declares s.

    pub method(p_in, p_out)
    
      org
    
                            setq      #(32-1)
                            rdlong    array, p_in
    
                            mov       i, #0
                            mov       j, #32
    
    loop                    debug(uhex_long(i))   
                            altd      i, s         
                            shl       0-0, #2
                            djnz      j, #loop
    
                            setq      #(32-1)   
                            wrlong    array, p_out
    
    done                    ret                       
    
    s                       long      1 << 9 + array                ' add +1 to i in altd
    
    array                   res       32
    i                       res       1
    j                       res       1
    
      end
    

    Again, thanks!
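On the `#` question: `long` just stores a number, and in this inline code `array` evaluates to its cog register address, so `1 << 9 + array` packs the base address into bits 8:0 and a step of +1 into bits 17:9. The source relies on Spin2's shift operators binding tighter than `+`; Python binds `+` tighter, so the sketch below needs explicit parentheses. The helper names and the address $0A0 for `array` are assumptions.

```python
ARRAY = 0x0A0                      # hypothetical cog register address of `array`

def pack_alt_s(base, step):
    """Build an ALTx S value: base address in bits 8:0, signed step in bits 17:9."""
    return ((step & 0x1FF) << 9) | (base & 0x1FF)

def unpack_alt_s(s):
    """Split an ALTx S value back into (base, signed step)."""
    step = (s >> 9) & 0x1FF
    if step & 0x100:               # sign-extend the 9-bit step
        step -= 0x200
    return s & 0x1FF, step

s = (1 << 9) + ARRAY               # Spin2's `1 << 9 + array`, parenthesized for Python
print(hex(s), unpack_alt_s(s))     # 0x2a0 (160, 1)
print(1 << 9 + ARRAY == s)         # False: Python reads this as 1 << (9 + ARRAY)
```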

  • @Rayman said:
    One thing that trips me up a lot is only adding 1 to long pointers when need to add 4...

    Thank you, me too! I was wondering why the whole buffer wasn't being processed!

  • Kaio Posts: 287

    @JonnyMac said:
    I'm still learning how to decode Chip's notes in the PASM instruction list, although Ada's docs do mention the auto-incrementing value.

    Jon, here's an example with the powerful ALTI instruction using auto-increment for the D field. For simplicity, you need a copy of the instruction in a register, which the ALTI instruction modifies; its D field is then copied into the instruction in the pipeline. The original instruction (after the ALTI) is not changed in RAM, only in the pipeline, just like with the other ALTx instructions.

                             'mov        s, .instr           ' necessary to initialize if code is not used as inline
    
                             rep        @.end, #32
                             alti       s,#%111_000          ' increment D, leave S unchanged 
    .instr                   shl        array, #2
    .end
    ...                                    
    s                        shl        array, #2            ' instruction which is modified by ALTI
    ...
    

    With the ALTI instruction it is possible to increment or decrement D and/or S and/or R field at the same time. E.g. one could also increment the D field and decrement the S field with only one ALTI instruction.

                             mov        _alti, .instr        ' necessary to initialize, otherwise the instruction will continue on next call with last values of D and S field in _alti
                             rep        @.end, #8
                             alti       _alti,#%111_110      ' increment D and decrement S 
    .instr                   mov        array, buf+7         ' copy from buf[7] .. buf[0] to array[0] ... array[7]
    .end
    ...
    _alti                    res        1
    array                    res        8
    buf                      res        8
    

    Unfortunately, the examples in the documentation are incomplete, so it is difficult to see how the ALTI instruction works.
    There are also special modes to control the size of the increment/decrement, which are useful for ring buffers, for example. But those settings exceed the 9-bit value range of ALTI's S field, so S has to be a register, or an AUGS has to be used.
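The two ALTI examples above can be modeled in Python to make the field bookkeeping visible. The model covers only the mode codes used in this thread (%000 no change, %100 substitute, %110 substitute then decrement, %111 substitute then increment) with the default full 9-bit fields; the other codes and the size options just mentioned are deliberately not modeled. The field layout (R in bits 27:19, D in 17:9, S in 8:0) is assumed from evanh's `altireg` packing later in the thread, and the addresses are hypothetical.

```python
# Field name, bit offset of the field in the ALTI register, bit offset of its mode in the code.
FIELDS = (("R", 19, 6), ("D", 9, 3), ("S", 0, 0))

# Mode codes used in this thread only: (substitute?, step applied afterwards).
MODES = {0b000: (False, 0), 0b100: (True, 0), 0b110: (True, -1), 0b111: (True, +1)}

def alti(reg, code):
    """Model of `alti reg, #code` with code = %rrr_ddd_sss.

    Returns (subs, new_reg): subs maps each substituted field to the value the next
    instruction receives (read before the update); new_reg holds the stepped fields.
    """
    subs = {}
    for name, shift, mode_shift in FIELDS:
        mode = (code >> mode_shift) & 0b111
        if mode not in MODES:
            raise NotImplementedError(f"mode %{mode:03b} is not modeled here")
        substitute, step = MODES[mode]
        value = (reg >> shift) & 0x1FF
        if substitute:
            subs[name] = value
        value = (value + step) & 0x1FF              # assumption: wraps within the 9-bit field
        reg = (reg & ~(0x1FF << shift)) | (value << shift)
    return subs, reg

ARRAY, BUF = 0x100, 0x108                           # hypothetical cog addresses

# First example: `alti s, #%111_000` in front of `shl array, #2`
reg1 = (ARRAY << 9) | 2                             # D field = array, S field = #2
for _ in range(32):
    subs, reg1 = alti(reg1, 0b000_111_000)
print(hex(subs["D"]), hex((reg1 >> 9) & 0x1FF))     # 0x11f 0x120

# Second example: `alti _alti, #%111_110` in front of `mov array, buf+7`
reg2 = (ARRAY << 9) | (BUF + 7)
pairs = []
for _ in range(8):
    subs, reg2 = alti(reg2, 0b000_111_110)
    pairs.append((subs["D"], subs["S"]))
print([(hex(d), hex(s)) for d, s in pairs[:2]])     # [('0x100', '0x10f'), ('0x101', '0x10e')]
```

Because the register keeps its stepped values, the second example has to reload `_alti` before each call, as the comment in Kaio's code says.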

  • Here is a PASM circular buffer I recently wrote using ALTS and ALTD in cog RAM.
    The curly-brace comments in the PASM line below each ALTx explain what the ALTx is doing.
    INCMOD is great for automatic rollover of the head and tail indexes.

    NewHead       mov pHeadLo,sAcLoX   'putting new data in buffer at head index
                  mov pHeadHi,sAcHiX   'called when new blocktime
                  'setup new tail trigger for running average
                  altd pHeadIdx,#pTailTime
                  mov 0{pTailTime[pHeadIdx]},pSwClk 
                  altd pHeadIdx,#pTailTime
                  add 0{pTailTime[pHeadIdx]},pAccTC 
                  altd pHeadIdx,#pTailLoNext
                  mov 0{pTailLoNext[pHeadIdx]},sAcLoX 
                  altd pHeadIdx,#pTailHiNext
                  mov 0{pTailHiNext[pHeadIdx]},sAcHiX 
                  incmod pHeadIdx,#3
                  ret
    
    NewTail       alts pTailIdx,#pTailLoNext            'taking tail index data out of buffer
                  mov pTailLo,0{pTailLoNext[pTailIdx]} 'called when pTailTime == pSwClk
                  alts pTailIdx,#pTailHiNext
                  mov pTailHi,0{pTailHiNext[pTailIdx]}
                  incmod pTailIdx,#3
                  ret
    
    DAT
    pTailIdx       long 0
    pHeadIdx       long 0
    pHeadLo       long 0
    pHeadHi       long 0
    pTailLo       long 0
    pTailHi       long 0
    pTailTime     long 0,0,0,0
    pTailLoNext   long 0,0,0,0
    pTailHiNext   long 0,0,0,0
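The same ALTD/ALTS-plus-INCMOD pattern, reduced to a Python model, shows how the head and tail indexes wrap independently over parallel arrays. The class and method names are hypothetical, the four-entry depth matches `incmod ...,#3`, and the time-trigger array is left out for brevity.

```python
def incmod(d, s):
    """Model of INCMOD D,S: wrap to 0 after reaching S."""
    return 0 if d == s else d + 1

class CogRing:
    """Four-entry ring of parallel arrays, indexed like ALTD/ALTS base + index."""
    LAST = 3                                # the #3 in `incmod pHeadIdx,#3`

    def __init__(self):
        self.lo = [0] * 4                   # stands in for pTailLoNext
        self.hi = [0] * 4                   # stands in for pTailHiNext
        self.head = 0
        self.tail = 0

    def new_head(self, lo, hi):
        self.lo[self.head] = lo             # altd pHeadIdx,#pTailLoNext + mov
        self.hi[self.head] = hi             # altd pHeadIdx,#pTailHiNext + mov
        self.head = incmod(self.head, self.LAST)

    def new_tail(self):
        lo, hi = self.lo[self.tail], self.hi[self.tail]   # alts pTailIdx,#... + mov
        self.tail = incmod(self.tail, self.LAST)
        return lo, hi

ring = CogRing()
out = []
for v in (1, 2, 3):
    ring.new_head(v, v * 10)
out.append(ring.new_tail())
for v in (4, 5):                            # the head index wraps from 3 back to 0 here
    ring.new_head(v, v * 10)
out += [ring.new_tail() for _ in range(4)]
print(out)   # [(1, 10), (2, 20), (3, 30), (4, 40), (5, 50)]
```

Like the PASM it models, this sketch does not check whether the head has overrun the tail.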
    
  • Kaio Posts: 287
    edited 2026-01-16 20:06

    @JonnyMac said:
    That's pretty nifty, Tony. I'll keep both in a recipe book as the first allows random access of array elements.

    One thing... I was a little confused that # is not required with array in the line that declares s.

    >
    Usually you don't need the s register as the ALTD instruction is already calculating the correct address for i, it's like array[i].
    See the example from @mwroberts above.

                            altd      i, #array         
    

    After clarification with @TonyB_ a few messages below: there is a second use case for the ALTD/ALTS/ALTR instructions, in which the index register (given as D) is also updated. In that case S has to be a register (for performance reasons), because the value exceeds the 9-bit range.
    The register holds the base address of the array plus a value for the index update. The expression "1 << 9" is a 1 shifted above the 9-bit S value, and it gets added to register i, the D operand of the ALTD instruction in Jon's code above.

                            altd      i, s
    ...                                   
    s                       long      1 << 9 + array                ' add +1 to i in altd
    
  • @Kaio said:
    Jon, here's an example with the powerful ALTI instruction using auto-increment for the D field.
    ...
    With the ALTI instruction it is possible to increment or decrement D and/or S and/or R field at the same time. E.g. one could also increment the D field and decrement the S field with only one ALTI instruction.
    Unfortunately the examples in the documentation are incomplete, Therefore, it is difficult to see how the ALTI instruction is working.
    ...

    Wow, I've learned something new, again. I've never used ALTI, only ALTS and ALTD, because the original silicon doc was very vague about how ALTI actually works. The new Assembler manual is much better and at least explains the fields and options. But it would be even better if we knew not only HOW it works but also WHY. I mean, @cgracey surely had some very specific intentions when he implemented those instructions. An AI is very good at the diligent work of generating and cleaning up the manuals, but it can't guess someone's intentions.

  • TonyB_ Posts: 2,265

    @Kaio said:

    @JonnyMac said:
    That's pretty nifty, Tony. I'll keep both in a recipe book as the first allows random access of array elements.

    One thing... I was a little confused that # is not required with array in the line that declares s.

    >
    You don't need the s register as the ALTD instruction is already calculating the correct address for i, it's like array[i].
    See the example from @mwroberts above.

                            altd      i, #array         
    ...                                   
    ' not necessary for ALTD and ALTS ' s                       long      1 << 9 + array                ' add +1 to i in altd
    

    The s register is needed in this case to increment D and S[17:9] holds the sign-extended increment. D cannot be changed with 9-bit #S. The inferior alternative is to use ##S but that is slower and takes just as many longs. long and word and byte values are numeric so do not require a # prefix.

  • Kaio Posts: 287
    edited 2026-01-16 20:30

    @TonyB_ said:
    The s register is needed in this case to increment D and S[17:9] holds the sign-extended increment. D cannot be changed with 9-bit #S. The inferior alternative is to use ##S but that is slower and takes just as many longs. long and word and byte values are numeric so do not require a # prefix.

    After clarification with @TonyB_ two messages below, I have corrected this message.
    You are thinking in the days of P1. But with ALTD you don't need to know on which bit position the D is located in the instruction. The ALTD instruction will do it for you in the right way.

    You need only provide the base address of your array and the index and ALTD will do the magic change of the D field for you in the pipeline.

    Have you seen the example from @mwroberts above? He is doing the same.

    Added on edit:
    There is a second use case for the ALTD/ALTS/ALTR instructions, in which the index register (given as D) is also updated.

  • Kaio Posts: 287

    @ManAtWork said:
    Wow, I've learned something new, again. I've never used ALTI but only ALTS and ALTD because the original silicon doc was very vague about how ALTI actually works.

    You're welcome, Nicolas. It looks like the ALTI instruction hasn't been used much in the past. I'm glad that I could help clarify some of its usage.

  • TonyB_ Posts: 2,265

    @Kaio said:

    @TonyB_ said:
    The s register is needed in this case to increment D and S[17:9] holds the sign-extended increment. D cannot be changed with 9-bit #S. The inferior alternative is to use ##S but that is slower and takes just as many longs. long and word and byte values are numeric so do not require a # prefix.

    You are thinking in the days of P1. But with ALTD you don't need to know on which bit position the D is located in the instruction. The ALTD instruction will do it for you in the right way.
    You need only provide the base address of your array and the index and ALTD will do the magic change of the D field for you in the pipeline.

    ALTD/ALTS/ALTR do two separate operations. They add a signed value to their own D register, and they replace the next instruction's D or S field (for ALTR, its result register) in the pipeline. This is mentioned in the document and here is the relevant section:

    REGISTER INDIRECTION

    Cog registers can be accessed indirectly most easily by using the ALTS/ALTD/ALTR instructions. These instructions sum their D[8:0] and S/#[8:0] values to compute an address that is directly substituted into the next instruction's S field, D field, or result register address (normally, this is the same as the D field). This all happens within the pipeline and does not affect the actual program code. The idea is that S/# can serve as a register base address and D can be used as an index.

    Additionally, S[17:9] is always sign-extended and added to the D register for index updating. Normally, a nine-bit #address will be used for S, causing S[17:9] to be zero, so that D is unaffected.
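Because S[17:9] is sign-extended, the same mechanism can also step an index downward. In the Python model below (hypothetical helper and addresses), an S value with $1FF in bits 17:9 adds -1 to D on each use, so the loop visits an 8-long array from the last element to the first, while a nine-bit immediate leaves D untouched, as the documentation says.

```python
def altd(d_reg, s_val):
    """Model of ALTD/ALTS index arithmetic: returns (address, updated D register)."""
    address = (d_reg + s_val) & 0x1FF            # D[8:0] + S[8:0], 9-bit result
    step = (s_val >> 9) & 0x1FF
    if step & 0x100:
        step -= 0x200                            # sign-extend S[17:9]
    return address, d_reg + step                 # Python int here; D is 32-bit in hardware

ARRAY = 0x040                                    # hypothetical base of an 8-long array
s_down = (0x1FF << 9) | ARRAY                    # S[17:9] = $1FF, i.e. -1
i, visited = 7, []
for _ in range(8):
    addr, i = altd(i, s_down)
    visited.append(addr - ARRAY)
print(visited, i)                                # [7, 6, 5, 4, 3, 2, 1, 0] -1

print(altd(5, ARRAY))                            # (69, 5): #S with no upper bits leaves D alone
```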

  • Kaio Posts: 287

    @TonyB_ said:
    ALTD/ALTS/ALTR do two separate operations. They add a signed value to their own D register, and they replace the next instruction's D or S field (for ALTR, its result register) in the pipeline. This is mentioned in the document and here is the relevant section:

    Sometimes it takes a third and fourth look at the code to see what's going on. Since it works that way and the documentation says so, you are right.
    But it is hard to see from the code how register i gets incremented.

    I would prefer the ALTI instruction for incrementing D in this case, as no index register is needed in the code. That is not easy to understand either, as it all happens inside the P2.
    But if the index register is also needed elsewhere in the code, the ALTD solution would be better.

  • evanh Posts: 17,043
    edited 2026-01-17 06:23

    @Kaio said:

    @ManAtWork said:
    Wow, I've learned something new, again. I've never used ALTI but only ALTS and ALTD because the original silicon doc was very vague about how ALTI actually works.

    You're welcome, Nicolas. It looks like the ALTI instruction hasn't been used much in the past. I'm glad that I could help clarify some of its usage.

    I've used ALTI a number of times over the years. Came in very effective with the 512 528 byte compute of SD card CRC where, in addition to auto incrementing the D register, I was able to redirect the instruction result to eliminate a subsequent load instruction, bringing the inner loop time down to matching the 4-bit data rate of sysclock/3.

    Here's the SD read data block CRC compute function. There are source comments, for my sanity, about the altireg fields and what ALTI is doing.

    crc_check
    // CRC-16 check of prior read data block
    // Z is clear upon first data block of transaction, returns quickly with Z set
            mov crc3, #0    // CRC result for DAT3 pin
            mov crc2, #0    // CRC result for DAT2 pin
            mov crc1, #0    // CRC result for DAT1 pin
            mov crc0, #0    // CRC result for DAT0 pin
            mov pb, altireg
        if_z    rep @.rend, #512/4+16/8    // one SD data block + CRC
    
        if_z    alti    pb, #0b100_111_000    // next D-field substitution, then increment PB, result goes to PA
        if_z    movbyts pa, #0b00_01_10_11    // byte swap within longword
        if_z    splitb  pa    // every 4th bit makes a byte, 8 x 4-bit parallel to 4 x 8-bit serial
        if_z    setq    pa    // 4 bytes, one per pin
        if_z    crcnib  crc3, poly    // 16-bit polynomial
        if_z    crcnib  crc3, poly
        if_z    crcnib  crc2, poly
        if_z    crcnib  crc2, poly
        if_z    crcnib  crc1, poly
        if_z    crcnib  crc1, poly
        if_z    crcnib  crc0, poly
        if_z    crcnib  crc0, poly
    .rend
    // pass/fail, pass == zero result, even parity, includes received CRC from SD card
            or  crc3, crc2
            or  crc1, crc0
        _ret_   or  crc3, crc1   wz    // Z set for pass, clear for fail
    
    poly        long 0x8408    // reversed CRC-16-CCITT (x16 + x12 + x5 + x0: 0x1021, even parity)
    altireg     long pa<<19 | cogdatbuf<<9    // register PA for ALTI result substitution, and CRC buffer address
    
    crc3        res 1
    crc2        res 1
    crc1        res 1
    crc0        res 1
    
    cogdatbuf   res 512/4    // longwords for SD data block
    cogcrcbuf   res 16/8    // longwords for CRC nibbles, must be contiguous with the SD data block
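The pass/fail test at the end rests on a CRC property: with a zero initial value and no final XOR, running the received CRC bits through the same register after the data leaves zero. The Python sketch below models a single data pin's bit stream. It assumes CRCBIT behaves as "if C XOR D[0], D = (D >> 1) XOR poly, else D = D >> 1" with the reversed polynomial $8408, and it assumes the sender computes the conventional MSB-first CRC with $1021; the helper names are hypothetical. The reversed, right-shifting register ends in the bit-reversed state of the conventional one, so a zero residue means the same thing in both.

```python
POLY_REVERSED = 0x8408          # same constant as `poly` in the routine above
POLY_NORMAL = 0x1021            # CRC-16-CCITT, MSB-first form

def crcbit(crc, bit, poly=POLY_REVERSED):
    """Assumed CRCBIT step: shift right, XOR poly when (bit XOR crc[0]) is 1."""
    if (bit ^ crc) & 1:
        return (crc >> 1) ^ poly
    return crc >> 1

def bits_msb_first(data):
    for byte in data:
        for k in range(7, -1, -1):
            yield (byte >> k) & 1

def crc16_msb_first(data):
    """Conventional CRC-16 (poly 0x1021, init 0, no final XOR) as a sender might compute it."""
    crc = 0
    for bit in bits_msb_first(data):
        top = (crc >> 15) & 1
        crc = (crc << 1) & 0xFFFF
        if top ^ bit:
            crc ^= POLY_NORMAL
    return crc

def residue(data_with_crc):
    """Feed every received bit, CRC included, through the reversed register; 0 means pass."""
    crc = 0
    for bit in bits_msb_first(data_with_crc):
        crc = crcbit(crc, bit)
    return crc

payload = b"123456789"
sent = crc16_msb_first(payload)
print(hex(sent))                                              # 0x31c3
print(residue(payload + sent.to_bytes(2, "big")))             # 0 (pass)
print(residue(b"123456788" + sent.to_bytes(2, "big")) != 0)   # True (one flipped bit fails)
```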
    
  • evanh Posts: 17,043
    edited 2026-01-17 06:10

    You'll have to excuse the code alignment. The forum uses a size of 4 for hard tabs. I think that's unsuitable, as hard tabs are mostly used for assembly like this, which always uses size 8.

    Note: The movbyts pa, #0b00_01_10_11 could be written as movbyts 0-0, #0b00_01_10_11. I put PA there because that's where ALTI sends MOVBYTS's result to.
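What MOVBYTS #%00_01_10_11 and SPLITB do to one longword can be checked with a Python model. Both functions paraphrase the P2 instruction descriptions (MOVBYTS picks each result byte with a 2-bit selector; SPLITB gathers bit k of every nibble into byte k). The nibble order, with the first-received nibble in the high half of the first hub byte and nibble bit 3 taken as DAT3, is an assumption made for illustration.

```python
def movbyts(d, sel):
    """Result byte n = D.byte[sel[2n+1:2n]]."""
    out = 0
    for n in range(4):
        src = (sel >> (2 * n)) & 0b11
        out |= ((d >> (8 * src)) & 0xFF) << (8 * n)
    return out

def splitb(d):
    """Result byte k, bit j = D[4*j + k]: bit k of every nibble gathered into byte k."""
    out = 0
    for k in range(4):
        for j in range(8):
            out |= ((d >> (4 * j + k)) & 1) << (8 * k + j)
    return out

hub = bytes([0x12, 0x34, 0x56, 0x78])          # nibbles received as 1, 2, ..., 8 (assumed)
d = int.from_bytes(hub, "little")              # the longword as the cog sees it
swapped = movbyts(d, 0b00_01_10_11)            # byte swap: first-received nibble on top
print(hex(d), hex(swapped))                    # 0x78563412 0x12345678
print(hex(splitb(swapped)))                    # 0x11e66aa: one byte per pin, DAT3 in the top byte
```

Under these assumptions, the top byte collects the DAT3 bits in arrival order, which matches the routine feeding `crc3` first after SETQ.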

  • Kaio Posts: 287

    @evanh said:
    I've used ALTI a number of times over the years. Came in very effective with the 512 528 byte compute of SD card CRC where, in addition to auto incrementing the D register, I was able to redirect the instruction result to eliminate a subsequent load instruction, bringing the inner loop time down to matching the 4-bit data rate of sysclock/3.

    Thank you, @evanh, for sharing this code snippet. This is a very interesting use of the ALTI instruction, which enables 3-operand P2 instructions.
    Here is what the code effectively performs with the help of ALTI:

    movbyts pa, cogdatbuf++, #0b00_01_10_11  ' R, D, S
    
    '  this is the equivalent 
    mov     pa, cogdatbuf++
    movbyts pa, #0b00_01_10_11
    

    This is possible by using the upper triple of bits (the R field mode) in ALTI's S operand.

    alti    pb, #0b100_111_000
    
    results in
    D=pb             ' loaded from altireg: long pa<<19 | cogdatbuf<<9

    S=#0b100_111_000
    100 substitute the R field: the result goes to another register (PA, from altireg)
    111 substitute the D field (cogdatbuf), then auto-increment it
    000 no substitution and no change for the S field
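The dissection can be checked numerically. In the Python sketch below, PA is cog register $1F6 (its standard P2 address) and `cogdatbuf` sits at a hypothetical $000. The model applies code %100_111_000 the way it is described above: substitute R and D, then increment only the D field of PB.

```python
PA = 0x1F6                       # cog address of register PA
COGDATBUF = 0x000                # hypothetical address of cogdatbuf

def field(reg, shift):
    """Extract the 9-bit field starting at bit `shift`."""
    return (reg >> shift) & 0x1FF

def alti_100_111_000(pb):
    """Model of `alti pb, #%100_111_000`: returns (R used, D used, updated pb)."""
    r_used, d_used = field(pb, 19), field(pb, 9)
    d_next = (d_used + 1) & 0x1FF                # only the D field advances
    pb = (pb & ~(0x1FF << 9)) | (d_next << 9)
    return r_used, d_used, pb

pb = PA << 19 | COGDATBUF << 9                   # altireg
reads = []
for _ in range(512 // 4 + 16 // 8):              # rep count from the routine: 130 longs
    r, d, pb = alti_100_111_000(pb)
    assert r == PA                               # every MOVBYTS result lands in PA
    reads.append(d)
print(reads[0], reads[-1], field(pb, 19) == PA)  # 0 129 True
```

The last two of the 130 reads fall just past the 128 data longs, which is why `cogcrcbuf` has to follow `cogdatbuf` directly.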
    
  • evanh Posts: 17,043
    edited 2026-01-17 10:49

    Your dissection looks good. Yes, ALTI provides effective 3-operand instructions. ARM CPUs got that one very right.
    I'd indicate it with a pseudo-indirection syntax like this: mov pa, [cogdatbuf++]

    Amusingly, I doubt register-indirect-register addressing has ever existed in any CPU architecture. CPUs just don't normally have that many general registers. Maybe GPUs have something like it.
