Array of Longs in PASM2
JonnyMac
While making a suggestion in another post about putting an array into inline PASM (which works), I started some personal experiments and ran into a snag. In short, I'd like to access an array in PASM2 with a variable index, but code like I used to use in the P1 isn't working. Please -- gently -- show me where I've gone wrong. Thanks.
pub method(p_in, p_out)

  org
                setq      #(32-1)
                rdlong    array, p_in
                mov       i, #0
loop            mov       pntr, #array
                add       pntr, i
                setd      update, pntr
                nop
update          shr       0-0, #1                 ' NOT WORKING
                incmod    i, #31          wc
    if_c        jmp       #loop
                setq      #(32-1)
                wrlong    array, p_out
done            ret
array           res       32
pntr            res       1
i               res       1
  end

Comments
Add another nop, or altd is an alternative that doesn't require nop.

That did it. Thank you, Tony.

setd is an instruction I've never used but altd and alts I use a lot.

RES in inline assembly? Surprised that works...
One thing that trips me up a lot is only adding 1 to long pointers when need to add 4...
I was surprised, too. Remember, we have about $120 (288) instructions available for inline assembly, so we do need to be mindful of that, and that variables defined with RES are going to end up with whatever is in the cog RAM at those locations.
Yeah, that sometimes gets me, too.
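The pointer slip mentioned above comes from hub RAM being byte-addressed while cog registers are long-addressed. Here is a minimal, untested sketch of the hub side. The method name `sum_longs` and the names `p_src`, `count`, `total`, and `tmp` are made up for illustration, and it assumes `count` is at least 1.

```spin2
pub sum_longs(p_src, count) : total | tmp

  org
                mov       total, #0
loop            rdlong    tmp, p_src              ' p_src is a hub (byte) address
                add       total, tmp
                add       p_src, #4               ' next long is 4 bytes on, not 1
                djnz      count, #loop            ' assumes count >= 1 on entry
  end
```

Inside cog RAM the step is 1 instead: `array+1` is the long after `array`, which is why the ALTD index in this thread counts 0, 1, 2, ... with no scaling.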
I was so excited about fixing with another nop that I jumped right to that. Then I popped back and saw the altd -- much nicer and saves a bit of code. This is the adjustment.
pub method(p_in, p_out)

  org
                setq      #(32-1)
                rdlong    array, p_in
                mov       i, #0
loop            altd      i, #array
update          shl       0-0, #2
                incmod    i, #31          wc
    if_nc       jmp       #loop
                setq      #(32-1)
                wrlong    array, p_out
done            ret
array           res       32
i               res       1
  end

Thanks, again!
Jon, I think you are interested in PASM2 tips and the (untested) code below is optimal for speed. Two improvements:
altd/alts/altr adds sign-extended S[17:9] to D (which leaves D unchanged for 9-bit #S), and rep repeats an instruction block by jumping back to the start in zero cycles.

pub method(p_in, p_out)

  org
                setq      #(32-1)
                rdlong    array, p_in
                mov       i, #0
                rep       #2,#32
                altd      i, s
                shl       0-0, #2
                setq      #(32-1)
                wrlong    array, p_out
done            ret
s               long      1 << 9 + array          'add +1 to i in altd
array           res       32
i               res       1
  end

That's pretty nifty, Tony. I'll keep both in a recipe book as the first allows random access of array elements.

I'm still learning how to decode Chip's notes in the PASM instruction list, although Ada's docs make mention of an auto-incrementing value. By using a traditional loop and debug I was able to see this working. Based on your comment, I did change the increment value. One thing... I was a little confused that # is not required with array in the line that declares s.

pub method(p_in, p_out)

  org
                setq      #(32-1)
                rdlong    array, p_in
                mov       i, #0
                mov       j, #32
loop            debug(uhex_long(i))
                altd      i, s
                shl       0-0, #2
                djnz      j, #loop
                setq      #(32-1)
                wrlong    array, p_out
done            ret
s               long      1 << 9 + array          ' add +1 to i in altd
array           res       32
i               res       1
j               res       1
  end

Again, thanks!
One thing that trips me up a lot is only adding 1 to long pointers when need to add 4...
@Rayman, thank you - me too - I was wondering why the whole buffer wasn't being processed!
Jon, here's an example with the powerful ALTI instruction using auto-increment for the D field. For simplicity one needs a copy of the instruction in a register which will be modified by the ALTI instruction and then the D field is copied into the instruction in the pipeline. The original instruction (after the ALTI) is not changed in the RAM, only in the pipeline like the other ALTx instructions.
                'mov      s, .instr               ' necessary to initialize if code is not used as inline
                rep       @.end, #32
                alti      s,#%111_000             ' increment D, leave S unchanged
.instr          shl       array, #2
.end
                ...
s               shl       array, #2               ' instruction which is modified by ALTI
                ...

With the ALTI instruction it is possible to increment or decrement the D and/or S and/or R field at the same time. E.g. one could also increment the D field and decrement the S field with only one ALTI instruction.

                ' necessary to initialize, otherwise the instruction will continue
                ' on next call with last values of D and S field in _alti
                mov       _alti, .instr
                rep       @.end, #8
                alti      _alti,#%111_110         ' increment D and decrement S
.instr          mov       array, buf+7            ' copy from buf[7] .. buf[0] to array[0] ... array[7]
.end
                ...
_alti           res       1
array           res       8
buf             res       8

Unfortunately the examples in the documentation are incomplete. Therefore, it is difficult to see how the ALTI instruction works.
There are some special modes available to control the size of the increment/decrement, e.g. useful for ring buffers. But this exceeds the 9 bit value range on the S field on ALTI. Hence a register is necessary or an AUGS will be used.
Here is a PASM circular buffer I recently wrote using AltS and AltD in cog ram.
The squiggly bracket comments in the PASM line below the AltX explain clearly what the AltX is doing...
The incmod is great for auto rollover of the head and tail index.

NewHead         mov       pHeadLo,sAcLoX                          'putting new data in buffer at head index
                mov       pHeadHi,sAcHiX                          'called when new blocktime
                                                                  'setup new tail trigger for running average
                altd      pHeadIdx,#pTailTime
                mov       0{pTailTime[pHeadIdx]},pSwClk
                altd      pHeadIdx,#pTailTime
                add       0{pTailTime[pHeadIdx]},pAccTC
                altd      pHeadIdx,#pTailLoNext
                mov       0{pTailLoNext[pHeadIdx]},sAcLoX
                altd      pHeadIdx,#pTailHiNext
                mov       0{pTailHiNext[pHeadIdx]},sAcHiX
                incmod    pHeadIdx,#3
                ret

NewTail         alts      pTailIdx,#pTailLoNext                   'taking tail index data out of buffer
                mov       pTailLo,0{pTailLoNext[pTailIdx]}        'called when pTailTime == pSwClk
                alts      pTailIdx,#pTailHiNext
                mov       pTailHi,0{pTailHiNext[pTailIdx]}
                incmod    pTailIdx,#3
                ret

DAT
pTailIdx        long      0
pHeadIdx        long      0
pHeadLo         long      0
pHeadHi         long      0
pTailLo         long      0
pTailHi         long      0
pTailTime       long      0,0,0,0
pTailLoNext     long      0,0,0,0
pTailHiNext     long      0,0,0,0
Usually you don't need the s register as the ALTD instruction is already calculating the correct address for i, it's like array[i].
See the example from @mwroberts above.
After clarification with @TonyB_ some messages below: there is a second use case possible with the ALTD/ALTS/ALTR instructions, where the index register (provided via D) is also updated. In this case a register is needed for S (for performance reasons), as the value exceeds the 9-bit range.
The register holds the base address of the array and a value for the index update. The expression "1 << 9" is a 1 (moved above the 9-bit S value) which gets added to register i, provided as D on the ALTD instruction in the code above from Jon.

Wow, I've learned something new, again. I've never used ALTI but only ALTS and ALTD because the original silicon doc was very vague about how ALTI actually works. The new Assembler manual is much better and at least explains the fields and options. But it would be even better if we knew not only HOW it works but also WHY. I mean, @cgracey for sure had some very specific intentions when he implemented those commands. An AI is very good at doing the "diligent work" generating and cleaning up the manuals but it can't guess someone's intentions.
The s register is needed in this case to increment D and S[17:9] holds the sign-extended increment. D cannot be changed with 9-bit #S. The inferior alternative is to use ##S but that is slower and takes just as many longs.
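As an untested sketch of the same idea with a negative step: because S[17:9] is sign-extended before it is added to the index register, a field value of $1FF acts as -1, so the loop can walk the array from the last element down. The method name `method_down` and the backward walk are illustrative assumptions; the rest follows the code already posted in this thread.

```spin2
pub method_down(p_in, p_out)

  org
                setq      #(32-1)
                rdlong    array, p_in
                mov       i, #31                  ' start at the last element
                rep       #2, #32
                altd      i, s                    ' D field = array + i, then i -= 1
                shl       0-0, #2                 ' executes as: shl array[i], #2
                setq      #(32-1)
                wrlong    array, p_out
done            ret
s               long      ($1FF << 9) + array     ' S[8:0] = base, S[17:9] = -1
array           res       32
i               res       1
  end
```

The parentheses in the declaration of `s` are only there to make the two fields obvious to the reader.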
long and word and byte values are numeric so do not require a # prefix.

After clarification with @TonyB_ two messages below I correct this message.
You are thinking in the days of P1. But with ALTD you don't need to know on which bit position the D is located in the instruction. The ALTD instruction will do it for you in the right way.
You need only provide the base address of your array and the index and ALTD will do the magic change of the D field for you in the pipeline.
Have you seen the example from @mwroberts above? He is doing the same.
Added on edit:
There is a second use case possible with the ALTD/ALTS/ALTR instructions, where the index register (provided via D) is also updated.
You're welcome, Nicolas. It looks like the ALTI instruction has not been used much in the past. I'm glad that I could help clarify some of the usage.
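For the plain base-plus-index case (no index update), here is a minimal, untested sketch of the read side using ALTS, in the same spirit as the circular-buffer code above. The method name `element` and the names `idx` and `value` are made up for illustration, and it assumes `idx` is in the range 0..31.

```spin2
pub element(p_in, idx) : value

  org
                setq      #(32-1)
                rdlong    array, p_in             ' block-read 32 longs into cog RAM
                alts      idx, #array             ' next instruction's S field = array + idx
                mov       value, 0-0              ' executes as: mov value, array[idx]
done            ret
array           res       32
  end
```

With a 9-bit immediate such as `#array`, S[17:9] is zero, so `idx` itself is left unchanged.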
ALTD/ALTS/ALTR do two separate operations. They add a signed value to the register named by their own D operand, and they replace the D, S, or result field of the next instruction in the pipeline. This is mentioned in the document and here is the relevant section:
Sometimes it takes a third and fourth look at the code to see what's going on. Since it works that way and the documentation says so, you are right.
But it is difficult to see in the code how the increment on register i works.

I would prefer the use of the ALTI instruction for incrementing D in this case, as there is no index register necessary in the code. This is also not easy to understand, as it all happens inside the P2.
But if one needs the index register also for other cases in the code the ALTD solution would be better.
I've used ALTI a number of times over the years. Came in very effective with the 512 528 byte compute of SD card CRC where, in addition to auto incrementing the D register, I was able to redirect the instruction result to eliminate a subsequent load instruction, bringing the inner loop time down to matching the 4-bit data rate of sysclock/3.
Here's the SD read data block CRC compute function. There are source comments, for my sanity, about altireg fields and what ALTI is doing.

crc_check
// CRC-16 check of prior read data block
// Z is clear upon first data block of transaction, returns quickly with Z set
                mov     crc3, #0                // CRC result for DAT3 pin
                mov     crc2, #0                // CRC result for DAT2 pin
                mov     crc1, #0                // CRC result for DAT1 pin
                mov     crc0, #0                // CRC result for DAT0 pin
                mov     pb, altireg
        if_z    rep     @.rend, #512/4+16/8     // one SD data block + CRC
        if_z    alti    pb, #0b100_111_000      // next D-field substitution, then increment PB, result goes to PA
        if_z    movbyts pa, #0b00_01_10_11      // byte swap within longword
        if_z    splitb  pa                      // every 4th bit makes a byte, 8 x 4-bit parallel to 4 x 8-bit serial
        if_z    setq    pa                      // 4 bytes, one per pin
        if_z    crcnib  crc3, poly              // 16-bit polynomial
        if_z    crcnib  crc3, poly
        if_z    crcnib  crc2, poly
        if_z    crcnib  crc2, poly
        if_z    crcnib  crc1, poly
        if_z    crcnib  crc1, poly
        if_z    crcnib  crc0, poly
        if_z    crcnib  crc0, poly
.rend
// pass/fail, pass == zero result, even parity, includes received CRC from SD card
                or      crc3, crc2
                or      crc1, crc0
        _ret_   or      crc3, crc1      wz      // Z set for pass, clear for fail

poly            long    0x8408                  // reversed CRC-16-CCITT (x16 + x12 + x5 + x0: 0x1021, even parity)
altireg         long    pa<<19 | cogdatbuf<<9   // register PA for ALTI result substitution, and CRC buffer address
crc3            res     1
crc2            res     1
crc1            res     1
crc0            res     1
cogdatbuf       res     512/4                   // longwords for SD data block
cogcrcbuf       res     16/8                    // longwords for CRC nibbles, must be contiguous with the SD data block

You'll have to excuse the code alignment. The forum is using size 4 for hard tabs. I do think that's unsuitable, as most use of hard tabs is for assembly like this, which is always size 8.
Note: The movbyts pa, #0b00_01_10_11 could be written as movbyts 0-0, #0b00_01_10_11. I put PA there because that's where ALTI sends MOVBYTS's result to.

Thank you, @evanh, for sharing this code snippet. This is a very interesting usage of the ALTI instruction, which enables the use of 3 operands for a P2 instruction.
This is the explained code performed by ALTI.
This is possible by using the upper triple of bits in S on the ALTI instruction.
Your dissection looks good. Yes, ALTI provides effective 3-operand instructions. ARM CPUs got that one very right.
I'd indicate a pseudo-indirection syntax like this:
mov pa, [cogdatbuf++]

Amusingly, I doubt register-indirect-register has ever existed in any CPU architecture. CPUs just don't normally have that many general registers. Maybe GPUs have something like it.