@evanh said:
Tony used a technical term there - "Direct Addressing" is the way an instruction is encoded for fetching operand data. Any of the 512 cogRAM addresses can be directly specified in both the encoded S and D fields of an instruction. But, for lutRAM and hubRAM, because of the extra indexing options in the RDLUT/WRLUT instructions, only the first 256 addresses can be directly specified this way and only in the S field.
I'm pretty sure that this does not affect my code if I place the instructions in LUT ram and the data in COG ram. Jumps and branches, REP etc. are PC relative so they still work in LUT ram.
Executing instructions in LUT RAM that access data in cog RAM is fine. Note that some branch instructions, indicated by Call/Jump to S** in the spreadsheet, have signed-relative addressing in the #S form, which means you cannot branch from everywhere in cog RAM to anywhere in LUT RAM or vice-versa, unless you use S. Usually, though, you can move code by hand so that the branch address is within the PC +/- 256 range for #S.
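To make the range limit concrete, here is a minimal Python sketch of the reachability check for the relative #S form. It assumes the 9-bit signed offset is measured from the instruction after the branch (PC + 1), and that cog RAM occupies $000..$1FF with LUT RAM at $200..$3FF. The function name `rel9_reachable` is made up for illustration and is not part of any tool.

```python
def rel9_reachable(pc, target):
    """Return True if `target` fits a signed 9-bit offset (-256..+255).

    Assumption: the offset is taken relative to the instruction
    following the branch, i.e. pc + 1. Addresses are long (register)
    addresses: cog RAM $000..$1FF, LUT RAM $200..$3FF.
    """
    offset = target - (pc + 1)
    return -256 <= offset <= 255


if __name__ == "__main__":
    # A branch near the top of cog RAM can reach the bottom of LUT RAM.
    print(rel9_reachable(0x1F0, 0x210))
    # A branch low in cog RAM cannot reach the top of LUT RAM with #S.
    print(rel9_reachable(0x010, 0x3F0))
```

When the check fails, the options mentioned above still apply: branch through a register (S), or move the code so the target lands inside the window.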
@ManAtWork said:
I'm pretty sure that this does not affect my code if I place the instructions in LUT ram and the data in COG ram. Jumps and branches, REP etc. are PC relative so they still work in LUT ram.
Correct. Addressing modes are about data accesses, not code fetching.
@pik33 said:
Instead of rdword/add inx/rdword to get 2 samples to interpolate you can use rdlong and getword. rdlong does not have to be long aligned in P2. That saves one hub read, and the cost is one getword and one clock penalty for unaligned long read.
mul/add/shr maybe can be sped up using scas: scas returns an already shifted result in Q
Good point. Reading the two table entries at once also has the advantage that I can swap the two MULs which eliminates one of the SUBRs. The first two MULs can't be substituted by SCA because of the odd shifting of #9 but the third can.
        mov     inx,theta
        testb   inx,#31 wz      ' bit 31 -> Z
        shl     inx,#2  wc      ' bit 30 -> C
if_c    not     inx             ' mirror 2nd and 4th quadrant
        getbyte ipol,inx,#2     ' fraction -> interpolate
        shr     inx,#24         ' MSB -> table index
        shl     inx,#1
        add     inx,adrTable
        rdlong  sin1,inx        ' fetch two consecutive table entries
        getword sin2,sin1,#1    ' 2nd table entry
        mul     sin2,ipol
        subr    ipol,#$100
        mul     sin1,ipol
        add     sin1,sin2
        shr     sin1,#9         ' 8 bits + /2 for average of sin1+sin2
        sca     sin1,ampl
        negz    sin1            ' flip 3rd and 4th quadrant
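For anyone who wants to check the integer math off-chip, below is a rough Python model of the snippet above, step by step. It is only a sketch under assumptions: the real table behind adrTable is not shown in the thread, so `TABLE` here is a hypothetical 257-entry quarter-wave table of unsigned 16-bit samples (one extra entry so that index + 1 is always valid). SCA is modelled as the unsigned 16x16 product shifted right by 16, which is then picked up as the S operand by the following NEGZ.

```python
import math

# Hypothetical table contents (assumption, not from the thread):
# 257 unsigned 16-bit samples covering 0..90 degrees inclusive.
TABLE = [min(65535, round(math.sin(i * math.pi / 512) * 65536))
         for i in range(257)]


def sin_interp(theta, ampl):
    """Model of the PASM2 sequence; theta is a 32-bit angle, ampl is 16-bit."""
    theta &= 0xFFFFFFFF
    negative = (theta >> 31) & 1              # testb inx,#31 wz
    mirror = (theta >> 30) & 1                # shl inx,#2 wc  (C = bit 30)
    inx = (theta << 2) & 0xFFFFFFFF
    if mirror:
        inx ^= 0xFFFFFFFF                     # if_c not inx
    ipol = (inx >> 16) & 0xFF                 # getbyte ipol,inx,#2
    idx = inx >> 24                           # shr inx,#24
    sin1 = TABLE[idx]                         # low word of the rdlong
    sin2 = TABLE[idx + 1]                     # getword sin2,sin1,#1
    acc = sin2 * ipol + sin1 * (0x100 - ipol) # mul / subr / mul / add
    acc >>= 9                                 # shr sin1,#9
    scaled = ((acc & 0xFFFF) * (ampl & 0xFFFF)) >> 16   # sca sin1,ampl
    return -scaled if negative else scaled    # negz sin1


if __name__ == "__main__":
    for deg in (0, 30, 90, 210):
        theta = int(deg / 360 * 2**32)
        print(deg, sin_interp(theta, 0x8000))
```

The model also shows why a single RDLONG is enough: both table entries arrive in one long, and the low and high words play the roles of sin1 and sin2.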
@evanh said:
Tony used a technical term there - "Direct Addressing" is the way an instruction is encoded for fetching operand data. Any of the 512 cogRAM addresses can be directly specified in both the encoded S and D fields of an instruction. But, for lutRAM and hubRAM, because of the extra indexing options in the RDLUT/WRLUT instructions, only the first 256 addresses can be directly specified this way and only in the S field.
I'm pretty sure that this does not affect my code if I place the instructions in LUT ram and the data in COG ram. Jumps and branches, REP etc. are PC relative so they still work in LUT ram.
I recall in some cases you might have to use absolute addresses when calling LUTRAM from LUTRAM. I had something like that in my memory driver which had to call code in LUT RAM and I think I had to put in the #\ to make it work. Not sure if this was an assembler limitation at the time or something else.
Code in LUT RAM:
r_resume_burst  getnib  b, request, #0          ' a b c d e       get bank parameter LUT address
                rdlut   b, b                    ' a b c d e       get bank limit/mask
                bmask   mask, b                 ' | | | d e       build mask for addr
<snip>
w_burst         mov     orighubsize, count      ' a b c | e       save original hub size
w_locked_fill   cmp     count, #1 wz            ' a b c d | g     optimization for single transfers
                shl     count, a                ' a b c | | |     scale into bytes
                tjz     count, #nowrite_lut     ' a b c d e |     check for any bytes to write
w_resume_burst  mov     c, count                ' a b c d e f g h get the number of bytes to write
                call    #\r_resume_burst        ' a b c d e f g h get per bank limit and read delay info <----- NOTE: #\ syntax
@rogloh said:
I recall in some cases you might have to use absolute addresses when calling LUTRAM from LUTRAM. I had something like that in my memory driver which had to call code in LUT RAM and I think I had to put in the #\ to make it work. Not sure if this was an assembler limitation at the time or something else.
This is skipping related and applies to cog RAM exec and LUT RAM exec. From the doc:
Special SKIPF Branching Rules
Within SKIPF sequences where CALL/CALLPA/CALLPB are used to execute subroutines in which skipping will be suspended until after RET, all CALL/CALLPA/CALLPB immediate (#) branch addresses must be absolute in cases where the instruction after the CALL/CALLPA/CALLPB might be skipped. This is not possible for CALLPA/CALLPB but CALL can use '#\address' syntax to achieve absolute immediate addressing. CALL/CALLPA/CALLPB can all use registers as branch addresses, since they are absolute.
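As background for the phrase "might be skipped", here is a small Python sketch of how a skip pattern selects instructions: bit 0 applies to the first instruction after SKIPF, bit 1 to the second, and a set bit means that instruction is skipped. The helper names `skip_mask` and `executed` are invented for this illustration, and the sketch does not model the suspension of skipping inside a called subroutine.

```python
def skip_mask(skip_flags):
    """Pack per-instruction skip flags (first instruction first) LSB-first."""
    if len(skip_flags) > 32:
        raise ValueError("a skip pattern holds at most 32 bits")
    mask = 0
    for n, skip in enumerate(skip_flags):
        if skip:
            mask |= 1 << n
    return mask


def executed(instructions, mask):
    """Return the instructions whose skip bit is clear."""
    return [ins for n, ins in enumerate(instructions)
            if not (mask >> n) & 1]


if __name__ == "__main__":
    code = ["getnib", "rdlut", "bmask", "call"]
    mask = skip_mask([False, True, True, False])
    print(bin(mask), executed(code, mask))
```

In these terms, the rule quoted above concerns a CALL whose following instruction has its skip bit set in some pattern.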
IMHO, SKIPF is evil. I've used it only once. In some cases it can really save a lot of instructions, but it is so error-prone, especially when jumps, calls or branching are involved, that it has to be used with great care.
Yes, SKIPF can be helpful for interpreter and emulator kind of applications where the data flow is always similar but there are many different cases of what to do with the data. But for normal linear and procedural programming I just don't need it very often.
Unfortunately, for my VFD application, things are much more complicated than I first thought. The signals are noisy so I have to apply filtering. This is no problem because the main control loop runs at least 20 times slower than the interrupts processing the ADC samples. But I have to apply the filtering after the Park transformation because applying filtering to a rotating vector causes phase lag and jitter if the timing is not synchronized. After the Park transformation the phase relation is quasi-static or at least changes at a much slower rate.
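The ordering argument can be checked numerically with a floating-point Python sketch. It is not the fixed-point code that would run on the P2. The first-order low-pass, its coefficient `k`, the rotation rate `w` and the phase offset `phi` are all illustrative assumptions. Filtering after the Park transformation leaves the near-DC d/q values untouched, while filtering the rotating alpha/beta components first attenuates the vector and makes it lag.

```python
import math


def park(alpha, beta, theta):
    """Rotate the stationary-frame vector (alpha, beta) into the d/q frame."""
    c, s = math.cos(theta), math.sin(theta)
    return alpha * c + beta * s, -alpha * s + beta * c


def lowpass(samples, k=0.1):
    """First-order IIR low-pass, y += k * (x - y), started at the first sample."""
    y = samples[0]
    out = []
    for x in samples:
        y += k * (x - y)
        out.append(y)
    return out


def demo(n=400, w=0.2, phi=0.3):
    """Compare Park-then-filter against filter-then-Park on a unit vector."""
    thetas = [w * i for i in range(n)]
    alphas = [math.cos(t + phi) for t in thetas]
    betas = [math.sin(t + phi) for t in thetas]
    # Park first, then filter the slowly changing d/q values.
    dq = [park(a, b, t) for a, b, t in zip(alphas, betas, thetas)]
    after = (lowpass([d for d, _ in dq])[-1], lowpass([q for _, q in dq])[-1])
    # Filter the rotating components first, then apply Park.
    before = park(lowpass(alphas)[-1], lowpass(betas)[-1], thetas[-1])
    return after, before


if __name__ == "__main__":
    after, before = demo()
    print("Park then filter:", after)
    print("filter then Park:", before)
```

With these example numbers the filter-then-Park result comes out with less than half the original magnitude and a clearly shifted angle, while the Park-then-filter result keeps the original d/q values.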
So I have to do all the vector math in the interrupt service routine and I definitely need the CORDIC. The main control loop then becomes rather trivial. It should be just a few lines of code which can be called from the motion control loop which also runs once per millisecond and hopefully has some CPU time left over. So I can use a dedicated cog running PASM for the VFD without the need for interrupts combined with compiled Spin code.
I think SKIPF is the best feature of P2. When used more than trivially, it really helps if skip patterns can be created automatically. For complicated code that's essential. https://forums.parallax.com/discussion/171125/skip-patterns-generated-automatically
Example: