Calculation of arcus tangens

in Propeller 2
... with an ASM/C mix. The calculation takes 153 µs @ 160 MHz. It can be optimized further in pure ASM.
Attached is a plot, with the data copied from terminal output.
fastspin -2b p2atan.c
loadp2 -t -b230400 -p/dev/ttyUSB0 p2atan.binary
( Entering terminal mode. Press Ctrl-] to exit. )
get arcus tangens
-6.000000 -1.405642
-5.500000 -1.390937
-5.000000 -1.373395
-4.500000 -1.352122
-4.000000 -1.325813
-3.500000 -1.292492
-3.000000 -1.249041
-2.500000 -1.190285
-2.000000 -1.107144
-1.500000 -0.982790
-1.000000 -0.785395
-0.500000 -0.463646
0.000000 0.000000
0.500000 0.463646
1.000000 0.785395
1.500000 0.982790
2.000000 1.107144
2.500000 1.190285
3.000000 1.249041
3.500000 1.292492
4.000000 1.325813
4.500000 1.352122
5.000000 1.373395
5.500000 1.390937
6.000000 1.405642
Comments
An easy saving of 48 clocks is to move the second QVECTOR up two lines, so that both operations are in the pipeline together.
This can also be solved with the CORDIC QLOG. I keep having problems with integer / floating-point conversion: I never know exactly when it is necessary and when it is not.
Arcus is more common in European countries.
I'm sure with fixedreal and the right handling the calc time for atan goes down to a few µs.
And what is that good for?
Imagine you want to measure the rotation angle of a motor. An excitation voltage
Z = sin (phi)
in the rotor winding induces voltages in two mutually perpendicular stator windings:
X = Z * sin (phi)
Y = Z * cos (phi).
The amplitudes of the two voltages depend on the sine and cosine of the rotor angle, so phi = atan (X / Y). This method therefore provides the absolute angle of rotation of the motor. At that time, an ADSP2105 required 7 µs for this calculation.
Thanks for the attention.
1µs @ 160 MHz.
__asm {
    mov xx,x
    not xx
    mov z2_re,##ONE
    mov z2_im,xx //'complex z2 = 1.0 - I * x
    mov z3_re,##ONE
    mov z3_im,x //'complex z3 = 1.0 + I * x
    qvector z2_re,z2_im
    qvector z3_re,z3_im
    getqy z2_arg
    getqy z3_arg
    sub z3_arg,z2_arg
};
return z3_arg;
I am more of a hobbyist, so I just fiddle around with hardware without any specific project in mind. Even moving from Germany to the US of A did not change my rate of utter failures whenever I try to produce some sellable product.
Programming, on the other hand, has fed me for decades, and I simply like the iterations of your code. It's almost like thinking in parallel: I thought that should work without log, and there it is.
Brilliant.
Mike
__asm {
    mov xx,x
    mov z3_re,##ONE
    mov z3_im,x //'complex z3 = 1.0 + I * x
    qvector z3_re,z3_im
    not xx
    mov z2_re,##ONE
    mov z2_im,xx //'complex z2 = 1.0 - I * x
    qvector z2_re,z2_im
    getqy z3_arg
    getqy z2_arg
    sub z3_arg,z2_arg
};
return z3_arg;
Forget what I just wrote, choosing the right vector is also possible in 200ns (!)
__asm {
    //mov xx,x
    mov z3_re,##ONE
    mov z3_im,x //'complex z3 = 1.0 + I * x
    qvector z3_re,z3_im
    //not xx
    //mov z2_re,##ONE
    //mov z2_im,xx //'complex z2 = 1.0 - I * x
    //qvector z2_re,z2_im
    getqy z3_arg
    //getqy z2_arg
    //sub z3_arg,z2_arg
};
return z3_arg;
As another slight optimization, you could replace z3_re with an immediate value, since it's a constant, and replace z3_im with x. That eliminates both MOVs and the unnecessary cog RAM variables:
qvector ##ONE,x
Yes, everything you say is correct. In this special case, there are actually two different use cases that I have not kept apart. The first (slow, 154 µs) case is a floating-point calculation. This can be used in a pocket calculator application, for example, and is more precise than the usual series expansion used to calculate atan (e.g. the P2 simulates a Texas Instruments scientific calculator).
The second case is intended for fast closed-loop control algorithms and provides a scaled integer value. With 200 ns (or less), it is well suited to that.
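If the scaled integer needs to be read in conventional units, for logging or debugging, a conversion like the one below could be used on the host or in C code. It is a sketch resting on one assumption: that the angle is a 32-bit value in which 2^32 counts correspond to one full turn. The function names are hypothetical.

```c
#include <stdint.h>

#define PI               3.14159265358979323846
#define FULL_TURN_COUNTS 4294967296.0   /* 2^32; ASSUMED to equal one full turn */

/* Convert a 32-bit angle count to radians in the range -pi..+pi.
   The cast to int32_t relies on two's complement wrap-around
   (implementation-defined in C, but universal in practice), so counts
   just below a full turn come out as small negative angles. */
static double angle_counts_to_radians(uint32_t counts)
{
    int32_t s = (int32_t)counts;
    return (double)s * (2.0 * PI / FULL_TURN_COUNTS);
}

/* Same conversion, in degrees (-180..+180). */
static double angle_counts_to_degrees(uint32_t counts)
{
    int32_t s = (int32_t)counts;
    return (double)s * (360.0 / FULL_TURN_COUNTS);
}
```

Under that assumption, a count of 0x20000000 corresponds to 45 degrees and 0xE0000000 to -45 degrees. Inside a control loop the integer would normally be used directly, without this conversion.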
So both methods are justified.