Calculation of arcus tangens

... with an ASM/C mix. A calculation takes 153 µs @ 160 MHz. It could be optimized in pure ASM.
Attached is a plot, with the data copied from terminal output.
fastspin -2b p2atan.c
loadp2  -t -b230400  -p/dev/ttyUSB0 p2atan.binary
( Entering terminal mode.  Press Ctrl-] to exit. )
get arcus tangens
-6.000000 -1.405642
-5.500000 -1.390937
-5.000000 -1.373395
-4.500000 -1.352122
-4.000000 -1.325813
-3.500000 -1.292492
-3.000000 -1.249041
-2.500000 -1.190285
-2.000000 -1.107144
-1.500000 -0.982790
-1.000000 -0.785395
-0.500000 -0.463646
0.000000 0.000000
0.500000 0.463646
1.000000 0.785395
1.500000 0.982790
2.000000 1.107144
2.500000 1.190285
3.000000 1.249041
3.500000 1.292492
4.000000 1.325813
4.500000 1.352122
5.000000 1.373395
5.500000 1.390937
6.000000 1.405642

Comments

  • Never knew the 'arc' was short for arcus.

    An easy saving of 48 clocks is to move the second QVECTOR up two lines so that both are in the CORDIC pipeline together.

  • Thanks for the tip. I think most of the time is spent computing logarithms with floats.
    This could also be done with the CORDIC QLOG. I keep having problems with integer/floating-point conversion; I never know exactly when it is necessary and when not.
    'Arcus' is more common in European countries.
  • Reinhard
    edited 2020-02-05 - 14:49:12
    The log calculation is now also in ASM (CORDIC). The result is still correct, but a calculation still takes 138 µs. I want to use the --fixedreal (16:16) option, but I don't know how to get the data into the CORDIC (0:31) format. All my attempts produce nonsensical results.
    I'm sure that with fixedreal and the right handling, the calculation time for atan will drop to a few µs.
  • Now requires 1.15 µs for a calculation.
    And what is that good for?
    Imagine you want to measure the rotation angle of a motor with a resolver. An excitation voltage
    Z = sin(ωt)
    applied to the rotor winding induces voltages in two mutually perpendicular stator windings:
    X = Z * sin(phi)
    Y = Z * cos(phi).
    The amplitudes of the two voltages are proportional to the sine and cosine of the rotor angle phi, so phi = atan(X/Y), with the quadrant taken from the signs of X and Y. This method therefore provides the absolute angle of rotation of the motor. Back then, an ADSP2105 needed 7 µs for this calculation.
    Thanks for your attention.
  • Reinhard
    edited 2020-02-07 - 23:47:48
    The logarithm is not necessary at all ;-)
    1 µs @ 160 MHz.
    	__asm {
    	mov	xx,x
    	neg	xx			// xx = -x (NOT would give -x-1)
    	
    	mov	z2_re,##ONE
    	mov	z2_im,xx		//'complex z2 = 1.0 - I * x
    	mov	z3_re,##ONE
    	mov	z3_im,x			//'complex z3 = 1.0 + I * x
    	
    	qvector	z2_re,z2_im
    	qvector	z3_re,z3_im
    	
    	getqy	z2_arg
    	getqy	z3_arg	
    
    	sub	z3_arg,z2_arg		// arg(z3) - arg(z2) = 2*atan(x) in CORDIC angle units
    
        };
    	return z3_arg;
    
  • Hi @Reinhard,

    I am more of a hobbyist, so I just diddle around with hardware without any specific project in mind. Even moving from Germany to the US of A did not change my rate of utter failure whenever I try to produce a sellable product.

    Programming, on the other hand, has fed me for decades, and I simply like the iterations of your code. It's almost like thinking in parallel: I thought it should work without the log, and there it is.

    Brilliant.

    Mike
  • Potential to save 6 cycles just by reordering the code
    	__asm {
    	mov	xx,x
    
    	mov	z3_re,##ONE
    	mov	z3_im,x			//'complex z3 = 1.0 + I * x
    
    	qvector	z3_re,z3_im
    
    	neg	xx			// xx = -x (NOT would give -x-1)
    	mov	z2_re,##ONE
    	mov	z2_im,xx		//'complex z2 = 1.0 - I * x
    	
    	qvector	z2_re,z2_im
    	getqy	z3_arg	
    	getqy	z2_arg
    
    	sub	z3_arg,z2_arg
    
        };
    	return z3_arg;
    
  • How is this computation different from the arc tangent performed by just a single qvector command?
  • Reinhard
    edited 2020-02-08 - 09:32:14
    That's a good question. The reason is that with only one QVECTOR I would have to distinguish between cases; that is not necessary with the conjugate pair of QVECTORs.
  • Reinhard
    edited 2020-02-08 - 09:44:25
    @Electrodude
    Forget what I just wrote: by choosing the right vector, a single QVECTOR does it too, in 200 ns (!)
    	__asm {
    	//mov	xx,x
    
    	mov	z3_re,##ONE
    	mov	z3_im,x			//'complex z3 = 1.0 + I * x
    
    	qvector	z3_re,z3_im
    
    	//not	xx	
    	//mov	z2_re,##ONE
    	//mov	z2_im,xx		//'complex z2 = 1.0 - I * x
    	
    	//qvector	z2_re,z2_im
    	getqy	z3_arg	
    	//getqy	z2_arg
    
    	//sub	z3_arg,z2_arg
    
        };
    	return z3_arg;
    
  • What a disgrace. As if wearing blinkers, I tried to do what Wolfram Alpha suggested, without paying attention to the strengths of the CORDIC. I've only been on board since December last year and would like to try everything at once. As I said, a disgrace, but I'm old enough to take it ;-)
  • We learn from experience, and from bad experience only; good experiences we just copy. We had to learn fireworks, paper production and the compass from other cultures, which now copy what we do and improve on it through their own bad experiences, and we will follow up later, in another time. But as mentioned, we learn from bad experience, like children who have to touch the hot stove. What we should have learned, though, is that culture and cult have nothing in common; they are exact opposites. Yet the path is like exploring the earth: if you only go in one direction, you will ultimately come home again. So if you go in the cult direction, you will eventually come to culture, but you have to pass through hell, an experience we should not necessarily have. The power given to Pxx must be used carefully and thoughtfully. So again we saw: think before you act!
  • ErNa
    edited 2020-02-08 - 12:59:29
    By the way: aren't blinkers blinders? Or was Wolfram Alpha indeed working like a blinker, and you were the fish caught? Checked: no, Scheuklappen are indeed blinkers; words often have different meanings and are easily misunderstood.
  • Wolfram Alpha's proposal is 100% correct; I took a very close look at it, and my first attempt implemented the theory correctly. What I overlooked is that the CORDIC offers a much easier way to calculate it. That is the lesson learned.
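The identity behind the first attempt is presumably the standard logarithmic form of the arctangent (the exact Wolfram Alpha output is not quoted in the thread). For real x the two logarithms have equal magnitude, so only their arguments survive, which is why a vector angle (QVECTOR) replaces the logarithms:

```latex
\begin{aligned}
\arctan x &= \tfrac{i}{2}\bigl[\ln(1 - ix) - \ln(1 + ix)\bigr],
  \qquad \ln z = \ln\lvert z\rvert + i\arg z,\\
\lvert 1 - ix\rvert = \lvert 1 + ix\rvert = \sqrt{1 + x^2}
  \;&\Rightarrow\;
  \arctan x = \tfrac{i}{2}\cdot i\bigl[\arg(1 - ix) - \arg(1 + ix)\bigr]
  = \tfrac{1}{2}\bigl[\arg(1 + ix) - \arg(1 - ix)\bigr],\\
\arg(1 - ix) = -\arg(1 + ix)
  \;&\Rightarrow\;
  \arctan x = \arg(1 + ix).
\end{aligned}
```

The middle line also shows that the raw difference arg(1 + ix) - arg(1 - ix) computed by two QVECTORs equals 2·arctan x.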
  • You appear to be referencing something. Got a link?
  • Huh, cool, formatted like a dictionary. Can't say I'm learning anything from it though. :( :)

  • Electrodude
    edited 2020-02-08 - 23:06:06
    @Reinhard One reason I've noticed why it's often hard to use the P2 correctly is that everyone takes for granted the ugly hacks required on other processors, which on the P2 simply aren't necessary and actually make things more complicated. Another good example is the FFT: people tell you that you need sine and cosine tables and four multiplications per butterfly, but on the P2 each twiddle-factor rotation can be done with a single QROTATE.

    As another slight optimization, you could pass the constant as an immediate value instead of z3_re and use x directly instead of z3_im, eliminating the MOVs and the unnecessary cog RAM variables:
    qvector	##ONE,x
    
  • @Electrodude
    Yes, everything you say is correct. In this particular case there are actually two different use cases that I did not keep apart. The first (slow, 154 µs) is a floating-point calculation. It can be used in a pocket-calculator application, for example (e.g. the P2 emulating a Texas Instruments scientific calculator), and is more precise than the usual series expansion used to calculate atan.
    The second is intended for fast closed-loop control algorithms and returns a scaled integer value; at 200 ns (or less) it is well suited to that.
    So both methods are justified.