Fast Trigonometry with Spin

Duane Degn · 2015-03-10 11:05

Over in the thread about speeding up Spin, it was mentioned scaled integer math is much faster than floating point math.

If I have basic add, subtract, multiply and divide types of equations, I always use scaled integers myself. When I need to perform trigonometric equations I usually turn to F32 for floating point math.

The HMC5883L compass module links to a program which calculates headings using integer trig functions.

The object's name is "SL32_INTEngine_2."

Here's part of the heading of the object:

// SL32 Integer Engine
//
// Author: Kwabena W. Agyeman
// Updated: 8/28/2010
// Designed For: P8X32A
// Version: 1.0

I decided to compare the integer trig functions with similar calculations using F32.

My test program generated 80 test points. These test points were laid out in a 20 by 20 square. The corners of the square were -10, 10; 10, 10; 10, -10; -10, -10.

I had the program calculate the distance to these points from the origin. The integer math section was:

repeat localIndex from 0 to MAX_POINT_INDEX
    intDistance[localIndex] := Math.polarRadius(y[localIndex], x[localIndex])

The floating point section was:

repeat localIndex from 0 to MAX_POINT_INDEX
    floatDistance[localIndex] := ComputeDistance2D(fY[localIndex], fX[localIndex])

F32 doesn't include a distance method. Here's the method I wrote to compute the distance.

PUB ComputeDistance2D(fLocalX, fLocalY) 

  result := F32.FSqr(F32.FAdd(F32.FMul(fLocalX, fLocalX), F32.FMul(fLocalY, fLocalY)))

Integer math won this round. It took the integer math equation about 6ms to compute the 80 distances while the floating point math calculations took about 14ms. Technically, this isn't really a trig problem. I wasn't really surprised integer math won this round.

The first of three trig problems was to compute the angle these points made with the x-axis with respect to the origin.

The integer math method was "polarAngle."

repeat localIndex from 0 to MAX_POINT_INDEX
    intAngle[localIndex] := Math.polarAngle(y[localIndex], x[localIndex])

The equivelent F32 method is "ATan2."

repeat localIndex from 0 to MAX_POINT_INDEX
    floatAngle[localIndex] := F32.ATan2(fY[localIndex], fX[localIndex])

The ATan2 method is a very useful method with program a hexapod. The IK calculations use it and it's also used to calculate the direction of travel using the x and y feedback from the joystick.

The integer math method required 59.7ms while F32 was able to compute the 80 angles in 5.4ms.

Not only was F32 able to calculate the angles in less than a tenth of the time as the integer math equations, it also provide a much more precise answer.

While scaling the integers allowed the distance to the points to be accurately calculated, I'm not aware of a way of scaling the integers such as to produce a scaled angle. I think the "polarAngle" method would require extensive rewriting in order to have it return a scaled angle.

Not only were the angles returned from "polarAngle" rounded to the nearest full integer angle, they weren't always rounded correctly (30.964 was rounded to 30).

This lack a precision with angles is something I don't know how to get around when using integer math.

The remaining comparisons are between the calculations used to recompute the x and y coordinates based on the distance and angles previously calculated.

The integer trig object has a nice method for calculating coordinates based on the angle and distance. Here the scaled distance can enhance the accuracy of having to use integer angles.

repeat localIndex from 0 to MAX_POINT_INDEX
    xNew[localIndex] := Math.cos(intAngle[localIndex], intDistance[localIndex])

The same calculation requires to F32 calls. One to compute the cosine of the angle and another to multiply the result by the distance.

repeat localIndex from 0 to MAX_POINT_INDEX
    fXNew[localIndex] := F32.FMul(floatDistance[localIndex], F32.cos(floatAngle[localIndex]))

The same type of calculations were made to compute the y coordinates, this time using the sine.

Once again, F32 was faster. The integer math calculation used to find the x coordinates took 15.7ms while F32 took 7.9ms. The "find y" times were 14.4ms and 7.9ms with F32 again computing the coordinates the fastest.

In general I think it's safe to say scaled integer math calculations are faster than floating point calculations but based on what I've seen, if there's a lot of trig in the calculations then floating point math appears to be the best option.

I've attached my test program.

Here's the very long output from the program. The output did not occur at the time of the calculations. The numbers were saved to arrays to be displayed at the end of the program.

****** Scroll down to see times. ******

 exact | int  | int | recalculated  |float | float  | recalculated
(x, y) | dist |angle| integer x, y  | dist | angle  | float x, y

-10, 10|14.142|  135| -9.999,  9.999|14.142| 135.000|-10.000, 10.000
 -9, 10|13.453|  132| -8.996, 10.002|13.454| 131.987| -9.000, 10.000
 -8, 10|12.806|  129| -8.055,  9.955|12.806| 128.660| -8.000, 10.000
 -7, 10|12.206|  125| -6.997, 10.000|12.207| 124.992| -7.000, 10.000
 -6, 10|11.661|  121| -6.002,  9.997|11.662| 120.964| -6.000, 10.000
 -5, 10|11.180|  117| -5.072,  9.962|11.180| 116.565| -5.000, 10.000
 -4, 10|10.770|  112| -4.029,  9.987|10.770| 111.801| -4.000, 10.000
 -3, 10|10.440|  107| -3.045,  9.985|10.440| 106.699| -3.000, 10.000
 -2, 10|10.198|  102| -2.119,  9.975|10.198| 101.310| -2.000, 10.000
 -1, 10|10.049|   96| -1.046,  9.994|10.050|  95.711| -1.000, 10.000
  0, 10|10.000|   90|  0.000, 10.000|10.000|  90.000|  0.000, 10.000
  1, 10|10.049|   84|  1.054,  9.993|10.050|  84.289|  1.000, 10.000
  2, 10|10.198|   78|  2.127,  9.973|10.198|  78.690|  2.000, 10.000
  3, 10|10.440|   73|  3.053,  9.983|10.440|  73.301|  3.000, 10.000
  4, 10|10.770|   68|  4.037,  9.984|10.770|  68.199|  4.000, 10.000
  5, 10|11.180|   63|  5.080,  9.959|11.180|  63.435|  5.000, 10.000
  6, 10|11.661|   59|  6.010,  9.992|11.662|  59.036|  6.000, 10.000
  7, 10|12.206|   55|  7.005,  9.995|12.207|  55.008|  7.000, 10.000
  8, 10|12.806|   51|  8.063,  9.948|12.806|  51.340|  8.000, 10.000
  9, 10|13.453|   48|  9.003,  9.995|13.454|  48.013|  9.000, 10.000
 10, 10|14.142|   45|  9.999,  9.999|14.142|  45.000| 10.000, 10.000
 10,  9|13.453|   42| 10.002,  8.996|13.454|  41.987| 10.000,  9.000
 10,  8|12.806|   38| 10.095,  7.878|12.806|  38.660| 10.000,  8.000
 10,  7|12.206|   35| 10.000,  6.997|12.207|  34.992| 10.000,  7.000
 10,  6|11.661|   30| 10.101,  5.825|11.662|  30.964| 10.000,  6.000
 10,  5|11.180|   26| 10.050,  4.896|11.180|  26.565| 10.000,  5.000
 10,  4|10.770|   21| 10.057,  3.852|10.770|  21.801| 10.000,  4.000
 10,  3|10.440|   16| 10.035,  2.877|10.440|  16.699| 10.000,  3.000
 10,  2|10.198|   11| 10.011,  1.943|10.198|  11.310| 10.000,  2.000
 10,  1|10.049|    5| 10.011,  0.869|10.050|   5.711| 10.000,  1.000
 10,  0|10.000|    0| 10.000,  0.000|10.000|   0.000| 10.000,  0.000
 10, -1|10.049|   -5| 10.010, -0.869|10.050|  -5.711| 10.000, -1.000
 10, -2|10.198|  -11| 10.009, -1.943|10.198| -11.310| 10.000, -2.000
 10, -3|10.440|  -16| 10.033, -2.877|10.440| -16.699| 10.000, -3.000
 10, -4|10.770|  -21| 10.054, -3.852|10.770| -21.801| 10.000, -4.000
 10, -5|11.180|  -26| 10.047, -4.896|11.180| -26.565| 10.000, -5.000
 10, -6|11.661|  -30| 10.097, -5.825|11.662| -30.964| 10.000, -6.000
 10, -7|12.206|  -35|  9.995, -6.997|12.207| -34.992| 10.000, -7.000
 10, -8|12.806|  -38| 10.089, -7.878|12.806| -38.660| 10.000, -8.000
 10, -9|13.453|  -42|  9.995, -8.996|13.454| -41.987| 10.000, -9.000
 10,-10|14.142|  -45|  9.999, -9.999|14.142| -45.000| 10.000,-10.000
  9,-10|13.453|  -48|  8.996, -9.995|13.454| -48.013|  9.000,-10.000
  8,-10|12.806|  -51|  8.055, -9.948|12.806| -51.340|  8.000,-10.000
  7,-10|12.206|  -55|  6.997, -9.995|12.207| -55.008|  7.000,-10.000
  6,-10|11.661|  -59|  6.002, -9.992|11.662| -59.036|  6.000,-10.000
  5,-10|11.180|  -63|  5.072, -9.959|11.180| -63.435|  5.000,-10.000
  4,-10|10.770|  -68|  4.029, -9.984|10.770| -68.199|  4.000,-10.000
  3,-10|10.440|  -73|  3.045, -9.983|10.440| -73.301|  3.000,-10.000
  2,-10|10.198|  -78|  2.119, -9.973|10.198| -78.690|  2.000,-10.000
  1,-10|10.049|  -84|  1.046, -9.993|10.050| -84.289|  1.000,-10.000
  0,-10|10.000|  -90|  0.000,-10.000|10.000| -90.000|  0.000,-10.000
 -1,-10|10.049|  -96| -1.046, -9.994|10.050| -95.711| -1.000,-10.000
 -2,-10|10.198| -102| -2.119, -9.975|10.198|-101.310| -2.000,-10.000
 -3,-10|10.440| -107| -3.045, -9.985|10.440|-106.699| -3.000,-10.000
 -4,-10|10.770| -112| -4.029, -9.987|10.770|-111.801| -4.000,-10.000
 -5,-10|11.180| -117| -5.072, -9.962|11.180|-116.565| -5.000,-10.000
 -6,-10|11.661| -121| -6.002, -9.997|11.662|-120.964| -6.000,-10.000
 -7,-10|12.206| -125| -6.997,-10.000|12.207|-124.992| -7.000,-10.000
 -8,-10|12.806| -129| -8.055, -9.955|12.806|-128.660| -8.000,-10.000
 -9,-10|13.453| -132| -8.996,-10.002|13.454|-131.987| -9.000,-10.000
-10,-10|14.142| -135| -9.999, -9.999|14.142|-135.000|-10.000,-10.000
-10, -9|13.453| -138| -9.995, -9.003|13.454|-138.013|-10.000, -9.000
-10, -8|12.806| -142|-10.089, -7.886|12.806|-141.340|-10.000, -8.000
-10, -7|12.206| -145| -9.995, -7.005|12.207|-145.008|-10.000, -7.000
-10, -6|11.661| -150|-10.097, -5.833|11.662|-149.036|-10.000, -6.000
-10, -5|11.180| -154|-10.047, -4.903|11.180|-153.435|-10.000, -5.000
-10, -4|10.770| -159|-10.054, -3.860|10.770|-158.199|-10.000, -4.000
-10, -3|10.440| -164|-10.033, -2.884|10.440|-163.301|-10.000, -3.000
-10, -2|10.198| -169|-10.009, -1.951|10.198|-168.690|-10.000, -2.000
-10, -1|10.049| -175|-10.010, -0.877|10.050|-174.289|-10.000, -1.000
-10,  0|10.000|  180|-10.000,  0.000|10.000| 180.000|-10.000,  0.000
-10,  1|10.049|  175|-10.010,  0.877|10.050| 174.289|-10.000,  1.000
-10,  2|10.198|  169|-10.009,  1.951|10.198| 168.690|-10.000,  2.000
-10,  3|10.440|  164|-10.033,  2.884|10.440| 163.301|-10.000,  3.000
-10,  4|10.770|  159|-10.054,  3.860|10.770| 158.199|-10.000,  4.000
-10,  5|11.180|  154|-10.047,  4.903|11.180| 153.435|-10.000,  5.000
-10,  6|11.661|  150|-10.097,  5.833|11.662| 149.036|-10.000,  6.000
-10,  7|12.206|  145| -9.995,  7.005|12.207| 145.008|-10.000,  7.000
-10,  8|12.806|  142|-10.089,  7.886|12.806| 141.340|-10.000,  8.000
-10,  9|13.453|  138| -9.995,  9.003|13.454| 138.013|-10.000,  9.000
Integer Math: atan2 time =  59.691 ms, distance time =   6.061 ms, find x time =  15.662 ms, find y time =  14.366 ms
  Float Math: atan2 time =   5.362 ms, distance time =  13.932 ms, find x time =   7.868 ms, find y time =   7.851 ms

If there are faster ways of performing trig calculations with integers, I'd like to know about it. I'm particularly interested in a fast and accurate ATan2 method.

The reason I'm so interested in fast trig calculations is so I can increase the capability of my hexapod. (If you haven't done so, make sure and watch the video embedded to post #73.)

attachment.php?attachmentid=113445&d=1426005926

attachment.php?attachmentid=113445&d=1426005926

(Don't worry, it won't always have puppy dog eyes.)

So far the hexapod moves with its body level to the ground. A level body doesn't require as many calculations to figure out where the feet should be as the calculations required by a body tilted with respect to the ground. I'd like to add the complexity of allowing the robot to walk while tilted but I'm concerned about the time all these calculations will take. My current plan is to use multiple instances of F32 with multiple cogs performing the needed calculations.

idbruce · 2015-03-10 14:27

Duane

Very interesting post. Thanks for sharing.

Bruce

Phil Pilgrim (PhiPi) · 2015-03-10 15:16

Duane,

You might give the following atan2 method a try. It's a modification of Chip's cordic algo:

PUB atan2(y, x) : a | negate, i, da, dx, dy

  'CORDIC atan2 algorithm
  '
  '  - angle range: -1800 to 1799 tenths of a degree.
  '  - hypotenuse of x,y must be within ±1_300_000_000 to avoid overflow

  i := 29 - >|(||x #> ||y)    'Make the arguments as large as possible.
  if (i > 0)
    x <<= i
    y <<= i
     
  a~    
  negate := x < 0             'check quadrant 2 | 3                        
   
  if negate                   'if quadrant 2 | 3, (may be outside ±106° convergence range)                                                                 
    a += $80000000            '..add 180 degrees to angle
    -x                        '..negate x
    -y                        '..negate y

  repeat i from 0 to 26       'do CORDIC iterations (27 is optimal for 32-bit values)
    da := corlut[i]
    dx := y ~> i
    dy := x ~> i

    negate := y < 0           'if atan2 mode, drive y towards 0

    if negate
      -da
      -dx
      -dy

    a += da
    x += dx
    y -= dy

  a := ((a ~> 16) * 3600) ~> 16  'Convert to tenths of a degree.

DAT

corlut        long $20000000    'CORDIC angle lookup table
              long $12E4051E
              long $09FB385B
              long $051111D4
              long $028B0D43
              long $0145D7E1
              long $00A2F61E
              long $00517C55
              long $0028BE53
              long $00145F2F
              long $000A2F98
              long $000517CC
              long $00028BE6
              long $000145F3
              long $0000A2FA
              long $0000517D
              long $000028BE
              long $0000145F
              long $00000A30
              long $00000518
              long $0000028C
              long $00000146
              long $000000A3
              long $00000051
              long $00000029
              long $00000014
              long $0000000A

-Phil

Duane Degn · 2015-03-10 15:30

Phil Pilgrim (PhiPi) wrote: »

You might give the following atan2 method a try. It's a modification of Chip's cordic algo:

Very good. It's scaled to tenths of a degree. It should be useful if fast enough. I'll report back how it compares.

@Bruce. You're welcome.

Duane Degn · 2015-03-10 15:41

Phil Pilgrim (PhiPi) wrote: »

It's a modification of Chip's cordic algo:

The extra resolution came at a price. Chip's algorithm was slower than the one Kye used in his integer math object.

Integer Math: atan2 time = 177.335 ms
  Float Math: atan2 time =   5.362 ms

The other integer math atan2 method took 59.691 ms to complete the 80 calculations.

Edit: I've attached the new test code to this post.

Phil Pilgrim (PhiPi) · 2015-03-10 16:15

Duane,

That doesn't jibe with my experience. When I run the following method, I get a printout of 2210, or 2.2 ms per execution of atan2:

PUB start | i, t0, t1

  cps.start(0, 1, 0)
  sio.startrxtx(5, 4, 0, 115200)
  t0 := cnt
  i := 1234567
  repeat 1000
    cps.atan2(?i, ?i)
  t1 := cnt
  sio.dec((t1 - t0) / 80_000)

Edit: Okay, upon rereading your post, I'm a bit confused. Are the times for one execution or 80?

-Phil

Duane Degn · 2015-03-10 17:19

Phil Pilgrim (PhiPi) wrote: »

Are the times for one execution or 80?

The times listed are for the full 80 points (calculations).

It does make a bit more sense when the times are displayed under the list of the 80 data points but I should have clarified it better when I used the times without the list of data.

So yes, the execution time for a single atan2 calculation was 2.2ms. A single calculation of F32's atan2 took 0.067ms.

Tracy Allen · 2015-03-10 18:16

Hi Duane,

That is a comparison of sweet peppers to jalape

Duane Degn · 2015-03-10 19:25

Tracy Allen wrote: »

That is a comparison of sweet peppers to jalape

Tracy Allen · 2015-03-11 11:36

Duane,
F32 does seem like the best bet unless you absolutely need the cog for something else.

I'm kind of surprised that someone hasn't made a CORDIC fixed point navigation engine in pasm, but then again, I don't really know if it would be much smoother or more efficient. Chip had at one point in time promised CORDIC hardware to be build into the P2, a pet interest of his. I've always been curious how that was or is going to be implemented, but I wait and see.

max72 · 2015-03-13 00:58

I modified Bean's asm trig functions (cannot find the original).
Knowing my coding skills it is broken for sure, but I attach it.

There are some examples of assembly integer trig functions, but they are probably hard to find..

Massimo

Lawson · 2015-03-14 11:00

Have you tried the 2-cog Float32_full? It includes an equation scripting method that should really speed up repetitive calculations. Also there's a good chance that F32 is faster than the spin code calling it. Bypassing the spin wrapper and directly driving the assembly code with your spin code should get you some more speed. (i.e. spin function calls aren't free so inline it all)

Also look at the "FME" library linked in my signature. It's a spin only floating point library with a full set of trig and exponential functions. It's way slower than F32, but should be fine for less time critical tasks and it frees up a cog.

Both F32 and FME, spend significant amounts of time packing and unpacking floating point numbers. I think it'd be faster and more precise to have them work from a stack of un-packed floating point numbers instead. (i.e. like a RPN calculator) My guess is that a stack/RPN based FME would probably get 2x faster, while F32 would get ~30% faster.

Marty

Duane Degn · 2015-03-14 11:44

Thanks for the assembly trig functions Massimo. I'll take a look at them.

Lawson wrote: »

Have you tried the 2-cog Float32_full? It includes an equation scripting method that should really speed up repetitive calculations. Also there's a good chance that F32 is faster than the spin code calling it. Bypassing the spin wrapper and directly driving the assembly code with your spin code should get you some more speed.

This has crossed my mind a few times. I remember Jonathan mentioning equation scripting would need to be left out in order to fit F32 into a single cog. I don't use F32's log functions and there are probably other functions I could remove from F32 to make room for my own scripts. Much of F32 is beyond my present understanding my I think I understand it enough to add some sort of scripting feature.

I'm trying to keep all the calculations within the 20ms (50Hz) servo refresh interval. The present IK calculations are fast enough complete at 50Hz but I'm concerned adding the additional calculations required to allow the hexapod to walk while in a tilted attitude will exceed this 20ms interval. I think creating some sort of script in PASM would likely allow the additional calculations to be completed in time. Thanks for the suggestion. I'll probably do this.

Fast Trigonometry with Spin

Comments