32-bit CORDIC algorithm in Spin

cgracey · 2011-06-15 05:11

Here is a CORDIC algorithm written in Spin which can generate very precise sines and cosines, rotate x,y points around 0,0, and convert x,y to polar form (angle, length).

This code uses the Propeller Demo Board and a TV for viewing text. The main point of this, though is just the 'pri cordic' method, the 27-long lookup table, and the three a,x,y long variables. You can pull those out and implement them into your own code

Kwabena (Kye) is using this algorithm to process GPS coordinates (converted into 32-bit-angles) to determine distances between points on the earth with a numerical accuracy of a few inches.

Chip

max72 · 2011-06-15 05:44

Thanks, I modified cordic code available around and created a spin object in order to do bearing/distance/XTE and so on using planar cartesian projection, which is ok for points not too far away (up to hundreds of miles). I tried to post the object to the forum but with no interest.
This routine is very elegant, and I'm sure Kye will produce great results.
Thanks again,
Massimo

Cluso99 · 2011-06-15 13:08

Nice, but shame the GPS is not that accurate

max72 · 2011-06-15 13:50

Cluso99 wrote: »

Nice, but shame the GPS is not that accurate

True, but navigation algorithms are very complex, and if a great circle routine is so good you only error is the one from the GPS. I think this is a great achievement.
Massimo

idbruce · 2011-06-15 15:28

I wonder if this could be used in any way to set ramp up and ramp down speed for steppers?

Bruce

Rayman · 2011-06-15 15:31

I'd be interested to see how this compares to interpolation from the Prop1 ROM tables...

If it could actually be more precise, I'd be impressed...

cgracey · 2011-06-15 18:12

Rayman wrote: »

I'd be interested to see how this compares to interpolation from the Prop1 ROM tables...

If it could actually be more precise, I'd be impressed...

The Propeller's sine rom has 2^13 samples, full circle, that are 16 bits in range.

This CORDIC algorithm has about 2^32 samples, full circle, that are about 30 bits in range.

So, it's got 2^19 better phase resolution and 2^14 better range resolution.

If you think about the earth being 24,900 miles in circumference, 2^13 discrete angles resolve to 3.04 miles each, while 2^32 discrete angles resolve to 0.367 inches each.

Considering that the earth's radius is about 3,960 miles, a 16-bit range yields 319 feet per count, while a 30-bit range yields 0.448 inches per count.

Lawson · 2011-06-15 21:24

How hard would it be to modify this to calculate LOG and Anti-LOG? I'm building a new controller for a seed laser at work. This is mostly a temperature control problem. I've got two thermistors in the laser and one of the better options for calculating the temperature is a 3rd order LOG fit equation. (my other option is to just make a calibration table) My circuit *should* measure the resistance of the thermistors to better than 18-bits of precision, so I don't want to throw away resolution in the calibration.

Lawson

Cluso99 · 2011-06-15 23:00

Wow, that is some accuracy. The mathematics is more accurate than the earth is round

Toby Seckshund · 2011-06-15 23:09

"Wow, that is some accuracy. The mathematics is more accurate than the earth is round"

I still recon that the earth is flat :-)

Ale · 2011-06-16 01:12

Toby S. wrote:

I still recon that the earth is flat :-)

You belong to the Society I presume ? ;-)

max72 · 2011-06-16 01:21

Cluso99 wrote: »

Wow, that is some accuracy. The mathematics is more accurate than the earth is round

Probably Kye is considering also the ellipsoid... ;-)

Cluso99 · 2011-06-16 01:23

Apparently even the water level between the oceans is not equal due to the placement of the continents. With the right formulae though, we should be able to calculate it with cordic.

Now, how high was that bridge? and the mast above the water? Oops!

max72 · 2011-06-16 03:21

The models to take into account the difference between ellipsoid and geoid elevation would require quite a lot of memory, the EGM96 matrix is huge. I don't know if there is an acceptable simplified model...
Massimo

Rayman · 2011-06-16 03:23

Thanks for the reply Chip. That does sound a lot better...
I'd like to see a graph or table that shows the improvement over using the table.

Bean · 2011-06-16 05:48

I hope Chip doesn't mind, but I took the liberty of making some modifications to the program.

The main modification is to scale up small input values to get full (or almost full) resolution.

I changed it to use PST (I don't have TV at work)

I expanded the corlut table to 1 (this may not make any difference, I don't know)

Comment welcome

Bean

' Original program by Chip Gracey
'
' Modification by Terry Hitt (Bean)
'  Changed from TV to PST (I don't have a TV available)
'  Scale up inputs for better resolution with small input values
'  Added 3 values to corlut to finish table to zero 

con

  _clkmode = xtal1 + pll16x
  _xinfreq = 5_000_000


obj

 PST : "Parallax Serial Terminal"


var

  long a, x, y


pub go

  PST.start(115200)       ' Setup PST for 115200 baud (Bean)
  WaitCnt(cnt+ClkFreq*5)  ' Give user 5 seconds to start PST (Bean) 
 
  a := 0
  x := -4
  y := -3

  show              'show original x,y and angle to rotate by
  cordic(1)         'rotate x,y
  show              'show
  cordic(0)         'convert to polar
  show              'show, should be very close to original
  
  repeat 99         'do a bunch more to see cumulative error
    cordic(1)
    cordic(0)
  show


pri show

  ' Changed from TV to PST (Bean)
  PST.str(string("A "))
  PST.dec(a)
  PST.char(13)
  PST.str(string("  X "))
  PST.dec(x)
  PST.char(13)
  PST.str(string("  Y "))
  PST.dec(y)
  PST.char(13)
  PST.char(13)
  

pri cordic(atan2) | negate, i, da, dx, dy, shift

' CORDIC algorithm
'
' if atan2 = 0: x,y are rotated by angle in a
' if atan2 = 1: x,y are converted from cartesian to polar with angle->a, length->x
'
'   - angle range: $0000_0000-$FFFF_FFFF = 0-359.9999999 degrees
'   - hypotenuse of x,y must be within ±1_300_000_000 to avoid overflow
'   - algorithm works best if x,y are kept large:
'       example: x=$40000000, y=0, a=angle, cordic(0) performs cos,sin into x,y

  ' Scale up values for better resolution (Bean)
  shift := 0
  repeat while (||x < 400_000_000) and (||y < 400_000_000)
    x *= 2
    y *= 2
    shift += 1

  if atan2                      'if atan2 mode, reset a
    a~    
    negate := x < 0             'check quadrant 2 | 3 for either atan2 or rotate mode                        
  else
    negate := (a ^ a << 1) & $80000000
  
  if negate                     'if quadrant 2 | 3, (may be outside ±106° convergence range)                                                                 
    a += $80000000              '..add 180 degrees to angle
    -x                          '..negate x
    -y                          '..negate y

  repeat i from 0 to 29         'do CORDIC iterations (added 3 values to original values)
    da := corlut[i]
    dx := y ~> i
    dy := x ~> i

    if atan2
      negate := y < 0           'if atan2 mode, drive y towards 0
    else
      negate := a => 0          'if rotate mode, drive a towards 0

    if negate
      -da
      -dx
      -dy

    a += da
    x += dx
    y -= dy

  'remove CORDIC gain by multiplying by ~0.60725293500888        
  i := $4DBA76D4  
  x := (x ** i) << 1 + (x * i) >> 31                           
  y := (y ** i) << 1 + (x * i) >> 31

  ' Undo scaling of original data (Bean)
  x := (x + 1 << (shift-1)) ~> shift
  y := (y + 1 << (shift-1)) ~> shift
  

dat

corlut  long $20000000        'CORDIC angle lookup table
        long $12E4051E
        long $09FB385B
        long $051111D4
        long $028B0D43
        long $0145D7E1
        long $00A2F61E
        long $00517C55  
        long $0028BE53
        long $00145F2F
        long $000A2F98
        long $000517CC
        long $00028BE6
        long $000145F3
        long $0000A2FA
        long $0000517D   
        long $000028BE
        long $0000145F
        long $00000A30
        long $00000518
        long $0000028C
        long $00000146
        long $000000A3
        long $00000051   
        long $00000029
        long $00000014
        long $0000000A
        long $00000005 ' 3 extra values added (Bean)
        long $00000003
        long $00000001

cgracey · 2011-06-16 06:29

Lawson wrote: »

How hard would it be to modify this to calculate LOG and Anti-LOG?Lawson

You would have to replace the lookup table, start the iterations at i=1, instead of at i=0, then repeat every third iteration without changing i. The next Propeller has both circular and hyperbolic CORDIC in hardware.

Today my computer is dead with a virus, but I hope to have it reborn by noon. I will do it then and post it.

Chip

cgracey · 2011-06-16 06:38

Bean wrote: »

...The main modification is to scale up small input values to get full (or almost full) resolution.

...I expanded the corlut table to 1 (this may not make any difference, I don't know)[/code]

Good idea. I found through experimentation that if you provide as many sub-bits as you have whole bits, you will be lossless. For example, if you have 32-bit A,X,Y's, give them each 32-sub bits and not only is there no need to normalize them upwards, but they will be as accurate as possible. You cannot do any better than that.

I had also read, and verified through experimentation, that for 32-bit CORDIC variables, 27 iterations is optimal. Beyond that, you get too much truncation in the X and Y deltas and it amounts to a few LSBs' worth of noise in the results. It just fidgits around.

Graham Stabler · 2011-06-16 07:55

I have somewhere a CORDIC "engine" that I wrote based on this: http://www.voidware.com/cordic.htm

It does circular and hyperbolic functions etc, it stalled because of some small detail or other, I wonder if together we can whip it in to shape. It was in assembly too so very fast and might make a nice object, if I remember correctly it might have been 16 bit but could be made 32bit or switch-able.

Graham

Bean · 2011-06-16 08:04

cgracey wrote: »

Good idea. I found through experimentation that if you provide as many sub-bits as you have whole bits, you will be lossless. For example, if you have 32-bit A,X,Y's, give them each 32-sub bits and not only is there no need to normalize them upwards, but they will be as accurate as possible. You cannot do any better than that.

Chip, thanks for the comments.
By "sub-bits" do you mean like a 32.32 fixed point number ? That might make more sense as it would allow fractional values to be used.

Bean

Kye · 2011-06-16 09:12

Here's my UNTESTED!!! code.

max72 · 2011-06-16 09:13

Graham Stabler wrote: »

I have somewhere a CORDIC "engine" that I wrote based on this: http://www.voidware.com/cordic.htm

It does circular and hyperbolic functions etc, it stalled because of some small detail or other, I wonder if together we can whip it in to shape. It was in assembly too so very fast and might make a nice object, if I remember correctly it might have been 16 bit but could be made 32bit or switch-able.

Graham

There is a PASM routine from Beau doing atan2... it could be a nice starting point.

max72 · 2011-06-16 09:27

Kye wrote: »

Here's my UNTESTED!!! code.

Thanks Kye!
With the distance already done the course is just a little step away...
Did you test it for close points (say less than a mile)?
Massimo

Tracy Allen · 2011-06-16 11:08

Chip, elegante!

Thanks too, Bean, for the scaling algorithm and the PST output.

It might be possible to condense this

shift := 0
  repeat while (||x < 400_000_000) and (||y < 400_000_000)
    x *= 2
    y *= 2
    shift += 1

to something like,

' enter with ||x =< $2000_0000 and ||y =< $2000_0000
      ' (overly strict to keep hypotenuse<1300000000)
  shift := 30 - ( >| (||x <# ||y) )   ' encode the biggest of x and y into 32 bits
  x << shift      ' left justify the largest into the field msb at $2000_0000
  y << shift     '  both shifted equally, angle unchanged

Sometimes one might not want to restore the shift directly at the end. For example, if x=1 and y=1 at entry, you want an angle of 45 degrees and a hypotenuse of 1.41421, and the fractional part is coded in the extra bits and would be lost by scaling back.

Graham Stabler · 2011-06-16 11:26

max72 wrote: »

There is a PASM routine from Beau doing atan2... it could be a nice starting point.

My code is in PASM. I will upload it later.

Graham

Rayman · 2011-06-16 16:41

I was reading about CORDIC on wikipedia today... From there, it seems to make a lot of sense for Prop 1 because there is no multiply instruction...
But, Prop 2 will have a multiply instruction, so it seems you could do it faster with an infinite series approximation there...
Or, what am I missing?

Mark_T · 2011-06-16 16:43

Its a great algorithm, and I've been playing with a PASM implementation.

I limited myself to calculating cis(theta) - ie sines and cosines rather than atan2 or arbitrary rotation, which means the multiply factor isn't needed (you just start with a scaled down value for x).

Anyway the great thing is that the SUMNC instruction comes into its own for the conditional negation and I've managed to get the loop down to 13 instructions

:loop         movs      :loadins, corindex      ' overwrite source register with table entry regnum
              add       corindex, #1            ' can't use loadins straight away, remember
:loadins      mov       da, corlut              ' this modified instruction indexes corlut
              mov       dx, yy
              sar       dx, shift               ' dx = yy >> ii
              neg       dy, xx
              sar       dy, shift               ' dy = -xx >> ii
              add       shift, #1
              cmps      aa, #0  wc              ' drive angle to zero
              sumnc     aa, da                  ' sumnc instruction very handy for this.
              sumnc     xx, dx
              sumnc     yy, dy
              djnz      nn, #:loop              ' 52 cycles per loop, 27 loops = 1404 cycles

Phil Pilgrim (PhiPi) · 2011-06-16 17:06

Rayman,

Series approximations don't always converge quickly. For some functions, Tchebychev polynomials provide a reasonable compromise, but for the amount of work the Prop II would have to get the same accuracy, I think my money would still be on the CORDIC method.

You do raise an interesting point, though. For example, the transcendental function library for the IBM 1130 (a 16-bitter w/ multiply) used Tchebychev polynomials. But I don't know if CORDIC had been thought of yet (1968).

-Phil

Rayman · 2011-06-16 18:09

Well, here's the series approximation I used when I programmed a scientific calculator for my wristwatch:

http://www.ganssle.com/approx/approx.pdf

It can do a arbitrarily good job with just a few multiplications...

Tracy Allen · 2011-06-16 21:29

Rayman, those are floating point approximations, and would be very efficient when the FPU is available in hardware. CORDIC is ideal for integer math, because it uses shifts but nary a multiplication (well, with the exception of the final adjustment by the CORDIC gain, which can be done efficiently.) Furthermore, the CORDIC engine can can handle several functions using the same machinery and data table, simply by the way the crank is turned. For example, it can do both sin and cos and atan and hyp, using the same table. With a change of table, the same CORDIC engine can handle log and exp as well as multiply and divide.

The guru for polynomial approximations is Jack Crenshaw, whose Math Toolkit for Real-Time Programming is referenced in the nice Genssle pdf. Crenshaw goes into depth on computing one function after another, for the most part assuming there is an FPU available. One has to be struck by the many different tricks and mathematics that are involved with the unique personality of each function and the effort to find a polynomial or an algorithm that will converge quickly. The results are very good and can certainly whip CORDIC, again provided there is an FPU available, and provided there is enough memory to hold all the unique algorithms.

Rayman · 2011-06-17 02:17

The series approximation works with fixed point too. It doesn't require floating point. (I say that, but I haven't done it).
And actually, multiplication with floating point is fast even without an FPU.
But, it is very slow without a hardware multiply.

I guess that's my question... Wikipedia (not that it's always right) says that CORDIC is used primarily when you don't have a hardware multiply. But, the Prop 2 will have a hardware multiply. So, why use CORDIC on Prop2?

I think I remember something about the Prop2 having special CORDIC circuitry though... Maybe that's the reason...

But, it does look like CORDIC is the best option for Prop1 to do better than the ROM table.
(Although, I do wonder if an external table, like on SD card or other flash device, would be faster...)

32-bit CORDIC algorithm in Spin

Comments