Shop OBEX P1 Docs P2 Docs Learn Events
32-bit CORDIC algorithm in Spin — Parallax Forums

32-bit CORDIC algorithm in Spin

cgraceycgracey Posts: 14,256
edited 2011-06-21 07:38 in Propeller 1
Here is a CORDIC algorithm written in Spin which can generate very precise sines and cosines, rotate x,y points around 0,0, and convert x,y to polar form (angle, length).

This code uses the Propeller Demo Board and a TV for viewing text. The main point of this, though is just the 'pri cordic' method, the 27-long lookup table, and the three a,x,y long variables. You can pull those out and implement them into your own code

Kwabena (Kye) is using this algorithm to process GPS coordinates (converted into 32-bit-angles) to determine distances between points on the earth with a numerical accuracy of a few inches.

Chip
«1

Comments

  • max72max72 Posts: 1,155
    edited 2011-06-15 05:44
    Thanks, I modified cordic code available around and created a spin object in order to do bearing/distance/XTE and so on using planar cartesian projection, which is ok for points not too far away (up to hundreds of miles). I tried to post the object to the forum but with no interest.
    This routine is very elegant, and I'm sure Kye will produce great results.
    Thanks again,
    Massimo
  • Cluso99Cluso99 Posts: 18,069
    edited 2011-06-15 13:08
    Nice, but shame the GPS is not that accurate :(
  • max72max72 Posts: 1,155
    edited 2011-06-15 13:50
    Cluso99 wrote: »
    Nice, but shame the GPS is not that accurate :(

    True, but navigation algorithms are very complex, and if a great circle routine is so good you only error is the one from the GPS. I think this is a great achievement.
    Massimo
  • idbruceidbruce Posts: 6,197
    edited 2011-06-15 15:28
    I wonder if this could be used in any way to set ramp up and ramp down speed for steppers?

    Bruce
  • RaymanRayman Posts: 14,876
    edited 2011-06-15 15:31
    I'd be interested to see how this compares to interpolation from the Prop1 ROM tables...

    If it could actually be more precise, I'd be impressed...
  • cgraceycgracey Posts: 14,256
    edited 2011-06-15 18:12
    Rayman wrote: »
    I'd be interested to see how this compares to interpolation from the Prop1 ROM tables...

    If it could actually be more precise, I'd be impressed...

    The Propeller's sine rom has 2^13 samples, full circle, that are 16 bits in range.

    This CORDIC algorithm has about 2^32 samples, full circle, that are about 30 bits in range.

    So, it's got 2^19 better phase resolution and 2^14 better range resolution.

    If you think about the earth being 24,900 miles in circumference, 2^13 discrete angles resolve to 3.04 miles each, while 2^32 discrete angles resolve to 0.367 inches each.

    Considering that the earth's radius is about 3,960 miles, a 16-bit range yields 319 feet per count, while a 30-bit range yields 0.448 inches per count.
  • LawsonLawson Posts: 870
    edited 2011-06-15 21:24
    How hard would it be to modify this to calculate LOG and Anti-LOG? I'm building a new controller for a seed laser at work. This is mostly a temperature control problem. I've got two thermistors in the laser and one of the better options for calculating the temperature is a 3rd order LOG fit equation. (my other option is to just make a calibration table) My circuit *should* measure the resistance of the thermistors to better than 18-bits of precision, so I don't want to throw away resolution in the calibration.

    Lawson
  • Cluso99Cluso99 Posts: 18,069
    edited 2011-06-15 23:00
    Wow, that is some accuracy. The mathematics is more accurate than the earth is round :)
  • Toby SeckshundToby Seckshund Posts: 2,027
    edited 2011-06-15 23:09
    "Wow, that is some accuracy. The mathematics is more accurate than the earth is round"


    I still recon that the earth is flat :-)
  • AleAle Posts: 2,363
    edited 2011-06-16 01:12
    Toby S. wrote:
    I still recon that the earth is flat :-)
    You belong to the Society I presume ? ;-)
  • max72max72 Posts: 1,155
    edited 2011-06-16 01:21
    Cluso99 wrote: »
    Wow, that is some accuracy. The mathematics is more accurate than the earth is round :)

    Probably Kye is considering also the ellipsoid... ;-)
  • Cluso99Cluso99 Posts: 18,069
    edited 2011-06-16 01:23
    Apparently even the water level between the oceans is not equal due to the placement of the continents. With the right formulae though, we should be able to calculate it with cordic.

    Now, how high was that bridge? and the mast above the water? Oops!
  • max72max72 Posts: 1,155
    edited 2011-06-16 03:21
    The models to take into account the difference between ellipsoid and geoid elevation would require quite a lot of memory, the EGM96 matrix is huge. I don't know if there is an acceptable simplified model...
    Massimo
  • RaymanRayman Posts: 14,876
    edited 2011-06-16 03:23
    Thanks for the reply Chip. That does sound a lot better...
    I'd like to see a graph or table that shows the improvement over using the table.
  • BeanBean Posts: 8,129
    edited 2011-06-16 05:48
    I hope Chip doesn't mind, but I took the liberty of making some modifications to the program.

    The main modification is to scale up small input values to get full (or almost full) resolution.

    I changed it to use PST (I don't have TV at work)

    I expanded the corlut table to 1 (this may not make any difference, I don't know)

    Comment welcome

    Bean
    ' Original program by Chip Gracey
    '
    ' Modification by Terry Hitt (Bean)
    '  Changed from TV to PST (I don't have a TV available)
    '  Scale up inputs for better resolution with small input values
    '  Added 3 values to corlut to finish table to zero 
    
    con
    
      _clkmode = xtal1 + pll16x
      _xinfreq = 5_000_000
    
    
    obj
    
     PST : "Parallax Serial Terminal"
    
    
    var
    
      long a, x, y
    
    
    pub go
    
      PST.start(115200)       ' Setup PST for 115200 baud (Bean)
      WaitCnt(cnt+ClkFreq*5)  ' Give user 5 seconds to start PST (Bean) 
     
      a := 0
      x := -4
      y := -3
    
      show              'show original x,y and angle to rotate by
      cordic(1)         'rotate x,y
      show              'show
      cordic(0)         'convert to polar
      show              'show, should be very close to original
      
      repeat 99         'do a bunch more to see cumulative error
        cordic(1)
        cordic(0)
      show
    
    
    pri show
    
      ' Changed from TV to PST (Bean)
      PST.str(string("A "))
      PST.dec(a)
      PST.char(13)
      PST.str(string("  X "))
      PST.dec(x)
      PST.char(13)
      PST.str(string("  Y "))
      PST.dec(y)
      PST.char(13)
      PST.char(13)
      
    
    pri cordic(atan2) | negate, i, da, dx, dy, shift
    
    ' CORDIC algorithm
    '
    ' if atan2 = 0: x,y are rotated by angle in a
    ' if atan2 = 1: x,y are converted from cartesian to polar with angle->a, length->x
    '
    '   - angle range: $0000_0000-$FFFF_FFFF = 0-359.9999999 degrees
    '   - hypotenuse of x,y must be within ±1_300_000_000 to avoid overflow
    '   - algorithm works best if x,y are kept large:
    '       example: x=$40000000, y=0, a=angle, cordic(0) performs cos,sin into x,y
    
      ' Scale up values for better resolution (Bean)
      shift := 0
      repeat while (||x < 400_000_000) and (||y < 400_000_000)
        x *= 2
        y *= 2
        shift += 1
    
      if atan2                      'if atan2 mode, reset a
        a~    
        negate := x < 0             'check quadrant 2 | 3 for either atan2 or rotate mode                        
      else
        negate := (a ^ a << 1) & $80000000
      
      if negate                     'if quadrant 2 | 3, (may be outside ±106° convergence range)                                                                 
        a += $80000000              '..add 180 degrees to angle
        -x                          '..negate x
        -y                          '..negate y
    
      repeat i from 0 to 29         'do CORDIC iterations (added 3 values to original values)
        da := corlut[i]
        dx := y ~> i
        dy := x ~> i
    
        if atan2
          negate := y < 0           'if atan2 mode, drive y towards 0
        else
          negate := a => 0          'if rotate mode, drive a towards 0
    
        if negate
          -da
          -dx
          -dy
    
        a += da
        x += dx
        y -= dy
    
      'remove CORDIC gain by multiplying by ~0.60725293500888        
      i := $4DBA76D4  
      x := (x ** i) << 1 + (x * i) >> 31                           
      y := (y ** i) << 1 + (x * i) >> 31
    
      ' Undo scaling of original data (Bean)
      x := (x + 1 << (shift-1)) ~> shift
      y := (y + 1 << (shift-1)) ~> shift
      
    
    dat
    
    corlut  long $20000000        'CORDIC angle lookup table
            long $12E4051E
            long $09FB385B
            long $051111D4
            long $028B0D43
            long $0145D7E1
            long $00A2F61E
            long $00517C55  
            long $0028BE53
            long $00145F2F
            long $000A2F98
            long $000517CC
            long $00028BE6
            long $000145F3
            long $0000A2FA
            long $0000517D   
            long $000028BE
            long $0000145F
            long $00000A30
            long $00000518
            long $0000028C
            long $00000146
            long $000000A3
            long $00000051   
            long $00000029
            long $00000014
            long $0000000A
            long $00000005 ' 3 extra values added (Bean)
            long $00000003
            long $00000001
    
  • cgraceycgracey Posts: 14,256
    edited 2011-06-16 06:29
    Lawson wrote: »
    How hard would it be to modify this to calculate LOG and Anti-LOG?Lawson

    You would have to replace the lookup table, start the iterations at i=1, instead of at i=0, then repeat every third iteration without changing i. The next Propeller has both circular and hyperbolic CORDIC in hardware.

    Today my computer is dead with a virus, but I hope to have it reborn by noon. I will do it then and post it.

    Chip
  • cgraceycgracey Posts: 14,256
    edited 2011-06-16 06:38
    Bean wrote: »
    ...The main modification is to scale up small input values to get full (or almost full) resolution.

    ...I expanded the corlut table to 1 (this may not make any difference, I don't know)[/code]

    Good idea. I found through experimentation that if you provide as many sub-bits as you have whole bits, you will be lossless. For example, if you have 32-bit A,X,Y's, give them each 32-sub bits and not only is there no need to normalize them upwards, but they will be as accurate as possible. You cannot do any better than that.

    I had also read, and verified through experimentation, that for 32-bit CORDIC variables, 27 iterations is optimal. Beyond that, you get too much truncation in the X and Y deltas and it amounts to a few LSBs' worth of noise in the results. It just fidgits around.
  • Graham StablerGraham Stabler Posts: 2,510
    edited 2011-06-16 07:55
    I have somewhere a CORDIC "engine" that I wrote based on this: http://www.voidware.com/cordic.htm

    It does circular and hyperbolic functions etc, it stalled because of some small detail or other, I wonder if together we can whip it in to shape. It was in assembly too so very fast and might make a nice object, if I remember correctly it might have been 16 bit but could be made 32bit or switch-able.

    Graham
  • BeanBean Posts: 8,129
    edited 2011-06-16 08:04
    cgracey wrote: »
    Good idea. I found through experimentation that if you provide as many sub-bits as you have whole bits, you will be lossless. For example, if you have 32-bit A,X,Y's, give them each 32-sub bits and not only is there no need to normalize them upwards, but they will be as accurate as possible. You cannot do any better than that.

    Chip, thanks for the comments.
    By "sub-bits" do you mean like a 32.32 fixed point number ? That might make more sense as it would allow fractional values to be used.

    Bean
  • KyeKye Posts: 2,200
    edited 2011-06-16 09:12
    Here's my UNTESTED!!! code. =)
  • max72max72 Posts: 1,155
    edited 2011-06-16 09:13
    I have somewhere a CORDIC "engine" that I wrote based on this: http://www.voidware.com/cordic.htm

    It does circular and hyperbolic functions etc, it stalled because of some small detail or other, I wonder if together we can whip it in to shape. It was in assembly too so very fast and might make a nice object, if I remember correctly it might have been 16 bit but could be made 32bit or switch-able.

    Graham

    There is a PASM routine from Beau doing atan2... it could be a nice starting point.
  • max72max72 Posts: 1,155
    edited 2011-06-16 09:27
    Kye wrote: »
    Here's my UNTESTED!!! code. =)

    Thanks Kye!
    With the distance already done the course is just a little step away...
    Did you test it for close points (say less than a mile)?
    Massimo
  • Tracy AllenTracy Allen Posts: 6,666
    edited 2011-06-16 11:08
    Chip, elegante!

    Thanks too, Bean, for the scaling algorithm and the PST output.

    It might be possible to condense this
    shift := 0
      repeat while (||x < 400_000_000) and (||y < 400_000_000)
        x *= 2
        y *= 2
        shift += 1
    
    to something like,
    ' enter with ||x =< $2000_0000 and ||y =< $2000_0000
          ' (overly strict to keep hypotenuse<1300000000)
      shift := 30 - ( >| (||x <# ||y) )   ' encode the biggest of x and y into 32 bits
      x << shift      ' left justify the largest into the field msb at $2000_0000
      y << shift     '  both shifted equally, angle unchanged
    
    Sometimes one might not want to restore the shift directly at the end. For example, if x=1 and y=1 at entry, you want an angle of 45 degrees and a hypotenuse of 1.41421, and the fractional part is coded in the extra bits and would be lost by scaling back.
  • Graham StablerGraham Stabler Posts: 2,510
    edited 2011-06-16 11:26
    max72 wrote: »
    There is a PASM routine from Beau doing atan2... it could be a nice starting point.

    My code is in PASM. I will upload it later.

    Graham
  • RaymanRayman Posts: 14,876
    edited 2011-06-16 16:41
    I was reading about CORDIC on wikipedia today... From there, it seems to make a lot of sense for Prop 1 because there is no multiply instruction...
    But, Prop 2 will have a multiply instruction, so it seems you could do it faster with an infinite series approximation there...
    Or, what am I missing?
  • Mark_TMark_T Posts: 1,981
    edited 2011-06-16 16:43
    Its a great algorithm, and I've been playing with a PASM implementation.

    I limited myself to calculating cis(theta) - ie sines and cosines rather than atan2 or arbitrary rotation, which means the multiply factor isn't needed (you just start with a scaled down value for x).

    Anyway the great thing is that the SUMNC instruction comes into its own for the conditional negation and I've managed to get the loop down to 13 instructions ;)
    :loop         movs      :loadins, corindex      ' overwrite source register with table entry regnum
                  add       corindex, #1            ' can't use loadins straight away, remember
    :loadins      mov       da, corlut              ' this modified instruction indexes corlut
                  mov       dx, yy
                  sar       dx, shift               ' dx = yy >> ii
                  neg       dy, xx
                  sar       dy, shift               ' dy = -xx >> ii
                  add       shift, #1
                  cmps      aa, #0  wc              ' drive angle to zero
                  sumnc     aa, da                  ' sumnc instruction very handy for this.
                  sumnc     xx, dx
                  sumnc     yy, dy
                  djnz      nn, #:loop              ' 52 cycles per loop, 27 loops = 1404 cycles
    
    
  • Phil Pilgrim (PhiPi)Phil Pilgrim (PhiPi) Posts: 23,514
    edited 2011-06-16 17:06
    Rayman,

    Series approximations don't always converge quickly. For some functions, Tchebychev polynomials provide a reasonable compromise, but for the amount of work the Prop II would have to get the same accuracy, I think my money would still be on the CORDIC method.

    You do raise an interesting point, though. For example, the transcendental function library for the IBM 1130 (a 16-bitter w/ multiply) used Tchebychev polynomials. But I don't know if CORDIC had been thought of yet (1968).

    -Phil
  • RaymanRayman Posts: 14,876
    edited 2011-06-16 18:09
    Well, here's the series approximation I used when I programmed a scientific calculator for my wristwatch:

    http://www.ganssle.com/approx/approx.pdf

    It can do a arbitrarily good job with just a few multiplications...
  • Tracy AllenTracy Allen Posts: 6,666
    edited 2011-06-16 21:29
    Rayman, those are floating point approximations, and would be very efficient when the FPU is available in hardware. CORDIC is ideal for integer math, because it uses shifts but nary a multiplication (well, with the exception of the final adjustment by the CORDIC gain, which can be done efficiently.) Furthermore, the CORDIC engine can can handle several functions using the same machinery and data table, simply by the way the crank is turned. For example, it can do both sin and cos and atan and hyp, using the same table. With a change of table, the same CORDIC engine can handle log and exp as well as multiply and divide.

    The guru for polynomial approximations is Jack Crenshaw, whose Math Toolkit for Real-Time Programming is referenced in the nice Genssle pdf. Crenshaw goes into depth on computing one function after another, for the most part assuming there is an FPU available. One has to be struck by the many different tricks and mathematics that are involved with the unique personality of each function and the effort to find a polynomial or an algorithm that will converge quickly. The results are very good and can certainly whip CORDIC, again provided there is an FPU available, and provided there is enough memory to hold all the unique algorithms.
  • RaymanRayman Posts: 14,876
    edited 2011-06-17 02:17
    The series approximation works with fixed point too. It doesn't require floating point. (I say that, but I haven't done it).
    And actually, multiplication with floating point is fast even without an FPU.
    But, it is very slow without a hardware multiply.

    I guess that's my question... Wikipedia (not that it's always right) says that CORDIC is used primarily when you don't have a hardware multiply. But, the Prop 2 will have a hardware multiply. So, why use CORDIC on Prop2?

    I think I remember something about the Prop2 having special CORDIC circuitry though... Maybe that's the reason...

    But, it does look like CORDIC is the best option for Prop1 to do better than the ROM table.
    (Although, I do wonder if an external table, like on SD card or other flash device, would be faster...)
Sign In or Register to comment.