32-bit CORDIC algorithm in Spin
cgracey
Posts: 14,256
Here is a CORDIC algorithm written in Spin which can generate very precise sines and cosines, rotate x,y points around 0,0, and convert x,y to polar form (angle, length).
This code uses the Propeller Demo Board and a TV for viewing text. The main point of this, though, is just the 'pri cordic' method, the 27-long lookup table, and the three a,x,y long variables. You can pull those out and implement them in your own code.
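For readers who just want the shape of it, here is a minimal Spin sketch of that structure: a 'cordic' method driven by a 27-entry arctangent table and the a,x,y longs, with rotation and vectoring modes. It is reconstructed from the description above rather than copied from the attached listing, so the method names, the quadrant handling, and the rounded table values are assumptions to check against the original before relying on them.

{{ cordic_sketch.spin - illustrative only, not the attached cordic.spin }}

VAR
  long a, x, y                  'a = 32-bit angle ($1_0000_0000 = full circle), x,y = coordinates

PUB to_polar(xin, yin)
'' vectoring example: leaves a = atan2(yin,xin) as a 32-bit angle,
'' x = length * 1.64676 (the CORDIC gain; pre-scale the inputs or post-scale the result to compensate)
  a := 0
  x := xin
  y := yin
  cordic(true)

PUB rotate_xy(xin, yin, angle)
'' rotation example: leaves (x,y) = (xin,yin) rotated around 0,0 by 'angle', times the CORDIC gain
  a := angle
  x := xin
  y := yin
  cordic(false)

PRI cordic(vecmode) | i, dx, dy, da

  if vecmode
    if x < 0                        'vectoring only converges in the right half-plane,
      x := -x                       'so pre-rotate 180 degrees when x is negative
      y := -y
      a += $8000_0000
  else
    if (a ^ a << 1) & $8000_0000    'rotation only converges for |angle| up to ~99 degrees,
      x := -x                       'so fold 2nd/3rd-quadrant angles with a 180-degree pre-rotation
      y := -y
      a += $8000_0000

  repeat i from 0 to 26
    dx := y ~> i                    'arithmetic shifts form the 2^-i terms
    dy := x ~> i
    da := corlut[i]
    if (vecmode and y < 0) or (not vecmode and a => 0)
      x -= dx                       'rotate the vector by +atan(2^-i)
      y += dy
      a -= da
    else
      x += dx                       'rotate the vector by -atan(2^-i)
      y -= dy
      a += da

DAT
        'atan(2^-i) for i = 0..26, expressed as 32-bit angles
        '(rounded from atan(2^-i) / (2*pi) * 2^32)
corlut  long $2000_0000, $12E4_051E, $09FB_385B, $0511_11D4
        long $028B_0D43, $0145_D7E1, $00A2_F61E, $0051_7C55
        long $0028_BE53, $0014_5F2F, $000A_2F98, $0005_17CC
        long $0002_8BE6, $0001_45F3, $0000_A2FA, $0000_517D
        long $0000_28BE, $0000_145F, $0000_0A30, $0000_0518
        long $0000_028C, $0000_0146, $0000_00A3, $0000_0051
        long $0000_0029, $0000_0014, $0000_000A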
Kwabena (Kye) is using this algorithm to process GPS coordinates (converted into 32-bit-angles) to determine distances between points on the earth with a numerical accuracy of a few inches.
Chip
Comments
This routine is very elegant, and I'm sure Kye will produce great results.
Thanks again,
Massimo
True, but navigation algorithms are very complex, and if a great-circle routine is this good, your only error is the one from the GPS. I think this is a great achievement.
Massimo
Bruce
If it could actually be more precise, I'd be impressed...
The Propeller's sine ROM has 2^13 samples, full circle, that are 16 bits in range.
This CORDIC algorithm has about 2^32 samples, full circle, that are about 30 bits in range.
So, it's got 2^19 better phase resolution and 2^14 better range resolution.
If you think about the earth being 24,900 miles in circumference, 2^13 discrete angles resolve to 3.04 miles each, while 2^32 discrete angles resolve to 0.367 inches each.
Considering that the earth's radius is about 3,960 miles, a 16-bit range yields 319 feet per count, while a 30-bit range yields 0.448 inches per count.
Lawson
I still reckon that the earth is flat :-)
Probably Kye is also considering the ellipsoid... ;-)
Now, how high was that bridge? And the mast above the water? Oops!
Massimo
I'd like to see a graph or table that shows the improvement over using the ROM table.
The main modification is to scale up small input values to get full (or almost full) resolution.
I changed it to use PST (I don't have TV at work)
I expanded the corlut table to 1 (this may not make any difference, I don't know)
Comments welcome
Bean
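The attachment itself isn't quoted in the thread, but the scaling idea Bean describes (shift small inputs up so the iterations see nearly full-scale values, then shift the length back down) might look something like this in Spin, written against the sketch earlier in the thread. The method name and the bit-28 normalization target are assumptions, not Bean's actual code:

PUB polar_scaled(xin, yin) : hyp | s
'' like to_polar above, but pre-scales small inputs so they still get nearly full resolution
  a := 0
  x := xin
  y := yin
  s := 29 - (>| (||x #> ||y)) #> 0      'shift count that normalizes the larger magnitude up to bit 28
  x <<= s
  y <<= s
  cordic(true)                          'vectoring: a = atan2(yin,xin), x = length * 1.64676
  hyp := x >> s                         'undo the pre-scaling on the length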
You would have to replace the lookup table, start the iterations at i=1 instead of at i=0, and repeat every third iteration without changing i. The next Propeller has both circular and hyperbolic CORDIC in hardware.
Today my computer is dead with a virus, but I hope to have it reborn by noon. I will do it then and post it.
Chip
Good idea. I found through experimentation that if you provide as many sub-bits as you have whole bits, you will be lossless. For example, if you have 32-bit A,X,Y's, give them each 32 sub-bits, and not only is there no need to normalize them upwards, but they will be as accurate as possible. You cannot do any better than that.
I had also read, and verified through experimentation, that for 32-bit CORDIC variables, 27 iterations is optimal. Beyond that, you get too much truncation in the X and Y deltas, and it amounts to a few LSBs' worth of noise in the results. It just fidgets around.
It does circular and hyperbolic functions etc. It stalled because of some small detail or other; I wonder if together we can whip it into shape. It was in assembly too, so very fast, and might make a nice object. If I remember correctly it might have been 16-bit, but it could be made 32-bit or switchable.
Graham
Chip, thanks for the comments.
By "sub-bits" do you mean like a 32.32 fixed point number ? That might make more sense as it would allow fractional values to be used.
Bean
There is a PASM routine from Beau doing atan2... it could be a nice starting point.
Thanks Kye!
With the distance already done, the course is just a little step away...
Did you test it for close points (say less than a mile)?
Massimo
Thanks too, Bean, for the scaling algorithm and the PST output.
It might be possible to condense this a bit. Also, sometimes one might not want to restore the shift directly at the end. For example, if x=1 and y=1 at entry, you want an angle of 45 degrees and a hypotenuse of 1.41421; the fractional part is carried in the extra bits and would be lost by scaling back.
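Graham's PASM version isn't posted yet, but as an illustration of keeping the fraction, here is a variant of the scaling sketch from earlier in the thread that skips the final shift and returns the shift count instead (the name and scaling are assumptions, not Graham's code):

PUB polar_keep_fraction(xin, yin) : s
'' as polar_scaled, but the length is left scaled up by 2^s so the caller keeps the fraction bits;
'' with xin = yin = 1, a comes back as $2000_0000 (45 degrees) and x holds about 1.41421 * 1.64676 * 2^s
  a := 0
  x := xin
  y := yin
  s := 29 - (>| (||x #> ||y)) #> 0
  x <<= s
  y <<= s
  cordic(true)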
My code is in PASM. I will upload it later.
Graham
But, Prop 2 will have a multiply instruction, so it seems you could do it faster with an infinite series approximation there...
Or, what am I missing?
I limited myself to calculating cis(theta), i.e. sines and cosines rather than atan2 or arbitrary rotation, which means the multiply factor isn't needed (you just start with a scaled-down value for x).
Anyway, the great thing is that the SUMNC instruction comes into its own for the conditional negation, and I've managed to get the loop down to 13 instructions.
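The 13-instruction PASM loop isn't quoted in the thread, but the "start with a scaled-down x" trick looks like this in Spin terms, using the sketch from earlier in the thread (the method name is an assumption; the constant is 1/1.64676, the reciprocal of the CORDIC gain, scaled by 2^30):

PUB sincos(angle)
'' rotation mode with x pre-loaded as 1/gain, so no multiply is needed afterwards
'' on return: x = cos(angle) * $4000_0000, y = sin(angle) * $4000_0000
  a := angle
  x := $26DD_3B6A               '0.6072529 * 2^30 (rounded)
  y := 0
  cordic(false)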
Series approximations don't always converge quickly. For some functions, Tchebychev polynomials provide a reasonable compromise, but for the amount of work the Prop II would have to get the same accuracy, I think my money would still be on the CORDIC method.
You do raise an interesting point, though. For example, the transcendental function library for the IBM 1130 (a 16-bitter w/ multiply) used Tchebychev polynomials. But I don't know if CORDIC had been thought of yet (1968).
-Phil
http://www.ganssle.com/approx/approx.pdf
It can do an arbitrarily good job with just a few multiplications...
The guru for polynomial approximations is Jack Crenshaw, whose Math Toolkit for Real-Time Programming is referenced in the nice Ganssle pdf. Crenshaw goes into depth on computing one function after another, for the most part assuming there is an FPU available. One has to be struck by the many different tricks and mathematics that are involved with the unique personality of each function, and the effort to find a polynomial or an algorithm that will converge quickly. The results are very good and can certainly whip CORDIC, again provided there is an FPU available, and provided there is enough memory to hold all the unique algorithms.
And actually, multiplication with floating point is fast even without an FPU.
But, it is very slow without a hardware multiply.
I guess that's my question... Wikipedia (not that it's always right) says that CORDIC is used primarily when you don't have a hardware multiply. But, the Prop 2 will have a hardware multiply. So, why use CORDIC on Prop2?
I think I remember something about the Prop2 having special CORDIC circuitry though... Maybe that's the reason...
But, it does look like CORDIC is the best option for Prop1 to do better than the ROM table.
(Although, I do wonder if an external table, like on SD card or other flash device, would be faster...)