Some CORDIC documentation

Graham Stabler · 2006-10-09 21:22

I need to do some real time trig for a differential odometer thing I'm working on, as such I've been struggling to get to grips with the CORDIC method. I finally got it and so for my own benefit I added these comments to the Memsic object that uses the Cordic method to compute angles given x and y lengths.

I have read quite a few explanations of CORDIC and found it hard going, in my explanation I hope to explain the process before filling in the maths and going on the realization. I've probably not done a better job than anyone else but it has helped me.

I'd welcome input from beginners and experienced folk on the accuracy and clarity of what I have written. In particular I'm unsure about the bits with setting the sign in the assembly section, note the ????????????.

Cheers,

Graham

' Documented Cordic
'
' A vector from the origin to a point may be represented as X and Y co-ordinates.  And those co-ordinates
' if the vector is of unity length are equal to the sin and cosines of the angle between the x axis and the point.
'
'   x = cos(Theta)
'   y = sin(Theta)          
'
' So we can see that if we find values of x and y corresponding to a given theta we have sin and cos(theta)
' and if we find the change in theta required to reduce y to zero we have found the angle theta.  The CORDIC
' algorithm does just that.
'
'**********************************************************************************************************************
'
' --- Performing a sin(theta) or Cos(theta), general description
'
' What we have are two vectors, one is the vector we want to determine the angle of or the x and y
' co-ordinates of (to get sine and cos) and the other is our estimate vector with co-ordinates x'and y'.
'
' In the case of finding Cos or Sin(theta) we start with our estimate vector on the x axis, it is then rotated by 45
' degrees towards the vector to be measured.  45 degrees is added to the angle accumulator and the values for
' x' and y' are modified.  How do you do that? How can we work out where the end of the estimate vector is just by the
' angle when we can't do sin/cos? The trick is to choose the angles very carefully so we know what the change in x'
' and y' will be, more on that in the maths. The important word is change, we have to go thorough a process of tweaking
' the values of x' and y',  estimate by estimate until they are correct.
'
' Say for instance that the angle we are trying to get the sin/cos of is +30 degrees, our rotation of the
' estimate vector has overshot.  So the next rotation must be in the opposite direction.  How big is the rotation?
' well it is smaller than before, 26.56 degrees, seems random but its not as we will see.  So after this rotation
' the angle accumulator has 18.44 degrees in it and the values of x' and y' have been updated too, next the angle
' to be rotated is 14 degrees and the rotation direction is positive again.  Now the angle accumulator
' has 18.44+14 = 32.44 degrees, getting warmer and the values in x' and y' are getting closer to sin(theta) and
' cos(theta) too.
'
' This process continues for a fixed number of iterations, the angle of rotation gets smaller each time as the correct
' angle is homed in on.  At the end of the process you have your values of x' and y' and hence your sin(theta)
' and cos(theta) values.
'
'
'**********************************************************************************************************************
'
' --- Getting the angle - General description.
'
' You can do a very similar thing to find the angle given the x and y co-ordinates and this is what the assembly
' code below does.  To find an angle the values of x' and y' are set to those provided.  The angle accumulator
' is set to zero and a similar process ensues.  The difference in this case is that rather than aiming for a target
' vector the algorithm is trying to rotate the estimate vector to the x-axis, once it has done that the angle
' accumulator will contain the correct angle.  As before it is performed rotation by rotation, the sign of y'
' determines the direction of rotation if it becomes positive the vector must be rotated CW and if negative CCW
' to make sure it moves towards the x-axis.  So although not used directly the values of x' and y' are still
' important in deciding if to add or subtract the increasingly small rotation angles to the angle estimate.  
'
'
'**********************************************************************************************************************
' 
' --- Some maths
'
' As we know a vector can be represented by a pair of co-ordinates x,y
'
' If that vector is rotated around the origin by an angle theta the new co-ordinates of its endpoint become
'
'       x' = x cos(theta) - ySin(theta)            ' From some trig (look up on net if interested)
'       y' = y cos(theta) + xSin(theta)             
'
'       As Tan(theta) = Sin(theta)/Cos(theta) then it may be re written as
'
'       x' = cos(theta)( x - yTan(theta))
'       y' = cos(theta)( y + xTan(theta))
'
' This looks a little worse if anything, we now have a Tan multiplied by y.  If only yTan(theta) could be easily
' calculated.  Well in binary, multiplies/divides by a power of 2 (2,4,8...) can be done by bit shifting.  To multiply by 2 you
' simply shift the bits 2^1 places to the left, 0010 => 0100 voila!  Similarly you can divide with left bit shifts so if Tan(theta)
' was made a power of 2 the yTan(theta) could be done with simple shifts.
'
' As binary itself shows us you can represent a number by adding up lots of progressively smaller bits so this is what
' is done in the Cordic algorithm.  We choose Tan(theta) to be a limited set of powers of 2, to be precise we set:
'
'       Tan(theta) = 2^-i            (note: 2^-i means 1/(2^i) and  i is the iteration number,)
'
'           when i = 1 Tan(theta) = 2^-1 and theta = 45 degrees
'           when i = 2 Tan(theta) = 2^-2 and theta = 26.56 degrees
'           ....
'
' Sound familiar?  
'
' We can easily put these angles in a look up table and access them as we go along, the example below
' uses 20 of them, that's not many for some amazing accuracy.
'    
' Doing some more mathematical substitution and defining Xi as the current estimate and Xi+1 as the next
' we have:
'
'       Xi+1 = Cos(atan(2^-i)).[noparse][[/noparse]Xi - YiDi(2^-i)]
'       Yi+1 = Cos(atan(2^-i)).[noparse][[/noparse]Yi + XiDi(2^-i)]
'
' Its not as bad as it looks.  Firstly I have dropped the ' for clarity and secondly atan(2^-i) has been subbed for
' theta because that is the angle we shall be rotating by.  There is also a new variable called Di, I suppose that D
' might stand for decision as this value is determined by the direction of rotation required and may be +/-1.
'
' The whole thing can be neatened up by defining:
'
'       Ki = Cos(atan(2^-i))
'
'   Xi+1 = Ki.[noparse][[/noparse]Xi - YiDi(2^-i)]
'   Yi+1 = Ki.[noparse][[/noparse]Yi + XiDi(2^-i)]
'
' Here's the clever part, Ki does not contain anything that varies other than i, and i always does the same thing,
' it goes from 1 : n.  The thing to understand is that without Ki the estimate vector gets longer and longer and
' will end up being longer than one, that will mean that Xi and Yi will no longer represent Sine and Cosine of
' Theta.  Ki simply scales the vector to unity and it does the same thing every time you run the cordic
'
' The final effect of Ki is that the result has been multiplied by K1*K2*K3*K4 ...... Kn
'
' This may be given the name K (the cordic gain) and applied to Xi and Yi eiher after or before iterating
'  ___________________________________________________________________________________________________
'|                                                                                                    |
'| Aside: K, the Cordic gain                                                                          |
'|                                                                                                    |
'| K = Product(Ki) from K1-Kn        ' This means K1*K2*Kn...                                         |
'|                                                                                                    |
'| K = Product(Cos(atan(2^-1)))      ' Subbing for Ki                                                 |
'|                                                                                                    |
'| K = Product(1/sqrt(1+2^-2i))      ' according to some boffin                                       |
'|                                                                                                    |
'| and that becomes 0.607252935 very quickly as n gets bigger (because cos of a small angle is one).  |
' ____________________________________________________________________________________________________|

' So cutting a long story short you either multiply X and Y by 0.607252935 at the beginning or at the end if you
' care about there real lengths.

'  So now we have the more dainty equations of:
' 
'  Xi+1 = [noparse][[/noparse]Xi - YiDi(2^-i)]
'  Yi+1 = [noparse][[/noparse]Yi + XiDi(2^-i)]
'
'  A reminder on how to use them.
'
'  To find sin/cos:
'
'       1. Begin with Xi = 1 and Yi = 0 so the estimate is on the x-axis
'       2. Iterate the equation, updating the values of Xi and Yi, Increment/decrement the angle accumulator according to Di 
'          with the values in the look up table, 45, 26.5 etc. Work out the next value for D by comparing the the angle accumulator 
'          with the angle provided (is the estimate less or greater than the angle).
'       3. Repeat 2 for a given number of steps, more steps means greater accuracy.
'       4. Scale Xi and Yi by 0.607252935
'       5. And relax
'
'  To find theta given x and y:  
'
'       1. Begin with Xi = x and Yi = y , the values given.  
'       2. Iterate the equation, updating the values of Xi and Yi.  Increment/decrement the angle accumulator with the values in the
'          look up table, 45, 26.56 etc decide on Di by looking at sign of Yi in order to rotate the estimate towards the x-axis.  
'       3. Repeat 2 for a given number of steps
'       4. Relax, the angle accumulator holds the answer no need for scaling.
'
'
'******************************************************************************************************************
' And now some assembly code:
'
'
' This is code taken from the mesmic2125 object, it computes angle given x any y values.  I have added my comments.
' Comments between lines tend to be explanatory of the process, those on the lines are about the code and may help
' those new to assembly (like me) 

DAT                     'cx and cy are the supplied variables.
                        'ca will hold the eventual answer


cordic                  ' The initial vector could be in any of the 4 quadrants, the basic algorithm will only work for angles between
                        ' +/- 90 degrees but you want to be able to get an answer in any instance.  You also want to ensure that the final
                        ' answer is between 360 degrees. 
                        '
                        ' The number system for angle is carefully chosen, it varies from 0-360 or rather it varies from 0-0, as it overflows
                        ' back to 0 when full.  This means that if your starting vector is in the 4th quadrant (down and right) and is say
                        ' 10 degrees from the x-axis if the angle is set to zero degrees at the start then as the estimate approaches the
                        ' x-axis the angle becomes 340.
                        ' ________________________________________________________________________________________________________________
                        '|                                                                                                                |
                        '| Why?  Its 2s-complement,                                                                                       |
                        '|                                                                                                                |
                        '| take an 8 bit number 00000000 then take one from it you get 11111110 which if you forget about the fact it     |
                        '| should be neagitive is 254 or more importantly 255-1                                                           |
                        '|                                                                                                                |
                        '| As a further note, in assembly the program has to keep track of whether a number is supposed to be signed      |
                        '| or not.  e.g. the first line of this code checks for negativeness (wc)so we must assume cx is being used as a  |
                        '| signed variable.                                                                                               |
                        '|________________________________________________________________________________________________________________|

                        ' If the vector is in the 1st quadrant the angle is just added as the estimate vector moves towards the x-axis.
                                                        '
                        ' If in the 3rd quadrant (left and down) the angle is set to 180 at the start and the vector flipped to the 1st quadrant.
                        ' This means that the final angle is 180 + the swept angle as it should be.  
                        '
                        ' Finally for the 2nd quadrant the angle is set to 180 at the start and the vector flipped to the 4th quadrant the final result 
                        ' becomes 180-swept angle.
                        '

                        ' Make cx positve as the algoritm always operates in quadrants 1 and 2.
                        ' This makes Q2=>Q1 and Q3=>Q4
                        abs     cx,cx           wc      ' take the absolute of cx
                        ' If was in Q3 or Q2 flip about x-axis
                        ' So Q2=>Q4 and Q3=>Q1 overall.
        if_c            neg     cy,cy                   ' negate cy if cx was negative

                        ' Decide on starting angle, 0 or 180
                        mov     ca,#0                   ' load 0 into ca, the estimated angle
                        rcr     ca,#1                   ' Load with 180 if was on left                       

                        movs    :lookup,#table          ' Puts address of table into sumc command
                        mov     t1,#0                   ' Initialize t1 (angle shift 2^-t)
                        mov     t2,#20                  ' Initialize t2 (number of interations)

                        'The loop
:loop                   'First calculate dX(i+1) => Yi.2^-t
                        mov     dx,cy           wc      ' Load dx with cy cary is set if negative
                        sar     dx,t1                   ' Shifts dx by t1 while preserving sign
                        'Then  calculate  dY(i+1) => Xi.2^-t
                        mov     dy,cx                   ' Load dy with cx
                        sar     dy,t1                   ' Shifts dy by t1 while preserving sign
                        'Now add to the estimate with sign dependant on sign of Yi
                        sumc    cx,dx                   ' Add -dx to cx if c=1 or dx if c=0 
                        sumnc   cy,dy                   ' Add -dy to cx if c=0 or dy if c=1
                        'Next add or subract the specified rotation from the angle accumulator   
:lookup                 sumc    ca,table                ' Add -table to ca if c=1 or table if c=0
                                                        ' Remembering that the source field of the sumc command "table"
                                                        ' is being over written to address different elements
                                                        ' of the table below 
                        add     :lookup,#1              ' Increments the destination of lookup sumc command
                        add     t1,#1                   ' increment t1, this is i
                        djnz    t2,#:loop               ' decrement t2, keep looping if not zero.

cordic_ret              ret                             ' return to rest of program (whatever it is)

'Table of atan(2^-i)


table                   long    $20000000        'atan(1)   = 45    degrees
                        long    $12E4051E        'atan(1/2) = 26.56 degrees
                        long    $09FB385B        'atan(1/4) 
                        long    $051111D4        'atan(1/8)
                        long    $028B0D43        'atan(1/16)
                        long    $0145D7E1        'atan(1/32)
                        long    $00A2F61E        'atan(1/64)
                        long    $00517C55        'atan(1/128)
                        long    $0028BE53        'atan(1/256)
                        long    $00145F2F        'atan(1/512)
                        long    $000A2F98        'atan(1/1024)
                        long    $000517CC        'atan(1/2048)
                        long    $00028BE6        'atan(1/4096)
                        long    $000145F3        'atan(1/8192)
                        long    $0000A2FA        'atan(1/16384)
                        long    $0000517D        'atan(1/32768)
                        long    $000028BE        'atan(1/65536)
                        long    $0000145F        'atan(1/131072)
                        long    $00000A30        'atan(1/262144)
                        long    $00000518        'atan(1/524288) = 1e-4 degrees

' Uninitialized data

t1                      res     1
t2                      res     1

dx                      res     1
dy                      res     1
cx                      res     1
cy                      res     1
ca                      res     1

Post Edited (Graham Stabler) : 10/10/2006 7:02:10 PM GMT

Beau Schwabe · 2006-10-09 22:21

Graham Stabler,

Great job!

When I wrote the Memsic object I was at the Parallax office in Rocklin, and had Chip over my shoulder helping me
out.· Originally I had an object that used a reverse lookup table deriving Arcsine from of the existing Sine table in
ROM.· Although it worked it was not as fast, and didn't have the precision of Chip's CORDIC routine.· At the time
he happened to be using the CORDIC stuff for something else he was working on, so it was literally a cut-n-paste
from his object into mine with only a few minor tweaks here and there.· As for a detailed explanation on how the
CORDIC routine works, understand it a little.
·
·

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Beau Schwabe

IC Layout Engineer
Parallax, Inc.

Tracy Allen · 2006-10-10 04:53

Great job, Graham, seconded! Thanks for posting this.

I'll add a couple of comments.

1) This statement regarding the cosine term,

> At each stage of the process k does a known and consistant thing to the result.
> This allows K to be taken out of process and just applied at the end as a scaling factor.
> As it happens it very quickly converges to 0.607252935"

It is not k by itself, but the _product_ of terms K(i) that quickly converges:

{k(0) * k(1) * k(2) * ... * k(n-1) --> 0.607252935}

It converges quickly, because cosine approaches unity as the rotation angles get smaller and smaller.

Also, there is a central insight behind CORDIC is is fuzzy there. The rotation for each angle can be either + or -, and the cosine term also contains the factor, (D) for decision, but this is the insight...

k = Cos(Di * atan(2^-i))    <--- note the Di is included!

but from elementary trig...
cosine(theta)=cosine(-theta).  

k = Cos(atan(2^-i))    <--- therefore the Di can be dropped

2) In the item, "To find theta, given x and y". It is true that there is no need for scaling of theta when the algorithm is finished. But there is a "furthermore". At the end, when the vector has been rotated to the x axis, the value cx now contains the magnitude of the original vector, which may be also be of interest. That needs to be scaled by the CORDIC gain.

3) It is really instructions like sumc and sumnc that set the Propeller apart from other processors. In one step, it accomplishes one of the core operations of CORDIC, to speedily add or subtract conditonal on a flag. My hat is off to Chip! He is always thinking ahead!

4) You asked about the business with the

rcr     ca,#1                   ' sets sign bit if cx was negative ??????????????

I don't quite understand it either, but I am sure it is very clever and there will be an aha!

Note that the value Chip has chosen to represent 45 degrees is $2000_0000. I say "chosen", because, other things being equal, that is indeed an arbitrary choice. You might want to choose decimal 45_000_000 to represent 45 degrees, and to construct the table, you would simply enter that starting number into excel and have it pop out the values of 45_000_000 * atan(2^-i) for i=0 to 31. It is simply a matter of scale and the CORDIC will work just fine at any scale. This table is constructed with the values, $2000_0000 * atan(2^-i). The full circle is 8 time 45 degrees, the full circle $ffff_ffff, or +/- $7fff_ffff. That of course gives the best possible angular resolution.

Now for my take on the question. The CORDIC algorithm has to be restricted to work with angles in the range of -90 to +90 degrees, or +/- $3fff_ffff. By priming the value of angle (ca) with the sign bit, the additions and subtractions of the algorithm are relected through the origin into the left half plane. These things always take a tight twist of logic! If the sign bit is set, then ca starts at $8000_0000, which is 180 degrees, or -180 degrees if you wish. Positive rotations add a positive value to that, say a 45 degrees moves to $A000_0000, which is 225 degrees in quadrant 3, or equivalently - 135 degrees. And a negative rotations move into quadrant 2.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Tracy Allen
www.emesystems.com

cgracey · 2006-10-10 07:10

Graham, Beau, and Tracy:

I've found that CORDIC scaling is best performed BEFORE the rotation so that the results never overflow the unit circle. This means multiplying X (and maybe Y) beforehand by the constant 0.607252935 that Tracy pointed out. This is, of course, a pain in the rear. In 32-bit quality, that number in binary is:

%0.1001101101110100_1110100110101000 ("_" separates groups of 16 bits)

For 16-bit quality, this would round up to:

%0.1001101101110101

This would take ~50 instruction cycles for a looped multiplier to compute.

There's a faster way, though. That sequence of 1's and 0's can be searched for sub-patterns which might have exploitable symmetries. I looked at that sequence and found a few sub-patterns which repeated, but %10001 was the most agreable since it occurs four times at equal relative offsets, allowing those four occurrences to be·summed in·two folded operations. The last occurrence requires individual attention since it·didn't fit into the first four and was at a different relative offset, anyway:

······· 0.1001101101110101·· <---·CORDIC voodoo·scaling term
·······

+ >>0···· 10001 ^^ ^^^ ^ ^·· <--- made up of goblins
+ >>3······· 10001 ^^^ ^ ^
+ >>6·········· 10001^ ^ ^
+ >>9 ············ 10001 ^
+ >>11·············· 10001·· <--- orphan

Here's some code that will scale·X by %0.1001101101110101 (0.607254) in just 12 instructions:

··············· sar···· x,#1··········· 'get %0.10001 of x
··············· mov···· t1,x··········· '(····· 1··· )
··············· sar···· t1,#4·········· '(+········ 1)
··············· add···· x,t1

··············· mov···· t1,x··········· 'get %0.10011001 of x
··············· sar···· t1,#3·········· '( ······· 10001)
··············· add···· t1,x··········· '(+·····10001·· )

··············· mov···· t2,t1·········· 'get %0.10011011011001 of x
··············· sar···· t2,#6·········· '( ·········· 10011001)
··············· add···· t2,t1·········· '(+·····10011001····· )

··············· sar···· x,#11·········· 'get %0.1001101101110101 of x
··············· add···· x,t2··········· '( ··············· 10001)
······································· '(+···· 10011011011001· )

·

{k(0) * k(1) * k(2) * ... * k(n-1) --> 0.607252935} must be one of those cosmically special numbers!

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔

Chip Gracey
Parallax, Inc.

Post Edited (Chip Gracey (Parallax)) : 10/10/2006 8:15:29 AM GMT

Graham Stabler · 2006-10-10 08:58

Tracey,

1) you are right about the k thing, it is K not k that converges, I'll correct that, bad wording.

2) As for the d in the cosine, if you define k as not having d in it i.e. you seperate the angle from its sign you get the same result and don't have to consider the symetry or lack of it. I've seen explanations on the web that mention the symmetry and those that don't. Not sure if what I have done is really wrong.

3) Interesting about the length of the vector being in Cx, I'll add that info.

4) There is nothing random about Chips choice for 45 degrees, if you multiply $2000_0000 by 8 which should give 360 degrees you get $1_0000_0000 which in 32bits is zero, and 360 and zero are the same thing. This provides I assume the maximum potential for resolution? Oh I think you said that.

5) I'm going to work out this Ca thing, I think you are probably right, I'll come back with more info.

Graham Stabler · 2006-10-10 09:05

Chip,

1) Thanks, I think [noparse]:)[/noparse] Now some new stuff to get my head around! I read it was more convenient to pre-scale so that the vector becomes of unit length at the end but as I didn't know how to do it, I went for the easy option in my explanation. I might add a part 2, a more verbose description of what you have explained and an example that calculates the cin and cosines of a given angle.

2) In a recent thread I asked about sources of clever tricks with binary, this is definitely one, can you give me some insight on where you learnt all this? And yes I know you were born with a microprocessor in your hand [noparse]:)[/noparse]

Graham

Graham Stabler · 2006-10-10 12:14

I have updated the description of K and have added more description to the assembly now I think I finally get it.

Chip, PLEASE check it!

Graham

Graham Stabler · 2006-10-10 12:20

Chip/Tracey,

If you want to calculate the Sin of an angle you start with the estimate vector on the x axis and give it unit length, isn't multiplying 1 by 0.607252935 even easier [noparse]:)[/noparse]

The only time you need to do the multiplication is if you are trying to work out the vector length and angle given x and y, so you are finding sqrt(x^2 + y^2)

is that correct?

Graham

Graham Stabler · 2006-10-10 13:05

Corrected which quadrant it which.

cgracey · 2006-10-10 17:16

Graham,

My incomplete view of the CORDIC algorithm is that it is good for two basic things:

1) Rotating an (X,Y) point around the origin by driving the desired angle towards 0.

····· a) Initializing X to Ro, Y to 0, and angle to Theta is useful for polar to cartesian conversion

···· ·b) Initializing X to 0.6072539, Y to 0, and setting angle is useful for COS/SIN generation - you get both at once!

····· c) Simply rotating (X,Y) by an angle is often useful for graphics and signal synthesis.

2) Clearing the angle and then rotating·(X,Y)·by driving Y towards 0 yields the original (X,Y) angle and length in X

····· a) This is effectively cartesian to polar conversion

Depending on·your needs, the algorithm can be simplified around the setup and takedown code which goes·before and after the rotation code. The·rotation code is always the same in function, though the the decision point·may be·either the sign of the angle or the sign of Y. You can make a loop or straight-line code it.

I've read that the CORDIC algorithm can also be used·for log conversions. I imagine the main difference would be replacing the lookup table with some other values. Tracy could probably figure out how to do this.

As far as numerical tricks go to simplify computations: they always revolve around the commutative and associative properties of addition, subtraction, multiplication, and division (which are the building blocks of integrals, differentials, and everything higher). Programming real-time signal-processing applications is so much fun because you're building a functional puzzle that must be sub-solved many times along the way to completion. It's always about putting more stuff in, and then finding ways to take·almost all of it back out, so that just a few instructions remain to perform the essential work.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔

Chip Gracey
Parallax, Inc.

Post Edited (Chip Gracey (Parallax)) : 10/10/2006 7:44:21 PM GMT

Bean · 2006-10-10 17:56

Graham,
· Very nice. But I do have one question. In the text at the top you have:

'       As Tan(theta) = Sin(theta)/Cos(theta) then it may be re written as
'
'       x' = cos(theta)( x - yTan(theta))
'       y' = cos(theta)( x - yTan(theta))

I can't see how X' and y' can be equal to the exact same formula ?

Bean.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Cheap used 4-digit LED display with driver IC·www.hc4led.com

Low power SD Data Logger www.sddatalogger.com
SX-Video Display Modules www.sxvm.com

Don't mistake experience for intelligence. And vis-vera.
·

Graham Stabler · 2006-10-10 19:10

Bean, that was the special 45degree only version [noparse]:)[/noparse]

Now corrected and thanks for taking the time to read!

Thanks Chip, I now have an almost complete version of the code that will provide Sin(theta) and Cos(theta) (at the same time yeh!), I'm just debugging it. Learning the binary tricks is good fun but it can be mind bending decoding other peoples clever code.

I think I may need that divide routine after all as I am doing some resolving, I think my vector length will need to be multiplied by 0.65.... So I can end up with r.cos(theta)

Cheers

Graham

Tracy Allen · 2006-10-10 19:45

In the Stamp version of the CORDIC arctangent, www.emesys.com/BS2mathC.htm, I had to incorporate a wrapper to prescale numbers up to larger values. For example, if x=3 and y=3, the angle is should obviously be 45 degrees and the length sqr(6)=2.449489742783, but the algorithm would not converge well, unless the small x and y were both multiplied by a contant factor such as 2^10 * 0.607252935. You can't effectively prescale a small integer by a fraction like 0.607252935! By scaling both x and y by the same factor, the angle is unaltered, and the length scales by the same factor, and the scale factor can be compensated at the end. As I write this it's not obvious how the memsic Prop algorithm above wiil respond to small integers. Having 32 bits of resolution should converge very close to the correct angle, within
long $00000518 'atan(1/524288) = 1e-4 degrees
But I think a prescaling wrapper would be necessary for precision with "small" numbers on SQR(x^2 + y^2).

I didn't learn anything about CORDIC until I had an email exchange with Al Williams about his math coprocessor. (Where is Al these days? -- I haven't seen him on the forums for a long time). That was when I got interested in implementing (maybe _kluging_ would be a better description) it on the BASIC Stamp. Ray Adraka's article is the best I have seen for an overview of what can be accomplished with CORDIC and a succinct treatment of the procedures. The underlying machinery is the same. For exmple, to get to log and exponential, the table contains the values of the hyperbolic trig funcitons instead of the ordinary trig functions. As for creative use in signal processing, I think Chip has it down, bar none. Wow, that quick algorithm for multiplying by 0.607252935 is cool! I was wondering too how you came up with that decomposition.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Tracy Allen
www.emesystems.com

Bean · 2006-10-10 20:59

When I wrote an ATAN CORDIC program in SX/B, I scaled the inputs according to:
If either input (X or Y) was > 16384 then both were divided by 2.
Then...
If both inputs are < 4096 then both are multiplied by 2.

Here is the code if anyone is interested...

Bean.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Cheap used 4-digit LED display with driver IC·www.hc4led.com

Low power SD Data Logger www.sddatalogger.com
SX-Video Display Modules www.sxvm.com

Don't mistake experience for intelligence. And vis-vera.

Post Edited (Bean (Hitt Consulting)) : 10/10/2006 9:04:09 PM GMT

cgracey · 2006-10-11 00:43

Tracy Allen said...
Wow, that quick algorithm for multiplying by 0.607252935 is cool! I was wondering too how you came up with that decomposition.

I just drew the pattern onto paper and started crossing out groups of bits. I was surprised to find that a couple of sub-patterns emerged right away. I chose %10001 because of the 4 occurrences at constant relative offsets. This makes me think that a searcher could be made which would scan a target multiplier pattern and come up with minimized solutions. Aside from adding, there's always subtracting, too. I suppose there are hundreds of solutions to some patterns. I wonder if any optimizing compilers ever bothered to do this kind of thing. Maybe even non-linear formulae could be realized, as well. This subject fascinates me. One the one hand you have some mathematical objective, while on the other you have a processor capable of doing only certain things. How to connect the two most efficiently is the challenge.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔

Chip Gracey
Parallax, Inc.

Post Edited (Chip Gracey (Parallax)) : 10/11/2006 2:58:02 AM GMT

Tracy Allen · 2006-10-11 00:58

Here is an Excel spreadsheet that underscores how rapidly the cordic gain converges to 0.607252935.

I    cos(atn(1/2^I))    product
0    0.70710678118655    0.70710678118655
1    0.89442719099992    0.63245553203368
2    0.97014250014533    0.61357199107790
3    0.99227787671367    0.60883391251775
4    0.99805257848289    0.60764825625617
5    0.99951207608708    0.60735177014130
6    0.99987795203470    0.60727764409353
7    0.99996948381879    0.60725911229889
8    0.99999237069278    0.60725447933256
9    0.99999809265682    0.60725332108988
10    0.99999952316318    0.60725303152913
11    0.99999988079073    0.60725295913895
12    0.99999997019768    0.60725294104140
13    0.99999999254942    0.60725293651701
14    0.99999999813736    0.60725293538591
15    0.99999999953434    0.60725293510314
16    0.99999999988359    0.60725293503245
17    0.99999999997090    0.60725293501477
18    0.99999999999272    0.60725293501035
19    0.99999999999818    0.60725293500925
20    0.99999999999955    0.60725293500897
21    0.99999999999989    0.60725293500890
22    0.99999999999997    0.60725293500889
23    0.99999999999999    0.60725293500888
24    1.00000000000000    0.60725293500888
25    1.00000000000000    0.60725293500888
26    1.00000000000000    0.60725293500888
27    1.00000000000000    0.60725293500888
28    1.00000000000000    0.60725293500888
29    1.00000000000000    0.60725293500888
30    1.00000000000000    0.60725293500888
31    1.00000000000000    0.60725293500888

The term cos(atn(1/2^n)) in the product can be re-written as follows, eliminating the cos and atn.

attachment.php?attachmentid=43567

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Tracy Allen
www.emesystems.com

Post Edited (Tracy Allen) : 10/11/2006 1:07:45 AM GMT

Phil Pilgrim (PhiPi) · 2006-10-11 04:30

Chip,

Automating the decomposition process intrigues me, too. As it turns out, it may not be that difficult — at least for 16-bit word sizes. The reason I say that is that the search space is fairly small, as combinatorial problems go, making an exhaustive search feasible.

Anyway, for a w-wide pattern of n ones, and w - n zeroes, there are comb(n, w) = n! / ((n-w)! w!) possible combinations. At each of the 17 - w shift positions this pattern can take, we have three choices: add, subtract, or do nothing. So there are 3 ** (17 - w) different ways to handle each pattern. Therefore, the total search space for a w wide pattern of n one bits is

····comb(n, w) × 3^{17 - w}

Summing this over a pattern width of n to 16 yields the following for each n from 2 to 8:

[u]n | Search Size[/u]
2 |  48,427,344
3 |  24,212,652
4 |  12,102,756
5 |   6,042,096
6 |   3,002,484
7 |   1,472,070
8 |     699,570

For any given pattern and any number of adds/subtracts of that pattern, it's easy to compute both the resulting coefficient (to see if it's even correct) and, if it is, the number of machine instructions (the "cost") required to perform the computation. Then it becomes a cost-minimization problem over the entire problem space.

Okay, so checking 90 million different algorithms will probably take awhile. Whether or not to apply heuristics to decrease the search space will depend on how many times you have to do it, I guess. If I only had to do it once, I'd go for the exhaustive search. If that took too long and I had to do it more than once, I might look for a cleverer approach.

Unfortunately, the cost function is by no means continuous over the search domain. So gradient search methods are out. One could use genetic algorithms, but they don't guarantee the best solution, only a good one.

Maybe I'll try the exhaustive technique, just to see how long it takes.

-Phil

Post Edited (Phil Pilgrim (PhiPi)) : 10/11/2006 4:40:14 AM GMT

cgracey · 2006-10-11 06:21

Phil Pilgrim (PhiPi) said...

Automating the decomposition process intrigues me, too. As it turns out, it may not be that difficult — at least for 16-bit word sizes. The reason I say that is that the search space is fairly small, as combinatorial problems go, making an exhaustive search feasible...

...For any given pattern and any number of adds/subtracts of that pattern, it's easy to compute both the resulting coefficient (to see if it's even correct) and, if it is, the number of machine instructions (the "cost") required to perform the computation. Then it becomes a cost-minimization problem over the entire problem space.

Phil,

This would work for multiplies and divides and it would be very handy. I'm anxious to hear all about anything you come up with.

I've daydreamed for a long time about some program that would help you write code for math functions, but not necessarily by knowing anything about math. It would·just simulate random instruction groupings in order to·arrive at·some desired characteristic. Say you want a random number generator with a certain distribution, or you need something to approximate a root only to a certain quality, or you want some fuzzy filter, etc. It would be like some kind of chemistry set for assembly coding. You could use it to synthesize odd little 5-instruction sequences or maybe even whole systems, depending on your behavioral criteria. I bet it would come up with stuff that would completely defy analysis, but (apparently) work fine. Maybe all that is practical is something that can deal with up to 8-instruction sequences, or so, but even that would be educational to play with. It could teach us some things.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔

Chip Gracey
Parallax, Inc.

Phil Pilgrim (PhiPi) · 2006-10-11 09:01

Chip,

I remember reading about some guys in England who did that with an FPGA. They "trained" it, using a genetic algorithm, to act as a frequency discriminator. It worked brilliantly, and with a minimum of gates, even though the resulting gate assemblage made no intutive sense.

The "multiply decomposition" problem, and others like it, would be an interesting arena for trying a GA, since the "alleles" could be limited to a small subset of the available instructions, i.e. shifts, adds, subtracts, Booleans, etc. Ideally, the search space would be representated in such a way that small structural changes (resulting from crossover and mutation, say) would entail small behaviorial changes. This is almost never the case with digital computer instructions, though, where changing even a single bit can result in wildly different behavior. But, at least, eliminating the worst offenders (e.g. jumps and calls) can help to regularize the search domain and "smooth" the resulting utility (cost) function.

-Phil

James Long · 2006-10-11 14:41

I hate to break in here with pure ignorance........I don't remember CORDIC in any of my past math.....although that was a long time ago.

What is CORDIC used for? A simple explaination would be great.·I'm just try to get this in the soft fleshy part contained in the skull.

James L

Phillip Y. · 2006-10-11 15:08

I remember an article in the HP journal about the new HP35 calculator and that used CORDIC functions,
the function for TAN did not need to be scaled because the factor canceled out or reduced to below the 10 digit resolution,
instead of calculating each binary digit it was done in decimal one digit at a time.

I also reviewed an article buy someone who built a prototype CORDIC IC for the DOD (wo a multiplier),
the author mentioned that he used a trick he called "servo feedback" during the rotation calculations
to eliminate the error/scale factor, I think this amounted to combining/doing both calculations simultaneously.

The articles are buried in my archives.

Mike Green · 2006-10-11 15:10

I can't speak about earlier uses (probably military), but HP used CORDIC algorithms in their ground-breaking programmable calculators.
These were serial BCD shift register processors with maybe 16 digits in a register, could add, subtract, and compare, had maybe 16 to 32
words of memory (16 digits each) originally with enough masked ROM for the algorithms. The CORDIC algorithms allowed them to provide
all of the trig and log functions to high precision.

Tracy Allen · 2006-10-11 15:18

James,

If you want a good introduction to CORDIC, download the classic article from Ray Andraka's site. www.andraka.com/cordic.htm.

I doubt if you learned it in school, because it is not usually taught, which to me seems very strange.

It is the basis for calculation of the trig and transcentental functions and inverses in practically every hand calculator. It is a very fast O(n) successive approximation algorithm that is well suited to small integer processors or FPGAs. It has its origins in navigation computers for avionics, where the trig functions and inverses have to be computed in real time. It has exotic applications in signal processing and filtering. I've heard speculation that mathematicians at the NSA have figured out ways to break encryption codes using CORDIC methods, but that would be a stretch.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Tracy Allen
www.emesystems.com

Graham Stabler · 2006-10-11 15:43

James, in case the above reference is a bit heavy for you CORDIC is used to work out Sin, cos, arcsin, arccos, tan, arctan and logs as well. It does this very quickly using a very small look up table and needs only bit shifts and additions.

The above algorithm takes an X and y co-ordinate and provides an angle accurate to about .0004 degrees in only 20 iterations and it doesn't have to do much in each iteration.

Gaham

Tracy Allen · 2006-10-11 15:48

It's always worthwhile to study Chip's voodoo code, to see how he exorcises the goblins.

Here is a reinterpretation, a graphic decomposition of the cordic gain he posted yesterday.

attachment.php?attachmentid=43577

The critical observation is that the factor 17=%10001 is repeated. (Phil and Chip, I'd be very interested too, to see how this process of decomposition, to discover those repeating factors, could be automated!?)

Chips' algorithm effectively applies parentheses around the groupings as in the bottom row, so it computes x * 17/(2^5), and then uses that grouping "a" to compute super grouping "b", and then uses supergrouping "b" another time, and finally adds in the orphan at the end. Here is Chip's pasm code to do that. Due to the folding, it takes 12 instructions and 12 clock cycles to multiply x times the prescaler.

                sar     x,#1            'get %0.10001 of x
                mov     t1,x            '(      1    )
                sar     t1,#4           '(+         1)
                add     x,t1

                mov     t1,x            'get %0.10011001 of x
                sar     t1,#3           '(         10001)
                add     t1,x            '(+     10001   )

                mov     t2,t1           'get %0.10011011011001 of x
                sar     t2,#6           '(            10011001)
                add     t2,t1           '(+     10011001      )

                sar     x,#11           'get %0.1001101101110101 of x
                add     x,t2            '(                 10001)
                                        '(+     10011011011001  )

Just to see how it would work out, I tried it with the parentheses and grouping as in the second line. That computes x*17/(2^5) as grouping a, but then repeatedly shifts and adds by 2^3, and then finally by 2^2 at the end. It uses only 11 instructions, but because of the loop, it takes 18 clock cycles instead of Chip's 12. (caveat, I'm not sure if my untried code is corrrect, and I'd appreciate comment if anyone spots an error.)

                sar     x,#1            'get %0.10001 of x
                mov     t1,x            '(      1    )
                sar     t1,#4           '(+         1)
                add     x,t1

                mov     t1,x            ' x* 17/(2^5)
                mov   t2,#3            ' going to repeat 3 times

loop           sar     x,#3           ' add factor, x*17/(2^n), for n=8,11,14
                add    x,t1
                djnz    t2,#loop     ' one clock cycle for loop back, 2 for fall through

                sar    x,#2
                add     t1,x            ' add factor 17/(2^16)

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Tracy Allen
www.emesystems.com

Post Edited (Tracy Allen) : 10/11/2006 3:53:42 PM GMT

cgracey · 2006-10-11 16:39

James Long said...

What is CORDIC used for? A simple explaination would be great.·I'm just try to get this in the soft fleshy part contained in the skull.

I remember reading that the CORDIC algorithm was developed in the 1950's to serve as·a practical in-flight navigation "computer".·The acronym stands for Coordinate Rotation DIgital Computer.
It makes me wonder how many other such ingenious algorithms may be possible, but haven't been discovered.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔

Chip Gracey
Parallax, Inc.

Harley · 2006-10-11 17:38

CORDIC! Hadn't seen that term for decades.

Back in the mid-to-late '60s (yes, way past 'sell by date'), I worked on a military project for Missile Battery equipment. CORDIC was used by the discrete F/Fs and gates computer (BC = before computers or chips). Somewhere I may still have a document describing CORDIC. Seemed complex. Was used to correct for the non-flat earth when sites are distant from each other; their reference planes were tilted. Target data had to remove that 'aiming' error.

One of those 'been there, seen that' situations. Good to know CORDIC is still useful.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Harley Shanko
h.a.s. designn

Bean · 2006-10-11 18:48

Here is a document that does a decent job of explaining CORDIC atan/hyp algorithm.

http://www.gmw.com/magnetic_measurements/Sentron/sensors/documents/AN_125KIT_manual.pdf#search=%22an125_kit%20cordic%22

Bean.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Cheap used 4-digit LED display with driver IC·www.hc4led.com

Low power SD Data Logger www.sddatalogger.com
SX-Video Display Modules www.sxvm.com

Don't mistake experience for intelligence. And vis-vera.

Post Edited (Bean (Hitt Consulting)) : 10/11/2006 6:51:54 PM GMT

Graham Stabler · 2006-10-11 19:30

Race you!

Cordic FFT

Tracy Allen · 2006-10-11 20:12

This is one of those of those topics that needs to be revisited over and over again and from different approaches. Thank you for those references.

Here are some more:

www.andraka.com/cordic.htm
Ray Andraka Consulting
By far the most concise yet thorough discussion of the subject I know of. Download the pdf file, "Survey of CORDIC Algorithms for FPGAs".

www.dspguru.com/info/faqs/cordic2.htm
Grant R. Griffin dspGURU.
An informative FAQ, with additional links to the online and printed literature.

www.ee.ualberta.ca/courses/ee401/microboard/cordic_CCink.html
Ingo Cyliax
CORDIC swiss army knife for computing math functions.. An article from Circuit Cellar INK. It even includes a program for calculating sine and cosine on the BASIC Stamp II. A different
approach from the one taken here.

www.ddj.com/documents/s=1070/ddj9010f/9010f.htm
Pitts Jarvis
IMPLEMENTING CORDIC ALGORITHMS A single compact routine for computing transcendental functions. A tutorial from Dr. Dobbs Journal, October 1990.

people.csail.mit.edu/hqm/imode/fplib/cordic_code.html
www.math.utep.edu/Faculty/helmut/presentations/1997/cordic/wcordic.html
Helmut Knaust
How Do Calculators Calculate? A tutorial from University of Texas.

Volder, J.E., 1959, The CORDIC Trigonometric Computing Technique,
IRE Transactions on Electronic Computers, V. EC-8, No. 3, pp. 330-334
seminal paper

www.math.ucl.ac.be/%7Emagnus/num1a/cordic.txt
Vladimir BaykovInteresting newsgroup posting by Vladimir BAYKOV

www.cnmat.berkeley.edu/%7Enorbert/cordic/node3.html
Norbert Lindlbauer, Berkeley Center for New MusicDiscussion of CORDIC architectures with a view toward music synthesizer

www.emesys.com/BS2mathC.htm
Some tutorial on theory and code for sin/cos and for atn/hyp on the BASIC Stamp.
http://forums.parallax.com/showthread.php?p=563729
A discussion with Peter Verkaik about prescaling and about cordic for the Javalin Stamp.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Tracy Allen
www.emesystems.com

Graham Stabler · 2006-10-11 20:36

Tracy, I should have said earlier how much help your site was to me and I certainly have not finished reading!

I liked your spreadsheet showing the convergence of k so I went a step further and have produced a spreadsheet that will calculate SIN and COS of theta using CORDIC, I didn't do it out of philanthropy but because my damn program keeps cocking it up and I want to know when and why!

Find attached, the 360=>32bit column computes the decimal number the angle will appear if I output it to the TV (for debug), I need a similar thing for Xi and Yi possibly.

Graham