CORDIC vs BKM
Mark Swann
Posts: 124
All,
I have seen at least one thread discussing, in depth, the use of CORDIC algorithms in the Propeller. I have not used CORDIC algorithms, but I do have a need, so I did some research and discovered a newer, and supposedly better, BKM algorithm. Has anyone explored this algorithm? How would its implementation in the Propeller differ? Is it better? Any thoughts are appreciated.
Thanks
Comments
Graham
It claims to be simpler, but needs more table space for logarithms. The Prop has a log table. Maybe BKM can use that, instead of its own table.
Is the lack of need for a result scaling factor really an issue?
As far as scaling goes, with CORDIC you sometimes need to scale and other times not; generally, if you care about the length of the vector produced at the end, you do need to scale. Chip wrote a neat optimized multiplication routine to do this, but it still has to be done and takes time.
I'm going to have a read of that paper, it looks scary so wish me luck.
Graham
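To make the scaling point above concrete, here is a minimal Python sketch (not Propeller code) of rotation-mode CORDIC. It shows the gain K that the optimized multiply has to correct for; the iteration count of 32 is just illustrative:

```python
import math

def cordic_rotate(x, y, angle, iters=32):
    """Rotate (x, y) by `angle` radians with shift-add CORDIC.
    The raw result comes out scaled by the CORDIC gain K."""
    for i in range(iters):
        d = 1 if angle >= 0 else -1
        # Each stage is only shifts and adds in fixed-point hardware.
        x, y = x - d * y * 2.0**-i, y + d * x * 2.0**-i
        angle -= d * math.atan(2.0**-i)
    return x, y

# K = product of sqrt(1 + 2^-2i); 1/K is the ~0.607252935 factor.
K = math.prod(math.sqrt(1 + 4.0**-i) for i in range(32))

x, y = cordic_rotate(1.0, 0.0, math.pi / 6)
print(x / K, y / K)  # after scaling, ~= (cos 30deg, sin 30deg)
```

Without the final division by K (or an equivalent in-loop correction), the output vector is about 1.6468 times too long.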
I looked at the .pdf, but it overwhelmed me pretty quickly. Paul Baker stands a much better chance of figuring it out, and so do you. The big question I have is: how big are the lookup tables required? This is the make/break issue, I think.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Chip Gracey
Parallax, Inc.
Quoted from the PDF: "BKM needs the storage of (9*p)/2 constants, while CORDIC needs the storage of p constants."
Mark
The algorithm itself is based on a generalization of the series log(1 + 1/(2^k)). That is the basis of some real-valued shift-add algorithms for log and power functions; for example, he references the Briggs algorithm, which is described in more detail in chapter 20 of Arndt's book. But in this case, the decision variable is a complex number chosen from the set d: {1, 0, -1, -i, i, i-1, i+1, -1-i, -i+1}, and the values of log(1 + d/(2^k)) and the other functions of d are thus also complex-valued. That gets pretty thick, pretty fast!
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Tracy Allen
www.emesystems.com
http://en.wikipedia.org/wiki/Signed-digit_representation
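For readers who haven't met the real-valued shift-add idea mentioned above, here is a hedged Python sketch of a Briggs-style logarithm: the only stored constants are ln(1 + 2^-k), and each iteration costs one shift and one add. The table size and input range are illustrative choices, not taken from the BKM paper:

```python
import math

# Table of ln(1 + 2^-k) -- the only constants the method stores.
TABLE = [math.log(1 + 2.0**-k) for k in range(1, 33)]

def shift_add_log(t):
    """Approximate ln(t) for t in [1, 2) by greedily factoring t
    into terms of the form (1 + 2^-k), Briggs-style."""
    p, result = 1.0, 0.0
    for k in range(1, 33):
        q = p + p * 2.0**-k        # p * (1 + 2^-k): shift and add
        if q <= t:                 # keep the factor if we don't overshoot
            p = q
            result += TABLE[k - 1]
    return result

print(shift_add_log(1.5))  # ~= ln(1.5) = 0.405465...
```

BKM generalizes this by letting the per-step digit d be complex, which is why its table is several times larger.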
When you come to do the addition in CORDIC, you need to find the sign of y quite often, and with a signed-digit representation this means finding the sign of the highest non-zero digit in the number rather than just looking at the MSB as you would with normal 2's complement. Searching the number like this would remove the advantage of the carry-free addition.
The BKM algorithm seems designed to get around that problem, and since we are not using a signed-digit representation, there is no advantage as far as I can see.
Graham
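A tiny Python illustration of the sign-detection point above (the digit set {-1, 0, 1} and most-significant-first ordering are just one common convention):

```python
def sd_sign(digits):
    """Sign of a signed-digit number with digits in {-1, 0, 1},
    most significant first. Leading digits can be zero, so unlike
    2's complement we must scan for the first non-zero digit;
    that digit dominates everything below it and fixes the sign."""
    for d in digits:
        if d != 0:
            return 1 if d > 0 else -1
    return 0

print(sd_sign([0, 1, -1]))   # value 2 - 1 = 1, sign +1
print(sd_sign([0, -1, 1]))   # value -2 + 1 = -1, sign -1
```

That scan is the serial step that eats into the benefit of carry-free addition.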
I'm onto the pipelined hub CORDIC implementation now for Prop2 and I've been working on some K-factor correction, which makes it really easy to use. This BKM algorithm looks neat, but I think I'll stick with the CORDIC for now, because I understand it.
To get K correction, I'll reduce the X and Y components by some right-shifted amount within 16 of the 32 stages. I made a little SmallBASIC program to find the shift values:
Here's the result:
I'll make the last correction at stage #30 and ignore the ones below. That gets the error down to about the 10th digit. It's really nice to get CORDIC results without X and Y being amplified by 1/~0.607252935!
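As a hedged illustration of the idea (not Chip's actual SmallBASIC program or shift list, which aren't shown here), a greedy search in Python for right-shift factors (1 - 2^-s) whose product approximates 1/K:

```python
import math

# CORDIC gain after 32 stages; raw X/Y come out multiplied by K,
# i.e. by 1/~0.607252935.
K = math.prod(math.sqrt(1 + 4.0**-i) for i in range(32))

# Greedily choose right-shift amounts s so that applying
# (1 - 2^-s) at those stages cancels the gain. Each factor is
# just "x -= x >> s" in integer hardware.
target = 1.0 / K
shifts, acc = [], 1.0
for s in range(1, 31):
    if acc * (1.0 - 2.0**-s) >= target:  # don't undershoot 1/K
        acc *= 1.0 - 2.0**-s
        shifts.append(s)

print(shifts)
print(acc * K)  # ~1.0 once the factors are applied
```

Stopping the search at shift 30, as described above, leaves a residual on the order of 2^-30, consistent with roughly 10-digit accuracy.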
Internally, I think all fast processors these days use this or more complex schemes to reduce the penalty for carry propagation; the hardware can be a long way from the abstract programming model in order to eke out a bit more speed and power efficiency. Carry is a real show-stopper for 64-bit or double-float multiply if not addressed. The other approach is pipelining, as in a vector processor.