Dog-leg hypotenuse approximation

lonesock · 2013-04-22 09:06

For reasons that may never become clear, I needed a fast approximation to the hypot function, calculating the length of the hypotenuse given the length of the other two sides of a right triangle. Here were my requirements:

works with integers larger than 16-bit
doable in Propeller Assembly
FAST
reasonably accurate

I had heard of an ingenious function a long time ago: sort the lengths of the sides a and b such that:

hi = max( a, b )
lo = min( a, b )
hypot ~= hi + lo/2

(Note that I am using 'max' and 'min' in the C sense, not the Propeller sense.) This is extremely fast, and not too horrible. To test this out, I made a spreadsheet with a graph, a RMSE calculation, and a worst error calculation. I normalized the maximum value to 1.0, and ran the minimum value from 0 up to 1. I'm attaching the spreadsheet.

I found that the simple hi+lo/2 approximation had a worst case error of ~11.8%, and the estimate was always high. By changing the multiplier on the lo term to 0.33 you could minimize the RMSE of the approximation. If your processor has integer division, hi + lo/3 is your best bet for this (max error ~5.7%). Since I am doing this in PASM, I found 3/8 to be the best approximation for me (3 adds and a shift @ ~6.8% max error). Note that you can reduce the RMSE further by allowing the gain of the hi term to be < 1.0 (hypot ~= 0.936*hi + 0.426*lo), but I have some reasons for not doing that and the worst error is still 6.4%.

I was thinking given the shape of the curve, a piece-wise linear function would do a better job than the straight linear function...something with a dog-leg in it (bent somewhere in the middle). It occurred to me that the function max( 0, N*lo - hi) should work. E.g., if N==2, when lo < hi/2, the term will be clamped to z, and when above hi/2 it will be linear. So the form of the new equation is:

hypot ~= hi + Gain_lo * lo + Gain_dogleg * max( 0, N * lo - hi )

For N = 2, the best (integer operation) gains turned out to be = 1/5. So if you have integer division you could say:

hypot ~= hi + (lo + max( 0, lo+lo-hi ) ) / 5
(max error is ~2.0%)

However, the best solution I found with just shifts and adds was this:

hypot ~= hi + (lo + max( 0, lo+lo+lo-hi ) ) / 8

The max error is ~2.8%, and it only takes 11 PASM instructions, including the 3 to sort into hi and lo.

hypot         ' fast hypotenuse approximation
              ' first sort into max and min
              mov       vRes, v1
              min       v1, v2
              max       v2, vRes
              ' now that v1 = max, and v2 = min
              ' Dog-Leg approximation N=3, 2.8% error
              ' do the dog-leg term (1/3 of the way in)
              mov       vRes, v2
              add       vRes, v2
              add       vRes, v2
              sub       vRes, v1
              mins      vRes, #0               ' clamp the dog-leg term
              add       vRes, v2                ' add in the linear term
              shr       vRes, #3                ' scale by 1/8th
              add       vRes, v1                ' add in the max term

hypot_ret     ret

I welcome any optimizations or comments, and hopefully this is useful to someone.

thanks,
Jonathan

lonesock · 2013-04-22 11:43

Here is a version with a double-dog-leg, one at 1/4 and another at 1/2:

hypot ~= hi + 3*lo/32 + max(0,2*lo-hi)/8 + max(0,4*lo-hi)/16
(worst error is ~0.77%)

Here's the shortest PASM code I could find so far (18 instructions), and it requires extra scratch variables, so I really appreciate any possible improvements!

hypot         ' fast hypotenuse approximation
              ' first sort into max and min
              mov       vRes, v1
              min       v1, v2
              max       v2, vRes
              ' now that v1 = max, and v2 = min
              ' Double-Dog-Leg approximation, 0.8% error
              ' hypot ~= hi + 3*lo/32 + max(0,2*lo-hi)/8 + max(0,4*lo-hi)/16
              ' alternatively
              ' hypot ~= hi + 3*lo/32 + (max(0,2*lo-hi) + max(0,2*lo-hi/2) )/8
              mov       vRes, v1
              shr       v1, #1
              mov       tmp, v2
              add       tmp, v2
              add       v2, tmp
              shr       v2, #5
              add       vRes, v2
              sub       tmp, v1
              mins      tmp, #0
              mov       tmp1, tmp
              sub       tmp, v1
              mins      tmp, #0
              add       tmp1, tmp
              shr       tmp1, #3
              add       vRes, tmp1

hypot_ret     ret

thanks,
Jonathan

Mark_T · 2013-04-22 12:36

lonesock wrote: »

For reasons that may never become clear, I needed a fast approximation to the hypot function, calculating the length of the hypotenuse given the length of the other two sides of a right triangle. Here were my requirements:

Oh I am intrigued as to the reason - though I can't see why a general table-lookup linear interpolation won't serve your purpose more flexibly.

One reason I've done this is using integer arithmetic to calculate CNC stepper drive signals - the hypotenuse is needed to drive the speed feedback
DDA - am I close?

lonesock · 2013-04-22 14:49

I can neither confirm nor deny the exactitude with which your answer approaches reality. [8^)

RAM is precious, I don't think I could fit a table in right now. How would you do a table implementation without at least one scaling division? This only costs me 72 clocks atm, and of course I'd love to reduce that. [8^)

Jonathan

Rayman · 2013-04-22 17:20

Is it similar to this one?:
http://danielnouri.org/docs/SuperColliderHelp/BinaryOps/hypotApx.html

lonesock · 2013-04-22 20:19

Rayman, yes, the form there is identical to the first linear equation, simply matching the distance at min=0 case and the distance at min=max case.

Note that it's a tough gain to approximate with integer math, and of course the worst case error (~8.2%) and the RMSE are both worse than the 3/8 gain linear form. The dogleg form is different and superior. The double-dogleg form is even more so. [8^)

EDIT: I take that back...either that equation as presented has some typos, or their approximation is much worse (15.9%!). I think I charted what they *meant* to do.

Jonathan

Tracy Allen · 2013-04-22 23:32

I googled "approximation to hypontenuse", and it came up with this reputedly ancient Tamil poem.
Divide the distance you've run (longer side) into 8 equal parts and discard one part from it. Then if you add the half the height to that, you'll get Hypotenuse.

That is like your first approximation,
hypot ~= hi + (lo + max( 0, lo+lo+lo-hi ) ) / 8
in the region where the max(0, lo+lo+lo-hi ) is positive. There it reduces to
hypot ~= hi + 4*lo/8 - hi/8 = 7*hi/8 + lo/2.

Your dog-leg condition max(.,.) is much better though--You have indeed surpassed the ancients!

I plotted the values with and without the dogleg along with the true sqrt(x^2+y^2). In this rendition the total length is constrained to x+y=1.

Screen shot 2013-04-22 at 11.26.33 PM.png

Heater. · 2013-04-22 23:54

Tamil poetry is so romantic don't you think?

Looking at the graph it seems we need to do half of the dog leg at one end of the scale and replace the other half of the dog leg with tamil poetry where it fits very well.

Cluso99 · 2013-04-23 00:47

This is quite fascinating. I am always calculating hypotenuse when laying out pcbs, but the good old spreadsheet is great for this.

Heater. · 2013-04-23 01:49

Cluso,

I'm curious, what's the reason for all the hypontenuse working in PCB layout?

Cluso99 · 2013-04-23 04:14

I often adjust the via dia and placement to fit into really small areas. Also, placement of components, etc. I always manually place and route and I need to ensure there is enough spacing around components to be able to place them. It's just my way - I have done it for too many years to change and I rarely have pcb revisions, so something is working

Heater. · 2013-04-23 05:02

Sounds good to me.

Back in the day I worked on a PCB design package, CADSTAR. It had auto-placement and I seem to remember it got better results when it used the Manhattan distance between connected points rather than the "as the crow flies" distance. Hence my curiosity.

To this day I have never come across a PCB designer that said he used any kind of auto-place so I often wondered why we bothered:)

Tracy Allen · 2013-04-23 08:18

I can't vouch for the poem, or the translation, but it does look far more romantic in the original Tamil script:

Screen shot 2013-04-23 at 8.02.37 AM.png

PS According to wiki, the Tamil for hypotenuse is

Screen shot 2013-04-23 at 8.22.11 AM.png

lonesock · 2013-04-23 08:30

Still having trouble finding the Tamil for "dogleg*". [8^)

Jonathan

* Or, for you Keanu Reeves fans, "Dogstar".

lonesock · 2013-04-23 09:58

Quick update on the double-dogleg PASM code. Here is a 16-instruction version (not counting the call and ret):

hypot         ' fast hypotenuse approximation
              ' first sort into max and min
              mov       vRes, v1
              min       v1, v2
              max       v2, vRes
              ' now v1 = max, and v2 = min

              ' Double-Dog-Leg optimized approximation, 0.8% error
              ' hypot ~= hi + 3*lo/32 + max(0,2*lo-hi)/8 + max(0,4*lo-hi)/16
              ' rewritten as
              ' hypot ~= 3*lo/32 + hi + max(0,lo/4-hi/16) + max(0,lo/4-hi/8)
              ' gain_lo
              mov       vRes, v2
              add       vRes, v2
              add       vRes, v2
              shr       vRes, #5
              ' gain_hi
              add       vRes, v1
              ' scale the sides for the dogleg calcs
              shr       v1, #4
              shr       v2, #2
              ' 1/4 dogleg
              sub       v2, v1
              mins      v2, #0
              add       vRes, v2
              ' 1/2 dogleg
              sub       v2, v1
              mins      v2, #0
              add       vRes, v2

hypot_ret     ret

Make sure you either use absolute values (or intentionally use 32-bit unsigned values, though the '3 * lo' term may overflow).

Jonathan

skrasms · 2013-09-07 18:40

I created a forum account just to say: Thank you for posting this!

Ive been working on a light-weight FPGA graphics engine for a small screen, and I needed a distance approximation to be cheap and fast in hardware. The long + (short>>1) approximation is all anyone seems to reference, but it isnt accurate enough to use to draw circles (for example).

I started down the path of a LUT approach, but I couldnt afford it because it would take a division to normalize. I needed to avoid multiplication if at all possible, as well. Your 0.77% version gets around all that and manages to look good in my applications. Its light enough to calculate a distance result on each clock cycle. Basically, that approach is exactly what I needed.

The whole thing strikes me as really clever.

cgracey · 2013-09-08 21:16

I remember hearing someone say back in 1985 that a quick hypotenuse could be calculated by:

longerside + shorterside*3/8

I believe the max error is 5.3%. The 3/8 is some right-shifts and adds.

Peter Jakacki · 2013-09-09 00:58

cgracey wrote: »

I remember hearing someone say back in 1985 that a quick hypotenuse could be calculated by:

longerside + shorterside*3/8

I believe the max error is 5.3%. The 3/8 is some right-shifts and adds.

I thought I'd try that out in Tachyon and passed it through that method and your normal calculation and had it report some results (just using trimmed values from memory as "random" data)

[FONT=courier new][COLOR=#000000]pub [/COLOR][COLOR=#000000][B]HYPOT[/B][/COLOR][COLOR=#000000] ( A B -- hypot )[/COLOR][/FONT]
[FONT=courier new][COLOR=#000000]     2DUP < IF SWAP THEN[/COLOR][/FONT]
[FONT=courier new][COLOR=#000000]     DUP DUP + + 3 SHR +    \ *3/8[/COLOR][/FONT]
[FONT=courier new][COLOR=#000000]     ; [/COLOR][/FONT]

[COLOR=#9900ff][FONT=Ubuntu Mono]pub HDEMO ( addr -- )[/FONT][/COLOR]
[COLOR=#9900ff][FONT=Ubuntu Mono]    $40 DECIMAL[/FONT][/COLOR]
[COLOR=#9900ff][FONT=Ubuntu Mono]      ADO I W@ $7FFF AND I 2 + W@ $7FFF AND[/FONT][/COLOR]
[COLOR=#9900ff][FONT=Ubuntu Mono]      CR OVER .DEC TAB DUP .DEC [/FONT][/COLOR]
[COLOR=#9900ff][FONT=Ubuntu Mono]       TAB ." APPROX HYPOT=" 2DUP HYPOT DUP 0 REG ! .DEC TAB[/FONT][/COLOR]
[COLOR=#9900ff][FONT=Ubuntu Mono]       ." HYPOT = " ^2 SWAP ^2 + SQR DUP .DEC[/FONT][/COLOR]
[COLOR=#9900ff][FONT=Ubuntu Mono]      DUP 0 REG @ - ABS #10000 * SWAP /                 \ calulate % error scaled to 2 decimal places
      TAB ." ERROR=" <# # # "." HOLD #S #> PRINT$ ." %"[/FONT][/COLOR]

[COLOR=#9900ff][FONT=Ubuntu Mono]    4 +LOOP[/FONT][/COLOR]
[COLOR=#9900ff][FONT=Ubuntu Mono]    ;[/FONT][/COLOR]

[FONT=courier new]
[/FONT]

Here's some output

[FONT=courier new]$FC00 HDEMO 
19747   17052   APPROX HYPOT=26141      HYPOT = 26091   ERROR=0.19%
22945   0785    APPROX HYPOT=23239      HYPOT = 22958   ERROR=1.22%
20673   8200    APPROX HYPOT=23748      HYPOT = 22240   ERROR=6.78%
18753   2608    APPROX HYPOT=19731      HYPOT = 18933   ERROR=4.21%
26828   8996    APPROX HYPOT=30201      HYPOT = 28296   ERROR=6.73%
19737   0308    APPROX HYPOT=19852      HYPOT = 19739   ERROR=0.57%
8972    13140   APPROX HYPOT=16504      HYPOT = 15911   ERROR=3.72%
28425   3864    APPROX HYPOT=29874      HYPOT = 28686   ERROR=4.14%
19243   18068   APPROX HYPOT=26018      HYPOT = 26396   ERROR=1.43%
19721   0820    APPROX HYPOT=20028      HYPOT = 19738   ERROR=1.46%
18753   1592    APPROX HYPOT=19350      HYPOT = 18820   ERROR=2.81%
3848    1040    APPROX HYPOT=4238       HYPOT = 3986    ERROR=6.32%
0001    0256    APPROX HYPOT=0256       HYPOT = 0256    ERROR=0.00%
18689   1556    APPROX HYPOT=19272      HYPOT = 18754   ERROR=2.76%
19857   0036    APPROX HYPOT=19870      HYPOT = 19857   ERROR=0.06%
19345   8994    APPROX HYPOT=22717      HYPOT = 21334   ERROR=6.48% ok[/FONT][FONT=courier new]
[/FONT]

EDIT: added % error function to demo and output

lonesock · 2013-09-09 07:45

Thanks, skrasms!

Yeah, 6.8% matches the error I reported using 3/8 in the first post...dividing by 3 was better (~5.7%), but of course required a division.

Jonathan

lonesock · 2018-01-12 19:36

I was reviewing this code and realized I had not taken advantage of the propeller's flags...the code no longer needs the 'mins' commands. Here is the double-dogleg version with the 2 instructions removed:

DDLhypot      '' Double Dogleg Hypotenuse approximation
              ' hypot ~= 3*lo/32 + hi + max(0,4*lo-hi)/16 + max(0,2*lo-hi)/8
              '  ** refactored as **
              ' hypot ~= (lo/4)*3/8 + hi + max(0,lo/4-hi/16) + max(0,lo/4-hi/8)
               
              ' make sure v1 >= v2
              mov       vRes, v1
              min       v1, v2
              max       v2, vRes
              ' Hypot = (lo/4)*3/8
              shr       v2, #2 ' pre-dividing makes sure the *3 will not overflow
              mov       vRes, v2
              add       vRes, v2
              add       vRes, v2
              shr       vRes, #3
              ' hypot += hi
              add       vRes, v1
              ' hypot += max(0,lo/4-hi/16)
              shr       v1, #4 '
              sub       v2, v1    WC
if_NC         add       vRes, v2
              ' hypot += max(0,lo/4-hi/16-hi/16)
if_NC         sub       v2, v1    WC
if_NC         add       vRes, v2
              
DDLhypot_ret ret

I also pre-divided the 'lo' term so the 3*lo/32 can not possibly overflow. The function should work as long as the input and output values all fit inside 32-bit unsigned integers. This version weighs in at 14 instructions (not counting call and return) and has the same 0.8% max error (more if you use tiny integers, since it's an integer approximation, for example the hypotenuse of a 1x1 right triangle would yield 1).

thanks,
Jonathan

SaucySoliton · 2018-01-13 07:52

Thanks for sharing!

This could also be used for AM demodulation in software defined radios. In this context it's usually called magnitude or (complex) absolute value. And in other contexts, L2 norm.

Dog-leg hypotenuse approximation

Comments