#### Equip your Genius

Welcome to the Parallax Discussion Forums, sign-up to participate.

# Dog-leg hypotenuse approximation

Posts: 882
edited September 2013
For reasons that may never become clear, I needed a fast approximation to the hypot function, calculating the length of the hypotenuse given the length of the other two sides of a right triangle. Here were my requirements:
• works with integers larger than 16-bit
• doable in Propeller Assembly
• FAST
• reasonably accurate
I had heard of an ingenious function a long time ago: sort the lengths of the sides a and b such that:

hi = max( a, b )
lo = min( a, b )
hypot ~= hi + lo/2

(Note that I am using 'max' and 'min' in the C sense, not the Propeller sense.) This is extremely fast, and not too horrible. To test this out, I made a spreadsheet with a graph, a RMSE calculation, and a worst error calculation. I normalized the maximum value to 1.0, and ran the minimum value from 0 up to 1. I'm attaching the spreadsheet.

I found that the simple hi+lo/2 approximation had a worst case error of ~11.8%, and the estimate was always high. By changing the multiplier on the lo term to 0.33 you could minimize the RMSE of the approximation. If your processor has integer division, hi + lo/3 is your best bet for this (max error ~5.7%). Since I am doing this in PASM, I found 3/8 to be the best approximation for me (3 adds and a shift @ ~6.8% max error). Note that you can reduce the RMSE further by allowing the gain of the hi term to be < 1.0 (hypot ~= 0.936*hi + 0.426*lo), but I have some reasons for not doing that and the worst error is still 6.4%.

I was thinking given the shape of the curve, a piece-wise linear function would do a better job than the straight linear function...something with a dog-leg in it (bent somewhere in the middle). It occurred to me that the function max( 0, N*lo - hi) should work. E.g., if N==2, when lo < hi/2, the term will be clamped to z, and when above hi/2 it will be linear. So the form of the new equation is:

hypot ~= hi + Gain_lo * lo + Gain_dogleg * max( 0, N * lo - hi )

For N = 2, the best (integer operation) gains turned out to be = 1/5. So if you have integer division you could say:

hypot ~= hi + (lo + max( 0, lo+lo-hi ) ) / 5
(max error is ~2.0%)

However, the best solution I found with just shifts and adds was this:

hypot ~= hi + (lo + max( 0, lo+lo+lo-hi ) ) / 8

The max error is ~2.8%, and it only takes 11 PASM instructions, including the 3 to sort into hi and lo.
```hypot         ' fast hypotenuse approximation
' first sort into max and min
mov       vRes, v1
min       v1, v2
max       v2, vRes
' now that v1 = max, and v2 = min
' Dog-Leg approximation N=3, 2.8% error
' do the dog-leg term (1/3 of the way in)
mov       vRes, v2
sub       vRes, v1
mins      vRes, #0               ' clamp the dog-leg term
add       vRes, v2                ' add in the linear term
shr       vRes, #3                ' scale by 1/8th
add       vRes, v1                ' add in the max term

hypot_ret     ret
```
I welcome any optimizations or comments, and hopefully this is useful to someone.

thanks,
Jonathan
Free time status: see my avatar [8^)
F32 - fast & concise floating point: OBEX, Thread
Unrelated to the prop: KISSlicer

• Posts: 882
edited April 2013
Here is a version with a double-dog-leg, one at 1/4 and another at 1/2:

hypot ~= hi + 3*lo/32 + max(0,2*lo-hi)/8 + max(0,4*lo-hi)/16
(worst error is ~0.77%)

Here's the shortest PASM code I could find so far (18 instructions), and it requires extra scratch variables, so I really appreciate any possible improvements!
```hypot         ' fast hypotenuse approximation
' first sort into max and min
mov       vRes, v1
min       v1, v2
max       v2, vRes
' now that v1 = max, and v2 = min
' Double-Dog-Leg approximation, 0.8% error
' hypot ~= hi + 3*lo/32 + max(0,2*lo-hi)/8 + max(0,4*lo-hi)/16
' alternatively
' hypot ~= hi + 3*lo/32 + (max(0,2*lo-hi) + max(0,2*lo-hi/2) )/8
mov       vRes, v1
shr       v1, #1
mov       tmp, v2
shr       v2, #5
sub       tmp, v1
mins      tmp, #0
mov       tmp1, tmp
sub       tmp, v1
mins      tmp, #0
shr       tmp1, #3

hypot_ret     ret
```
thanks,
Jonathan
Free time status: see my avatar [8^)
F32 - fast & concise floating point: OBEX, Thread
Unrelated to the prop: KISSlicer
• Posts: 1,637
edited April 2013
lonesock wrote: »
For reasons that may never become clear, I needed a fast approximation to the hypot function, calculating the length of the hypotenuse given the length of the other two sides of a right triangle. Here were my requirements:

Oh I am intrigued as to the reason - though I can't see why a general table-lookup linear interpolation won't serve your purpose more flexibly.

One reason I've done this is using integer arithmetic to calculate CNC stepper drive signals - the hypotenuse is needed to drive the speed feedback
DDA - am I close?
• Posts: 882
edited April 2013
I can neither confirm nor deny the exactitude with which your answer approaches reality. [8^)

RAM is precious, I don't think I could fit a table in right now. How would you do a table implementation without at least one scaling division? This only costs me 72 clocks atm, and of course I'd love to reduce that. [8^)

Jonathan
Free time status: see my avatar [8^)
F32 - fast & concise floating point: OBEX, Thread
Unrelated to the prop: KISSlicer
• Posts: 8,373
edited April 2013
• Posts: 882
edited April 2013
Rayman, yes, the form there is identical to the first linear equation, simply matching the distance at min=0 case and the distance at min=max case.

Note that it's a tough gain to approximate with integer math, and of course the worst case error (~8.2%) and the RMSE are both worse than the 3/8 gain linear form. The dogleg form is different and superior. The double-dogleg form is even more so. [8^)

EDIT: I take that back...either that equation as presented has some typos, or their approximation is much worse (15.9%!). I think I charted what they *meant* to do.

Jonathan
Free time status: see my avatar [8^)
F32 - fast & concise floating point: OBEX, Thread
Unrelated to the prop: KISSlicer
• Posts: 6,142
edited April 2013
I googled "approximation to hypontenuse", and it came up with this reputedly ancient Tamil poem.
Divide the distance you've run (longer side) into 8 equal parts and discard one part from it. Then if you add the half the height to that, you'll get Hypotenuse.

That is like your first approximation,
hypot ~= hi + (lo + max( 0, lo+lo+lo-hi ) ) / 8
in the region where the max(0, lo+lo+lo-hi ) is positive. There it reduces to
hypot ~= hi + 4*lo/8 - hi/8 = 7*hi/8 + lo/2.

Your dog-leg condition max(.,.) is much better though--You have indeed surpassed the ancients!

I plotted the values with and without the dogleg along with the true sqrt(x^2+y^2). In this rendition the total length is constrained to x+y=1.

• Posts: 20,075
edited April 2013
Tamil poetry is so romantic don't you think?

Looking at the graph it seems we need to do half of the dog leg at one end of the scale and replace the other half of the dog leg with tamil poetry where it fits very well.
• Posts: 13,132
edited April 2013
This is quite fascinating. I am always calculating hypotenuse when laying out pcbs, but the good old spreadsheet is great for this.
My Prop boards:
Prop OS (also see Sphinx, PropDos, PropCmd, Spinix)
Website: www.clusos.com
• Posts: 20,075
edited April 2013
Cluso,

I'm curious, what's the reason for all the hypontenuse working in PCB layout?
• Posts: 13,132
edited April 2013
I often adjust the via dia and placement to fit into really small areas. Also, placement of components, etc. I always manually place and route and I need to ensure there is enough spacing around components to be able to place them. It's just my way - I have done it for too many years to change and I rarely have pcb revisions, so something is working ;)
My Prop boards:
Prop OS (also see Sphinx, PropDos, PropCmd, Spinix)
Website: www.clusos.com
• Posts: 20,075
edited April 2013
Sounds good to me.

Back in the day I worked on a PCB design package, CADSTAR. It had auto-placement and I seem to remember it got better results when it used the Manhattan distance between connected points rather than the "as the crow flies" distance. Hence my curiosity.

To this day I have never come across a PCB designer that said he used any kind of auto-place so I often wondered why we bothered:)
• Posts: 6,142
edited April 2013
I can't vouch for the poem, or the translation, but it does look far more romantic in the original Tamil script:

PS According to wiki, the Tamil for hypotenuse is
• Posts: 882
edited April 2013
Still having trouble finding the Tamil for "dogleg*". [8^)

Jonathan

* Or, for you Keanu Reeves fans, "Dogstar".
Free time status: see my avatar [8^)
F32 - fast & concise floating point: OBEX, Thread
Unrelated to the prop: KISSlicer
• Posts: 882
edited April 2013
Quick update on the double-dogleg PASM code. Here is a 16-instruction version (not counting the call and ret):
```hypot         ' fast hypotenuse approximation
' first sort into max and min
mov       vRes, v1
min       v1, v2
max       v2, vRes
' now v1 = max, and v2 = min

' Double-Dog-Leg optimized approximation, 0.8% error
' hypot ~= hi + 3*lo/32 + max(0,2*lo-hi)/8 + max(0,4*lo-hi)/16
' rewritten as
' hypot ~= 3*lo/32 + hi + max(0,lo/4-hi/16) + max(0,lo/4-hi/8)
' gain_lo
mov       vRes, v2
shr       vRes, #5
' gain_hi
' scale the sides for the dogleg calcs
shr       v1, #4
shr       v2, #2
' 1/4 dogleg
sub       v2, v1
mins      v2, #0
' 1/2 dogleg
sub       v2, v1
mins      v2, #0

hypot_ret     ret
```
Make sure you either use absolute values (or intentionally use 32-bit unsigned values, though the '3 * lo' term may overflow).

Jonathan
Free time status: see my avatar [8^)
F32 - fast & concise floating point: OBEX, Thread
Unrelated to the prop: KISSlicer
• Posts: 1
edited September 2013
I created a forum account just to say: Thank you for posting this!

Ive been working on a light-weight FPGA graphics engine for a small screen, and I needed a distance approximation to be cheap and fast in hardware. The long + (short>>1) approximation is all anyone seems to reference, but it isnt accurate enough to use to draw circles (for example).

I started down the path of a LUT approach, but I couldnt afford it because it would take a division to normalize. I needed to avoid multiplication if at all possible, as well. Your 0.77% version gets around all that and manages to look good in my applications. Its light enough to calculate a distance result on each clock cycle. Basically, that approach is exactly what I needed.

The whole thing strikes me as really clever.
• Posts: 8,541
edited September 2013
I remember hearing someone say back in 1985 that a quick hypotenuse could be calculated by:

longerside + shorterside*3/8

I believe the max error is 5.3%. The 3/8 is some right-shifts and adds.
• Posts: 6,778
edited September 2013
cgracey wrote: »
I remember hearing someone say back in 1985 that a quick hypotenuse could be calculated by:

longerside + shorterside*3/8

I believe the max error is 5.3%. The 3/8 is some right-shifts and adds.

I thought I'd try that out in Tachyon and passed it through that method and your normal calculation and had it report some results (just using trimmed values from memory as "random" data)
```[FONT=courier new][COLOR=#000000]pub [/COLOR][COLOR=#000000][B]HYPOT[/B][/COLOR][COLOR=#000000] ( A B -- hypot )[/COLOR][/FONT]
[FONT=courier new][COLOR=#000000]     2DUP < IF SWAP THEN[/COLOR][/FONT]
[FONT=courier new][COLOR=#000000]     DUP DUP + + 3 SHR +    \ *3/8[/COLOR][/FONT]
[FONT=courier new][COLOR=#000000]     ; [/COLOR][/FONT]

[COLOR=#9900ff][FONT=Ubuntu Mono]pub HDEMO ( addr -- )[/FONT][/COLOR]
[COLOR=#9900ff][FONT=Ubuntu Mono]    \$40 DECIMAL[/FONT][/COLOR]
[COLOR=#9900ff][FONT=Ubuntu Mono]      ADO I W@ \$7FFF AND I 2 + W@ \$7FFF AND[/FONT][/COLOR]
[COLOR=#9900ff][FONT=Ubuntu Mono]      CR OVER .DEC TAB DUP .DEC [/FONT][/COLOR]
[COLOR=#9900ff][FONT=Ubuntu Mono]       TAB ." APPROX HYPOT=" 2DUP HYPOT DUP 0 REG ! .DEC TAB[/FONT][/COLOR]
[COLOR=#9900ff][FONT=Ubuntu Mono]       ." HYPOT = " ^2 SWAP ^2 + SQR DUP .DEC[/FONT][/COLOR]
[COLOR=#9900ff][FONT=Ubuntu Mono]      DUP 0 REG @ - ABS #10000 * SWAP /                 \ calulate % error scaled to 2 decimal places
TAB ." ERROR=" <# # # "." HOLD #S #> PRINT\$ ." %"[/FONT][/COLOR]

[COLOR=#9900ff][FONT=Ubuntu Mono]    4 +LOOP[/FONT][/COLOR]
[COLOR=#9900ff][FONT=Ubuntu Mono]    ;[/FONT][/COLOR]

[FONT=courier new]
[/FONT]
```
Here's some output
```[FONT=courier new]\$FC00 HDEMO
19747   17052   APPROX HYPOT=26141      HYPOT = 26091   ERROR=0.19%
22945   0785    APPROX HYPOT=23239      HYPOT = 22958   ERROR=1.22%
20673   8200    APPROX HYPOT=23748      HYPOT = 22240   ERROR=6.78%
18753   2608    APPROX HYPOT=19731      HYPOT = 18933   ERROR=4.21%
26828   8996    APPROX HYPOT=30201      HYPOT = 28296   ERROR=6.73%
19737   0308    APPROX HYPOT=19852      HYPOT = 19739   ERROR=0.57%
8972    13140   APPROX HYPOT=16504      HYPOT = 15911   ERROR=3.72%
28425   3864    APPROX HYPOT=29874      HYPOT = 28686   ERROR=4.14%
19243   18068   APPROX HYPOT=26018      HYPOT = 26396   ERROR=1.43%
19721   0820    APPROX HYPOT=20028      HYPOT = 19738   ERROR=1.46%
18753   1592    APPROX HYPOT=19350      HYPOT = 18820   ERROR=2.81%
3848    1040    APPROX HYPOT=4238       HYPOT = 3986    ERROR=6.32%
0001    0256    APPROX HYPOT=0256       HYPOT = 0256    ERROR=0.00%
18689   1556    APPROX HYPOT=19272      HYPOT = 18754   ERROR=2.76%
19857   0036    APPROX HYPOT=19870      HYPOT = 19857   ERROR=0.06%
19345   8994    APPROX HYPOT=22717      HYPOT = 21334   ERROR=6.48% ok[/FONT][FONT=courier new]
[/FONT]
```

EDIT: added % error function to demo and output
Tachyon Forth - compact, fast, forthwright and interactive

Tachyon Forth News Blog
TACHYON DEMONSTRATOR
Brisbane, Australia
• Posts: 882
edited September 2013
Thanks, skrasms!

Yeah, 6.8% matches the error I reported using 3/8 in the first post...dividing by 3 was better (~5.7%), but of course required a division.

Jonathan
Free time status: see my avatar [8^)
F32 - fast & concise floating point: OBEX, Thread
Unrelated to the prop: KISSlicer
• Posts: 882
I was reviewing this code and realized I had not taken advantage of the propeller's flags...the code no longer needs the 'mins' commands. Here is the double-dogleg version with the 2 instructions removed:
```DDLhypot      '' Double Dogleg Hypotenuse approximation
' hypot ~= 3*lo/32 + hi + max(0,4*lo-hi)/16 + max(0,2*lo-hi)/8
'  ** refactored as **
' hypot ~= (lo/4)*3/8 + hi + max(0,lo/4-hi/16) + max(0,lo/4-hi/8)

' make sure v1 >= v2
mov       vRes, v1
min       v1, v2
max       v2, vRes
' Hypot = (lo/4)*3/8
shr       v2, #2 ' pre-dividing makes sure the *3 will not overflow
mov       vRes, v2
shr       vRes, #3
' hypot += hi
' hypot += max(0,lo/4-hi/16)
shr       v1, #4 '
sub       v2, v1    WC
if_NC         add       vRes, v2
' hypot += max(0,lo/4-hi/16-hi/16)
if_NC         sub       v2, v1    WC
if_NC         add       vRes, v2

DDLhypot_ret ret
```
I also pre-divided the 'lo' term so the 3*lo/32 can not possibly overflow. The function should work as long as the input and output values all fit inside 32-bit unsigned integers. This version weighs in at 14 instructions (not counting call and return) and has the same 0.8% max error (more if you use tiny integers, since it's an integer approximation, for example the hypotenuse of a 1x1 right triangle would yield 1).

thanks,
Jonathan
Free time status: see my avatar [8^)
F32 - fast & concise floating point: OBEX, Thread
Unrelated to the prop: KISSlicer
• Posts: 63
Thanks for sharing!

This could also be used for AM demodulation in software defined radios. In this context it's usually called magnitude or (complex) absolute value. And in other contexts, L2 norm.
James https://github.com/SaucySoliton/

Invention is the Science of Laziness