Shop OBEX P1 Docs P2 Docs Learn Events
Dog-leg hypotenuse approximation — Parallax Forums

Dog-leg hypotenuse approximation

lonesocklonesock Posts: 917
edited 2013-09-09 07:45 in Propeller 1
For reasons that may never become clear, I needed a fast approximation to the hypot function, calculating the length of the hypotenuse given the length of the other two sides of a right triangle. Here were my requirements:
  • works with integers larger than 16-bit
  • doable in Propeller Assembly
  • FAST
  • reasonably accurate
I had heard of an ingenious function a long time ago: sort the lengths of the sides a and b such that:

hi = max( a, b )
lo = min( a, b )
hypot ~= hi + lo/2
hypot_linear_orig.png


(Note that I am using 'max' and 'min' in the C sense, not the Propeller sense.) This is extremely fast, and not too horrible. To test this out, I made a spreadsheet with a graph, a RMSE calculation, and a worst error calculation. I normalized the maximum value to 1.0, and ran the minimum value from 0 up to 1. I'm attaching the spreadsheet.

I found that the simple hi+lo/2 approximation had a worst case error of ~11.8%, and the estimate was always high. By changing the multiplier on the lo term to 0.33 you could minimize the RMSE of the approximation. If your processor has integer division, hi + lo/3 is your best bet for this (max error ~5.7%). Since I am doing this in PASM, I found 3/8 to be the best approximation for me (3 adds and a shift @ ~6.8% max error). Note that you can reduce the RMSE further by allowing the gain of the hi term to be < 1.0 (hypot ~= 0.936*hi + 0.426*lo), but I have some reasons for not doing that and the worst error is still 6.4%.
hypot_linear_opt.png


I was thinking given the shape of the curve, a piece-wise linear function would do a better job than the straight linear function...something with a dog-leg in it (bent somewhere in the middle). It occurred to me that the function max( 0, N*lo - hi) should work. E.g., if N==2, when lo < hi/2, the term will be clamped to z, and when above hi/2 it will be linear. So the form of the new equation is:

hypot ~= hi + Gain_lo * lo + Gain_dogleg * max( 0, N * lo - hi )

For N = 2, the best (integer operation) gains turned out to be = 1/5. So if you have integer division you could say:

hypot ~= hi + (lo + max( 0, lo+lo-hi ) ) / 5
(max error is ~2.0%)
hypot_dogleg2.png


However, the best solution I found with just shifts and adds was this:

hypot ~= hi + (lo + max( 0, lo+lo+lo-hi ) ) / 8
hypot_dogleg3.png


The max error is ~2.8%, and it only takes 11 PASM instructions, including the 3 to sort into hi and lo.
hypot         ' fast hypotenuse approximation
              ' first sort into max and min
              mov       vRes, v1
              min       v1, v2
              max       v2, vRes
              ' now that v1 = max, and v2 = min
              ' Dog-Leg approximation N=3, 2.8% error
              ' do the dog-leg term (1/3 of the way in)
              mov       vRes, v2
              add       vRes, v2
              add       vRes, v2
              sub       vRes, v1
              mins      vRes, #0               ' clamp the dog-leg term
              add       vRes, v2                ' add in the linear term
              shr       vRes, #3                ' scale by 1/8th
              add       vRes, v1                ' add in the max term

hypot_ret     ret
I welcome any optimizations or comments, and hopefully this is useful to someone.

thanks,
Jonathan
407 x 308 - 10K
407 x 308 - 10K
407 x 308 - 9K
407 x 308 - 9K

Comments

  • lonesocklonesock Posts: 917
    edited 2013-04-22 11:43
    Here is a version with a double-dog-leg, one at 1/4 and another at 1/2:

    hypot ~= hi + 3*lo/32 + max(0,2*lo-hi)/8 + max(0,4*lo-hi)/16
    (worst error is ~0.77%)
    hypot_double_dogleg.png


    Here's the shortest PASM code I could find so far (18 instructions), and it requires extra scratch variables, so I really appreciate any possible improvements!
    hypot         ' fast hypotenuse approximation
                  ' first sort into max and min
                  mov       vRes, v1
                  min       v1, v2
                  max       v2, vRes
                  ' now that v1 = max, and v2 = min
                  ' Double-Dog-Leg approximation, 0.8% error
                  ' hypot ~= hi + 3*lo/32 + max(0,2*lo-hi)/8 + max(0,4*lo-hi)/16
                  ' alternatively
                  ' hypot ~= hi + 3*lo/32 + (max(0,2*lo-hi) + max(0,2*lo-hi/2) )/8
                  mov       vRes, v1
                  shr       v1, #1
                  mov       tmp, v2
                  add       tmp, v2
                  add       v2, tmp
                  shr       v2, #5
                  add       vRes, v2
                  sub       tmp, v1
                  mins      tmp, #0
                  mov       tmp1, tmp
                  sub       tmp, v1
                  mins      tmp, #0
                  add       tmp1, tmp
                  shr       tmp1, #3
                  add       vRes, tmp1
    
    hypot_ret     ret
    
    thanks,
    Jonathan
    450 x 320 - 12K
  • Mark_TMark_T Posts: 1,981
    edited 2013-04-22 12:36
    lonesock wrote: »
    For reasons that may never become clear, I needed a fast approximation to the hypot function, calculating the length of the hypotenuse given the length of the other two sides of a right triangle. Here were my requirements:

    Oh I am intrigued as to the reason - though I can't see why a general table-lookup linear interpolation won't serve your purpose more flexibly.

    One reason I've done this is using integer arithmetic to calculate CNC stepper drive signals - the hypotenuse is needed to drive the speed feedback
    DDA - am I close?
  • lonesocklonesock Posts: 917
    edited 2013-04-22 14:49
    I can neither confirm nor deny the exactitude with which your answer approaches reality. [8^)

    RAM is precious, I don't think I could fit a table in right now. How would you do a table implementation without at least one scaling division? This only costs me 72 clocks atm, and of course I'd love to reduce that. [8^)

    Jonathan
  • RaymanRayman Posts: 14,646
    edited 2013-04-22 17:20
  • lonesocklonesock Posts: 917
    edited 2013-04-22 20:19
    Rayman, yes, the form there is identical to the first linear equation, simply matching the distance at min=0 case and the distance at min=max case.
    hypot_apx.png

    Note that it's a tough gain to approximate with integer math, and of course the worst case error (~8.2%) and the RMSE are both worse than the 3/8 gain linear form. The dogleg form is different and superior. The double-dogleg form is even more so. [8^)

    EDIT: I take that back...either that equation as presented has some typos, or their approximation is much worse (15.9%!). I think I charted what they *meant* to do.

    Jonathan
    407 x 308 - 10K
  • Tracy AllenTracy Allen Posts: 6,664
    edited 2013-04-22 23:32
    I googled "approximation to hypontenuse", and it came up with this reputedly ancient Tamil poem.
    Divide the distance you've run (longer side) into 8 equal parts and discard one part from it. Then if you add the half the height to that, you'll get Hypotenuse.

    That is like your first approximation,
    hypot ~= hi + (lo + max( 0, lo+lo+lo-hi ) ) / 8
    in the region where the max(0, lo+lo+lo-hi ) is positive. There it reduces to
    hypot ~= hi + 4*lo/8 - hi/8 = 7*hi/8 + lo/2.

    Your dog-leg condition max(.,.) is much better though--You have indeed surpassed the ancients!

    I plotted the values with and without the dogleg along with the true sqrt(x^2+y^2). In this rendition the total length is constrained to x+y=1.

    Screen shot 2013-04-22 at 11.26.33 PM.png
  • Heater.Heater. Posts: 21,230
    edited 2013-04-22 23:54
    Tamil poetry is so romantic don't you think?

    Looking at the graph it seems we need to do half of the dog leg at one end of the scale and replace the other half of the dog leg with tamil poetry where it fits very well.
  • Cluso99Cluso99 Posts: 18,069
    edited 2013-04-23 00:47
    This is quite fascinating. I am always calculating hypotenuse when laying out pcbs, but the good old spreadsheet is great for this.
  • Heater.Heater. Posts: 21,230
    edited 2013-04-23 01:49
    Cluso,

    I'm curious, what's the reason for all the hypontenuse working in PCB layout?
  • Cluso99Cluso99 Posts: 18,069
    edited 2013-04-23 04:14
    I often adjust the via dia and placement to fit into really small areas. Also, placement of components, etc. I always manually place and route and I need to ensure there is enough spacing around components to be able to place them. It's just my way - I have done it for too many years to change and I rarely have pcb revisions, so something is working ;)
  • Heater.Heater. Posts: 21,230
    edited 2013-04-23 05:02
    Sounds good to me.

    Back in the day I worked on a PCB design package, CADSTAR. It had auto-placement and I seem to remember it got better results when it used the Manhattan distance between connected points rather than the "as the crow flies" distance. Hence my curiosity.

    To this day I have never come across a PCB designer that said he used any kind of auto-place so I often wondered why we bothered:)
  • Tracy AllenTracy Allen Posts: 6,664
    edited 2013-04-23 08:18
    I can't vouch for the poem, or the translation, but it does look far more romantic in the original Tamil script:
    Screen shot 2013-04-23 at 8.02.37 AM.png


    PS According to wiki, the Tamil for hypotenuse is
    Screen shot 2013-04-23 at 8.22.11 AM.png
  • lonesocklonesock Posts: 917
    edited 2013-04-23 08:30
    Still having trouble finding the Tamil for "dogleg*". [8^)

    Jonathan

    * Or, for you Keanu Reeves fans, "Dogstar".
  • lonesocklonesock Posts: 917
    edited 2013-04-23 09:58
    Quick update on the double-dogleg PASM code. Here is a 16-instruction version (not counting the call and ret):
    hypot         ' fast hypotenuse approximation
                  ' first sort into max and min
                  mov       vRes, v1
                  min       v1, v2
                  max       v2, vRes
                  ' now v1 = max, and v2 = min
    
                  ' Double-Dog-Leg optimized approximation, 0.8% error
                  ' hypot ~= hi + 3*lo/32 + max(0,2*lo-hi)/8 + max(0,4*lo-hi)/16
                  ' rewritten as
                  ' hypot ~= 3*lo/32 + hi + max(0,lo/4-hi/16) + max(0,lo/4-hi/8)
                  ' gain_lo
                  mov       vRes, v2
                  add       vRes, v2
                  add       vRes, v2
                  shr       vRes, #5
                  ' gain_hi
                  add       vRes, v1
                  ' scale the sides for the dogleg calcs
                  shr       v1, #4
                  shr       v2, #2
                  ' 1/4 dogleg
                  sub       v2, v1
                  mins      v2, #0
                  add       vRes, v2
                  ' 1/2 dogleg
                  sub       v2, v1
                  mins      v2, #0
                  add       vRes, v2
    
    hypot_ret     ret
    
    Make sure you either use absolute values (or intentionally use 32-bit unsigned values, though the '3 * lo' term may overflow).

    Jonathan
  • skrasmsskrasms Posts: 1
    edited 2013-09-07 18:40
    I created a forum account just to say: Thank you for posting this!

    I’ve been working on a light-weight FPGA graphics engine for a small screen, and I needed a distance approximation to be cheap and fast in hardware. The “long + (short>>1)” approximation is all anyone seems to reference, but it isn’t accurate enough to use to draw circles (for example).

    I started down the path of a LUT approach, but I couldn’t afford it because it would take a division to normalize. I needed to avoid multiplication if at all possible, as well. Your 0.77% version gets around all that and manages to look good in my applications. It’s light enough to calculate a distance result on each clock cycle. Basically, that approach is exactly what I needed.

    The whole thing strikes me as really clever.
  • cgraceycgracey Posts: 14,153
    edited 2013-09-08 21:16
    I remember hearing someone say back in 1985 that a quick hypotenuse could be calculated by:

    longerside + shorterside*3/8

    I believe the max error is 5.3%. The 3/8 is some right-shifts and adds.
  • Peter JakackiPeter Jakacki Posts: 10,193
    edited 2013-09-09 00:58
    cgracey wrote: »
    I remember hearing someone say back in 1985 that a quick hypotenuse could be calculated by:

    longerside + shorterside*3/8

    I believe the max error is 5.3%. The 3/8 is some right-shifts and adds.

    I thought I'd try that out in Tachyon and passed it through that method and your normal calculation and had it report some results (just using trimmed values from memory as "random" data)
    [FONT=courier new][COLOR=#000000]pub [/COLOR][COLOR=#000000][B]HYPOT[/B][/COLOR][COLOR=#000000] ( A B -- hypot )[/COLOR][/FONT]
    [FONT=courier new][COLOR=#000000]     2DUP < IF SWAP THEN[/COLOR][/FONT]
    [FONT=courier new][COLOR=#000000]     DUP DUP + + 3 SHR +    \ *3/8[/COLOR][/FONT]
    [FONT=courier new][COLOR=#000000]     ; [/COLOR][/FONT]
    
    [COLOR=#9900ff][FONT=Ubuntu Mono]pub HDEMO ( addr -- )[/FONT][/COLOR]
    [COLOR=#9900ff][FONT=Ubuntu Mono]    $40 DECIMAL[/FONT][/COLOR]
    [COLOR=#9900ff][FONT=Ubuntu Mono]      ADO I W@ $7FFF AND I 2 + W@ $7FFF AND[/FONT][/COLOR]
    [COLOR=#9900ff][FONT=Ubuntu Mono]      CR OVER .DEC TAB DUP .DEC [/FONT][/COLOR]
    [COLOR=#9900ff][FONT=Ubuntu Mono]       TAB ." APPROX HYPOT=" 2DUP HYPOT DUP 0 REG ! .DEC TAB[/FONT][/COLOR]
    [COLOR=#9900ff][FONT=Ubuntu Mono]       ." HYPOT = " ^2 SWAP ^2 + SQR DUP .DEC[/FONT][/COLOR]
    [COLOR=#9900ff][FONT=Ubuntu Mono]      DUP 0 REG @ - ABS #10000 * SWAP /                 \ calulate % error scaled to 2 decimal places
          TAB ." ERROR=" <# # # "." HOLD #S #> PRINT$ ." %"[/FONT][/COLOR]
    
    [COLOR=#9900ff][FONT=Ubuntu Mono]    4 +LOOP[/FONT][/COLOR]
    [COLOR=#9900ff][FONT=Ubuntu Mono]    ;[/FONT][/COLOR]
    
    [FONT=courier new]
    [/FONT]
    
    Here's some output
    [FONT=courier new]$FC00 HDEMO 
    19747   17052   APPROX HYPOT=26141      HYPOT = 26091   ERROR=0.19%
    22945   0785    APPROX HYPOT=23239      HYPOT = 22958   ERROR=1.22%
    20673   8200    APPROX HYPOT=23748      HYPOT = 22240   ERROR=6.78%
    18753   2608    APPROX HYPOT=19731      HYPOT = 18933   ERROR=4.21%
    26828   8996    APPROX HYPOT=30201      HYPOT = 28296   ERROR=6.73%
    19737   0308    APPROX HYPOT=19852      HYPOT = 19739   ERROR=0.57%
    8972    13140   APPROX HYPOT=16504      HYPOT = 15911   ERROR=3.72%
    28425   3864    APPROX HYPOT=29874      HYPOT = 28686   ERROR=4.14%
    19243   18068   APPROX HYPOT=26018      HYPOT = 26396   ERROR=1.43%
    19721   0820    APPROX HYPOT=20028      HYPOT = 19738   ERROR=1.46%
    18753   1592    APPROX HYPOT=19350      HYPOT = 18820   ERROR=2.81%
    3848    1040    APPROX HYPOT=4238       HYPOT = 3986    ERROR=6.32%
    0001    0256    APPROX HYPOT=0256       HYPOT = 0256    ERROR=0.00%
    18689   1556    APPROX HYPOT=19272      HYPOT = 18754   ERROR=2.76%
    19857   0036    APPROX HYPOT=19870      HYPOT = 19857   ERROR=0.06%
    19345   8994    APPROX HYPOT=22717      HYPOT = 21334   ERROR=6.48% ok[/FONT][FONT=courier new]
    [/FONT]
    

    EDIT: added % error function to demo and output
  • lonesocklonesock Posts: 917
    edited 2013-09-09 07:45
    Thanks, skrasms!

    Yeah, 6.8% matches the error I reported using 3/8 in the first post...dividing by 3 was better (~5.7%), but of course required a division.

    Jonathan
  • I was reviewing this code and realized I had not taken advantage of the propeller's flags...the code no longer needs the 'mins' commands. Here is the double-dogleg version with the 2 instructions removed:
    DDLhypot      '' Double Dogleg Hypotenuse approximation
                  ' hypot ~= 3*lo/32 + hi + max(0,4*lo-hi)/16 + max(0,2*lo-hi)/8
                  '  ** refactored as **
                  ' hypot ~= (lo/4)*3/8 + hi + max(0,lo/4-hi/16) + max(0,lo/4-hi/8)
                   
                  ' make sure v1 >= v2
                  mov       vRes, v1
                  min       v1, v2
                  max       v2, vRes
                  ' Hypot = (lo/4)*3/8
                  shr       v2, #2 ' pre-dividing makes sure the *3 will not overflow
                  mov       vRes, v2
                  add       vRes, v2
                  add       vRes, v2
                  shr       vRes, #3
                  ' hypot += hi
                  add       vRes, v1
                  ' hypot += max(0,lo/4-hi/16)
                  shr       v1, #4 '
                  sub       v2, v1    WC
    if_NC         add       vRes, v2
                  ' hypot += max(0,lo/4-hi/16-hi/16)
    if_NC         sub       v2, v1    WC
    if_NC         add       vRes, v2
                  
    DDLhypot_ret ret
    
    I also pre-divided the 'lo' term so the 3*lo/32 can not possibly overflow. The function should work as long as the input and output values all fit inside 32-bit unsigned integers. This version weighs in at 14 instructions (not counting call and return) and has the same 0.8% max error (more if you use tiny integers, since it's an integer approximation, for example the hypotenuse of a 1x1 right triangle would yield 1).

    thanks,
    Jonathan
  • Thanks for sharing!

    This could also be used for AM demodulation in software defined radios. In this context it's usually called magnitude or (complex) absolute value. And in other contexts, L2 norm.
Sign In or Register to comment.