# Any IEEE-754 gurus out there?

RossH
Posts:

**4,696**
This is a request for assistance from any IEEE-754 Floating point experts ...

Catalina has had IEEE-754 floating point support since ... well, forever - it was one of the reasons Catalina was created.

On the Propeller 2, I now have a floating point plugin that implements the standard C floating point library functions using the P2's CORDIC solver. It is based on Cam Thompson's original floating point solution for the P1 - although with CORDIC, it only takes one cog instead of two. Think of it as a maths co-processor implemented in a cog.

The plugin seems to be fully functional - it passes all my own tests. But it does not pass the classic "paranoia.c" floating point test program without defects. The defects are not to do with the basic maths processing - they seem to be to do with how it deals with **nan** and **inf** and **underflow/overflow** ... and are really beyond my ability to solve.

Here is the output of the classic "paranoia.c" Floating Point Test program, as compiled by Catalina. It is the two "SERIOUS DEFECTS" that are the problem:

Diagnosis resumes after milestone Number 0 Page: 1

Users are invited to help debug and augment this program so it will cope with unanticipated and newly uncovered arithmetic pathologies.

Please send suggestions and interesting results to Richard Karpinski Computer Center U-76 University of California San Francisco, CA 94143-0704, USA

In doing so, please include the following information:
Precision: double;
Version: 10 February 1989;
Computer:
Compiler:
Optimization level:
Other relevant compiler options:

To continue, press RETURN

Diagnosis resumes after milestone Number 1 Page: 2

Running this program should reveal these characteristics:
Radix = 1, 2, 4, 8, 10, 16, 100, 256 ...
Precision = number of significant digits carried.
U2 = Radix/Radix^Precision = One Ulp (OneUlpnit in the Last Place) of 1.000xxx .
U1 = 1/Radix^Precision = One Ulp of numbers a little less than 1.0 .
Adequacy of guard digits for Mult., Div. and Subt.
Whether arithmetic is chopped, correctly rounded, or something else for Mult., Div., Add/Subt. and Sqrt.
Whether a Sticky Bit used correctly for rounding.
UnderflowThreshold = an underflow threshold.
E0 and PseudoZero tell whether underflow is abrupt, gradual, or fuzzy.
V = an overflow threshold, roughly.
V0 tells, roughly, whether Infinity is represented.
Comparisions are checked for consistency with subtraction and for contamination with pseudo-zeros.
Sqrt is tested. Y^X is not tested.
Extra-precise subexpressions are revealed but NOT YET tested.
Decimal-Binary conversion is NOT YET tested for accuracy.

To continue, press RETURN

Diagnosis resumes after milestone Number 2 Page: 3

The program attempts to discriminate among FLAWs, like lack of a sticky bit, Serious DEFECTs, like lack of a guard digit, and FAILUREs, like 2+2 == 5 . Failures may confound subsequent diagnoses.

The diagnostic capabilities of this program go beyond an earlier program called `MACHAR', which can be found at the end of the book `Software Manual for the Elementary Functions' (1980) by W. J. Cody and W. Waite. Although both programs try to discover the Radix, Precision and range (over/underflow thresholds) of the arithmetic, this program tries to cope with a wider variety of pathologies, and to say how well the arithmetic is implemented.

The program is based upon a conventional radix representation for floating-point numbers, but also allows logarithmic encoding as used by certain early WANG machines.

BASIC version of this program (C) 1983 by Prof. W. M. Kahan; see source comments for more history.

To continue, press RETURN

Diagnosis resumes after milestone Number 3 Page: 4

Program is now RUNNING tests on small integers:
-1, 0, 1/2, 1, 2, 3, 4, 5, 9, 27, 32 & 240 are O.K.

Searching for Radix and Precision.
Radix = 2.000000 .
Closest relative separation found is U1 = 5.9604645e-08 .

Recalculating radix and precision confirms closest relative separation U1 .
Radix confirmed.
The number of significant digits of the Radix is 24.000000 .

To continue, press RETURN

Diagnosis resumes after milestone Number 30 Page: 5

Subtraction appears to be normalized, as it should be.
Checking for guard digit in *, /, and -.
*, /, and - appear to have guard digits, as they should.

To continue, press RETURN

Diagnosis resumes after milestone Number 40 Page: 6

Checking rounding on multiply, divide and add/subtract.
Multiplication appears to round correctly.
Division appears to round correctly.
Addition/Subtraction appears to round correctly.
Checking for sticky bit.
Sticky bit used incorrectly or not at all.

Does Multiplication commute? Testing on 20 random pairs.
No failures found in 20 integer pairs.

Running test of square root(x).
Testing if sqrt(X * X) == X for 20 Integers X.
Test for sqrt monotonicity.
sqrt has passed a test for Monotonicity.
Testing whether sqrt is rounded or chopped.
Square root is neither chopped nor correctly rounded.
Observed errors run from 0.0000000e+00 to 5.0000000e-01 ulps.

To continue, press RETURN

Diagnosis resumes after milestone Number 90 Page: 7

Testing powers Z^i for small Integers Z and i.
DEFECT: computing (3.00000000000000000e+00) ^ (5.00000000000000000e+00) yielded 2.42998046875000000e+02; which compared unequal to correct 2.43000000000000000e+02 ; they differ by -1.95312500000000000e-03 .
Errors like this may invalidate financial calculations involving interest rates.
Similar discrepancies have occurred 49 times.

To continue, press RETURN

Diagnosis resumes after milestone Number 100 Page: 8

Seeking Underflow thresholds UfThold and E0.
Smallest strictly positive number found is E0 = 1.4013e-45 .
Since comparison denies Z = 0, evaluating (Z + Z) / Z should be safe.
What the machine gets for (Z + Z) / Z is 2.00000000000000000e+00 .
This is O.K., provided Over/Underflow has NOT just been signaled.
Underflow is gradual; it incurs Absolute Error = (roundoff in UfThold) < E0.
The Underflow threshold is 1.17549467086791992e-38, below which calculation may suffer larger Relative error than merely roundoff.
Since underflow occurs below the threshold UfThold = (2.00000000000000000e+00) ^ (-1.26000000000000000e+02) only underflow should afflict the expression (2.00000000000000000e+00) ^ (-1.26000000000000000e+02); actually calculating yields: nan .
SERIOUS DEFECT: this is not between 0 and underflow threshold = 1.17549467086791992e-38 .

Testing X^((X + 1) / (X - 1)) vs. exp(2) = 7.38905572891235352e+00 as X -> 1.
DEFECT: Calculated 4.00000000000000000e+00 for (1 + (-5.96046447753906250e-08) ^ (-3.35544320000000000e+07); differs from correct value by -3.38905572891235352e+00 .
This much error may spoil financial calculations involving tiny interest rates.
Testing powers Z^Q at four nearly extreme values.
... no discrepancies found.

To continue, press RETURN

Diagnosis resumes after milestone Number 160 Page: 9

Searching for Overflow threshold: This may generate an error.
Can `Z = -Y' overflow? Trying it on Y = -inf .
Seems O.K.
SERIOUS DEFECT: overflow past -inf shrinks to nan .
Overflow threshold is V = 3.40282350000000000e+38 .
Overflow saturates at V0 = inf .
No Overflow should be signaled for V * 1 = 3.40282350000000000e+38 nor for V / 1 = 3.40282350000000000e+38 .
Any overflow signal separating this * from the one above is a DEFECT.
DEFECT: Comparison alleges that what prints as Z = 1.40129899978637695e-45 is too far from sqrt(Z) ^ 2 = 0.00000000000000000e+00 .
DEFECT: Comparison alleges that Z = inf is too far from sqrt(Z) ^ 2 (nan) .

To continue, press RETURN

Diagnosis resumes after milestone Number 190 Page: 10

What message and/or values does Division by Zero produce?
This can interupt your program. You can skip this part if you wish.
Do you wish to compute 1 / 0?
Trying to compute 1 / 0 produces ... nan .
Do you wish to compute 0 / 0?
Trying to compute 0 / 0 produces ... nan .

To continue, press RETURN

Diagnosis resumes after milestone Number 220 Page: 11

The number of SERIOUS DEFECTs discovered = 2.
The number of DEFECTs discovered = 4.
The arithmetic diagnosed has unacceptable Serious Defects.
END OF TEST.

I really hate Catalina to have any defects at all. I am not ready to release this plugin yet, but if anyone has a deep knowledge of IEEE-754 and thinks they can help, I will send them a current snapshot of Catalina for them to investigate.

## Comments

Posts: 12,084

Anyway, I just fixed the links:

http://www.rayslogic.com/Software/TimexUSB/Float.htm

Posts: 12,084

How is this being computed? Is it using LOG to do it?

Posts: 232

The code to implement will be smaller, simpler and more accurate. It has been shown that 32-bit Posits are as accurate as, and in many cases more accurate than, 64-bit IEEE-754 floats.

@cgracey There are several FPGA implementations too. The Posit design itself is "open" and many of the 3rd party implementations are open source.

There are papers and example code (numerous implementations) available here:

https://posithub.org - home page

https://posithub.org/khub_community - Code

https://posithub.org/docs/PDS/PositEffortsSurvey.html - Survey of hardware and software implementations

The latest conference slides and videos on Posits (March 2019) are here:

https://posithub.org/conga/2019/programme

J

Posts: 14,747

Hmm... difference can mean many things.

As a ppm error, that is not so large:

1-2.42998046875000000e+02/2.43000000000000000e+02 = 8.0375 ppm

so maybe that's a simple series length vs precision trade-off?
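The same discrepancy can also be expressed in ulps, which is the unit paranoia itself reports for sqrt. The sketch below assumes IEEE-754 binary32 (a 24-bit significand, as the paranoia output reports); the function names are made up for illustration and are not part of Catalina or paranoia.

```c
#include <assert.h>
#include <float.h>
#include <math.h>

/* Relative error of `got` against `want`, in parts per million. */
static double rel_error_ppm(double got, double want)
{
    return fabs(got - want) / fabs(want) * 1e6;
}

/* Size of one binary32 ulp at `want` (normal range only). */
static double ulp_binary32(double want)
{
    int e;
    frexp(want, &e);                       /* want = m * 2^e, 0.5 <= |m| < 1 */
    return ldexp(1.0, e - FLT_MANT_DIG);   /* 2^(e-24) for binary32 */
}

/* Absolute error expressed in binary32 ulps of `want`. */
static double error_in_ulps(double got, double want)
{
    return fabs(got - want) / ulp_binary32(want);
}

/* Usage: the 3^5 case from the paranoia report. */
static void check_power_defect(void)
{
    double got = 242.998046875, want = 243.0;
    assert(fabs(rel_error_ppm(got, want) - 8.0375) < 0.001);
    assert(error_in_ulps(got, want) == 128.0);
}
```

For the reported 3^5 result this gives roughly 8.04 ppm, which is 128 ulps at 243 in single precision, so the error is small in relative terms but well beyond a last-bit rounding difference.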

Posts: 3,440

The most serious "defects" seem to be failing to return NaN instead of garbage, which is another thing you shouldn't really trust your FP routines with. I wouldn't consider any of these results alarming, just things one should be aware of when using the library, as one should never regard floats as if they are reals anyway.

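RossH (Posts: 4,696)

It is the two "Serious defects" that I would like to address:

1. This underflow: "SERIOUS DEFECT: this is not between 0 and underflow threshold = 1.17549467086791992e-38 ."

2. This overflow: "SERIOUS DEFECT: overflow past -inf shrinks to nan ."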

I do not even know enough about IEEE-754 nan/inf/overflow/underflow processing to know if these are issues I need to worry about or not. Clearly, they are cases at or near the extreme range of the IEEE-754 32 bit floats. Perhaps they are not really as serious as they sound?
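For reference, here is a minimal sketch of what IEEE-754 arithmetic is expected to return in the cases paranoia complains about. It is not paranoia's own test code; the function names are invented for illustration, and it assumes a host C compiler whose `float` is IEEE-754 binary32 with the usual default (non-trapping) behaviour. Note that under these rules 1 / 0 is +inf and only 0 / 0 is nan, whereas the report above shows nan for both.

```c
#include <assert.h>
#include <float.h>
#include <math.h>

/* 2^-126 is the smallest normal binary32 value: it must come back as a
   finite positive number, not nan. ldexpf scales exactly by a power of 2. */
static int underflow_threshold_ok(void)
{
    volatile float t = ldexpf(1.0f, -126);
    return t == FLT_MIN && t > 0.0f;
}

/* Pushing an infinity further out must leave it infinite, same sign. */
static int overflow_past_inf_ok(void)
{
    volatile float big = FLT_MAX;
    volatile float ninf = -big * 2.0f;      /* overflows to -inf */
    volatile float further = ninf * 2.0f;   /* still -inf, never nan */
    return isinf(ninf) && ninf < 0.0f && isinf(further) && further < 0.0f;
}

/* Division by zero: 1/0 is +inf, only 0/0 is nan. */
static int divide_by_zero_ok(void)
{
    volatile float zero = 0.0f, one = 1.0f;
    volatile float a = one / zero;
    volatile float b = zero / zero;
    return isinf(a) && a > 0.0f && isnan(b);
}

/* Usage: all three should hold on an IEEE-754 host. */
static void check_ieee_expectations(void)
{
    assert(underflow_threshold_ok());
    assert(overflow_past_inf_ok());
    assert(divide_by_zero_ok());
}
```

The `volatile` qualifiers are only there to stop the compiler from folding the expressions at compile time, so the checks exercise the run-time arithmetic.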

RossH (Posts: 4,696)

Interesting, but I don't think posits are ready for adoption yet. The "most comprehensive" C implementation of 32 bit posits is apparently still under test:

Posts: 4,957

The underflow is more worrying. It may be a similar issue but with negative exponents: if the exponent is < 0 then it needs to be set to 0 and the remaining bits cleared. (One can get fancy and try to set the remaining bits correctly as if the exponent of 0 is correct; this gives a "subnormal" number. Strict IEEE-754 conformance expects subnormals, but a compact library can reasonably just set the remaining bits to 0 to indicate an underflow to 0.) If the library doesn't do that and tries to manipulate a negative exponent, it may end up setting the exponent field to all 1's, which, if any other bits are set, indicates nan. Underflowing to nan like this really is a bad bug: you want underflow to always go to 0 (ideally a "signed 0" with the proper sign bit of the actual, non-underflowed result), but definitely not to INF or NaN.
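The exponent clamping described above can be sketched in C as follows. This is an illustration only, not the Catalina or Float32 code (which is written in Propeller assembly); `pack_binary32` and `bits_to_float` are hypothetical helpers, and the sketch takes the simple flush-to-zero route rather than producing subnormals.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Pack a sign, an unbiased exponent and a normalized 24-bit significand
   (bit 23 set) into IEEE-754 binary32 bits. Hypothetical helper. */
static uint32_t pack_binary32(uint32_t sign, int32_t exp, uint32_t sig)
{
    uint32_t s = (sign & 1u) << 31;
    int32_t biased = exp + 127;

    if (biased >= 255)          /* too large: signed infinity */
        return s | 0x7F800000u;
    if (biased <= 0)            /* too small: flush to signed zero */
        return s;
    /* Without the two checks above, a biased exponent of -1 truncated to
       8 bits would be 0xFF, i.e. the inf/nan encoding. */
    return s | ((uint32_t)biased << 23) | (sig & 0x007FFFFFu);
}

/* Reinterpret the bit pattern as a float without aliasing problems. */
static float bits_to_float(uint32_t bits)
{
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Usage: normal value, underflow to -0, overflow to +inf. */
static void check_pack(void)
{
    assert(pack_binary32(0, 0, 0x800000u) == 0x3F800000u);    /* 1.0 */
    assert(pack_binary32(1, -200, 0x800000u) == 0x80000000u); /* -0  */
    assert(pack_binary32(0, 200, 0x800000u) == 0x7F800000u);  /* inf */
}
```

Since the paranoia output reports gradual underflow, a real fix would presumably need to shift the significand into a subnormal instead of flushing; that step is left out here to keep the clamping logic visible.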

RossH (Posts: 4,696)

Several of the floating point libraries I have tried exhibit the "overflow past -inf shrinks to nan" failure. And some of them are actual production C libraries (i.e. nothing to do with the Propeller). So if I can't fix that one I probably won't worry too much about it.

The other failure seems to come from Cam Thompson's original Propeller libraries, which I based my CORDIC implementation on. I will look into that one and see if I can fix it.

Thanks for the clues.

RossH (Posts: 4,696)

Having figured out how to fix it, I can now see why this choice was made - detecting Infinity in all the cases where it must be detected and then handling it correctly (including the sign, since infinity can be positive or negative) blows out the code size significantly. In the case of multiply alone, correctly detecting and handling Infinite values takes much more code than actually performing the multiply.

I expect I will find similar issues in a few of the other floating point operations. However, if they do not break the "paranoia" test (as multiply does), then I think I will just leave them alone!

EDIT: "overflow", not "underflow"!
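To give a feel for the extra work involved, here is a rough C sketch of the special-value screening that IEEE-754 multiply needs before the ordinary multiply runs. It is not the plugin's code; the macro and function names are invented for this example, and it works directly on binary32 bit patterns.

```c
#include <assert.h>
#include <stdint.h>

#define SIGN_MASK 0x80000000u
#define EXP_MASK  0x7F800000u
#define FRAC_MASK 0x007FFFFFu
#define QNAN_BITS 0x7FC00000u   /* a quiet nan */

static int is_nan_bits(uint32_t x)
{
    return (x & EXP_MASK) == EXP_MASK && (x & FRAC_MASK) != 0;
}
static int is_inf_bits(uint32_t x)  { return (x & ~SIGN_MASK) == EXP_MASK; }
static int is_zero_bits(uint32_t x) { return (x & ~SIGN_MASK) == 0; }

/* Returns 1 and stores the result in *out when a*b is decided by the
   special-value rules alone; returns 0 when the normal multiply must run. */
static int mul_special(uint32_t a, uint32_t b, uint32_t *out)
{
    uint32_t sign = (a ^ b) & SIGN_MASK;    /* result sign = XOR of signs */

    if (is_nan_bits(a) || is_nan_bits(b)) { *out = QNAN_BITS; return 1; }
    if (is_inf_bits(a) || is_inf_bits(b)) {
        if (is_zero_bits(a) || is_zero_bits(b))
            *out = QNAN_BITS;               /* inf * 0 is invalid */
        else
            *out = sign | EXP_MASK;         /* inf * x = signed inf */
        return 1;
    }
    if (is_zero_bits(a) || is_zero_bits(b)) { *out = sign; return 1; }
    return 0;
}

/* Usage: -inf * 2.0 must stay -inf rather than turning into nan. */
static void check_mul_special(void)
{
    uint32_t r = 0;
    assert(mul_special(0xFF800000u, 0x40000000u, &r) == 1);
    assert(r == 0xFF800000u);
}
```

Even in C this screening is about as long as a basic significand multiply, which fits the observation above about code size.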

RossH (Posts: 4,696)

I doubt I will be able to fix the other (less serious) defects in the original libraries - there probably isn't enough space. But I hope to be able to fix them in the CORDIC version.

Thanks to those who offered assistance!

RossH (Posts: 4,696)

So CORDIC is around 3 times faster than the next best option! Not bad ...

The first option (`-lmc`) is the new one-cog CORDIC implementation, available only on the P2. The others are the existing P1 floating point options, as re-implemented for the P2 - i.e. either one or two cogs based on Cam Thompson's original Float32 implementation, or a software-only implementation.

I wasn't sure I was going to be able to fit the CORDIC implementation into one cog, so my original plan was to replace Float32 A and B with CORDIC versions. But I have now decided to leave these options intact, so that programs being ported from the P1 can exhibit the same floating point behaviour on the P2 (if that's ever required). If not, or for new programs on the P2, there would really be no point in using anything but `-lmc` if you have a spare cog, or `-lm` if you don't.

Having the CORDIC implemented in a cog means you can use it as a true floating point co-processor, if required - i.e. you can send a floating point request to the cog, then get on with other stuff, and then check for and retrieve the result sometime later. The C library wrappers don't do this, of course - they just initiate the request and then wait for the results - but doing it the other way is trivial.
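The difference between the blocking wrappers and the co-processor style can be sketched like this. Everything here is hypothetical: the mailbox layout, the opcodes and the function names are invented for illustration and are not Catalina's actual plugin interface. The "cog" is simulated by an ordinary function so the sketch runs on any host.

```c
#include <assert.h>
#include <stdint.h>

enum fp_op { FP_IDLE = 0, FP_ADD, FP_MUL };   /* hypothetical opcodes */

/* Hypothetical mailbox shared between the caller and the FP cog. */
struct fp_mailbox {
    volatile uint32_t op;       /* FP_IDLE when no request is pending */
    volatile float a, b;
    volatile float result;
};

/* Post a request and return immediately. */
static void fp_submit(struct fp_mailbox *m, enum fp_op op, float a, float b)
{
    m->a = a;
    m->b = b;
    m->op = op;                 /* written last: this is the "go" signal */
}

/* Non-blocking check: has the co-processor cleared the request? */
static int fp_done(const struct fp_mailbox *m) { return m->op == FP_IDLE; }

/* Stand-in for the cog. On real hardware this would loop in its own cog. */
static void fp_cog_service(struct fp_mailbox *m)
{
    switch (m->op) {
    case FP_ADD: m->result = m->a + m->b; break;
    case FP_MUL: m->result = m->a * m->b; break;
    default: return;
    }
    m->op = FP_IDLE;            /* signal completion */
}

/* Blocking style: submit, then wait for the answer. */
static float fp_call_blocking(struct fp_mailbox *m, enum fp_op op,
                              float a, float b)
{
    fp_submit(m, op, a, b);
    while (!fp_done(m))
        fp_cog_service(m);      /* on hardware: just spin here */
    return m->result;
}

/* Usage: the asynchronous style, with other work between submit and poll. */
static void check_async(void)
{
    struct fp_mailbox m = { FP_IDLE, 0.0f, 0.0f, 0.0f };
    fp_submit(&m, FP_ADD, 2.0f, 3.0f);
    assert(!fp_done(&m));       /* caller is free to do other work here */
    fp_cog_service(&m);         /* simulated cog gets around to it */
    assert(fp_done(&m) && m.result == 5.0f);
}
```

On real multi-cog hardware the ordering and visibility of the mailbox writes would need more care than plain `volatile` fields; that is glossed over here.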

The actual program is not really important - it just exercises various floating point functions, but here it is:

Posts: 3,382

I was wondering the same thing, and also whether (and by how much) it would be faster to do the FP calculations directly from the calling cog. It would be good to compare that with the benchmarks above, if it is possible. Why use a separate cog in cases where you don't need to?

RossH (Posts: 4,696)

Yes. Fairly simple to do. It would be a bit slower than the cog-based CORDIC, but it *would* save a cog.

I will leave that one as an exercise for the reader!

Posts: 917

thanks,

Jonathan

RossH (Posts: 4,696)

I don't think I know that particular library. Can you give me a link to it?

RossH (Posts: 4,696)

The `-lmc` option uses the built-in CORDIC solver wherever possible. There is still quite a lot of software involved to convert to/from IEEE-754 formats, and also to implement those functions not directly implemented by the CORDIC solver. However, this is all contained in the same cog, which can be used stand-alone. All that is in the Catalina library is some very simple wrapper functions that send the cog one (or two) IEEE-754 floating point values and specify the operation the cog is to perform, and then wait for the cog to respond.

The exception is the hyperbolic functions - sinh, cosh, tanh. There just wasn't enough space to squeeze these into the cog, so they remain S/W only.

Posts: 917

thanks,

Jonathan

RossH (Posts: 4,696)

Hi Jonathan

Just had a quick look. I can't do a direct comparison without spending more time on it than I have to spare at the moment. But it seems you use software CORDIC, but only for the inverse trig functions (asin, acos, atan)? Not for the normal trig functions (sin, cos, tan)?

I think this would mean your library would just be slightly faster than Float32Full. But of course it does also save a cog. Do you have any comparison figures on how fast it is compared with Float32Full?

On the P2 you could easily replace your software CORDIC with the P2's hardware CORDIC - that would speed things up. But you would really need to replace *all* the table lookups (log, exp, sin, cos) with CORDIC to get a more dramatic improvement.

Posts: 312

Some years ago I did a quick speed test of F32 vs Float32 and came to the conclusion that it is about 20% faster.

While I tried to account for call overhead, the methodology used might not be entirely "scientific" though.

In this thread:

https://forums.parallax.com/discussion/comment/1042781/#Comment_1042781

RossH (Posts: 4,696)

That seems to match the "gut feel" I had after reading the code.

At some point, I will do a test for accuracy of the various Catalina options, but I have other things on my list to do first. I am currently fairly confident that the hardware CORDIC option is *at least* as accurate as any of the other options, and in some specific cases that I *did* test, it was much *more* accurate.