Any IEEE-754 gurus out there?

RossH · 2019-08-17 10:01

This is a request for assistance from any IEEE-754 Floating point experts ...

Catalina has had IEEE-754 floating point support since ... well, forever - it was one of the reasons Catalina was created.

On the Propeller 2, I now have a floating point plugin that implements the standard C floating point library functions using the P2's CORDIC solver. It is based on Cam Thompson's original floating point solution for the P1 - although with CORDIC, it only takes one cog instead of two. Think of it as a maths co-processor implemented in a cog.

The plugin seems to be fully functional - it passes all my own tests. But it does not pass the classic "paranoia.c" floating point test program without defects. The defects are not to do with the basic maths processing - they seem to be to do with how it deals with nan and inf and underflow/overflow ... and are really beyond my ability to solve.

Here is the output of the classic "paranoia.c" Floating Point Test program, as compiled by Catalina. It is the two "SERIOUS DEFECTS" that are the problem:


Diagnosis resumes after milestone Number 0          Page: 1

Users are invited to help debug and augment this program so it will
cope with unanticipated and newly uncovered arithmetic pathologies.

Please send suggestions and interesting results to
        Richard Karpinski
        Computer Center U-76
        University of California
        San Francisco, CA 94143-0704, USA

In doing so, please include the following information:
        Precision:      double;
        Version:        10 February 1989;
        Computer:

        Compiler:

        Optimization level:

        Other relevant compiler options:

To continue, press RETURN
Diagnosis resumes after milestone Number 1          Page: 2

Running this program should reveal these characteristics:
     Radix = 1, 2, 4, 8, 10, 16, 100, 256 ...
     Precision = number of significant digits carried.
     U2 = Radix/Radix^Precision = One Ulp
        (OneUlpnit in the Last Place) of 1.000xxx .
     U1 = 1/Radix^Precision = One Ulp of numbers a little less than 1.0 .
     Adequacy of guard digits for Mult., Div. and Subt.
     Whether arithmetic is chopped, correctly rounded, or something else
        for Mult., Div., Add/Subt. and Sqrt.
     Whether a Sticky Bit used correctly for rounding.
     UnderflowThreshold = an underflow threshold.
     E0 and PseudoZero tell whether underflow is abrupt, gradual, or fuzzy.
     V = an overflow threshold, roughly.
     V0  tells, roughly, whether  Infinity  is represented.
     Comparisions are checked for consistency with subtraction
        and for contamination with pseudo-zeros.
     Sqrt is tested.  Y^X is not tested.
     Extra-precise subexpressions are revealed but NOT YET tested.
     Decimal-Binary conversion is NOT YET tested for accuracy.

To continue, press RETURN
Diagnosis resumes after milestone Number 2          Page: 3

The program attempts to discriminate among
   FLAWs, like lack of a sticky bit,
   Serious DEFECTs, like lack of a guard digit, and
   FAILUREs, like 2+2 == 5 .
Failures may confound subsequent diagnoses.

The diagnostic capabilities of this program go beyond an earlier
program called `MACHAR', which can be found at the end of the
book  `Software Manual for the Elementary Functions' (1980) by
W. J. Cody and W. Waite. Although both programs try to discover
the Radix, Precision and range (over/underflow thresholds)
of the arithmetic, this program tries to cope with a wider variety
of pathologies, and to say how well the arithmetic is implemented.

The program is based upon a conventional radix representation for
floating-point numbers, but also allows logarithmic encoding
as used by certain early WANG machines.

BASIC version of this program (C) 1983 by Prof. W. M. Kahan;
see source comments for more history.

To continue, press RETURN
Diagnosis resumes after milestone Number 3          Page: 4

Program is now RUNNING tests on small integers:
-1, 0, 1/2, 1, 2, 3, 4, 5, 9, 27, 32 & 240 are O.K.

Searching for Radix and Precision.
Radix = 2.000000 .
Closest relative separation found is U1 = 5.9604645e-08 .

Recalculating radix and precision
 confirms closest relative separation U1 .
Radix confirmed.
The number of significant digits of the Radix is 24.000000 .

To continue, press RETURN
Diagnosis resumes after milestone Number 30          Page: 5

Subtraction appears to be normalized, as it should be.
Checking for guard digit in *, /, and -.
     *, /, and - appear to have guard digits, as they should.

To continue, press RETURN
Diagnosis resumes after milestone Number 40          Page: 6

Checking rounding on multiply, divide and add/subtract.
Multiplication appears to round correctly.
Division appears to round correctly.
Addition/Subtraction appears to round correctly.
Checking for sticky bit.
Sticky bit used incorrectly or not at all.

Does Multiplication commute?  Testing on 20 random pairs.
     No failures found in 20 integer pairs.

Running test of square root(x).
Testing if sqrt(X * X) == X for 20 Integers X.
Test for sqrt monotonicity.
sqrt has passed a test for Monotonicity.
Testing whether sqrt is rounded or chopped.
Square root is neither chopped nor correctly rounded.
Observed errors run from 0.0000000e+00 to 5.0000000e-01 ulps.

To continue, press RETURN
Diagnosis resumes after milestone Number 90          Page: 7

Testing powers Z^i for small Integers Z and i.
DEFECT:  computing
        (3.00000000000000000e+00) ^ (5.00000000000000000e+00)
        yielded 2.42998046875000000e+02;
        which compared unequal to correct 2.43000000000000000e+02 ;
                they differ by -1.95312500000000000e-03 .
Errors like this may invalidate financial calculations
        involving interest rates.
Similar discrepancies have occurred 49 times.

To continue, press RETURN
Diagnosis resumes after milestone Number 100          Page: 8

Seeking Underflow thresholds UfThold and E0.
Smallest strictly positive number found is E0 = 1.4013e-45 .
Since comparison denies Z = 0, evaluating (Z + Z) / Z should be safe.
What the machine gets for (Z + Z) / Z is  2.00000000000000000e+00 .
This is O.K., provided Over/Underflow has NOT just been signaled.
Underflow is gradual; it incurs Absolute Error =
(roundoff in UfThold) < E0.
The Underflow threshold is 1.17549467086791992e-38,  below which
calculation may suffer larger Relative error than merely roundoff.
Since underflow occurs below the threshold
UfThold = (2.00000000000000000e+00) ^ (-1.26000000000000000e+02)
only underflow should afflict the expression
        (2.00000000000000000e+00) ^ (-1.26000000000000000e+02);
actually calculating yields: nan .
SERIOUS DEFECT:  this is not between 0 and underflow
   threshold = 1.17549467086791992e-38 .

Testing X^((X + 1) / (X - 1)) vs. exp(2) = 7.38905572891235352e+00 as X -> 1.
DEFECT:  Calculated 4.00000000000000000e+00 for
        (1 + (-5.96046447753906250e-08) ^ (-3.35544320000000000e+07);
        differs from correct value by -3.38905572891235352e+00 .
        This much error may spoil financial
        calculations involving tiny interest rates.
Testing powers Z^Q at four nearly extreme values.
 ... no discrepancies found.


To continue, press RETURN
Diagnosis resumes after milestone Number 160          Page: 9

Searching for Overflow threshold:
This may generate an error.
Can `Z = -Y' overflow?
Trying it on Y = -inf .
Seems O.K.
SERIOUS DEFECT:  overflow past -inf
        shrinks to nan .
Overflow threshold is V  = 3.40282350000000000e+38 .
Overflow saturates at V0 = inf .
No Overflow should be signaled for V * 1 = 3.40282350000000000e+38
                           nor for V / 1 = 3.40282350000000000e+38 .
Any overflow signal separating this * from the one
above is a DEFECT.

DEFECT:  Comparison alleges that what prints as Z = 1.40129899978637695e-45
 is too far from sqrt(Z) ^ 2 = 0.00000000000000000e+00 .
DEFECT:  Comparison alleges that Z =               inf
 is too far from sqrt(Z) ^ 2 (nan) .

To continue, press RETURN
Diagnosis resumes after milestone Number 190          Page: 10


What message and/or values does Division by Zero produce?
This can interupt your program.  You can skip this part if you wish.
Do you wish to compute 1 / 0?     Trying to compute 1 / 0 produces ...  nan .

Do you wish to compute 0 / 0?
    Trying to compute 0 / 0 produces ...  nan .

To continue, press RETURN
Diagnosis resumes after milestone Number 220          Page: 11


The number of  SERIOUS DEFECTs  discovered = 2.
The number of  DEFECTs  discovered =         4.

The arithmetic diagnosed has unacceptable Serious Defects.
END OF TEST.

I really hate Catalina to have any defects at all. I am not ready to release this plugin yet, but if anyone has a deep knowledge of IEEE-754 and thinks they can help, I will send them a current snapshot of Catalina for them to investigate.

Rayman · 2019-08-17 19:45

I used to know something about this... Was about to point to my webpage on it, but then realized all the links were broken...
Anyway, I just fixed the links:
http://www.rayslogic.com/Software/TimexUSB/Float.htm

Rayman · 2019-08-17 19:51

I'm curious how it got this wrong:

Testing powers Z^i for small Integers Z and i.
DEFECT:  computing
        (3.00000000000000000e+00) ^ (5.00000000000000000e+00)
        yielded 2.42998046875000000e+02;
        which compared unequal to correct 2.43000000000000000e+02 ;
                they differ by -1.95312500000000000e-03 .

How is this being computed? Is it using LOG to do it?

thej · 2019-08-17 21:34

Perhaps you can avoid the complications of "not a number" and "imaginary numbers" (among other things) by using Posit Arithmetic instead of IEEE-754 floats. Posits are a direct replacement.

The code to implement will be smaller, simpler and more accurate. It has been shown that 32-bit Posits are as accurate as, and in many cases more accurate than, 64-bit IEEE-754 floats.

@cgracey There are several FPGA implementations too. The Posit design itself is "open" and many of the 3rd party implementations are open source

There are papers and example code (numerous implementations) available here:
https://posithub.org - home page
https://posithub.org/khub_community - Code
https://posithub.org/docs/PDS/PositEffortsSurvey.html -Survey of hardware & Software implementations

The latest conference slides and videos on Posits (March 2019) are here:
https://posithub.org/conga/2019/programme

J

jmg · 2019-08-17 21:50

Rayman wrote: »

I'm curious how it got this wrong:

Testing powers Z^i for small Integers Z and i.
DEFECT:  computing
        (3.00000000000000000e+00) ^ (5.00000000000000000e+00)
        yielded 2.42998046875000000e+02;
        which compared unequal to correct 2.43000000000000000e+02 ;
                they differ by -1.95312500000000000e-03 .

How is this being computed? Is it using LOG to do it?

Hmm... difference can mean many things
As a ppm error that is not so large
1-2.42998046875000000e+02/2.43000000000000000e+02 = 8.0375 ppm

so maybe that's a simple series length vs precision trade off ?

localroger · 2019-08-17 22:04

It's hard for me to imagine any general purpose exponentiation routine capable of handling fractional exponents getting an exact integer result for 3^5. Yes, it's almost certainly using logarithms and if you expect an exact result it's because you don't know how floating point math works.

The most serious "defects" seem to be failing to return NaN instead of garbage, which is another thing you shouldn't really trust your FP routines with. I wouldn't consider any of these results alarming, just things one should be aware of when using the library, as one should never regard floats as if they are reals anyway.

Rayman · 2019-08-17 22:17

Seems that either you need special treatment for small integers or higher internal precision....

localroger · 2019-08-17 22:47

Higher internal precision doesn't cut it. The problem is that FP math is base 2 and there are different truncation errors than there are in base 10. In particular the fraction 1/(decimal) 10 is infinitely repeating, as 1/3 is in decimal. This causes all kinds of problems when handling money with FP math. Rounding becomes critical and isn't an exact science. I frankly cannot imagine an exponentiation routine which would reliably return neat integer results for neat integer exponents without frankly cheating to detect and handle them specially. You just need to understand that FP doesn't return exact results and deal with that.

RossH · 2019-08-17 23:30

Sorry, I should have been clearer in my description of the problem. The inaccuracy is not the issue - I can fix those if they are serious enough - and yes, they are typically because the "pow" function uses "log" and "exp" internally, even for small integers. This is done to save space, since I want to fit the entire floating point library into one cog.

It is the two "Serious defects" that I would like to address:

1. This underflow:

The Underflow threshold is 1.17549467086791992e-38,  below which
calculation may suffer larger Relative error than merely roundoff.
Since underflow occurs below the threshold
UfThold = (2.00000000000000000e+00) ^ (-1.26000000000000000e+02)
only underflow should afflict the expression
        (2.00000000000000000e+00) ^ (-1.26000000000000000e+02);
actually calculating yields: nan .
SERIOUS DEFECT:  this is not between 0 and underflow
   threshold = 1.17549467086791992e-38 .

2. This overflow:

Searching for Overflow threshold:
This may generate an error.
Can `Z = -Y' overflow?
Trying it on Y = -inf .
Seems O.K.
SERIOUS DEFECT:  overflow past -inf
        shrinks to nan .

I do not even know enough about IEEE-754 nan/inf/overflow/underflow processing to know if these are issues I need to worry about or not. Clearly, they are cases at or near the extreme range of the IEEE-754 32 bit floats. Perhaps they are not really as serious as they sound?

RossH · 2019-08-17 23:59

thej wrote: »

Perhaps you can avoid the complications of "not a number" and "imaginary numbers" (among other things) by using Posit Arithmetic instead of IEEE-754 floats. Posits are a direct replacement.

Interesting, but I don't think posits are ready for adoption yet. The "most comprehensive" C implementation of 32 bit posits is apparently still under test:

All posit8_t and posit16_t operations are exhaustively tested with exception of p16_mulAdd and q16_fdp_add/sub operations.
posit32_t operations are still being tested exhaustively for correctness. It will take weeks to months before these tests complete.

ersmith · 2019-08-18 00:17

The serious defects you saw are probably because the floating point math library you used didn't distinguish between nan and inf, and didn't do any clamping of exponents. After a floating point operation (including add, sub, multiply, etc.) if the resulting exponent is greater than or equal to 0xff then it needs to be clamped to 0xff, and all the other bits should be set to 0 (this indicates an infinite result). If some of the other bits are non-zero the result is "not a number", which is what the complaint about overflowing inf to nan is about. A lot of the Propeller float libraries didn't catch this and basically treated nan and inf as the same, which they are not. But as defects go I'd say it's only "medium" serious; it violates the spec and can cause some programs to misbehave, but at least both inf and nan are out of range for normal floating point.
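
A minimal sketch of that distinction for 32-bit floats, assuming the result has already been split into sign, biased exponent and 23-bit fraction. The function names are hypothetical, and the sign bit is kept when clamping, which is my reading of "all the other bits".

```c
#include <assert.h>
#include <stdint.h>

#define F32_EXP_MAX   0xFFu          /* all-ones exponent field */
#define F32_FRAC_MASK 0x007FFFFFu    /* 23-bit fraction field */

/* Infinity: exponent field all ones, fraction zero. */
int f32_bits_is_inf(uint32_t bits) {
    return ((bits >> 23) & 0xFFu) == F32_EXP_MAX && (bits & F32_FRAC_MASK) == 0;
}

/* NaN: exponent field all ones, fraction non-zero. */
int f32_bits_is_nan(uint32_t bits) {
    return ((bits >> 23) & 0xFFu) == F32_EXP_MAX && (bits & F32_FRAC_MASK) != 0;
}

/* Pack a result, clamping an overflowed exponent to a signed infinity. */
uint32_t f32_pack_clamp_overflow(uint32_t sign, int32_t biased_exp, uint32_t frac) {
    assert(sign <= 1u);
    assert(biased_exp >= 0);         /* underflow is a separate case */
    if (biased_exp >= (int32_t)F32_EXP_MAX) {
        biased_exp = (int32_t)F32_EXP_MAX;
        frac = 0;                    /* leftover fraction bits would read as NaN */
    }
    return (sign << 31) | ((uint32_t)biased_exp << 23) | (frac & F32_FRAC_MASK);
}
```

With this, an overflowed result such as `f32_pack_clamp_overflow(0, 300, 0x123456)` packs to 0x7F800000 (+inf) instead of a pattern with stray fraction bits.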

The underflow is more worrying. It may be a similar issue but with negative exponents; if the exponent is < 0 then it needs to be set to 0 and the remaining bits cleared. (One can get fancy and try to set the remaining bits correctly as if the exponent of 0 is correct; this gives a "subnormal" number, but it's not mandatory and it's OK to just set the whole 32 bits to 0 to indicate an underflow to 0). If the library doesn't do that and tries to manipulate a negative exponent it may end up setting the exponent field to all 1's, which if any other bits are set indicates nan. Underflowing from 0 to nan like this really is a bad bug, you want underflow to always go to 0 (ideally a "signed 0" with the proper sign bit of the actual, non-underflowed result) but definitely not to INF or NaN.
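
The failure mode and the simple fix can be sketched as follows. The names are hypothetical, and the sketch takes the "flush to signed zero" option rather than producing subnormals, so a biased exponent of 0 or below becomes +0 or -0.

```c
#include <assert.h>
#include <stdint.h>

/* Naive packing: a negative exponent wraps around. For example -1 becomes
   0xFF in the 8-bit field, which reads as inf or NaN. */
uint32_t f32_pack_naive(uint32_t sign, int32_t biased_exp, uint32_t frac) {
    return (sign << 31) | (((uint32_t)biased_exp & 0xFFu) << 23) | (frac & 0x007FFFFFu);
}

/* Packing with an underflow check: anything too small becomes a signed zero. */
uint32_t f32_pack_flush_underflow(uint32_t sign, int32_t biased_exp, uint32_t frac) {
    assert(sign <= 1u);
    assert(biased_exp < 0xFF);       /* overflow is a separate case */
    if (biased_exp <= 0)
        return sign << 31;           /* +0.0 or -0.0, keeping the result's sign */
    return (sign << 31) | ((uint32_t)biased_exp << 23) | (frac & 0x007FFFFFu);
}
```

For instance, `f32_pack_naive(0, -1, 0x400000)` produces 0x7FC00000, which is a NaN, while `f32_pack_flush_underflow(1, -5, 0x400000)` produces 0x80000000, which is -0.0.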

RossH · 2019-08-18 00:51

ersmith wrote: »

A lot of the Propeller float libraries didn't catch this and basically treated nan and inf as the same, which they are not.

Several of the floating point libraries I have tried exhibit the "overflow past -inf shrinks to nan" failure. And some of them are actual production C libraries (i.e. nothing to do with the Propeller). So if I can't fix that one I probably won't worry too much about it.

The other failure seems to come from Cam Thompson's original propeller libraries, which I based my CORDIC implementation on. I will look into that one and see if I can fix it.

Thanks for the clues.

localroger · 2019-08-18 03:33

Ross, the underflow problem is probably due to a normalization failure. Stated in decimal, a number like 100 should always be expressed as 1ex2, and .001 as 1ex-3 even if there is room in the mantissa to express the number with an exponent of 1. This is so fundamental that in some FP formats (I'm not personally familiar with IEEE-754 off the top of my head) the initial binary 1 in the mantissa is assumed to save a bit. Should the library fail to NaN when it can't normalize, the resulting number may look like a real number which is very much in error because that assumed 1 bit is out there.

evanh · 2019-08-18 05:14

I was looking at why Practrand was barfing on some random data I was throwing at it a year or so back, and found a NAN in a variable, generated by a divide by zero due to an empty array. I never worked out why the array was empty, but it strikes me now that maybe that should have been an INF rather than a NAN.

RossH · 2019-08-18 06:34

Ok! I can fix the "overflow" problem if I can just find enough space - the issue is that multiplying anything by an Infinite number returns NaN, whereas it should return an Infinite number with the appropriate sign.

Having figured out how to fix it, I can now see why this choice was made - detecting Infinity in all the cases where it must be detected and then handling it correctly (including the sign, since infinity can be positive or negative) blows out the code size significantly. In the case of multiply alone, correctly detecting and handling Infinite values takes much more code than actually performing the multiply.
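
To illustrate why the checks add up, here is a rough sketch in portable C of the special cases a multiply has to screen for before doing any real work. It is not the plugin's code (which is PASM in a cog); `mul_special_case` is a hypothetical helper, and it relies on the host's `isnan`, `isinf` and `signbit`.

```c
#include <assert.h>
#include <math.h>

/* Returns 1 and stores the answer in *out if a or b is a special value,
   or 0 if the caller should go on to do an ordinary multiply. */
int mul_special_case(float a, float b, float *out) {
    assert(out != 0);
    if (isnan(a) || isnan(b)) {          /* NaN in, NaN out */
        *out = NAN;
        return 1;
    }
    if (isinf(a) || isinf(b)) {
        if (a == 0.0f || b == 0.0f) {    /* inf * 0 has no meaningful value */
            *out = NAN;
            return 1;
        }
        /* result sign is the XOR of the operand signs */
        int negative = (signbit(a) != 0) != (signbit(b) != 0);
        *out = negative ? -INFINITY : INFINITY;
        return 1;
    }
    return 0;
}
```

So `inf * -2` gives `-inf`, `-inf * -inf` gives `+inf`, and only `inf * 0` (or a NaN operand) gives NaN.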

I expect I will find similar issues in a few of the floating point operations. However, if it does not break the "paranoia" test (as it does in the case of multiply), then I think I will just leave them alone!

EDIT: "overflow", not "underflow"!

RossH · 2019-08-18 09:54

Ok! I have now fixed the "underflow" problem as well - the issue was that "exp" was returning NaN in the case of underflow, when it should probably have returned zero. Again, I can see why - testing for underflow (and returning zero in that case) probably added enough code to the various functions to make the original code unable to fit in a cog. But fixing it in the case of "exp" at least makes "paranoia" happy!

I doubt I will be able to fix the other (less serious) defects in the original libraries - there probably isn't enough space. But I hope to be able to fix them in the CORDIC version.

Thanks to those who offered assistance!

RossH · 2019-08-19 02:44

Just to wrap this thread up, I thought it would be interesting to put up some comparisons, to show how CORDIC performs relative to the other available floating point options:

option        library type      cogs    time (secs)     code (bytes)
======       ==============     ====    ===========     ============
 -lmc         CORDIC              1         6.7            5308
 -lmb         Float32 A & B       2        17.6            5304
 -lma         Float32 A & S/W     1        19.3            7668
 -lm          S/W                 0        34.0           14244

So CORDIC is around 2.6 times faster than the next best option (6.7 vs 17.6 seconds)! Not bad ...

The first option (-lmc) is the new one-cog CORDIC implementation, available only on the P2. The others are the existing P1 floating point options, as re-implemented for the P2 - i.e. either one or two cogs based on Cam Thompson's original Float32 implementation, or a software-only implementation.

I wasn't sure I was going to be able to fit the CORDIC implementation into one cog, so my original plan was to replace Float32 A and B with CORDIC versions. But I have now decided to leave these options intact, so that programs being ported from the P1 can exhibit the same floating point behaviour on the P2 (if that's ever required). If not, or for new programs on the P2, there would really be no point in using anything but -lmc if you have a spare cog, or -lm if you don't.

Having the CORDIC implemented in a cog means you can use it as a true floating point co-processor, if required - i.e. you can send a floating point request to the cog, then get on with other stuff, and then check for and retrieve the result sometime later. The C library wrappers don't do this, of course - they just initiate the request and then wait for the results - but doing it the other way is trivial.
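
The "post a request, do other work, collect the result later" pattern can be sketched with a simple mailbox. Everything here is hypothetical and is not Catalina's actual cog interface: the `fp_mailbox` layout, the function names and the two operations are invented for illustration, and `fp_service_once` merely simulates in C what the floating point cog would do on its side.

```c
#include <assert.h>

enum fp_op { FP_IDLE = 0, FP_ADD, FP_MUL };

/* Shared between the caller and the (simulated) co-processor. */
struct fp_mailbox {
    volatile int   op;        /* FP_IDLE when no request is pending */
    volatile float a, b;      /* operands */
    volatile float result;    /* valid once op returns to FP_IDLE */
};

/* Caller side: post a request and return immediately. */
void fp_post(struct fp_mailbox *m, enum fp_op op, float a, float b) {
    assert(m->op == FP_IDLE); /* one outstanding request at a time */
    m->a = a;
    m->b = b;
    m->op = op;               /* written last: this publishes the request */
}

/* Caller side: poll for completion without blocking. */
int fp_done(const struct fp_mailbox *m) {
    return m->op == FP_IDLE;
}

/* Caller side: fetch the result once the request has completed. */
float fp_result(const struct fp_mailbox *m) {
    assert(fp_done(m));
    return m->result;
}

/* Stand-in for one pass of the co-processor's service loop. */
void fp_service_once(struct fp_mailbox *m) {
    if (m->op == FP_IDLE)
        return;
    m->result = (m->op == FP_ADD) ? m->a + m->b : m->a * m->b;
    m->op = FP_IDLE;          /* written last: signals completion */
}
```

A blocking wrapper, like the C library ones described above, would simply call `fp_post` and then spin on `fp_done`. The asynchronous use does other work between the two.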

The actual program is not really important - it just exercises various floating point functions, but here it is:

/*
 * For the timing to work, you must compile this program with -C CLOCK
 */

#include <catalina_hmi.h>
#include <time.h>
#include <math.h>

#define PI 3.1415926

#ifdef __CATALINA_P2
#define ITERATIONS 50000
#else
#define ITERATIONS 10000
#endif

void test_time (float a) {
   float b, c;
   int i;

   for (i = 0; i < ITERATIONS; i++) {
      b = cos(a + i*0.01);
      c = acos(b);
      b = sin(a + i*0.01);
      c = asin(b);
      b = tan(a + i*0.01);
      c = atan(b);
      b = pow(a + i*0.01, 2.34 + i*0.01);
   }

}

void press_key_to_continue() {
   t_string(1, "\nPress any key ...");
   k_wait();
   t_char(1,'\n');
   t_char(1,'\n');
}

int main(void) {
   int i;
   int j;
   float f;
   clock_t time1, time2;

   t_string(1, "Timing Test - To start,");

   press_key_to_continue();

   t_printf("Starting ...\n");
   time1 = clock();
   test_time(PI/100);
   time2 = clock();
   t_printf("...Completed, time was %g seconds\n", ((float)(time2-time1))/CLOCKS_PER_SEC);

   press_key_to_continue();

   return 0;
}

localroger · 2019-08-19 03:05

Congrats on making "paranoia" happy, Ross. FP is a PITA every day of the week.

AJL · 2019-08-19 04:12

Given that the P2 has hubexec, and the C library wrappers don't get busy with other work while the CORDIC work is happening, is it feasible to create another option where the CORDIC engine is used without needing a spare cog?

rogloh · 2019-08-19 04:36

AJL wrote: »

Given that the P2 has hubexec, and the C library wrappers don't get busy with other work while the CORDIC work is happening, is it feasible to create another option where the CORDIC engine is used without needing a spare cog?

I was wondering the same thing. And also whether, and by how much, it would be faster to do the FP calculations directly from the calling cog. It would be good to compare that with the benchmarks above, if possible. Why use a separate cog in the cases where you don't need to?

RossH · 2019-08-19 05:19

AJL wrote: »

Given that the P2 has hubexec, and the C library wrappers don't get busy with other work while the CORDIC work is happening, is it feasible to create another option where the CORDIC engine is used without needing a spare cog?

Yes. Fairly simple to do. It would be a bit slower than the cog-based CORDIC, but it would save a cog.

I will leave that one as an exercise for the reader!

lonesock · 2019-08-19 14:13

Would it be difficult to also compare against the F32 library? This is just a selfish request, so feel free to ignore it. [8^)

thanks,
Jonathan

ke4pjw · 2019-08-19 17:32

I have a dumb question Ross. Is this CORDIC implementation using the built-in solver in the P2 or is it a SW implementation in a Catalina library?

RossH · 2019-08-19 23:13

lonesock wrote: »

Would it be difficult to also compare against the F32 library? This is just a selfish request, so feel free to ignore it. [8^)

thanks,
Jonathan

I don't think I know that particular library. Can you give me a link to it?

RossH · 2019-08-19 23:28

ke4pjw wrote: »

I have a dumb question Ross. Is this CORDIC implementation using the built-in solver in the P2 or is it a SW implementation in a Catalina library?

The -lmc option uses the built-in CORDIC solver wherever possible. There is still quite a lot of software involved to convert to/from IEEE-754 formats, and also to implement those functions not directly implemented by the CORDIC solver. However, this is all contained in the same cog, which can be used stand-alone. All that is in the Catalina library is some very simple wrapper functions that send the cog one (or two) IEEE-754 floating point values and specify the operation the cog is to perform, and then wait for the cog to respond **.

** Apart from the hyperbolic functions - sinh, cosh, tanh. There just wasn't enough space to squeeze these in the cog, so they remain S/W only.

lonesock · 2019-08-20 15:42

Sure: obex.parallax.com/object/229

thanks,
Jonathan

RossH · 2019-08-21 07:02

lonesock wrote: »

Sure: obex.parallax.com/object/229

thanks,
Jonathan

Hi Jonathan

Just had a quick look. I can't do a direct comparison without spending more time on it than I have to spare at the moment. But it seems you use software CORDIC, but only for the inverse trig functions (asin, acos, atan)? Not for the normal trig functions (sin, cos, tan)?

I think this would mean your library would just be slightly faster than Float32Full. But of course it does also save a cog. Do you have any comparison figures on how fast it is compared with Float32Full?

On the P2 you could easily replace your software CORDIC with the P2's hardware CORDIC - that would speed things up. But you would really need to replace all the table lookups (log, exp, sin, cos) with CORDIC to get a more dramatic improvement.

AntoineDoinel · 2019-08-21 17:21

Ross

Some years ago I did some quick speed testing of F32 vs Float32, and came to the conclusion that F32 is about 20% faster.
While I tried to account for call overhead, the methodology used might not be entirely "scientific" though.

In this thread:

https://forums.parallax.com/discussion/comment/1042781/#Comment_1042781

RossH · 2019-08-22 00:37

AntoineDoinel wrote: »

Ross

Some years ago I did some quick speed testing of F32 vs Float32, and came to the conclusion that F32 is about 20% faster.
While I tried to account for call overhead, the methodology used might not be entirely "scientific" though.

In this thread:

https://forums.parallax.com/discussion/comment/1042781/#Comment_1042781

That seems to match the "gut feel" I had after reading the code.

At some point, I will do a test for accuracy of the various Catalina options, but I have other things on my list to do first. I am currently fairly confident that the hardware CORDIC option is at least as accurate as any of the other options, and in some specific cases that I did test, it was much more accurate.
