Indeed, who knows? But the devil is in the details, and I don't see any practical (reasonable performance) way forward without lookup tables, and lookup tables for 32 bits will be *huge*. IEEE floating point has its flaws, but at least it's clear how to implement it.

Having said that, I hope he is able to come up with a solution. But right now it's definitely not ready for prime time.
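For a rough sense of scale, assume a table indexed by every 32-bit input pattern with one 4-byte result per entry. That layout is an assumption for illustration; real designs might store less. Even a single one-argument function then needs

$$2^{32}\ \text{entries} \times 4\ \text{bytes} = 2^{34}\ \text{bytes} = 16\ \text{GiB},$$

and a two-operand operation indexed by both operands would need $2^{64}$ entries. The same one-argument table for 16-bit inputs is $2^{16} \times 4\ \text{bytes} = 2^{18}\ \text{bytes} = 256\ \text{KiB}$.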

The broad idea of elastic precision is good, and I can see that FPGAs could get some way toward a practical solution, with their 'spare' memory bits.
It seems the extra information does need more memory (even if it might sometimes need less, that helps little in engineering design).

E.g. a 36-bit-wide memory in an FPGA has 4 spare tag bits for normal 32-bit numbers.
That's a lot, and allows a simple precision tag like

0000b -> simplest int_32

0001b -> simplest int_64

Or is there even room for a signed bit, to allow mixing signed and unsigned types?

Or these MSBs could simply tag the bits of precision, as multiples of 8 starting at 32, for 32..160b?
That would be useful for Float libs using polynomial expansions, as they then know when to stop iterating.
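Here is a minimal sketch of how a library could use such a tag. It assumes the mapping suggested above (precision = 32 + 8 × tag bits), and the function names are hypothetical. The series arithmetic is done in a plain C `double`, so tags asking for more than 53 bits only move the stopping point here. A real multi-precision library would use wider arithmetic.

```c
#include <math.h>

/* Hypothetical tag scheme: a 4-bit tag t means 32 + 8*t bits of precision. */
static int precision_bits(unsigned tag) {
    return 32 + 8 * (int)(tag & 0xFu);
}

/* Taylor series for exp(x) that stops once the latest term is below
   2^-p relative to the running sum, where p comes from the tag.
   If terms is non-NULL, it receives the number of terms summed. */
static double exp_series(double x, unsigned tag, int *terms) {
    double eps = ldexp(1.0, -precision_bits(tag));   /* 2^-p */
    double sum = 1.0, term = 1.0;                    /* the x^0/0! term */
    int n = 1;
    while (fabs(term) > eps * fabs(sum)) {
        term *= x / n;                               /* x^n / n! */
        sum += term;
        n++;
    }
    if (terms) *terms = n;
    return sum;
}
```

For x = 1.0, tag 0 (32 bits) stops after fewer terms than tag 3 (56 bits), which is the "know when to stop iterating" benefit.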

Whatever the final mapping, hardware can instantly dispatch the number to the correct ALU, and even promote on overflow.
The many float sizes in the tag assume some intermediate speed benefits are possible. float_UP is user precision, with the precision set in a following 4-bit field.
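Promotion on overflow can be modelled in software along these lines. This is only a sketch. The `tagged` struct and `tagged_add_int32` are hypothetical names, and the tag codes are the two from the simple scheme above. Hardware would keep the tag in the spare memory bits rather than in a separate field.

```c
#include <stdint.h>

/* Tag codes from the simple scheme: 0000b = int_32, 0001b = int_64. */
enum { TAG_INT32 = 0x0, TAG_INT64 = 0x1 };

/* Software model of a tagged number. */
typedef struct {
    unsigned tag;
    int64_t  value;
} tagged;

/* Add two int_32-tagged values. The sum is computed in 64 bits, which is
   exact for 32-bit operands. If it no longer fits in 32 bits, the result is
   promoted to int_64 instead of wrapping around. */
static tagged tagged_add_int32(tagged a, tagged b) {
    tagged r;
    r.value = a.value + b.value;
    r.tag = (r.value < INT32_MIN || r.value > INT32_MAX) ? TAG_INT64 : TAG_INT32;
    return r;
}
```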

The cost of 'making numbers easier' is certainly felt in silicon, so at what point does this become worthwhile?

Vendors could include FPU routines in ROM, so that formats not handled by their FPU silicon get software calls. That makes handling the threshold invisible to users.

It looks like Cortex-M7 parts with a double-precision FPU are under $10, so maybe somewhere above that level you can get benefits.
Do you get enough users, though?

Scripting languages that try to allow 32-bit bit-level ops with float as the default number type do have problems with float_32. They can 'fix' that by moving to float_64, and float_64 already exists in upper-end MCUs, so who does that leave?
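The float_32 problem is easy to demonstrate. A 32-bit float has a 24-bit significand, so not every integer above 2^24 is representable. A 64-bit double (53-bit significand) holds every 32-bit integer exactly. The helper names below are illustrative.

```c
#include <stdint.h>

/* Does v survive conversion to float? Compare in double, which represents
   every uint32_t exactly, to avoid converting an out-of-range float back. */
static int exact_in_float(uint32_t v) {
    return (double)(float)v == (double)v;
}

/* Every uint32_t fits exactly in a double, so this always holds. */
static int exact_in_double(uint32_t v) {
    return (uint32_t)(double)v == v;
}
```

For example, 16777216 (2^24) survives the float conversion, but 16777217 does not.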

What can be packed into one P2 COG, in terms of Float support ?

Scaled integers can do a lot of what is usually thought of as needing floating point. On desktop PCs, FP is apparently as fast as integer processing now. Otherwise I would have used scaled ints for 95% of what I use FP for.
Then there are tables... at least for 16-bit precision (with 32-bit output, sometimes) the table sizes aren't unmanageable. Both methods are described on Garth Wilson's site, http://wilsonminesco.com/16bitMathTables/index.html. It focuses on the tables but includes an intro to scaled integer math.

Back in the day when I was working on display software for a 3-D phased array radar http://www.radartutorial.eu/19.kartei/karte112.en.html one of our team had his code fail review because he had used floats in the code. Our processors did not have float hardware but our Coral 66 compiler supported fixed point arithmetic. The project manager said:

"If you think you need floating point maths to solve the problem you don't understand the problem"

A corollary to that could be:

"If you actually do need floating point to solve the problem, now you have a problem you don't understand"

Certainly, the use of floats gets programmers into all kinds of difficulties they don't expect.

There have been some very interesting developments in the UNUM idea. Since this thread started, Gustafson has had some brainwaves and developed what looks like a very practical way to implement a variant of his UNUMs. He calls them POSITs. POSITs do not require variable-size operands, and they are very simple to implement in hardware or in software emulation. Perhaps it's best to just quote Dr Gustafson's own summary:

"A new data type called a "posit" is designed for direct drop-in replacement for IEEE Standard 754 floats. Unlike unum arithmetic, posits do not require interval-type mathematics or variable size operands, and they round if an answer is inexact, much the way floats do. However, they provide compelling advantages over floats, including simpler hardware implementation that scales from as few as two-bit operands to thousands of bits. For any bit width, they have a larger dynamic range, higher accuracy, better closure under arithmetic operations, and simpler exception-handling. For example, posits never overflow to infinity or underflow to zero, and there is no "Not-a-Number" (NaN) value. Posits should take up less space to implement in silicon than an IEEE float of the same size. With fewer gate delays per operation as well as lower silicon footprint, the posit operations per second (POPS) supported by a chip can be significantly higher than the FLOPs using similar hardware resources. GPU accelerators, in particular, could do more arithmetic per watt and per dollar yet deliver superior answer quality."

I'm kind of sold on this idea. One amazing feature of POSITs is that you can do an FFT and then run the results through the reverse FFT and get back, bit for bit, exactly the input you started with. Try doing that with floats.

It seems people are already designing POSITs into actual silicon, and there is a lot of interest from the likes of Intel, Nvidia and others. The IEEE has also already approached Gustafson asking if POSITs can be made into an IEEE standard. So far he has resisted this, quite rightly, because he does not want the whole idea overcomplicated and compromised by the "design by committee" that would happen during such a standardization process.

There is a great introduction to POSITs by Gustafson in this video:

There is a POSIT library in C++ here: https://github.com/libcg/bfp. Looking at that code, a POSIT implementation does seem to be very simple.
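To give a feel for how simple the format is, here is a minimal decoder from an n-bit posit with `es` exponent bits to a C `double`. It is written from the published posit definition (sign, regime, exponent, fraction), not from the bfp code. `posit_to_double` is a hypothetical name, and only decoding is shown, not arithmetic.

```c
#include <math.h>
#include <stdint.h>

/* Decode an n-bit posit with es exponent bits (2 <= n <= 32) to a double.
   After the sign bit comes the regime (a run of identical bits plus a
   terminating bit), then up to es exponent bits, then the fraction with a
   hidden leading 1. Returns NAN for NaR, the single pattern 100...0. */
static double posit_to_double(uint32_t bits, int n, int es) {
    uint32_t mask = (n == 32) ? 0xFFFFFFFFu : ((1u << n) - 1u);
    bits &= mask;
    if (bits == 0) return 0.0;
    if (bits == (1u << (n - 1))) return NAN;

    int negative = (int)((bits >> (n - 1)) & 1u);
    if (negative) bits = (~bits + 1u) & mask;      /* two's complement negate */

    /* Regime: length of the run of identical bits after the sign bit. */
    int pos = n - 2;
    int r0 = (int)((bits >> pos) & 1u);
    int run = 0;
    while (pos >= 0 && (int)((bits >> pos) & 1u) == r0) { run++; pos--; }
    int k = r0 ? run - 1 : -run;
    pos--;                                         /* skip the terminating bit */

    /* Exponent: up to es bits; bits cut off at the end count as 0. */
    int e = 0;
    for (int i = 0; i < es; i++) {
        e <<= 1;
        if (pos >= 0) { e |= (int)((bits >> pos) & 1u); pos--; }
    }

    /* Fraction: whatever bits remain, with a hidden leading 1. */
    int fbits = (pos + 1 > 0) ? pos + 1 : 0;
    uint32_t frac = fbits ? (bits & ((1u << fbits) - 1u)) : 0u;
    double sig = (double)((1ull << fbits) + frac); /* 1.frac scaled by 2^fbits */

    /* useed = 2^(2^es), so the total binary exponent is k * 2^es + e. */
    double v = ldexp(sig, k * (1 << es) + e - fbits);
    return negative ? -v : v;
}
```

For example, with n = 8 and es = 0, the pattern 0x40 decodes to 1.0, 0x60 to 2.0 and 0x7F to 64.0, the largest 8-bit posit with es = 0.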

POSITs would be great to have in a future P3.

In the meanwhile we should get to work on a POSIT implementation in PASM for the P1 and P2....

> One amazing feature of POSITs is that you can do an FFT and then run the results through the reverse FFT and get back, bit for bit, exactly the input you started with. Try doing that with floats.

I'm sold. I hate floats, because they are a mess. Obviously, I use them, but always with a jaded, guarded eye.

These will, I suspect, quickly replace them in things like CAD, where such accuracy would eliminate a bazillion kludges needed to make geometry representations actually work.

UNUM is a cool idea, but it is not easy to implement in a generic way. It can be very helpful for specific applications where you have control of the small set of irrational coefficients you must accommodate, and also can guarantee a finite number of additions of rationals with relatively prime factors in denominators - otherwise the denominator's representation will be unbounded as the computations progress.
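The denominator growth is easy to see with exact rationals. When the denominators are pairwise coprime, the reduced denominator of the sum is their product, so it keeps growing with every addition. Here is a minimal sketch with unsigned 64-bit numerators and denominators. The names are hypothetical, and there are no overflow checks, so a fixed-width denominator will eventually overflow.

```c
#include <stdint.h>

typedef struct { uint64_t num, den; } rational;

static uint64_t gcd_u64(uint64_t a, uint64_t b) {
    while (b) { uint64_t t = a % b; a = b; b = t; }
    return a;
}

/* Exact a + b, reduced to lowest terms (no overflow check). */
static rational rat_add(rational a, rational b) {
    rational r = { a.num * b.den + b.num * a.den, a.den * b.den };
    uint64_t g = gcd_u64(r.num, r.den);
    r.num /= g;
    r.den /= g;
    return r;
}
```

Adding 1/2, 1/3, 1/5 and 1/7 to zero gives 247/210. Adding 1/11 as well gives 2927/2310.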

As for floating point: there is the idea, and there's the expression of that idea. IEEE 754 describes the semantics of operations on floating point representations, but also standardizes packed binary representations. Without FP hardware, the packed representations only make sense if memory savings are needed. Otherwise, it's most efficient to work on them in an unpacked representation, i.e. one variable for the mantissa and another for exponent + sign, or even two: one for the exponent, one for the sign. The exponent is of course unbiased and signed. At that point, the precision and span can be arbitrarily scaled by choosing the width of mantissa and exponent, and all the "FP" operations are very simple integer operations.
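Here is a minimal sketch of such an unpacked representation. It assumes a 32-bit significand with an explicit leading 1, a separate sign and a signed, unbiased exponent. These are choices of this example, not of any standard, and the names are hypothetical. Multiplication reduces to one 32x32->64-bit integer multiply, an exponent add and a one-bit normalizing shift. The result is truncated rather than rounded, and exponent overflow is ignored.

```c
#include <stdint.h>
#include <math.h>

/* value = (-1)^sign * (mant / 2^31) * 2^exp.
   A nonzero mant has bit 31 set; mant == 0 means zero. */
typedef struct {
    int      sign;   /* 0 = positive, 1 = negative */
    int32_t  exp;    /* unbiased, signed exponent */
    uint32_t mant;   /* significand with explicit leading 1 */
} ufloat;

static ufloat uf_from_double(double x) {
    ufloat r = { 0, 0, 0 };
    if (x == 0.0) return r;
    if (x < 0) { r.sign = 1; x = -x; }
    int e;
    double m = frexp(x, &e);                 /* x = m * 2^e, 0.5 <= m < 1 */
    r.mant = (uint32_t)(m * 4294967296.0);   /* m * 2^32, truncated */
    r.exp = e - 1;
    return r;
}

static double uf_to_double(ufloat a) {
    double v = ldexp((double)a.mant, a.exp - 31);
    return a.sign ? -v : v;
}

/* Multiply using integer operations only; the result is truncated. */
static ufloat uf_mul(ufloat a, ufloat b) {
    ufloat r = { a.sign ^ b.sign, 0, 0 };
    if (a.mant == 0 || b.mant == 0) { r.sign = 0; return r; }
    uint64_t p = (uint64_t)a.mant * b.mant;  /* in [2^62, 2^64) */
    r.exp = a.exp + b.exp;
    if (p >> 63) { r.mant = (uint32_t)(p >> 32); r.exp += 1; }
    else         { r.mant = (uint32_t)(p >> 31); }
    return r;
}
```

Widening `mant` or `exp` changes precision and range without changing the shape of the code, which is the scaling point made above.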

The "rounding errors" you get with floating point are worse when using fixed point or scaled integer representations, since ordinarily rounding isn't done at all. In a majority of real-time engineering applications of floating point, when the algorithm is numerically well behaved, the precision vastly exceeds the resolution of most physical measurements, so the rounding errors are not very relevant.
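The difference comes down to what happens to the discarded low bits when a scaled integer is rescaled. A plain right shift always rounds down, so the error is biased and accumulates. Adding half an LSB before shifting gives round-half-up instead. Here is a sketch for non-negative values, with hypothetical names.

```c
#include <stdint.h>

/* Drop the low `shift` bits (1 <= shift <= 31) by truncation: always rounds down. */
static uint32_t rescale_truncate(uint32_t x, unsigned shift) {
    return x >> shift;
}

/* Same, but add half an LSB first (round half up). The 64-bit sum avoids
   overflow near UINT32_MAX. */
static uint32_t rescale_round(uint32_t x, unsigned shift) {
    return (uint32_t)(((uint64_t)x + (1u << (shift - 1))) >> shift);
}
```

With Q16.16 input 0x18000 (1.5), truncation gives 1 and rounding gives 2.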

Part of the problem with base-2 floating point is that conversion to/from decimal representation is very far from trivial and naive implementations are typically wrong, to the extent that they introduce excess error. This is not something that is apparent without detailed scrutiny, and if you look at decimal<->binary floating point conversion code, it's not remotely simple, and it doesn't fit on a single page. Most decimal FP output as was done in legacy BASIC systems and such is broken and doesn't roundtrip no matter the length of decimal output, i.e. if you keep converting between a decimal string and binary floating point, there are many values that diverge - the conversion keeps drifting with each roundtrip.
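For IEEE binary64 doubles, 17 significant decimal digits are always enough for a round trip. That holds provided the `printf`/`strtod` pair is correctly rounded. glibc's is, but simpler runtimes may not be, which gives exactly the drift described above. Here is a small check in C, where `roundtrips` is a hypothetical helper.

```c
#include <stdio.h>
#include <stdlib.h>

/* Print x with `digits` significant digits, parse it back, and report
   whether exactly the same double came back. */
static int roundtrips(double x, int digits) {
    char buf[64];
    snprintf(buf, sizeof buf, "%.*g", digits, x);
    return strtod(buf, NULL) == x;
}
```

With `x = 0.1 + 0.2`, 15 digits print `0.3`, which parses back to a different double. 17 digits print `0.30000000000000004`, which round-trips.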

UNUM doesn't solve the problem of base conversion, the best I can tell. It's claimed that it solves it, but I don't see how (at least not yet). It "solves it" as well as BCD floating point would, and in fact BCD floating point is not that bad of an idea if you need fast decimal I/O.

I don't think the authors have made such claims. Rounding will always diverge when iterated. You don't need to be converting between number types to be caught out on that one.

It could possibly track the history of the rounding, but I suspect it would be tricky to attribute the tracking correctly, since such effects can branch. I.e., it'll be application dependent.

I had to deal with it for a motion control design in a flow wrapping machine that had three motors, software gear-locked, always rotating in one direction. The solution there was to always map the generated profile shapes to an exact full revolution. Any rounding was forced to fit by the algorithm.
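The post doesn't give the algorithm, so this is only one way to force the rounding to fit. It assumes segment lengths are given as integer weights and rounds the cumulative end positions rather than each segment on its own. The last position is then exactly the full revolution, so the segment lengths always add up. The names are hypothetical.

```c
#include <stdint.h>

/* Split `total` counts (one revolution) into n segments proportional to
   weight[i] (non-negative, not all zero). Each segment end is rounded half
   up as a cumulative position, so out[0] + ... + out[n-1] == total exactly. */
static void split_revolution(int64_t total, const int64_t *weight, int n,
                             int64_t *out) {
    int64_t wsum = 0;
    for (int i = 0; i < n; i++) wsum += weight[i];

    int64_t cum_w = 0, prev = 0;
    for (int i = 0; i < n; i++) {
        cum_w += weight[i];
        /* round(total * cum_w / wsum) in integer arithmetic */
        int64_t pos = (2 * total * cum_w + wsum) / (2 * wsum);
        out[i] = pos - prev;
        prev = pos;
    }
}
```

Splitting 1000 counts into three equal segments gives 333, 334 and 333. Rounding each segment separately would give 333 three times and lose a count every revolution.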

Another case: On the second flow wrapper there was a change to a cheaper encoder with 2000 lines instead of the 3600 lines from the first machine. This had the effect of making the number of encoder counts per sprocket tooth be a non-integer. It wasn't until we were testing just before shipment that it became clear there was a drift.

Instead of sorting out the hardware on short notice I hacked in a piece of code, for that single machine, to add, or maybe subtract, one encoder count in every third pack division of that axis. So the hack, in effect, was tracking and compensating for the rounding.

PS: On subsequent machines we also changed the sprocket from 24 teeth to 25 teeth.

EDIT: Actually, the hack might have been a little different in the details but the principle is there. My memory is a little hazy now.
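The "one extra count every third division" hack is an instance of a Bresenham-style error accumulator. Below is a minimal sketch that uses 2000 and 24 from the story purely as illustrative numbers. The post doesn't say whether counts were per line or per quadrature edge. `tooth_stepper` is a hypothetical name.

```c
#include <stdint.h>

/* Distributes counts_per_rev encoder counts over `teeth` sprocket teeth.
   The fractional part is carried in `remainder`, so no counts are lost. */
typedef struct {
    uint32_t counts_per_rev;
    uint32_t teeth;
    uint32_t remainder;   /* carried fraction, in units of 1/teeth counts */
} tooth_stepper;

/* Integer number of counts for the next tooth. */
static uint32_t next_tooth_counts(tooth_stepper *s) {
    uint32_t acc = s->remainder + s->counts_per_rev;
    s->remainder = acc % s->teeth;
    return acc / s->teeth;
}
```

With 2000 counts and 24 teeth, the sequence starts 83, 83, 84 and repeats, and every 24 teeth sum to exactly 2000. With the original 3600-line encoder, or with a 25-tooth sprocket, the division is exact (150 and 80 counts), so no correction is needed.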

## Comments

Or is there even room for a signed bit, to allow mixing signed and unsigned types:

0000b -> simplest uint_32

0001b -> simplest int_32

0010b -> uint_64

0011b -> int_64

0100b..0111b -> spares

1000b -> float_32

1001b -> float_40

1010b -> float_48

1011b -> float_56

1100b -> float_64

1101b -> float_96

1110b -> float_128

1111b -> float_UP
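Here is a sketch of packing such a tag into a 36-bit word. It assumes the tag sits in the top 4 bits (35..32) above a 32-bit payload, with the word held in a `uint64_t`. The layout and names are hypothetical. The width table follows the mapping listed, with 0 marking the spare codes and float_UP, whose precision comes from a separate field.

```c
#include <stdint.h>

/* 36-bit word in a uint64_t: tag in bits 35..32, payload in bits 31..0. */
static unsigned word_tag(uint64_t w)     { return (unsigned)((w >> 32) & 0xFu); }
static uint32_t word_payload(uint64_t w) { return (uint32_t)w; }
static uint64_t make_word(unsigned tag, uint32_t payload) {
    return ((uint64_t)(tag & 0xFu) << 32) | payload;
}

/* Width in bits implied by each tag; 0 = spare code or float_UP. */
static const int tag_width[16] = {
    32, 32, 64, 64,     /* uint_32, int_32, uint_64, int_64 */
    0, 0, 0, 0,         /* spares */
    32, 40, 48, 56,     /* float_32, float_40, float_48, float_56 */
    64, 96, 128, 0      /* float_64, float_96, float_128, float_UP */
};

/* Number of 32-bit payload words a value with this tag would occupy. */
static int words_needed(unsigned tag) {
    return (tag_width[tag & 0xFu] + 31) / 32;
}
```

Hardware dispatch would amount to a lookup on the 4-bit tag, here the `tag_width` table.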

See here for why that corollary may be true: https://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html (David Goldberg's "What Every Computer Scientist Should Know About Floating-Point Arithmetic"). It's a long, hard read...

This seems like the direction things need to go in.
