Will P2 have an FPU?

samuell · 2016-04-22 14:04

Hi,

I noticed that the P1 doesn't have a floating-point unit, needing an external one for hardware floating-point calculations. Will the P2 have one?

Kind regards, Samuel Lourenço

Seairth · 2016-04-22 14:22

No. The new capabilities (multipliers, CORDIC solver, etc) are only for integer and fixed-point math.

Electrodude · 2016-04-22 14:26

Nope. But that doesn't mean an external one is necessary. Software floating point emulation works fine on the P1 - things that need to be fast such as the ELEV-8 flight controller get by fine with software floating point (using the engine that's normally used with Spin, not the C floating point library).

The P2 does have CORDIC hardware in the hub for doing harder 32x32 integer math, such as mul, div, trig, sqrt, log, exp, and probably some others I forgot, but it has no hardware FPU. But it should be easy to do floating point in software with help from the CORDIC. In fact, with a smart enough compiler (e.g. a human manually writing PASM), I'll bet you could do float math almost as efficiently as integer math by never packing the floats but always storing them as two separate longs, mantissa and exponent.
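
A minimal sketch of the unpacked idea above, written in Python rather than PASM just to show the arithmetic: each value is kept as a separate integer mantissa and exponent, with value = mantissa * 2**exponent, so a multiply is one integer multiply plus one add. The 32-bit mantissa width, the function names and the truncating normalization are assumptions for illustration, not P2 code.

```python
MANT_BITS = 32  # assumed mantissa width: one long, sign included


def normalize(mant, exp):
    """Shift the mantissa right until it fits in MANT_BITS, truncating low bits."""
    sign = -1 if mant < 0 else 1
    mag = abs(mant)
    excess = max(0, mag.bit_length() - (MANT_BITS - 1))
    return sign * (mag >> excess), exp + excess


def fmul(a, b):
    """Multiply two unpacked floats (mant, exp): multiply mantissas, add exponents."""
    return normalize(a[0] * b[0], a[1] + b[1])


def fadd(a, b):
    """Add two unpacked floats by aligning both to the smaller exponent first."""
    (ma, ea), (mb, eb) = a, b
    e = min(ea, eb)
    return normalize((ma << (ea - e)) + (mb << (eb - e)), e)


def to_float(x):
    """Convert an unpacked (mant, exp) pair to a Python float for display."""
    return x[0] * 2.0 ** x[1]


three_halves = (3, -1)   # 3 * 2**-1 = 1.5
five = (5, 0)            # 5 * 2**0  = 5.0
product = fmul(three_halves, five)
total = fadd(three_halves, five)
print(to_float(product), to_float(total))
```

Nothing is packed or unpacked between operations, which is where the suggested saving would come from. Rounding, overflow of the exponent and special values are deliberately left out.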

samuell · 2016-04-22 14:30

Seairth wrote: »

No. The new capabilities (multipliers, CORDIC solver, etc) are only for integer and fixed-point math.

Thanks Seairth. I asked, but it is kind of a non-issue. With 1MB of program memory, one can implement this in software. I was just curious.

samuell · 2016-04-22 14:41

Electrodude wrote: »

Nope. But that doesn't mean an external one is necessary. Software floating point emulation works fine on the P1 - things that need to be fast such as the ELEV-8 flight controller get by fine with software floating point (using the engine that's normally used with Spin, not the C floating point library).

The P2 does have CORDIC hardware in the hub for doing harder 32x32 integer math, such as mul, div, trig, sqrt, log, exp, and probably some others I forgot, but it has no hardware FPU. But it should be easy to do floating point in software with help from the CORDIC. In fact, with a smart enough compiler (e.g. a human manually writing PASM), I'll bet you could do float math almost as efficiently as integer math by never packing the floats but always storing them as two separate longs, mantissa and exponent.

Well, doing sqrt would be nice, although I found a method for doing integer roots (need to validate it, though, but the cryptic code for isqrt() is available on Wikipedia). It was precisely because of the sqrt I was asking this, actually. The C floating point math takes a considerable amount of space.

Nice to learn that. CORDIC is great.
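
For reference, a sketch of one common integer square root, the shift-and-subtract (digit-by-digit) method. It uses only shifts, adds and compares, so it needs no floating point library. This is not necessarily the routine from the Wikipedia page mentioned above, and the function name is just a placeholder.

```python
def isqrt(n):
    """Return floor(sqrt(n)) for a non-negative integer n."""
    if n < 0:
        raise ValueError("isqrt is undefined for negative integers")
    # Start with the highest power of four that does not exceed n.
    bit = 1
    while bit * 4 <= n:
        bit *= 4
    root = 0
    while bit:
        if n >= root + bit:
            n -= root + bit
            root = (root >> 1) + bit
        else:
            root >>= 1
        bit >>= 2
    return root


print(isqrt(15), isqrt(16), isqrt(1_000_000))
```

On a 32-bit target the loop runs at most 16 times, one pass per result bit.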

Seairth · 2016-04-22 14:52

samuell wrote: »

Seairth wrote: »

No. The new capabilities (multipliers, CORDIC solver, etc) are only for integer and fixed-point math.

Thanks Seairth. I asked, but it is kind of a non-issue. With 1MB of program memory, one can implement this in software. I was just curious.

minor note: the P2 will have only 512KB hub memory. Only the FPGA image has 1MB.

samuell · 2016-04-22 14:56

Seairth wrote: »

samuell wrote: »

Seairth wrote: »

No. The new capabilities (multipliers, CORDIC solver, etc) are only for integer and fixed-point math.

Thanks Seairth. I asked, but it is kind of a non-issue. With 1MB of program memory, one can implement this in software. I was just curious.

minor note: the P2 will have only 512KB hub memory. Only the FPGA image has 1MB.

Thanks for the correction. 512KB is good too. The 32KB limitation of the P1 is a bit restraining if you are using C, but I guess that this won't be a problem in the P2.

thej · 2016-04-22 19:03

Maybe Floating Point can be leapfrogged!!

There's another (better!) option for floating point: Use a simpler number system!
John Gustafson (Think Gustafson's Law) devised "Unum", a new number format.

This is his list of problems he sees computing facing:
-Too much energy and power needed per calculation
-More hardware parallelism than we know how to use
-Not enough bandwidth (the memory wall)
-Rounding errors more treacherous than people realize
-Rounding errors prevent use of parallel methods
-Sampling errors turn physics simulations into guesswork
-Numerical methods are hard to use, require experts
-IEEE floats give different answers on different platforms

UNUMs achieve:
-Greater precision (with fewer bits)
-No rounding error
-No sampling error
-reduced memory/bandwidth requirements
-Massively parallel
-and more...

"Imagine having provable bounds on answers for the first time, yet with easier programming, less storage, less bandwidth use, less energy/power demands, and abundant parallelism."

..and it only needs integers to work !! Suddenly a multi-core microcontroller can leap into (low-end) workstation class computing !!

Some UNUM applications using "Ubox" method
-Photorealistic computer graphics
-N-Body problems
-Structural analysis
-Laplace's equation
-Perfect gas models "without" statistical mechanics

UNUMs could be implemented in software on the P2 or in *HARDWARE* on the P3.
Imagine the P3 with the ALU designed around this number system !!!

Check out his presentations here:
"A Radical Approach to Computation with Real Numbers: Unums version 2.0"
http://www.johngustafson.net/presentations/Multicore2016-JLG.pdf

"Unum Computing: An energy efficient and Massively Parallel approach to valid numerics"
http://www.slideshare.net/insideHPC/unum-computing-an-energy-efficient-and-massively-parallel-approach-to-valid-numerics

"The End of Numerical Error"
http://arith22.gforge.inria.fr/slides/06-gustafson.pdf

...and his book:

"The End of Error: Unum Computing"
https://www.crcpress.com/The-End-of-Error-Unum-Computing/Gustafson/9781482239867

Electrodude · 2016-04-22 19:40

Wow, that's ingenious.

ErNa · 2016-04-22 19:42

Yeah, fascinating stuff! How long will it take to understand it? Longer than to learn Forth?

KeithE · 2016-04-22 21:49

Thanks for posting those unum references. A friend did a fair amount of x86 FPU work in the past, so we'll have to dig into this idea.

cgracey · 2016-04-22 22:05

ErNa wrote: »

Yeah, fascinating stuff! How long will it take to understand it? Longer than to learn Forth?

Ah, the kinds of dilemmas that steer our lives. How many things do we NOT know that if we could get into our brains would open up new dimensions?

thej · 2016-04-22 22:07

I feel this work could allow Parallax to seriously leapfrog other "contenders" in the microcontroller world.
The P2's main advantage here is that it is multi-core.
I can see some seriously cool stuff being done with Unums and P2 (P3 ?!?)

;-)

cgracey · 2016-04-22 22:21

That UNUM stuff that thej posted is really interesting!

It goes to show that there are some fundamentals right under our noses that could stand a lot of improvement.

I remember the frustration rounding errors would cause when I first started programming in BASIC. Discovering integer math in assembly language was a restorative experience. That's why I've never been that enthusiastic about floating point. I could totally get on-board with this UNUM idea, though! That Parallella guy needs to redo his floating-point monster to use this paradigm.

That UNUM concept is going to be the future, I think. No problem to do in software, either.

Think how something so simple has evaded people's thinking for so long! How much grief have rounding errors caused everybody over the years? For engineers, this has got to approach five million man-years of wary uneasiness (100,000 engineers, on average, over 50 years). If we had had UNUMs, instead of standard floats, all along, who knows what kind of different directions things might have headed in and how much further along things might be? It would have lent some geometric simplification and lubrication to everything.

cgracey · 2016-04-22 22:24

ErNa wrote: »

Yeah, fascinating stuff! How long will it take to understand it? Longer than to learn Forth?

Hey, your head is full of ideas that are MORE difficult to understand than UNUMs.

Rayman · 2016-04-22 22:40

that stuff is either brilliant or wacky, not sure which one at first look...

jmg · 2016-04-22 23:49

thej wrote: »

Maybe Floating Point can be leapfrogged!!

There's another (better!) option for floating point: Use a simpler number system!
John Gustafson (Think Gustafson's Law) devised "Unum", a new number format.

UNUMs could be implemented in software on the P2 or in *HARDWARE* on the P3.
Imagine the P3 with the ALU designed around this number system !!!

Check out his presentations here:
"A Radical Approach to Computation with Real Numbers: Unums version 2.0"
http://www.johngustafson.net/presentations/Multicore2016-JLG.pdf

"Unum Computing: An energy efficient and Massively Parallel approach to valid numerics"
http://www.slideshare.net/insideHPC/unum-computing-an-energy-efficient-and-massively-parallel-approach-to-valid-numerics

"The End of Numerical Error"
http://arith22.gforge.inria.fr/slides/06-gustafson.pdf

...and his book:

"The End of Error: Unum Computing"
https://www.crcpress.com/The-End-of-Error-Unum-Computing/Gustafson/9781482239867

Interesting. - but no numbers on his numbers ? (no speed comparisons using real code ?)

I see he also mentions Pendulum Equations.

I recall looking for a real pendulum equation (not 'for small angles..') a while back, and it was harder than expected.
I finally found this, I kept a link to :

http://leapsecond.com/hsn2006/pendulum-period-agm.pdf
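
A short sketch of the AGM approach for anyone who wants to try it. It relies on the standard closed form for an ideal pendulum, T = 2π·sqrt(L/g) / AGM(1, cos(θ0/2)), where θ0 is the swing amplitude. The function names, the convergence tolerance and the default value of g are my own choices, and the formula only applies for amplitudes below 180 degrees.

```python
import math


def agm(a, b, max_iter=60):
    """Arithmetic-geometric mean of two positive numbers."""
    for _ in range(max_iter):
        if abs(a - b) <= 1e-15 * max(a, b):
            break
        a, b = (a + b) / 2.0, math.sqrt(a * b)
    return (a + b) / 2.0


def pendulum_period(length_m, amplitude_rad, g=9.80665):
    """Period in seconds of an ideal pendulum at any amplitude below pi radians."""
    small_angle_period = 2.0 * math.pi * math.sqrt(length_m / g)
    return small_angle_period / agm(1.0, math.cos(amplitude_rad / 2.0))


ratio_at_90_degrees = pendulum_period(1.0, math.pi / 2) / pendulum_period(1.0, 0.0)
print(ratio_at_90_degrees)
```

The AGM converges quadratically, so a handful of iterations is enough even for large swings.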

There was a recent example of a wooden pendulum clock, that calibrated itself to a GPS reference by carefully varying the amplitude of the swing - I cannot find the link.

Has anyone done these in FPGAs ?

Comments here are interesting too:
http://hackaday.com/2015/10/22/an-improvement-to-floating-point-numbers/

thej · 2016-04-23 02:18

Interesting. - but no numbers on his numbers ? (no speed comparisons using real code ?)

On page 29 of his "Radical approach..." presentation he says:
"Low-precision rigorous math is possible at 100x the speed of sloppy IEEE floats"

jmg · 2016-04-23 02:49

thej wrote: »

Interesting. - but no numbers on his numbers ? (no speed comparisons using real code ?)

On page 29 of his "Radical approach..." presentation he says:
"Low-precision rigorous math is possible at 100x the speed of sloppy IEEE floats"

Well... an arm-waving claim, is not quite what I am after.
If there was a solid basis to this, and real gains to be had, I would expect to see a library crafted to use them, and some benchmarks proving just how much faster they are, deployed in the real world.

I also see this claim :
" Unums: Correct answer to 23 decimals using an average of only 75 bits per number. Not even IEEE 128-bit precision can do that. Precision, range adjust automatically."
- more useful, would be what the peak-bits were, not so much the average.
Any HW or SW has to be sized to the peak expected, and probably needs to allocate storage for that, even if it may use less.

I see merit in variable precision, but suspect any practical solution would be more like a manual gear-box, with a choice of a few, than a continuously variable transmission.

cgracey · 2016-04-23 04:54

jmg wrote: »

thej wrote: »

Interesting. - but no numbers on his numbers ? (no speed comparisons using real code ?)

On page 29 of his "Radical approach..." presentation he says:
"Low-precision rigorous math is possible at 100x the speed of sloppy IEEE floats"

Well... an arm-waving claim, is not quite what I am after.
If there was a solid basis to this, and real gains to be had, I would expect to see a library crafted to use them, and some benchmarks proving just how much faster they are, deployed in the real world.

I also see this claim :
" Unums: Correct answer to 23 decimals using an average of only 75 bits per number. Not even IEEE 128-bit precision can do that. Precision, range adjust automatically."
- more useful, would be what the peak-bits were, not so much the average.
Any HW or SW has to be sized to the peak expected, and probably needs to allocate storage for that, even if it may use less.

I see merit in variable precision, but suspect any practical solution would be more like a manual gear-box, with a choice of a few, than a continuously variable transmission.

It seems there are some initial decisions needed on input numbers.

This just seems like a way to propagate uncertainties and also to not waste time computing beyond sufficient resolutions.

I read through the whole 2nd link, which was a .pdf, and it seemed pretty understandable. Certainly, we've all experienced frustrations due to rounding and something to get around that would be welcomed.

Heater. · 2016-04-23 07:12

I'm not sold on the UNUMs idea. And what I have read so far does not do a good job of selling it.

These articles start out by pointing out the flaws in IEEE floats, which is fair enough.

Then we get a story about the Ariane 5 rocket failure where supposedly a 64 bit float was stuffed into a 16 bit integer. This is obviously not an example of a flaw in IEEE floats but rather a stupid coding error.

Then there's the story of the SCUD missile, where somehow floats were squashed into 24 bits, causing an ever-accumulating error in timekeeping and hence its accuracy. Again, not a problem with the IEEE floats as such.
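
A rough sketch of how that kind of timekeeping error piles up. The figures are assumptions taken from commonly cited accounts of the incident, not from this thread: a 0.1 second tick whose value is chopped to 23 binary fraction bits, and 100 hours of continuous running. Exact fractions are used so the sketch does not add rounding of its own.

```python
from fractions import Fraction

FRACTION_BITS = 23        # assumed: fraction bits kept for the 0.1 s tick value
TICKS_PER_HOUR = 36_000   # ten ticks per second


def truncated_tick(bits=FRACTION_BITS):
    """0.1 chopped (not rounded) to the given number of binary fraction bits."""
    scale = 1 << bits
    return Fraction(scale // 10, scale)


def clock_drift_seconds(hours, bits=FRACTION_BITS):
    """Accumulated clock error after counting ticks for the given number of hours."""
    per_tick_error = Fraction(1, 10) - truncated_tick(bits)
    return per_tick_error * TICKS_PER_HOUR * hours


drift = clock_drift_seconds(100)
print(float(drift))
```

The per-tick error is under a ten-millionth of a second, but it is always in the same direction, so it never averages out.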

The solution to those latter problems is to not give the programmer a choice in number format and size. If they only had one number type, 64 bit IEEE float for example, they could not have chosen to mess it up. JavaScript does this.

What of the format itself then?

Seems like a neat idea. Variable sized exponents and mantissas give you whatever precision you want. Being able to store small, low precision numbers in a few bits and hence save the massive hit of stepping outside of CPU cache could be a huge speed up in some cases.

A 100 times speed up? I don't see it. I'm sure when most calculations are going on, in a physics simulation or whatever you'll be wanting as many bits as you can get. No saving on memory bandwidth there.

I don't see UNUM being at all suitable for little micro-controllers, what with all that variable bit width stuff going on.

Seems that the onus is on Gustafson now to build a chip and demonstrate the superiority of UNUM.

What I'd like to see is an implementation of Douglas Crockford's DEC64 format. That gives precise results for decimal arithmetic as expected by most people working with money. http://dec64.com/
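
To show why a decimal format behaves the way people expect with money, here is a small sketch where a number is an integer coefficient and a power-of-ten exponent. That is the general idea behind decimal formats such as DEC64, but the tuple layout and the function name are made up for illustration; this is not the DEC64 bit encoding and it ignores range limits.

```python
def dec_add(a, b):
    """Add two (coefficient, exponent) decimals, where value = coefficient * 10**exponent."""
    (ca, ea), (cb, eb) = a, b
    e = min(ea, eb)
    return (ca * 10 ** (ea - e) + cb * 10 ** (eb - e), e)


tenth = (1, -1)                       # 0.1
fifth = (2, -1)                       # 0.2
decimal_sum = dec_add(tenth, fifth)   # coefficient 3, exponent -1: exactly 0.3
binary_sum = 0.1 + 0.2                # binary floats cannot hold 0.1 or 0.2 exactly
print(decimal_sum, binary_sum == 0.3)
```

Decimal fractions are exact here because the exponent is a power of ten. Values like 1/3 are still inexact in any finite decimal format.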

Heater. · 2016-04-23 07:21

jmg,

Thanks for the heads up on the pendulum formula. That is magic. I have no use for it but it's magic. I can't help thinking that AGM thing has uses though.

cgracey · 2016-04-23 07:30

The UNUM II concept outlined in the first link talks about getting adds/subs/muls/divs all into one clock via lookups. This is where the speedup occurs. There's talk of parallelizing these operations, as well, since normal float operations differ, depending on order (rounding), but these computations don't - so you can do lots at once.
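
A quick illustration of the order dependence with ordinary IEEE doubles, which is what makes it awkward to split a float sum across cores and still get the same answer. The three values are picked by hand to make the effect obvious; exact rational arithmetic is shown alongside for comparison.

```python
from fractions import Fraction

a, b, c = 1e16, -1e16, 1.0

left_to_right = (a + b) + c    # the big values cancel first, then 1.0 is added
right_to_left = a + (b + c)    # 1.0 is lost when it is added to -1e16 first

exact_left = (Fraction(a) + Fraction(b)) + Fraction(c)
exact_right = Fraction(a) + (Fraction(b) + Fraction(c))

print(left_to_right, right_to_left, exact_left == exact_right)
```

At a magnitude of 1e16 the spacing between adjacent doubles is 2, so adding 1.0 to -1e16 rounds straight back to -1e16.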

Heater. · 2016-04-23 09:25

Ah thanks Chip. Somehow I skipped over that first link. The UNUM II looks like a workable thing, even in an MCU, fixed width and all.

ersmith · 2016-04-23 12:27

cgracey wrote: »

The UNUM II concept outlined in the first link talks about getting adds/subs/muls/divs all into one clock via lookups. This is where the speedup occurs. There's talk of parallelizing these operations, as well, since normal float operations differ, depending on order (rounding), but these computations don't - so you can do lots at once.

Of course he does seem to skip over the little detail of how big the lookup tables get. The UNUM concept seems to work well for very low precision numbers, but I'm not sold on how practical they would be for larger numbers or with more precision.
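
To put rough numbers on the table size, here is a back-of-envelope sketch. It assumes the most naive layout, one full table per operation indexed by both operands, with each entry as wide as an operand. A real design could presumably exploit symmetry or smaller sub-tables, so treat these as upper bounds, not as figures from the presentation.

```python
def full_table_bytes(bits):
    """Bytes for one naive two-operand lookup table with bits-wide operands and entries."""
    entries = (1 << bits) ** 2      # one entry per pair of operand values
    return entries * bits // 8


for width in (4, 8, 16, 32):
    print(width, full_table_bytes(width))
```

Under these assumptions an 8-bit table is 64 KB, which would fit in the 512KB hub memory mentioned earlier in the thread, while a 16-bit table is already 8 GB.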

Heater. · 2016-04-23 13:20

He did mention the lookup table size issue in slide 34 of the first link.

Future Directions
• Create 32-bit and 64-bit unums with new approach; table look-up still practical?

Presumably whatever is in the look up tables can be calculated on the fly. Less performant but maybe you get that back by being able to parallelize things massively as he suggests.

Who knows?

Love to see his software implementations of UNUMs in C/C++ when they are done.

ersmith · 2016-04-23 15:03

Heater. wrote: »

He did mention the lookup table size issue in slide 34 of the first link.

Future Directions
• Create 32-bit and 64-bit unums with new approach; table look-up still practical?

Presumably whatever is in the look up tables can be calculated on the fly. Less performant but maybe you get that back by being able to parallelize things massively as he suggests.

Indeed, who knows? But the devil is in the details, and I don't see any practical (reasonable performance) way forward without lookup tables, and lookup tables for 32 bits will be *huge*. IEEE floating point has its flaws, but at least it's clear how to implement it.

Having said that, I hope he is able to come up with a solution. But right now it's definitely not ready for prime time.

Heater. · 2016-04-23 15:23

Yep. Perhaps it's even useful in software. After all those guys calculating trillions of digits of Pi are not using floating point hardware: https://en.wikipedia.org/wiki/Chronology_of_computation_of_π

Then again, computation of Pi does not seem very amenable to parallel processing; they do it on off-the-shelf PCs!

So what about the DEC-64 thing?

Delus · 2016-04-23 18:03

wouldn't any solution resulting in an irrational number result in a unum of infinite bits?...

Electrodude · 2016-04-23 19:43

Delus wrote: »

wouldn't any solution resulting in an irrational number result in a unum of infinite bits?...

I think that's what the uncertainty bit is for.

Heater. · 2016-04-23 19:46

Delus,

Yeah, that's what bugs me about the whole idea.

As soon as you get into 1/3 or the sqrt(2) or calculating π you are into an infinite number of digits.

Where does the UNUM system draw the line on memory usage and time to do all this?

If there is a line drawn anywhere then the results are wrong!
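
One way to picture the "uncertainty bit" answer, as far as I understand it: a fixed-width result that is flagged inexact stands for the gap between two neighbouring exact values, so the stored answer is a bound and not a rounded point. The sketch below brackets a square root on a grid of 2**-fraction_bits using only integer math. It is not Gustafson's encoding; the function name and return layout are hypothetical.

```python
import math
from fractions import Fraction


def sqrt_bounds(n, fraction_bits):
    """Return (lo, hi, exact) with lo <= sqrt(n) <= hi, both on a 2**-fraction_bits grid."""
    scale = 1 << fraction_bits
    target = n * scale * scale
    k = math.isqrt(target)            # floor(sqrt(n) * scale), integer only
    exact = k * k == target
    lo = Fraction(k, scale)
    hi = lo if exact else Fraction(k + 1, scale)
    return lo, hi, exact


lo, hi, exact = sqrt_bounds(2, 8)
print(lo, hi, exact)
```

The line is drawn at a fixed number of bits, and what is stored is the statement that sqrt(2) lies strictly between lo and hi. Whether that counts as a right answer is exactly the question being argued here.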