Will P2 have an FPU?
samuell
Posts: 554
Hi,
I noticed that the P1 doesn't have a floating-point unit, needing an external one for hardware floating-point calculations. Will the P2 have one?
Kind regards, Samuel Lourenço
Comments
The P2 does have CORDIC hardware in the hub for doing harder 32x32 integer math, such as mul, div, trig, sqrt, log, exp, and probably some others I forgot, but it has no hardware FPU. But it should be easy to do floating point in software with help from the CORDIC. In fact, with a smart enough compiler (e.g. a human manually writing PASM), I'll bet you could do float math almost as efficiently as integer math by never packing the floats but always storing them as two separate longs, mantissa and exponent.
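To sketch what I mean by never packing them (rough C, not P2 code; the struct and names are made up, and there is no rounding or special-case handling):

```c
#include <stdint.h>

/* Made-up unpacked float: value = mant * 2^exp, sign kept separately.
   The mantissa is either 0 or normalized so its top bit (bit 31) is set. */
typedef struct {
    uint32_t mant;
    int32_t  exp;
    int      neg;
} ufloat;

/* Multiply: one 32x32->64 integer multiply (exactly what the P2's
   CORDIC/MUL hardware provides), an exponent add, and a one-bit
   renormalization shift. No IEEE field packing or unpacking anywhere. */
static ufloat uf_mul(ufloat a, ufloat b)
{
    ufloat r;
    uint64_t p = (uint64_t)a.mant * b.mant;   /* in [2^62, 2^64) if both nonzero */
    r.neg = a.neg ^ b.neg;
    if (p == 0) { r.mant = 0; r.exp = 0; r.neg = 0; return r; }
    if (p & (1ULL << 63)) { r.mant = (uint32_t)(p >> 32); r.exp = a.exp + b.exp + 32; }
    else                  { r.mant = (uint32_t)(p >> 31); r.exp = a.exp + b.exp + 31; }
    return r;
}
```

Add and subtract need an alignment shift on the smaller exponent, but again that's just integer shifts and adds.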
Nice to learn that. CORDIC is great.
minor note: the P2 will have only 512KB hub memory. Only the FPGA image has 1MB.
There's another (better!) option for floating point: Use a simpler number system!
John Gustafson (think Gustafson's Law) devised "Unum", a new number format.
This is his list of problems he sees computing facing:
-Too much energy and power needed per calculation
-More hardware parallelism than we know how to use
-Not enough bandwidth (the memory wall)
-Rounding errors more treacherous than people realize
-Rounding errors prevent use of parallel methods
-Sampling errors turn physics simulations into guesswork
-Numerical methods are hard to use, require experts
-IEEE floats give different answers on different platforms
UNUMs achieve:
-Greater precision (with fewer bits)
-No rounding error
-No sampling error
-Reduced memory/bandwidth requirements
-Massively parallel
-and more...
"Imagine having provable bounds on answers for the first time, yet with easier programming, less storage, less bandwidth use, less energy/power demands, and abundant parallelism."
...and it only needs integers to work!! Suddenly a multi-core microcontroller can leap into (low-end) workstation class computing!!
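To get a feel for the "provable bounds using nothing but integers" part, here is a minimal interval-arithmetic sketch in C. It is not Gustafson's actual unum encoding (no ubit, no variable widths); it only shows how integer fixed-point bounds can be carried through a calculation, and all the type and function names are mine:

```c
#include <stdint.h>
#include <stdio.h>

/* Closed interval [lo, hi] of 16.16 fixed-point values: every result
   carries a guaranteed enclosure of the true answer.                  */
typedef struct { int64_t lo, hi; } ival;

#define FIX(x) ((int64_t)((x) * 65536.0))

static ival iv_add(ival a, ival b)
{
    ival r = { a.lo + b.lo, a.hi + b.hi };
    return r;
}

/* Multiply: min/max of the four corner products, with the fixed-point
   shift rounded outward so the bound stays valid.                      */
static ival iv_mul(ival a, ival b)
{
    int64_t p[4] = { a.lo * b.lo, a.lo * b.hi, a.hi * b.lo, a.hi * b.hi };
    ival r = { p[0], p[0] };
    for (int i = 1; i < 4; i++) {
        if (p[i] < r.lo) r.lo = p[i];
        if (p[i] > r.hi) r.hi = p[i];
    }
    r.lo = r.lo >> 16;              /* toward -infinity */
    r.hi = (r.hi + 65535) >> 16;    /* toward +infinity */
    return r;
}

int main(void)
{
    ival third = { FIX(0.3333), FIX(0.3334) };    /* encloses 1/3 */
    ival x = iv_mul(iv_add(third, third), third); /* encloses 2/9 */
    printf("result lies in [%f, %f]\n", x.lo / 65536.0, x.hi / 65536.0);
    return 0;
}
```

The answer you get is a rigorous enclosure, not a rounded point value; the unum/ubound machinery is essentially this idea plus a compact, variable-width encoding.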
Some UNUM applications using the "Ubox" method:
-Photorealistic computer graphics
-N-Body problems
-Structural analysis
-Laplace's equation
-Perfect gas models "without" statistical mechanics
UNUMs could be implemented in software on the P2 or in *HARDWARE* on the P3.
Imagine the P3 with the ALU designed around this number system !!!
Check out his presentations here:
"A Radical Approach to Computation with Real Numbers: Unums version 2.0"
http://www.johngustafson.net/presentations/Multicore2016-JLG.pdf
"Unun Computing: An energy efficient and Massively Parallel approach to valid numerics"
http://www.slideshare.net/insideHPC/unum-computing-an-energy-efficient-and-massively-parallel-approach-to-valid-numerics
"The End of Numerical Error"
http://arith22.gforge.inria.fr/slides/06-gustafson.pdf
...and his book:
"The End of Error: Unum Computing"
https://www.crcpress.com/The-End-of-Error-Unum-Computing/Gustafson/9781482239867
Ah, the kinds of dilemmas that steer our lives. How many things do we NOT know that if we could get into our brains would open up new dimensions?
The P2's main advantage here is that it is multi-core.
I can see some seriously cool stuff being done with Unums and P2 (P3 ?!?)
;-)
It goes to show that there are some fundamentals right under our noses that could stand a lot of improvement.
I remember the frustration rounding errors would cause when I first started programming in BASIC. Discovering integer math in assembly language was a restorative experience. That's why I've never been that enthusiastic about floating point. I could totally get on-board with this UNUM idea, though! That Parallella guy needs to redo his floating-point monster to use this paradigm.
That UNUM concept is going to be the future, I think. No problem to do in software, either.
Think how something so simple has evaded people's thinking for so long! How much grief have rounding errors caused everybody over the years? For engineers, this has got to approach five million man-years of wary uneasiness (100,000 engineers, on average, over 50 years). If we had had UNUMs, instead of standard floats, all along, who knows what kind of different directions things might have headed in and how much further along things might be? It would have lent some geometric simplification and lubrication to everything.
Hey, your head is full of ideas that are MORE difficult to understand than UNUMs.
Interesting - but no numbers on his numbers? (No speed comparisons using real code?)
I see he also mentions Pendulum Equations.
I recall looking for a real pendulum equation (not 'for small angles..') a while back, and it was harder than expected.
I finally found this, which I kept a link to:
http://leapsecond.com/hsn2006/pendulum-period-agm.pdf
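If I read that paper right, the exact period boils down to T = 2π·sqrt(L/g) / AGM(1, cos(θ0/2)), where AGM is the arithmetic-geometric mean. A quick C sketch of that formula (my own, not code from the paper):

```c
#include <math.h>
#include <stdio.h>

static const double PI = 3.14159265358979323846;

/* Arithmetic-geometric mean: iterate a <- (a+b)/2, b <- sqrt(a*b)
   until the two values meet (converges quadratically).              */
static double agm(double a, double b)
{
    while (fabs(a - b) > 1e-15 * a) {
        double m = 0.5 * (a + b);
        b = sqrt(a * b);
        a = m;
    }
    return a;
}

/* Exact large-angle pendulum period:
   T = 2*pi*sqrt(L/g) / AGM(1, cos(theta0/2))                         */
static double pendulum_period(double length_m, double theta0_rad)
{
    const double g = 9.80665;                       /* m/s^2 */
    double t_small = 2.0 * PI * sqrt(length_m / g); /* small-angle period */
    return t_small / agm(1.0, cos(0.5 * theta0_rad));
}

int main(void)
{
    /* 1 m pendulum: near-zero amplitude vs a 60-degree swing */
    printf("tiny swing:  %.6f s\n", pendulum_period(1.0, 1e-9));
    printf("60 degrees:  %.6f s\n", pendulum_period(1.0, PI / 3.0));
    return 0;
}
```

Only a handful of AGM iterations are needed even for large amplitudes, which is what makes it attractive compared to the slowly converging series expansions.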
There was a recent example of a wooden pendulum clock, that calibrated itself to a GPS reference by carefully varying the amplitude of the swing - I cannot find the link.
Has anyone done these in FPGAs ?
Comments here are interesting too:
http://hackaday.com/2015/10/22/an-improvement-to-floating-point-numbers/
Interesting - but no numbers on his numbers? (No speed comparisons using real code?)
On page 29 of his "Radical approach..." presentation he says:
"Low-precision rigorous math is possible at 100x the speed of sloppy IEEE floats"
Well... an arm-waving claim is not quite what I am after.
If there was a solid basis to this, and real gains to be had, I would expect to see a library crafted to use them, and some benchmarks proving just how much faster they are, deployed in the real world.
I also see this claim:
" Unums: Correct answer to 23 decimals using an average of only 75 bits per number. Not even IEEE 128-bit precision can do that. Precision, range adjust automatically."
- More useful would be what the peak bits were, not so much the average.
Any HW or SW has to be sized to the peak expected, and probably needs to allocate storage for that, even if it may use less.
I see merit in variable precision, but suspect any practical solution would be more like a manual gear-box, with a choice of a few, than a continuously variable transmission.
It seems there are some initial decisions needed on input numbers.
This just seems like a way to propagate uncertainties and also to not waste time computing beyond sufficient resolutions.
I read through the whole 2nd link, which was a .pdf, and it seemed pretty understandable. Certainly, we've all experienced frustrations due to rounding and something to get around that would be welcomed.
These articles start out by pointing out the flaws in IEEE floats, which is fair enough.
Then we get a story about the Ariane 5 rocket failure, where supposedly a 64-bit float was stuffed into a 16-bit integer. This is obviously not an example of a flaw in IEEE floats but rather a stupid coding error.
Then there's the story of the Patriot missile that missed a SCUD, where timekeeping values were squashed into 24 bits, causing an ever-accumulating error and hence a loss of accuracy. Again, not a problem with IEEE floats as such.
The solution to those latter problems is to not give the programmer a choice of number format and size. If they only had one number type, 64-bit IEEE float for example, they could not have chosen to mess it up. JavaScript does this.
What of the format itself then?
Seems like a neat idea. Variable-sized exponents and mantissas give you whatever precision you want. Being able to store small, low-precision numbers in a few bits, and hence avoid the massive hit of stepping outside the CPU cache, could be a huge speed-up in some cases.
A 100-times speed-up? I don't see it. I'm sure when most calculations are going on, in a physics simulation or whatever, you'll be wanting as many bits as you can get. No saving on memory bandwidth there.
I don't see UNUM being at all suitable for little micro-controllers, what with all that variable bit width stuff going on.
Seems that the onus is on Gustafson now to build a chip and demonstrate the superiority of UNUM.
What I'd like to see is an implementation of Douglas Crockford's DEC64 format. That gives precise results for decimal arithmetic as expected by most people working with money. http://dec64.com/
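The layout is simple enough to sketch: a DEC64 value is coefficient × 10^exponent, with a 56-bit two's-complement coefficient in the top bits and an 8-bit exponent in the bottom byte. A toy illustration in C (not Crockford's reference code; the helper names are mine, and real addition has to align exponents and handle overflow):

```c
#include <stdint.h>
#include <stdio.h>

/* A DEC64 value: coefficient * 10^exponent, coefficient in the upper
   56 bits, exponent in the lower 8 bits, both two's complement.       */
typedef int64_t dec64;

static dec64 dec64_make(int64_t coefficient, int exponent)
{
    return (coefficient << 8) | (uint8_t)exponent;
}

static int64_t dec64_coefficient(dec64 x) { return x >> 8; }
static int     dec64_exponent(dec64 x)    { return (int8_t)(uint8_t)x; }

/* Toy add: only the easy case where the exponents already match. */
static dec64 dec64_add_same_exp(dec64 a, dec64 b)
{
    return dec64_make(dec64_coefficient(a) + dec64_coefficient(b),
                      dec64_exponent(a));
}

int main(void)
{
    dec64 a = dec64_make(1, -1);          /* 0.1 = 1 * 10^-1 */
    dec64 b = dec64_make(2, -1);          /* 0.2 = 2 * 10^-1 */
    dec64 c = dec64_add_same_exp(a, b);   /* exactly 3 * 10^-1 */
    printf("coefficient=%lld exponent=%d\n",
           (long long)dec64_coefficient(c), dec64_exponent(c));
    return 0;
}
```

0.1 + 0.2 comes out as exactly 3 × 10^-1, which is the whole point for money.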
Thanks for the heads up on the pendulum formula. That is magic. I have no use for it but it's magic. I can't help thinking that AGM thing has uses though.
Of course he does seem to skip over the little detail of how big the lookup tables get. The UNUM concept seems to work well for very low precision numbers, but I'm not sold on how practical they would be for larger numbers or with more precision.
Future Directions
• Create 32-bit and 64-bit unums with new approach; table look-up still practical?
Presumably whatever is in the look up tables can be calculated on the fly. Less performant but maybe you get that back by being able to parallelize things massively as he suggests.
Who knows?
Love to see his software implementations of UNUMs in C/C++ when they are done.
Indeed, who knows? But the devil is in the details, and I don't see any practical (reasonable performance) way forward without lookup tables, and lookup tables for 32 bits will be *huge*. IEEE floating point has its flaws, but at least it's clear how to implement it.
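To put a rough number on "huge": a complete binary-operation table for n-bit operands has 2^n × 2^n entries of n bits each, before any compression or exploiting of symmetry. A back-of-envelope calculation:

```c
#include <stdio.h>

/* Naive size of one full binary-operation lookup table for n-bit
   operands: 2^n * 2^n entries, n bits per entry.                   */
int main(void)
{
    int widths[] = { 8, 16, 32 };
    for (int i = 0; i < 3; i++) {
        int n = widths[i];
        double entries = (double)(1ULL << n) * (double)(1ULL << n);
        double bytes = entries * n / 8.0;
        printf("%2d-bit operands: %.3g entries, ~%.3g bytes per operation\n",
               n, entries, bytes);
    }
    return 0;
}
```

Eight bits is a comfortable 64 KB per operation, 16 bits is already gigabytes, and 32 bits is hopeless without computing on the fly.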
Having said that, I hope he is able to come up with a solution. But right now it's definitely not ready for prime time.
Then again, computation of Pi does not seem very amenable to parallel processing; they do it on off-the-shelf PCs!
So what about the DEC-64 thing?
I think that's what the uncertainty bit is for.
Yeah, that's what bugs me about the whole idea.
As soon as you get into 1/3 or the sqrt(2) or calculating π you are into an infinite number of digits.
Where does the UNUM system draw the line on memory usage and time to do all this?
If there is a line drawn anywhere then the results are wrong!