Floating Point
Sal Ammoniac
Will Spin and the IDE ever be extended to support native floating point so I can write code like:
Temperature := 75.0 * (2.7 - (3.4535/1023.0*rawdata)) - 55.5734
instead of
Temperature := fp.FSub(fp.FMul(75.0, fp.FSub(2.7, fp.FMul(fp.FDiv(3.4535, 1023.0), rawdata))), 55.5734)
or having to convert everything to scaled fixed point?
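Just to make the pain concrete, here is a rough C sketch of what the scaled fixed point alternative might look like, using a 16.16 format (the pre-rounded constants and the names are only for illustration):

#include <stdint.h>

// rawdata is a plain ADC count, 0..1023; the constants are pre-scaled by 65536
int32_t temperature_fx(int32_t rawdata)
{
    const int32_t c75_0    = 75 * 65536;   // 75.0
    const int32_t c2_7     = 176947;       // 2.7 * 65536, rounded
    const int32_t slope    = 221;          // (3.4535 / 1023.0) * 65536, rounded -- already loses a little accuracy
    const int32_t c55_5734 = 3642059;      // 55.5734 * 65536, rounded

    // a 16.16 * 16.16 multiply needs a 64-bit intermediate; keep the middle 32 bits
    int32_t inner = c2_7 - slope * rawdata;
    return (int32_t)(((int64_t)c75_0 * inner) >> 16) - c55_5734;
}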
Comments
First, if you're into robotics and use trig for kinematic transforms or position computation, a friendlier syntax can help avoid bugs. For me this is the less important issue.
Second, if you include two different Spin objects that use floating point internally there can be unwanted outcomes depending upon the internals of the included object. More RAM may be consumed with different floating point implementations, and each included object may consume an additional cog for F32's floating point co-processor.
A built-in implementation could avoid these problems by standardizing the implementation and coordinating cog usage in some manner.
I might be wrong here, but doesn't switching to Catalina sacrifice the network effects Spin affords? We couldn't use other Spin objects in C/C++ programs and would have to translate them into C/C++? Also, C/C++ syntax isn't a paradise for floating point either. It's pretty easy to accidentally invoke an integer operation and get a truncation error.
Not that this is an unsolvable problem; to some degree exactly this integration is done in Catalina, in a way I find quite good so far (I am not an expert, I have just been familiarizing myself with Catalina for some time now and have written some test programs).
The reason I wrote that I doubt it makes much sense in Spin is partly about the philosophy I see in the Spin language: it closely resembles the features of the Spin byte code, and building libraries into the language as built-ins goes, in my opinion, a bit in the wrong direction (but that's just me).
You are of course right to argue about the network effects, but that is of course true for every language for the Propeller which is not Spin (PropBasic, for example). It seems there are people out there who prefer to use something other than Spin for various reasons (syntax that is not absolutely specific to the Prop, higher speed than Spin, support for external RAM, or whatever else).
I think your argument is similar to something I thought about some time ago: the interoperability between code written in non-Spin languages and Spin should be better. This is for sure not completely trivial for Catalina.
About truncation from accidental invocation of integer operations: do you have an example in mind? Usually in C the operands are converted to float in a mixed operation and not truncated, and usually the compiler should throw a warning in situations where truncation can happen, say when you assign a float to an integer variable.
Since this post is now a bit off topic (the thread is about Spin, not C) I will end at this point, but with some kind of warning.
Adding new language elements and features to a programming language as afterthoughts can easily end up in a cluttered result, which makes an otherwise clean language sooner or later a mess if not a lot of care is taken to avoid it.
I see more value in having something like a unified standard library (or standard libraries) for the more common tasks for spin than new syntax (but again that's just my personal point of view).
I don't have access to a C compiler to check this, but my error was something like this:
float result = intVal1 / intVal2 * realVal1;
The C compiler used an integer division and then converted the int quotient to float before the multiplication. The result was almost right except for mysterious zeros that cropped up when the two integer values were close enough in value to return zero from the division. I felt pretty foolish when I found it.
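To make that failure mode concrete, here is a small self-contained sketch along the same lines (the values are made up); the cast in the second expression shows one way to force the division into floating point:

#include <stdio.h>

int main(void)
{
    int   intVal1  = 3, intVal2 = 4;   // close enough in value that integer division gives 0
    float realVal1 = 10.0f;

    float wrong = intVal1 / intVal2 * realVal1;          // integer division: 3 / 4 == 0, so the result is 0.0
    float right = (float)intVal1 / intVal2 * realVal1;   // promote first: 0.75 * 10.0 == 7.5

    printf("wrong = %f, right = %f\n", wrong, right);
    return 0;
}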
Now how does one expect this floating point stuff to be done in Spin?
There for sure isn't room in the COG for the Spin interpreter to have floating point op codes added in.
That means the floating point would have to be done as a sequence of integer op codes generated by the compiler. Perhaps in-line, or perhaps added to the binary as library functions. But this is going to be awfully slow.
Alternatively the floating point would be done in PASM in a separate helper COG. But then if all your application COGs use floats then there will need to be locks around that interface. Messy.
All in all I think the fixed point idea has merit. Floating point is generally unnecessary and eats a lot of resources.
Sure, many things can be converted from floating point to fixed point, but it's often tedious to do so and this only works for smaller numbers. How do you represent 134512.3846 in fixed point if you need to maintain four digits to the right of the decimal? You'd need to use 64-bit integers, which aren't well-supported in Spin either.
Someone mentioned that the Propeller doesn't natively support FP in hardware -- so? The other two architectures I use on a regular basis, ARM Cortex-M3 and XMOS, don't either, but their development environments fully support floating point as a built-in part of the language.
Ken Gracey has said on these forums that one of Parallax's focuses in 2011 is to make the Propeller more appealing to professionals for use in more traditional embedded products. Sure, people here have proved that you can emulate an ancient Z80 or run 1980s-style video games on a Propeller, but have any big companies used them in a deeply embedded product such as an engine controller?
I see several things that will hold the Propeller back in this aspect (in no particular order):
1. Lack of native FP support in the language supported by the chip vendor. Perhaps Parallax should license one of the existing C packages and support/develop it themselves.
2. Lack of hardware debug support. For the last decade (or more), all other embedded processors have incorporated JTAG or another hardware debug mechanism in their products. The Propeller doesn't have this capability. Nor does it have boundary scan, which is how most of the big boys test their systems on the manufacturing line. There is ViewPort, but it's an add-on product and it's invasive (need to modify your code to use it).
3. The well-known memory space limitations (496 longs per cog and 32KB for hub memory). There are work-arounds to these limitations, yes, but none of them are ideal.
Use the -Wconversion switch to enable such warnings (-Wall does not enable all warnings even if its name suggests that).
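For example, something like this builds silently with plain -Wall but gets flagged once -Wconversion is added (the file name and values here are only for illustration):

// demo.c -- build with: gcc -Wall -Wconversion demo.c
int main(void)
{
    float f = 3.7f;
    int   i = f;    // -Wconversion warns: conversion from 'float' to 'int' may change value
    return i;
}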
Good point. How about 1.602 x 10^-19 -- how would you handle that one in fixed point?
For addition and subtraction you are of course right.
I agree, converting an existing app that uses floating point to fixed point may be a chore.
You do realize you cannot represent 134512.3846 in 32 bit floating point?
With 32 bit floats you get 24 bits of mantissa. That only allows you to count from 0 to 16777215 in the mantissa. That is only 8 digits of accuracy. You are asking for 10 digits of accuracy.
Try this simple experiment in C:
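Something along these lines (the exact snippet is a reconstruction; the print format is only for illustration):

#include <stdio.h>

int main(void)
{
    float  f = 134512.3846f;   // 24-bit mantissa: only about 7 significant digits survive
    double d = 134512.3846;    // 53-bit mantissa: all 10 digits fit

    printf("float : %.4f\n", f);   // prints 134512.3906 on an IEEE 754 machine
    printf("double: %.4f\n", d);   // prints 134512.3846
    return 0;
}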
See how wrong the results are!
So yes you would need to use 64 bit floats. But, hey, then you could use some 64 bit fixed point scheme anyway.
As my old boss used to say "If you think you need floating point to solve the problem you don't understand the problem":)
Upshot is that if you don't think your calculations through and look out for overflows/underflows of intermediate results and such, you are going to have trouble, floating point or not.
Thanks for the -Wconversion tip. Shows how often I have used floating point in my career:)
Also, I programmed for six years of my career in the Bliss programming language, which is also a typeless language without floating point built in. For system software I never missed it.
That's why either a compiler supported fixed or floating point solution would be helpful.
Of course. I almost always use 64-bit floating point when I use floating point at all.
That may apply to things like systems programming, and blinking LEDs, and many other things like that, but there are some applications where true floating point makes things easier and more efficient.
There is a need for floating point in the embedded industry, which is why companies like ARM are adding hardware FP units to their MCUs (the Cortex-M4 has single precision hardware floating point).
That goes without saying. Of course I do that checking with any calculations that I do, FP or not.
Fair enough. I've just assumed 64 bit float support is completely off the table for the Prop. It would be bigger and slower than F32. Notice that no one has felt the need to create an F64 yet. Besides, Spin does not even have a 64 bit integer data type.
At the time I was hit with that quote (often) we were building the plot processing and display for a 3-dimensional radar system. There are some pages about it here: http://www.radarpages.co.uk/mob/martello/martello2.htm That was long enough ago that we only had 16 bit processors and fixed point maths. The architecture looked like a Prop, only a few thousand times bigger! Don't forget that man landed on the moon with no floating point support. That's far away from flashing LEDs, is it not?
It just happens that I am now involved with an embedded control system that uses a lot of floating point. It was developed on PCs, where floating point is almost free and no one ever thought about it. Now, for reasons of physical size and cost, it is required to port this to a tiny ARM platform. Even with a 1GHz ARM Cortex-A8 that has VFPv3 it's hitting performance limits. One existing ARM device where we would like this code to run is only 200MHz with no hardware floating point; porting to that is going to be tough. If only they had used fixed point to start with...
With all the overhead which comes from LMM in Catalina I get about 70 kFlops for the floats, so a bit more than 1000 cycles per multiply. But this is the LMM PASM version, not the version with a dedicated FPU cog; I will compare that tomorrow.
If I needed fast math for a Propeller project I would use a tiny85 as a math processor. You could just use I2C and hook the tiny to the same pins the EEPROM chip is on. With 8KB of flash on the tiny85 and 20 MIPS of processing power you could create a very powerful set of fast math routines. This would only add about $2.00 to the project cost and would not use up any I/O pins, since the EEPROM pins can handle data for both the EEPROM chip and the tiny85. You would also be able to add many other functions to the tiny85 and make it serve many roles, because the math routines would not fill up 8KB of space. I think it is great fun to combine several processors on one board :-)
Spin does not have floating point math functions built in, but I still consider the Spin interpreter to be a true work of genius. It packs an enormous amount of capability into just 512 words!
What do you mean by "for anything that fits in 32 bit"? That doesn't make much sense to me in this context, where we are talking about non-integer calculations and not about 16 by 16 bit multiplies.
Even if you use only a 16.16 fixed point representation you have to calculate the full 64 bit result and take the middle 32 bit values for your 16.16 result.
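In C terms, a minimal sketch of that operation (the function name is only for illustration):

#include <stdint.h>

// 16.16 * 16.16: the exact product is a 32.32 value, so form the full
// 64-bit result and keep the middle 32 bits to get back to 16.16
static int32_t fx16_mul(int32_t a, int32_t b)
{
    int64_t product = (int64_t)a * (int64_t)b;   // full 64-bit (32.32) intermediate
    return (int32_t)(product >> 16);             // middle 32 bits
}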
After all I am still someone who thinks that in the absolute majority of cases fixed point calculations are the way to go and only a small subset of problems requires IEEE 754 values.
I was really tired yesterday, so for those interested I want to add the rough numbers for float performance with Catalina:
The 70 kFlops I mentioned is what is typical for me in mixed (+-*/) calculations (I tested a Runge-Kutta method for a system of two coupled differential equations on the Propeller).
If you run a test loop with 3 multiplications and one assignment, I get a performance of about 86.2 kFlops, which is a bit more than 900 cycles per multiplication (including all overhead like fetching variables from the hub, copying back, the loop itself and so on).
Where fixed point beats floats hands down is of course addition and subtraction. So if you have a lot of that in a tight loop nothing can beat it (though very careful error analysis of how the rounding errors accumulate has to be done). I used that myself for a 3D ultrasound visualization system more than 15 years ago, when the FPU of the first Pentium was not as fast as current FPUs with their SSEx instructions are.
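The reason it wins there is simple: a fixed point add or subtract is just the ordinary integer instruction, while even a software float add has to unpack, align, add and renormalize. A trivial C sketch:

#include <stdint.h>

// 16.16 fixed-point add/subtract are plain integer add/subtract -- one instruction each
static int32_t fx16_add(int32_t a, int32_t b) { return a + b; }
static int32_t fx16_sub(int32_t a, int32_t b) { return a - b; }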
Having said all this, I want to add that it is only true for MCUs and not for calculations on a modern PC. Optimized code on a modern CPU can handle up to five float operations per cycle (!!!) thanks to the vectorized SSEx pipelines.
Finally, I have to correct what I said about the float support in Catalina. The + - * / operations are a built-in part of the Catalina kernel, so you should not expect a further speedup of the elementary operations from using the special math libraries float_a and float_b, which use one and two additional cogs respectively. Though I have not benchmarked the latter yet; I have just checked the documentation.
Edit: I just want to mention that the Catalina benchmarks are not completely off topic; they should give a rough estimate of the performance of the OBEX float objects in Spin as well. The Catalina plugins are based on those objects (but I do not know exactly on which ones; I found several of them in several versions). Since Catalina is open source, it is also easy to crib from its source code to see whether there is anything that would improve performance for Spin-based applications.
Also if you only care about a 16.16 result for the 16.16 * 16.16, and do not care about what may be lost to overflow, you can use only a 32 bit result (same for 32.32 with a 64bit result).
Of course if you count communicating through hub memory, the time required goes up quite a bit.
Impressive results mentioned for catalina. I must assume then that a floating point mul takes about 1200 clocks?? That would be quite a bit faster than I would have expected.
Have any of you used these chips? I wonder how they compare to Holly's tiny85. I'd imagine (based on previous posts by Holly) the tiny85 chips are a lot less expensive. I don't want to take the time to learn how to program another chip, so that option is out for me. I've already spent some time using a uM-FPU, so it's an acceptable option (to me) if it speeds up calculations.
@Heater,
I've read this quote from you several times on the forum: "If you think you need floating point to solve the problem you don't understand the problem."
Do you have any suggestions on how we can better understand the problem?
Can one do trig without floating point? Do you have any books to suggest? I'm used to plugging numbers into my calculator. Worrying about overflowing 32 bits etc. is new to me.
Duane
I didn't want to go near it for an inverse kinematic transform as the trig is considerably more complicated.
On the surface it looks like that was the direction the Propeller was taking, because there's a sine table indexed by binary angles. But the compiler includes no support for it, although it does include a floating point value for Pi and can compute constants like Pi/2. That's why I went in the floating point direction.