Floating Point

Sal Ammoniac · 2011-03-25 17:14

Will Spin and the IDE ever be extended to support native floating point, so I can write code like:

Temperature := 75.0 * (2.7 - (3.4535/1023.0*rawdata)) - 55.5734

instead of

Temperature := fp.FSub(fp.FMul(75.0, fp.FSub(2.7, fp.FMul(fp.FDiv(3.4535, 1023.0), rawdata))), 55.5734)

or having to convert everything to scaled fixed point?

Zap-o · 2011-03-25 17:19

I hope not.

martinh · 2011-03-25 19:21

I have no idea whether Spin will ever support this (and honestly I don't see much sense in it), but if you really need a language with built-in floating point support for the Propeller, you can have a look at the Catalina C compiler.

Dave Hein · 2011-03-26 05:29

I hope so.

Martin_H · 2011-03-26 05:41

There are two benefits to built-in floating point.

First, if you're into robotics and use trig for kinematic transforms or position computation, a friendlier syntax can help avoid bugs. For me this is the less important issue.

Second, if you include two different Spin objects that use floating point internally there can be unwanted outcomes depending upon the internals of the included object. More RAM may be consumed with different floating point implementations, and each included object may consume an additional cog for F32's floating point co-processor.

A built in implementation could avoid these problems by coordinating the implementation and coordinating cog usage in some manner.

I might be wrong here, but doesn't switching to Catalina sacrifice the network effects Spin affords? We couldn't use other Spin objects in C/C++ programs and would have to translate them into C/C++? Also, C/C++ syntax isn't a paradise for floating point either. It's pretty easy to accidentally invoke an integer operation and get a truncation error.

martinh · 2011-03-26 08:08

Martin_H wrote: »

There are two values to built in floating point.
Second, if you include two different Spin objects that use floating point internally there can be unwanted outcomes depending upon the internals of the included object. More RAM may be consumed with different floating point implementations, and each included object may consume an additional cog for F32's floating point co-processor.

A built in implementation could avoid these problems by coordinating the implementation and coordinating cog usage in some manner.

To avoid that, these problems first have to be solved in the libraries. Building floating point support into the Spin compiler does not automagically do that. Since floating point is not a feature of the Spin bytecode or of the Propeller itself, a Spin compiler that builds it into the language still has to invoke some (then internal) library. But where would that library come from? It would be a library written by someone, so it comes back to cleaning up the libraries to avoid the problems.
Not that this is an unsolvable problem; to some degree exactly this integration is done in Catalina, in a way I find quite good so far (I am not an expert; I have been familiarizing myself with Catalina for some time now and have written some test programs).

The reason I doubt it makes much sense in Spin has to do with the philosophy I see in the Spin language. It closely mirrors the features of the Spin bytecode; building libraries into the language as built-ins goes, in my opinion, a bit in the wrong direction (but that's just me).

Martin_H wrote: »

I might be wrong here, but doesn't switching to Catalina sacrifice the network effects Spin affords? We couldn't use other Spin objects in C/C++ programs and would have to translate them into C/C++? Also, C/C++ syntax isn't a paradise for floating point either. It's pretty easy to accidentally invoke an integer operation and get a truncation error.

You are of course right about the network effects, but that is true of every language for the Propeller that is not Spin (PropBASIC, for example). There are evidently people who prefer something other than Spin for various reasons (syntax that is not specific to the Prop, higher speed than Spin, support for external RAM, or whatever else).
I think your argument is similar to something I thought about some time ago: interoperability between Spin and non-Spin code should be better. This is certainly not trivial for Catalina.

About truncation from accidentally invoking integer operations: do you have an example in mind? Usually in C the operands of a mixed operation are converted to float rather than truncated, and the compiler should usually warn in situations where you assign, say, a float to an integer variable and truncation can happen.

Since this post is now a bit off topic (the thread is about Spin, not C), I will stop here, but with a kind of warning.
Adding new language elements and features to a programming language as afterthoughts can easily end up in a cluttered result, which sooner or later makes an otherwise clean language a mess unless a lot of care is taken to avoid it.
I see more value in something like a unified standard library (or standard libraries) for the more common tasks in Spin than in new syntax (but again, that's just my personal point of view).

davidsaunders · 2011-03-26 08:57

I would rather see some fixed point support. The Propeller has no floating point hardware, which makes fixed point math quite a bit faster, and I have not yet seen an application that needs the extremely large or extremely small numbers that are the only advantage of floating point over fixed point. Perhaps a fixed point implementation that supports 8.24, 16.16, and 32.32 fixed point numbers and math.
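To make the 16.16 case concrete, here is a minimal sketch in C (C rather than Spin so it can be compiled and checked anywhere). The names q16_16, q16_mul, q16_div and temperature_q16 are hypothetical, not from any OBEX object, and int64_t stands in for the software 32 × 32 → 64-bit multiply a Propeller would need, since the chip has no hardware multiplier. The last function evaluates the temperature formula from the first post with only integer arithmetic apart from constant conversion, assuming rawdata is an integer reading in 0..1023.

```c
#include <stdint.h>

/* Q16.16: 16 integer bits and 16 fraction bits in a signed 32-bit word. */
typedef int32_t q16_16;

#define Q16_ONE 65536

/* Convert a double constant to Q16.16, rounding to nearest. Used here for
   readability; on the Prop these would be precomputed integer constants. */
q16_16 q16_from_double(double x)
{
    return (q16_16)(x * Q16_ONE + (x >= 0 ? 0.5 : -0.5));
}

double q16_to_double(q16_16 x)
{
    return (double)x / Q16_ONE;
}

/* The full product of two Q16.16 values is Q32.32 (64 bits); keep the
   middle 32 bits by shifting right 16. Assumes an arithmetic right shift
   for negative values, as GCC and Clang provide. */
q16_16 q16_mul(q16_16 a, q16_16 b)
{
    int64_t product = (int64_t)a * b;
    return (q16_16)(product >> 16);
}

/* Scale the dividend up by 2^16 first so the quotient keeps 16 fraction bits. */
q16_16 q16_div(q16_16 a, q16_16 b)
{
    return (q16_16)(((int64_t)a * Q16_ONE) / b);
}

/* 75.0 * (2.7 - (3.4535/1023.0*rawdata)) - 55.5734, in Q16.16.
   Multiplying a Q16.16 value by a plain integer keeps the Q16.16 scale,
   so no shift is needed; 3.4535 * 1023 still fits in 16 integer bits. */
q16_16 temperature_q16(int32_t rawdata)
{
    q16_16 term = q16_from_double(3.4535) * rawdata / 1023;
    q16_16 diff = q16_from_double(2.7) - term;
    return diff * 75 - q16_from_double(55.5734);
}
```

The choice of 16.16 here is just for illustration; 8.24 would trade integer range for finer resolution in the same 32 bits.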

Martin_H · 2011-03-26 09:46

@martinh, I agree that a standard library would address my second concern. The use of a function versus an operator is a syntactic sugar issue and less important to me. I also agree that language interoperability with C/C++ would be really useful, as I could re-use already debugged C code in Spin.

I don't have access to a C compiler to check this, but my error was something like this:

float result = intVal1 / intVal2 * realVal1;

The C compiler used an integer division and then converted the int quotient to float before the multiplication. The result was almost right except for mysterious zeros that cropped up when the two integer values were close enough in value to return zero from the division. I felt pretty foolish when I found it.
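A minimal reproduction of the bug described above, alongside the usual fix of converting one operand to float before the division. The function names are made up for illustration.

```c
/* Buggy: intVal1 / intVal2 is an int division evaluated first, so 3 / 4
   becomes 0 before the multiplication ever sees a float. */
float scale_buggy(int intVal1, int intVal2, float realVal1)
{
    return intVal1 / intVal2 * realVal1;
}

/* Fixed: converting intVal1 to float first makes the division a
   floating point division, so the fractional quotient survives. */
float scale_fixed(int intVal1, int intVal2, float realVal1)
{
    return (float)intVal1 / intVal2 * realVal1;
}
```

With these, scale_buggy(3, 4, 2.0f) gives 0.0f while scale_fixed(3, 4, 2.0f) gives 1.5f; the two only agree when the integer division happens to be exact.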

Heater. · 2011-03-26 10:17

No warnings here with GCC for that line of code. Not even with the -Wall option.

Now how does one expect this floating point stuff to be done in Spin?

There for sure isn't room in the COG for the Spin interpreter to have floating point op codes added in.

That means the floating point would have to be done as a sequence of integer op codes generated by the compiler, perhaps in-line or perhaps added to the binary as library functions. But this is going to be awfully slow.

Alternatively the floating point would be done in PASM in a separate helper COG. But then if all your application COGs use floats then there will need to be locks around that interface. Messy.

All in all I think the fixed point idea has merit. Floating point is generally unnecessary and eats a lot of resources.

Sal Ammoniac · 2011-03-26 10:23

My original post did specifically mention Spin, but I'd be happy with any well-integrated, well-supported FP solution for the Propeller. Someone mentioned Catalina -- I'll take a look at that and see if it meets my needs.

Sure, many things can be converted from floating point to fixed point, but it's often tedious to do so and this only works for smaller numbers. How do you represent 134512.3846 in fixed point if you need to maintain four digits to the right of the decimal? You'd need to use 64-bit integers, which aren't well-supported in Spin either.

Someone mentioned that the Propeller doesn't natively support FP in hardware -- so? The other two architectures I use on a regular basis, ARM Cortex-M3 and XMOS, don't either, but their development environments fully support floating point as a built-in part of the language.

Ken Gracey has said on these forums that one of Parallax's focuses in 2011 is to make the Propeller more appealing to professionals for use in more traditional embedded products. Sure, people here have proved that you can emulate an ancient Z80 or run 1980s-style video games on a Propeller, but have any big companies used them in a deeply embedded product such as an engine controller?

I see several things that will hold the Propeller back in this aspect (in no particular order):

1. Lack of native FP support in the language supported by the chip vendor. Perhaps Parallax should license one of the existing C packages and support/develop it themselves.

2. Lack of hardware debug support. For the last decade (or more), all other embedded processors have incorporated JTAG or another hardware debug mechanism in their products. The Propeller doesn't have this capability. Nor does it have boundary scan, which is how most of the big boys test their systems on the manufacturing line. There is ViewPort, but it's an add-on product and it's invasive (need to modify your code to use it).

3. The well-known memory space limitations (496 longs per cog and 32KB for hub memory). There are work-arounds to these limitations, yes, but none of them are ideal.

davidsaunders · 2011-03-26 10:35

Sal Ammoniac wrote:

How do you represent 134512.3846

This would be very simple with 32.32 fixed point. If you implement support for the three fixed point types listed in my previous post there is no reason to convert from floating point to fixed point, just use fixed point. And doing 64 bit math on a thirty-two bit system is pretty simple.
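As a rough illustration of both points, here is a C sketch that stores an unsigned 32.32 value as two 32-bit words (as a 32-bit machine without 64-bit registers would) and adds two of them using only 32-bit operations plus a carry. The struct and function names are hypothetical, and negative numbers are left out for simplicity. The fraction word has a resolution of 2^-32 (about 2.3 × 10^-10), far finer than the four decimal places asked for.

```c
#include <stdint.h>

/* Unsigned 32.32 fixed point held as two 32-bit words. */
typedef struct {
    uint32_t hi;  /* integer part */
    uint32_t lo;  /* fraction part, in units of 2^-32 */
} fix32_32;

/* 64-bit add from 32-bit pieces: add the low words, detect the carry
   out (an unsigned sum that wrapped is smaller than an operand), and
   add that carry into the high words. */
fix32_32 fix_add(fix32_32 a, fix32_32 b)
{
    fix32_32 r;
    r.lo = a.lo + b.lo;
    uint32_t carry = (r.lo < a.lo);
    r.hi = a.hi + b.hi + carry;
    return r;
}

/* Build a value from an integer part and a fraction, which must be in
   [0, 1); the fraction is truncated to 32 bits. */
fix32_32 fix_make(uint32_t integer, double fraction)
{
    fix32_32 r = { integer, (uint32_t)(fraction * 4294967296.0) };
    return r;
}

double fix_to_double(fix32_32 x)
{
    return x.hi + x.lo / 4294967296.0;
}
```

For example, fix_make(134512, 0.3846) holds 134512.3846 to within 2^-32, and adding 0.75 and 0.5 carries cleanly into the integer word to give 1.25.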

martinh · 2011-03-26 10:40

Heater. wrote: »

No warnings here with GCC for that line of code. Not even with the -Wall option.

Use the -Wconversion switch to enable such warnings (-Wall does not enable all warnings even if its name suggests that).

Sal Ammoniac · 2011-03-26 10:42

davidsaunders wrote: »

This would be very simple with 32.32 fixed point. If you implement support for the three fixed point types listed in my previous post there is no reason to convert from floating point to fixed point, just use fixed point. And doing 64 bit math on a thirty-two bit system is pretty simple.

Good point. How about 1.602 x 10^-19 -- how would you handle that one in fixed point?

davidsaunders · 2011-03-26 10:52

That is one of those exceptions where you would have to use floating point. I do not think there are many applications that meet such exceptions; it seems that probably more than 99% of applications could do just fine with fixed point, and in some cases better, since in floating point you usually only have a 24-bit significand (52 bits at best on most systems), which creates some extreme rounding errors. And again, fixed point is a lot faster than floating point when implemented on an integer-only system.

martinh · 2011-03-26 11:02

davidsaunders wrote: »

That is one of those exceptions where you would have to use Floating point. I do not think that there many applications that meet such exceptions, it seems that probably more than 99% of applications could do just fine with fixed point (and some cases better as the in floating point you usually only have 24bit significant (and 52-bits at best in most systems), this creates some extreme rounding errors. And again fixed point is a lot faster than floating point, when implemented on a integer only system.

I agree that the need to use floating point on a micro is, and should be, a rare case, only for very special applications. But the idea that float is slower than fixed point would only be true if integer multiplication and division were built into the hardware. That is not the case for the Propeller, so a float multiply or divide takes about the same effort as a fixed point multiply or divide.
For addition and subtraction you are of course right.

Heater. · 2011-03-26 11:18

Sal Ammoniac,

I agree, converting an existing app that uses floating point to fixed point may be a chore.

...only works for smaller numbers. How do you represent 134512.3846 in fixed point if you need to maintain four digits to the right of the decimal? You'd need to use 64-bit integers...

You do realize you cannot represent 134512.3846 in 32 bit floating point?

With 32-bit floats you get a 24-bit mantissa. That only lets you count from 0 to 16777215 in the mantissa, which is only about 7 significant decimal digits. You are asking for 10 digits of accuracy.

Try this simple experiment in C:

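  #include <stdio.h>

  int main(void)
  {
      float fVal1 = 134512.3846;   /* nearest float is 134512.390625 */
      float fVal2 = 000000.0001;
      float fSum;

      /* 0.0001 is far below half the spacing between floats near 134512,
         so the addition leaves fVal1 unchanged */
      fSum = fVal1 + fVal2;

      printf("Size of float = %zu\n", sizeof(fVal1));
      printf("fVal1 = %f\n", fVal1);
      printf("fVal2 = %f\n", fVal2);
      printf("sum   = %f\n", fSum);
      return 0;
  }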

See how wrong the results are!

So yes you would need to use 64 bit floats. But, hey, then you could use some 64 bit fixed point scheme anyway.

As my old boss used to say "If you think you need floating point to solve the problem you don't understand the problem":)

Upshot is, if you don't think your calculations through and watch out for overflows/underflows of intermediate results and such, you are going to have trouble, floating point or not.

Heater. · 2011-03-26 11:21

martinh,

Thanks for the -Wconversion tip. Shows how often I have used floating point in my career:)

Martin_H · 2011-03-26 12:42

Heater, the thing about Spin is that the IDE uses floating point for constants like Pi and performs floating point math on constants. So if you are planning to do an inverse kinematic transform, you will naturally be led to a floating point solution. I looked at the OBEX fixed point library but decided on F32 because it seemed easier to deal with, better supported, and better integrated with IDE constants. As near as I can tell it has the multi-cog problem I mentioned above, so I firewalled all my run-time floating point off into a single module. But that likely won't scale as my program grows.

Also, I programmed for six years of my career in the Bliss programming language, it is also a typeless language without floating point built in. For system software I never missed it.

Mike Green · 2011-03-26 12:51

F32 has two variables in a VAR section. By changing the VAR section to an equivalent DAT section like

DAT
f32_Cmd    long   0
cog        byte  0

you'll be able to include the object in as many places as you want as long as you only call the start method once (in your initialization) and you use locks if you call the floating point routines from more than one cog.

Martin_H · 2011-03-26 13:00

Mike, agreed as that topic came up in another thread. After we discussed it, I initially did just that. But I changed my mind and fire walled off the floating point and reverted back to the stock F32. The problem I realized is that if I include another Obex object which uses F32 it won't be playing by those rules. It will call start and not use locks! I would need to re-code that object which would limit the usefulness of the imported code.

That's why either a compiler supported fixed or floating point solution would be helpful.

Sal Ammoniac · 2011-03-26 13:42

Heater. wrote: »

Sal Ammoniac,
You do realize you cannot represent 134512.3846 in 32 bit floating point?

Of course. I almost always use 64-bit floating point when I use floating point at all.

As my old boss used to say "If you think you need floating point to solve the problem you don't understand the problem":)

That may apply to things like systems programming, and blinking LEDs, and many other things like that, but there are some applications where true floating point makes things easier and more efficient.

There is a need for floating point in the embedded industry, which is why companies like ARM are adding hardware FP units to their MCUs (the Cortex-M4 has single precision hardware floating point).

Up shot is, that if you don't think your calculations through and look out for overflows/underflows of intermediate results and such you are going to have trouble floating point or not.

That goes without saying. Of course I do that checking with any calculations that I do, FP or not.

Heater. · 2011-03-26 18:43

Sal Ammoniac,

..I almost always use 64-bit floating point.

Fair enough. I've just assumed 64-bit float support is completely off the table for the Prop. It would be bigger and slower than F32. Notice that no one has felt the need to create an F64 yet. Besides, Spin does not even have a 64-bit integer data type.

As my old boss used to say "If you think you need floating point to solve the problem you don't understand the problem"

That may apply to things like systems programming, and blinking LEDs, and many other things like that, but there are some applications where true floating point makes things easier and more efficient.

At the time I was hit with that quote (often) we were building the plot processing and display for a 3 dimensional radar system. There are some pages about it here http://www.radarpages.co.uk/mob/martello/martello2.htm That was long enough ago that we only had 16 bit processors and fixed point maths. The architecture looked like a Prop only a few thousand times bigger! Don't forget that man landed on the moon with no floating point support. That's far away from flashing LEDs is it not?

There is a need for floating point in the embedded industry, which is why companies like ARM are adding hardware FP units to their MCUs (the Cortex-M4 has single precision hardware floating point).

It just happens that I am now involved with an embedded control system that uses a lot of floating point. It was developed on PCs, where floating point is almost free and no one ever thought about it. Now, for reasons of physical size and cost, it has to be ported to a tiny ARM platform. Even on a 1 GHz ARM Cortex-A8 with VFPv3 it's hitting performance limits. One existing ARM device where we would like this code to run is only 200 MHz with no hardware floating point; porting to that is going to be tough. If only they had used fixed point to start with...

davidsaunders · 2011-03-26 19:05

Most applications that use FP do not need it, and many of them would gain a lot of speed by converting to fixed point. As to the speed difference between floating point and fixed point for multiplication: a 32.32 fixed point multiplication takes 544 clock cycles, and anything that fits in 32 bits takes 276. Can you get similar results with floating point math? If so, I really want to see your code. Of course, some integer multipliers may increase or decrease these numbers by up to 12 clock cycles.
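Because the Propeller has no multiply instruction, cycle counts like these come from a software multiply. The following is a C rendering of the common shift-and-add approach for an unsigned 32 × 32 → 64-bit product, keeping every value in 32-bit words as PASM would; it is a sketch of the technique, not the code behind the numbers above, and it makes no claim about cycle counts.

```c
#include <stdint.h>

/* Unsigned 32 x 32 -> 64-bit multiply using only 32-bit shifts, adds and
   bit tests. The 64-bit product and the shifting multiplicand are each
   held as a hi/lo pair of 32-bit words. */
void mul32x32(uint32_t a, uint32_t b, uint32_t *hi, uint32_t *lo)
{
    uint32_t p_hi = 0, p_lo = 0;   /* running product */
    uint32_t m_hi = 0, m_lo = a;   /* multiplicand, shifted left each round */

    for (int bit = 0; bit < 32; bit++) {
        if (b & 1u) {
            uint32_t old = p_lo;
            p_lo += m_lo;
            p_hi += m_hi + (p_lo < old);   /* propagate the carry */
        }
        b >>= 1;
        /* shift the 64-bit multiplicand left by one bit */
        m_hi = (m_hi << 1) | (m_lo >> 31);
        m_lo <<= 1;
    }
    *hi = p_hi;
    *lo = p_lo;
}
```

Unrolling the loop removes the loop-control overhead but still leaves a test, a conditional add with carry and a shift per multiplier bit, which is why wider fixed point formats cost noticeably more.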

martinh · 2011-03-26 19:54

Comparing the speed of floating vs. fixed point is a bit like comparing apples and oranges; they do not easily fit into the same category. But by simple logic: what exactly is a single precision float multiply? It is a 24-bit × 24-bit integer multiply plus one signed integer addition for the exponent and a minimal amount of bit twiddling. Today I am too tired, but I will happily count the clock cycles for that tomorrow.
With all the overhead that comes from LMM in Catalina, I get about 70 kFLOPS for floats, so a bit more than 1000 cycles per multiply. But this is the LMM PASM version, not the version with a dedicated FPU cog; I will compare that tomorrow.
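The breakdown above can be sketched in C. This simplified single-precision multiply assumes both inputs are normal, nonzero and finite, that the result neither overflows nor underflows, and it truncates instead of rounding to nearest-even as IEEE 754 requires, so it can differ from a real FPU in the last bit. It is meant only to show the structure: one 24 × 24-bit integer multiply, one exponent add, and some bit twiddling.

```c
#include <stdint.h>
#include <string.h>

float soft_fmul(float x, float y)
{
    uint32_t a, b;
    memcpy(&a, &x, sizeof a);
    memcpy(&b, &y, sizeof b);

    /* sign: XOR of the sign bits */
    uint32_t sign = (a ^ b) & 0x80000000u;
    /* exponent: add the biased exponents and remove one bias */
    int32_t exponent = (int32_t)((a >> 23) & 0xFF) + (int32_t)((b >> 23) & 0xFF) - 127;

    /* restore the implicit leading 1 to get 24-bit significands */
    uint64_t ma = (a & 0x7FFFFFu) | 0x800000u;
    uint64_t mb = (b & 0x7FFFFFu) | 0x800000u;

    uint64_t prod = ma * mb;            /* 24 x 24 -> 47 or 48 bits */
    if (prod & (1ull << 47)) {          /* significand product in [2, 4) */
        prod >>= 24;
        exponent += 1;
    } else {                            /* significand product in [1, 2) */
        prod >>= 23;
    }

    uint32_t bits = sign | ((uint32_t)exponent << 23) | ((uint32_t)prod & 0x7FFFFFu);
    float r;
    memcpy(&r, &bits, sizeof r);
    return r;
}
```

On a Propeller the `ma * mb` line would itself be a software shift-and-add loop, which is where most of the cycles go.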

M. K. Borri · 2011-03-26 23:06

I wrote a singleton-type math library that avoids RAM duplication and lets more than one caller cog share an "FPU" cog, if it helps any. It's called dynamic math library.

HollyMinkowski · 2011-03-26 23:35

If for some reason I needed lots of intense floating point math for a Propeller project, I would use a tiny85 as a math processor. You could just use I2C and hook the tiny to the same pins the EEPROM chip is on. With 8 KB of flash on the tiny85 and 20 MIPS of processing power, you could create a very powerful set of fast math routines. This would only add about 2.00 to the project cost and would not use up any I/O pins, since the EEPROM pins can handle data for both the EEPROM chip and the tiny85.

You would also be able to add many other functions to the tiny85 and make it serve many roles, because the math routines would not fill up 8 KB of space.

I think it is great fun to combine several processors on one board :-)

Spin does not have floating point math functions built in, but I still consider the Spin interpreter a true work of genius. It packs an enormous amount of capability into just 512 words!

Martin_H · 2011-03-27 06:58

M. K. Borri, that sounds interesting. Is it in the Obex?

martinh · 2011-03-27 09:26

davidsaunders wrote: »

Most applications that use FP do not need it, and many of them would have extreme gains in speed by converting over to fixed point. As to the speed difference between Floating point and fixed point for multiplication, a fixed point multiplication with 32.32 takes 544 clock cycles, and for anything that fits in 32 bits takes 276. Are you able to get similar results using floating point math? If so I really want to see your code. Of course some integer multipliers may increase or decrease these numbers by up to 12 clock cycles.

I think you have a typo here: a 32.32 fixed point multiply can never be done in 136 instructions (= 544 cycles). I guess you mean a 32-bit × 32-bit multiply with a 64-bit result. A 32.32 number is a 64-bit value, which when multiplied gives an intermediate 128-bit result from which you have to take the middle 64 bits as the final result. Even if you completely unroll the loop, which has to run 64 times, 136 instructions would leave only about 2 operations per bit. Since you have to shift and add, and not on a single long but (even in the best implementation of a 32.32 value) on two of them, you end up with many more cycles than that.

What do you mean by "anything that fits in 32 bits"? That doesn't make much sense to me in this context, where we are talking about non-integer calculations and not about 16 × 16-bit multiplies.
Even if you use only a 16.16 fixed point representation you have to calculate the full 64 bit result and take the middle 32 bit values for your 16.16 result.
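The "middle 64 bits of a 128-bit product" step for 32.32 can be sketched in C by splitting each operand into 32-bit halves and summing four partial products with the right weights. This is an illustration under stated assumptions (unsigned values only; integer-part overflow wraps silently; the low 32 bits are truncated), and the type and function names are hypothetical. On the Propeller, each of the four 32 × 32 → 64-bit partial products would itself be a software shift-and-add routine.

```c
#include <stdint.h>

/* Unsigned 32.32 fixed point in a uint64_t: value = raw / 2^32. */
typedef uint64_t ufix32_32;

/* raw(a) * raw(b) = hh * 2^64 + (hl + lh) * 2^32 + ll, a 128-bit value.
   The 32.32 result is that product shifted right by 32, kept mod 2^64. */
ufix32_32 ufix_mul(ufix32_32 a, ufix32_32 b)
{
    uint64_t ah = a >> 32, al = (uint32_t)a;
    uint64_t bh = b >> 32, bl = (uint32_t)b;

    uint64_t hh = ah * bh;   /* weight 2^64 */
    uint64_t hl = ah * bl;   /* weight 2^32 */
    uint64_t lh = al * bh;   /* weight 2^32 */
    uint64_t ll = al * bl;   /* weight 1    */

    /* (product >> 32) mod 2^64; unsigned wraparound discards the
       bits above the 32-bit integer part */
    return (hh << 32) + hl + lh + (ll >> 32);
}
```

For example, 1.5 × 2.25 (raw values 3 << 31 and 9 << 30) gives raw 27 << 29, which is 3.375. The 16.16 case is the same idea one size down: a 32 × 32 → 64-bit product shifted right by 16.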

After all I am still someone who thinks that in the absolute majority of cases fixed point calculations are the way to go and only a small subset of problems requires IEEE 754 values.

Yesterday I was really tired, so for those interested I want to add the rough numbers for float performance with Catalina:
The 70 kFLOPS I mentioned is typical for me in mixed (+-*/) calculations (I tested a Runge-Kutta method for a system of two coupled differential equations on the Propeller).
If I run a test loop with 3 multiplications and one assignment, I get about 86.2 kFLOPS, which is a bit more than 900 cycles per multiplication (including all overhead, like fetching variables from hub, copying back, the loop itself and so on).

Where fixed point beats floats hands down is, of course, addition and subtraction. So if you have a lot of that in a tight loop, nothing can beat it (though a very careful analysis of the accumulated rounding errors has to be done). I used that myself for a 3D ultrasound visualization system more than 15 years ago, when the FPU of the first Pentium was not nearly as fast as current FPUs with their SSEx instructions.

Having said all this, I want to add that all of it is only true for MCUs and not for calculations on a modern PC. Optimized code on a modern CPU can handle up to five float operations per cycle (!!!) thanks to the SSEx vector pipelines.

Finally, I have to correct what I said about float support in Catalina. The +-*/ operations are a built-in part of the Catalina kernel, so you should not expect a further speedup for the elementary operations from the special math libraries float_a and float_b, which use one and two additional cogs respectively. I have not benchmarked the latter yet, though; I just checked the documentation.

Edit: I want to mention that the Catalina benchmarks are not completely off topic; they should also give a rough estimate of the performance of the OBEX float objects in Spin. The Catalina plugins are based on those objects (but I do not know exactly which ones; I found several of them in several versions). Since Catalina is open source, it is also easy to borrow from its source code to see if there is something that improves performance for Spin-based applications.

davidsaunders · 2011-03-27 09:53

My numbers may be a little off, as I just counted up the instructions for a 64-bit × 64-bit and a 32-bit × 32-bit multiply, multiplied by 4 (in my head), and added enough to correct for the shift offset. I do not think anyone is willing to take the hit that using a loop would cost. I do think that everyone has pretty close to the same code to do this.
Also, if you only care about a 16.16 result for 16.16 × 16.16 and do not care about what may be lost to overflow, you can use only a 32-bit result (likewise a 64-bit result for 32.32).

Of course, if you count communicating through hub memory, the time required goes up quite a bit.

Impressive results mentioned for catalina. I must assume then that a floating point mul takes about 1200 clocks?? That would be quite a bit faster than I would have expected.

Duane Degn · 2011-03-27 10:18

I have a couple of uM-FPU V3.1 coprocessors. I keep meaning to compare the time it takes to send and receive messages to this chip with using F32. Maybe today's a good day to try this.

Have any of you used these chips? I wonder how they compare to Holly's tiny85. I'd imagine (based on previous posts by Holly) the tiny85 chips are a lot less expensive. I don't want to take the time to learn how to program another chip, so that option is out for me. I've already spent some time using a uM-FPU, so it's an acceptable option (to me) if it speeds up calculations.

@Heater,

I've read this quote from you several times on the forum:

As my old boss used to say "If you think you need floating point to solve the problem you don't understand the problem"

Do you have any suggestions on how we can better understand the problem?

Can one do trig without floating point? Do you have any books to suggest? I'm used to plugging numbers into my calculator. Worrying about overflowing 32 bits etc. is new to me.

Duane

Martin_H · 2011-03-27 10:34

Duane, I've done trig on the Basic Stamp 2 using 16 bit integer arithmetic. Rather than using a fractional value from 1 to 0 to -1 for Sine or Cosine, the BS2 uses an integer value from 127 to 0 to -127. For angular measures the stamp uses binary radians which are also a signed byte. After all your calculations are complete you rescale the results by dividing by 127. It's a bit awkward, particularly when divisions involve negative numbers, but it works and support is built into PBasic.
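A C sketch of the scheme just described: angles in binary radians (256 per circle, so a uint8_t wraps around naturally) and sine results scaled to -127..127, rescaled by dividing by 127 at the end. The 17-entry quarter-wave table (round(127 × sin θ) every 5.625°), the linear interpolation between entries, and all function names are assumptions made for illustration; this is not PBASIC's actual implementation.

```c
#include <stdint.h>

/* Hypothetical quarter-wave table: round(127 * sin(k * 5.625 deg)),
   k = 0..16, one entry every 4 brads from 0 to 90 degrees. */
static const int8_t quarter_table[17] = {
    0, 12, 25, 37, 49, 60, 71, 81, 90, 98, 106, 112, 117, 122, 125, 126, 127
};

/* First-quadrant sine for i in 0..64 brads, interpolating linearly
   (with rounding) between table entries. */
static int quarter_sin(int i)
{
    int lo = i >> 2, frac = i & 3;
    if (frac == 0)
        return quarter_table[lo];
    return (quarter_table[lo] * (4 - frac) + quarter_table[lo + 1] * frac + 2) / 4;
}

/* sin(angle) scaled to -127..127; 64 brads = 90 degrees. */
int sin_brad(uint8_t angle)
{
    int idx = angle & 63;
    switch (angle >> 6) {            /* quadrant 0..3 */
    case 0:  return  quarter_sin(idx);
    case 1:  return  quarter_sin(64 - idx);
    case 2:  return -quarter_sin(idx);
    default: return -quarter_sin(64 - idx);
    }
}

int cos_brad(uint8_t angle)
{
    return sin_brad((uint8_t)(angle + 64));
}

/* y component of a vector of length r: multiply first, then rescale by
   127. C division truncates toward zero, so negative results round
   differently from positive ones, one of the awkward spots with
   negative numbers. */
int y_component(int r, uint8_t angle)
{
    return r * sin_brad(angle) / 127;
}
```

Multiplying before dividing keeps the precision of the scaled sine; dividing first would throw most of it away.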

I didn't want to go near it for an inverse kinematic transform as the trig is considerably more complicated.

On the surface it looks like that was the direction the Propeller was taking, because there's a sine table for binary angles. But the compiler includes no support for it, while it does include floating point values for Pi and can compute constants like Pi/2. That's why I went in the floating point direction.
