Floating point in PropGCC
ypapelis
I am a bit confused about the 'under-the-hood' operation of floating point when using C in SimpleIDE/PropGCC. In the old SPIN days (which I have long abandoned for C), there were libraries to explicitly implement floating point math operations, and it was pretty clear whether these 'functions' operated on the same cog or whether you could allocate one or two cogs to do the math. It was awkward but it was clear. I can't find anything that mentions how this is supposed to work in SimpleIDE.
On the surface, it all works OK. I can declare floats, use inline math, and print them using printf(), but an addition/assignment seems to take a bit over 90 microseconds, as in the piece of code below (I used a scope to check the timing). Furthermore, when I check the number of free cogs there are 7, which means these operations are not using a separate cog. I read some old posts mentioning fpucog.c and a related function, but that function and/or file is nowhere to be found in the current SimpleIDE (the last version I used is the version 1 release candidate).
Does anyone know if it is possible to enhance floating point performance in C by allocating a COG to it when using SimpleIDE or if there is a library to do the math operations through function calls? I am not too excited about taking old SPIN libraries and converting them using SPIN2C so that is not really an option for now.
Example code:
    #include <propeller.h>

    volatile float sum = 0;

    int main()                  // Main function
    {
        float x = 0.02f;
        DIRA |= (1<<11);
        while(1)
        {
            x = sum;
            OUTA |= (1<<11);
            sum = sum + x;
            OUTA ^= (1<<11);
        }
    }
Comments
I have no idea what it would take to integrate a faster alternative that "just works".
The original floating point emulation for PropGCC was written in C and was not Propeller-specific, so it was quite slow. Newer PropGCC releases have optimized assembly language, which is a lot faster. I don't think SimpleIDE has updated to that PropGCC version yet, so in the meantime you can use fpucog.c (which has pretty much the same code as the new PropGCC, and can also use another COG).
Also, make sure you check the option for 32 bit doubles if you really need performance (and don't need 64 bit precision and/or strict C standards compliance). 32 bit floating point is much faster than 64 bit.
Since the original question is answered, I don't feel too bad drawing this way off topic. Why was this feature implemented? Why aren't users just informed "use float, not double"? Why was the solution to completely disable 64-bit floating point numbers rather than educate users on the difference between `float` and `double` and let them choose for themselves?
The only guess I have deals with importing libraries. If I download the source code for library X and use it with PropGCC, then the existing solution does not require that I modify the source code to use `float` instead of `double`. It seems like an extreme edge case that anything would import directly onto a Propeller without any other source code modifications. Is this the only reason, or is there something else?
I suspect that choice comes down to library size.
In an ideal world, you can support both and users access either, but in the finite-resource world of MCUs, libraries have a significant size relative to the overall system, and a dual or twin library is an overhead most would want to avoid.
That said, there may well be corner cases where both 32b and 64b would be useful, and the resource consumed would be tolerated.
Another fair question is why that option isn't the default. Actually I think it is the default in SimpleIDE, but for the C compiler proper we wanted the default to be standards-compliant, and the C standard has requirements on double that aren't satisfied by 32 bits (double has to support a certain number of decimal digits of accuracy).
Note also that in PropGCC "float" is always 32 bits, and "long double" is always 64 bits, so it's possible to force the precision of your variables regardless of which compiler option is used. Only the size of "double" is affected by the -m32bit-double flag.
So you can mix now?
What are the relative library overheads of using
* only float,
* only long double, and
* a mix of the two?
Actually you could always mix -- long double has always been 64 bits. The floating point libraries in the 1.0 version of PropGCC (which SimpleIDE uses) are pretty large, so it's best to just stick with the default size (float and double for -m32bit-double, double and long double otherwise). In later versions of PropGCC the libraries are much smaller -- the core floating point routines are small enough to be overlaid in COG memory. So it's not so bad to use the "other" size for simple calculations. If you start pulling in math functions like cos and sin with the "wrong" size (e.g. using cosl for long double when double is 32 bits, or mixing both cos and cosf when double is 64 bits) then you'll approximately double the library footprint, which will hurt.
Thanks! Very helpful
Thank you, that was exactly what I was looking for. It seems to improve performance by about a factor of 5, including the case of using transcendental functions, that's great. Hope this makes it into the SimpleIDE release soon.
So I think you are saying the 32b and 64b core floating point will both fit into one COG, meaning there is no real LIB size or speed change for using 64b in a couple of places.
Or is that 'overlaid' a dynamic reload (slower), rather than 'both can fit'?
They don't fit at the same time, so it is a dynamic overlay. And not all of the code goes into the overlay, some is run in LMM space (like any other C code). Frequently used helper functions (e.g. to split a float or double into sign, exponent, and mantissa) are in the overlay. For 32 bit some of the core math functions (like add and multiply) also fit in the overlay; the 64 bit code is too big, although I think the multiply and divide inner loops are in there.
Bottom line: there is a cost for using two different float sizes. It's relatively smaller in the newest PropGCC, something like "your code will be 2K bigger if you use 64 bits" instead of "your code will be 5K bigger if you use 64 bits".