Which method is more efficient (C programming)

twm47099 · 2016-02-20 19:00

I've written a C program (using SimpleIDE) for reading the temperature and pressure from an MS5067 altimeter module (parallax 29124). I based it on the SPI code given in the manufacturer's application note AN520 (available from the parallax product page.)

In that code there are a number of formulas used to calculate the T and P values from the raw data and calibration constants obtained from the sensor.

The calculations are all floating point and application note uses the pow(x,y) function where x = 2 and y is various integers up to 23.

The equations are:

// calcualte 1st order pressure and temperature (MS5607 1st order algorithm)

dT=D2-C[5]*pow(2,8);
OFF=C[2]*pow(2,17)+dT*C[4]/pow(2,6);
SENS=C[1]*pow(2,16)+dT*C[3]/pow(2,7);
T=(2000+(dT*C[6])/pow(2,23))/100;
P=(((D1*SENS)/pow(2,21)-OFF)/pow(2,15))/100;

Since these are repeated a number of times to get an average value of T and P (I'm using 10 times in my code, but others have used more), I thought it would be useful to replace the pow function with something faster.

I decided to use the left shift function (1<< 17 for example). It was necessary to make each instance of the shift into a float to get the correct answer. This is my code:

      dT = D2 - (C[5] * (float)(1 << 8 )); 
      OFF = C[2] * (float)(1 << 17) + (dT * C[4]) / (float)(1 << 6);
      SENS = C[1] * (float)(1 << 16) + (dT * C[3]) / (float)(1 << 7);
      T = (2000. + (dT * C[6]) / (float)(1 << 23)) / 100.;
      P = ((D1 * SENS / (float)(1 << 21) - OFF) / (float)(1 << 15)) / 100.;

If I was working with integers, I believe the shift method would be faster than pow, but since I have to convert the power of 2 into a float, is it more efficient?

(I also tried using the calculated values of pow(2,y) with a decimal point at the end of each, but I prefer to calculate them in the program (again not sure which is faster, particularly since it is repeated so many times). Would it be better to just declare them as constants (floating point) at the beginning of main(), or even #define them?

Thanks
Tom

JasonDorie · 2016-02-20 19:24

C has a very good optimizer, so anything you code that can be evaluated to a constant probably will be. In this case, all the pow() calls with constant inputs will very likely be replaced with their result. The same would happen with your (float)(1<<17) versions, so in your case, neither is faster.

If the arguments were variable, the shift then cast to float version would be significantly faster. Using pow() involves several steps internally, I think converting the inputs to their base-2 log values, multiplying, then converting back using exp. It's expensive. Casting to float, by comparison, is pretty cheap.

ersmith · 2016-02-21 02:55

I think Jason covered it pretty well. I'll also note that if you do have to calculate floating point powers of two at run time, the most efficient function is ldexp; it would be even faster than shift + convert. ldexp(x, n), where x is a double and n is an integer, calculates x * 2^n, so for example ldexp(1.0, n) will be 2 to the n. It's very fast because internally floats are stored as a mantissa times a power of 2, so ldexp just has to manipulate the exponent part of the float.

twm47099 · 2016-02-21 04:24

Jason & Eric,
Thanks for the information.
Using the ldexp function could I just change the original equations to:


dT=D2 - ldex(C[5],8);
OFF= ldex(C[2],17) + dT*C[4]/pow(2,6);
SENS= ldex(C[1],16) + dT*C[3]/pow(2,7);
T=(2000 + (dT*C[6])/pow(2,23))/100;
P=(((D1*SENS)/pow(2,21) - OFF)/pow(2,15))/100

Can floating point variables (32bit on the prop) be used with ldexp or do they have to be doubles (64bit??)

Also when dividing by 2^n (for example in the second equation above) can I just use ldex(c[4],-6) ?

Thanks
Tom

Heater. · 2016-02-21 05:32

Note that dividing a signed integer by 2^n is not the same as shifting.

-1 / 2 results in 0

-1 >> 2 results in -1

ersmith · 2016-02-21 12:03

Yes, Tom, that looks approximately right (except the function is "ldexp", not "ldex"). Yes, you can replace "x/pow(2, 6)" with "ldexp(x, -6)". To use floats instead of doubles, use "ldexpf" (and in the original code you could have used "powf" instead of "pow" to work with floats). I checked and the compiler is smart enough to replace pow(2,8) with the appropriate floating point constant, so it's actually rather moot.

Now, there is a huge caveat to all of this: while in theory ldexpf(x, n) could be faster than x * (float)(1<<n) (because it has less work to do) in practice it might not be: the floating point multiplication routine is heavily optimized assembly code, whereas ldexpf is implemented as a C function. So if you really care about squeezing every bit of performance out then you probably should time both ways of doing it and compare the results.

In practice it may be more valuable to keep your code easy to read (which I think the pow version does), and leave the optimization to the compiler.

ersmith · 2016-02-21 12:15

An interesting note on how good GCC's optimizer is: I wrote some sample code to verify that it would replace both pow(2,8) and (float)(1<<8) with constants. Here is the source code:

#include <math.h>

float f;

void calc1()
{
    f = f / powf(2, 8);
}

void calc2()
{
    f = f / (float)(1<<8);
}

and here is the output assembly (obtained with "propeller-elf-gcc -Os -o foo.s -S foo.c"):

        .text
        .balign 4
        .global _calc2
_calc2
        mov     __TMP0,#(2<<4)+14
        call    #__LMM_PUSHM
        mvi     r14,#_f
        mvi     r1,#998244352
        rdlong  r0, r14
        lcall   #___mulsf3
        wrlong  r0, r14
        mov     __TMP0,#(2<<4)+15
        call    #__LMM_POPRET
        '' never returns
        .balign 4
        .global _calc1
_calc1
        brl     #_calc2
        .comm   _f,4,4

Note that GCC was not only smart enough to calculate the constants at compile time, and convert the divide into a (faster) multiplication, it recognized that the two functions ended up being the same and converted "calc1" into a jump to calc2 in order to save space. (This is with gcc 6; gcc 4 performed all the constant folding and divide->multiply, but it still output two copies of the function.)

Electrodude · 2016-02-22 21:36

Why doesn't it put the _calc1 and _calc2 labels in the same place? The "brl #_calc2" is a waste.

JasonDorie · 2016-02-22 21:43

It's possible for code to take the address of a function, and other code to test if it's the same address of another function. If the two functions were merged, the addresses would be identical even though the user believed they were different. It's a weird edge case, but without global knowledge, there's no way for the compiler to know if you've done that or not. Link-time code generators can do this kind of thing because they have that knowledge.

Which method is more efficient (C programming)

Comments