Fast way to get float from byte array ??

Anubispod · 2013-04-14 03:47

Hi, I got stuck on some code where I receive 4 bytes over serial and have to convert them to a float.

uint8_t test[4] = {9,236,127,63};
float testme;   /* should end up holding the float encoded in test[] */
printf("Hello World %1.2f \n", testme);

I already tried the memcpy function I found, but no luck.
The array should be 1.000261 as a float.

Heater. · 2013-04-14 04:05

Are you sure those four bytes are in the correct order?

float testme;
/* ... */
testme = *(float*) test;

That is: test is the address of a byte array. The cast (float*) makes it the address of a float, and dereferencing that with * gets the float value stored at that address.

kuroneko · 2013-04-14 04:10

Anubispod wrote: »

I already tried the memcpy function I found, but no luck.
The array should be 1.000261 as a float.

memcpy(&testme, &test[0], sizeof(test)); does in fact work when printed with %f (with %1.2f you're cutting digits off), although the result is not 1.000261. It's probably more convenient to use a union for this, e.g.

#include <stdio.h>
#include <stdlib.h>

union {
    unsigned char test[4];   /* unsigned, so 0x8D fits without a narrowing conversion */
    float testme;
} mixed = {{0x8D, 0x08, 0x80, 0x3F}};  /* 1.000261, little endian */

int main(int argc, char **argv) {
    printf("%f\n", mixed.testme);
    exit(0);
}
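For comparison with the union, here is a minimal sketch of the memcpy route mentioned above, wrapped in a helper whose name (float_from_host_bytes) is hypothetical. It assumes the four bytes already arrive in the host's own byte order and that float is a 32-bit IEEE 754 type. Because memcpy copies byte by byte into a real float object, it avoids both alignment and aliasing concerns:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reinterpret 4 bytes as a float.
 * Assumes the bytes are already in host byte order and that
 * float is a 32-bit IEEE 754 type on this platform. */
float float_from_host_bytes(const uint8_t bytes[4])
{
    float f;
    memcpy(&f, bytes, sizeof f);  /* byte-wise copy into a genuine float object */
    return f;
}
```

On a little-endian host, {0x8D, 0x08, 0x80, 0x3F} gives about 1.000261, and the original {9, 236, 127, 63} gives about 0.999695.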

ersmith · 2013-04-14 04:12

Anubispod wrote: »
Hi, I got stuck on some code where I receive 4 bytes over serial and have to convert them to a float.
uint8_t test[4] = {9,236,127,63};
float testme;
printf("Hello World %1.2f \n", testme);
I already tried the memcpy function I found, but no luck.
The array should be 1.000261 as a float.

The memcpy should have worked, although I think your math is off (those bytes represent 0.999695 as a float). Another, simpler way is to use a union, which lets you represent the same area of memory in two different ways:

#include <stdio.h>
#include <stdint.h>

union f_or_b {
    float f;
    uint8_t b[4];
} testme;

int main()
{
    testme.b[0] = 9;
    testme.b[1] = 236;
    testme.b[2] = 127;
    testme.b[3] = 63;
    printf("testme {%u,%u,%u,%u} = %f\n",
           testme.b[0], testme.b[1], testme.b[2], testme.b[3],
           testme.f);
    return 0;
}

(Edit: I see that kuroneko beat me to it!)
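To check the 0.999695 figure, here is a sketch that splits a 32-bit pattern into its IEEE 754 single-precision fields and rebuilds the value by hand. It assumes the bytes are little-endian on the wire, so {9, 236, 127, 63} forms 0x3F7FEC09. It handles normalized numbers only; zero, subnormals, infinities and NaNs are left out:

```c
#include <stdint.h>

/* Decode a normalized IEEE 754 single-precision bit pattern by hand.
 * Zero, subnormals, infinities and NaNs are deliberately not handled. */
double decode_ieee754_single(uint32_t bits)
{
    int      sign     = (int)(bits >> 31);                  /* 1 bit */
    int      exponent = (int)((bits >> 23) & 0xFFu) - 127;  /* 8 bits, bias 127 */
    uint32_t mantissa = bits & 0x7FFFFFu;                   /* 23 bits */

    /* Implicit leading 1 plus the fraction, then scale by 2^exponent. */
    double value = 1.0 + (double)mantissa / 8388608.0;      /* 2^23 */
    for (int i = 0; i < exponent; i++)
        value *= 2.0;
    for (int i = 0; i > exponent; i--)
        value /= 2.0;
    return sign ? -value : value;
}
```

For 0x3F7FEC09, the exponent field is 0x7E (126, so 2^-1) and the mantissa is 0x7FEC09 (8383497), giving (1 + 8383497/8388608) * 0.5, or about 0.999695. For 0x3F80088D, the exponent is 0 and the mantissa is 2189, giving about 1.000261.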

Anubispod · 2013-04-14 04:13

OK, thanks, I'll give that a try. And yeah, it could be that I grabbed the wrong value, but 0.9 sounds like what should come out there. Thanks a lot, I hope this works now.

SRLM · 2013-04-15 01:00

Since this appears solved for the OP, I'd like to ask a question about the answers: isn't using the union as a "type converter" a bad idea, since it breaks aliasing? There seem to be numerous web pages devoted to that problem.

Heater. · 2013-04-15 01:18

Using a union as a type converter is a bad idea because:
1) Endianness issues. If you move your code to a machine with different endianness, it will fail.
2) Padding issues. Your compiler may move things around inside structs to ensure some memory alignment.
3) Alignment issues. Code moved to machines that don't like unaligned access can fail, either with bus errors or silently.
4) I don't much like the idea of the same thing having multiple names.

I'm sure there are more reasons, but portability problems due to 1), 2) and 3) are the ones I have seen crop up most in real-world projects.
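One way around points 1) to 3) is never to reinterpret the receive buffer in place. Instead, assemble the 32-bit pattern with shifts, which pins down the wire byte order regardless of the host, then copy it into a float. This is a sketch that assumes the sender transmits little-endian IEEE 754 singles and that the host stores float and uint32_t with the same byte order, as common platforms do. The function name is hypothetical:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: decode a little-endian IEEE 754 float from a
 * byte buffer. The shifts define the byte order explicitly, so the result
 * does not depend on host endianness, and p may point anywhere because
 * the buffer is only ever read one byte at a time. */
float float_from_le_bytes(const uint8_t *p)
{
    uint32_t bits = (uint32_t)p[0]
                  | (uint32_t)p[1] << 8
                  | (uint32_t)p[2] << 16
                  | (uint32_t)p[3] << 24;
    float f;
    memcpy(&f, &bits, sizeof f);  /* assumes sizeof(float) == 4 */
    return f;
}
```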

ersmith · 2013-04-15 04:47

SRLM wrote: »

Since this appears solved for the OP, I'd like to ask a question about the answers: isn't using the union as a "type converter" a bad idea, since it breaks aliasing? There seem to be numerous web pages devoted to that problem.

I think using the union is the approved way to do it (when it has to be done... as Heater points out it is a dangerous and unportable practice in general, but there are particular times when it has to be done). Dereferencing pointers cast to a different type is more dangerous and does explicitly break aliasing analysis. With a union the compiler at least knows that the memory could be accessed in different ways.

Eric

Heater. · 2013-04-15 05:33

The problem comes when you are interfacing to other machines via network or serial line or whatever. As in the case of the first post. There will be an ordering of bytes on the line that you had better be aware of and may not match your machines ordering. Hopefully the ordering is specified in your network protocol and not just whatever the machine at the other end decides to do.

So, no matter if you use casts or unions the general problem remains.

For this reason we have library functions like htonl(), htons(), ntohl() and ntohs() to convert from host to network byte order and back again. They do whatever is required and make your code portable.
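Applied to a float, the usual pattern is to copy the four wire bytes into a uint32_t, fix the order with ntohl(), and then copy the result into a float. This is a minimal sketch that assumes a POSIX host with <arpa/inet.h>, a protocol that sends floats in network (big-endian) order, and IEEE 754 floats on both ends. The helper name is hypothetical:

```c
#include <arpa/inet.h>  /* ntohl(); POSIX, not present on every embedded toolchain */
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: decode a big-endian (network order) IEEE 754
 * float from 4 received bytes. */
float float_from_network_bytes(const uint8_t bytes[4])
{
    uint32_t bits;
    memcpy(&bits, bytes, sizeof bits);  /* bytes exactly as they arrived */
    bits = ntohl(bits);                 /* network order -> host order */
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```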

jac_goudsmit · 2013-04-15 06:10

I would say a union is a more structured way to solve problems like this. Yes, endianness is an issue but if you disregard that (i.e. if you can assume that the bytes are stored host-endian) it's much better to have the types declared in one place (the union) than to have possibly many typecasts. Dereferencing pointers for typecasting is (arguably) error-prone and if the underlying type changes from e.g. a float to a double, you only have to change it in the union declaration.

Basically, in my opinion, a typecast should be regarded as a directive to the compiler that means "whatever this is, use it as <type>", whereas a union says it in a much more structured way: "this can be a <type> but it can also be a <type>" (there is no wildcard). My experience is that casts can lead to subtle bugs when the architecture is modified later, and unions are easier to maintain and prevent such bugs to a large extent.

===Jac
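Here is a sketch of that idea. The union is declared once, behind a typedef and a single helper, so that changing the payload type later touches only this spot. It assumes host-endian data, as above. It relies on union type punning, which C99 (as corrected) and C11 define but C++ does not. All names are hypothetical:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: the one place where the wire format is punned.
 * Change this typedef (e.g. to double) and the union follows. */
typedef float wire_value_t;

typedef union {
    wire_value_t value;
    uint8_t      bytes[sizeof(wire_value_t)];
} wire_union;

/* Copy host-endian bytes into the union, then read the other member. */
wire_value_t wire_decode(const uint8_t *src)
{
    wire_union w;
    for (size_t i = 0; i < sizeof w.bytes; i++)
        w.bytes[i] = src[i];   /* src may be unaligned; read byte by byte */
    return w.value;
}
```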

Heater. · 2013-04-15 06:56

I'm inclined to agree. Using a union clearly raises a red flag as to where portability issues may reside, which is better than scattering them all over your code with casts. Unless you only have one such conversion in your code, in which case a union might seem heavyweight.

It seems, though, that standard networking code assumes the structures in its API are in network order, so you have to be careful to take care of endianness in your application:

serv_addr.sin_port = htons(portno);   // Be sure to fix endianness here.
if (bind(sockfd, (struct sockaddr *) &serv_addr,
         sizeof(serv_addr)) < 0)
    error("ERROR on binding");

Rayman · 2013-04-15 16:12

maybe it's easier like this:

unsigned char testme[4] = {0x8D, 0x08, 0x80, 0x3F};  /* 1.000261, little endian */
printf("Hello World %f\n", *((float*)testme));

kuroneko · 2013-04-15 16:29

Rayman wrote: »

maybe it's easier like this:

unsigned char testme[4] = {0x8D, 0x08, 0x80, 0x3F};  /* 1.000261, little endian */
printf("Hello World %f\n", *((float*)testme));

Can you guarantee that testme[4] is aligned so that the resulting float access doesn't throw an exception? Unaligned access isn't always an option.

Rayman · 2013-04-15 16:36

It seems to work for me... But maybe this would guarantee it:

unsigned char __attribute__((aligned(4))) testme[4] = {0x8D, 0x08, 0x80, 0x3F}; /* 1.000261, little endian */
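C11 has a standard spelling of that GCC attribute, _Alignas (alignas via <stdalign.h>), and _Alignof reports the requirement. The sketch below uses both to check whether a buffer address is suitable for a float access. The helper name is hypothetical. Forcing alignment only addresses the unaligned-access concern; reading a char array through a float * still runs into C's aliasing rules:

```c
#include <stdalign.h>
#include <stdbool.h>
#include <stdint.h>

/* C11 counterpart of __attribute__((aligned(...))): align the buffer like a float. */
alignas(float) unsigned char aligned_buf[4] = {0x8D, 0x08, 0x80, 0x3F};

/* Hypothetical helper: true if p can be used as a float address without
 * an unaligned access. uintptr_t is optional in C but widely available. */
bool is_float_aligned(const void *p)
{
    return ((uintptr_t)p % alignof(float)) == 0;
}
```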

jazzed · 2013-04-15 17:09

The basic subject here seems to be marshalling, which can get really ugly (CORBA or XDR, for example).

XDR is used by NFS RPC and is a good general-purpose serialization format. See http://en.wikipedia.org/wiki/External_Data_Representation

Anything beyond some primitives is too much for an MCU though.

Heater's notes about htonx() and ntohx() are on the money for me, except that we don't have arpa/inet.h in the library.

A 32-bit float can be converted with the htonl()/ntohl() conversions from inet.h, in the same way that Spin's F32 operates.

It would be fairly simple to create a set of these conversion functions/macros, which could be extended to htonf()/ntohf() for 32-bit floats or htond()/ntohd() for 64-bit doubles. Using functions diminishes porting problems.
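Here is a sketch of what an htonf()/ntohf()-style pair could look like without <arpa/inet.h>. It assumes an IEEE 754 single on both ends, sent most significant byte first, and that the host stores float and uint32_t in the same byte order. The function names are hypothetical and are not part of any existing library:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical htonf()-style encoder: float -> 4 bytes, MSB first. */
void float_to_be_bytes(float f, uint8_t out[4])
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);   /* assumes sizeof(float) == 4 */
    out[0] = (uint8_t)(bits >> 24);
    out[1] = (uint8_t)(bits >> 16);
    out[2] = (uint8_t)(bits >> 8);
    out[3] = (uint8_t)bits;
}

/* Hypothetical ntohf()-style decoder: 4 bytes, MSB first -> float. */
float float_from_be_bytes(const uint8_t in[4])
{
    uint32_t bits = (uint32_t)in[0] << 24 | (uint32_t)in[1] << 16
                  | (uint32_t)in[2] << 8  | (uint32_t)in[3];
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

Because only shifts touch the wire bytes, the same source produces the same wire format on little- and big-endian hosts.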

Heater. · 2013-04-15 23:53

Yep, marshalling is the issue. I would not want to use the standard htonX/ntohX function names unless dealing with standard networking code; after all, your serial protocol may use a byte order different from network byte order. But yes, a set of such functions/macros for your particular case would be a good idea.

Anything beyond some primitives is too much for an MCU though.

Luckily things like CORBA managed to become so convoluted and top heavy with standards committees that they have mostly been abandoned. Then the world moved to XML which started out as a simple idea and rapidly went horribly wrong as well.

What I'd like is a very simple JSON Marshaller.

ersmith · 2013-04-16 04:44

Rayman wrote: »

maybe it's easier like this:

unsigned char testme[4] = {0x8D, 0x08, 0x80, 0x3F};  /* 1.000261, little endian */
printf("Hello World %f\n", *((float*)testme));

It's easier, but not always safe. Casting a pointer to the "wrong" type can confuse the compiler's analysis during optimization. If you're not optimizing, or you explicitly turn off alias analysis (gcc's -fno-strict-aliasing), it'll be OK. Otherwise it may or may not work. I think the union method should always work.

Rayman · 2013-04-16 06:50

I find it hard to imagine any optimization that would make that not work...
Anyway, I just tried it in xmmc mode and -O2 and that works too...
Can you think of a simple example that breaks it?

kuroneko · 2013-04-16 07:14

Rayman wrote: »

Can you think of a simple example that breaks it?

char a[4] = {0, 1, 2, 3};
char e = 5;

This will place a[] at an odd address (mingw gcc -O2). Unaligned long/int32 access isn't allowed on all architectures.

Full example:

#include <stdio.h>
#include <stdlib.h>

union {
    unsigned char test[4];
    float testme;
} mixed = {{0x8D, 0x08, 0x80, 0x3F}};

char a[4] = {0, 1, 2, 3};
char e = 5;

int main(int argc, char **argv) {
    printf("%f\n", mixed.testme);
    printf("%p\n", (void *)&a[0]);   /* %p with a void * is the portable way to print an address */
    printf("%p\n", (void *)&e);
    exit(0);
}

ersmith · 2013-04-16 07:20

Rayman wrote: »

I find it hard to imagine any optimization that would make that not work...
Anyway, I just tried it in xmmc mode and -O2 and that works too...
Can you think of a simple example that breaks it?

Kuroneko provided a good example of how differing alignments for char and float can cause problems. Even if the alignment works out, the cast can confuse alias analysis in the optimizer. I don't have an example ready to hand, but gcc's manual warns against this.

Unions fix both of these issues (union alignment is determined by the strictest alignment required for any of its types).

Eric
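A small compile-time check of the point about union alignment, reusing the union f_or_b from earlier in the thread. It assumes C11 for alignof and _Static_assert:

```c
#include <stdalign.h>
#include <stdint.h>

union f_or_b {
    float   f;
    uint8_t b[4];
};

/* A union is aligned at least as strictly as its most demanding member,
 * so the byte view b[] always sits where a float access is allowed. */
_Static_assert(alignof(union f_or_b) >= alignof(float),
               "union must be at least as aligned as float");
```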

Rayman · 2013-04-16 09:52

Doesn't this force the alignment:

unsigned char __attribute__((aligned(4))) testme[4] = {0x8D, 0x08, 0x80, 0x3F}; /* 1.000261, little endian */

Also, since the data is being treated as a pointer and also cast as a pointer, I don't think there is an optimization issue...
But, say I'm wrong, and there is an optimization issue...
Would marking testme as "volatile" prevent optimization problems?

ersmith · 2013-04-16 13:47

Using a "float *" to refer to a character array is not only dangerous for optimization, but actually violates the C standard. The relevant portion of the standard is:

(C99; ISO/IEC 9899:1999 6.5/7):
An object shall have its stored value accessed only by an lvalue expression that has one of the following types:

a type compatible with the effective type of the object,

a qualified version of a type compatible with the effective type of the object,

a type that is the signed or unsigned type corresponding to the effective type of the object,

a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,

an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or

a character type.

There's a nice detailed discussion at http://stackoverflow.com/questions/98650/what-is-the-strict-aliasing-rule.

According to the standard it is OK to access a float via a "char *", but not the other way around.
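Both allowed directions can be sketched as follows. A float's bytes may be read through an unsigned char lvalue, and a genuine float object may be filled byte by byte the same way. Only the reverse, reading a char array through a float *, is ruled out. The helper names are hypothetical, and the byte values in the checks assume a little-endian host:

```c
#include <stddef.h>
#include <stdint.h>

/* Allowed: reading a float's bytes through an unsigned char lvalue,
 * since character types may alias an object of any type. */
uint8_t float_byte(const float *f, size_t i)
{
    const unsigned char *p = (const unsigned char *)f;
    return p[i];
}

/* Also allowed: writing a genuine float object byte by byte through
 * unsigned char. The object being accessed really is a float. */
float float_from_chars(const unsigned char bytes[4])
{
    float f;
    unsigned char *p = (unsigned char *)&f;
    for (size_t i = 0; i < sizeof f; i++)
        p[i] = bytes[i];
    return f;
}
```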

Rayman · 2013-04-16 15:09

It could be that "what works" and "the proper way" are separate things...

Rayman · 2013-04-16 15:30

Actually, is this much different from what spin2cpp does for a PASM driver:

uint8_t tvSpin::dat[] = {
0x5c, 0x2a, 0xfe, 0xa0, 0x0a,

and then:

Okay = (Cog = (cognew((int32_t)(&(*(int32_t *)&dat[0])), Tvptr) + 1));

jazzed · 2013-04-16 16:09

Any interest in creating a serialization API for common data types?

Tor · 2013-04-17 00:50

Rayman wrote: »

It could be that "what works" and "the proper way" are separate things...

Yes, but unfortunately what separates them only becomes clear when you reword the expressions like so: "what works, some of the time" and "works, all of the time".
The aliasing restrictions added to modern compilers are there so that the compiler is free to optimize properly. There are safe, well-established, standard methods for converting and re-mapping all types of variables; casting one type to an incompatible type was something done in the early days of C programming (back in K&R or worse). It's not a good way, and that's why it's not done anymore.

-Tor

ersmith · 2013-04-17 03:47

Rayman wrote: »

Actually, is this much different than what spin2cpp does for a PASM driver:

That's a bug in spin2cpp. I'm not sure how to work around it, but it's probably one of the reasons spin2cpp generated code doesn't always work properly when optimized. (The problem with volatile variables not being marked is another.)
Eric

Anubispod · 2013-04-17 12:40

Thanks for all the useful info on float conversion.

Heater. · 2013-04-17 12:52

What? I thought this was C we were talking about. Back in the day a C compiler would faithfully compile any old line noise you gave it. Perhaps the resulting executable would even run. If you need mama's hand to make the walk to the shops, then perhaps Ada is more the programming language for you :)

Me, I'm into JavaScript now. It faithfully runs whatever line noise I feed it, like the good old days.

I guess the programming language world came full circle.

jazzed · 2013-04-17 20:06

Here is the beginning of a marshalling library.
It doesn't do float, double, structs, or arrays.
If someone wants to contribute those I would be most honored.
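Since that library stops short of floating point, here is a sketch of what a 64-bit double entry point could look like. It assumes IEEE 754 binary64 doubles on both ends, big-endian wire order, and a host that stores double and uint64_t in the same byte order. The static assertion catches toolchains configured for 32-bit doubles. The function names are hypothetical and are not taken from the attached library:

```c
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(double) == 8, "this sketch assumes 64-bit doubles");

/* Hypothetical marshaller: double -> 8 bytes, most significant byte first. */
void put_double_be(uint8_t *out, double d)
{
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    for (int i = 7; i >= 0; i--) {   /* fill from the end: low byte goes last */
        out[i] = (uint8_t)bits;
        bits >>= 8;
    }
}

/* Hypothetical unmarshaller: 8 bytes, most significant byte first -> double. */
double get_double_be(const uint8_t *in)
{
    uint64_t bits = 0;
    for (int i = 0; i < 8; i++)
        bits = bits << 8 | in[i];    /* first byte ends up most significant */
    double d;
    memcpy(&d, &bits, sizeof d);
    return d;
}
```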
