Double-Float (64-bit) to Single-Float (32-bit) Conversion

WalterW · 2007-03-19 20:58

I'm trying to convert a 64-bit double-precision floating-point number (GPS coordinates in radians) into a 32-bit single-precision floating-point number (GPS coordinates in semicircles). My major hurdle is doing the down-conversion from double to single.

My theory so far has been to put the 64-bit data into an 8-byte array and then move it into 3 longs. The first long (the sign bit) I'm trying to shift down with >>= (right) and then back up with <<= (left) to clear the other bits, but when I perform the operation I end up with the sign bit in the last byte: 00 00 00 80 instead of 80 00 00 00. The next (middle) long I want to shift left 4 with <<= to knock off the top 4 bits, then shift right 1 with >>= to clear what will be the sign bit, leaving the bottom 3 bits open. The last long contains the extra 3 bits I need from the lower 32 bits of the 64-bit number. For the last long I want to shift the top 3 bits to the bottom with >>= 29 (shift right) so they can be added to the middle long. When I perform that shift, I end up with 00 00 00 00 instead of 00 00 00 05.

Since the difference between a 64-bit double and a 32-bit single is 3 extra exponent bits and 29 extra mantissa bits, I'm hoping that shuffling the bits around will let me get the resulting 32-bit single without having to code some overly complex routine.

I'm guessing that the numbers are stored in PropRAM MSB-first... What I can't tell is whether the chip is trying to perform its own 'signing' and is interfering with my bit manipulation. Does anyone have any other bitwise manipulations that could yield the same result?

pgbpsu · 2008-03-18 02:24

WalterW-

Are you still out there?

Did you ever get this working? I'm facing a similar problem: GPS binary data that I'd like to parse contains the seconds as a double. How on earth (or the heavens, since we're talking GPS) they think a 64-bit float is required to store the seconds, I can't understand. It seems like a 32-bit float is more than accurate enough for GPS time stamps. In any case, did you get this working? If so, would you mind sharing?

Thanks,
Peter

Mike Green · 2008-03-18 02:38

WalterW,
You really need to show the code you've written for this. It's hard to tell what you're doing wrong from your description.

All of the shift operations except the arithmetic shift right are logical shifts. Maybe you've typed the operator wrong in your program.

pgbpsu · 2008-03-18 02:51

Hi Mike-

WalterW's post is just about a year old, so I'm not optimistic I'll hear back from him. I agree that what he suggested didn't sound difficult from a bit-manipulation standpoint. I was hoping he'd give me some insight on how to go about this conversion. I'm reading Wikipedia right now to see how to code this.

I'll see what I come up with... Please stay tuned, since I'll no doubt need some pointers.

Thanks,
pb

Mike Green · 2008-03-18 03:03

pgbpsu,
Yes, I didn't look that closely at the date. I may be wrong, but I think you can basically unpack the double-precision value into sign, exponent, and mantissa, then pack it into single-precision form. The mantissa is left-justified, so it just needs to be shifted into the new position and the extra rightmost bits discarded. The sign bit stays the most significant bit. The exponent is stored with a bias that depends on the size of the field, so when you unpack it you have to remove the double's bias, and when you pack it you add the single's. For single precision, I think it's 127. For double precision, I think it's 1023.

The Wikipedia entry has nice pictures and explanations of the various formats.

pgbpsu · 2008-03-18 04:02

Thanks Mike-

I'm not concerned about the bit manipulation, but I'm still trying to figure out the biasing.

I'll be starting out with all 64 bits of my double in a byte-wide array (double_array[0] through double_array[7], most significant byte first).

If sign_bit is a long, this should retain the sign:
sign_bit := double_array[0] & $80 ' get bit 63 of double from double_array[0]
sign_bit := sign_bit << 24 ' place this into the MSB of sign_bit

The next thing to do is get the exponent. It spans two of my byte-wide array cells, so I'll need to grab the two parts and put them back together. Get 8 bits from double_array for the exponent. From the original double these should be bits 62-55 (the top 8 of the double's 11 exponent bits).

exponent := ( double_array[0] & $7F ) << 1 ' get bits 62-56 of double from bits 6-0 of double_array[0]; shift left one to make room for the 8th bit, which comes next
exponent := exponent | ( double_array[1] & $80 ) >> 7 ' get bit 55 of double from bit 7 of double_array[1]; shift it right so it lands in the proper place within exponent
exponent := exponent << 23 ' shift this up to bits 30-23, its final resting place

Finally, the fractional part: get 23 bits from double_array. From the original double these should be bits 51-29.

fraction := (double_array[1] & $0F) << 19 ' get bits 51-48 of double from bits 3-0 of double_array[1]; lands in bits 22-19
fraction := fraction | ( (double_array[2] & $FF) << 11) ' get bits 47-40 of double from bits 7-0 of double_array[2]; lands in bits 18-11
fraction := fraction | ( (double_array[3] & $FF) << 3) ' get bits 39-32 of double from bits 7-0 of double_array[3]; lands in bits 10-3
fraction := fraction | ( (double_array[4] & $E0) >> 5) ' get bits 31-29 of double from bits 7-5 of double_array[4]; lands in bits 2-0

float_val := sign_bit | exponent | fraction

However, this doesn't account for the bias on the exponent. When I get that figured out, I'll be done. This isn't as simple as subtracting 1023 from my new exponent and then adding 127, is it?

Thanks,
pb

Post Edited (pgbpsu) : 3/18/2008 4:09:38 AM GMT

Mike Green · 2008-03-18 04:16

As I understand the exponent from a cursory glance at the Wikipedia article, it is indeed as simple as subtracting 1023, then adding 127 and, of course, packing the result back into the proper field.
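As a worked check of that arithmetic (using 4.0 = 1.0 × 2^2 as the example, and the standard IEEE 754 ranges for normal numbers), the stored exponents relate as:

```latex
e_{\text{single}} = e_{\text{double}} - 1023 + 127 = e_{\text{double}} - 896

% example: 4.0 = 1.0 \times 2^{2}
e_{\text{double}} = 2 + 1023 = 1025, \qquad e_{\text{single}} = 1025 - 896 = 129 = 2 + 127

% a normal single needs 1 \le e_{\text{single}} \le 254, i.e.
897 \le e_{\text{double}} \le 1150
```

Double exponents outside that range have no normal single-precision equivalent and need special handling.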

It would be easier to actually use an array of longs for the double, with double[0] holding the upper 32 bits and double[1] the lower 32 bits. You could do:

exponent := ((double[0] & $7FF00000) >> 20) - 1023
fraction := ((double[0] & $000FFFFF) << 3) | ((double[1] & $E0000000) >> 29)
float_val := (double[0] & $80000000) | ((exponent + 127) << 23) | fraction

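Strictly speaking, you'd also need tests for the special cases: a double exponent field of 0 (zero or a subnormal) or $7FF (infinity or NaN), and exponents that don't fit in 8 bits after re-biasing, which should become infinity or zero rather than wrapping into the sign bit.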

hippy · 2008-03-18 13:27

pgbpsu said...
WalterW's post is just about a year old

Very confusing in my browser...

WalterW's post shows "Posted Today 1:58 PM (GMT -7)"

Your first post shows "Posted Yesterday 7:24 PM (GMT -7)"

Edit : This post shows "Posted Today 6:27 AM (GMT -7)"

pgbpsu · 2008-03-18 14:37

That is very strange. I searched for this problem of double to float last night and finally found Walter's post. But I'm sure the original date was 3/19/2007. And with only one post at that time, I didn't expect to hear from him. Mike and I went back and forth last night so those dates are correct. I have no idea how the posted time on Walter's post has moved to today.

Maybe an error in converting from double to float ; )
p

stevenmess2004 · 2008-03-20 08:10

While looking at some of the first posts I get the same problem with the date changing to today or yesterday.

Must be some problem with the forum software.
