Double-Float (64-bit) to Single-Float (32-bit) Conversion
WalterW
Posts: 1
I'm trying to convert a 64-bit double-precision floating-point number (GPS coordinates in radians) into a 32-bit single-precision floating-point number (GPS coordinates in semicircles). My major hurdle is doing the down-conversion from double to single.
My approach so far has been to put the 64-bit data into an 8-byte array and then move it into 3 longs. For the first long (the sign bit), I shift right (>>=) and then back left (<<=) to clear the other bits, but when I perform the operation I end up with the sign bit in the last byte: 00 00 00 80 instead of 80 00 00 00. For the next (middle) long, I want to shift left by 4 (<<=) to knock off the top 4 bits, then shift right by 1 (>>=) to clear what will be the sign bit, leaving the bottom 3 bits open. The last long contains the extra 3 bits I need from the lower 32 bits of the 64-bit number. For this long I want to shift the top 3 bits to the bottom (>>= 29) so that they can be added to the middle long. When I perform the shift operation, I end up with 00 00 00 00 instead of 00 00 00 05.
Since the difference between a 64-bit double and 32-bit single is 3 extra exponent bits, and 29 extra mantissa bits, I'm hoping that shuffling the bits around will allow me to get the resultant 32-bit single without having to code some overly complex routine.
I'm guessing that the numbers are stored in PropRAM MSB first... What I can't tell is whether the chip is trying to perform its own 'signing' and is interfering with my bit manipulation. Does anyone have any other bitwise manipulations that could yield the same result?
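Note: both symptoms described above (00 00 00 80 instead of 80 00 00 00, and 0 instead of 5 after >>= 29) are consistent with a byte-order mismatch rather than with the chip altering the sign. The Propeller stores longs least-significant byte first, so a byte-by-byte dump of the long $80000000 reads 00 00 00 80. The sketch below is a desktop Python model, not Propeller code, and the byte values in `raw` are hypothetical, chosen only to reproduce the numbers in the post.

```python
import struct

def dump(data):
    """Format bytes the way a memory dump shows them, lowest address first."""
    return " ".join(f"{b:02X}" for b in data)

sign_long = 0x80000000                         # a long with only the sign bit set
little_endian = struct.pack("<I", sign_long)   # least-significant byte at the lowest address
big_endian = struct.pack(">I", sign_long)      # most-significant byte at the lowest address
print(dump(little_endian))                     # 00 00 00 80
print(dump(big_endian))                        # 80 00 00 00

# Hypothetical low half of a double whose top three bits are 101, stored MSB first.
raw = bytes([0xA0, 0x00, 0x00, 0x00])
read_as_little = struct.unpack("<I", raw)[0]   # how a little-endian CPU reads those bytes
read_as_big = struct.unpack(">I", raw)[0]      # the value the sender meant
print(read_as_little >> 29)                    # 0
print(read_as_big >> 29)                       # 5
```

If this is what is happening, reversing the byte order while copying the array into the longs (or assembling each long from individual bytes with shifts) should make the shifts behave as expected.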
Comments
Are you still out there?
Did you ever get this working? I'm facing a similar problem: the GPS binary data that I'd like to parse contains the seconds as a double. How on earth (or heavens, since we're talking GPS) they think a 64-bit float is required to store the seconds, I can't understand. It seems like a 32-bit float is more than accurate enough for GPS time stamps. In any case, did you get this working? If so, would you mind sharing?
Thanks,
Peter
You really need to show the code you've written for this. It's hard to tell what you're doing wrong from your description.
All of the shift operations except the arithmetic shift right are logical shifts. Maybe you've typed the operator wrong in your program.
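To illustrate the difference between the two kinds of right shift, here is a small Python model of 32-bit longs (a desktop sketch, not Propeller code; the function names are made up for this example). In Spin, `>>` is the logical shift and `~>` is the arithmetic one.

```python
MASK32 = 0xFFFFFFFF

def logical_shr(value, count):
    """Shift a 32-bit pattern right, filling the top with zeros (like Spin's >>)."""
    return (value & MASK32) >> count

def arithmetic_shr(value, count):
    """Shift a 32-bit pattern right, copying the sign bit down (like Spin's ~>)."""
    value &= MASK32
    if value & 0x80000000:
        value -= 1 << 32                 # reinterpret the pattern as a negative number
    return (value >> count) & MASK32     # Python's >> on a negative int is arithmetic

print(hex(logical_shr(0xA0000000, 29)))      # 0x5
print(hex(arithmetic_shr(0xA0000000, 29)))   # 0xfffffffd
```

With the logical shift, the top three bits arrive at the bottom as 5. With the arithmetic shift, the vacated bits are filled with copies of the sign bit, so the result is not 5.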
WalterW's post is just about a year old, so I'm not optimistic I'll hear back from him. I agree that what he suggested didn't sound difficult from a bit-manipulation standpoint. I was hoping he'd give me some insight on how to go about this conversion. I'm reading Wikipedia right now to see how to code this.
I'll see what I come up with... Please stay tuned since I'll no doubt need some pointers.....
Thanks,
pb
Yes, I didn't look that closely at the date. I may be wrong, but I think you can basically unpack the double precision value into sign, exponent, and mantissa, then pack it into single precision form. The mantissa is left justified, so it just needs to be shifted into the new position and the extra rightmost bits discarded. The sign bit is left as the most significant bit. When you unpack and pack the exponent, there's an adjustment that depends on the size of the field that you have to remove. For single precision, I think it's 127. For double precision, I think it's 1023.
The Wikipedia entry has nice pictures and explanations of the various formats.
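The unpack, adjust, and pack steps described above can be sketched as follows. This is a Python model for checking the arithmetic on a desktop, not Propeller code, and the function names are invented for the example. It assumes a normal (non-zero, non-denormal, finite) input whose exponent fits in single precision, and it truncates the extra fraction bits instead of rounding.

```python
import struct

def double_to_bits(x):
    """Return the 64-bit pattern of a Python float."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]

def single_to_bits(x):
    """Return the 32-bit pattern of x rounded to single precision."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

def repack_double_as_single(bits64):
    """Unpack sign/exponent/fraction from a double and pack them as a single."""
    sign = (bits64 >> 63) & 0x1
    exponent = (bits64 >> 52) & 0x7FF          # 11-bit field, bias 1023
    fraction = bits64 & ((1 << 52) - 1)        # 52-bit field, left justified
    exponent32 = exponent - 1023 + 127         # remove the old bias, add the new one
    fraction32 = fraction >> 29                # keep the top 23 bits, drop the other 29
    return (sign << 31) | (exponent32 << 23) | fraction32

print(hex(repack_double_as_single(double_to_bits(0.15625))))   # 0x3e200000
```

Because the low 29 fraction bits are discarded, the result can differ from a correctly rounded conversion by one unit in the last place.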
I'm not concerned about the bit manipulation, but I'm still trying to figure out the biasing.
I'll be starting out with all 64 bits of my double in a byte-wide array (double_array[0..7]).
If sign_bit is a long, this should retain the sign:
sign_bit := double_array[0] & $80 ' get bit 63 of the double from double_array[0]
sign_bit := sign_bit << 24 ' place this into the MSB of sign_bit
The next thing to do is get the exponent. It spans two of my byte-wide array cells, so I'll need to grab the two parts and put them back together. Get 8 bits from the byte array for the exponent. From the original double these should be bits 62-55.
exponent := ( double_array[0] & $7f ) << 1 ' get bits 62-56 of the double from bits 6-0 of double_array[0]; shift left one to make room for the 8th bit to come next
exponent := exponent | ( double_array[1] & $80 ) >> 7 ' get bit 55 of the double from bit 7 of double_array[1]; shift it right so it lands in the proper location within exponent
exponent := exponent << 23 ' shift this up to bits 30-23, its final resting place
Finally the fractional part. Get 23 bits from the byte array for the fractional part. From the original double these should be bits 51-29.
However this doesn't account for the biasing for the exponent. When I get that figured out I'll be done. This isn't as simple as subtracting 1023 from my new exponent, then adding 127 is it?
Thanks,
pb
Post Edited (pgbpsu) : 3/18/2008 4:09:38 AM GMT
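On the biasing question: for normal numbers it really is subtract 1023, then add 127, but the arithmetic has to be done on the full 11-bit exponent field (bits 62-52 of the double), not on bits 62-55 alone. The Python sketch below (a desktop check, not Propeller code; the helper names are invented for the example) compares the two for a few values.

```python
import struct

def exponent_field(x):
    """Return the 11-bit biased exponent field (bits 62-52) of a double."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    return (bits >> 52) & 0x7FF

def top_eight_bits(x):
    """Bits 62-55 of the double: the exponent field with its low 3 bits dropped."""
    return exponent_field(x) >> 3

def rebiased(x):
    """Single-precision exponent field for a normal value: remove 1023, add 127."""
    return exponent_field(x) - 1023 + 127

for x in (0.5, 1.0, 4.0):
    print(x, top_eight_bits(x), rebiased(x))
# 0.5 127 126
# 1.0 127 127
# 4.0 128 129
```

The two agree for 1.0 only by coincidence. Taking the top 8 bits divides the biased exponent by 8, which is not the same operation as changing the bias.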
It would be easier to actually use an array of longs for the double. You could do:
Strictly speaking, you'd need a test for an invalid exponent and conversion to NaN.
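A sketch of the two-long approach with those exponent checks added, written as a Python model rather than Propeller code. The names `high` and `low` (the upper and lower 32 bits of the double) and both function names are assumptions made for this example. It truncates the fraction instead of rounding and flushes results too small for a normal single to zero, which is a simplification.

```python
import struct

def split_double(x):
    """Return the upper and lower 32-bit halves of a double's bit pattern."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    return bits >> 32, bits & 0xFFFFFFFF

def double_to_single_bits(high, low):
    """Convert a double held in two 32-bit longs to a single-precision pattern."""
    sign = high & 0x80000000
    exponent = (high >> 20) & 0x7FF                      # 11-bit biased exponent
    fraction = ((high & 0xFFFFF) << 3) | (low >> 29)     # top 23 of the 52 fraction bits

    if exponent == 0x7FF:                                # Inf or NaN in the double
        if (high & 0xFFFFF) or low:
            return sign | 0x7FC00000                     # any NaN becomes a quiet NaN
        return sign | 0x7F800000                         # infinity stays infinity

    exponent32 = exponent - 1023 + 127                   # change the bias
    if exponent32 >= 255:
        return sign | 0x7F800000                         # too large: overflow to infinity
    if exponent32 <= 0:
        return sign                                      # zero or too small: signed zero
    return sign | (exponent32 << 23) | fraction

print(hex(double_to_single_bits(*split_double(-2.5))))   # 0xc0200000
```

For GPS coordinates in radians the values stay within a few units of zero, so only the zero case of these checks is likely to matter in practice.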
Very confusing, in my Browser ...
WalterW's post shows "Posted Today 1:58 PM (GMT -7)"
Your first post shows "Posted Yesterday 7:24 PM (GMT -7)"
Edit : This post shows "Posted Today 6:27 AM (GMT -7)"
Maybe an error in converting from double to float ; )
p
Must be some problem with the forum software.