Double and float variable max limit problem — Parallax Forums

Hi everyone,

I have a QuickStart board. I am using float and double variables to accumulate a distance in metres, but with a float the value stops increasing at 5242.00, and with a double it stops at 8192.00. I know these are not the maximum values of those types. I want to measure longer distances.

Please help me.
Thanks in advance.

Comments

  • Something somewhere else must be wrong. Can you post your code?
  • todieforteo Posts: 7
    edited 2018-02-13 09:14
    Hi Electrodude, here is my code. I count pulses from a rotary encoder and print the distance to the serial monitor, but it stops when the distance reaches 8192.00.
    #include "simpletools.h"
    #include "propeller.h"
    #include "fdserial.h"
    #include "simpletext.h"
    
    #define pinA 9
    #define pinB 8
    
    volatile static float pulse_distance = 0.000305;  // metres per counted pulse
    volatile static double metre1 = 0;                // accumulated distance in metres
    
    static int encoderStack[128];
    
    // Runs in its own cog: polls the encoder pins and accumulates the distance.
    void encoder(void *par)
    {
      int a = 0;
      int b = 0;
      unsigned int pinstate;
      int onceki_durum;  // previous state of channel B
      while (1)
      {
        pinstate = INA;
        a = (pinstate >> pinA) & 1;
        b = (pinstate >> pinB) & 1;
    
        if (b != onceki_durum)
        {
          if (b == 0)  // falling edge on B; A gives the direction
          {
            if (a == 1)
            {
              sayac++;  // counter; not declared anywhere in this listing
              metre1 = (pulse_distance + metre1);
            }
            else
            {
              metre1 = (metre1 - pulse_distance);
            }
          }
        }
        onceki_durum = b;
      }
    }
    
    int main()
    {
      fdserial *read;
      read = fdserial_open(31, 30, 0, 115200);
      cogstart(&encoder, NULL, encoderStack, sizeof encoderStack);
    
      while (1)
      {
        // receiving 'a' on the serial port resets the distance
        switch (fdserial_rxCheck(read))
        {
          case 'a':
          {
            metre1 = 0.0;
            break;
          }
        }
    
        dprint(read, "%.2f\n", metre1);
        pause(4);
      }
    }
    
  • Float and double have only so many bits of precision. When you add such a small number to such a big number, the small number gets rounded to zero, and the addition or subtraction does nothing. Also, I'm pretty sure doubles and floats are usually the same on PropGCC, i.e. they both only have a 24 bit mantissa.

    To get some more precision, try using fixed-point arithmetic: use 32-bit ints for metre1 and pulse_distance, and set pulse_distance = 1. When you interpret the value of metre1, remember that the stored value is the true value divided by 0.000305: you need to multiply it by 0.000305 before displaying it. You can do this last multiplication using floats if you're lazy, but there are ways to do it without any floats.
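
A small host-side sketch of the rounding effect described above, assuming IEEE 754 single precision (24-bit mantissa). The helper names `add_step` and `stall_point` are made up for illustration and are not from the original code. Between 8192 and 16384 neighbouring floats are 2^-10 (about 0.000977) apart, so a 0.000305 step is less than half that spacing and the sum rounds straight back to the old value. That would be consistent with the 8192.00 ceiling reported if the accumulation is effectively happening in single precision.

```c
/* Hypothetical helper: one accumulation step in single precision.
   The volatile store forces the result to be rounded to float. */
float add_step(float total, float step)
{
    volatile float sum = total + step;
    return sum;
}

/* Adds `step` repeatedly, starting from `start`, and returns the value
   at which the total stops changing (or the total after `limit` steps). */
float stall_point(float start, float step, long limit)
{
    float total = start;
    for (long i = 0; i < limit; i++) {
        float next = add_step(total, step);
        if (next == total)
            break;          /* the step was rounded away completely */
        total = next;
    }
    return total;
}
```

Calling `stall_point(8191.0f, 0.000305f, 100000)` should stop at 8192. Below 8192 the total still moves, but each step is rounded to the nearest representable float, so the accumulated distance is already drifting before it stalls.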
  • evanh Posts: 15,916
    edited 2018-02-13 22:29
    Double is giving the correct result, so it will be using the full 64-bit (53-bit mantissa) standard.

    As Electrodude says, a Single's more limited mantissa is getting in the way. You need about 34 bits but a Single only provides 24 bits. How it will be working is it'll start off accumulating correctly but as you exceed the 24-bit capacity the lowest bits get rounded off and the new additions will also be rounded, maybe truncated. Result is rubbish.

  • evanh Posts: 15,916
    Oops, I read that wrong. Double is also not correct. 8192 looks bad for Doubles also. I would have expected more accumulation range with them.

    Aside from that detail, what result were you expecting? 1000.00 presumably.

    The hard coded 305 micron preset for the encoder pulse increments looks suspiciously like the number of millimetres in a foot rather than something suited to a metre.

  • mikeologist Posts: 337
    edited 2018-02-14 03:33
    Consider using a uint64_t instead of floating point.
    You'll have to multiply your entire system by 10^6, but just for the setup. Once it's running the math is the same only with whole numbers.
    Since you are only printing 2 decimal digits:
    1) Print all but the last 6 digits
    2) Print a decimal
    3) Print the two "decimal" digits

    Your print will be slower, but your math will be faster and you will experience no rounding errors.

    Your max range will be about 18,446,744,000,000 m (2^64 micrometres).
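
The three print steps above can be sketched with integer arithmetic only. This assumes the distance is stored as whole micrometres in a `uint64_t`; `format_metres` is a hypothetical helper name, and it uses standard C `snprintf` with `PRIu64`. Whether the Propeller Simple Libraries print functions accept 64-bit conversions is not something the thread confirms, so treat that part as an assumption.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: writes a micrometre count as metres with two
   decimal places (truncated, not rounded) without using floats.
   Returns the number of characters snprintf would have written. */
int format_metres(char *buf, size_t len, uint64_t micrometres)
{
    /* Step 1: everything except the last six digits is whole metres. */
    uint64_t whole = micrometres / 1000000u;
    /* Step 3: the first two of the last six digits are the hundredths. */
    unsigned hundredths = (unsigned)((micrometres % 1000000u) / 10000u);
    /* Step 2 is the '.' in the format string. */
    return snprintf(buf, len, "%" PRIu64 ".%02u", whole, hundredths);
}
```

For example, a stored value of 1234567 micrometres would be written as `1.23`.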
  • evanh Posts: 15,916
    Yes, that's the crux of it for sure - The more bits in play, the more lossless accumulation that can be done. The tiny increments are limiting the dynamic range the floats can represent. This is fundamentally why the accumulation stops at a certain upper value.

    If this was applied to an integer the adder circuits will roll over back to zero and carry on. It's clear when a roll over occurs and needs more bits. With floats it's not so clear you've hit the precision limit. The errors just grow until it stops accumulating altogether.

    To summarise, floating-point doesn't suit this application, fixed-point (Integers) is the better choice.

    Only after the data is captured should you be processing into floats for display/analysis.

  • mikeologist Posts: 337
    edited 2018-02-14 03:48
    #include "simpletools.h"
    #include "propeller.h"
    #include "fdserial.h"
    #include "simpletext.h"
    
    #define pinA 9
    #define pinB 8
    
    volatile static int pulse_distance = 305;
    volatile static uint64_t metre1 = 0;
    
    static int encoderStack[128]; 
    
    void encoder(void *par) {
      int a=0;
      int b=0;
      unsigned int pinstate;
      int onceki_durum;
      while(1) {
        pinstate = INA; 
        a=(pinstate >> pinA) & 1;
        b=(pinstate >> pinB) & 1;
    
        if(b!=onceki_durum) {
          
          if(b==0){
            if(a==1) {
              //sayac++;
              metre1 += pulse_distance;
            } else {
              metre1 -= pulse_distance;
            }
          }
        }
        onceki_durum=b;
      }
    }
     
     int main() {
      fdserial *read;
      read=fdserial_open(31,30,0,115200);
      cogstart(&encoder,NULL,encoderStack,sizeof encoderStack);
    
      while(1) {
        switch (fdserial_rxCheck(read)){
          case 'a':{
            metre1=0;
            break;
          }
        }  
        dprint(read,"%.2f\n",(double)metre1/1000000);
        pause(4);
      }  
    }
    

    This compiles just fine. Think I got it all. There was an errant variable sayac that I commented out.
  • Heater. Posts: 21,230
    mikeologist,

    That does not work. You have metre1 as type uint64_t. That means it can not be negative. The "metre1 -= pulse_distance;" will underflow!

    Personally I would just count the pulses up and down in steps of 1. No point in wasting bits by incrementing/decrementing by anything bigger than 1.
    static int64_t pulseCount = 0;
    .
    .
    .
        if (a==1)
        {
            pulseCount++;
        }
        else
        {
            pulseCount--;
        }
    .
    .
    .
    
    When you want to use that count as a distance in meters multiply it by the appropriate scaling factor.
    double getDistance()
    {
        return pulseCount * 0.000305;
    }
    
    Better still. Don't use floats or doubles. Use fixed point arithmetic. Everything can be done using integers scaled up by an appropriate power of 2.
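
A rough sketch of what power-of-2 scaling could look like here. The Q32 format, the millimetre output unit, the 32-bit pulse argument and the name `pulses_to_mm` are all my assumptions, not something from the posts above. The scale factor 0.305 mm per pulse is stored as round(0.305 * 2^32) = 1309965025, and the result is rounded to the nearest millimetre. Division is used instead of a right shift so that negative counts behave portably.

```c
#include <stdint.h>

/* 0.305 mm per pulse in Q32 fixed point: round(0.305 * 2^32). */
#define MM_PER_PULSE_Q32 INT64_C(1309965025)
#define Q32_ONE          (INT64_C(1) << 32)
#define Q32_HALF         (INT64_C(1) << 31)

/* Converts a signed pulse count to whole millimetres, rounding to nearest.
   The argument is limited to 32 bits so the 64-bit product cannot overflow. */
int64_t pulses_to_mm(int32_t pulses)
{
    int64_t scaled = (int64_t)pulses * MM_PER_PULSE_Q32;  /* millimetres in Q32 */
    if (scaled >= 0)
        return (scaled + Q32_HALF) / Q32_ONE;
    return -((-scaled + Q32_HALF) / Q32_ONE);
}
```

Because the constant is itself rounded, the result carries a relative error of roughly 2 parts in 10^10. A decimal scale such as whole micrometres (305 per pulse) would be exact, at the cost of a division by a power of ten when printing.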


  • evanh Posts: 15,916
    edited 2018-02-14 04:39
    Heater. wrote: »
    You have metre1 as type uint64_t. That means it can not be negative. The "metre1 -= pulse_distance;" will underflow!

    It'll work fine because there isn't any attention paid to the arithmetic flags in the ALU. The 2's-complement circuit works the same with signed and unsigned. There'll probably be a compile time warning, but no ill effects.

    Your increment by one is the ideal solution though.


    EDIT: Typo
  • Heater. Posts: 21,230
    Not quite evanh. You are right that the ALU works the same but this is C we are using. C treats signed and unsigned values appropriately. Subtracting 1 from a uint32_t value of zero underflows and C will treat it as the largest 32 bit unsigned integer, 4294967295.

    Consider the following code:
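    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        uint64_t count = 0;
        float distance;

        count++;
        distance = count * 0.000305;
        printf("%f\n", distance);
        count -= 2;   // wraps around to 18446744073709551615
        distance = count * 0.000305;
        printf("%f\n", distance);
        return 0;
    }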
    
    Which outputs:

    0.000305
    5626256833904640.000000

    Changing count to int64_t produces the right result.

    There are no warnings from the compiler in either case.
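
A sketch that may help reconcile the two views in this exchange: the counter bits after the wraparound are the same either way, and the huge number only appears when those bits are converted to floating point as an unsigned value. The function names below are invented for illustration. Note that converting an out-of-range `uint64_t` to `int64_t` is implementation-defined in C; the comment in the code assumes a two's-complement target such as GCC.

```c
#include <stdint.h>

#define METRES_PER_PULSE 0.000305

/* One pulse forward, two pulses back, all in unsigned arithmetic. */
uint64_t one_forward_two_back(void)
{
    uint64_t count = 0;
    count++;
    count -= 2;   /* well-defined wraparound to UINT64_MAX */
    return count;
}

/* Scales the counter as an unsigned value: after wrapping this is huge. */
double metres_unsigned(uint64_t count)
{
    return (double)count * METRES_PER_PULSE;
}

/* Reinterprets the same bits as signed before scaling.
   Implementation-defined conversion; assumed to yield -1 for UINT64_MAX
   on two's-complement targets. */
double metres_signed(uint64_t count)
{
    return (double)(int64_t)count * METRES_PER_PULSE;
}
```

With the wrapped counter, `metres_unsigned` gives a value above 5e15 while `metres_signed` gives about -0.000305. Declaring the counter as `int64_t` in the first place avoids relying on the conversion.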

  • evanh Posts: 15,916
    That's still working fine.

  • Heater. Posts: 21,230
    There must be some weird physics going on on your planet, where 5626256833904640.0 = -0.000305 :)

    You might have to explain that one.




  • evanh Posts: 15,916
    One is an error (over/underflow), while the other is normal operation (rollover).

  • Heater. Posts: 21,230
    I'm not following your logic there evanh.

    Certainly the C language defines what should happen when unsigned ints overflow or underflow.

    On the other hand overflow of signed integers is undefined behavior in C. Anything could happen and it would all be C compliant.

    But, even if that unsigned underflow I show above is defined behavior and legitimate C code, the code as a whole is not doing what we want. It's wrong, a bug.
  • Heater. wrote: »
    Certainly the C language defines what should happen when unsigned ints overflow or underflow.

    Actually it doesn't. C follows the usual CPU convention of being agnostic as to whether ints are signed or unsigned for addition and subtraction, so those operations are "circular." Whether MSB set means negative or bigger is determined by how you print it out and how you do other operations like multiply and divide. The actual addition and subtraction operations are the same and roll over. Confusion on this point is why 32-bit systems are often considered to be limited to 2 gigabytes instead of 4 -- much code invisibly uses signed math for multiplication, causing errors in the calculation of addresses with MSB set.

  • localroger wrote: »
    Heater. wrote: »
    Certainly the C language defines what should happen when unsigned ints overflow or underflow.

    Actually it doesn't. C follows the usual CPU convention of being agnostic as to whether ints are signed or unsigned for addition and subtraction, so those operations are "circular." Whether MSB set means negative or bigger is determined by how you print it out and how you do other operations like multiply and divide. The actual addition and subtraction operations are the same and roll over. Confusion on this point is why 32-bit systems are often considered to be limited to 2 gigabytes instead of 4 -- much code invisibly uses signed math for multiplication, causing errors in the calculation of addresses with MSB set.

    I do not know all the definitions. Can you please recommend a pdf on the topic?

    I have not replied yet because I feared that I had made a mistake. So I tested it. Regardless of the definitions, Heater was correct on two points:
    1) The underflow that I was counting on failed and an int64_t is the correct choice
    2) I should have made the logical jump to incrementing and decrementing by 1 as it allows for the maximum range and any adjustments can be made right before printing as that only happens once

    So same code basically, just make it an int64_t, increment by 1, cast double and multiply the result by 305/10^6

    Did I understand the fix correctly @heater?
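
The fix summarised above could look roughly like this, with the pin sampling factored out so the logic can be tried on a PC. This is only a sketch: `decode_step`, `count_pulses` and `count_to_metres` are hypothetical names, and the Propeller-specific parts (`INA`, `cogstart`, serial output) are left out on purpose.

```c
#include <stdint.h>

#define MICROMETRES_PER_PULSE 305   /* 0.000305 m, the step used in the thread */

/* Same decision as the original loop: act only on a falling edge of B
   and take the direction from A. Returns +1, -1 or 0. */
int decode_step(int prev_b, int a, int b)
{
    if (b != prev_b && b == 0)
        return (a == 1) ? 1 : -1;
    return 0;
}

/* Feeds n samples of the A and B channels through decode_step and
   returns the signed pulse count. initial_b is the B level before
   the first sample. */
int64_t count_pulses(const int *a, const int *b, int n, int initial_b)
{
    int64_t count = 0;
    int prev_b = initial_b;
    for (int i = 0; i < n; i++) {
        count += decode_step(prev_b, a[i], b[i]);
        prev_b = b[i];
    }
    return count;
}

/* Scaling happens once, when the value is displayed. */
double count_to_metres(int64_t count)
{
    return (double)count * MICROMETRES_PER_PULSE / 1000000.0;
}
```

In the encoder cog the loop body would then reduce to adding the result of `decode_step` to a signed counter, and the main loop would print `count_to_metres` of that counter.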
  • Heater. Posts: 21,230
    edited 2018-02-15 06:58
    It's all in the C and C++ standards documents.

    https://www.iso.org/standard/57853.html
    http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf
    http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2013/n3690.pdf
    https://isocpp.org/std/the-standard

    Contrary to what localroger says, wraparound behavior using unsigned integers is legal and well-defined, and there are code idioms that deliberately use it. On the other hand, C and C++ have undefined semantics for signed overflow and shift past bitwidth.

    The above phrasing shamelessly borrowed from this great document on the subject:
    https://www.cs.utah.edu/~regehr/papers/overflow12.pdf

    As localroger says, there is confusion on this point.

    Sounds like you have the hang of it mikeologist.
  • Heater. wrote: »
    It's all in the C and C++ standards documents.

    https://www.iso.org/standard/57853.html
    http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf
    http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2013/n3690.pdf
    https://isocpp.org/std/the-standard

    Contrary to what localroger says, wraparound behavior using unsigned integers is legal and well-defined, and there are code idioms that
    deliberately use it. On the other hand, C and C++ have undefined semantics for signed overflow and shift past bitwidth.

    The above phrasing shamelessly borrowed from this great document on the subject:
    https://www.cs.utah.edu/~regehr/papers/overflow12.pdf

    Sounds like you have the hang of it mikeologist.

    Sweet, thanks
    I'll try to update OPs code and tag you
  • mikeologist wrote: »
    This compiles just fine. Think I got it all. There was an errant variable sayac that I commented out.

    Hi everyone,
    First, thank you for your help. I solved my problem using a uint64_t variable, but I do a lot of math on other cogs and I don't want to slow my program down. Another problem: I use a 2500-pulse incremental rotary encoder. The QuickStart board works well up to 120 m/min, but it cannot count the pulses when the speed exceeds 120 m/min. Does the QuickStart board have a high-speed counter pin?
  • todieforteo wrote: »
    Does the QuickStart board have a high-speed counter pin?

    Here you are reading the pin and using a total of 4 clocks to do so.
    Read about the counter in the prop manual. You can find loads of posts here on how to use it. The counter is waaaay faster.
    I don't have a C example handy. I use PASM directly.
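
A quick back-of-envelope check of the numbers in the question. It assumes the 0.000305 m step from the earlier code is the distance per counted pulse, which is my reading of the two posts rather than something stated outright. `pulse_rate_hz` is a made-up helper name.

```c
/* Counted pulses per second for a given line speed.
   speed_m_per_min:  line speed in metres per minute
   metres_per_pulse: distance represented by one counted pulse */
double pulse_rate_hz(double speed_m_per_min, double metres_per_pulse)
{
    return (speed_m_per_min / 60.0) / metres_per_pulse;
}
```

Under that assumption, 120 m/min works out to roughly 6,600 counted pulses per second, so the polling loop has on the order of 150 microseconds per pulse. Whether it keeps up depends on how long one pass through the loop takes.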