Forum Update - Announcement about May 10th, 2018 update and your password.

Any ways to speed up Float32?

135

Comments

  • lonesocklonesock Posts: 887
    edited September 2010 Vote Up0Vote Down
    Hmmm, I plotted the errors for sine, and the results are weird and localized, as in it's most likely a programming bug on my part. I will look into this. I should have the same accuracy as the original version. And, of course, the cos and tan code use the same algorithm, so hopefully if I fix one I'll fix them all.

    Thanks so much for the testing!

    Jonathan
    Free time status: see my avatar [8^)
    F32 - fast & concise floating point: OBEX, Thread
    Unrelated to the prop: KISSlicer
  • lonesocklonesock Posts: 887
    edited September 2010 Vote Up0Vote Down
    I just updated the attachment in the first post. I had a bug in my sine table lookup/interpolation code, which only happened in 2 of the 4 quadrants. Given that error, I just went through all the functions I modified, and checked the values of my code and the original code against Excel's authoritative answer key. Results:

    Same error, but faster: sin, cos, tan, sqrt, mul, div, log, exp

    Less error: ATan2, ATan, ASin, ACos

    There were no code changes to the other functions (that I remember). So, basically for the arc* routines, the CORDIC routine is both faster and more accurate than the 5th order polynomial approximation (and the needed sqrt(1-x*2) conversions pre and post).

    Thanks again for kicking over this issue, John!

    Jonathan
    Free time status: see my avatar [8^)
    F32 - fast & concise floating point: OBEX, Thread
    Unrelated to the prop: KISSlicer
  • lonesocklonesock Posts: 887
    edited September 2010 Vote Up0Vote Down
    New version uploaded. I changed the name to F32 (where the brevity is _supposed_ to imply smaller code). The CORDIC routine is faster for all the arc* stuff (used a longer table but shortened the inner loop so it's faster). The table interpolation code is a bit faster, so all the log/pow/trig functions should be a bit faster, and Tan is faster in that I reused a portion of the processing results from _Sin when computing _Cos.

    I did implement a full CORDIC version of Sin/Cos/Tan, and it did deliver an order of magnitude accuracy improvement, but the speed was ~3x slower (also slowed down the arc* stuff, as I had to share the CORDIC code to get it to fit in-cog), so I pulled that code back out. If somebody wants a high-precision F32, I can make that an option, but would require more testing, and I would want to convert the Pow/Log stuff to CORDIC as well. Any *need* for this? I'm not really inclined to go much farther on it.

    Finally, I have not implemented the user-defined function mechanism. Float32Full has it, and I was wondering if anybody needs and/or wants it. It seems a reasonably useful thing to have, and I'm inclined to use the remaining 26 longs in-cog to yield this functionality...any votes? (Note that the exact mechanism will probably have to change a bit.)

    Jonathan
    Free time status: see my avatar [8^)
    F32 - fast & concise floating point: OBEX, Thread
    Unrelated to the prop: KISSlicer
  • Heater.Heater. Posts: 20,781
    edited September 2010 Vote Up0Vote Down
    Lonesock,

    More amazing progress.

    One thing I noticed missing is a rounding that returns a float as appears in the C maths library.
    float roundf(float x);

    I can make one from cmdFTruncRound and cmdFFloat but it is slow.

    Any chance you can convert F32.spin to normal ASCII so that simple text editors don't throw up?

    Zog votes for speed over accuracy. If a C user wants the accuracy with Zog they can use the normal math library sin/cos/tan and take the speed hit.

    No opinion on user defined functions. Does anybody use them?
  • lonesocklonesock Posts: 887
    edited September 2010 Vote Up0Vote Down
    OK, FloatTrunc and FloatRound added, ASCII only, CRLF endings (so notepad likes it).

    Jonathan
    Free time status: see my avatar [8^)
    F32 - fast & concise floating point: OBEX, Thread
    Unrelated to the prop: KISSlicer
  • jazzedjazzed Posts: 11,803
    edited September 2010 Vote Up0Vote Down
    Nice work Jonathan. Can't wait to use it with ZOG.
  • John AbshierJohn Abshier Posts: 1,066
    edited September 2010 Vote Up0Vote Down
    What is the difference between FTrunc and FloatTrunc? Comments say they do the same thing, but b has a different value.

    John Abshier
  • lonesocklonesock Posts: 887
    edited September 2010 Vote Up0Vote Down
    Thanks, all. Sorry, John...copy-n-paste laziness has struck again. [8^) the FTrunc & FRound versions return a whole number as a signed integer. FloatTrunc and FloatRound return the whole number in floating-point format. Comments have now been updated...sorry again.

    Jonathan
    Free time status: see my avatar [8^)
    F32 - fast & concise floating point: OBEX, Thread
    Unrelated to the prop: KISSlicer
  • Heater.Heater. Posts: 20,781
    edited September 2010 Vote Up0Vote Down
    I hope the comments are wrong.

    One of them should return the integer result as an actual integer type, the other should return it in floating point format.

    Not sure if we need them both.
    The GCC soft-float calls are need are:

    int __fixsfsi (float a) - convert a to a signed integer, rounding toward zero

    unsigned int __fixunssfsi (float a) - convert a to an unsigned integer, rounding toward zero. Negative values all become zero.

    The C standard math functions are:

    float truncf(float x) - round x to the nearest integer not larger in absolute value.

    float roundf(float x) - round x to the nearest integer, but round halfway cases away from zero
  • lonesocklonesock Posts: 887
    edited September 2010 Vote Up0Vote Down
    Heater, right now the only thing you are missing is the "unsigned int __fixunssfsi (float a)" functionality...do you need this? (You can get most of the way there with FTrunc followed by a PASM min statement, but of course you lose that top bit of your range.)

    Jonathan
    Free time status: see my avatar [8^)
    F32 - fast & concise floating point: OBEX, Thread
    Unrelated to the prop: KISSlicer
  • Heater.Heater. Posts: 20,781
    edited September 2010 Vote Up0Vote Down
    Does Zog need __fixunssfsi?

    Possibly yes:

    Using the ZPU soft float implementation of __fixunssfsi takes 116 zpu instructions which make 4 calls. So it's big and slow.

    That top bit of range seems quite important.

    Just now I have no way to get any PASM in the path. As I'm just using C code to read/write the memory interface LONGs.

    Just for fun here is the ZPU softfloat __fixunssfsi:

    __fixunssfsi():
        193d:    8c              im 12
        193e:    08              load
        193f:    02              pushsp
        1940:    8c              im 12
        1941:    0c              store
        1942:    fd              im -3
        1943:    3d              pushspadd
        1944:    0d              popsp
        1945:    81              im 1
        1946:    f2              im -14
        1947:    0a              flip
        1948:    0b              nop
        1949:    83              im 3
        194a:    3d              pushspadd
        194b:    0c              store
        194c:    8c              im 12
        194d:    08              load
        194e:    88              im 8
        194f:    05              add
        1950:    08              load
        1951:    02              pushsp
        1952:    84              im 4
        1953:    05              add
        1954:    0c              store
        1955:    e0              im -32
        1956:    e2              im -30
        1957:    3f              callpcrel
        1958:    80              im 0
        1959:    08              load
        195a:    53              storesp 12
        195b:    72              loadsp 8
        195c:    80              im 0
        195d:    25              lessthanorequal
        195e:    83              im 3
        195f:    38              neqbranch
        1960:    ae              im 46
        1961:    39              poppcrel
    
    00001962 <.L3>:
    .L3():
        1962:    81              im 1
        1963:    f3              im -13
        1964:    0a              flip
        1965:    0b              nop
        1966:    83              im 3
        1967:    3d              pushspadd
        1968:    0c              store
        1969:    8c              im 12
        196a:    08              load
        196b:    88              im 8
        196c:    05              add
        196d:    08              load
        196e:    02              pushsp
        196f:    84              im 4
        1970:    05              add
        1971:    0c              store
        1972:    db              im -37
        1973:    d1              im -47
        1974:    3f              callpcrel
        1975:    80              im 0
        1976:    08              load
        1977:    53              storesp 12
        1978:    72              loadsp 8
        1979:    02              pushsp
        197a:    84              im 4
        197b:    05              add
        197c:    0c              store
        197d:    dd              im -35
        197e:    c5              im -59
        197f:    3f              callpcrel
        1980:    80              im 0
        1981:    08              load
        1982:    81              im 1
        1983:    0a              flip
        1984:    11              addsp 4
        1985:    70              loadsp 0
        1986:    8c              im 12
        1987:    08              load
        1988:    fc              im -4
        1989:    05              add
        198a:    0c              store
        198b:    51              storesp 4
        198c:    53              storesp 12
        198d:    96              im 22
        198e:    39              poppcrel
    
    0000198f <.L2>:
    .L2():
        198f:    8c              im 12
        1990:    08              load
        1991:    88              im 8
        1992:    05              add
        1993:    08              load
        1994:    02              pushsp
        1995:    84              im 4
        1996:    05              add
        1997:    0c              store
        1998:    dd              im -35
        1999:    aa              im 42
        199a:    3f              callpcrel
        199b:    80              im 0
        199c:    08              load
        199d:    70              loadsp 0
        199e:    8c              im 12
        199f:    08              load
        19a0:    fc              im -4
        19a1:    05              add
        19a2:    0c              store
        19a3:    53              storesp 12
    
    000019a4 <.L1>:
    .L1():
        19a4:    8c              im 12
        19a5:    08              load
        19a6:    fc              im -4
        19a7:    05              add
        19a8:    08              load
        19a9:    80              im 0
        19aa:    0c              store
        19ab:    85              im 5
        19ac:    3d              pushspadd
        19ad:    0d              popsp
        19ae:    8c              im 12
        19af:    0c              store
        19b0:    04              poppc
    
  • lonesocklonesock Posts: 887
    edited September 2010 Vote Up0Vote Down
    OK, I just uploaded a new version with an added function: UintTrunc. It returns an unsigned int from the floating point input. (I'm still not sure how Zog will handle this, as I thought all Zog variables were treated as signed ints, but that's probably much easier for GCC to workaround than the semi-hideous cast code you showed me [8^).

    There are only 4 longs free in the cog right now...some more can be saved by re-slowing some of the functions, but unless more functionality needs to be crammed it there, we might be at the release point (after some serious testing).

    Speaking of release, I have a question (especially if any mods are watching this thread): What is the best way to publish this? I could stick it in the OBEX, but it might be nice to bundle it with the Propeller Tool (and maybe replace the Float32/Float32All code once it get's approved...any way to reach of Cam?).

    thanks for your input,
    Jonathan
    Free time status: see my avatar [8^)
    F32 - fast & concise floating point: OBEX, Thread
    Unrelated to the prop: KISSlicer
  • Mike GreenMike Green Posts: 22,735
    edited September 2010 Vote Up0Vote Down
    I'd start by posting it to the OBEX. That way it'd be available in a formal fashion in the same place as Float32.

    You might use Float32's documentation as a model to document your version, copy what you can, change what needs to be changed, and leave out what doesn't apply.
  • Heater.Heater. Posts: 20,781
    edited September 2010 Vote Up0Vote Down
    lonesock,

    You are right, the ZPU is tuned for signed 32 bit ints.

    I have noticed that zpu-gcc generates some pretty bloaty code when using unsigned ints. For example it will insert long winded routines for unsigned mod and div operators. I have not really explored this fully.

    This is something we just have to live with. It could be tackled by adding yet another COG to handle all those GCC generated routines instead. Or they could be handled by LMM PASM by the Zog GOG. I'm really not into doing either of those.

    So it is possible I have put you to the trouble of creating UIntTrunc for nothing, that is to say the benefit it gives to Zog is overshadowed by all the other bloat that using unsigned its entails. My apologies if so.

    Now back to integrating and testing...
  • lonesocklonesock Posts: 887
    edited September 2010 Vote Up0Vote Down
    Hey, Heater, no worries, I'm just glad everything fit into a single cog!

    And thanks, Mike. Once all the features are in, I'll polish up the documentation, call for final testing, then post it to the OBEX.

    Jonathan
    Free time status: see my avatar [8^)
    F32 - fast & concise floating point: OBEX, Thread
    Unrelated to the prop: KISSlicer
  • Heater.Heater. Posts: 20,781
    edited October 2010 Vote Up0Vote Down
    I've just been testing the C soft-float function __fixunssfsi which is using the cmdUintTrunc command from F32.

    Now as far as I can tell it works as per the GCC soft float documentation "... convert a to an unsigned integer, rounding toward zero. Negative values all become zero. "

    But I found that the using the zpu-gcc softfloat version cause my test to fail. More strangely calling __fixunssfsi on my Debian Linux box also fails!!!

    Here is a cut down version of my test which fails on my PC:
    #include <stdio.h>
    
    void passif(char* test, int status)
    {
      printf("%s ", test);
      if (status)
        printf ("OK\n");
      else
        printf ("FAIL\n");
    }
    
    int main (int c, char* argv[])
    {
      float t;
      unsigned int u;
    
      // Test __fixunssfsi
      t = 147.6;
      u = __fixunssfsi(t);
      passif ("__fixunssfsi (147.6)", u == 147); 
      printf("%u\n", u);
      
      t = -147.6;
      u = __fixunssfsi(t);
      passif ("__fixunssfsi (-147.6)", u == 0); 
      printf("%u\n", u);
    
      return(0);
    }
    

    and here is the result:
    __fixunssfsi (147.6) FAIL
    2
    __fixunssfsi (-147.6) FAIL
    2
    
    

    Looking at some assembler produced when compiling with -msoft-float I can see that __fixunssfsi is indeed what GCC inserts to assign a float to an unsigned it.

    Anyone got any idea what goes on here?
  • lonesocklonesock Posts: 887
    edited October 2010 Vote Up0Vote Down
    That is bizarre...any updates on this issue?

    Jonathan
    Free time status: see my avatar [8^)
    F32 - fast & concise floating point: OBEX, Thread
    Unrelated to the prop: KISSlicer
  • Heater.Heater. Posts: 20,781
    edited October 2010 Vote Up0Vote Down
    Lonesock,

    Not yet. I've been stuck for ages tracking down a total Zog failure which now looks like it's down to the zpu-gcc optimizer screwing up on -Os (optimise for size). Hope I can start to move forward again now.
  • lonesocklonesock Posts: 887
    edited October 2010 Vote Up0Vote Down
    Gotcha. Out of curiosity, are you using Zylin's "latest stable" or "latest snapshot" (or something else entirely) version of the toolchain?

    Jonathan
    Free time status: see my avatar [8^)
    F32 - fast & concise floating point: OBEX, Thread
    Unrelated to the prop: KISSlicer
  • Heater.Heater. Posts: 20,781
    edited October 2010 Vote Up0Vote Down
    Don't remember. It was a source snapshot from a while back I think. I'm doing a git clone at this moment.
  • Heater.Heater. Posts: 20,781
    edited October 2010 Vote Up0Vote Down
    No luck with the latest zpu-gcc still fails with -Os.
  • LawsonLawson Posts: 870
    edited January 2012 Vote Up0Vote Down
    lonesock wrote: »
    I did implement a full CORDIC version of Sin/Cos/Tan, and it did deliver an order of magnitude accuracy improvement, but the speed was ~3x slower (also slowed down the arc* stuff, as I had to share the CORDIC code to get it to fit in-cog), so I pulled that code back out. If somebody wants a high-precision F32, I can make that an option, but would require more testing, and I would want to convert the Pow/Log stuff to CORDIC as well. Any *need* for this? I'm not really inclined to go much farther on it.

    Jonathan

    I'd really love a CORDIC log function. I'm using F32 with high precision thermistor measurements and I'm linearising with the Stienhart-Hart equation. I've got at least 16 stable bits in my measurements so getting the full ~23-bits of precision of a 32-bit float is important to keep rounding errors small.

    Also, I found a bug in Log. It's highly localized to numbers between 4095 and 4096 but the error is large. (input is almost 2^12... some issue with log-table wrap around?)
    input                 F32 result         Matlab result
    4095.761		7.791079		8.317707815
    4095.466		7.995431		8.317635787
    4095.135		8.224144		8.317554963
    4095.196		8.18219 		8.317569858
    4095.934		7.670633		8.317750053
    4095.947		7.662513		8.317753227
    4095.066		8.272864		8.317538113
    4095.27 		8.130764		8.317587928
    4095.477		7.987311		8.317638473
    4095.596		7.904758		8.317667529
    4095.745		7.801905		8.317703909
    4095.971		7.646273		8.317759087
    4095.674		7.850625		8.317686574
    4095.308		8.105051		8.317597207
    4095.651		7.866865		8.317680958
    4095.977		7.640859		8.317760551
    

    Lawson
    micro-power experiments with the propeller.
    Drivers for TAOS TSL3301 line sensor Forum thread, and OBEX
    Lumen Electronic Jewelery Website
    My AWD motorcycle Website and action video
    What I'm paid to work on. UW Lidar Group.
    FME, a Spin-only floating point library with trig, exponential, and logarithm functions. OBEX and Forum.
  • LawsonLawson Posts: 870
    edited January 2012 Vote Up0Vote Down
    Since Lonesock seems to be missing from the internet, I went ahead and fixed the LOG function.

    As I suspected, the problem was with the table interpolation code. It did not correctly deal with the second table address exceeding the bounds of the table. The fix was made more complex because the LOG and ALOG tables are 2048 entries saturating at $1_0000, while the SINE table is 2049 entries saturating at $FFFF. The fix uses 3 instructions and 2 masks using up all the free space in the F32 cog.

    The attached .zip file contains my test program that I used to verify that LOG was fixed, without messing up the accuracy of SIN.
    The attached F32.spin contains my fixes. (along with a version update note, and comments on my fixes)

    Is there any way to get the first post of this thread and OBEX entries for F32.spin updated with these fixes?

    Lawson
    micro-power experiments with the propeller.
    Drivers for TAOS TSL3301 line sensor Forum thread, and OBEX
    Lumen Electronic Jewelery Website
    My AWD motorcycle Website and action video
    What I'm paid to work on. UW Lidar Group.
    FME, a Spin-only floating point library with trig, exponential, and logarithm functions. OBEX and Forum.
  • lonesocklonesock Posts: 887
    edited January 2012 Vote Up0Vote Down
    Awesome! My son is due any day now, so it will be a little while before I do the update, but this will definitely go in...thanks!

    Jonathan
    Free time status: see my avatar [8^)
    F32 - fast & concise floating point: OBEX, Thread
    Unrelated to the prop: KISSlicer
  • Duane DegnDuane Degn Posts: 10,002
    edited January 2012 Vote Up0Vote Down
    Thanks Lawson and Jonathan,

    I use F32 a lot. I just replaced my earlier version with this new one.

    @Jonathan, I hope all goes well on your son's birthday.
  • Duane DegnDuane Degn Posts: 10,002
    edited December 2012 Vote Up0Vote Down
    I've been working on some inverse kinematic code for my hexapod and kept getting a glitch across certain servo angles. I've tracked it down to what looks like a bug in F32.

    Here's the output from a little test program I wrote.
    Tan2(31, 24.5) | F32 = 0.7548) | Float32Full = 0.902
    Tan2(31.1, 24.5) | F32 = 0.7548) | Float32Full = 0.9035
    Tan2(31.2, 24.5) | F32 = 0.7548) | Float32Full = 0.9051
    Tan2(31.3, 24.5) | F32 = 0.7548) | Float32Full = 0.9067
    Tan2(31.4, 24.5) | F32 = 0.7548) | Float32Full = 0.9082
    Tan2(31.5, 24.5) | F32 = 0.7548) | Float32Full = 0.9098
    Tan2(31.6, 24.5) | F32 = 0.7548) | Float32Full = 0.9113
    Tan2(31.7, 24.5) | F32 = 0.7548) | Float32Full = 0.9128
    Tan2(31.8, 24.5) | F32 = 0.7548) | Float32Full = 0.9143
    Tan2(31.9, 24.5) | F32 = 0.7548) | Float32Full = 0.9159
    Tan2(31, 24.4) | F32 = 1.005) | Float32Full = 0.904
    Tan2(31.1, 24.4) | F32 = 0.7548) | Float32Full = 0.9055
    Tan2(31.2, 24.4) | F32 = 0.7548) | Float32Full = 0.9071
    Tan2(31.3, 24.4) | F32 = 0.7548) | Float32Full = 0.9086
    Tan2(31.4, 24.4) | F32 = 0.7548) | Float32Full = 0.9102
    Tan2(31.5, 24.4) | F32 = 0.7548) | Float32Full = 0.9117
    Tan2(31.6, 24.4) | F32 = 0.7548) | Float32Full = 0.9133
    Tan2(31.7, 24.4) | F32 = 0.7548) | Float32Full = 0.9148
    Tan2(31.8, 24.4) | F32 = 0.7548) | Float32Full = 0.9163
    Tan2(31.9, 24.4) | F32 = 0.7548) | Float32Full = 0.9178
    Tan2(31, 24.3) | F32 = 1.005) | Float32Full = 0.906
    Tan2(31.1, 24.3) | F32 = 0.7548) | Float32Full = 0.9075
    Tan2(31.2, 24.3) | F32 = 0.7548) | Float32Full = 0.9091
    Tan2(31.3, 24.3) | F32 = 0.7548) | Float32Full = 0.9106
    Tan2(31.4, 24.3) | F32 = 0.7548) | Float32Full = 0.9122
    Tan2(31.5, 24.3) | F32 = 0.7548) | Float32Full = 0.9137
    Tan2(31.6, 24.3) | F32 = 0.7548) | Float32Full = 0.9153
    Tan2(31.7, 24.3) | F32 = 0.7548) | Float32Full = 0.9168
    Tan2(31.8, 24.3) | F32 = 0.7548) | Float32Full = 0.9183
    Tan2(31.9, 24.3) | F32 = 0.7548) | Float32Full = 0.9198
    Tan2(31, 24.2) | F32 = 1.005) | Float32Full = 0.908
    Tan2(31.1, 24.2) | F32 = 1.005) | Float32Full = 0.9095
    Tan2(31.2, 24.2) | F32 = 0.7548) | Float32Full = 0.9111
    Tan2(31.3, 24.2) | F32 = 0.7548) | Float32Full = 0.9126
    Tan2(31.4, 24.2) | F32 = 0.7548) | Float32Full = 0.9142
    Tan2(31.5, 24.2) | F32 = 0.7548) | Float32Full = 0.9157
    Tan2(31.6, 24.2) | F32 = 0.7548) | Float32Full = 0.9172
    Tan2(31.7, 24.2) | F32 = 0.7548) | Float32Full = 0.9188
    Tan2(31.8, 24.2) | F32 = 0.7548) | Float32Full = 0.9203
    Tan2(31.9, 24.2) | F32 = 0.7548) | Float32Full = 0.9218
    Tan2(31, 24.1) | F32 = 1.005) | Float32Full = 0.91
    Tan2(31.1, 24.1) | F32 = 1.005) | Float32Full = 0.9115
    Tan2(31.2, 24.1) | F32 = 1.005) | Float32Full = 0.9131
    Tan2(31.3, 24.1) | F32 = 0.7548) | Float32Full = 0.9146
    Tan2(31.4, 24.1) | F32 = 0.7548) | Float32Full = 0.9162
    Tan2(31.5, 24.1) | F32 = 0.7548) | Float32Full = 0.9177
    Tan2(31.6, 24.1) | F32 = 0.7548) | Float32Full = 0.9192
    Tan2(31.7, 24.1) | F32 = 0.7548) | Float32Full = 0.9208
    Tan2(31.8, 24.1) | F32 = 0.7548) | Float32Full = 0.9223
    Tan2(31.9, 24.1) | F32 = 0.7548) | Float32Full = 0.9238
    

    You can see how Float32Full has a nice gradual change in the angle from the function while F32 has a rather sharp jump.

    Here's my test program:
    CON
      _CLKMODE = XTAL1 + PLL16X 
      _XINFREQ = 5_000_000
     
    OBJ
     
      F32 : "F32V1_3"
      FloatFull : "Float32Full"
      Pst : "Parallax Serial Terminal"
      FloatStr : "FloatString"
        
    PUB Setup | localIndex
                                                                                  
      Pst.Start(115200)                                                      
      F32.start                                                                  
      FloatFull.start                                                              
      FloatStr.SetPrecision(4)
                                                  
      MainLoop
       
    PUB MainLoop | index1, index2
        
      'Pst.str(string(13, "Start of MainLoop Method "))
      index1 := 25.0 
      repeat
        index2 := 31.0
        repeat 10
          Pst.str(string(13, "Tan2("))
          Pst.str(FloatStr.FloatToString(index2))
          Pst.str(string(", "))
          Pst.str(FloatStr.FloatToString(index1))
          Pst.str(string(") | F32 = "))
          result := F32.ATan2(index2, index1)
          Pst.str(FloatStr.FloatToString(result))
          Pst.str(string(") | Float32Full = "))
          result := FloatFull.ATan2(index2, index1)
          Pst.str(FloatStr.FloatToString(result))
          index2 := F32.FAdd(index2, 0.1)
        index1 := F32.FSub(index1, 0.1)
        Pst.charin
     
    

    I usually use version 1.5 posted in this thread. Version 1.3 from the OBEX gives the same results.

    The "25" and "31" in the above test just happen to be the numbers I was using when I noticed this problem.

    I'll attempt to understand what the ATan2 method is doing, but I'm afraid this fix if beyond me.

    I really like how F32 only uses one cog and I'm hoping I don't have to resort to using the two cog Float32Full.

    If Jonathan or someone else can fix this bug, I'd really appreciate it.

    Thanks,
  • Duane DegnDuane Degn Posts: 10,002
    edited December 2012 Vote Up0Vote Down
    In an attempt to give better test numbers, I changed the test program to use "1.0" as the "b" value in the ATan2 calls. By using "1.0" (essentially turning it into an ATAN call, it returned expected (correct) results.

    It also appears to work correctly using "-1.0" for the second argument.

    For now, I'll just just divide both arguments by the absolute value of the "b" term.
  • Duane DegnDuane Degn Posts: 10,002
    edited December 2012 Vote Up0Vote Down
    Here's my ugly patch of the ATan2 method.

    Edit: The patch doesn't check for a zero "b" term. This is not a good fix. I'll try to fix my patch.
    See post #92 below for a better patch.

    PUB ATan2(a, b)
    {{
      Arc Tangent of vector a, b (in radians, no division is performed, so b==0 is legal).
      Parameters:
        a        32-bit floating point value
        b        32-bit floating point value
      Returns:   32-bit floating point value (angle in radians)
    }}
     [B] result := b & $7FFF_FFFF      ' absolute value
      if b == result
        b := 1.0
      else
        b := -1.0  
      a := FDiv(a, result)
    [/B]result  := cmdATan2
      f32_Cmd := @result
      repeat
      while f32_Cmd
     
    

    Testing to see if the absolute value of b equals b is 1632 clocks faster than just dividing b by the absolute value of b.

    I hope Jonathan doesn't mind my posting a patched version of F32 here until it can be fixed properly.

    I also have a modified version of F32 that allows it to be used from multiple objects and multiple cogs. In order to do so, I added a lock. I'm sure the lock slows things down a bit so I wouldn't suggest using my modified version unless you need it.

    Edit: I haven't tested the multi cog version well enough. I suspect it has problems so I removed it.
    Edit(12/17/12): I've also removed the modified F32 object from this post. A better modified version is attached to post #92.
  • Duane DegnDuane Degn Posts: 10,002
    edited December 2012 Vote Up0Vote Down
    It looks like one person has downloaded my supposed "fix". My patch doesn't check for a zero "b" term. I'll try to fix my patch soon, but be warned the current version doesn't handle all possible values correctly.
  • SRLMSRLM Posts: 5,045
    edited December 2012 Vote Up0Vote Down
    Duane Degn wrote: »
    It looks like one person has downloaded my supposed "fix". My patch doesn't check for a zero "b" term. I'll try to fix my patch soon, but be warned the current version doesn't handle all possible values correctly.

    I'm interested :) I have a version of F32 that I'm getting ready to post that has a built in "interpreter" to do sequences of float operations at PASM speed without needing the Spin calling cog. However, it relies on the fact that all of the F32 commands are standardized, and don't have extra functionality outside of the PASM.
Sign In or Register to comment.