Hmmm, I plotted the errors for sine, and the results are weird and localized, as in it's most likely a programming bug on my part. I will look into this. I should have the same accuracy as the original version. And, of course, the cos and tan code use the same algorithm, so hopefully if I fix one I'll fix them all.
I just updated the attachment in the first post. I had a bug in my sine table lookup/interpolation code, which only happened in 2 of the 4 quadrants. Given that error, I just went through all the functions I modified, and checked the values of my code and the original code against Excel's authoritative answer key. Results:
Same error, but faster: sin, cos, tan, sqrt, mul, div, log, exp
Less error: ATan2, ATan, ASin, ACos
There were no code changes to the other functions (that I remember). So, basically for the arc* routines, the CORDIC routine is both faster and more accurate than the 5th-order polynomial approximation (and the needed sqrt(1 - x^2) conversions pre and post).
New version uploaded. I changed the name to F32 (where the brevity is _supposed_ to imply smaller code). The CORDIC routine is faster for all the arc* stuff (used a longer table but shortened the inner loop so it's faster). The table interpolation code is a bit faster, so all the log/pow/trig functions should be a bit faster, and Tan is faster in that I reused a portion of the processing results from _Sin when computing _Cos.
I did implement a full CORDIC version of Sin/Cos/Tan, and it did deliver an order of magnitude accuracy improvement, but the speed was ~3x slower (also slowed down the arc* stuff, as I had to share the CORDIC code to get it to fit in-cog), so I pulled that code back out. If somebody wants a high-precision F32, I can make that an option, but would require more testing, and I would want to convert the Pow/Log stuff to CORDIC as well. Any *need* for this? I'm not really inclined to go much farther on it.
Finally, I have not implemented the user-defined function mechanism. Float32Full has it, and I was wondering if anybody needs and/or wants it. It seems a reasonably useful thing to have, and I'm inclined to use the remaining 26 longs in-cog to yield this functionality...any votes? (Note that the exact mechanism will probably have to change a bit.)
Thanks, all. Sorry, John...copy-n-paste laziness has struck again. [8^) the FTrunc & FRound versions return a whole number as a signed integer. FloatTrunc and FloatRound return the whole number in floating-point format. Comments have now been updated...sorry again.
Heater, right now the only thing you are missing is the "unsigned int __fixunssfsi (float a)" functionality...do you need this? (You can get most of the way there with FTrunc followed by a PASM min statement, but of course you lose that top bit of your range.)
OK, I just uploaded a new version with an added function: UintTrunc. It returns an unsigned int from the floating point input. (I'm still not sure how Zog will handle this, as I thought all Zog variables were treated as signed ints, but that's probably much easier for GCC to work around than the semi-hideous cast code you showed me [8^).
There are only 4 longs free in the cog right now...some more can be saved by re-slowing some of the functions, but unless more functionality needs to be crammed in there, we might be at the release point (after some serious testing).
Speaking of release, I have a question (especially if any mods are watching this thread): What is the best way to publish this? I could stick it in the OBEX, but it might be nice to bundle it with the Propeller Tool (and maybe replace the Float32/Float32All code once it gets approved...any way to reach Cam?).
I'd start by posting it to the OBEX. That way it'd be available in a formal fashion in the same place as Float32.
You might use Float32's documentation as a model to document your version, copy what you can, change what needs to be changed, and leave out what doesn't apply.
You are right, the ZPU is tuned for signed 32 bit ints.
I have noticed that zpu-gcc generates some pretty bloaty code when using unsigned ints. For example it will insert long winded routines for unsigned mod and div operators. I have not really explored this fully.
This is something we just have to live with. It could be tackled by adding yet another COG to handle all those GCC generated routines instead. Or they could be handled by LMM PASM by the Zog COG. I'm really not into doing either of those.
So it is possible I have put you to the trouble of creating UIntTrunc for nothing, that is to say the benefit it gives to Zog is overshadowed by all the other bloat that using unsigned ints entails. My apologies if so.
I've just been testing the C soft-float function __fixunssfsi which is using the cmdUintTrunc command from F32.
Now as far as I can tell it works as per the GCC soft float documentation "... convert a to an unsigned integer, rounding toward zero. Negative values all become zero. "
But I found that using the zpu-gcc softfloat version causes my test to fail. More strangely, calling __fixunssfsi on my Debian Linux box also fails!
Here is a cut down version of my test which fails on my PC:
#include <stdio.h>

/* libgcc soft-float helper; declared here so this compiles cleanly */
extern unsigned int __fixunssfsi(float a);

void passif(char* test, int status)
{
    printf("%s ", test);
    if (status)
        printf("OK\n");
    else
        printf("FAIL\n");
}

int main(int argc, char* argv[])
{
    float t;
    unsigned int u;

    // Test __fixunssfsi
    t = 147.6;
    u = __fixunssfsi(t);
    passif("__fixunssfsi (147.6)", u == 147);
    printf("%u\n", u);

    t = -147.6;
    u = __fixunssfsi(t);
    passif("__fixunssfsi (-147.6)", u == 0);
    printf("%u\n", u);

    return 0;
}
Looking at some assembler produced when compiling with -msoft-float I can see that __fixunssfsi is indeed what GCC inserts to assign a float to an unsigned int.
Not yet. I've been stuck for ages tracking down a total Zog failure which now looks like it's down to the zpu-gcc optimizer screwing up on -Os (optimise for size). Hope I can start to move forward again now.
I'd really love a CORDIC log function. I'm using F32 with high precision thermistor measurements and I'm linearising with the Steinhart-Hart equation. I've got at least 16 stable bits in my measurements so getting the full ~23 bits of precision of a 32-bit float is important to keep rounding errors small.
Also, I found a bug in Log. It's highly localized to numbers between 4095 and 4096 but the error is large. (input is almost 2^12... some issue with log-table wrap around?)
Since Lonesock seems to be missing from the internet, I went ahead and fixed the LOG function.
As I suspected, the problem was with the table interpolation code. It did not correctly deal with the second table address exceeding the bounds of the table. The fix was made more complex because the LOG and ALOG tables are 2048 entries saturating at $1_0000, while the SINE table is 2049 entries saturating at $FFFF. The fix uses 3 instructions and 2 masks, using up all the free space in the F32 cog.
The attached .zip file contains my test program that I used to verify that LOG was fixed, without messing up the accuracy of SIN.
The attached F32.spin contains my fixes. (along with a version update note, and comments on my fixes)
Is there any way to get the first post of this thread and OBEX entries for F32.spin updated with these fixes?
I've been working on some inverse kinematic code for my hexapod and kept getting a glitch across certain servo angles. I've tracked it down to what looks like a bug in F32.
Here's the output from a little test program I wrote.
In an attempt to give better test numbers, I changed the test program to use "1.0" as the "b" value in the ATan2 calls. By using "1.0" (essentially turning it into an ATan call), it returned the expected (correct) results.
It also appears to work correctly using "-1.0" for the second argument.
For now, I'll just divide both arguments by the absolute value of the "b" term.
Edit: The patch doesn't check for a zero "b" term. This is not a good fix. I'll try to fix my patch.
See post #92 below for a better patch.
PUB ATan2(a, b)
{{
  Arc Tangent of vector a, b (in radians; no division is performed, so b==0 is legal).
  Parameters:
    a  32-bit floating point value
    b  32-bit floating point value
  Returns: 32-bit floating point value (angle in radians)
}}
  result := b & $7FFF_FFFF                       ' absolute value of b
  if b == result
    b := 1.0
  else
    b := -1.0
  a := FDiv(a, result)
  result := cmdATan2
  f32_Cmd := @result
  repeat
  while f32_Cmd
Testing to see if the absolute value of b equals b is 1632 clocks faster than just dividing b by the absolute value of b.
I hope Jonathan doesn't mind my posting a patched version of F32 here until it can be fixed properly.
I also have a modified version of F32 that allows it to be used from multiple objects and multiple cogs. In order to do so, I added a lock. I'm sure the lock slows things down a bit so I wouldn't suggest using my modified version unless you need it.
Edit: I haven't tested the multi cog version well enough. I suspect it has problems so I removed it.
Edit(12/17/12): I've also removed the modified F32 object from this post. A better modified version is attached to post #92.
It looks like one person has downloaded my supposed "fix". My patch doesn't check for a zero "b" term. I'll try to fix my patch soon, but be warned the current version doesn't handle all possible values correctly.
I'm interested. I have a version of F32 that I'm getting ready to post that has a built-in "interpreter" to do sequences of float operations at PASM speed without needing the Spin calling cog. However, it relies on the fact that all of the F32 commands are standardized and don't have extra functionality outside of the PASM.
Comments
Thanks so much for the testing!
Jonathan
Thanks again for kicking over this issue, John!
Jonathan
More amazing progress.
One thing I noticed missing is a rounding function that returns a float, as appears in the C maths library.
I can make one from cmdFTruncRound and cmdFFloat but it is slow.
Any chance you can convert F32.spin to normal ASCII so that simple text editors don't throw up?
Zog votes for speed over accuracy. If a C user wants the accuracy with Zog they can use the normal math library sin/cos/tan and take the speed hit.
No opinion on user defined functions. Does anybody use them?
Jonathan
John Abshier
Jonathan
One of them should return the integer result as an actual integer type, the other should return it in floating point format.
Not sure if we need them both.
The GCC soft-float calls needed are:
int __fixsfsi (float a) - convert a to a signed integer, rounding toward zero
unsigned int __fixunssfsi (float a) - convert a to an unsigned integer, rounding toward zero. Negative values all become zero.
The C standard math functions are:
float truncf(float x) - round x to the nearest integer not larger in absolute value.
float roundf(float x) - round x to the nearest integer, but round halfway cases away from zero
Jonathan
Possibly yes:
Using the ZPU soft float implementation of __fixunssfsi takes 116 zpu instructions which make 4 calls. So it's big and slow.
That top bit of range seems quite important.
Just now I have no way to get any PASM in the path. As I'm just using C code to read/write the memory interface LONGs.
Just for fun here is the ZPU softfloat __fixunssfsi:
__fixunssfsi():
    193d: 8c  im 12
    193e: 08  load
    193f: 02  pushsp
    1940: 8c  im 12
    1941: 0c  store
    1942: fd  im -3
    1943: 3d  pushspadd
    1944: 0d  popsp
    1945: 81  im 1
    1946: f2  im -14
    1947: 0a  flip
    1948: 0b  nop
    1949: 83  im 3
    194a: 3d  pushspadd
    194b: 0c  store
    194c: 8c  im 12
    194d: 08  load
    194e: 88  im 8
    194f: 05  add
    1950: 08  load
    1951: 02  pushsp
    1952: 84  im 4
    1953: 05  add
    1954: 0c  store
    1955: e0  im -32
    1956: e2  im -30
    1957: 3f  callpcrel
    1958: 80  im 0
    1959: 08  load
    195a: 53  storesp 12
    195b: 72  loadsp 8
    195c: 80  im 0
    195d: 25  lessthanorequal
    195e: 83  im 3
    195f: 38  neqbranch
    1960: ae  im 46
    1961: 39  poppcrel

00001962 <.L3>:
.L3():
    1962: 81  im 1
    1963: f3  im -13
    1964: 0a  flip
    1965: 0b  nop
    1966: 83  im 3
    1967: 3d  pushspadd
    1968: 0c  store
    1969: 8c  im 12
    196a: 08  load
    196b: 88  im 8
    196c: 05  add
    196d: 08  load
    196e: 02  pushsp
    196f: 84  im 4
    1970: 05  add
    1971: 0c  store
    1972: db  im -37
    1973: d1  im -47
    1974: 3f  callpcrel
    1975: 80  im 0
    1976: 08  load
    1977: 53  storesp 12
    1978: 72  loadsp 8
    1979: 02  pushsp
    197a: 84  im 4
    197b: 05  add
    197c: 0c  store
    197d: dd  im -35
    197e: c5  im -59
    197f: 3f  callpcrel
    1980: 80  im 0
    1981: 08  load
    1982: 81  im 1
    1983: 0a  flip
    1984: 11  addsp 4
    1985: 70  loadsp 0
    1986: 8c  im 12
    1987: 08  load
    1988: fc  im -4
    1989: 05  add
    198a: 0c  store
    198b: 51  storesp 4
    198c: 53  storesp 12
    198d: 96  im 22
    198e: 39  poppcrel

0000198f <.L2>:
.L2():
    198f: 8c  im 12
    1990: 08  load
    1991: 88  im 8
    1992: 05  add
    1993: 08  load
    1994: 02  pushsp
    1995: 84  im 4
    1996: 05  add
    1997: 0c  store
    1998: dd  im -35
    1999: aa  im 42
    199a: 3f  callpcrel
    199b: 80  im 0
    199c: 08  load
    199d: 70  loadsp 0
    199e: 8c  im 12
    199f: 08  load
    19a0: fc  im -4
    19a1: 05  add
    19a2: 0c  store
    19a3: 53  storesp 12

000019a4 <.L1>:
.L1():
    19a4: 8c  im 12
    19a5: 08  load
    19a6: fc  im -4
    19a7: 05  add
    19a8: 08  load
    19a9: 80  im 0
    19aa: 0c  store
    19ab: 85  im 5
    19ac: 3d  pushspadd
    19ad: 0d  popsp
    19ae: 8c  im 12
    19af: 0c  store
    19b0: 04  poppc
thanks for your input,
Jonathan
Now back to integrating and testing...
And thanks, Mike. Once all the features are in, I'll polish up the documentation, call for final testing, then post it to the OBEX.
Jonathan
Anyone got any idea what goes on here?
Jonathan
Jonathan
Lawson
Lawson
Jonathan
I use F32 a lot. I just replaced my earlier version with this new one.
@Jonathan, I hope all goes well on your son's birthday.
Here's the output from a little test program I wrote.
You can see how Float32Full has a nice gradual change in the angle from the function while F32 has a rather sharp jump.
Here's my test program:
CON
  _CLKMODE = XTAL1 + PLL16X
  _XINFREQ = 5_000_000

OBJ
  F32       : "F32V1_3"
  FloatFull : "Float32Full"
  Pst       : "Parallax Serial Terminal"
  FloatStr  : "FloatString"

PUB Setup | localIndex
  Pst.Start(115200)
  F32.start
  FloatFull.start
  FloatStr.SetPrecision(4)
  MainLoop

PUB MainLoop | index1, index2
  'Pst.str(string(13, "Start of MainLoop Method "))
  index1 := 25.0
  repeat
    index2 := 31.0
    repeat 10
      Pst.str(string(13, "Tan2("))
      Pst.str(FloatStr.FloatToString(index2))
      Pst.str(string(", "))
      Pst.str(FloatStr.FloatToString(index1))
      Pst.str(string(") | F32 = "))
      result := F32.ATan2(index2, index1)
      Pst.str(FloatStr.FloatToString(result))
      Pst.str(string(") | Float32Full = "))
      result := FloatFull.ATan2(index2, index1)
      Pst.str(FloatStr.FloatToString(result))
      index2 := F32.FAdd(index2, 0.1)
      index1 := F32.FSub(index1, 0.1)
    Pst.charin

I usually use version 1.5 posted in this thread. Version 1.3 from the OBEX gives the same results.
The "25" and "31" in the above test just happen to be the numbers I was using when I noticed this problem.
I'll attempt to understand what the ATan2 method is doing, but I'm afraid this fix is beyond me.
I really like how F32 only uses one cog and I'm hoping I don't have to resort to using the two cog Float32Full.
If Jonathan or someone else can fix this bug, I'd really appreciate it.
Thanks,