Hmmm, I plotted the errors for sine, and the results are weird and localized, as in it's most likely a programming bug on my part. I will look into this. I should have the same accuracy as the original version. And, of course, the cos and tan code use the same algorithm, so hopefully if I fix one I'll fix them all.
I just updated the attachment in the first post. I had a bug in my sine table lookup/interpolation code, which only happened in 2 of the 4 quadrants. Given that error, I just went through all the functions I modified, and checked the values of my code and the original code against Excel's authoritative answer key. Results:
Same error, but faster: sin, cos, tan, sqrt, mul, div, log, exp
Less error: ATan2, ATan, ASin, ACos
There were no code changes to the other functions (that I remember). So, basically for the arc* routines, the CORDIC routine is both faster and more accurate than the 5th order polynomial approximation (and the needed sqrt(1-x*x) conversions pre and post).
New version uploaded. I changed the name to F32 (where the brevity is _supposed_ to imply smaller code). The CORDIC routine is faster for all the arc* stuff (used a longer table but shortened the inner loop so it's faster). The table interpolation code is a bit faster, so all the log/pow/trig functions should be a bit faster, and Tan is faster in that I reused a portion of the processing results from _Sin when computing _Cos.
I did implement a full CORDIC version of Sin/Cos/Tan, and it did deliver an order of magnitude accuracy improvement, but the speed was ~3x slower (also slowed down the arc* stuff, as I had to share the CORDIC code to get it to fit in-cog), so I pulled that code back out. If somebody wants a high-precision F32, I can make that an option, but would require more testing, and I would want to convert the Pow/Log stuff to CORDIC as well. Any *need* for this? I'm not really inclined to go much farther on it.
Finally, I have not implemented the user-defined function mechanism. Float32Full has it, and I was wondering if anybody needs and/or wants it. It seems a reasonably useful thing to have, and I'm inclined to use the remaining 26 longs in-cog to yield this functionality...any votes? (Note that the exact mechanism will probably have to change a bit.)
Thanks, all. Sorry, John...copy-n-paste laziness has struck again. [8^) The FTrunc & FRound versions return a whole number as a signed integer. FloatTrunc and FloatRound return the whole number in floating-point format. Comments have now been updated...sorry again.
Heater, right now the only thing you are missing is the "unsigned int __fixunssfsi (float a)" functionality...do you need this? (You can get most of the way there with FTrunc followed by a PASM min statement, but of course you lose that top bit of your range.)
OK, I just uploaded a new version with an added function: UintTrunc. It returns an unsigned int from the floating point input. (I'm still not sure how Zog will handle this, as I thought all Zog variables were treated as signed ints, but that's probably much easier for GCC to work around than the semi-hideous cast code you showed me [8^).
There are only 4 longs free in the cog right now...some more can be saved by re-slowing some of the functions, but unless more functionality needs to be crammed in there, we might be at the release point (after some serious testing).
Speaking of release, I have a question (especially if any mods are watching this thread): What is the best way to publish this? I could stick it in the OBEX, but it might be nice to bundle it with the Propeller Tool (and maybe replace the Float32/Float32All code once it gets approved...any way to reach Cam?).
I'd start by posting it to the OBEX. That way it'd be available in a formal fashion in the same place as Float32.
You might use Float32's documentation as a model to document your version, copy what you can, change what needs to be changed, and leave out what doesn't apply.
You are right, the ZPU is tuned for signed 32 bit ints.
I have noticed that zpu-gcc generates some pretty bloaty code when using unsigned ints. For example, it will insert long-winded routines for unsigned mod and div operators. I have not really explored this fully.
This is something we just have to live with. It could be tackled by adding yet another COG to handle all those GCC generated routines instead. Or they could be handled by LMM PASM by the Zog COG. I'm really not into doing either of those.
So it is possible I have put you to the trouble of creating UIntTrunc for nothing, that is to say the benefit it gives to Zog is overshadowed by all the other bloat that using unsigned ints entails. My apologies if so.
I've just been testing the C soft-float function __fixunssfsi which is using the cmdUintTrunc command from F32.
Now as far as I can tell it works as per the GCC soft float documentation "... convert a to an unsigned integer, rounding toward zero. Negative values all become zero. "
But I found that using the zpu-gcc softfloat version causes my test to fail. More strangely, calling __fixunssfsi on my Debian Linux box also fails!!!
Here is a cut down version of my test which fails on my PC:
#include <stdio.h>

unsigned int __fixunssfsi(float a);   /* libgcc soft-float conversion */

void passif(char* test, int status)
{
    printf("%s ", test);
    if (status)
        printf("OK\n");
    else
        printf("FAIL\n");
}

int main(int argc, char* argv[])
{
    float t;
    unsigned int u;

    // Test __fixunssfsi
    t = 147.6;
    u = __fixunssfsi(t);
    passif("__fixunssfsi (147.6)", u == 147);
    printf("%u\n", u);

    t = -147.6;
    u = __fixunssfsi(t);
    passif("__fixunssfsi (-147.6)", u == 0);
    printf("%u\n", u);

    return 0;
}
Looking at some assembler produced when compiling with -msoft-float I can see that __fixunssfsi is indeed what GCC inserts to assign a float to an unsigned int.
Not yet. I've been stuck for ages tracking down a total Zog failure which now looks like it's down to the zpu-gcc optimizer screwing up on -Os (optimise for size). Hope I can start to move forward again now.
Jonathan
I'd really love a CORDIC log function. I'm using F32 with high precision thermistor measurements and I'm linearising with the Steinhart-Hart equation. I've got at least 16 stable bits in my measurements so getting the full ~23-bits of precision of a 32-bit float is important to keep rounding errors small.
Also, I found a bug in Log. It's highly localized to numbers between 4095 and 4096 but the error is large. (input is almost 2^12... some issue with log-table wrap-around?)
Since Lonesock seems to be missing from the internet, I went ahead and fixed the LOG function.
As I suspected, the problem was with the table interpolation code. It did not correctly deal with the second table address exceeding the bounds of the table. The fix was made more complex because the LOG and ALOG tables are 2048 entries saturating at $1_0000, while the SINE table is 2049 entries saturating at $FFFF. The fix uses 3 instructions and 2 masks, using up all the free space in the F32 cog.
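The hazard being fixed can be illustrated in plain C (a sketch of the general problem with a clamp in the spirit of the fix, not the PASM itself; `table_lerp` is a hypothetical helper):

```c
#include <stdint.h>

/* Sketch of the interpolation bug described above.  With a 2048-entry
 * table, an input landing in the last bucket makes index+1 read past
 * the end of the table; the lookup must instead fall back to the
 * saturation value ($1_0000 for the LOG/ALOG tables). */
#define TABLE_SIZE 2048

uint32_t table_lerp(const uint16_t *tab, uint32_t idx, uint32_t frac)
{
    uint32_t a = tab[idx];
    /* clamp the second index: past the last entry, use the saturation value */
    uint32_t b = (idx + 1 < TABLE_SIZE) ? tab[idx + 1] : 0x10000;
    return (a << 16) + (b - a) * frac;    /* frac is a 16-bit fraction */
}
```

Inputs just under 4096 land exactly in that last bucket of the log table, which matches the localized 4095..4096 error Lawson reported.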
The attached .zip file contains my test program that I used to verify that LOG was fixed, without messing up the accuracy of SIN.
The attached F32.spin contains my fixes. (along with a version update note, and comments on my fixes)
Is there any way to get the first post of this thread and OBEX entries for F32.spin updated with these fixes?
I've been working on some inverse kinematic code for my hexapod and kept getting a glitch across certain servo angles. I've tracked it down to what looks like a bug in F32.
Here's the output from a little test program I wrote.
In an attempt to give better test numbers, I changed the test program to use "1.0" as the "b" value in the ATan2 calls. By using "1.0" (essentially turning it into an ATan call), it returned expected (correct) results.
It also appears to work correctly using "-1.0" for the second argument.
For now, I'll just divide both arguments by the absolute value of the "b" term.
Edit: The patch doesn't check for a zero "b" term. This is not a good fix. I'll try to fix my patch.
See post #92 below for a better patch.
PUB ATan2(a, b)
{{
  Arc Tangent of vector a, b (in radians, no division is performed, so b==0 is legal).
  Parameters:
    a        32-bit floating point value
    b        32-bit floating point value
  Returns:   32-bit floating point value (angle in radians)
}}
  result  := b & $7FFF_FFFF                             ' absolute value
  if b == result
    b := 1.0
  else
    b := -1.0
  a := FDiv(a, result)
  result  := cmdATan2
  f32_Cmd := @result
  repeat
  while f32_Cmd
Testing to see if the absolute value of b equals b is 1632 clocks faster than just dividing b by the absolute value of b.
I hope Jonathan doesn't mind my posting a patched version of F32 here until it can be fixed properly.
I also have a modified version of F32 that allows it to be used from multiple objects and multiple cogs. In order to do so, I added a lock. I'm sure the lock slows things down a bit so I wouldn't suggest using my modified version unless you need it.
Edit: I haven't tested the multi cog version well enough. I suspect it has problems so I removed it.
Edit(12/17/12): I've also removed the modified F32 object from this post. A better modified version is attached to post #92.
It looks like one person has downloaded my supposed "fix". My patch doesn't check for a zero "b" term. I'll try to fix my patch soon, but be warned the current version doesn't handle all possible values correctly.
I'm interested. I have a version of F32 that I'm getting ready to post that has a built-in "interpreter" to do sequences of float operations at PASM speed without needing the Spin calling cog. However, it relies on the fact that all of the F32 commands are standardized, and don't have extra functionality outside of the PASM.
Thanks so much for the testing!
Jonathan
Thanks again for kicking over this issue, John!
Jonathan
More amazing progress.
One thing I noticed missing is a rounding function that returns a float, as appears in the C maths library.
I can make one from cmdFTruncRound and cmdFFloat but it is slow.
Any chance you can convert F32.spin to normal ASCII so that simple text editors don't throw up?
Zog votes for speed over accuracy. If a C user wants the accuracy with Zog they can use the normal math library sin/cos/tan and take the speed hit.
No opinion on user defined functions. Does anybody use them?
Jonathan
John Abshier
Jonathan
One of them should return the integer result as an actual integer type, the other should return it in floating point format.
Not sure if we need them both.
The GCC soft-float calls I need are:
int __fixsfsi (float a) - convert a to a signed integer, rounding toward zero
unsigned int __fixunssfsi (float a) - convert a to an unsigned integer, rounding toward zero. Negative values all become zero.
The C standard math functions are:
float truncf(float x) - round x to the nearest integer not larger in absolute value.
float roundf(float x) - round x to the nearest integer, but round halfway cases away from zero
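The __fixunssfsi contract above can be sketched in C (`fix_unsigned` is a hypothetical stand-in; the saturation choice for out-of-range input is my assumption, as the GCC documentation quoted earlier doesn't pin that case down):

```c
/* Convert toward zero; negatives (and NaN) become zero; values at or
 * above 2^32 saturate here, which is one reasonable choice for input
 * the contract leaves unspecified. */
unsigned int fix_unsigned(float a)
{
    if (!(a > 0.0f))              /* negatives, zero, and NaN -> 0 */
        return 0u;
    if (a >= 4294967296.0f)       /* >= 2^32: saturate */
        return 0xFFFFFFFFu;
    return (unsigned int)a;       /* C float-to-unsigned truncates toward zero */
}
```

The guards matter because in C a cast of a negative or out-of-range float to unsigned is undefined behaviour; the range check is also what an FTrunc-plus-min workaround loses, since FTrunc tops out at the signed-int range.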
Jonathan
Possibly yes:
Using the ZPU soft float implementation of __fixunssfsi takes 116 zpu instructions which make 4 calls. So it's big and slow.
That top bit of range seems quite important.
Just now I have no way to get any PASM in the path, as I'm just using C code to read/write the memory interface LONGs.
Just for fun here is the ZPU softfloat __fixunssfsi:
thanks for your input,
Jonathan
Now back to integrating and testing...
And thanks, Mike. Once all the features are in, I'll polish up the documentation, call for final testing, then post it to the OBEX.
Jonathan
Anyone got any idea what goes on here?
Jonathan
Jonathan
I use F32 a lot. I just replaced my earlier version with this new one.
@Jonathan, I hope all goes well on your son's birthday.
Here's the output from a little test program I wrote.
You can see how Float32Full has a nice gradual change in the angle from the function while F32 has a rather sharp jump.
Here's my test program:
I usually use version 1.5 posted in this thread. Version 1.3 from the OBEX gives the same results.
The "25" and "31" in the above test just happen to be the numbers I was using when I noticed this problem.
I'll attempt to understand what the ATan2 method is doing, but I'm afraid this fix is beyond me.
I really like how F32 only uses one cog and I'm hoping I don't have to resort to using the two cog Float32Full.
If Jonathan or someone else can fix this bug, I'd really appreciate it.
Thanks,