Usefulness of Floating Point Math Co-Processors

Prophead100 · 2012-08-16 15:01

I have some floating point heavy math code on the Prop that uses a lot of trigonometry functions such as tan, atan, sin, asin, cos, acos and some basic multiplication/division and addition/subtraction. It works OK with the current OBEX objects, but I would like to substantially speed up some 32 bit math in SPIN and perhaps upgrade to 64 bit math in SPIN or GCC if possible so I can squeeze in more performance.

Has anyone had experience using the uM-FPU32 or uM-FPU64 coprocessors to speed up the calculations? Do these co-processors, when used with the Prop, make up for the overhead of communicating with an external chip?

Duane C. Johnson · 2012-08-16 15:14

Hi Prophead100;

How much precision do you require?

And what is the application?

Duane J

Duane Degn · 2012-08-16 15:40

I've used the uM-FPU32 chip a bit. I haven't done side by side speed comparisons, but my gut tells me using the F32 object would be faster than using the uM-FPU32 chip.

I was pleasantly surprised that I was able to compute the IK positions of all 18 servos at 50Hz for my hexapod using F32.

If you need 64 bit math, then I think the uM-FPU64 chip would be worth using.

I think cessnapilot wrote objects for both the FPU chips.

Prophead100 · 2012-08-16 16:03

The application is a mobile implementation of a tracking solar panel using my SPIN Solar Object, which is 32 bits right now. It runs fast enough to calculate and provide coordinates to control a two axis solar panel or heliostat using a low precision (+/- 0.5 degree) formula, but the floating point math uses a couple of cogs and has a noticeable execution time (an observable fraction of a second). With faster 32 bit floating point math and fewer cogs I may have space to add MPPT power control (also floating point) and some closed-loop sensors for refined tracking (and power management options), or perhaps other functions run by the same board. I could also add more code to refine the heliostat formula.

I am also working on using a higher precision formula (+/- 0.003 degree) for a concentrating solar collector or solar observatory, which requires 64 bit math. With help from Jazzed and Heater, some C code from NREL ran well on the Prop using GCC but took a couple hundred seconds with a basic configuration, which is far too slow to track the sun in real time at that level of detail (72 ms).
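For a rough sense of the time budget, the sketch below (Python, not from the original post) estimates how long the sun takes to drift through each tracking tolerance. It assumes the apparent motion on the sky is bounded by the Earth's rotation, about 15 degrees per hour; the azimuth axis of an alt-az mount can move faster near the zenith, and a moving platform adds its own motion. The function name is made up for illustration.

```python
# Rough time budget: how long until the sun drifts by a given tolerance?
# Assumption: apparent sky motion is at most 360 degrees per 24 h (15 deg/h).
SUN_RATE_DEG_PER_S = 360.0 / (24 * 3600)  # about 0.00417 deg/s

def seconds_to_drift(tolerance_deg):
    """Time for the sun to move tolerance_deg at the assumed maximum rate."""
    return tolerance_deg / SUN_RATE_DEG_PER_S

low_precision = seconds_to_drift(0.5)     # +/- 0.5 degree formula
high_precision = seconds_to_drift(0.003)  # +/- 0.003 degree formula
drift_in_72_ms = SUN_RATE_DEG_PER_S * 0.072

print(f"0.5 deg drift takes   {low_precision:.0f} s")
print(f"0.003 deg drift takes {high_precision:.2f} s")
print(f"drift during 72 ms:   {drift_in_72_ms:.5f} deg")
```

Under this assumption the 0.003 degree tolerance is used up in well under a second, so a 72 ms update corresponds to roughly a tenth of that window, while the 0.5 degree tolerance leaves a couple of minutes.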

jmg · 2012-08-16 16:13

With most maths routines, you can trade off speed with precision, so you may find a 40 bit real, for example, adds precision, but not at such an impact on speed as 64 bit would.

You could also use a Fixed-real approach, where a slower task does the precision calcs, and a faster one just interpolates between what are very short segments.
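A minimal sketch of that split, in Python for readability rather than Spin: a slow task produces precise angles at widely spaced times, and a fast task linearly interpolates between the two most recent ones. `precise_angle` is a stand-in for the expensive calculation and is purely hypothetical, and the 10 s segment length is also an assumption.

```python
import math

def precise_angle(t_seconds):
    """Stand-in for the slow, high-precision calculation (hypothetical).
    A smooth daily curve in degrees, just so there is something to interpolate."""
    return 45.0 * math.sin(2 * math.pi * t_seconds / 86400.0)

def interpolate(t, t0, a0, t1, a1):
    """Fast path: linear interpolation between two precise samples."""
    return a0 + (a1 - a0) * (t - t0) / (t1 - t0)

SEGMENT = 10.0  # seconds between precise updates (assumed)

t0, t1 = 3600.0, 3600.0 + SEGMENT
a0, a1 = precise_angle(t0), precise_angle(t1)  # done by the slow task

# The fast task can now answer for any time inside the segment.
worst = max(abs(interpolate(t, t0, a0, t1, a1) - precise_angle(t))
            for t in (t0 + SEGMENT * k / 100 for k in range(101)))
print(f"worst interpolation error over one segment: {worst:.2e} deg")
```

The segment length sets the trade-off: shorter segments cost more precise calculations but shrink the interpolation error.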

Tracy Allen · 2012-08-16 16:25

If each calculation involves pouring variables into the hopper and turning the crank on a complicated equation in many terms, well, you can program the whole thing into the µFPU, using its nice IDE. The communication reduces to transferring the data and retrieving the result(s), between which the µFPU is quite fast with the calculation itself. There are extras, such as hookup for a GPS and automatic parsing into variables, which might help with your mobile app. Also it has analog to digital converters for possible use with the sensors.
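To illustrate why a stored function helps, here is a toy timing model in Python. It is not based on measurements of the µFPU: the bus speed, byte counts and per-operation compute time are all placeholder values, and the model assumes a serial link that moves one bit per clock with no other overhead.

```python
def offload_time_s(bytes_out, bytes_in, bus_hz, compute_s):
    """Time to ship operands, let the coprocessor compute, and read results.
    Assumes one bit per bus clock and no protocol overhead (a simplification)."""
    transfer_s = 8 * (bytes_out + bytes_in) / bus_hz
    return transfer_s + compute_s

# All numbers below are placeholders for illustration, not measured values.
OP_TIME_S = 50e-6   # assumed time per floating point operation
BUS_HZ = 1_000_000  # assumed serial clock

# 20 operations, each sent and read back individually.
many_small = 20 * offload_time_s(8, 4, BUS_HZ, OP_TIME_S)
# The same 20 operations stored on the chip: send inputs once, read results once.
one_big = offload_time_s(32, 8, BUS_HZ, 20 * OP_TIME_S)

print(f"20 separate operations: {many_small * 1e3:.2f} ms")
print(f"one stored function:    {one_big * 1e3:.2f} ms")
```

With these made-up numbers the stored function wins because the transfer cost is paid once instead of once per operation; real figures would have to come from the chip's datasheet or from timing the actual link.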

Mark_T · 2012-08-16 16:29

Surely fraction of a second latency is enough for tracking the sun in the sky? Or is this on the move?

When I looked at the specs of these co-processors I wasn't impressed - I think they only served to reduce the ROM footprint on the microcontroller! Certainly not in the class of a 387 I believe. If they didn't have a gazillion pins an ARM FP vector co-processor would be handy (single cycle for single-precision on all but divide/sqroot, 2-cycle for double-precision...)

The uM-FPU V3.1 does have 2 analog inputs with 12 bit A/D converter I note...

rod1963 · 2012-08-16 17:02

Too bad someone didn't adapt the STM32F4 as a math-coprocessor. It has DSP and floating point hardware. Use the Prop as a front-end and the F4 as the back end compute engine.

SRLM · 2012-08-16 18:34

I'm able to get over 2 kHz on this block of floating point math operations (with a standard Propeller setup) with my version of F32:

t_5 = 2 * offset
const_2_pi = 2 * pi

c = (K_Q * diameter) / K_T
t_1 = M_z / (4*c)
t_2 = M_y / t_5
t_3 = M_x / t_5
t_4 = F_z / 4

F_1 = (t_4 + (t_1 - t_2)) #> 0
F_2 = (t_4 - (t_1 + t_3)) #> 0
F_3 = (t_4 + (t_1 + t_2)) #> 0
F_4 = (t_4 + (t_3 - t_1)) #> 0


t_1 = const_2_pi / (diameter * diameter)
t_2 = rho * K_T

omega_d_1 = t_1 * ((F_1 / t_2) sqrt 0)
omega_d_2 = t_1 * ((F_2 / t_2) sqrt 0)
omega_d_3 = t_1 * ((F_3 / t_2) sqrt 0)
omega_d_4 = t_1 * ((F_4 / t_2) sqrt 0)

n_d_1 = omega_d_1 / const_2_pi
n_d_2 = omega_d_2 / const_2_pi
n_d_3 = omega_d_3 / const_2_pi
n_d_4 = omega_d_4 / const_2_pi

Maximum Time: 0.417ms
Minimum Time: 0.377ms
Average Time: 0.414ms

Note that all of the operations are actually performed, i.e. there is no optimization of "2 * pi": it is included in the calculations.
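As a quick check of the 2 kHz figure, the reported times convert to rates as follows (Python used only for the arithmetic):

```python
# Loop rate implied by the reported times for one pass through the block.
timings_ms = {"maximum": 0.417, "minimum": 0.377, "average": 0.414}

rates_hz = {name: 1000.0 / ms for name, ms in timings_ms.items()}

for name, hz in rates_hz.items():
    print(f"{name} time {timings_ms[name]} ms -> {hz:.0f} Hz")
```

Even the slowest pass works out to roughly 2.4 kHz.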

This uses one cog to compute without any oversight, and is 32 bit. It supports most of the functionality of F32, but not all:

"*":"Mul", "/":"Div", "+":"Add", "-":"Sub", "sqrt":"Sqr", \
"#>":"LimitMin", "<#":"LimitMax", "arc_t2":"ATan2", "arc_c":"ACos", \
"arc_s":"ASin", "sin":"Sin", "cos":"Cos", "tan":"Tan", "~":"PID", "||":"TruncRound", \
"ffloat":"Float"

If needed, you could substitute the routines that you do need (exp, ...) for ones in the list above that you do not need.

Anyway, I was planning on posting the source code for this on the forums sometime. If it looks useful to you then I can do it this weekend.

SRLM · 2012-08-16 18:35

Could you post the math routine that you are doing, or a link to the formulas?

Prophead100 · 2012-08-18 16:26

Thanks for all the input

Tracy Allen wrote: »

If each calculation involves pouring variables into the hopper and turning the crank on a complicated equation in many terms, well, you can program the whole thing into the µFPU, using its nice IDE. The communication reduces to transferring the data and retrieving the result(s), between which the µFPU is quite fast with the calculation itself. There are extras, such as hookup for a GPS and automatic parsing into variables, which might help with your mobile app. Also it has analog to digital converters for possible use with the sensors.

The ability to program a set of functions into the uFPU may be the most effective way, short of rewriting a lot of code to optimize it. The GPS parsing would definitely simplify the time and mobile location part of the equation. Perhaps I could program it to simply output the solar variables, as something that sits between the GPS and the Prop, for the higher resolution/speed applications where the cost of the chip would be worth it. My only apprehension is learning another programming dialect. For routine lower precision solar tracking the Prop alone could do it at a lower cost. Thanks

SRLM - The code I am trying to optimize is the solar object ( http://obex.parallax.com/objects/807/ ) in SPIN and some NREL code in C++ (Post 15 of this thread --> http://forums.parallax.com/showthread.php?141011-Newbie-C-Syntax-Question).

Mark_T - The latency issue comes in (a) when higher resolution tracking is needed for a high ratio solar concentrator or solar telescope, (b) when more time is needed in the code for other operations such as process control of the mechanics, or (c) when the system is mobile, as you suggest.
