During my digital filter tests I had an idea for how to speed up floating point operations. The actual floating point operations take only ~80 clocks; most of the time (~700 clocks) is spent on the function calls, passing parameters, and unpacking and re-packing the IEEE754 format. So I wrote Add, Mul and Div operations for unpacked floating point numbers. I use one longword for the mantissa and another for the exponent and flags (sign + overflow). The basic operations are indeed very fast, but because I have to pass twice as many registers, and loading them from hub memory stalls the FIFO, most of the advantage is lost if the code executes from hub RAM. One operation then takes ~300 cycles, which is still a factor >2 improvement. But I fear all the overhead is not worth it, not to mention how ugly the complex function calls look compared to plain assignments and expressions in C.
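For illustration, here is a minimal C++ sketch of such an unpacked representation and an add routine on it. The layout, the normalization range and the name `uf_add` are my assumptions, not the actual code (which is PASM, not C++), but it shows why no IEEE754 bit-field extraction is needed per operation:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical unpacked format; the exact layout here is my guess, not the
// author's actual code. One longword holds a signed mantissa (sign folded in),
// the other holds the binary exponent (flag bits could live in its top bits).
struct UnpackedF {
    int32_t mant;   // signed mantissa, normalized so |mant| is in [2^28, 2^29)
    int32_t exp;    // binary exponent: value = mant * 2^exp
};

// Add with no IEEE754 pack/unpack: align exponents, add mantissas, renormalize.
// Assumes arithmetic right shift of negative values (true on common compilers).
UnpackedF uf_add(UnpackedF a, UnpackedF b) {
    if (a.exp < b.exp) { UnpackedF t = a; a = b; b = t; }   // a: larger exponent
    int32_t s = a.exp - b.exp;
    UnpackedF r{ a.mant + (s < 32 ? (b.mant >> s) : 0), a.exp };
    while (r.mant >= (1 << 29) || r.mant <= -(1 << 29)) {   // too big: shift down
        r.mant >>= 1; r.exp++;
    }
    while (r.mant != 0 && r.mant > -(1 << 28) && r.mant < (1 << 28)) {
        r.mant *= 2; r.exp--;                               // too small: shift up
    }
    return r;
}
```

With the fields already separated, the add reduces to an exponent compare, one shift, one add and a renormalize loop.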
To really make use of the full speed, the code would have to be loaded into LUT RAM and called from COG RAM.
My ultimate dream was that this could be used together with operator methods. That would make any extension to the built-in math completely transparent, so that fast floating point numbers or even vector and matrix operations could be used as if they were built into the compiler. But unfortunately, from the current point of view this is theoretically possible but too complicated to justify the required work.
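To illustrate the operator-method idea, here is a hedged sketch using standard C++ operator overloading (which the actual target compiler may or may not offer); the `Fast` type and all helper logic are invented for the example, with the same assumed unpacked layout as described above:

```cpp
#include <cassert>
#include <cstdint>

struct Fast {
    int32_t mant;   // signed mantissa, normalized so |mant| is in [2^28, 2^29)
    int32_t exp;    // binary exponent: value = mant * 2^exp
};

static Fast norm(Fast r) {                       // renormalize after an operation
    while (r.mant >= (1 << 29) || r.mant <= -(1 << 29)) { r.mant >>= 1; r.exp++; }
    while (r.mant != 0 && r.mant > -(1 << 28) && r.mant < (1 << 28)) { r.mant *= 2; r.exp--; }
    return r;
}

Fast operator+(Fast a, Fast b) {                 // would map to the fast Add routine
    if (a.exp < b.exp) { Fast t = a; a = b; b = t; }
    int32_t s = a.exp - b.exp;
    return norm({ a.mant + (s < 32 ? (b.mant >> s) : 0), a.exp });
}

Fast operator*(Fast a, Fast b) {                 // would map to the fast Mul routine
    int64_t p = (int64_t)a.mant * b.mant;        // 64-bit product, then rescale
    return norm({ (int32_t)(p >> 28), a.exp + b.exp + 28 });
}

// Filter code then reads like plain arithmetic instead of nested function calls:
Fast madd(Fast x, Fast a1, Fast y1) {
    return x + a1 * y1;                          // vs. uf_add(x, uf_mul(a1, y1))
}
```

The `madd` line is the point: the caller never sees that the operands are two-longword structs rather than built-in floats.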