I'm in the throes of designing a standalone FFT library/object for the P2 using 32-bit CORDIC operations.
It strikes me it would be good to get some input about what people might want from such a thing.
Currently I have a core routine that expects 32-bit + 32-bit fixed-point real or complex input values
and generates the forward FFT in place. Optional post-processing calculates magnitude + phase
or magnitude squared (actually twice magnitude squared, which is what is needed for spectral measurement).
It has been tested with N up to 2^15 (above that the P2 runs out of hub RAM!).
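To illustrate the post-processing options, here is a minimal Python sketch (not P2 code; all function names are hypothetical). It assumes the spectrum has been scaled by 1/N. Under that assumption, a real cosine of amplitude A puts A/2 into its positive-frequency bin, so twice the magnitude squared of that bin equals the mean-square power A^2/2. This is one plausible reading of why the factor of two is useful for spectral measurement.

```python
import cmath
import math

def dft_over_n(x):
    """Naive reference DFT divided by N (a stand-in for the real FFT core)."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)) / n
        for k in range(n)
    ]

def mag_phase(z):
    """Magnitude and phase (radians) of one complex bin."""
    return abs(z), cmath.phase(z)

def twice_mag_squared(z):
    """2*|z|^2: folds the mirror-image negative-frequency bin into the positive one."""
    return 2.0 * (z.real * z.real + z.imag * z.imag)

# Real cosine, amplitude 0.5, landing exactly on bin 3 of a 16-point transform.
N, A, K = 16, 0.5, 3
signal = [A * math.cos(2 * math.pi * K * t / N) for t in range(N)]
spectrum = dft_over_n(signal)
mean_square = sum(s * s for s in signal) / N  # time-domain power, A^2/2
```

Here `twice_mag_squared(spectrum[K])` matches `mean_square` to within rounding error, while `mag_phase(spectrum[K])[0]` is A/2.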
What I'm thinking of doing is extending the input options so that the input vector need not be shared
with the work vector, and can be just real values (4 bytes per element rather than 8 for complex) or 16-bit
real values. The input fixed-point representation would be specifiable, as would the working vector's
fixed-point format (probably limited to between fix2.30 and fix16.16).
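As a rough illustration of what a specifiable fixed-point format implies, the Python sketch below converts between formats by shifting. It is only a sketch: the helper names are hypothetical, and saturating on overflow is an assumption rather than a description of the actual library. It reads "fixI.F" as a signed 32-bit value with F fractional bits, so fix2.30 covers [-2, 2) and fix16.16 covers [-32768, 32768).

```python
INT32_MIN = -(1 << 31)
INT32_MAX = (1 << 31) - 1

def saturate32(v):
    """Clamp an integer to the signed 32-bit range (assumed overflow policy)."""
    return max(INT32_MIN, min(INT32_MAX, v))

def to_fixed(x, frac_bits):
    """Float -> signed 32-bit fixed point with frac_bits fractional bits."""
    return saturate32(round(x * (1 << frac_bits)))

def from_fixed(v, frac_bits):
    """Signed fixed point -> float."""
    return v / (1 << frac_bits)

def requantize(v, src_frac, dst_frac):
    """Move a value between fixed-point formats by shifting.

    Shifting left gains fractional bits (and may saturate);
    shifting right is an arithmetic shift that discards low bits.
    """
    shift = dst_frac - src_frac
    if shift >= 0:
        return saturate32(v << shift)
    return v >> -shift

# A 16-bit sample of 0.5 in fix1.15, widened to a fix2.30 working value.
sample_q15 = 0x4000
work_q30 = requantize(sample_q15, 15, 30)
```

Widening a 16-bit input this way loses nothing. Narrowing from fix2.30 to fix16.16 drops 14 fractional bits in exchange for more headroom.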
Output options could include logarithmic (dB) as well as raw, magnitude/phase and magnitude squared.
The option to divide the output by N (implemented by shifting right by one bit on each round) is already there.
There needs to be an inverse operation of course, with the option to synthesize negative frequency components
from the non-negative ones automatically.
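The two ideas above can be sketched in Python (floating point rather than fixed point, with hypothetical names, and not the actual P2 code). Halving after each of the log2(N) butterfly rounds is equivalent to dividing the final result by N. For a real signal, the negative-frequency bins follow from conjugate symmetry, X[N-k] = conj(X[k]).

```python
import cmath

def fft_scaled(x):
    """Radix-2 decimation-in-time FFT that halves after every round,
    so the result is DFT(x)/N. Halving stands in for a 1-bit right shift."""
    n = len(x)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    a = list(x)
    # Bit-reversal permutation.
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            a[i], a[j] = a[j], a[i]
    # log2(n) rounds of butterflies, each followed by a divide by two.
    size = 2
    while size <= n:
        half = size // 2
        for start in range(0, n, size):
            for k in range(half):
                w = cmath.exp(-2j * cmath.pi * k / size)
                u = a[start + k]
                t = w * a[start + k + half]
                a[start + k] = (u + t) / 2
                a[start + k + half] = (u - t) / 2
        size *= 2
    return a

def hermitian_extend(half_spectrum):
    """Given bins 0..N/2 of a real signal's spectrum, synthesize bins
    N/2+1..N-1 as conjugates of the mirrored positive-frequency bins."""
    mirrored = [z.conjugate() for z in half_spectrum[-2:0:-1]]
    return list(half_spectrum) + mirrored

real_signal = [1.0, 2.0, 3.0, 4.0, 0.0, -1.0, -2.0, -3.0]
full = fft_scaled(real_signal)
rebuilt = hermitian_extend(full[: len(real_signal) // 2 + 1])
```

For the real input above, `rebuilt` agrees with `full` to within rounding error, so only the N/2+1 non-negative bins need to be supplied to an inverse transform.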
The execution model is a cog that waits for a command to be written to its command address, which would
need a locking convention for multi-threaded use, but it could just be fired up and killed after a single use too.
It's what I'm used to on the P1, but is that the best approach?
My use for this is spectrum analysis, but I can see others wanting a workhorse for implementing filtering
or fast convolution/fast correlation. My use case is repeatedly throwing data at it and averaging the
results, with double buffering so data acquisition doesn't have to stall for it.
My current performance figure for a 4096-point FFT is 37.1 ms (about 110k SPS). The inner core
is a butterfly computation routine using 4 CORDIC multiplies and 1 CORDIC rotate in parallel.
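For reference, the arithmetic behind the throughput figure can be checked with a few lines of Python. This assumes a plain radix-2 transform with (N/2)*log2(N) butterflies; the per-butterfly time is only an average that also absorbs any setup and post-processing overhead. The function name is hypothetical.

```python
def fft_throughput(n, seconds):
    """Return (samples per second, butterfly count, average seconds per butterfly)
    for an n-point radix-2 FFT that takes `seconds` to run."""
    stages = n.bit_length() - 1          # log2(n) for a power of two
    butterflies = (n // 2) * stages
    return n / seconds, butterflies, seconds / butterflies

sps, butterflies, per_butterfly = fft_throughput(4096, 37.1e-3)
print(f"{sps:.0f} SPS, {butterflies} butterflies, "
      f"{per_butterfly * 1e6:.2f} us per butterfly")
```

With the figures quoted above this works out to roughly 110k SPS, 24576 butterflies, and about 1.5 µs per butterfly on average.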
I have to praise the design of the CORDIC unit: it does everything needed for signal processing code.
I think I've used every operation now.