Math coprocessors?
John A. Zoidberg
Posts: 514
Hey there,
Has anyone tried using a dsPIC or another 16-bit microcontroller as a coprocessor for the Propeller, to get quicker floating-point math operations? I'm wondering about interfacing the dsPIC to the Propeller through SPI.
Comments
That's one of the cool things about the Propeller. Somebody's already done most of the work for you. You just have to glue the stuff together with some Spin.
I hope this helps.
Doug
http://forums.parallax.com/showthread.php?t=125498&highlight=float
Or SpinLMM
http://obex.parallax.com/objects/635/
The first is an optimized float object, while SpinLMM includes a fast embedded float routine as an example, with the basic functions. It works only under BST, but if you are comfortable with PASM you can try it and add functions. The advantage is that it doesn't require additional cogs.
Massimo
Using Floating Point in Assembler Code
Assembler code provides significantly faster execution speed. The following shows a quick comparison for a floating point add:
FloatMath FAdd (Spin) 371.0 usec
Float32 FAdd (Spin) 39.0 usec
_FAdd (Assembler) 4.7 usec
Each statement has its own execution time.
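To put those three numbers side by side, here is a quick sketch that just divides the quoted timings. The `speedup` helper and the dictionary are my own naming for illustration, not part of any Propeller object.

```python
# Timings quoted above, in microseconds.
timings_us = {
    "FloatMath FAdd (Spin)": 371.0,
    "Float32 FAdd (Spin)": 39.0,
    "_FAdd (Assembler)": 4.7,
}


def speedup(slow_us, fast_us):
    """Return how many times faster fast_us is compared with slow_us."""
    return slow_us / fast_us


baseline = timings_us["FloatMath FAdd (Spin)"]
for name, t in timings_us.items():
    # Ratio of the FloatMath timing to each entry's timing.
    print(f"{name:24s} {t:6.1f} us  {speedup(baseline, t):5.1f}x vs FloatMath")
```

So Float32 called from Spin is roughly 9.5 times faster than FloatMath, and calling _FAdd from assembler is roughly another 8 times faster than that.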
A coprocessor makes sense only if assembler does not give you the speed you want and you need more precision or expensive operations such as transcendentals, where a fast implementation compensates for the slower transfer. I'd recommend using 8-bit transfers to recover some of that dead time.
I have been using a serial transfer that moves 32 bits at a time and runs at the clock frequency, so a full 32 bits can be transferred in under 2us. However, the UART on the LPC111x only runs up to 3.125Mbps. Hmmmm, but there is the enhanced SPI, which is a fully functional Synchronous Serial Port capable of 16-bit transfers at 25Mbit/s. Using a modified serial transfer format from the Prop, we can essentially communicate synchronously between the two chips. So, assuming we can offload a lot of FP ops to this chip and transfer back and forth quickly enough, this may produce results many times faster than having a cog do the FP.
This is just a little bit of musing, but it's food for thought and you know that this is only a $1 chip and more can be had.
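As a rough sanity check on the transfer budget, here is a minimal sketch that splits a float into two 16-bit frames and estimates raw wire time at the 25Mbit/s figure quoted above. It assumes both chips use IEEE 754 single precision and that the high half is sent first; the function names are hypothetical, and the estimate ignores framing gaps, setup, and the coprocessor's own compute time.

```python
import struct

SPI_BITS_PER_SECOND = 25_000_000  # 25 Mbit/s figure quoted in the post


def float_to_frames(value):
    """Split an IEEE 754 single into two 16-bit frames, high half first (assumed order)."""
    (raw,) = struct.unpack(">I", struct.pack(">f", value))
    return [(raw >> 16) & 0xFFFF, raw & 0xFFFF]


def frames_to_float(frames):
    """Rebuild the float from two 16-bit frames produced by float_to_frames."""
    high, low = frames
    (value,) = struct.unpack(">f", struct.pack(">I", (high << 16) | low))
    return value


def transfer_time_us(n_words32):
    """Raw wire time in microseconds for n 32-bit words (no overhead included)."""
    return n_words32 * 32 / SPI_BITS_PER_SECOND * 1e6


# One binary operation: two operands out, one result back.
round_trip_us = transfer_time_us(3)
print(f"wire time for one binary op: {round_trip_us:.2f} us")
```

Under these assumptions the wire time alone for one add is about 3.84us, which is already close to the 4.7us quoted for _FAdd in assembler. That suggests the offload pays off mainly for expensive operations, or when the alternative is calling the float code from Spin.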
Calling any F32 commands from Spin has quite a bit more overhead. Of course, in the extremely unlikely case that you only need one cog for the control code, you could run 7 instances of the F32 code, so your average multiply time would be under 2us [8^)
Jonathan
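The arithmetic behind the "7 instances" remark can be sketched as below. It assumes the work spreads perfectly across the instances with no dispatch cost from the controlling cog, which is optimistic; the helper names are made up for illustration.

```python
def average_time_us(per_call_us, instances):
    """Average time per operation when calls run in parallel on several cogs."""
    return per_call_us / instances


def max_per_call_us(target_average_us, instances):
    """Slowest single call that still meets the target average."""
    return target_average_us * instances


# With 7 instances, an average under 2 us allows each call to take up to 14 us.
print(max_per_call_us(2.0, 7))
```

In other words, the "under 2us" average holds as long as a single F32 multiply, including call overhead, finishes in under 14us.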