
Math coprocessors?

John A. Zoidberg Posts: 514
edited 2010-10-21 15:46 in Propeller 1
Hey there,

Has anyone tried to use a dsPIC or any other 16-bit microcontroller as a coprocessor for the Propeller, to get quicker floating point math operations? I'm wondering about interfacing the dsPIC to the Propeller through SPI. :)

Comments

  • Mike Green Posts: 23,101
    edited 2010-10-20 19:08
    Floating point on the Propeller is pretty fast (~40us per operation). It's hard to run an external floating point coprocessor much faster, particularly over a serial link like I2C or SPI.
  • John A. Zoidberg Posts: 514
    edited 2010-10-20 19:11
    I see. Well, would it work just as well if I dedicated all the floating point operations to another cog? :)
  • hinv Posts: 1,255
    edited 2010-10-20 22:58
    I speak without experience, but I have read the documentation for the Floating Point library. It covers a few different scenarios, one of which is dedicating a cog. Check it out here: http://obex.parallax.com/objects/202/
    That's one of the cool things about the Propeller: somebody's already done most of the work for you. You just have to glue the stuff together with some Spin.
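    For example, a minimal sketch of that dedicated-cog approach (assuming the Float32 object from that library and its start/FFloat/FAdd methods) might look like this in Spin:

      OBJ
        f32 : "Float32"                   ' PASM floating point engine in its own cog

      PUB Demo | a, b, c
        f32.start                         ' launch the dedicated floating point cog
        a := f32.FFloat(3)                ' 3.0 as an IEEE-754 single
        b := 1.5                          ' Spin float literal, stored as IEEE-754 bits
        c := f32.FAdd(a, b)               ' c = 4.5, computed by the other cog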

    I hope this helps.

    Doug
  • max72 Posts: 1,155
    edited 2010-10-21 01:21
    You can also check
    http://forums.parallax.com/showthread.php?t=125498&highlight=float
    Or SpinLMM
    http://obex.parallax.com/objects/635/

    The first is an optimized float object, while SpinLMM includes a fast embedded float routine as an example, with the basic functions. It only works under BST, but if you are at ease with PASM you can try it and add functions. The advantage is that it doesn't require additional cogs.

    Massimo
  • Humanoido Posts: 5,770
    edited 2010-10-21 03:06
    Mike has a good point. It's surprising how fast floating point can work on the Propeller. If you're looking for speed, the manual is very detailed regarding assembler and Spin code. You can pick and choose to optimize the speed-critical statements.

    Using Floating Point in Assembler Code
    Assembler code provides significantly faster execution speed. The following shows a quick comparison for a floating point add:

    FloatMath FAdd (Spin)      371.0 usec
    Float32 FAdd (Spin)         39.0 usec
    _FAdd (Assembler)            4.7 usec

    Each statement has its own execution time.
  • Ale Posts: 2,363
    edited 2010-10-21 05:31
    The problem with an external co-processor is the transfer of the arguments. Even with single-cycle execution of the FP operation at, say, 10 MHz, you still have to transfer the arguments, i.e. 64 bits (plus the instruction), 1 bit at a time over SPI/I2C. With a 5 MHz SPI clock you still have to wait 12.8 us just to transfer those bits...
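    As a rough sanity check on that figure, here is a hypothetical Spin constant block (names are illustrative only) working out the same arithmetic:

      CON
        SPI_HZ   = 5_000_000                            ' assumed SPI bit clock
        ARG_BITS = 64                                   ' two 32-bit IEEE-754 arguments
        XFER_NS  = ARG_BITS * (1_000_000_000 / SPI_HZ)  ' 64 * 200 ns = 12_800 ns = 12.8 us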

    It only makes sense if you cannot get the speed you want in assembler, or you need more precision or expensive operations like transcendentals, where a fast implementation will compensate for the slower transfer. I'd recommend using 8-bit transfers to recover some of that dead time.
  • Peter Jakacki Posts: 10,193
    edited 2010-10-21 05:49
    Just thinking about that other thread about I2C peripheral micros, or PPCs as I call them: I have been advocating a small 32-bit micro, the LPC111x. When you think about it, the chip does a 32-bit integer multiply in 20ns, so floating point should be quite fast compared to the Prop. The only limiting factor, as Mike pointed out, is the link speed.

    I have been using a 32-bits-at-a-time serial transfer that runs at the clock frequency, so a full 32 bits can be transferred in under 2us. However, the UART on the LPC111x only runs up to 3.125Mbps. Hmmmm, but there is the enhanced SPI, which is a fully functional Synchronous Serial Port capable of 16-bit transfers at 25Mbits. Using a modified serial transfer format from the Prop, we can essentially communicate synchronously between the two chips. So assuming we can offload a lot of FP ops to this chip and transfer back and forth quickly enough, this may produce results many times faster than a cog doing the FP.

    This is just a little bit of musing, but it's food for thought and you know that this is only a $1 chip and more can be had.
  • lonesock Posts: 917
    edited 2010-10-21 09:58
    So in the F32 code linked in Massimo's post, the floating point multiply takes about 1000 clocks (~12.5us on an 80MHz system) *if* called from assembler. The portion of the code that actually performs the multiplication takes only 40% of that; the rest is the boring but necessary unpacking of the input parameters and the subsequent packing of the result. So even having a single-cycle integer multiply would only speed up the routine by about 40%.

    Calling any F32 commands from Spin has quite a bit more overhead. Of course, in the extremely unlikely case that you only need one cog for the control code, you could run 7 instances of the F32 code, so your average multiply time would be under 2us [8^)
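    A minimal sketch of that seven-instance idea (assuming the F32 object's start method, and using Spin's object arrays) could look like:

      OBJ
        f[7] : "F32"                      ' seven instances, each will claim its own cog

      PUB LaunchAll | i
        repeat i from 0 to 6
          f[i].start                      ' start one floating point engine per instance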

    Jonathan
  • Tracy Allen Posts: 6,666
    edited 2010-10-21 14:47
    I'm thinking of the Micromega uM-FPU V3.1 coprocessor that Parallax sells. (It was designed by Cam Thomson, who wrote the PASM floating point library for the Prop.) As a coprocessor it has a lot of extra nice features, including its own toolset and serial I/O with NMEA parsing. The transfer overhead is less of an issue when the FPU has to perform a complex chain of calculations involving few variables but many parameters and functions.
  • Peter Jakacki Posts: 10,193
    edited 2010-10-21 15:41
    Was that 1,000 clocks or 1,000 instructions? Remember that a cog executes at 1/4 of the clock rate, so each instruction takes 50ns (at 80MHz), which is 2.5 times slower than even the $1 ARM's cycle time of 20ns.

    lonesock wrote: »
    So in the F32 code linked in Massimo's post, the floating point multiply takes about 1000 clocks (~12.5us on an 80MHz system) *if* called from assembler. The portion of the code that actually performs the multiplication takes only 40% of that; the rest is the boring but necessary unpacking of the input parameters and the subsequent packing of the result. So even having a single-cycle integer multiply would only speed up the routine by about 40%.

    Calling any F32 commands from Spin has quite a bit more overhead. Of course, in the extremely unlikely case that you only need one cog for the control code, you could run 7 instances of the F32 code, so your average multiply time would be under 2us [8^)

    Jonathan
  • lonesock Posts: 917
    edited 2010-10-21 15:46
    Clocks. Yep, got it, just providing a reference point on the F32 code.

    Jonathan