Can sloppy math processors speed things up... a lot? — Parallax Forums

ElectricAye Posts: 4,561
edited 2011-01-06 13:06 in General Discussion
I guess it works well enough for the brain, so why not computers...

http://www.physorg.com/news/2011-01-sloppy-arithmetic.html

Comments

  • Heater. Posts: 21,230
    edited 2011-01-05 09:30
    Why not indeed?

    In my Fourier Transform for the Prop I used fixed point arithmetic such that 0.0 to 1.0 is represented as 0 to 4095. So all calculations are done to only 13 bits of accuracy.
    That means the multiplies can go faster because they have fewer bits to shift and add.

    Even with this not so good precision it still works out that if you input a single frequency, all the output frequency bins are exactly zero except the correct one. I suspect that one could reduce the precision even further, say to 10 or 8 bits, and the result would just be a little noise in the output. That would almost double the speed of the multiplies, and the little noise might be quite acceptable in many applications. I also suspect that the Fourier Transform algorithm has a tendency to average out noise so that the results don't accumulate errors.

    As another example: if you are controlling something with a PID loop, say an actuator position, there is no point in using arithmetic of extreme precision if your input measurement data is only accurate to 8 or 10 bits anyway.

    On the other hand, I'm sure there are other algorithms that iterate over the data a lot and accumulate error as they go. The results might be rendered useless. For example, what happens if you put a one percent error into every multiply of a Mandelbrot set calculation? I have a feeling that the resulting image would change dramatically.

    As it happens, I'm nowadays working with systems using fuzzy logic. No need for 128 bit floating point ops there.
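
To make the fixed-point idea in the post above concrete, here is a minimal Python sketch. It is not Heater's Propeller code: the function names (`to_fixed`, `fixed_mul`, `dft_power`, `single_tone`), the scale factor of 2^frac_bits, and the use of a naive DFT instead of a fast transform are all assumptions made for illustration. It lets you feed in a single tone and see how much power lands in the other bins as the number of fractional bits is reduced.

```python
import math

def to_fixed(x, frac_bits):
    """Quantize a real number to an integer with frac_bits fractional bits."""
    return int(round(x * (1 << frac_bits)))

def fixed_mul(a, b, frac_bits):
    """Multiply two fixed-point values and shift back to the same scale.
    The right shift truncates (rounds toward minus infinity)."""
    return (a * b) >> frac_bits

def dft_power(samples, frac_bits):
    """Naive O(n^2) DFT on fixed-point samples; returns power per bin as floats."""
    n = len(samples)
    scale = float(1 << frac_bits)
    power = []
    for k in range(n):
        re = im = 0
        for t, s in enumerate(samples):
            angle = 2.0 * math.pi * k * t / n
            re += fixed_mul(s, to_fixed(math.cos(angle), frac_bits), frac_bits)
            im -= fixed_mul(s, to_fixed(math.sin(angle), frac_bits), frac_bits)
        power.append((re / scale) ** 2 + (im / scale) ** 2)
    return power

def single_tone(n, bin_index, frac_bits):
    """One cosine at exactly bin_index cycles per n samples, quantized."""
    return [to_fixed(math.cos(2.0 * math.pi * bin_index * t / n), frac_bits)
            for t in range(n)]

if __name__ == "__main__":
    # Compare how much power leaks into the wrong bins at each precision.
    for bits in (12, 10, 8):
        p = dft_power(single_tone(32, 3, bits), bits)
        peak = max(range(16), key=lambda k: p[k])
        leak = max(p[k] for k in range(16) if k != peak)
        print(bits, "bits: peak bin", peak, "largest other bin power", leak)
```

Python integers are arbitrary precision, so this only models the rounding behaviour, not the speed gain from shorter shift-and-add multiplies on real hardware.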
  • Bobb Fwed Posts: 1,119
    edited 2011-01-05 14:24
    Prop 3 anyone? Some sloppy single-cycle division would be nice. Or single-cog parallel single-cycle sloppy multiplication (not that I could see how that possibly would work ... Chip and Beau will figure it out)!
  • Martin_H Posts: 4,051
    edited 2011-01-05 18:33
    Fast approximations are very useful. Here's a well-known one for a fast inverse square root:

    http://ilab.usc.edu/wiki/index.php/Fast_Square_Root#Method_using_The_Quake_3_Walsh_Method
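
For readers who do not want to follow the link, the widely circulated Quake III trick can be sketched in Python as below. This is an illustration under assumptions, not a copy of the linked page: the function name `fast_inv_sqrt` is made up here, the bit reinterpretation is done with `struct` because Python has no pointer casts, and the Newton step runs in double precision rather than 32-bit floats. It only makes sense for positive inputs within the 32-bit float range.

```python
import struct

def fast_inv_sqrt(x):
    """Approximate 1/sqrt(x) for positive x using the magic-constant trick."""
    # Reinterpret the 32-bit float bit pattern as an unsigned integer.
    i = struct.unpack("<I", struct.pack("<f", x))[0]
    # Halving the bit pattern and subtracting it from the magic constant
    # gives a rough first guess at the inverse square root.
    i = 0x5F3759DF - (i >> 1)
    # Reinterpret the integer back as a 32-bit float.
    y = struct.unpack("<f", struct.pack("<I", i))[0]
    # One Newton-Raphson step sharpens the guess.
    return y * (1.5 - 0.5 * x * y * y)

if __name__ == "__main__":
    for x in (0.25, 1.0, 2.0, 4.0, 100.0):
        exact = x ** -0.5
        approx = fast_inv_sqrt(x)
        print(x, approx, "relative error", abs(approx - exact) / exact)
```

The result is deliberately sloppy: a small relative error is traded for avoiding a division and a square root, which is the kind of trade this thread is about.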
  • Heater. Posts: 21,230
    edited 2011-01-05 22:12
    One might put one's finger in the air and say that the number of transistors required for a processor goes up with the square of the data width. (Chip designers here, please comment on that statement.)

    So instead of having a single 64 bit core we could have had four 32 bit cores. That would give four times the performance on day to day PC work. I could argue that the race to 64 bits of precision has held up performance gains we could have had with only 32 bits.
  • kwinn Posts: 8,697
    edited 2011-01-06 10:25
    Heater. wrote: »
    One might put one's finger in the air and say that the number of transistors required for a processor goes up with the square of the data width. (Chip designers here, please comment on that statement.)

    So instead of having a single 64 bit core we could have had four 32 bit cores. That would give four times the performance on day to day PC work. I could argue that the race to 64 bits of precision has held up performance gains we could have had with only 32 bits.

    The ratio is less than the square of the data width, but a bit more than linear. Doubling the data width requires doubling memory, register, and gating circuitry sizes, and more than doubling gates that perform arithmetic and logic functions.
  • pharseid Posts: 192
    edited 2011-01-06 13:06
    I would also think it would depend on the exact execution hardware the processor contains. Flash multipliers are big to start with and they get bigger at slightly worse than the square of the data width. For adders, you could get away with a linear ratio, but it would get progressively slower without additional carry look-ahead logic. I would think the control logic wouldn't grow much at all.

    phar
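
The scaling argument in the last few posts can be put into rough numbers with a simplified textbook model. The sketch below is an assumption-laden illustration, not a statement about any real chip: it counts one full-adder cell per bit for a ripple-carry adder, and n*n AND gates plus n*(n-1) adder cells for a plain n-by-n array multiplier. It ignores wiring, carry look-ahead, registers and control logic, and the function names are made up for this example.

```python
def ripple_adder_cells(n):
    """Adder cells in an n-bit ripple-carry adder (one per bit)."""
    return n

def array_multiplier_cells(n):
    """Cell counts for a plain n x n array multiplier (simplified model)."""
    return {"and_gates": n * n, "adder_cells": n * (n - 1)}

def multiplier_total(n):
    """Total cells in the simplified multiplier model."""
    cells = array_multiplier_cells(n)
    return cells["and_gates"] + cells["adder_cells"]

if __name__ == "__main__":
    for n in (8, 16, 32, 64):
        print(n, "bits: adder", ripple_adder_cells(n),
              "multiplier", multiplier_total(n))
    # Growth factor when the data width doubles from 32 to 64 bits.
    print("adder growth", ripple_adder_cells(64) / ripple_adder_cells(32))
    print("multiplier growth", multiplier_total(64) / multiplier_total(32))
```

In this model the adder grows linearly and the multiplier roughly with the square of the width, which matches the direction of the comments above; the "slightly worse than square" behaviour pharseid mentions would come from effects the model leaves out.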