G_FLOPS ... Possible with FPU units? — Parallax Forums

TJHJ Posts: 243
edited 2012-03-12 12:14 in Propeller 1
Hi,

I am cooking up a math-heavy number-crunching application and was wondering if anyone could suggest how many FLOPS (floating-point operations per second) can be obtained from an FPU, or from a few of them. I am trying to get around 2 GFLOPS; more would be better, but I am not sure I can move the data in and out fast enough.
I am looking to have the Prop act as the center point, passing the data around and making it into something useful.

Any suggestions on what FPU chips could be interfaced with the prop or get anywhere close to that kind of speed?

Thanks for everyone's help.

Comments

  • Kye Posts: 2,200
    edited 2012-03-08 17:52
    Desktop multi core processor?
  • Phil Pilgrim (PhiPi) Posts: 23,514
    edited 2012-03-08 17:55
    Your best bet may be the GPUs in high-end video cards.

    -Phil
  • rod1963 Posts: 752
    edited 2012-03-08 18:47
    A good and cheap alternative is to use a PC with several Nvidia video cards meant for number crunching.

    http://www.nvidia.com/object/cuda_home_new.html
  • Heater. Posts: 21,230
    edited 2012-03-08 21:29
    Tjhj,
    No.
    Even if you had an infinitely fast FPU, whether as software in a cog or as an external device, you still only have a 20 MIPS processor in the Prop, with even slower access to RAM or the outside world.
    There is no way to move data around fast enough to get 2 GFLOPS.
  • darreno Posts: 16
    edited 2012-03-09 13:46
    Any microprocessor with a fast coprocessor could achieve a 2 GFLOPS average, just not sustained. An example would be passing a large sample to the coprocessor for computing things like FFTs and filters. You pass one set of data that is buffered on the FPU or DSP, then perform multiple operations on it.

    A single GPU processor can hit nearly 4 TFLOPS. The internal bandwidth of a high-end GPU card is well over 200 GB/s. Send the GPU a few megabytes of data along with an OpenCL kernel, then sit and wait for an answer.

    The answer comes down to what type of workload you have. The bcm2835 used in the Raspberry Pi is supposed to do 10+ GFLOPS, uses a couple of watts, and costs under $20.
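The point about buffering data and then performing multiple operations on it can be made concrete with the two GPU figures quoted above. This is a rough sketch only: the peak rate and bandwidth are the numbers from the post, while the 4-byte operand size and the function name are assumptions made for illustration.

```python
PEAK_FLOPS = 4e12        # "nearly 4 TFLOPS" (figure from the post)
MEM_BANDWIDTH = 200e9    # "well over 200 GB/s", in bytes per second (figure from the post)
BYTES_PER_FLOAT = 4      # assumption: single-precision operands


def flops_per_operand(peak_flops, bandwidth_bytes, bytes_per_operand):
    """Operations needed per operand fetched to keep the arithmetic units busy."""
    operands_per_second = bandwidth_bytes / bytes_per_operand
    return peak_flops / operands_per_second


ratio = flops_per_operand(PEAK_FLOPS, MEM_BANDWIDTH, BYTES_PER_FLOAT)
print(f"about {ratio:.0f} operations per float fetched")
```

Under these assumptions, each value fetched has to be reused for dozens of operations before the quoted peak rate is reachable, which is why workloads like FFTs and filters fit this model better than streaming one-shot calculations.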
  • Humanoido Posts: 5,770
    edited 2012-03-11 07:52
    Of course this is entirely possible with multiple props and don't forget about the possibilities of integrating along with the props. You may need to write some code to take the parallel results out and reassemble. The OBEX is rather stark when it comes to parallel apps. Integrating with GPU boards takes some time to learn the language and program the integrator.
  • Phil Pilgrim (PhiPi) Posts: 23,514
    edited 2012-03-11 09:59
    Humanoido wrote:
    Of course this is entirely possible with multiple props ...

    2 GFLOPS would be absurd to attempt with an array of Propeller chips. Each cog can do about 20 kFLOPS (timed using fmult in Float32Full). So 2 GFLOPS would require 100,000 cogs, or more than fourteen thousand Propeller chips (each running Float32Full in seven cogs, plus one cog for communication).

    -Phil
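The estimate in the post above can be reproduced with a few lines of arithmetic. This is a minimal sketch that uses only the figures quoted there (about 20 kFLOPS per cog, seven compute cogs per chip); the function and constant names are illustrative and do not come from any Propeller library.

```python
import math

FLOPS_PER_COG = 20_000       # ~20 kFLOPS per cog, timed with fmult in Float32Full (figure from the post)
COMPUTE_COGS_PER_CHIP = 7    # 8 cogs per chip, one reserved for communication (from the post)


def chips_needed(target_flops):
    """Return (cogs, chips) needed for target_flops, ignoring all communication overhead."""
    cogs = math.ceil(target_flops / FLOPS_PER_COG)
    chips = math.ceil(cogs / COMPUTE_COGS_PER_CHIP)
    return cogs, chips


cogs, chips = chips_needed(2e9)  # 2 GFLOPS target
print(f"{cogs:,} cogs on {chips:,} Propeller chips")
```

The result is a lower bound, since it assumes every compute cog runs flat out and that distributing work and collecting results costs nothing.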
  • the-new-guy Posts: 16
    edited 2012-03-11 11:06
    2 GFLOPS would be absurd to attempt with an array of Propeller chips.
    Humanoido has done so himself. Take a look at the "Big Brain" project.
  • Phil Pilgrim (PhiPi) Posts: 23,514
    edited 2012-03-11 11:38
    Humanoido has done so himself. Take a look at the "Big Brain" project.
    LOL! :) It's always nice to see new forum members with a sense of humor!

    -Phil
  • Humanoido Posts: 5,770
    edited 2012-03-11 22:36
    Humanoido has done so himself. Take a look at the "Big Brain" project.
    You did your research well. In the project before the Big Brain, we achieved 64MFLOPS. The Big Brain is functioning in the TeraFLOPS range. For more information, you are welcome to email me and I will help you with your project.
  • Phil Pilgrim (PhiPi) Posts: 23,514
    edited 2012-03-11 23:25
    @the-new-guy,

    Please don't be misled. No reasonable array of Propeller chips can achieve performance in the GFLOPS range, let alone TFLOPS. Even if you had enough Propeller chips to hit such a rate theoretically, the communication overhead involved in uploading parameters and downloading results would reduce the whole system to a crawl. When Humanoido refers to the "Big Brain", it encompasses more than just an array of Propeller chips. If he's getting TFLOPS performance, it's coming from a GPU, not the Propellers.

    -Phil
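To get a feel for the communication overhead described above, here is a back-of-the-envelope sketch of how many operations per second a single serial link could feed. Every number in it is an assumption chosen for illustration and none comes from the thread: a 115,200 baud asynchronous link, 10 bits on the wire per byte (8N1 framing), 32-bit floats, and two operands in plus one result out per operation, all sharing the same link budget.

```python
def link_limited_flops(baud, bits_per_byte=10, bytes_per_float=4, floats_per_op=3):
    """Upper bound on operations per second when every operand and result crosses the link."""
    bytes_per_second = baud / bits_per_byte
    floats_per_second = bytes_per_second / bytes_per_float
    return floats_per_second / floats_per_op


# Hypothetical link speed; substitute whatever the real interconnect provides.
rate = link_limited_flops(115_200)
print(f"link-limited rate: about {rate:.0f} operations per second")
```

Under these assumptions the link, not the arithmetic, sets the ceiling unless each transferred value is reused for many operations on the far side.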
  • Heater. Posts: 21,230
    edited 2012-03-12 02:00
    Phil,
    So 2 GFLOPS would require 100,000 cogs, or more than fourteen thousand Propeller chips

    Hmm..Wonder how much that would weigh? What volume it would occupy, using demo boards say? How much power it would consume?

    All in all I think one would be better off linking your Propeller, via FullDuplexSerial, to one of these:

    [Image: 800px-JaguarXT5.jpg]

    which ticks over at a steady 1750 TeraFlops.
  • Humanoido Posts: 5,770
    edited 2012-03-12 02:59
    GPUs have good TFLOPs speed but not hard controlling ability. Parallel Propellers have massive controlling ability and excellent (unenhanced) M-FLOPS speed. It's the best of both worlds and a marriage made in Heaven. AMDs have dropped so much in price that it makes sense to mix and match machines. Ten to twenty bucks now buys one board with hundreds of core processors at the computer capital of the world. I have a Propeller machine I'd like to connect to the Jaguar XT-5 in the photo. Parallax has 16,000 QFN prop chips which could achieve 24,000 MFLOPS without the overhead and add 512,000 controlling ports based on the numbers established with the largest Propeller UltraSpark series machine. One could take 16,000 GPU boards for an excess of 16K TeraFLOPs putting it into the PetaFLOPS range approaching supercomputers for around $160K.
  • Publison Posts: 12,366
    edited 2012-03-12 11:16
    2 GFLOPS would be absurd to attempt with an array of Propeller chips. Each cog can do about 20 kFLOPS (timed using fmult in Float32Full). So 2 GFLOPS would require 100,000 cogs, or more than fourteen thousand Propeller chips (each running Float32Full in seven cogs, plus one cog for communication).

    -Phil

    Wouldn't Parallax LOVE to get an order for fourteen thousand Propellers to try on this project? :)

    Jim
  • Phil Pilgrim (PhiPi) Posts: 23,514
    edited 2012-03-12 12:14
    Publison wrote:
    Wouldn't Parallax LOVE to get an order for fourteen thousand Propellers to try on this project?
    Such a project would be a fool's errand, I'm afraid. I would hope they'd get the cash up front (or, more responsibly, decline the order).

    -Phil