G_FLOPS ... Possible with FPU units?

TJHJ · 2012-03-08 17:49

Hi,

So I am cooking up this heavy math-crunching application, and was wondering if anyone could make some suggestions on how many FLOPS (floating-point operations per second) can be obtained from an FPU or a few of them. I am trying to get around 2 GFLOPS; more would be better, but I am not sure I can move the data in and out.
I am looking to have the prop act as the center point passing the data around and making it into something useful.

Any suggestions on what FPU chips could be interfaced with the prop or get anywhere close to that kind of speed?

Thanks for everyone's help.

Kye · 2012-03-08 17:52

Desktop multi core processor?

Phil Pilgrim (PhiPi) · 2012-03-08 17:55

Your best bet may be the GPUs in high-end video cards.

-Phil

rod1963 · 2012-03-08 18:47

A good and cheap alternative is to use a PC with several Nvidia video cards meant for number crunching.

http://www.nvidia.com/object/cuda_home_new.html

Heater. · 2012-03-08 21:29

Tjhj,
No.
Even if you had an infinitely fast FPU, whether as software in a Cog or as an external device, you still only have a 20 MIPS processor in the Prop, with even slower access to RAM or the outside world.
There is no way to move data around fast enough to get 2 GFLOPS.
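
To put a rough number on that, here is a minimal sketch of the data-movement ceiling. The 20 MIPS figure is from the post above; the three words moved per operation (two operands in, one result out) and the one-instruction-per-word cost are assumptions chosen to be generous to the Prop, and `transfer_bound_flops` is a hypothetical helper name, not anything from a Propeller library.

```python
def transfer_bound_flops(instr_per_sec, words_per_flop, instr_per_word=1):
    """Upper bound on FLOPS when the CPU must move every operand and result itself."""
    return instr_per_sec / (words_per_flop * instr_per_word)

PROP_INSTR_PER_SEC = 20_000_000   # ~20 MIPS, as stated in the post
WORDS_PER_FLOP = 3                # assumption: two operands in, one result out
TARGET_FLOPS = 2_000_000_000      # the 2 GFLOPS goal from the original question

# Best case: the FPU takes zero time and each word costs a single instruction.
bound = transfer_bound_flops(PROP_INSTR_PER_SEC, WORDS_PER_FLOP)
shortfall = TARGET_FLOPS / bound

print(f"data-movement bound: {bound / 1e6:.2f} MFLOPS")
print(f"target exceeds the bound by a factor of about {shortfall:.0f}")
```

Under those assumptions the ceiling is a few MFLOPS no matter how fast the FPU is, and real hub or pin access would only lower it.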

darreno · 2012-03-09 13:46

Any microprocessor with a fast coprocessor could achieve 2 GFLOPS on average, just not sustained. An example would be passing a large sample to the coprocessor for computing things like FFTs and filters. You pass one set of data that's buffered on the FPU or DSP, then perform multiple operations on it.

A single GPU processor can hit nearly 4 TFLOPS. The internal bandwidth of a high-end GPU card is well over 200 GB/s. Send the GPU a few megabytes of data along with an OpenCL kernel, then sit and wait for an answer.
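
The trade-off between compute rate and data movement can be sketched as a roofline-style bound: attainable throughput is the smaller of the peak rate and bandwidth times the number of operations done per byte moved. The 4 TFLOPS and 200 GB/s figures are the ones quoted above; the sample intensities and the function name `attainable_flops` are illustrative assumptions.

```python
def attainable_flops(peak_flops, bytes_per_sec, flops_per_byte):
    """Roofline-style bound: limited by either compute peak or memory traffic."""
    return min(peak_flops, bytes_per_sec * flops_per_byte)

GPU_PEAK = 4e12    # ~4 TFLOPS peak, as quoted in the post
GPU_BW = 200e9     # ~200 GB/s internal bandwidth, as quoted in the post

# Intensity (FLOPs per byte moved) at which the GPU stops being bandwidth-bound.
break_even = GPU_PEAK / GPU_BW

# Illustrative intensities only; real kernels have to be measured.
for intensity in (0.25, 1, 20, 100):
    rate = attainable_flops(GPU_PEAK, GPU_BW, intensity)
    print(f"{intensity:>6} FLOP/byte -> {rate / 1e9:8.0f} GFLOPS")
```

This is why buffering one data set and performing many operations on it matters: the more work done per byte moved, the closer the device gets to its peak.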

The answer comes down to what type of workload you have. The BCM2835 used in the Raspberry Pi is supposed to do 10+ GFLOPS, uses a couple of watts, and costs under $20.

Humanoido · 2012-03-11 07:52

Of course this is entirely possible with multiple props and don't forget about the possibilities of integrating along with the props. You may need to write some code to take the parallel results out and reassemble. The OBEX is rather stark when it comes to parallel apps. Integrating with GPU boards takes some time to learn the language and program the integrator.

Phil Pilgrim (PhiPi) · 2012-03-11 09:59

Humanoido wrote:

Of course this is entirely possible with multiple props ...

2 GFLOPS would be absurd to attempt with an array of Propeller chips. Each cog can do about 20 kFLOPS (timed using fmult in Float32Full). So 2 GFLOPS would require 100,000 cogs, or more than fourteen thousand Propeller chips (each running Float32Full in seven cogs, plus one cog for communication).

-Phil
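
For anyone who wants to check Phil's numbers, the arithmetic can be reproduced in a few lines. The 20 kFLOPS per cog and the seven compute cogs per chip come from his post; the variable names are just for illustration.

```python
TARGET_FLOPS = 2_000_000_000   # 2 GFLOPS goal from the original question
FLOPS_PER_COG = 20_000         # ~20 kFLOPS per cog (fmult in Float32Full, per the post)
COMPUTE_COGS_PER_CHIP = 7      # eight cogs per chip, one reserved for communication

def ceil_div(a, b):
    """Integer division rounded up, since a fraction of a cog or chip is not an option."""
    return -(-a // b)

cogs_needed = ceil_div(TARGET_FLOPS, FLOPS_PER_COG)
chips_needed = ceil_div(cogs_needed, COMPUTE_COGS_PER_CHIP)

print(f"cogs needed:  {cogs_needed:,}")
print(f"chips needed: {chips_needed:,}")
```

That gives 100,000 cogs and 14,286 chips, and it ignores the communication overhead entirely, so the real count would be higher.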

the-new-guy · 2012-03-11 11:06

Phil Pilgrim (PhiPi) wrote:

2 GFLOPS would be absurd to attempt with an array of Propeller chips.

Humanoido has done so himself. Take a look at the "Big Brain" project.

Phil Pilgrim (PhiPi) · 2012-03-11 11:38

the-new-guy wrote:

Humanoido has done so himself. Take a look at the "Big Brain" project.

LOL!

It's always nice to see new forum members with a sense of humor!

-Phil

Humanoido · 2012-03-11 22:36

the-new-guy wrote:

Humanoido has done so himself. Take a look at the "Big Brain" project.

You did your research well. In the project before the Big Brain, we achieved 64MFLOPS. The Big Brain is functioning in the TeraFLOPS range. For more information, you are welcome to email me and I will help you with your project.

Phil Pilgrim (PhiPi) · 2012-03-11 23:25

@the-new-guy,

Please don't be misled. No reasonable array of Propeller chips can achieve performance in the GFLOPS range, let alone TFLOPS. Even if you had enough Propeller chips to hit such a rate theoretically, the communication overhead involved in uploading parameters and downloading results would reduce the whole system to a crawl. When Humanoido refers to the "Big Brain", it encompasses more than just an array of Propeller chips. If he's getting TFLOPS performance, it's coming from a GPU, not the Propellers.

-Phil

Heater. · 2012-03-12 02:00

Phil,

So 2 GFLOPS would require 100,000 cogs, or more than fourteen thousand Propeller chips

Hmm..Wonder how much that would weigh? What volume it would occupy, using demo boards say? How much power it would consume?

All in all I think one would be better off linking your Propeller, via FullDuplexSerial, to one of these:

[photo: Jaguar XT-5 supercomputer]

which ticks over at a steady 1750 TeraFLOPS.

Humanoido · 2012-03-12 02:59

GPUs have good TFLOPs speed but not hard controlling ability. Parallel Propellers have massive controlling ability and excellent (unenhanced) M-FLOPS speed. It's the best of both worlds and a marriage made in Heaven. AMDs have dropped so much in price that it makes sense to mix and match machines. Ten to twenty bucks now buys one board with hundreds of core processors at the computer capital of the world. I have a Propeller machine I'd like to connect to the Jaguar XT-5 in the photo. Parallax has 16,000 QFN prop chips which could achieve 24,000 MFLOPS without the overhead and add 512,000 controlling ports based on the numbers established with the largest Propeller UltraSpark series machine. One could take 16,000 GPU boards for an excess of 16K TeraFLOPs putting it into the PetaFLOPS range approaching supercomputers for around $160K.

Publison · 2012-03-12 11:16

Phil Pilgrim (PhiPi) wrote:

2 GFLOPS would be absurd to attempt with an array of Propeller chips. Each cog can do about 20 kFLOPS (timed using fmult in Float32Full). So 2 GFLOPS would require 100,000 cogs, or more than fourteen thousand Propeller chips (each running Float32Full in seven cogs, plus one cog for communication).

-Phil

Wouldn't Parallax LOVE to get an order for fourteen thousand Propellers to try on this project?

Jim

Phil Pilgrim (PhiPi) · 2012-03-12 12:14

Publison wrote:

Wouldn't Parallax LOVE to get an order for fourteen thousand Propellers to try on this project?

Such a project would be a fool's errand, I'm afraid. I would hope they'd get the cash up front (or, more responsibly, decline the order).

-Phil
