Propeller supercomputer hardware questions
Dr. Mario
Posts: 331
First of all, I am especially interested in having my own supercomputer.
And, secondly, I know it's not an easy task: having to design your own supercomputing CPU framework (such as Common Bus Architecture, Hypercube, and several others - on a side note, the AMD Radeon HD 4670 and newer have something along the lines of Hypercube, something to be thankful for if you can't afford the parts or are easily intimidated by having to build the arrays correctly).
Thirdly, I want to know if the main seed computer (at the base of the array) is needed to spoon-feed those Propeller chips to get them to toil until something useful is spat out. I am considering using a PowerPC SoC from Freescale - to get the data from the PC via Ethernet (Fast Ethernet or Gigabit Ethernet) and feed it to the Propeller I / II boards, then have them process it. (And yes, a Propeller will be on the same board as the PowerPC processor, mostly for compiling the code into something the other identical chips will understand.)
Lastly, for anyone who has no experience with PowerPC processors: VIA Nano (Isaiah) or Intel Atom boards can be tried instead.
I just want any ideas before I go out on that - I am going to use a hybrid structure: Common Bus Architecture on the larger scale (the whole system, seed computer included), Hypercube on the smaller scale (a single board). I know the risk, though: 320 - 1,024 Propellers will really stress an ATX computer power supply. Don't think about doing a vampire tap on the 3.3 Volt power cable - it would draw 20 - 40+ Amps, not a good thing for the ATX PSU. Instead I will go with DC-DC converters, tap the 5 - 12 Volt lines, and draw less current in return for regulated 3.3 Volt power. Oh yeah, I will heatsink the boards too! They get real toasty when doing serious Pi calculations and/or Conway's Game of Life.
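For a rough feel of those PSU numbers, here is a small C sketch of the arithmetic. The 40 Amp load is the worst-case figure above; the 90% converter efficiency is only an assumption of mine, not a measured value.

/* Back-of-the-envelope for the PSU argument: the same 3.3 V load, pulled
 * through a DC-DC (buck) converter from the 12 V rail, draws far less
 * current from the ATX supply than a direct 3.3 V tap would.
 * Assumptions: 40 A worst-case load (the figure above), 90% efficiency. */
#include <stdio.h>

int main(void)
{
    const double v_out = 3.3, i_out = 40.0;  /* 3.3 V rail, ~40 A worst case  */
    const double v_in  = 12.0, eff  = 0.90;  /* 12 V rail, assumed efficiency */

    double p_out = v_out * i_out;            /* watts delivered to the Props   */
    double i_in  = p_out / (v_in * eff);     /* amps pulled from the 12 V rail */

    printf("Direct 3.3 V tap   : %.0f A from the ATX 3.3 V rail\n", i_out);
    printf("Via DC-DC from 12 V: %.0f W load -> %.1f A from the 12 V rail\n",
           p_out, i_in);
    return 0;
}

Roughly 132 W of load works out to about 12 A from the 12 V rail, which an ordinary ATX supply can manage far more comfortably than 40 A on 3.3 V.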
But what about the video? The seed computer will be there for a few reasons - one is to provide the AMD Radeon HD 4670 video card's services to all of those small guys. (AMD Radeon HD 4000 cards are cheap now, so why not?)
Comments
Humanoido
The main point about the term "super computer" is, rightly or wrongly, the machine's floating point performance.
So let's see what we have here and do some back of the envelope calculations:
I refer you to this page http://forums.parallax.com/showthread.php?t=124168&page=5 where RossH shows some results for the Whetstone floating point benchmark on the Prop when programming in C with Catalina.
As far as I can understand (subject to review as always), if you use 2 COGs for floating point "coprocessor" support along with 1 COG to run the main program, you are going to get about 0.04 MFLOPS (Mega Floating Point Operations Per Second).
Given that you will need perhaps 4 COGs per Propeller to perform communications between Propellers in the cluster, that's it: 0.04 MFLOPS per Propeller.
So Humanoido's tower of 40 Props could deliver 40 * 0.04 = 1.6 MFLOPS
And Dr Mario's high end cluster perhaps 1000 * 0.04 = 40 MFLOPS
What do these figures compare to?
Here are some results for the same Whetstone benchmark run on various old PCs:
Here we see that Humanoido's Towering Inferno is about as fast as a 386/486-era PC.
Dr Mario's cluster is somewhat faster than a Pentium Pro.
I venture to suggest that my Android mobile phone could turn in a better result than one thousand Propellers when it comes to floating point work.
Now don't let me discourage you, perhaps "super computer" is not the main point of the exercise anyway. Perhaps you will prove me wrong. This is Propeller Land after all, where impossible things have been done many times.
Have fun.
Edit: On the other hand...
Humanoido's "Towering Inferno" is up there with IBM's "Stretch" machine from 1961 as used at AEC-Los Alamos National Laboratory.
Dr Mario's creation would be up there with the CDC 7600 from 1969 as used at the Lawrence Livermore National Laboratory.
As I'm looking at the Prop's floating point support just now, I will do some of my own measurements. Luckily Lonesock has been optimising the Prop's float routines.
And, about the HDD - I am deciding which to use, PATA or SATA I/O on a laptop hard drive, given the uncertainty about mass throughput between the boards and the master CPU.
And, with Propeller II, the dedicated PSU design is bound to get really hairy... At least I can always try a few options, like an SMPS controller with an MCU core inside (not a simple, friendly DIP-8 chip, mind you...).
Seed computer: http://www.freescale.com/webapp/sps/site/prod_summary.jsp?code=MPC8641D&nodeId=0162468rH3bTdG8653
Board host processor: http://www.freescale.com/webapp/sps/site/prod_summary.jsp?code=MPC533&webpageId=11533145765637172706C1&nodeId=0162468rH3bTdG06C18648&fromPage=tax
http://focus.ti.com/docs/prod/folders/print/omap3530.html
http://www.freescale.com/webapp/sps/site/prod_summary.jsp?code=MPC565&webpageId=11533145765637172706C1&nodeId=0162468rH3bTdG06C18648&fromPage=tax
And for IO:
PCIe to PCI converter: http://focus.ti.com/docs/prod/folders/print/xio2001.html
The board processors will have to be chosen at a future date, though. And don't be shocked that the MPC8641D is extremely expensive - I have been meaning to use it as the seed computer on my machine, as it's efficient at controlling up to 4,096 Propeller II chips anyway. PowerPC is hard to program but always predictable, so I am going to use it, and if you're not convinced I need a CPU that powerful, I have a 400 MHz G2-based internal host SoC, yet to be soldered, somewhere in my box.
Fair enough. Let's think about integer performance then.
That MPC8641D has two cores and seems to be rated at 1.5 GHz each, 2.3 MIPS/MHz.
If we trust the spec sheet here: http://www.freescale.com/webapp/sps/site/overview.jsp?code=DRPPCDUALCORE
That's 1500MHz * 2.3 * 2 = 6900 MIPS !!!
A Prop has a theoretical top whack of 20 MIPS * 8 COGs = 160 MIPS.
So one MPC8641D is the equivalent of 6900 / 160 Props, about 43.
It is my contention that the Prop's realistic MIPS figure is 4 to 10 times lower. This is surely true if your program is substantial enough not to fit in a COG, or its data set is big enough that it has to reside in HUB RAM. Further, you will need to dedicate half the COGs to communications between Props.
This gets us to one MPC8641D being equivalent to 344 to 860 Propellers (6900 MIPS divided by an effective 16 to 40 MIPS per Prop, then doubled because only half the COGs are left to do the actual work). This is before we even start thinking about communication bottlenecks.
Would it not be easier and cheaper to string a few MPC8641D boards together, especially as they have a 2.5Gb/s per lane fabric interface?
Heater, you have benchmarked the UltraSpark 40 with negative conditions that do not exist on the actual machine.
The UltraSpark 40 is entirely an integer-based machine which is being used not for floating point, but for the design and verification of parallel algorithms. It's equally suitable for Artificial Intelligence as Dr. Mario pointed out. This is definitely an area that many of us look forward to developing.
I quote an approximation made by Bill Henning in the UltraSpark 40 thread, which is 64 MFLOPS. The machine speed is 6,400 MIPS with the 5 MHz crystals, and it overclocks to 8,000 MIPS with 6.25 MHz crystals.
Is it super computing? When we take one standard Prop chip at 160 MIPS / 32 I/O pins and multiply its power to get 8,000 MIPS with 1,280 ports, it becomes a super computing experience for me.
Humanoido
Perhaps so, it would not be the first time I grabbed hold of the wrong end of the stick. Also my apologies for mis-naming the UltaSpark 40.
Understood. How is my analysis in my last post incorrect?
I have to disagree with Bill here. In theory, perhaps: 20 MIPS times 8 COGs times 40 Propellers is indeed 6,400 MIPS.
However, that is for a PASM program executing entirely in a COG with no significant data and no input/output.
There are those of the opinion that PASM is effectively the microcode of the Propeller; looked at that way, one does not measure the MIPS of a CPU by the rate at which microcode steps are performed.
It also overlooks the non-trivial detail that to get the benefit of parallel processing you generally need good communications between nodes, for which I arbitrarily allocate 4 COGs.
Perhaps you and I should come up with a little benchmark program or two that is representative of what you would have running on each COG, and actually measure the effective MIPS.
Compared to the list of top performing computers here: http://en.wikipedia.org/wiki/Supercomputer - then yes, at some point in history.
Compared to the speed of the PC I'm writing this on, no.
I still maintain the usable MIPS of 40 Propellers is more like:
4 COGs times
2 MIPS times
40 Propellers, equals
320 MIPS
That's about two and a half Pentium Pros from 1997.
Which is of course more than 10 times slower than the normal PC of today that I'm writing this on, or my mobile phone.
Edit: Again I say, we should come up with some representative algorithm that can actually measure what you want to do.
Could be as simple as:
1) Input two arrays of one thousand, say, signed integers.
2) Add up the two arrays element by element.
3) Output the resulting one thousand element array.
On the Prop we do it in PASM.
On the PC we do it in C.
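A minimal sketch of the PC side of that benchmark, assuming the arrays are simply generated in place rather than read from a real input channel, and using clock() for rough timing:

/* Proposed benchmark, PC side: "input" two 1000-element integer arrays,
 * add them element by element, output the result, and report a rough
 * adds-per-second figure. The repeat count is only there so clock()
 * has something measurable to time. */
#include <stdio.h>
#include <time.h>

#define N      1000
#define REPEAT 10000

int main(void)
{
    static int a[N], b[N], c[N];
    int i, r;

    for (i = 0; i < N; i++) {          /* step 1: "input" the two arrays */
        a[i] = i;
        b[i] = 2 * i;
    }

    clock_t t0 = clock();
    for (r = 0; r < REPEAT; r++)       /* step 2: add up the two arrays */
        for (i = 0; i < N; i++)
            c[i] = a[i] + b[i];
    clock_t t1 = clock();

    double secs = (double)(t1 - t0) / CLOCKS_PER_SEC;
    if (secs <= 0.0)
        secs = 1e-9;                   /* guard against a too-coarse clock */

    printf("c[0]=%d c[%d]=%d\n", c[0], N - 1, c[N - 1]);   /* step 3: output */
    printf("about %.0f integer adds per second\n", (double)N * REPEAT / secs);
    return 0;
}

The PASM version would do the same loop in a COG, with the arrays coming in and going out over whatever link the cluster actually uses - which is exactly the part the raw MIPS figures ignore.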
Hi Heater.
As in most cases, it comes down to what you mean by "supercomputer".
I am still thinking of cases that need very many sensors, all of which have to answer to a HOST, if possible in real time (interrupt-driven systems of the PC type are not suitable for that).
BUT with some Propeller arrays it can be possible to monitor them all at the same time and let the Propellers decide whether the HOST needs to be informed, if intervention is needed.
And THAT Propeller array CAN be called a supercomputer.
Humanoido
Yep, no point to argue over semantics. It's a moving target anyway.
Yes, yes. You hit the nail on the head.
When Humanoido first announced the UltraSpark 40, one of my first questions was "what would be the best use of a machine with 1280 general purpose IOs?"
The comparisons Heater made with PCs certainly aren't the worst case for the Prop. If you have an integer algorithm, look at something like the multicore VLIW integer chips from TI; you're talking about tens of billions of instructions per second on one chip. I don't think I'm qualified to analyze the system Dr. Mario proposed - if I've got this right, he's talking about four different processor types, and I have no idea at all how that's supposed to fit together.
But if your primary interest isn't maximizing MFLOPS/MIPS per dollar but looking at parallel programming from a software standpoint, that's a different thing. I'm sure Leon can address this better than I can, but from my (tiny) experience, using languages based on Communicating Sequential Processes is a lot of work. Using something like Linda (which is fairly long in the tooth by software standards now), which allows you to write programs without considering what node a particular task is going to run on or having to think about load balancing, strikes me as a better approach. If individuals build machines that use ad hoc programming methods, I don't think interest will ever grow beyond threads like this. If someone implements a language that is fun to program on (Spin Linda or PropBasic Linda maybe), you can get a cult following, and more efficient machines will follow.
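To make the "don't care where it runs" point concrete, here is a toy, shared-memory imitation of the tuple-space idea in C with pthreads. It is not real C-Linda - out_task() and in_task() are stand-in names I made up for Linda's out() and in() - but it shows a master dropping work into a bag and whichever worker is free pulling it out, with no explicit placement or load balancing anywhere in the program:

/* Toy imitation of a Linda-style tuple space inside one process.
 * The master drops task tuples into a shared bag; workers pull whatever
 * is available. Nothing in the code says which worker runs which task. */
#include <stdio.h>
#include <pthread.h>

#define TASKS   16
#define WORKERS 4

static int bag[TASKS];                 /* the "tuple space": pending task ids */
static int head = 0, tail = 0;         /* bag[head..tail-1] are waiting       */
static int results[TASKS];
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void out_task(int id)           /* rough stand-in for Linda's out() */
{
    pthread_mutex_lock(&lock);
    bag[tail++] = id;
    pthread_mutex_unlock(&lock);
}

static int in_task(int *id)            /* rough stand-in for Linda's in() */
{
    int got = 0;
    pthread_mutex_lock(&lock);
    if (head < tail) { *id = bag[head++]; got = 1; }
    pthread_mutex_unlock(&lock);
    return got;
}

static void *worker(void *arg)
{
    int id;
    (void)arg;
    while (in_task(&id))
        results[id] = id * id;         /* the "work": square the task id */
    return NULL;
}

int main(void)
{
    pthread_t t[WORKERS];
    int i;

    for (i = 0; i < TASKS; i++) out_task(i);              /* fill the bag  */
    for (i = 0; i < WORKERS; i++) pthread_create(&t[i], NULL, worker, NULL);
    for (i = 0; i < WORKERS; i++) pthread_join(t[i], NULL);

    for (i = 0; i < TASKS; i++) printf("%d ", results[i]);
    printf("\n");
    return 0;
}

Real Linda puts the tuples in a space that can span nodes; this is just one process with threads, but it is enough to show the programming style that makes node placement somebody else's problem.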
-phar
I'm thinking you paste all these ultrasonic gizmos to a shirt and pipe the output to a pair of VR glasses and you have Bat Vision! I'm also thinking I should trademark the phrase "Bat Vision".
-phar
Perfect - just the kind of idea I was fishing for, for an array of Props like Humanoido's.
I make a biological analogy:
Controlling PC = "Brain", because of its sheer power and memory.
Propeller Array = "Spinal cord", lots of inputs/outputs, initial processing of sensor data, capable of some autonomous actions.
Sensors = well, the senses.
An ultrasonic phased array might be an ideal example.
On the subject of the phased sonar array, I would think it would be easier for a robot to construct a model of its surroundings by that method than by using computer vision, because the phased array data would contain depth information. Or, if used in conjunction with a vision system, it could be used to determine the distance to objects identified in the visual image.
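The depth part really is just time-of-flight arithmetic. A minimal sketch, assuming the sensor hands back a round-trip echo time in microseconds (the 343 m/s figure is for dry air at roughly 20 C):

/* Range from an ultrasonic echo: distance = speed_of_sound * time / 2.
 * The division by two is because the echo time covers the trip out
 * and back. Echo time is assumed to arrive in microseconds. */
#include <stdio.h>

static double range_m(double echo_us)
{
    const double speed_of_sound = 343.0;            /* metres per second  */
    return speed_of_sound * (echo_us / 1e6) / 2.0;  /* halve: out and back */
}

int main(void)
{
    printf("5.8 ms echo -> %.2f m\n", range_m(5800.0));   /* about 1 m */
    printf("29 ms echo  -> %.2f m\n", range_m(29000.0));  /* about 5 m */
    return 0;
}

A phased array then just does this for many beams at once, which is where the depth map for the 3D model would come from.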
-phar
You guys have just hit on something I have been idly pondering for years.
In the AI/robotics world there has been much said about 3D computer vision/recognition for recognizing objects or navigation or whatever.
But I've always had this suspicion that we have a 3D model in our heads whether we can see or not.
Consider: If I put my hand in my jacket pocket and feel around I can recognize that I have a telephone in there, some car keys, a cigarette lighter, bits of string and old bus tickets etc etc.
I can "see" and recognize all of that without seeing them at all. The 3D model is totally independent of vision.
No idea where that all leads, just a thought.
Rather, have the machine "feel" objects and build an internal 3D model from them.
Humanoido
TEN IDEAS FOR MASSIVE NUMBERS OF I/O PORTS
IMPLEMENTED WITH THE ULTRASPARK 40
http://forums.parallax.com/showthread.php?p=938983&posted=1#post938983
Although I am looking at using the Propeller for AI clusters - it's a hard comparison between the AMD Radeon HD 5770 VPU and Propeller II chips: the AMD Radeon HD 5770 has 1,600 streaming DSP cores, while a cluster of 4,096 Propeller II chips would need to get up to the same horsepower: it can spit out 3.75 Teraflops easily.
I don't think I meant "different", but a tactile sense augments the concept of 3D in our minds, and so it could in machines.
Whenever I have read about 3D object or spatial recognition for computers, it is always from the idea of "it has a camera, or two; it has the image data; how do we get it to 'understand' the 3Dness from that imagery?" with no other references to work from.
OK. This is going to get long winded and possibly gross but imagine this:
Take a human baby. Allow it to grow up in an environment where everything it sees is far enough away that 3D vision does not work so well. Arrange that it can never reach out into that environment and touch the objects there. It can never crawl into that space. It only ever sees one side of anything - mother's face straight on, but never from the side or back.
I propose that this poor child may never get the "concept" of a 3D world into its head. No matter what it can see, it will never get the idea of space and distance because it has never mechanically experienced space and distance. Building a 3D model from the flat world we have allowed it to experience is impossible.
Why should it be possible for computers then?
Where this waffley argument gets us I don't know.
-phar
Good story.
That's my point really: it's the physical motion, the experience of feeling things, that creates the 3D model of the world in our minds. It's the means by which we actually measure distances and how big things are. This all gets correlated with what we see at some point, so much so that the 2D images we have on our retinas become 3D.
That blind mole in his labyrinth of underground tunnels knows exactly where he is.