Propeller supercomputer hardware questions
Dr. Mario
Posts: 331
First of all, I am especially interested in having my own supercomputer.
And, secondly, I know it's not an easy task: having to design your own supercomputing CPU framework (such as Common Bus Architecture, Hypercube, and several others - on a side note, the AMD Radeon HD 4670 and newer have something along the lines of Hypercube, something to be thankful for if you can't afford the parts or are easily intimidated by having to build the arrays correctly).
Thirdly, I want to know if the main seed computer (at the base of the array) is needed to spoon-feed those Propeller chips to get them to toil until something useful is spat out. I am considering using a PowerPC SoC from Freescale - to get the data from the PC via Ethernet (Fast Ethernet or Gigabit Ethernet) and feed it to the Propeller I / II boards, then have them process it. (And yes, a Propeller will be on the same board as the PowerPC processor, mostly for compiling the code into something the other identical chips will understand.)
Lastly, for anyone who has no experience with PowerPC processors: VIA Nano (Isaiah) or Intel Atom boards can be tried instead.
I just want any ideas before I go out on that - I am going to use a hybrid structure: Common Bus Architecture on the larger scale (the whole system, seed computer included), Hypercube on the smaller scale (a single board). I know the risk, though: 320 - 1,024 Propellers will really stress an ATX computer power supply. Don't think about doing a vampire tap on the 3.3 Volt power cable - it would draw 20 - 40+ Amps, not a good thing for the ATX PSU. Instead I will go with DC-DC converters, tap the 5 - 12 Volt lines, and draw less current in return for regulated 3.3 Volt power. Oh yeah, I will heatsink the boards too! They get real toasty when doing serious Pi calculations and/or Conway's Game of Life.
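For a rough feel of those PSU numbers, here is a small C sketch of the arithmetic. The 40 Amp load is the worst-case figure above; the 90% converter efficiency is only an assumption of mine, not a measured value.

/* Back-of-the-envelope for the PSU argument: the same 3.3 V load, pulled
 * through a DC-DC (buck) converter from the 12 V rail, draws far less
 * current from the ATX supply than a direct 3.3 V tap would.
 * Assumptions: 40 A worst-case load (the figure above), 90% efficiency. */
#include <stdio.h>

int main(void)
{
    const double v_out = 3.3, i_out = 40.0;  /* 3.3 V rail, ~40 A worst case  */
    const double v_in  = 12.0, eff  = 0.90;  /* 12 V rail, assumed efficiency */

    double p_out = v_out * i_out;            /* watts delivered to the Props   */
    double i_in  = p_out / (v_in * eff);     /* amps pulled from the 12 V rail */

    printf("Direct 3.3 V tap   : %.0f A from the ATX 3.3 V rail\n", i_out);
    printf("Via DC-DC from 12 V: %.0f W load -> %.1f A from the 12 V rail\n",
           p_out, i_in);
    return 0;
}

Roughly 132 W of load works out to about 12 A from the 12 V rail, which an ordinary ATX supply can manage far more comfortably than 40 A on 3.3 V.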
But what about the video? The seed computer will be there for a few reasons - one is to provide the AMD Radeon HD 4670 video card's services to all of those small guys. (AMD Radeon HD 4000 cards are cheap now, so why not?)
Comments
Humanoido
The main point about the term "super computer" is, rightly or wrongly, the machine's floating point performance.
So let's see what we have here and do some back of the envelope calculations:
I refer you to this page http://forums.parallax.com/showthread.php?t=124168&page=5 where RossH shows some results for the Whetstone floating point benchmark on the Prop when programming in C with Catalina.
As far as I can understand (subject to review as always), if you use 2 COGs for floating point "coprocessor" support along with 1 COG to run the main program, you are going to get about 0.04 MFLOPS (Mega Floating Point Operations Per Second).
Given that you will need perhaps 4 COGs per Propeller to perform communications between Propellers in the cluster, that's it: 0.04 MFLOPS per Propeller.
So Humanoido's tower of 40 Props could deliver 40 * 0.04 = 1.6 MFLOPS
And Dr Mario's high end cluster perhaps 1000 * 0.04 = 40 MFLOPS
What do these figures compare to?
Here are some results for the same Whetstone benchmark run on various old PCs:
Here we see that Humanoido's Towering Inferno is about as fast as a 386/486-era PC.
Dr Mario's cluster is somewhat faster than a Pentium Pro.
I venture to suggest that my Android mobile phone could turn in a better result than one thousand Propellers when it comes to floating point work.
Now don't let me discourage you, perhaps "super computer" is not the main point of the exercise anyway. Perhaps you will prove me wrong. This is Propeller Land after all, where impossible things have been done many times.
Have fun.
Edit: On the other hand...
Humanoido's "Towering Inferno" is up there with IBM's "Stretch" machine from 1961 as used at AEC-Los Alamos National Laboratory.
Dr Mario's creation would be up there with the CDC 7600 from 1969 as used at the Lawrence Livermore National Laboratory.
As I'm looking at the Prop's floating point support just now, I will do some of my own measurements. Luckily Lonesock has been optimising the Prop's float routines.
And, about the HDD - I am deciding which to use, PATA or SATA I/O on a laptop hard drive, given the uncertainty about mass throughput between the boards and the master CPU.
And, with Propeller II, the dedicated PSU design is bound to get really hairy... At least I can always try a few options, like an SMPS controller with an MCU core inside (not a simple, friendly DIP-8 chip, mind you...).
Seed computer: http://www.freescale.com/webapp/sps/site/prod_summary.jsp?code=MPC8641D&nodeId=0162468rH3bTdG8653
Board host processor: http://www.freescale.com/webapp/sps/site/prod_summary.jsp?code=MPC533&webpageId=11533145765637172706C1&nodeId=0162468rH3bTdG06C18648&fromPage=tax
http://focus.ti.com/docs/prod/folders/print/omap3530.html
http://www.freescale.com/webapp/sps/site/prod_summary.jsp?code=MPC565&webpageId=11533145765637172706C1&nodeId=0162468rH3bTdG06C18648&fromPage=tax
And for IO:
PCIe to PCI converter: http://focus.ti.com/docs/prod/folders/print/xio2001.html
The board processors will have to be chosen at a future date, though. And don't be shocked that the MPC8641D is extremely expensive - I have been meaning to use it as the seed computer on my machine, as it's efficient at controlling up to 4,096 Propeller II chips anyway. PowerPC is hard to program but always predictable, so I am going to use it, and if you're not convinced I need a CPU that powerful, I have a 400 MHz G2-based internal host SoC, yet to be soldered, somewhere in my box.
Fair enough. Let's think about integer performance then.
That MPC8641D has two cores and seems to be rated at 1.5 GHz each, 2.3 MIPS/MHz.
If we trust the spec sheet here: http://www.freescale.com/webapp/sps/site/overview.jsp?code=DRPPCDUALCORE
That's 1500MHz * 2.3 * 2 = 6900 MIPS !!!
A Prop has a theoretical top whack of 20 MIPS * 8 COGs = 160 MIPS.
So one MPC8641D is the equivalent of 6900 / 160 Props, about 43.
It is my contention that the Prop's realistic MIPS figure is 4 to 10 times lower. This is surely true if your program is substantial enough not to fit in a COG, or its data set is big enough that it has to reside in HUB RAM. Further, you will need to dedicate half the COGs to communications between Props.
This gets us to one MPC8641D being equivalent to 344 to 860 Propellers (6900 MIPS divided by an effective 16 to 40 MIPS per Prop, then doubled because only half the COGs are left to do the actual work). This is before we even start thinking about communication bottlenecks.
Would it not be easier and cheaper to string a few MPC8641D boards together, especially as they have a 2.5Gb/s per lane fabric interface?
Heater, you have benchmarked the UltraSpark 40 with negative conditions that do not exist on the actual machine.
The UltraSpark 40 is entirely an integer-based machine which is being used not for floating point, but for the design and verification of parallel algorithms. It's equally suitable for Artificial Intelligence as Dr. Mario pointed out. This is definitely an area that many of us look forward to developing.
I quote an approximation made by Bill Henning in the UltraSpark 40 thread, which is 64 MFLOPS. The machine speed is 6,400 MIPS with the 5 MHz crystals, and it overclocks to 8,000 MIPS with 6.25 MHz crystals.
Is it super computing? When we take one standard Prop chip at 160 MIPS / 32 I/O pins and multiply its power to get 8,000 MIPS with 1,280 ports, it becomes a super computing experience for me.
Humanoido
Perhaps so, it would not be the first time I grabbed hold of the wrong end of the stick. Also my apologies for mis-naming the UltaSpark 40.
Understood. How is my analysis in my last post incorrect?
I have to disagree with Bill here. In theory, perhaps: 20 MIPS times 8 COGs times 40 Propellers is indeed 6,400 MIPS.
However, that is for a PASM program executing entirely in a COG with no significant data and no input/output.
There are those of the opinion that PASM is effectively the microcode of the Propeller; looked at that way, one does not measure the MIPS of a CPU by the rate at which microcode steps are performed.
It also overlooks the non-trivial detail that to get the benefit of parallel processing you generally need good communications between nodes, for which I arbitrarily allocate 4 COGs.
Perhaps you and I should come up with a little benchmark program or two that is representative of what you would have running on each COG, and actually measure the effective MIPS.
Compared to the list of top performing computers here: http://en.wikipedia.org/wiki/Supercomputer - then yes, at some point in history.
Compared to the speed of the PC I'm writing this on, no.
I still maintain the usable MIPS of 40 Propellers is more like:
4 COGs times
2 MIPS times
40 Propellers, equals
320 MIPS
That's about two and a half Pentium Pros from 1997.
Which is of course more than 10 times slower than the normal PC of today that I'm writing this on, or my mobile phone.
Edit: Again I say, we should come up with some representative algorithm that can actually measure what you want to do.
Could be as simple as:
1) Input two arrays of one thousand, say, signed integers.
2) Add up the two arrays element by element.
3) Output the resulting one thousand element array.
On the Prop we do it in PASM.
On the PC we do it in C.
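A minimal sketch of the PC side of that benchmark, assuming the arrays are simply generated in place rather than read from a real input channel, and using clock() for rough timing:

/* Proposed benchmark, PC side: "input" two 1000-element integer arrays,
 * add them element by element, output the result, and report a rough
 * adds-per-second figure. The repeat count is only there so clock()
 * has something measurable to time. */
#include <stdio.h>
#include <time.h>

#define N      1000
#define REPEAT 10000

int main(void)
{
    static int a[N], b[N], c[N];
    int i, r;

    for (i = 0; i < N; i++) {          /* step 1: "input" the two arrays */
        a[i] = i;
        b[i] = 2 * i;
    }

    clock_t t0 = clock();
    for (r = 0; r < REPEAT; r++)       /* step 2: add up the two arrays */
        for (i = 0; i < N; i++)
            c[i] = a[i] + b[i];
    clock_t t1 = clock();

    double secs = (double)(t1 - t0) / CLOCKS_PER_SEC;
    if (secs <= 0.0)
        secs = 1e-9;                   /* guard against a too-coarse clock */

    printf("c[0]=%d c[%d]=%d\n", c[0], N - 1, c[N - 1]);   /* step 3: output */
    printf("about %.0f integer adds per second\n", (double)N * REPEAT / secs);
    return 0;
}

The PASM version would do the same loop in a COG, with the arrays coming in and going out over whatever link the cluster actually uses - which is exactly the part the raw MIPS figures ignore.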
Hi Heater.
As in most cases, it comes down to what you mean by "supercomputer".
I am still thinking of cases that need very many sensors, all of which have to answer to a HOST, if possible in real time (interrupt-driven systems of the PC type are not suitable for that).
BUT with some Propeller arrays it can be possible to monitor them all at the same time and let the Propellers decide whether the HOST needs to be informed, if intervention is needed.
And THAT Propeller array CAN be called a supercomputer.
Humanoido
Yep, no point to argue over semantics. It's a moving target anyway.
Yes, yes. You hit the nail on the head.
When Humanoido first announced the UltraSpark 40, one of my first questions was "what would be the best use of a machine with 1280 general purpose IOs?"
The comparisons Heater made with PCs certainly aren't the worst case for the Prop. If you have an integer algorithm, look at something like the multicore VLIW integer chips from TI; you're talking about tens of billions of instructions per second on one chip. I don't think I'm qualified to analyze the system Dr. Mario proposed - if I've got this right, he's talking about four different processor types, and I have no idea at all how that's supposed to fit together.
But if your primary interest isn't maximizing MFLOPS/MIPS per dollar but looking at parallel programming from a software standpoint, that's a different thing. I'm sure Leon can address this better than I can, but from my (tiny) experience, using languages based on Communicating Sequential Processes is a lot of work. Using something like Linda (which is fairly long in the tooth by software standards now), which allows you to write programs without considering what node a particular task is going to run on or having to think about load balancing, strikes me as a better approach. If individuals build machines that use ad hoc programming methods, I don't think interest will ever grow beyond threads like this. If someone implements a language that is fun to program on (Spin Linda or PropBasic Linda maybe), you can get a cult following, and more efficient machines will follow.
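To make the "don't care where it runs" point concrete, here is a toy, shared-memory imitation of the tuple-space idea in C with pthreads. It is not real C-Linda - out_task() and in_task() are stand-in names I made up for Linda's out() and in() - but it shows a master dropping work into a bag and whichever worker is free pulling it out, with no explicit placement or load balancing anywhere in the program:

/* Toy imitation of a Linda-style tuple space inside one process.
 * The master drops task tuples into a shared bag; workers pull whatever
 * is available. Nothing in the code says which worker runs which task. */
#include <stdio.h>
#include <pthread.h>

#define TASKS   16
#define WORKERS 4

static int bag[TASKS];                 /* the "tuple space": pending task ids */
static int head = 0, tail = 0;         /* bag[head..tail-1] are waiting       */
static int results[TASKS];
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void out_task(int id)           /* rough stand-in for Linda's out() */
{
    pthread_mutex_lock(&lock);
    bag[tail++] = id;
    pthread_mutex_unlock(&lock);
}

static int in_task(int *id)            /* rough stand-in for Linda's in() */
{
    int got = 0;
    pthread_mutex_lock(&lock);
    if (head < tail) { *id = bag[head++]; got = 1; }
    pthread_mutex_unlock(&lock);
    return got;
}

static void *worker(void *arg)
{
    int id;
    (void)arg;
    while (in_task(&id))
        results[id] = id * id;         /* the "work": square the task id */
    return NULL;
}

int main(void)
{
    pthread_t t[WORKERS];
    int i;

    for (i = 0; i < TASKS; i++) out_task(i);              /* fill the bag  */
    for (i = 0; i < WORKERS; i++) pthread_create(&t[i], NULL, worker, NULL);
    for (i = 0; i < WORKERS; i++) pthread_join(t[i], NULL);

    for (i = 0; i < TASKS; i++) printf("%d ", results[i]);
    printf("\n");
    return 0;
}

Real Linda puts the tuples in a space that can span nodes; this is just one process with threads, but it is enough to show the programming style that makes node placement somebody else's problem.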
-phar
I'm thinking you paste all these ultrasonic gizmos to a shirt and pipe the output to a pair of VR glasses and you have Bat Vision! I'm also thinking I should trademark the phrase "Bat Vision".
-phar
Perfect - just the kind of idea I was fishing for, for an array of Props like Humanoido's.
I make a biological analogy:
Controlling PC = "Brain", because of its sheer power and memory.
Propeller Array = "Spinal cord", lots of inputs/outputs, initial processing of sensor data, capable of some autonomous actions.
Sensors = well, the senses.
An ultrasonic phased array might be an ideal example.
On the subject of the phased sonar array, I would think it would be easier for a robot to construct a model of its surroundings by that method than by using computer vision, because the phased array data would contain depth information. Or, if used in conjunction with a vision system, it could be used to determine the distance to objects identified in the visual image.
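The depth part really is just time-of-flight arithmetic. A minimal sketch, assuming the sensor hands back a round-trip echo time in microseconds (the 343 m/s figure is for dry air at roughly 20 C):

/* Range from an ultrasonic echo: distance = speed_of_sound * time / 2.
 * The division by two is because the echo time covers the trip out
 * and back. Echo time is assumed to arrive in microseconds. */
#include <stdio.h>

static double range_m(double echo_us)
{
    const double speed_of_sound = 343.0;            /* metres per second  */
    return speed_of_sound * (echo_us / 1e6) / 2.0;  /* halve: out and back */
}

int main(void)
{
    printf("5.8 ms echo -> %.2f m\n", range_m(5800.0));   /* about 1 m */
    printf("29 ms echo  -> %.2f m\n", range_m(29000.0));  /* about 5 m */
    return 0;
}

A phased array then just does this for many beams at once, which is where the depth map for the 3D model would come from.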
-phar
You guys have just hit on something I have been idly pondering for years.
In the AI/robotics world there has been much said about 3D computer vision/recognition for recognizing objects or navigation or whatever.
But I've always had this suspicion that we have a 3D model in our heads whether we can see or not.
Consider: If I put my hand in my jacket pocket and feel around I can recognize that I have a telephone in there, some car keys, a cigarette lighter, bits of string and old bus tickets etc etc.
I can "see" and recognize all of that without seeing them at all. The 3D model is totally independent of vision.
No idea where that all leads, just a thought.
Rather, have the machine "feel" objects and build an internal 3D model from them.
Humanoido
TEN IDEAS FOR MASSIVE NUMBERS OF I/O PORTS
IMPLEMENTED WITH THE ULTRASPARK 40
http://forums.parallax.com/showthread.php?p=938983&posted=1#post938983
Although I am looking at using the Propeller for AI clusters - it's a hard comparison between the AMD Radeon HD 5770 VPU and Propeller II chips: the AMD Radeon HD 5770 has 1,600 streaming DSP cores, while a cluster of 4,096 Propeller II chips would need to get up to the same horsepower: it can spit out 3.75 Teraflops easily.
I don't think I meant "different", but a tactile sense augments the concept of 3D in our minds, and so it could in machines.
Whenever I have read about 3D object or spatial recognition for computers, it is always from the idea of "it has a camera, or two; it has the image data; how do we get it to 'understand' the 3Dness from that imagery?" with no other references to work from.
OK. This is going to get long winded and possibly gross but imagine this:
Take a human baby. Allow it to grow up in an environment where everything it sees is far enough away that 3D vision does not work so well. Arrange that it can never reach out into that environment and touch the objects there. It can never crawl into that space. It only ever sees one side of anything - mother's face straight on, but never from the side or back.
I propose that this poor child may never get the "concept" of a 3D world into its head. No matter what it can see, it will never get the idea of space and distance because it has never mechanically experienced space and distance. Building a 3D model from the flat world we have allowed it to experience is impossible.
Why should it be possible for computers then?
Where this waffley argument gets us I don't know.
-phar
Good story.
That's my point really: it's the physical motion, the experience of feeling things, that creates the 3D model of the world in our minds. It's the means by which we actually measure distances and how big things are. This all gets correlated with what we see at some point, so much so that the 2D images we have on our retinas become 3D.
That blind mole in his labyrinth of underground tunnels knows exactly where he is.