Propeller mini-super computer

AIman · 2011-01-28 11:03

I am working on two different designs for a mini-super computer based on the Propeller. Normally I would keep this sort of thing to myself but I think if we pitch together on this we can all greatly benefit. Actually I think Parallax could build a very fast very small computer based on the Propeller but thats not what this thread is about.

Here's the basics -

There are two different types of computers I am in the process of building. One is based on a cube and the other a sphere. Literally.

They are designed to function much the way the human brain cells work where if one isn't being used another can use it or if one gets damaged or overload another can step in to take over.

All that being said here is the way the Propeller chips are looking in the cube design. A set of 6 propellers are put together so that all the prongs touch and a couple prongs of one is bent outward and soldered to wires to connect it to a PC. All the other pins are soldered to the prongs of the Propeller chip it touches. Groups of the cubes are then interconnected to each other the same way the interface with the PC works. Each cube is connected to the interface with the computer or run through a single interface depending on application. In other words a group of cubes linked together that either depend on a central cube for a PC link or a cube can access a PC on its own and be connected to other cubes. The advantage of this over the sphere design is space.

For the sphere design the chips are being soldered together much the same way that a soccer ball looks. I suggest using a styrofam ball to maintain shape. Oh and be careful, you don't want a fire. As before a single point of contact or multipule points of contact can be used depending on application. Keep in mind that this form will require short jumpers at times to maintain the sphere shape. The advantage of this over the cube design is its a lot more computer power.

Leon · 2011-01-28 11:33

How do they communicate with each other?

What is the point when you can get a single chip with much more performance?

Mike G · 2011-01-28 11:58

@AIman, checkout this thread. You guys probably have a few things in common..

http://forums.parallax.com/showthread.php?124495-Fill-the-Big-Brain/page13

AIman · 2011-01-28 11:59

Leon -

They communicate with each other the same way they would if they were in different locations.

The question of a single chip is best answered by one of the Parallax guys.

Mike -

Thanks for the link.

AIman · 2011-02-01 06:47

Leon I have an answer for you.

Originally I was thinking you were asking things regarding the speed of the propeller and how fast propellers could talk to each other and then I got thinking about what you ask and realized you were meaning the trade off in speed between running something like described vs running one chip.

The trade off between running one of these designs vs a single chip is dependant on what you are doing. My designs are based off the 44 pin propeller item P8X32A-Q44. In short each side of the chip gets assigned a task to work on. Unless the project is going to require the use of a couple of cogs getting info its going to be faster to run a single chip.

When the process become longer and more involved it becomes much faster to run multipule chips and break out the assignment based on either the number of chips involved or the number of cogs involved. The purpose of making these specific designs was so that multipule in depth calculations and analyzis could happen will getting input from various sources and creating output to others. It was also designed assuming some basic super computer structuring of handing off different tasks would be involved, namely tasks being assigned to different locations while taking into account space, cost and overall power and heat factors.

So a single chip will be faster until the processing speed and time of the propeller maxs out, which is dependant on your individual requirements.

To put this in terms of robotics, if you are running 5 Sonar, a motor controller and an IR you are pretty much maxing out your cogs in terms of availablity. I assume leaving one open to communicate with a PC or something of that nature.

On the other hand if you take the same project and do 5 Sonar, 3 Motor controllers, a data logger, a camera, 3 IR sensors, a wirless transmitter and an internet connection then the designs described will be much faster. For the cube design, each task is deligated (sp?) to a specific chip and that chip is used only for that task for that time. In the sphere design the task is given to specific chips or cogs within a specific chip and the handoff is handled directly by that chip.

In otherwords these designs are removing a central CPU.

Leon · 2011-02-01 07:24

I thought you were talking about a supercomputer, not a distributed controller with several relatively autonomous tasks. I can't see that using several Propeller chips cobbled together as you propose will get you much computing performance, it isn't a "supercomputer" as the term is usually understood.

hinv · 2011-02-01 07:27

Hi AIman,

There are definately cases where more than 1 prop is needed to handle a situation, but in your above example, after running out of cogs, a more cost effective way to add more monitored devices/tasks is to combine som of the simpler tasks into one cog. If you look in the object exchange, you will see a 4serial port object, a multi servo object, etc.

If your intent is more acedemic than to solve a particular problem, I suggext you read the threads below.

http://forums.parallax.com/showthread.php?125614-Propeller-supercomputer-hardware-questions

http://forums.parallax.com/showthread.php?128055-Propforth4.0a-available-for-download

I for one was hoping parallax would put out a multipropeller design, like the c3 could have been, but there isn't one best way to connct propellers together, but if pin conservation is important, there are some really high speed serial examples that will do over 14Mbit in object exchange, and in the forums.

I hope this helps.

Doug

Mike G · 2011-02-01 07:54

Leon, I agree I think the folks that refer to super... really mean distributed.

Heater. · 2011-02-01 08:56

Here we go again...Prop does not have the performance for a "super-computer"....Prop does not have the communications support....Better to use a single faster chip...Better to use a PC....da..da...da...

But wait, no. The title of this thread says "mini-super computer".

Well, I guess that could be just about anything you want it to be. No problem.

AIman · 2011-02-28 15:25

Basically I am looking at a way to create a computer capable of functioning both independantly, meaning each cog on its own - and in unity, meaning all the cogs together. The projects I have in mind will require several of the most capable processors and the end goal is to maximize space and functionality. One Prop on its own won't cut it and because of the amount of information to be monitored it will take several.

Initally using individual chips and linking them was considered, however that also included some of the other chips Parallax puts out. Because of the need to have each item available at any given time and monitored without using a central CPU, using a series of chips linked together has become more and more desired, perhpas needed would be a better word.

In the end, the sort of processing power being looked at will tax several Props. I am already using all the Props I have and need to get more.

Humanoido · 2011-03-01 23:08

AIman wrote: »

I am working on two different designs for a mini-super computer based on the Propeller. Normally I would keep this sort of thing to myself but I think if we pitch together on this we can all greatly benefit. Actually I think Parallax could build a very fast very small computer based on the Propeller but thats not what this thread is about.

Here's the basics -

There are two different types of computers I am in the process of building. One is based on a cube and the other a sphere. Literally.

They are designed to function much the way the human brain cells work where if one isn't being used another can use it or if one gets damaged or overload another can step in to take over.

All that being said here is the way the Propeller chips are looking in the cube design. A set of 6 propellers are put together so that all the prongs touch and a couple prongs of one is bent outward and soldered to wires to connect it to a PC. All the other pins are soldered to the prongs of the Propeller chip it touches. Groups of the cubes are then interconnected to each other the same way the interface with the PC works. Each cube is connected to the interface with the computer or run through a single interface depending on application. In other words a group of cubes linked together that either depend on a central cube for a PC link or a cube can access a PC on its own and be connected to other cubes. The advantage of this over the sphere design is space.

For the sphere design the chips are being soldered together much the same way that a soccer ball looks. I suggest using a styrofam ball to maintain shape. Oh and be careful, you don't want a fire. As before a single point of contact or multipule points of contact can be used depending on application. Keep in mind that this form will require short jumpers at times to maintain the sphere shape. The advantage of this over the cube design is its a lot more computer power.

Alman, this is a very interesting thread! Some research in the Forum posts show small versions of a new Sphere computer built using a multi-prop system and some of the cubes were created in a lower cubed numbering.

Work is proceeding on several more advanced models that can architecture with both hardware and software. It's really slow going some days because we literally have to invent the wheel.

Some early threads are about molding the three dimensional aspects and communications of the Sphere computing machine. This does not follow conventional boards of current or past technology.

The current project is a growing number of 170 Parallax processor cores. The architectures can cube in combination of hardware and software. With larger investments of time, energy, and resources, it's important to multi-purpose.

In regard to directly soldering Propeller pins to Propeller pins, there is another side. Use sockets in a "chip removable config" in case one chip goes bad and needs to be replaced. Place protection resistors from pin to pin. Perhaps it's not as easy as "hack n stack," though we wish..

koehler · 2011-03-02 00:38

I have to say, the kid in me thinks this is kind of cool, however the adult sees this as an academic exercise only. I'm sure someone out there could throw 10-20+ Props together for some specific case, however I doubt more than a handful would do so.
A nice dual-core ARM Cortex A9 @1Ghz is probably going to run $40 or less, the cost of 5 props. 5 Props x 8 cores equals around 40 'threads' I would think. I don't think a dual-core 1Ghz proc with L1 and L2 ( up to 8MB) cache would be overly stressed.
The cube/sphere layouts are cool, I remember thinking about that in the 80's when Connection Machines was rockin' along. The thing is, now that we have such powerful cores in the embedded world, it moots the need for such designs. I may be wrong, but those designs were thought up partially because of the need for more power, and the only way to get it was to go multi-chip.

Dr_Acula · 2011-03-02 03:44

Humanoido's "tower of power" rocks, IMHO!

As for the definition of a "super computer", I guess as I get older I (cynically) think that that it is simply a computer that is faster than current computers.

Maybe you could tweak the definition and talk about "cycles per watt"? eg, I have a propeller that is using well under half a watt. My PC is using at least 500 watts. So you are allowed 1000 propellers to make a PC, and that is 8000 cogs, and maybe that might actually be faster than a PC?

The sphere is interesting. Watch for static from the styrofoam.

I wonder what the best layout is for comms? When you go from 2 dimensions to 3 dimensions, the number of hops decreases. So if you go to 4,5,6 dimensions etc (in a mathematical sense), that must decrease the number of hops. That means each prop can devote more time to processing rather than comms. A 3x3x3 cube is easy to visualise. A 3x3x3x3 "cube" is a little harder to see in the minds eye.

Please keep us posted with this project!

koehler · 2011-03-02 04:48

Dr_Acula wrote: »

Humanoido's "tower of power" rocks, IMHO!

As for the definition of a "super computer", I guess as I get older I (cynically) think that that it is simply a computer that is faster than current computers.

Well, even going back to the 80's a super-computer has always been pretty well defined as a quite advanced systems which have real, peerless computing-math capabilities. Every 6 months or year now there is something faster coming out in the average desktop-server field.

[/QUOTE]
Maybe you could tweak the definition and talk about "cycles per watt"? eg, I have a propeller that is using well under half a watt. My PC is using at least 500 watts. So you are allowed 1000 propellers to make a PC, and that is 8000 cogs, and maybe that might actually be faster than a PC? [/QUOTE]

I think thats a bit unfair, on the PC side for a couple of reasons. 1. I've got a 6-core AMD with 4GB RAM, and my Killowatt normally reads 200+ watts when its cranking.
2. Look at what you are doing right now on your computer, and tell me how many Props you'll need to give you an OS with even half the features, and apps like this browser?
Cogs are great for what they are designed for. I don't think its really computing as we know it.
Now emulating an Apple II with 48K RAM, and ? ROM with a basic display? Sure.

But staying with something a bit smaller, to fairly compare against, using current tech, I found this from end of 2010: The NuSmart 2816 is the worlds first chip to integrate a 2GHz dual-core ARM Cortex-A9 processor, multi-core 2D/3D graphics processor, 64bit DDR2/3-1066 memory controller, 1080p multi-format video engine, SATA-II controller, USB 2.0, Ethernet, together with general I/O controllers. NuSmart says the 2816 is very energy efficient consuming less than two watts when running at 1.6GHz.

So somewhere around 3.2Ghz of ARM RISC @2W will yield ~3000 Dhrystone MIPs.

A Prop with 8 x 20 MIPS = 160 MIPS (non-Dhrystone!) at 1/2 W, so 4 Props (2W) is ~640 MIPS (non-Dhrystone).

For what its worth, I believe Dhrystone MIPS are computationally expensive, and at a guess would expect the Prop to have <100 MIPS max if it were ever programmed to run the test.
Actually, on a straight MIPS to MIPS basis its already decided. Prop x 4 is 640 MIPS, dual core A-9 as above, 3200 MIPS. The answer is obvious.

I'm not knocking the effort, just saying calling it a super-computer is just wrong on several levels. I believe someone got a toaster to run Linux, so doing something like this just to do it is cool. Lets just stay this side of Charlie Sheen reality...

0430, really should have gone to bed.

The sphere is interesting. Watch for static from the styrofoam.

I wonder what the best layout is for comms? When you go from 2 dimensions to 3 dimensions, the number of hops decreases. So if you go to 4,5,6 dimensions etc (in a mathematical sense), that must decrease the number of hops. That means each prop can devote more time to processing rather than comms. A 3x3x3 cube is easy to visualise. A 3x3x3x3 "cube" is a little harder to see in the minds eye.

Please keep us posted with this project!

The more n-dimensions you add, the number of GPIO's goes up exponentially.

Robert T · 2011-03-02 05:44

Although not a super computer, look at the HoverFly Technologies board(http://shop.hoverflytech.com/). The board has two Propellers. I assume because of the need for addition I/O and more computational power. I am not sure of how the computations are distributed or how the Propellers communicate since the software is not available. I think the Propeller design makes one think in multiple processors because it is itself a parallel style processor. When I think of the Arduino(or other processors) for some reason I don't consider multiple processors to solve a problem.

Clock Loop · 2011-03-02 21:27

AIman, Looks like you need this thread.

http://forums.parallax.com/showthread.php?127983-55-Parallax-Propeller-s-Parallells-Processing-of-Permanent-Perturbations.

Humanoido · 2011-03-03 07:13

Comparing apples to oranges is an age old debate that has never reached any definitive conclusions except in the minds of the individual debaters. Perhaps the only real conclusion is that various types of systems coexist for their own reasons which can be representative of a significant capability to stand above other systems, making it "super" in some unique way. If you just get on the Forum and survey some of the larger propeller machines, you'll see a wide range of purposes and designs. In a more general sense, "super computing" can exist in the realm of one's interests, in terms of speed, unique design, power consumption, number of cores, special software, vast number of controlling ports, emulation capability, or the number of programming languages..

Ari · 2011-03-03 14:27

Sounds like self healing distributed computing.....essentially a mini data center....

why not just use a single chip to handle a fast protocol (gig-e) and then allow that program to bounce between chips that are not in use?

it would be a bit of a wiring nightmare and you would wind up with the most expensive version of an intel i7 possible, but it sounds like a great way to understand modern distributed computing....

the better bet would be to buy pre configured pico ITX mobos and cpus and just hook them up via fast ethernet.....you could use dragonfly BSD to handle the compute distro

there is of course the issue of network stack overhead.....but that could be sorted out by using a dedicated controller (or program that bounces between cogs)

all well beyond the scope of anything I could implement.....I would just go out and buy the pre-built solutions.....at the end of the day all of this would be to handle complex programs....which modern systems do quite well, for very little coin....and with proper optimization, very little power....

I recently build a powerhouse VM esXi host, for under a grand and pulling around 60 watts at load....

I don't think the propeller would ever allow for such things.....most of it's resources would be used up by the simple communication protocols

but the first time i saw product description for the propeller i thought the same things as the OP....then the reality of cost sank in

microcontrollers are wonderful cheap devices for simple portable tasks.....they are easily dwarfed by modern computing offerings.....

and for multi threaded tasks they are absolutely devoured by common offerings

sounds like a fun project though....

Dave Hein · 2011-03-03 14:43

When I first saw the subject line I thought it was a mini-supper computer -- something that could automatically prepare my supper.

Ari · 2011-03-03 19:11

Dave Hein wrote: »

When I first saw the subject line I thought it was a mini-supper computer -- something that could automatically prepare my supper.

i am first in line to buy that device....let me know how it works out....then again a housekeeper or mail order bride would probably be cheaper (at least until the mail order bride gets a green card and runs off with half your assets)

Humanoido · 2011-03-05 09:11

Ari wrote: »

why not just use a single chip to handle a fast protocol (gig-e) and then allow that program to bounce between chips that are not in use?

Ari, how can I make a program bounce between the chips that are not in use?

koehler · 2011-03-06 15:45

I have to say, does Ari really understand the implications of what he said there?
You are either going to have a copy of 'the program' in every "chip"/cog from the outset, or you are going to need to xmit a copy of it across the network to an avbailable resource "chip"/cog.
The former is going to require all "chip"/cog's to waste memory holding a copy for potential possible use that may never occur for most of them.
The latter will require inter-process comms to poll all nodes to determine which is the least utilized, and then for one node to xmit its 'program' to another node. This will take a fair amount of coordination between the two, or three, etc if it is in a mesh network, and time.
Certainly do-able, however the overhead associated with it means it will be costly with respect to time/impact on resources. Doubtful you'd want to be doing this often.

Humanoido wrote: »

Ari, how can I make a program bounce between the chips that are not in use?

Humanoido · 2011-03-07 03:40

koehler wrote: »

I have to say, does Ari really understand the implications of what he said there? You are either going to have a copy of 'the program' in every "chip"/cog from the outset, or you are going to need to xmit a copy of it across the network to an available resource "chip"/cog. The former is going to require all "chip"/cog's to waste memory holding a copy for potential possible use that may never occur for most of them. The latter will require inter-process comms to poll all nodes to determine which is the least utilized, and then for one node to xmit its 'program' to another node. This will take a fair amount of coordination between the two, or three, etc if it is in a mesh network, and time. Certainly do-able, however the overhead associated with it means it will be costly with respect to time/impact on resources. Doubtful you'd want to be doing this often.

Hi Koehler! Really great ideas, as with many modern day inventions, often happen without understanding the full implications that happen farther down the road. How we bounce the program is programmatic.The real focus is, as you mentioned:

coordination
time/impact
resources
overhead
inter-process comms
Certainly do-able

Ari · 2011-03-07 04:45

No I do understand the implications of what I said earlier. That was exactly why I said such a concept would have a high resource cost. Either central controller (which defeats the purpose entirely) or a background process. If I had to approach the problem I would write the program in small chunks and spread it across cogs/mem. Or better yet nest it within other process. I suppose that ultimately implies an "operating system".

Timing is a non issue, since high precision timing IC's can act as clock ref. Also you could boost the process dynamically by having the fully compiled program implement clock "throttling". This is why I made the intel analogy earlier.

You would just be combining propellers to equal something similar (yet far slower) than an newer intel/amd chip....

The entire project would be beyond the scope of analysis tools that most people own (determining high clock accuracy and such) and would be a hell of a learning curve.

I posted in another thread that it would be far simpler to just buy intel imbedded chips (pine view) and just hook them up via gig-e and then use dragonfly BSD to cluster/distribute compute.....

Or the painfully simple solution is to just build an esXi chassis.....which can be done with very little cost....

again what is the purpose of building such a device anyhow? it defeats the purpose of a "microcontroller". So why use micontrollers to re-invent the wheel (at a far greater cost)

Humanoido · 2011-06-22 07:15

What is the follow up on this project? Did you build it?

Propeller mini-super computer

Comments