Propeller supercomputing

Gadgetman · 2006-04-20 08:25

An 8 x 8 matrix may be a bit more than most would want to buy, at least initially, but...

What about a stick with 4 or 8 in a single row?
(Possibly with the option of connecting sticks side-by-side to create a matrix? )

Have the free I/O-pins available on the edges, and you suddenly have a really neat 'backbone'...

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Don't visit my new website...

Kaos Kidd · 2006-04-20 14:04

One thing I see...
Unless you can download to a single chip, they will ALL get the same code... which is still 32K.
Or, someone would need to come up with some form of overlay method, and storage.
Now, if you are talking about large numbers of sensors that use similar code, and that code can be parameterized and driven from the same calling code, it might work.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Just tossing my two bits worth into the bit bucket

KK

Electronegativity · 2006-04-20 14:49

Hi Kaos Kidd.

In Chip's original post he mentions having separate enumeration pins for each chip.

I guess you could use these to tell each chip when its turn had come to be programmed.

Kind of like the "chip enable" pin that's on almost every integrated circuit.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
I wonder if this wire is hot...

Paul Baker · 2006-04-20 15:00

I would like to see an N=2 hypercube (8 propellers), though hypercubes would be better suited to 64 I/O pins.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
1+1=10

Electronegativity · 2006-04-20 15:11

Hi Paul.

You mean cube: 2X2X2 = 8.

Hypercubes are 4 dimensional: 2X2X2X2 = 16.

The tough part is generating the 4th spatial dimension.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
I wonder if this wire is hot...

Kaos Kidd · 2006-04-20 15:41

Electronegativity:
Oops, you're right... I did miss that... My bad... :)

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Just tossing my two bits worth into the bit bucket

KK

cgracey · 2006-04-20 15:41

parsko said...
Guys,

Could we keep in mind that one does not necessarily HAVE to use an array of 8x8 Props. Am I correct in saying that all the same theories apply to a 1x2 array of Props? YES

Chip, would you please explain the theories of enumeration and parallelization to me, being a layman (I implied this question in my first post on this topic, but should have asked outright)? I'm mostly curious about the "10 words or less" version, or should I say, what does this mean to me? I personally really like the idea of linking two Props for data acquisition in control systems (though I have no clue what even one Prop will do yet!)

Today (without any addition to the Propeller Tool software), you can load up more than one Propeller chip in parallel. They would all receive the EXACT same program, and then immediately start running after loading. They need to be enumerated (assigned unique ID's), though, before they will have any uniqueness. Some multi-chip systems may not care who the member chips are, but for shared work, it often helps for everyone to know WHO they are within the system. By knowing WHO you are, you can know WHEN it's your turn to output to a communal bus, or WHAT data you should process.

To enumerate, the first thing everyone can do after firing up is to count how long it takes their P29 pin (input) to go high, determine their ID from this time, then make their P28 (output) go high to trigger the next guy. So, the first chip in the system has his P29 pin tied high. Everybody's P28 starts out low, and connects to the next guy's P29. So (as a result of programming), there is a 'high' propagating down the line of chips - into everyone's P29, and out everyone's P28. This way, everybody can learn who they are. To kick off high-speed communication, it could be the FIRST chip's responsibility, or whoever's you want to make it. So, you can load Propellers in parallel, have each guy learn who he is, and then begin running the main application with all chips communicating through some messaging scheme over I/O pins.
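The daisy-chain enumeration described above can be sketched as a small simulation. This is Python rather than Spin, and it is only a model of the idea: the tick-based clock, the `ticks_per_stage` delay between seeing P29 high and raising P28, and the function name are all assumptions made for illustration.

```python
def enumerate_chain(num_chips, ticks_per_stage=10):
    """Simulate chips that count ticks until their P29 input goes high.

    Chip 0 has P29 tied high. Once a chip sees P29 high it derives its ID
    from the elapsed count, then raises P28 after ticks_per_stage ticks
    (a hypothetical fixed delay). Each P28 feeds the next chip's P29.
    """
    p28_high_at = [None] * num_chips   # tick at which each chip's P28 goes high
    ids = [None] * num_chips
    tick = 0
    while any(i is None for i in ids):
        for chip in range(num_chips):
            if ids[chip] is not None:
                continue
            if chip == 0:
                p29 = True             # first chip: P29 tied high
            else:
                prev = p28_high_at[chip - 1]
                p29 = prev is not None and prev <= tick
            if p29:
                # elapsed time divided by the per-stage delay gives the position
                ids[chip] = tick // ticks_per_stage
                p28_high_at[chip] = tick + ticks_per_stage
        tick += 1
    return ids


print(enumerate_chain(4))
```

On real hardware the counts would not divide this cleanly, so a chip would presumably round its count to the nearest multiple of the stage delay.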

Also, because everybody has the same-named variables at the same locations within their HUB RAMs, you could use your local symbolic names (in the form of @variable) to get addresses of other chips' same-named variables. By having a messaging scheme over I/O pins that is fast, any chip could read/write any other chip's variables (longs, words, or bytes in HUB RAM) by name.
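A minimal sketch of that "same name, same address" idea, again as a Python model rather than Propeller code. The symbol offsets, the message tuple format, and the `Chip` class are hypothetical; the I/O-pin link is replaced by a direct method call.

```python
# Every chip runs the same program, so one symbol table describes all of them.
# The offsets below are made up for illustration.
SYMBOLS = {"sensor_total": 0x0100, "status": 0x0104}


class Chip:
    def __init__(self, chip_id, hub_size=32 * 1024):
        self.chip_id = chip_id
        self.hub = bytearray(hub_size)   # stand-in for 32K of HUB RAM

    def handle(self, msg):
        """Serve a read/write request that arrived over the (simulated) link."""
        op, addr, value = msg
        if op == "write_long":
            self.hub[addr:addr + 4] = value.to_bytes(4, "little")
            return None
        if op == "read_long":
            return int.from_bytes(self.hub[addr:addr + 4], "little")
        raise ValueError(op)


def remote_write(chips, target_id, name, value):
    # The sender looks up its OWN address for the name; the target's is identical.
    chips[target_id].handle(("write_long", SYMBOLS[name], value))


def remote_read(chips, target_id, name):
    return chips[target_id].handle(("read_long", SYMBOLS[name], None))


chips = [Chip(i) for i in range(4)]
remote_write(chips, 2, "status", 0xDEADBEEF)
print(hex(remote_read(chips, 2, "status")))
```

The point of the sketch is that the sender never needs a per-chip address map, only the target's ID from enumeration.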

This is just one idea for a cooperative system. I'm sure there are countless ways this could be accomplished. This just seemed readily workable to me.


▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔

Chip Gracey
Parallax, Inc.

Post Edited (Chip Gracey) : 4/20/2006 3:44:32 PM GMT

Kaos Kidd · 2006-04-20 16:04

Chip:
All I can say is... WOW!

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Just tossing my two bits worth into the bit bucket

KK

Tracy Allen · 2006-04-20 16:14

It seems to me Propeller also has potential for "supra-computer" explorations. Like...

Cellular Automata: "A regular array of identical finite state automata whose next state is determined solely by their current state and the state of their neighbours." <www.cs.bham.ac.uk/~wbl/thesis.glossary.html>
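To make the definition concrete, here is a minimal sketch of a one-dimensional cellular automaton in Python, where each cell could stand for one processor in a ring. The choice of a binary, three-neighbour automaton and of rule 110 as the default is an assumption for illustration; nothing in this thread specifies a particular rule.

```python
def step(cells, rule=110):
    """Advance a 1-D ring of binary cells by one generation.

    Each cell's next state depends only on itself and its two neighbours;
    the 8-bit rule number encodes the result for each of the 8 patterns.
    """
    n = len(cells)
    nxt = []
    for i in range(n):
        left, me, right = cells[i - 1], cells[i], cells[(i + 1) % n]
        pattern = (left << 2) | (me << 1) | right   # value 0..7
        nxt.append((rule >> pattern) & 1)
    return nxt


row = [0, 0, 0, 1, 0, 0, 0, 0]
for _ in range(4):
    print(row)
    row = step(row)
```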

Neural Network: "An interconnected assembly of simple processing elements, units or nodes, whose functionality is loosely based on the animal brain. The processing ability of the network is stored in the inter-unit connection strengths, or weights, obtained by a process of adaptation to, or learning from, a set of training patterns. Neural nets are used in bioinformatics to map data and make predictions." <www.inproteomics.com/nwglosno.html>

Part of my grad school work was on phase locking and chaos in coupled neurons. I did experiments using coupled oscillators built with programmable unijunction transistors, and using numerical simulation on a PDP-7. One Propeller could do 8 neurons, and with the PWM output and summed sigma-delta inputs, it could even do the analog weighting functions. Wow, use Propeller to build an analog super-computer!

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Tracy Allen
www.emesystems.com

Electronegativity · 2006-04-20 16:35

That's really cool Tracy.

I wonder if a propeller array could be adapted for "classic" neural network functions like voice recognition?

Maybe in conjunction with other hardware that would perform the FFT and/or SVD.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
I wonder if this wire is hot...

Phil Pilgrim (PhiPi) · 2006-04-20 16:35

Genetic Algorithms for large-domain optimization problems come to mind, too. They're ideally suited for parallel processing.

-Phil

Paul Baker · 2006-04-20 17:03

Electronegativity said...
Hi Paul.

You mean cube: 2X2X2 = 8.

Hypercubes are 4 dimensional: 2X2X2X2 = 16.

The tough part is generating the 4th spatial dimension.

Electronegativity, you're correct. OK, I'm not sure what it's called, but I was thinking of a 3D cube whose edge processors are connected to the opposite end. Actually it only works for N=3 or greater; here's a single layer for N=3 (1 of 3 layers):

[Attachment: diagram of a single layer for N=3 (image not available)]

So what is this called?

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
1+1=10

Phil Pilgrim (PhiPi) · 2006-04-20 17:06

Topologically speaking, that would be a cylinder. If you connect top and bottom in the same fashion, you'd have a torus.
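The cylinder/torus distinction comes down to which directions wrap around. A small Python sketch of the neighbour wiring for one layer, with hypothetical function and parameter names:

```python
def neighbors(row, col, rows, cols, wrap_rows=False, wrap_cols=False):
    """Return the grid neighbours of (row, col), with optional wrap-around.

    No wrap    -> plain grid
    One wrap   -> cylinder
    Both wraps -> torus
    """
    result = []
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        r, c = row + dr, col + dc
        if wrap_rows:
            r %= rows
        if wrap_cols:
            c %= cols
        if 0 <= r < rows and 0 <= c < cols:
            result.append((r, c))
    return result


print(neighbors(0, 0, 3, 3))                                   # grid corner
print(neighbors(0, 0, 3, 3, wrap_cols=True))                   # cylinder
print(neighbors(0, 0, 3, 3, wrap_rows=True, wrap_cols=True))   # torus
```

On a 2x2 layer the wrap-around links would only duplicate links that already exist, which may be why the arrangement was described as working only for N=3 or greater.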

-Phil

Paul Baker · 2006-04-20 17:10

And if you have 2 more layers of 3x3 processors connected in torus fashion between each layer and itself, is that also a torus?

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
1+1=10

Electronegativity · 2006-04-20 17:26

I think, topologically speaking, that would be called Fruit Loops.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
I wonder if this wire is hot...

Phil Pilgrim (PhiPi) · 2006-04-20 17:56

If I understand your question correctly, that would be a hypertorus -- as opposed to a hypertortoise, i.e. a tortoise that's eaten too many Fruit Loops.

-Phil

Paul Baker · 2006-04-20 18:10

ROFL ok, I got it. Oh yeah, I forgot to draw the torus connections in the above picture.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
1+1=10

parsko · 2006-04-20 18:49

"To enumerate, the first thing everyone can do after firing up is to count how long it takes their P29 pin (input) to go high, determine their ID from this time, then make their P28 (output) go high to trigger then next guy. So, the first chip in the system has his P29 pin tied high. Everybody's P28 starts out low, and connects to the next guy's P29. So (as a result of programming), there is a 'high' propagating down the line of chips - into everyone's P29, and out everyone's P28."

I understand the concept, but the execution seems to not work.

Is it true that when a chip starts, it first looks to download data from the eeprom (P29 and P28 above)?

How do they determine the unique ID at this point? Isn't the Prop, at this particular instant, an empty shell awaiting code from eeprom?

Would it be easier to use two other pins and apply the same concept?

Fruit loops and tortoisii :)

-Parsko

Electronegativity · 2006-04-20 19:04

First you program all the propellers identically.
Then they execute a program that counts until the pin goes high.
The pin is tied high on the initial propeller.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
I wonder if this wire is hot...

Gadgetman · 2006-04-20 19:43

Just a bit about 'hypercube' architecture...

They are ALWAYS a 'power of 2' size (2, 4, 8, 16, 32, 64 and so on) and yes, 2 is a valid number.
The simplest is 2 nodes with one interconnect between them,
the next step up is 4 nodes in a square, with in total 4 interconnects, and where a message may have to pass through 2 interconnects to reach the recipient.
Then comes 8 nodes, usually drawn as a cube, and with 12 interconnects. Here a message may have to pass up to 3 interconnects to reach its recipient.
The next with 16 nodes is usually drawn as a small cube within a large one, and with added interconnects between the corners of the inner and outer cube. This gives 32 interconnects, and a max traversal of up to 4 interconnects to reach a recipient.

Drawing further dimensions is a bit of a hassle, so to simplify it, start with a square(4 nodes) and stretch the uppermost interconnect so that you get this shape:

*-----*
 \    /
  *-*

Draw a mirror-image above it (the next 4 nodes), then connect all nodes in the upper trapezoid with the exact opposite in the bottom one.

To add the next dimension, pick the two 'topmost' nodes in the new figure, and 'pull them apart' until you get a half-circle shape as before, mirror it, and add the vertical interconnects.

Repeat as needed...

On a 65536-node computer you end up with a max traversal of 16 interconnects from one node to another. I'll leave the interconnect-count for those who enjoy playing with numbers, and only say that 'yes it IS very large...'

That is the disadvantage of Hypercube computers, that the number of interconnects ends up almost unmanageable.
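For those who do enjoy playing with numbers, the counts follow from the construction above: a d-dimensional hypercube has 2^d nodes, each node has d links, and every link is shared by two nodes. A short Python sketch (the function name is just for illustration):

```python
def hypercube_stats(dim):
    """Return (nodes, links_per_node, total_links, max_hops) for a hypercube."""
    nodes = 2 ** dim
    links_per_node = dim
    total_links = dim * nodes // 2   # each link is counted from both of its ends
    max_hops = dim                   # worst case: every address bit differs
    return nodes, links_per_node, total_links, max_hops


for d in (1, 2, 3, 4, 16):
    print(d, hypercube_stats(d))
```

This reproduces the 4, 12 and 32 interconnects quoted for 4, 8 and 16 nodes, and gives 524,288 interconnects for the 65536-node case.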

How do I know?
I wanted to build one with Z80 chips, but never could find cheap RAM and ROMs for it...
(The right types are getting expensive, and if you want 256 or so... It became too costly)

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Don't visit my new website...

Paul Baker · 2006-04-20 20:10

So this is an example of the 16 node:

This seems do-able; even using 8-bit pathways, you'd have 8 bits left over for I/O from each node, or 128 pins of I/O, with 128 PEs and 2560 MIPS of processing power. Of course, at individual pricing, it would be $400 of Propellers.
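The totals in the post above can be reproduced with a few multiplications. In this Python sketch the per-cog MIPS figure and the per-chip price are not stated in the post; they are inferred by dividing its totals (2560 MIPS / 128 PEs, $400 / 16 chips), so treat them as assumptions.

```python
NODES = 16            # 16-node hypercube, from the post
COGS_PER_CHIP = 8     # each Propeller has 8 cogs (the "PEs" here)
MIPS_PER_COG = 20     # inferred: 2560 MIPS / 128 PEs
SPARE_PINS = 8        # spare I/O pins per node, from the post
PRICE_PER_CHIP = 25   # inferred: $400 / 16 chips


def array_totals(nodes=NODES):
    """Return the aggregate figures for an array of identical chips."""
    pes = nodes * COGS_PER_CHIP
    return {
        "processing_elements": pes,
        "mips": pes * MIPS_PER_COG,
        "io_pins": nodes * SPARE_PINS,
        "chip_cost_usd": nodes * PRICE_PER_CHIP,
    }


print(array_totals())
```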

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
1+1=10

cgracey · 2006-04-20 20:34

I'm hoping that before too long, we at Parallax will make a few array configurations on PCB's and sell them for much less than the individual chips would cost.

Maybe we could hit $250 for a 4x4 array. We can't go much lower because we still need to allow some distributor margin.

I don't think these boards would displace regular chip sales much, since they're pretty esoteric in purpose. Anyway, then you all could play with this parallel idea.

We could make an 8x8 board, too, for perhaps $800.

Any interest?

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔

Chip Gracey
Parallax, Inc.

cgracey · 2006-04-20 20:40

parsko said...

Is it true that when a chip starts, it first looks to download data from the eeprom (P29 and P28 above)?

How do they determine the unique ID at this point? Isn't the Prop, at this particular instant, an empty shell awaiting code from eeprom?

Would it be easier to use two other pins and apply the same concept?

Fruit loops and tortoisii :)

-Parsko

After the Propeller is reset, it looks for activity on P31 and P30. At this point, another system can load a program directly into that Propeller's HUB RAM, from where it will begin executing right afterwards. The program that gets loaded into a whole array of Propellers can start by performing this enumeration procedure, then begin executing the main loop code.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔

Chip Gracey
Parallax, Inc.

Gadgetman · 2006-04-20 20:40

Yes, that is exactly how a 16-node Hypercube looks.

The best way to communicate on larger cubes would probably be to set up one or two pins on each Propeller for each link, and have one (or possibly even 2) COGs dedicated to communication, since a node not only has to send and receive its own messages but is also expected to relay messages for others. It may also need some sort of routing protocol in case some nodes are too busy (maybe a program is running on a 'sub-cube' that requires intensive communications).
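One simple way a relaying node could pick the next link is dimension-order routing: number the nodes in binary so that neighbours differ in exactly one bit, then repeatedly flip the lowest bit in which the current node and the destination differ. This is a standard hypercube technique, not something proposed in this thread, and the Python function names below are hypothetical. It does not handle the busy-node case mentioned above.

```python
def next_hop(current, dest):
    """Return the neighbour to forward to, or None if already at dest."""
    diff = current ^ dest
    if diff == 0:
        return None
    lowest = diff & -diff      # isolate the lowest differing address bit
    return current ^ lowest    # flipping that bit means crossing that link


def route(src, dest):
    """Return the full list of nodes visited from src to dest."""
    path = [src]
    while path[-1] != dest:
        path.append(next_hop(path[-1], dest))
    return path


print(route(0b0000, 0b1011))
```

The number of hops equals the number of differing address bits, so it never exceeds the cube's dimension.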

I started writing notes on protocols for my aborted Z80 design, but unfortunately, those notes have since disappeared.

Anyway, the largest HyperCube design I have ever seen was in a Technical school in Trondheim here in Norway (it was one of the world's first i386 designs, and was begun before any PCs with the 386 were available), and at that time it had 16 nodes, but they were planning to upgrade it to 32 nodes.
Also, their design was round, and I believe that is the only physical design that will work on large numbers.
(That, and cylindrical. )

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Don't visit my new website...

Gadgetman · 2006-04-20 20:46

Chip Gracey said...
I'm hoping that before too long, we at Parallax will make a few array configurations on PCB's and sell them for much less than the individual chips would cost.

Maybe we could hit $250 for a 4x4 array. We can't go much lower because we still need to allow some distributor margin.
Any interest?

*AAARGH*

Difficult choice...

Buy apartment, or new supercomputer?

Oh, wait, I have already signed the d@mn contract...
(Besides, I need the new place so that I can turn around without hitting a computer...
I have no idea where to store the UltraSpark 5 I'm expecting, and... )

*AAARGH!!!!*

Anyone want to buy a 98 model Citro

jamma · 2006-04-20 22:09

This talk of hypercubes recalls Danny Hillis' Connection Machine from the early 80s. I remember SciAm had a cover article on it. I believe they got as high as a 2^16 hypercube. To Chip's point, they moved away from SIMD (Single Instruction, Multiple Data; i.e. "classical" parallelism) to MIMD (Multiple Instruction, Multiple Data), where functional units perform different operations on different data.

Interestingly, they also abandoned hypercubes in favor of fat-trees. See: http://en.wikipedia.org/wiki/Fat_tree

johnsrobotics · 2006-04-21 02:22

I really like the 4x4 on a board idea....and think it can be made very useful.

Phil Pilgrim (PhiPi) · 2006-04-21 02:24

As a followup to an earlier discussion in this thread about the patent mess in this country, here's a link to a recent editorial in EETimes ("No Patents. No Kidding" by Rick Merritt):

www.eetimes.com/news/latest/showArticle.jhtml?articleID=184417250

I believe Mr. Merritt's argument has merit. Our current system is broken; it rewards the wrong people; it stifles true innovation, rather than rewarding it. Why not just get rid of it entirely?

-Phil

CHIPKEN · 2006-04-21 03:58

Phil and all,

Please review what EDN said recently about multiprocessor architectures. They say this is a tough thing for you guys to do - interesting.

www.edn.comms4171

Chuck

CHIPKEN · 2006-04-21 04:07

Phil and all,

Should be www.edn.com/ms4171.

Chuck
