3D Propeller
Humanoido
Posts: 5,770
I've connected Stamps together in network fashion, which is more linear than three-dimensional. Now I have an experimental three-dimensional array of Propeller chips but want to know the best way to connect them all together. I gather that one way is simple, just enough to demonstrate the concept, while the other is more complicated but more efficient. Going with simple the first time is best, I think. OK, all you master gurus, how about some ideas on how to simply connect a 2 x 2 x 2 or a 3 x 3 x 3 array in three dimensions? For example, if serial pin ports are connected, the vertices of one plane will need to communicate with the vertices of the other planes.
humanoido
Post Edited (humanoido) : 12/9/2009 2:50:42 PM GMT
Comments
http://forums.parallax.com/showthread.php?p=835128
Is this to get the processing speed up or to have more inputs/outputs or some other purpose? I guess that helps determine whether you have a single linear 'ethernet' topology, a 'star' topology or a token ring or something else (eg the middle prop in a 3x3x3 is the router??).
What sort of plugs will you use? Is it OK if it is large or does it have to be tiny? How do you reprogram all the props? Will it be mechanically robust? Will you be wiring it up by hand or will there be PCBs?
I'm not sure if I have all the answers but it sounds a fun project. Is this for robotics or something?
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
www.smarthome.viviti.com/propeller
What should I call it? Fun project? Supercomputer? New cube design? Brain for robot? All the above? You tell me. :) Since this is the first one, it can be big. It can be robust or just plain work to show the concept. The prototypes here are all wired by hand on solderless breadboard.
I think it needs a mathematical multiple three dimensional array topology. I was not thinking of one master in the center of a linear or star network on one serial pin. Mainly I want a corner prop in a cube to communicate with all other corners or any prop within the cube.
A good design to start with is the CM-1/CM-2 from Thinking Machines (AKA the father of massive parallelization - Danny Hillis's 64k-PU PhD thesis) - a 12-dimensional hypercube with a 1-bit word size that had customers until the Berlin Wall fell back in '89. The design is sound and is based on an improved architecture of the ILLIAC IV last seen running at Ames.
In the CM case each chip had 16 PU's and 1 router - some propeller emulation of this design could use 2 PU props + 1/8 controller prop or use 7 of the cogs in a 3-level tree structure with the 8th acting as an off chip router to other prop subtrees.
The connection structure could be any of the following, each with some sort of complexity/bandwidth tradeoff - bus, ring, binary tree, b-tree, 2d-mesh, 3d-mesh, hypercube - but a fully connected network would need a # of wires that grows with the square of the # of nodes, n(n-1)/2. Danny stated it well in his book: eventually we end up with no more room for wires after a certain # of nodes, and the cost of the actual propeller chips is minuscule by comparison.
However, an excellent project - I would expect that we can power up 128 cogs and aim for 512 to selectively solve some "embarrassingly parallel" problems.
/michael
Post Edited (Michael O'Brien) : 12/9/2009 7:22:49 AM GMT
Above is the 12-dimensional hypercube. It was the design used in the CM-2 supercomputer, which had 4,096 chips. The structure, as you can see, is a cube with 2 to the power of 12 corners. Each chip connects to 12 other chips, and each connection represents one dimension of the cube.
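To make the "one connection per dimension" idea concrete, here is a minimal sketch in Python (not Propeller code). It assumes each chip is numbered 0..4095 and that two chips are wired together when their numbers differ in exactly one bit. The function names `hypercube_neighbors` and `route` are made up for illustration, and the routing shown is plain dimension-order routing, not necessarily what the CM-2 router did.

```python
def hypercube_neighbors(node: int, dims: int = 12) -> list[int]:
    """Return the nodes wired directly to `node`: flip one address bit per dimension."""
    return [node ^ (1 << d) for d in range(dims)]


def route(src: int, dst: int, dims: int = 12) -> list[int]:
    """Dimension-order route: correct the differing address bits from lowest to highest."""
    path = [src]
    cur = src
    for d in range(dims):
        if (cur ^ dst) & (1 << d):
            cur ^= 1 << d          # hop across dimension d
            path.append(cur)
    return path


if __name__ == "__main__":
    print(hypercube_neighbors(0, 3))   # [1, 2, 4]
    print(route(0b000, 0b101, 3))      # [0, 1, 5]
    print(len(route(0, 4095)) - 1)     # 12 hops corner to opposite corner
```

The number of hops on a route equals the number of differing address bits, so no pair of chips in the 12-dimensional cube is ever more than 12 hops apart.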
I think some modification or cousin to this design could incorporate all eight internal cogs of each chip in some manner of dimension as well. Otherwise, the full 4,096 propeller chips will be a tad dense in wiring in the cube. Dividing by eight cogs yields 512 "processors."
In another idea, if one processor needs to access any processor in the array, the number of wires to do that may be excessive for that many propeller chips. A different approach is needed, and could be represented in terms similar to our Basic Stamp supercomputer designs, i.e. one processor talking at one time. If we go with this approach, one processor can talk to all processors at the same time, and these "listener" processors can be located at any place in the dimensional cube. Any thoughts on this?
If you don't mind token passing like Clock_Loop suggests, then you can use his design or mctrivia's 4 port serial "quilt" board.
In any case, great skill is required for programming such configurations. Do you have what it takes?
Leon
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Amateur radio callsign: G1HSM
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
24 bit LCD Breakout Board now in. $21.99 has backlight driver and touch sensitive decoder.
Hillis mentioned that eventually parallel machines would have a nearest neighbor 3D topology. So North,South,East,West,Up and Down, I suppose.
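A rough sketch of that nearest-neighbour layout, in Python rather than Spin, for the 2 x 2 x 2 and 3 x 3 x 3 arrays asked about at the top of the thread. It assumes each Prop sits at integer coordinates (x, y, z) and that there is no wrap-around at the faces; `mesh_neighbors` and `count_links` are illustrative names only.

```python
from itertools import product

# The six nearest-neighbour directions: East/West, North/South, Up/Down.
DIRECTIONS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def mesh_neighbors(pos, size):
    """Return the coordinates directly wired to `pos` in a size^3 mesh (no wrap-around)."""
    x, y, z = pos
    out = []
    for dx, dy, dz in DIRECTIONS:
        n = (x + dx, y + dy, z + dz)
        if all(0 <= c < size for c in n):
            out.append(n)
    return out


def count_links(size):
    """Count point-to-point links; each link is shared by two nodes."""
    nodes = product(range(size), repeat=3)
    return sum(len(mesh_neighbors(p, size)) for p in nodes) // 2


if __name__ == "__main__":
    print(count_links(2))                        # 12
    print(count_links(3))                        # 54
    print(len(mesh_neighbors((1, 1, 1), 3)))     # 6 (centre Prop)
    print(len(mesh_neighbors((0, 0, 0), 3)))     # 3 (corner Prop)
```

In the 3 x 3 x 3 case only the centre Prop uses all six links, and a message from one corner to the opposite corner needs 6 hops.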
-phar
Leon
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Amateur radio callsign: G1HSM
Post Edited (Leon) : 12/9/2009 8:11:04 PM GMT
-phar
Leon
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Amateur radio callsign: G1HSM
Post Edited (Leon) : 12/9/2009 10:21:38 PM GMT
If only 12 per node are necessary, then I think you only need 12x11x10x9x8x7x6x5x4x3x2x1
wires instead of 2^4096, although otherwise only 2 or 4096 wires should be sufficient to connect
them all together. What do you use this 12-dimensional computer for? Simulating the Universe
according to String Theory?
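As a cross-check on the wire counts being tossed around, here is a small Python sketch. It assumes one "wire" means one point-to-point link shared by two nodes; a 2-wire serial link would double the physical conductor count. The function names are made up for illustration.

```python
def hypercube_links(dims: int) -> int:
    """Links in a hypercube: 2^dims nodes, dims links each, every link shared by two nodes."""
    nodes = 2 ** dims
    return nodes * dims // 2


def full_mesh_links(nodes: int) -> int:
    """Links needed if every node is wired directly to every other node."""
    return nodes * (nodes - 1) // 2


if __name__ == "__main__":
    print(hypercube_links(12))      # 24576
    print(full_mesh_links(4096))    # 8386560
```

Under those assumptions the 12-dimensional hypercube needs 24,576 links for 4,096 nodes, against roughly 8.4 million for a fully connected network of the same size.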
"I dimly remember that transputer link switching chip. The mention of topologies should be noted here, because there are definitely horses for courses when it comes to topologies. To give simple examples, a hypercube topology like the Connection Machine pretty much perfectly maps a parallel quicksort. You find an estimate of the mean of the key, subdivide the hypercube into 2 smaller hypercubes and move the data elements into the appropriate sub-hypercube based on a comparison with the mean. Then just continue subdividing the hypercubes and the data set. I think that sorts in something like O(log2 n). But if you're doing something like a parallel screen buffer, a 2D mesh like the Massively Parallel Processor is a much better match.
-phar"
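The subdivide-and-exchange scheme described in that quote can be sketched as a single-process Python simulation. This is only an illustration under stated assumptions: nodes are plain lists, the pivot is the exact mean of the keys in each sub-hypercube (a real machine would estimate it), and `hypercube_quicksort` is a hypothetical name, not anyone's actual Connection Machine code.

```python
def hypercube_quicksort(data, dims):
    """Simulate a hypercube quicksort over 2^dims nodes and return the sorted keys."""
    n = 1 << dims
    # Deal the keys out to the nodes round-robin.
    nodes = [list(data[i::n]) for i in range(n)]
    for d in reversed(range(dims)):
        bit = 1 << d
        # A sub-hypercube is the set of nodes sharing all address bits above d.
        groups = {}
        for node in range(n):
            groups.setdefault(node >> (d + 1), []).append(node)
        for members in groups.values():
            keys = [k for m in members for k in nodes[m]]
            if not keys:
                continue
            pivot = sum(keys) / len(keys)        # mean of the keys as the pivot
            for lo in members:
                if lo & bit:
                    continue                     # visit each pair once, from its low side
                hi = lo | bit                    # partner across dimension d
                both = nodes[lo] + nodes[hi]
                nodes[lo] = [k for k in both if k <= pivot]
                nodes[hi] = [k for k in both if k > pivot]
    # Every key on a lower-numbered node is now <= every key on a higher one,
    # so a local sort per node followed by concatenation is globally sorted.
    return [k for node in nodes for k in sorted(node)]


if __name__ == "__main__":
    print(hypercube_quicksort([9, 3, 7, 1, 8, 2, 6, 4, 5, 0], dims=3))
```

Each round only moves data between partners that differ in one address bit, which is why the exchange pattern lines up with the hypercube wiring.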
The set of all digital sounds looks in 4D like a honeycomb with a star at each hexagonal angle from all 8 of its axial directions.
I suppose that the hypercube topology would make the ultimate ipod if it in any way helps tell the difference between
all those musical stars, since the space is fully predefined by such a simple pattern as blue crosses on graph paper in 4D.
(The space is also defined simply in 0D,1D,and 2D, with the most confusion probably in 3D).
Post Edited (VIRAND) : 12/10/2009 12:02:41 AM GMT
-phar
4096 hyper cube of prop2s I could route on 16 10x10 pcb and would cost 50g. Would be capable of 5TIPs and have 300000 spare io.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
24 bit LCD Breakout Board now in. $21.99 has backlight driver and touch sensitive decoder.
A cog only needs to *assert* a signal for the FPGA to know from the clock if the cog is active. Using that (with some init), data can be mux-latched onto a bus so that a bus tree can be created. Then an Internet IPv4-like addressing model could be used. One could address a device based on hostid like x.x.x.x ... such that x > 0 and x < 8 and use 0 for multicast.
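A minimal sketch of that addressing idea in Python. The four-field format and the 1..7 range come from the post; treating a 0 in any field as "every device at that level" is my assumption about how the multicast would work, and `parse_address` and `matches` are hypothetical helper names.

```python
def parse_address(text: str) -> tuple[int, ...]:
    """Parse 'a.b.c.d' where each field is 1..7 for a device or 0 for multicast."""
    parts = text.split(".")
    if len(parts) != 4:
        raise ValueError("address needs exactly four fields")
    fields = tuple(int(p) for p in parts)
    for f in fields:
        if not 0 <= f <= 7:
            raise ValueError(f"field {f} is outside 0..7")
    return fields


def matches(target: tuple[int, ...], own: tuple[int, ...]) -> bool:
    """True if a message sent to `target` should be accepted by the device at `own`.

    Assumption: a 0 in the target acts as a wildcard at that level of the bus tree.
    """
    return all(t == 0 or t == o for t, o in zip(target, own))


if __name__ == "__main__":
    me = parse_address("3.1.7.2")
    print(matches(parse_address("3.1.7.2"), me))   # True  (unicast to this device)
    print(matches(parse_address("3.1.0.0"), me))   # True  (multicast below 3.1)
    print(matches(parse_address("4.1.7.2"), me))   # False (different top-level branch)
```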
Only 16 Props each with four 2 wire serial links to neighbours.
That uses only 8 pins used for links in each Prop.
Then the longest path from any Prop to any other Prop is only 4 hops.
Arrange that communications everywhere is bit synchronized.
That way you can clock the 4 bits out of the 4 outputs in 4 different directions at the same time from one COG.
Similarly the 4 incoming bits from 4 different links can be sampled at the same time by one COG.
Probably want to shift data in LONGS rather than bytes for efficiency.
Should be as fast as handling each link in isolation like a normal serial link, whilst saving on the number of COGs required to do it.
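To show what clocking four links from one COG might look like, here is a Python simulation of the bit packing (not PASM, and not tested on hardware). It assumes the four link outputs sit on four adjacent pins, MSB-first shifting and 32-bit LONGs; `shift_out` and `shift_in` are illustrative names.

```python
def shift_out(words, width=32):
    """Yield one pin pattern per bit time; bit i of each pattern drives link i (MSB first)."""
    for bitpos in reversed(range(width)):
        pattern = 0
        for link, word in enumerate(words):
            pattern |= ((word >> bitpos) & 1) << link
        yield pattern


def shift_in(patterns, links=4):
    """Rebuild one word per link from the pin patterns sampled at each bit time."""
    words = [0] * links
    for pattern in patterns:
        for link in range(links):
            words[link] = (words[link] << 1) | ((pattern >> link) & 1)
    return words


if __name__ == "__main__":
    outgoing = [0xDEADBEEF, 0x12345678, 0x00000001, 0xFFFFFFFF]
    wire = list(shift_out(outgoing))      # 32 bit times, 4 bits per time
    print(len(wire))                      # 32
    print(shift_in(wire) == outgoing)     # True
```

Four LONGs go out in 32 bit times, the same time one LONG would take on a single link, which is the saving the bit-synchronized scheme is after.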
Build the thing on 16 dinky individual boards fitted with appropriate serial connectors. I'm thinking use USB cables to "patch board" wire the hypercube up and mini USB connectors.
Don't forget that each Prop has to handle the routing of data through to neighbours, so that's probably another COG in each Prop used for the routing function.
End result: 16 Props with 20 free pins each and 5 COG free for application code.
That's 320 I/O pins for external goodies and 1600 MIPs of go power.
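The arithmetic behind those totals can be sketched as below. The 32 pins, 8 COGs and 20 MIPS per COG (at 80 MHz) are the usual Propeller figures; reserving 4 pins for the EEPROM and programming port, and using 3 COGs (one transmitting, one receiving, one routing) are my assumptions chosen to match the 20 free pins and 5 free COGs quoted.

```python
PROPS = 16                 # a 4-dimensional hypercube: 2**4 nodes
PINS_PER_PROP = 32
COGS_PER_PROP = 8
MIPS_PER_COG = 20          # assumes an 80 MHz system clock

LINKS_PER_PROP = 4
PINS_PER_LINK = 2
RESERVED_PINS = 4          # assumption: EEPROM plus programming serial port
COMMS_COGS = 3             # assumption: one transmit, one receive, one router

link_pins = LINKS_PER_PROP * PINS_PER_LINK
free_pins = PINS_PER_PROP - link_pins - RESERVED_PINS
free_cogs = COGS_PER_PROP - COMMS_COGS

total_free_pins = PROPS * free_pins
total_mips = PROPS * free_cogs * MIPS_PER_COG
max_hops = (PROPS - 1).bit_length()    # hypercube diameter = number of address bits

if __name__ == "__main__":
    print(link_pins, free_pins, free_cogs)       # 8 20 5
    print(total_free_pins, total_mips, max_hops) # 320 1600 4
```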
When programming this monstrosity has got the better of you then you have 16 dinky little boards for other purposes.
Just thinking out loud here.....
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
For me, the past is not over yet.
regular and work well if the problem maps well to them, but they have a very large
"average path length" and a relatively low bisection bandwidth taking into account
the number of links required.
Consider the 4096-node network with 12 bidirectional links between nodes; the
average path length here is 6 hops and the worst-case path length is 12 hops.
There's a simple, regular network that supports more nodes (4608) with fewer
links (8 unidirectional into and out of each node) and a worst-case path length
of only 4 hops, significantly less than the *average* for the 12-d hypercube.
I'm not going to ruin the puzzle; you may amuse yourself by designing this
network yourself. This network has significantly fewer wires, yet it also has a
higher bisection bandwidth than the 12-d hypercube.
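For anyone who wants to check the 12-d hypercube figures quoted above without giving away the puzzle, here is a short Python sketch. It relies only on the fact that, from any node, exactly C(d, h) nodes lie h hops away; `hypercube_path_stats` is an illustrative name.

```python
from math import comb


def hypercube_path_stats(dims: int) -> dict:
    """Worst-case and average shortest-path length in a dims-dimensional hypercube."""
    nodes = 2 ** dims
    # comb(dims, h) nodes differ from a given node in exactly h address bits.
    total_hops = sum(h * comb(dims, h) for h in range(dims + 1))
    return {
        "nodes": nodes,
        "worst_case": dims,
        "average_including_self": total_hops / nodes,
        "average_excluding_self": total_hops / (nodes - 1),
    }


if __name__ == "__main__":
    stats = hypercube_path_stats(12)
    print(stats["nodes"], stats["worst_case"])    # 4096 12
    print(stats["average_including_self"])        # 6.0
```

Averaged over distinct pairs only, the figure is a hair above 6 hops, so the "average 6, worst case 12" summary holds either way.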
Post Edited (rokicki) : 12/10/2009 10:01:09 PM GMT