3D Propeller

Humanoido · 2009-12-09 01:53

I've connected Stamps together in network fashion which is more linear than three dimensional. Now I have an experimental three dimensional array of Propeller chips but want to know the best way to connect all of these together. I gather that one way is simple just to demonstrate the concept while the other way is more complicated but more efficient. Go with simple the first time is is best, I think. Ok, all you master gurus, how about some ideas about how to simply connect a 2 x 2 x 2 or a 3 x 3 x 3 array in three dimensions? For example, if serial pin ports are connected, the vertice from one plane will need to communicate with the other vertice planes.

humanoido

Post Edited (humanoido) : 12/9/2009 2:50:42 PM GMT

Clock Loop · 2009-12-09 02:01

Here is one idea.

http://forums.parallax.com/showthread.php?p=835128

Dr_Acula · 2009-12-09 04:10

3x3x3? 27 props, are you building a super computer or something?!

Is this to get the processing speed up or to have more inputs/outputs or some other purpose? I guess that helps determine whether you have a single linear 'ethernet' topology, a 'star' topology or a token ring or something else (eg the middle prop in a 3x3x3 is the router??).

What sort of plugs will you use? Is it ok if it is large or does it have to be tiny? How do you reprogram all the props? Will it be mechanically robust? Will you be wiring it up by hand or will there be PBCs?

I'm not sure if I have all the answers but it sounds a fun project. Is this for robotics or something?

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
www.smarthome.viviti.com/propeller

Humanoido · 2009-12-09 05:30

Clock Loop: Thanks for the link. That's a good linear approach as one prop, say in the middle, can communicate with its left or right partner. What I am thinking, is a modification to that concept, in 3D, i.e. the middle prop can communicate with any prop in the cube, and no need to go through its neighbors to reach the edge props.

Humanoido · 2009-12-09 05:44

Dr_Acula: actually I started thinking about this a long time ago and had some success on a small scale with BASIC Stamps. I planned to build one on a larger scale but ran into challenges with huge amounts of power that need to be distributed to all the chips. I recently solved that challenge with the Propeller chip's design.

What should I call it? Fun project? Supercomputer? New cube design? Brain for robot? All the above? You tell me. [noparse]:)[/noparse] Since this is the first one, it can be big. It can be robust or just plain work to show the concept. The prototypes here are all wired by hand on solderless breadboard.

I think it needs a mathematical multiple three dimensional array topology. I was not thinking of one master in the center of a linear or star network on one serial pin. Mainly I want a corner prop in a cube to communicate with all other corners or any prop within the cube.

Michael O'Brien · 2009-12-09 07:17

Humanoido,
A good design to start with is the CM-1/CM-2 from Thinking Machines (AKA the father of massive parallelization - Danny Hillis's 64k PU PHD thesis) - a 12 dimensional hypercube with a 1-bit word size that had customers until the Berlin wall fell back in 89. The design is sound and is based on an improved architecture of the ILLIAC IV last seen running at Ames.
In the CM case each chip had 16 PU's and 1 router - some propeller emulation of this design could use 2 PU props + 1/8 controller prop or use 7 of the cogs in a 3-level tree structure with the 8th acting as an off chip router to other prop subtrees.
The connection structure could be the following with some sort of complexity/bandwidth tradeoff - bus, ring, binary-tree, b-tree, 2d-mesh, 3d-mesh, hypercube - but, a fully connected network would have an exponential # of wires - Danny stated it well in his book - that eventually we end up with no more room for wires after a certain # of nodes and the cost of the actual propeller chips is miniscule.
However, an excellent project - I would expect that we can power up 128 cogs and aim for 512 to selectively solve·some·"embarrassingly parallel" problems.

/michael

Post Edited (Michael O'Brien) : 12/9/2009 7:22:49 AM GMT

Humanoido · 2009-12-09 08:03

Michael O'Brien: thanks for this great find - you can imagine the surprise when I found details on how to construct this computer, published in a paper by the original inventor!

Above is the 12 dimensional hypercube. It was the design used in the CM-2 supercomputer that had 4,096 chips. The structure as you can see, is a cube with 2 to the power of 12 corners. Each chip connects to 12 other chips. One connection represented one dimension in the cube.

I think some modification or cousin to this design could incorporate all eight internal cogs of each chip in some manner of dimension as well. Otherwise, the full 4,096 propeller chips will be a tad dense in wiring in the cube. Dividing by eight cogs yields 512 "processors."

Humanoido · 2009-12-09 13:41

One thought is the hardware representation of the common three dimensional array, typically found in computer programming software. The idea is to hard wire some form of this, so that access of the software example is the same as access of the hardware.

In another idea, if one processor needs to access any processor in the array, the number of wires to do that may be excessive for that many propeller chips. A different approach is needed, and could be represented in terms similar our Basic Stamp supercomputer designs, i.e. one processor talking at one time. If we go with this approach, one processor can talk to all processors at the same time, and these "listener" processors can be located at any place in the dimensional cube. Any thoughts on this?

jazzed · 2009-12-09 14:48

humanoido said...
... Any thoughts on this?

I've already provided a hardware "connection" solution and driver code for theoretically up to 256 Propellers with asynchronous messaging ability and 5MB/s point to point access (1 primary up to 255 secondary devices). The limits of the actual number of Propellers usable is unknown, but my development system which requires one download and for programming has 5 and runs off a 3.6V Nicad battery pack.

If you don't mind token passing like Clock_Loop suggests, then you can use his design or mctrivia's 4 port serial "quilt" board.

In any case, great skill is required for programming such configurations. Do you have what it takes?

Leon · 2009-12-09 15:50

There are languages like occam and XC that were designed for parallel systems like that.

Leon

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Amateur radio callsign: G1HSM

mctrivia · 2009-12-09 16:24

For 2d I designed a pcb I call the infi prop allows limitless number of props. The 12d cube would be cool to build though I would make a stackable pcb design.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
24 bit LCD Breakout Board now in. $21.99 has backlight driver and touch sensitive decoder.

Leon · 2009-12-09 16:34

I think that you will find that any gains in performance will be offset by communications latency. A single fast chip will outperform a system like that.

Leon

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Amateur radio callsign: G1HSM

pharseid · 2009-12-09 19:48

An advantage of the Connection Machine and other massively parallel SIMD machines is that you really didn't need a communication protocol. Every processing element outputs simultaneously and inputs simultaneously, so there's no handshaking. The Connection Machine was unusually among designs in that the routing unit would route to arbitrary addresses, but this still worked because it was guaranteed to deliver its message in a set number of cycles. Other SIMD machines like the Massively Parallel Processor and the Distributed Array of Processors had nearest neighbor connectivity.

Hillis mentioned that eventually parallel machines would have a nearest neighbor 3D topology. So North,South,East,West,Up and Down, I suppose.

-phar

Leon · 2009-12-09 19:57

The transputer had a link switching chip, making it very easy to use application-specific topologies configured in software such as a systolic array or a pipeline.

Leon

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Amateur radio callsign: G1HSM

Post Edited (Leon) : 12/9/2009 8:11:04 PM GMT

pharseid · 2009-12-09 21:52

I dimly remember that transputer link switching chip. The mention of topologies should be noted here, because there are definitely horses for courses when it comes to topologies. To give simple examples, a hypercube topology like the Connection Machine pretty much perfectly maps a parallel quicksort. You find an estimate of the mean of the key, subdivide the hypercube into 2 smaller hypercubes and move the data elements into the appropriate sub-hypercube based on a comparison with the mean. Then just continue subdividing the hypercubes and the data set. I think that sorts in something like O(log2 n). But if you're doing something like a parallel screen buffer, a 2D mesh like the Massively Parallel Processor is a much better match.

-phar

Leon · 2009-12-09 22:16

It was the C004 chip, IIRC. I had two of them on my 16-module motherboard, giving lots of different topologies. 320 MIPS really impressed people 24 years ago. It cost about £13,000, at the time!

Leon

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Amateur radio callsign: G1HSM

Post Edited (Leon) : 12/9/2009 10:21:38 PM GMT

VIRAND · 2009-12-09 23:46

For all practical purposes, the connections of the 12-D cube computers are ethernet cables.
If only 12 per node are necessary, then I think you only need 12x11x10x9x8x7x6x5x4x3x2x1
wires instead of 2^4096, although otherwise only 2 or 4096 wires should be sufficient to connect
them all together. What do you use this 12-dimensional computer for? Simulating the Universe
according to String Theory?

"I dimly remember that transputer link switching chip. The mention of topologies should be noted here, because there are definitely horses for courses when it comes to topologies. To give simple examples, a hypercube topology like the Connection Machine pretty much perfectly maps a parallel quicksort. You find an estimate of the mean of the key, subdivide the hypercube into 2 smaller hypercubes and move the data elements into the appropriate sub-hypercube based on a comparison with the mean. Then just continue subdividing the hypercubes and the data set. I think that sorts in something like O(log2 n). But if you're doing something like a parallel screen buffer, a 2D mesh like the Massively Parallel Processor is a much better match.

-phar"

The set of all digital sounds looks in 4D like a honeycomb with a star at each hexagonal angle from all 8 of its axial directions.
I suppose that the hypercube topology would make the ultimate ipod if it in any way helps tell the difference between
all those musical stars, since the space is fully predefined by such a simple pattern as blue crosses on graph paper in 4D.
(The space is also defined simply in 0D,1D,and 2D, with the most confusion probably in 3D).

Post Edited (VIRAND) : 12/10/2009 12:02:41 AM GMT

Humanoido · 2009-12-10 13:42

VIRAND said...
For all practical purposes, the connections of the 12-D cube computers are ethernet cables.
If only 12 per node are necessary, then I think you only need 12x11x10x9x8x7x6x5x4x3x2x1
wires instead of 2^4096, although otherwise only 2 or 4096 wires should be sufficient to connect
them all together. What do you use this 12-dimensional computer for? Simulating the Universe
according to String Theory?

That was the point made earlier as I think that's a lot of wiring, too much, especially in a cramped space. I have a very small working module that does not use wire. Although it's not a folded hypercube, the HS, if expanded, may run supercomputing applications depending on the software. In other words, software could handle the connections in place of 4,096 wires. It seems to be one practical way of doing large numbers of interconnections. Though it still needs a software protocol to prevent crosstalk which, at the moment, seems like an exorbitantly large project. Hope to come up with some ideas to trim it. Maybe supercomputers are just this way...

pharseid · 2009-12-10 15:30

I'm curious about how many Props you expect to use. For the number you mention in your first post, a max of 27, you could do all connections on an FPGA, although it might not be cost efficient. It would probably be better to use multiple smaller FPGA's and have each handle a couple dimensions (if you're sold on the hypercube approach). Probably as important as connectivity, the FPGA's memory could be used as FIFO's to buffer data packets and allow faster communication.

-phar

mctrivia · 2009-12-10 16:29

Did some quick math.

4096 hyper cube of prop2s I could route on 16 10x10 pcb and would cost 50g. Would be capable of 5TIPs and have 300000 spare io.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
24 bit LCD Breakout Board now in. $21.99 has backlight driver and touch sensitive decoder.

pharseid · 2009-12-10 18:08

I'm not a Prop person, so I'm not sure this is doable, but my understanding is that the timing of a cog's access to the I/O pins is deterministic. I would think that a clever FPGA implementation would be able to recognize which cog was accessing it by its timing. In that way you would be able to have multiple packet transfers running at a time, octupling the communication bandwidth. This isn't specific to this problem, of course, any Prop-FPGA project might benefit from it. Has this been done?

mctrivia · 2009-12-10 18:24

Every cog can communicate on every cycle. To allow multiple cogs to use the same pin each cog would have to write 0 after every 1. You could get some bandwidth increase but not 8x.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
24 bit LCD Breakout Board now in. $21.99 has backlight driver and touch sensitive decoder.

jazzed · 2009-12-10 18:54

mctrivia said...
Every cog can communicate on every cycle. ...

Umm, I think he meant clock cycle.

A cog only needs to *assert* a signal for the FPGA to know from the clock if the cog is active. Using that (with some init), data can be mux-latched onto a bus so that a bus tree can be created. Then an Internet IPv4-like addressing model could be used. One could address a device based on hostid like x.x.x.x ... such that x > 0 and x < 8 and use 0 for multicast.

heater · 2009-12-10 19:53

I'd be happy with a 4D Hypercube.

Only 16 Props each with four 2 wire serial links to neighbours.
That uses only 8 pins used for links in each Prop.
Then the longest path from any Prop to any other Prop is only 4 hops.
Arrange that communications everywhere is bit synchronized.
That way you can clock the 4 bits out of the 4 outputs in 4 different directions at the same time from one COG.
Similarly the 4 incoming bits from 4 different links can be sampled at the same time by one COG.
Probably want to shift data in LONGS rather than bytes for efficiency.
Should be as fast handling each link in isolation like a normal serial links whilst saving on the number of COGs required to do it.

Build the thing on 16 dinky individual boards fitted with appropriate serial connectors. I'm thinking use USB cables to "patch board" wire the hypercube up and mini USB connectors.

Don't forget that each Pro has to handle the routing of data through to neighbours so that's probably another COG in each Prop used in the routing function.

End result: 16 Props with 20 free pins each and 5 COG free for application code.

That's 320 I/O pins for external goodies and 1600 MIPs of go power.

When programming this monstrosity has got the better of you then you have 16 dinky little boards for other purposes.

Just thinking out loud here.....

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
For me, the past is not over yet.

rokicki · 2009-12-10 21:56

I've done a fair amount of work with MPP topologies like this. Hypercubes are nice and
regular and work well if the problem maps well to them, but they have a very large
"average path length" and a relatively low bisection bandwidth taking into account
the number of links required.

Consider the 4096-node network with 12 bidirectional links between nodes; the
average path length here is 6 hops and the worst-case path length is 12 hops.

There's a simple, regular network that supports more nodes (4608) with fewer
links (8 unidirectional into and out of each node) and a worst-case path length
of only 4 hops, significantly less than the *average* for the 12-d hypercube.
I'm not going to ruin the puzzle; you may amuse yourself by designing this
network yourself. This network has significantly fewer wires, yet it also has a
higher bisection bandwidth than the 12-d hypercube.

Post Edited (rokicki) : 12/10/2009 10:01:09 PM GMT

3D Propeller

Comments