Help!!! from pros regarding multiprocessing

Dev · 2012-02-01 20:35

OK, so I'm a total "noob" when it comes to this chip. A buddy and I stumbled upon this chip and had a concept using it. We can't really discuss the concept openly, but we need help using 16 of these together (multiprocessing). Any help would be nice, thanks.

dev

jvrproductions · 2012-02-01 20:37

You want to connect 16 chips together?

Dev · 2012-02-01 20:38

Yes, and run them as one unit.

Circuitsoft · 2012-02-01 20:48

What are you doing with 128 threads, how much communication is necessary between threads, and how much I/O do you need?

Dev · 2012-02-01 20:49

To clarify: I'm talking about 16 Props running as one big 128-core unit, no master or slaves.

Dev · 2012-02-01 20:51

Circuitsoft wrote: »

What are you doing with 128 threads, how much communication is necessary between threads, and how much I/O do you need?

High-thread applications.

As much communication as possible, with about 2 I/O (1 in, 1 out) per Prop.

obrienm · 2012-02-01 20:53

Dev,
Hi, there are several developers (at least 10) in this forum who have been working on all or pieces of this problem since 2007 (program loading, data insertion/extraction, message routing, etc.). Note: the chip is not powerful enough, even using tens of them with all 8 cores running assembly, to outperform a single-core i7 or any CUDA/Stream processor, for example. That said, the chip is excellent for academic exploration of SIMD and MIMD configurations in any linear, grid or hypercube connection topology. The specific advantage of the Propeller chip is that it has all the pieces of a larger, more expensive system (symmetric multiprocessing, shared-memory contention, open connection topology) without the overhead, allowing real experimentation outside of a corporate research lab.
Some block diagrams to start the ideas flowing (half of a 4-dim hypercube; combined NEWS and linear with 74595 load / 74165 read; 2-dim grid).
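The hypercube and grid topologies in those diagrams reduce to simple addressing rules. A minimal sketch, assuming 16 nodes numbered 0 to 15 (the numbering and function names are hypothetical, for illustration only): in a 4-dimensional hypercube each node links to the 4 nodes whose address differs in exactly one bit, so the hop count between two nodes is the number of differing bits; in a 4x4 NEWS grid each node links to its north/east/west/south neighbours (no wraparound assumed here).

```python
# Sketch: neighbour addressing for 16 nodes (hypothetical numbering 0..15).

def hypercube_neighbors(node, dims):
    """Nodes one hop away in a `dims`-dimensional hypercube: flip one address bit per dimension."""
    return [node ^ (1 << d) for d in range(dims)]


def hops(a, b):
    """Minimum hop count between two hypercube nodes = number of differing address bits."""
    return bin(a ^ b).count("1")


def news_neighbors(node, width, height):
    """North/East/West/South neighbours in a width x height grid (row-major, no wraparound)."""
    row, col = divmod(node, width)
    out = {}
    if row > 0:
        out["N"] = node - width
    if col < width - 1:
        out["E"] = node + 1
    if col > 0:
        out["W"] = node - 1
    if row < height - 1:
        out["S"] = node + width
    return out


if __name__ == "__main__":
    print(hypercube_neighbors(5, 4))  # [4, 7, 1, 13]
    print(hops(0, 15))                # 4: opposite corners of the 4-cube
    print(news_neighbors(5, 4, 4))    # {'N': 1, 'E': 6, 'W': 4, 'S': 9}
```

The hypercube keeps the worst-case distance at 4 hops for 16 nodes, while the 4x4 grid needs up to 6 (corner to corner), which is the usual trade-off against the grid's simpler wiring.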

If you would like the forum members who use the chip primarily for its parallel processing abilities (as a research vehicle), "enthusiastically to say the least", to focus on your problem, I find that specific technical details on your project greatly help our clarity of vision.

/michael

Dev · 2012-02-01 20:58

The 16 is only 1 of 4 layers to be used.

This is to be used as a high-thread processor, great for certain things a normal processor isn't.

Dev · 2012-02-01 21:14

obrienm wrote: »

If you would like the forum members who use the chip primarily for its parallel processing abilities (as a research vehicle), "enthusiastically to say the least", to focus on your problem, I find that specific technical details on your project greatly help our clarity of vision.
/michael

The technical details would be: a 16-Prop, 128-core processing unit that is part of a set of 4, which are going to be used as a high-powered parallel processing unit for a portable "processing" device, to say the least.

obrienm · 2012-02-01 21:19

Dev,
I see: a star network where any thread-to-thread comms, if required, go through the host processor. Having no router cores, no front/back-end controller chips, and no XCore (XK-XMP-64, 512 threads) or Ambric (336 threads) style comms means a serial connection to every chip. This is a simpler configuration to work out, as most of the project will be in the multithreaded controller on the host PC, written in Java, C#, Objective-C, etc.
This is entirely doable, as you can use Propellent and your (let's say) javax.comm-based software to control as many chips as you want. Propellent can bootstrap the same Spin/PASM code into all the chips in sequence using an Ant XML script. The host comm software can load/read data at USB 2.0 speeds.
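The host side of such a star network can be sketched as one worker thread per chip link, all pulling jobs from a shared queue. This is a minimal sketch under stated assumptions: `FakeLink` is a hypothetical stand-in for a real serial port (it just sums the numbers locally so the sketch runs without hardware), and in practice it would be replaced by whatever comm library the host uses. Chips never talk to each other; every result comes back through the host.

```python
import queue
import threading


class FakeLink:
    """Hypothetical stand-in for one host-to-Propeller serial link.

    A real version would write the job bytes to the chip's port and block until
    the result comes back; here the "chip" just sums the numbers locally.
    """

    def __init__(self, chip_id):
        self.chip_id = chip_id

    def send_job(self, job):
        return self.chip_id, sum(job)


def run_star(jobs, n_chips=16):
    """Host-side dispatcher for a star network: one thread per chip link,
    all pulling from a single shared job queue."""
    work = queue.Queue()
    for index, job in enumerate(jobs):
        work.put((index, job))  # queue is filled before any worker starts
    results = [None] * len(jobs)

    def worker(link):
        while True:
            try:
                index, job = work.get_nowait()
            except queue.Empty:
                return  # no jobs left for this chip
            results[index] = link.send_job(job)  # each job owns its own result slot

    threads = [threading.Thread(target=worker, args=(FakeLink(c),)) for c in range(n_chips)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    out = run_star([[1, 2], [3, 4], [5]], n_chips=2)
    print([value for _chip, value in out])  # [3, 7, 5]
```

A shared queue means a fast chip simply takes more jobs than a slow one, so no static partitioning of work across the 16 chips is needed.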
I would expect, then, that the problem domain will be constrained by the comm port speed, so problems where each core processes for at least 1K cycles will be required to keep the comm traffic down. There is no routing hardware or specialized UART circuitry, so you will have to dedicate 1 or 2 cores to comm port communication, leaving 6 for general threads.
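The "at least 1K cycles" rule of thumb is really a compute-to-communication ratio, and a quick calculation shows how it scales. A minimal sketch, assuming an 80 MHz clock, 8N1 serial framing (10 bits on the wire per byte) and 6 worker cogs per chip fed through one shared link, as in the 2-comms/6-workers split above; the baud rates are placeholders to be replaced with whatever the real link sustains.

```python
def min_cycles_per_job(payload_bytes, clock_hz=80_000_000, baud=115_200,
                       bits_per_byte=10, worker_cogs=6):
    """Clock cycles each worker cog must spend per job so one chip's serial link keeps up.

    Assumptions (all adjustable): 8N1 framing, i.e. 10 bits on the wire per byte;
    `worker_cogs` cogs on the chip all fed through the same link; `payload_bytes`
    counts the busier direction of a full-duplex link.
    """
    bytes_per_s = baud / bits_per_byte
    # Time the link needs to deliver one job to every worker cog.
    seconds_per_round = payload_bytes * worker_cogs / bytes_per_s
    # Each cog must stay busy at least that long, expressed in clock cycles.
    return seconds_per_round * clock_hz


if __name__ == "__main__":
    print(round(min_cycles_per_job(4)))                  # 166667 at 115200 baud
    print(round(min_cycles_per_job(4, baud=3_000_000)))  # 6400 at an assumed 3 Mbaud
```

Even with a fast link, a 4-byte job per cog needs thousands of cycles of work to avoid starving the cogs, which is why coarse-grained problems suit this topology best.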
The best setup at this time, requiring the least custom breadboard or board work, is the QuickStart board.
Even though your topology is not a hypercube, you could consult the "bible" and read through Danny Hillis's PhD thesis, on which Thinking Machines' CM-1 was based. A sort of scaled-down version (a single quad) of its 4096 chips (each with 16 1-bit processors) is possible.

Your project is very interesting. I have POC'd 8 chips (48 cores) so far and see no issues with the 8 USB sessions running in parallel via javax.comm on a 32-bit Win32 JVM. My personal target is a KiloCore, which may come close to a fraction of the performance of a single thread on an Intel Core i7-2600.

MIMD star network, 48 cores (coprocessor)

SIMD grid network of 16 cores (coprocessor)

I will await the word of a couple forum members that I respect once they see the thread.

Thank you for the details.
/Michael

Dev · 2012-02-01 21:26

This is going to be fully custom, so I already plan on a custom PCB design.

jazzed · 2012-02-01 22:18

Maybe 4 of these on a Gadget Gangster baseboard just to get started (17 Propellers)?
http://forums.parallax.com/showthread.php?131538-50-TetraProp(tm)-Boards-now-ready-to-ship-!&p=998005&viewfull=1#post998005

Nice stack of QuickStarts there obrienm.

Heater. · 2012-02-02 07:36

dev,

Perhaps we need to know something of your requirements before being able to advise very well. I can understand your not wanting to disclose the complete idea, but perhaps you could give some hints about processing power required, memory requirements, I/O requirements (amount, type, speed) and event handling requirements.

This idea of building multi-Propeller systems has come up a few times over the years; have a search on the forums for what has been achieved.

But here is my take:

There are only a few reasons for wanting to bolt together dozens of Props.
1) To get more processing power.
2) To get more memory for bigger application code/data.
3) To get a lot of I/O pins.
4) To be able to handle a lot of asynchronous events.

In my view 1) and 2) are non-starters with the Propeller. The Prop is basically a slow MCU by modern standards. By the time you have laced a few of them together you will be in such expense and complexity that it would be easier/cheaper to do it with a different, faster device with more memory. Don't forget that Spin is interpreted and therefore very slow. One can run big PASM code by using LMM methods (or the various C compilers available now), but then the code gets bigger (Spin byte code is very compact) and you run out of RAM sooner. Also, LMM results in an execution speed over 4 times slower than running PASM in a COG. All in all, for big code and/or more speed it's better to select a more suitable device.
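To put rough numbers on points 1) and 2): a back-of-envelope peak-throughput sketch, assuming an 80 MHz clock and 4 clocks per PASM instruction, and taking the LMM penalty as exactly 4x (the post says "over", so real LMM figures are lower still). These are ceilings, not benchmarks: they ignore hub-access waits, communication, and the far larger overhead of interpreted Spin.

```python
CLOCK_HZ = 80_000_000   # assumed: 5 MHz crystal with the x16 PLL
CLOCKS_PER_INSTR = 4    # typical PASM instruction; hub instructions take longer
LMM_SLOWDOWN = 4        # "over 4 times slower" per the post, so LMM figures are optimistic


def peak_mips(chips, cogs_per_chip=8, lmm=False):
    """Upper-bound MIPS for a multi-Prop array, ignoring hub waits, I/O and comms."""
    per_cog = CLOCK_HZ / CLOCKS_PER_INSTR / 1e6  # 20 MIPS per cog at 80 MHz
    if lmm:
        per_cog /= LMM_SLOWDOWN
    return chips * cogs_per_chip * per_cog


if __name__ == "__main__":
    print(peak_mips(16))                   # 2560.0: 16 chips x 8 cogs, native PASM
    print(peak_mips(16, cogs_per_chip=6))  # 1920.0: 2 cogs per chip kept for comms
    print(peak_mips(16, lmm=True))         # 640.0: same array running LMM code
```

These are 32-bit integer instructions with no hardware multiply or divide, so the per-instruction work is also smaller than on a modern desktop core.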

3) and 4) are where multiple Props may shine in some special applications. Do you have such a requirement for lots of I/O and events? If you need to use a lot of pins connected to a lot of event processing that can run as native PASM within a COG, then you are giving other systems a run for their money.

Dev · 2012-02-02 13:20

To simplify what I need: a large amount of parallel processing in a portable, low-powered device. I need enough pins for 3 USB ports, 2 Ethernet ports and a screen. That's about it, possibly a SATA as well.

pedward · 2012-02-02 14:16

Good luck with SATA. You should look at Humanoido's posts; he has built some boards with many Props for a "brain".

Circuitsoft · 2012-02-02 15:11

Dev wrote: »

To simplify what I need: a large amount of parallel processing in a portable, low-powered device. I need enough pins for 3 USB ports, 2 Ethernet ports and a screen. That's about it, possibly a SATA as well.

Honestly, you'll be much better off with an ARM than the Prop. The Prop is not good at doing heavy processing. It is good at quickly reacting to external stimulus. Look at the Gumstix Stagecoach. Load it up with Overo WaterStorm modules, and you'll be good to go.

Humanoido · 2012-02-03 00:16

Dev, in 2002 I began developing multi-chip machines using Parallax BASIC Stamps, leading up to a Propeller Big Brain. A record of machine development shows over 300 machines. One of the current machine configurations has over 100 Props, with a count of cogs plus processors that has reached into the tens of thousands. Machine specifications describe some of the HW, SW and apps, and various posts can be reviewed for specific information. Studies have shown the Propeller is a powerful chip and, when placed in multi-parallel configurations, has unlimited capability.

Dev · 2012-02-05 20:02

This helps a lot! Thanks.

I'm testing a few theories at the moment that could be useful to everyone doing powerful applications. So far it's giving excellent!!!! results. When I finish, I shall post the findings if they are good.
