Alternate Prop Chips - Page 2 — Parallax Forums

Alternate Prop Chips


Comments

  • mctrivia Posts: 3,772
    edited 2010-08-10 11:50
    The XMOS is a pretty impressive-looking processor. However, it does not have anywhere near as nice a forum to work with.

    What about ditching true parallel processing and going with a threaded processor like the Intel Atom? You can get dev kits relatively cheaply, and you then have access to a 1GHz+ chip with full floating point support.
  • Heater. Posts: 21,230
    edited 2010-08-10 11:53
    Humanoido, whilst you were responding I was doing a little thought experiment. The results of which will probably get me excommunicated from here forever or worse.

    Gedankenexperiment, let's assume:

    1) I have 1000 dollars to spend on chips. We will ignore the cost of boards, interconnects, power supplies etc for now.

    2) I want to build a Humanoido style "Towering Inferno" of processors.

    3) The aim is to have as many independent but communicating parallel processes as possible for the money.

    4) Each process/thread/processor should have its own RAM. It may be small but must be usefully big.

    5) Processes/threads/processors may not operate directly on each others RAM. They will communicate through defined mechanisms other than their RAM space.

    6) The purpose of all this is to experiment with parallel/distributed algorithms. In a CSP style.

    What are the options? I put two up for consideration.

    1) XMOS G4 device. 4 totally separate cores per chip. BUT each core can run 8 almost completely independent threads with their own deterministic timing. That's 50MIPS per thread (I believe). Using the C-like language, XC, threads cannot share memory and must communicate through hardware channels. Those hardware channels also run between chips, so communication from anywhere to anywhere in the Towering Inferno is dead easy.

    2) Propeller. No need to go into details here on this forum I think.

    So what could we have for our $1000?
                                 XMOS          Propeller
                                 ----          ---------
    Price per chip (Dollars)       16                  7
    Chips per $1000                62                142
    Total Processes              1984               1136 (See note 1)
    Total RAM (K Bytes)         15872               4544
    RAM per process (K bytes)       8                  4
    MIPS per process               50                 20 (See note 2)
    Total MIPS                  99200              22720 (See note 1 & 2)
    Dollar/MIP                   0.01               0.04 (See note 1 & 2)
    

    Note 1 - As the Prop has no native interprocessor communication, perhaps 25% of the available processors (COGs) will be used for the communication function and unavailable for general use.

    Note 2 - Prop can only do 20MIPS per process when running within the confines of a COG. Using data from the 4K RAM available will slow that considerably. When executing code from that RAM, divide by a factor of 10 to 100.
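    The figures in the table can be reproduced with a few lines of arithmetic. What follows is only a sketch under the table's own assumptions ($16 and $7 per chip, 32 threads or 8 COGs per chip, 8K or 4K of RAM and 50 or 20 MIPS per process). The `Chip` class, the `cluster` function and the `comms_fraction` parameter are made-up names for illustration; the last line applies the rough 25% of COGs given over to communication mentioned in the notes.

```python
from dataclasses import dataclass


@dataclass
class Chip:
    name: str
    price_usd: int        # price per chip, as quoted in the table
    procs_per_chip: int   # hardware threads (XMOS) or COGs (Propeller)
    ram_kb_per_proc: int  # RAM per process, in K bytes
    mips_per_proc: int    # best-case MIPS per process


def cluster(chip, budget_usd=1000, comms_fraction=0.0):
    """Size a cluster for a budget; comms_fraction is the share of
    processes assumed to be tied up doing communication."""
    chips = budget_usd // chip.price_usd
    procs = chips * chip.procs_per_chip
    usable = int(procs * (1 - comms_fraction))
    total_mips = usable * chip.mips_per_proc
    return {
        "chips": chips,
        "processes": procs,
        "usable_processes": usable,
        "total_ram_kb": procs * chip.ram_kb_per_proc,
        "total_mips": total_mips,
        "usd_per_mips": round(budget_usd / total_mips, 3),
    }


xmos = Chip("XMOS G4", 16, 4 * 8, 8, 50)  # 4 cores x 8 threads
prop = Chip("Propeller", 7, 8, 4, 20)     # 8 COGs

for chip in (xmos, prop):
    print(chip.name, cluster(chip))

# Same Propeller cluster with a quarter of the COGs doing communication.
print("Propeller, 25% comms:", cluster(prop, comms_fraction=0.25))
```

    With a quarter of the COGs set aside, the Propeller column drops to 852 usable processes, so the gap in dollars per MIPS widens further. Neither case accounts for the slowdown when code runs from hub RAM.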

    Conclusion: My Towering Inferno, if there were ever to be such a project, is not going to be built out of Propellers. The cost/performance ratio is not good. The communication bandwidth between chips is abysmal and requires considerable effort to implement.

    Now, before the lynch mob come to take me away I hope they hear me say this:

    This thought experiment is in no way intended to entice anyone to drop their Props or stop Spinning. I for one do not intend to do so. It is an exploration of a rather obscure and extreme application. It totally overlooks the many other important aspects of the Prop and its environment. The arguments presented here are in no way relevant to 99% of the uses that we see the Prop put to here every day.

    I still maintain that this exercise in parallel experimentation is more economically done under simulation on a regular PC.
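    As a rough idea of what such a PC simulation could look like, here is a minimal sketch in Python. It is an assumption-laden toy, not anyone's actual setup: each simulated process is a thread that keeps its state in local variables and talks to its neighbours only through queues, in the spirit of points 4) to 6) above. The names `worker` and `run_pipeline` are invented for the example, and Python threads model only the communication discipline, not the timing of real hardware.

```python
import queue
import threading


def worker(inbox, outbox):
    """One simulated process: it touches only its own two channels."""
    while True:
        msg = inbox.get()
        if msg is None:       # shutdown marker: pass it on and stop
            outbox.put(None)
            return
        outbox.put(msg + 1)   # the "work": add one and forward


def run_pipeline(n_workers, values):
    """Chain n_workers processes together and push values through them."""
    channels = [queue.Queue() for _ in range(n_workers + 1)]
    threads = [
        threading.Thread(target=worker, args=(channels[i], channels[i + 1]))
        for i in range(n_workers)
    ]
    for t in threads:
        t.start()
    for v in values:
        channels[0].put(v)
    channels[0].put(None)

    results = []
    while True:
        out = channels[-1].get()
        if out is None:
            break
        results.append(out)
    for t in threads:
        t.join()
    return results


print(run_pipeline(100, [0, 10, 20]))  # each value passes through 100 workers
```

    A hundred communicating processes cost nothing here, which is the economic point being made; swapping the chain for a ring or a mesh is a matter of wiring the queues differently.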

    May the lord have mercy on my soul.
  • Heater. Posts: 21,230
    edited 2010-08-10 13:07
    Humanoido: "I wish Parallax would....at minimum, double the number of cogs in the next generation prop chip."

    This has been discussed at length a while ago. Problem is, apart from finding physical space to put more COGs, doubling the number of processors halves the bandwidth between COG and HUB RAM. I think many people took the view that this was not a good idea.
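    The halving follows directly from round-robin hub access. A small sketch, assuming a hub that gives each COG one access window per rotation, 2 system clocks per slot, one 32-bit long per window and an 80 MHz clock; those numbers and the function name are assumptions for illustration, not a statement about any future chip.

```python
def hub_bytes_per_sec_per_cog(n_cogs, clock_hz=80_000_000,
                              clocks_per_slot=2, bytes_per_access=4):
    """Peak hub bandwidth seen by one COG under round-robin access."""
    clocks_per_rotation = n_cogs * clocks_per_slot  # wait between windows
    return clock_hz / clocks_per_rotation * bytes_per_access


for n in (8, 16):
    mb_per_s = hub_bytes_per_sec_per_cog(n) / 1e6
    print(f"{n} COGs: {mb_per_s:.1f} MB/s per COG")
```

    Under these assumptions 8 COGs get about 20 MB/s each and 16 COGs get about 10 MB/s each, unless the hub itself is made faster or wider.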

    The way to find more space for COGs is to reduce the amount of HUB RAM. Again when Chip put that trade off to the forum many agreed that it was preferable to have more RAM.

    Humanoido: "chips are being released with 40 per chip and considerably above"

    Can you link us to some examples of that? We have been looking at XMOS in these last few posts. They have been working with a more recent, smaller scale technology. They have not worried so hard about power consumption. They have gone all out for cores/threads and speed. The result so far is only 4 cores. Who is doing 40 per chip or considerably above?
  • Humanoido Posts: 5,770
    edited 2010-08-10 13:27
    Heater. wrote: »
    Humanoido, whilst you were responding I was doing a little thought experiment. Gedankenexperiment, let's assume:...
    Conclusion: My Towering Inferno, if there were ever to be such a project, is not going to be built out of Propellers. The cost/performance ratio is not good. The communication bandwidth between chips is abysmal and requires considerable effort to implement... I still maintain that this exercise in parallel experimentation is more economically done under simulation on a regular PC.
    Heater: A very good thought experiment, by a genius. I enjoyed the comparison in this analysis, as it sheds light on many factors worth considering and in no way dims the bright light already shining on the Prop chips. The original aim is to find cheap, simple chips to build a cluster machine for developing parallel algorithms, mainly for language development using a big Propeller-based supercomputing machine. Software emulation on the PC is a genuine possibility if it is easy to implement, and you'll certainly get the programming guys on your side, but the dyed-in-the-wool hardware folks may still want hands-on stuff to throw some wire around.

    Humanoido
  • Humanoido Posts: 5,770
    edited 2010-08-10 13:52
    mctrivia wrote: »
    Any prices? I see the Intel 48 is going out free to schools with no pricing to others. Wish I was a school right now... The NVidia is a graphics card, not a chip, to get that many cores. I see they have 352 processors for $229. Very interesting. I see they have a 1,000 core supercomputer product. The 12-core chip is pricey.
    http://www.guru3d.com/news/amd-12core-server-chip-pricing-/ "The Opteron X12 6164 runs at 1.7GHz and has 16MB of cache and it costs $797, while the 6136 is clocked at 2.4GHz, packs 18MB of cache and costs $808. The slower version runs on just 65W of power, while the 2.4GHz model consumes 105W, which sounds quite impressive. The X12 6176 is clocked at 2.3GHz and it has 18MB of cache, but at $1,491 it's the priciest of the lot. The 2.2GHz 6174 costs $1,250, while the 2.1GHz 6172 is priced at $1,060. The twelve cores will have reduced clock speeds but will not draw more power than 6-core versions. Even though the clock frequencies will fall, Magny-Cours chips will pack more performance compared to existing Opteron chips, Conway said. The larger cache and increased cores will make servers faster. The company's upcoming 12-core server chips, code-named Magny-Cours, put two six-core chips in one package. The same silicon is used in its six-core chips, code-named Istanbul, which are part of the Opteron line of server processors. Finer details in the Magny-Cours chip. Two six-core chips are connected by four hyper-threaded interconnects and are targeted at two- and four-socket servers, Conway said. It includes a total of 12MB of L3 cache, with each core supporting 512KB of L2 cache. The chips will be manufactured by AMD's spinoff, GlobalFoundries, using existing 45-nanometer technology. AMD is also working on a new x86 chip architecture code-named Bulldozer. The architecture will be used in chips manufactured using the 32-nm process in 2011. The company has scheduled a 16-core chip code-named Interlagos for release in 2011 [via computerworld]."

    Humanoido
  • mctrivia Posts: 3,772
    edited 2010-08-10 13:59
    Yes nvidia makes a graphics card but they made the chip that has all those cores on it also.
  • Humanoido Posts: 5,770
    edited 2010-08-10 14:11
    mctrivia wrote: »
    Yes nvidia makes a graphics card but they made the chip that has all those cores on it also.
    Mctrivia, I was looking for the chip online with the pricing but didn't see it. Did you find it somewhere? Thanks.

    Humanoido
  • mctrivia Posts: 3,772
    edited 2010-08-10 14:20
    I don't believe they sell just the chip. But there are instructions on the web on how to use the card in a robot.
  • Heater. Posts: 21,230
    edited 2010-08-10 15:05
    OK, Intel and Nvidia are seriously into the cores thing. Must remember that for my next super computer.

    Just how far off topic for this forum can we get?
  • Cluso99 Posts: 18,069
    edited 2010-08-10 16:14
    heater said:
    I may not come back but I would certainly send my past self a future copy of the Wall Street Journal :)

    My thoughts entirely, except I only need a single page - over in OZ it's the lotto results that I want sent back :D

    humanoido: If you use a very simple 8-bit CPU (or 4-bit) in an FPGA, and you use the XC3S1400A, which has 1.4M gates, you may get a lot of CPUs in this chip. IIRC its cost is ~$50, but it's in a BGA package. IIRC you can get an assembled demo PCB with one for ~$150.

    Now if you want to really use a biggie FPGA look at Xilinx Spartan or Virtex 6 chips. Altera have similar large FPGAs. Some of these chips sell for $10,000 each !!!

    LUTs/LEs/CLBs are all fairly similar. The newer Spartan/Virtex 6 have a larger LUT (an extra input+).

    Of course the biggest problem here will be the interconnect. There is always a 'little' problem to be overcome isn't there!

    Now for the exercise, IIRC the 486 had 1M transistors. The i7 has 1G (billion) transistors. As a guess the Z80 had maybe 100K transistors. Maybe use 4 transistors per gate for calcs???

    Now.... back to reality....
  • Leon Posts: 7,620
    edited 2010-08-10 16:29
    The Z80 had a lot fewer than 100,000 transistors. The later MC68000 only had about 68,000, and Intel claimed that Motorola wouldn't be able to manufacture it, with the technology available at the time. They looked a bit silly when Motorola started shipping working silicon.

    I've just seen that the Z80 actually had about 8,500 transistors.
  • Humanoido Posts: 5,770
    edited 2010-08-10 22:15
    For those of you interested in building your supercomputer machine by using a PC simulation...

    http://www.rcac.purdue.edu/rackanode/

    a bit of humor...

    the link was brought to my attention today by SSchaffer.

    Humanoido
  • $WMc% Posts: 1,884
    edited 2010-08-11 10:43
    Humanoido

    Why not a Prop? Less than $10 each, really fast, 32 x 8 processing.

    I can see a high speed 64-bit front side bus with a few Propellers.

    We now have PropBasic to run the Prop at high speed.
    I run serial @ 230.4k baud w/ a 5 MHz xtal.
    Find a $1, $2, or $5 chip that will run @ 230.4k baud.

    You can over-clock the Prop to 6.25 MHz and really roll with it.
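    To put the serial figure in perspective, here is a back-of-envelope sketch of how much time a COG has per bit. It assumes the crystal is multiplied by a 16x PLL (so 5 MHz gives 80 MHz, and a 6.25 MHz crystal gives 100 MHz) and that a typical instruction takes 4 clocks; both are stated assumptions, and the function names are invented for the example.

```python
def clocks_per_bit(xtal_hz, baud, pll_mult=16):
    """System clocks available per serial bit."""
    return xtal_hz * pll_mult / baud


def instructions_per_bit(xtal_hz, baud, pll_mult=16, clocks_per_instr=4):
    """Rough instruction budget per serial bit for one COG."""
    return clocks_per_bit(xtal_hz, baud, pll_mult) / clocks_per_instr


for xtal in (5_000_000, 6_250_000):
    cpb = clocks_per_bit(xtal, 230_400)
    ipb = instructions_per_bit(xtal, 230_400)
    print(f"{xtal / 1e6:.2f} MHz xtal: {cpb:.1f} clocks/bit, "
          f"about {ipb:.1f} instructions/bit")
```

    At 230.4k baud that leaves a budget in the region of 87 instructions per bit at 80 MHz, which is why a bit-banged UART in a single COG is comfortable at this rate.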

    I've been following your posts for years. I find them enlightening and different from the norm, and sometimes brilliant.
    I'm sure whatever you come up with will be awesome!

    I'm just throwing out some ideas!
  • mctrivia Posts: 3,772
    edited 2010-08-11 11:43
    ok enough time wasted playing that silly game. got on the high scores now to go do more work.
  • Cluso99 Posts: 18,069
    edited 2010-08-11 14:25
    Leon: I didn't realise just how far we have come. 8,500 transistors for a Z80.
    Just what do those 1,000,000,000 transistors in an Intel i7 do?? I know a large lump are for cache memory. Just goes to show how much windoze and bitmapped graphics really use.

    So an FPGA with 1,000,000 gates should be able to emulate quite a number of Z80's so a much simpler 8bit CPU should be able to do a lot more.
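    A quick sketch of that estimate, using only numbers already quoted in this thread: about 8,500 transistors for a Z80, a guess of 4 transistors per gate, and an FPGA with 1,000,000 gates. The function name is invented, and the result should be read as a loose upper bound, since FPGA "gate" counts do not map one-to-one onto logic gates and routing, block RAM and the interconnect between CPUs are ignored.

```python
def cpus_per_fpga(fpga_gates, cpu_transistors, transistors_per_gate=4):
    """Naive count of CPU copies that fit, by gate count alone."""
    cpu_gates = cpu_transistors / transistors_per_gate
    return int(fpga_gates // cpu_gates)


z80_copies = cpus_per_fpga(1_000_000, 8_500)
print(f"A Z80 is roughly {8_500 / 4:.0f} gates, "
      f"so at most about {z80_copies} copies in 1,000,000 gates")
```

    Even if the real figure were a tenth of that, it would still be dozens of small CPUs per chip, which is the point being made.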
  • Leon Posts: 7,620
    edited 2010-08-11 15:08
    I've played with simple CPUs in a small Altera Flex 10K FPGA, and John Kent's 6809 running on a Xilinx Spartan3 with external SRAM:

    http://members.optushome.com.au/jekent/system09/index.html

    I find it difficult to get my head round what's inside those very large chips like the latest Intel CPUs, like you.
  • pharseid Posts: 192
    edited 2010-08-11 16:42
    If you choose the Java simulation route, there would seem to be a direct route to an FPGA implementation. There are a couple freeware VHDL Java engine designs, including this one:
    http://www.jopdesign.com/perf.jsp . It's probably going to take some work to get threads distributed over multiple processors though.

    Also be aware that on really big systems, you're not going to be able to exploit concurrency through multi-threading. That would imply a whole bunch of processors sharing the same physical memory. I think the most popular paradigm for dealing with that at the moment is distributed shared memory, and I think there are a couple of Java implementations of that. You can run that on a garden variety network.

    -phar
  • Kevin Wood Posts: 1,266
    edited 2010-08-11 22:19
    I personally consider the Propeller a parallel/serial hybrid, since the cogs/io pins can work in parallel, while hub operations are fundamentally serial. I think that the hub is actually a design limitation/compromise that ultimately prevents the Propeller from reaching its full potential.

    As for the Java remarks, IMO there are far better language choices for modelling parallel systems.
  • Humanoido Posts: 5,770
    edited 2010-08-12 01:20
    $WMc% wrote: »
    Humanoido Why not a Prop? Less than $10 each, really fast, 32 x 8 processing. I can see a high speed 64-bit front side bus with a few Propellers. We now have PropBasic to run the Prop at high speed. I run serial @ 230.4k baud w/ a 5 MHz xtal. Find a $1, $2, or $5 chip that will run @ 230.4k baud. You can over-clock the Prop to 6.25 MHz and really roll with it. I've been following your posts for years. I find them enlightening and different from the norm, and sometimes brilliant. I'm sure whatever you come up with will be awesome! I'm just throwing out some ideas!
    $WMc%, thanks. :) I was just thinking about your idea to use the Prop chip to make the tester to test and develop parallel algorithms for Prop "super computing machines." What caught my attention was the cost divide ratio. When the chip costs $8 and there's eight cogs, it makes each cog/risc processor cost $1. By the time you find another brand processor for less than a dollar, do the heavy learning curve to get up to par, obtain support and hardware developer platforms, and build the machine, a lot of time/effort/energy is consumed. As you already know, we can do far more in this analogy by using Prop chips to build our machine to develop Prop machines. But wait one moment! If we use the prop to develop the machine, then we already have the machine and can develop our software on the actual machine. That's going to be very efficient and will save the time and cost of building a separate testing machine. I think that's a somewhat ironic conclusion.

    Humanoido
  • Humanoido Posts: 5,770
    edited 2010-08-12 01:29
    mctrivia wrote: »
    ok enough time wasted playing that silly game. got on the high scores now to go do more work.
    Mctrivia, consider yourself in a league of your own - after the original posting with the link some time ago, numerous people contacted me requesting details on how to play the game and how to get it to work. It's also a reminder that running simulation software for a cluster built up from many little software-only computers on the PC can be a very interesting project for those who are software minded.

    Humanoido
  • Humanoido Posts: 5,770
    edited 2010-08-12 01:38
    Heater. wrote: »
    OK, Intel and Nvidia are seriously into the cores thing. Must remember that for my next super computer. Just how far off topic for this forum can we get?
    Heater, it's on topic. :) Consider that information can be direct and indirect. Indirect information can lead to new ideas and solutions, so in these posts everything is a consideration.

    BTW, I'll check those links now for the 40-processor chip, and the man that put together enough of these chips to make a supercomputer with about 360 cores. The reason this is so interesting is because his supercomputer is only slightly larger than the Propeller-based UltraSpark 40 which has 320 risc processors. He has some very interesting applications that fit parallel processing listed at the site too. There is also a post at another forum that detailed his intentions which summarized the man and the supercomputer. I'll see about putting the information together.

    Humanoido
  • Heater. Posts: 21,230
    edited 2010-08-12 01:40
    Colour me confused.

    Perhaps I have not been paying attention and missed the entire point of this thread. You have been looking for processors in the less than $1 range so as to build a cluster of them on which to run experimental parallel algorithms which will ultimately be run on a cluster of Propellers.

    How am I doing?

    But, your avatar shows a towering inferno of what I have understood to be Propeller boards.

    So why are you not just using that? Is that towering inferno for real or not? Does it work?
  • Humanoido Posts: 5,770
    edited 2010-08-12 04:35
    Heater. wrote: »
    Colour me confused. Perhaps I have not been paying attention and missed the entire point of this thread. You have been looking for processors in the less than $1 range so as to build a cluster of them on which to run experimental parallel algorithms which will ultimately be run on a cluster of Propellers. How am I doing? But, your avatar shows a towering inferno of what I have understood to be Propeller boards. So why are you not just using that? Is that towering inferno for real or not? Does it work?
    1) The project is to build a low cost cluster to develop parallel algorithms.

    2) The avatar shows the UltraSpark 40 which is a real working Propeller supercomputer.

    40 Props in a Skyscraper
    http://forums.parallax.com/showthread.php?t=123828


    3) There are two finite states of this supercomputer. Either it is in use, or it's apart being reconfigured. So the access is not 24/7. It is, most of the time, being reconfigured.

    4) I want to double my efforts in terms of time and work accomplished. This is the reason for a second machine.

    5) The reason for looking at very low cost chips for the new machine is because its budget is small and at hobby level.

    6) There's a new project completed - a cluster of 2 stamps with a bus. It was used as a tester, but now a larger one is required. An extra machine would help by being totally dedicated to the development of algorithms only, i.e. an algorithm-purposed cluster.

    Tiny Tester for Developing Parallel Algorithms
    http://forums.parallax.com/showthread.php?t=124433


    7) One thought is to simply obtain more prop chips and build a smaller machine, as we have now shown that props are cost effective when you take other factors into account.


    Humanoido
  • Leon Posts: 7,620
    edited 2010-08-12 04:51
    How about using Chuck Moore's new chips:

    http://greenarraychips.com/home/products/index.html

    I've got a couple of his earlier S40 chips, with 40 processors. Chuck fell out with the company making them, and it doesn't look like they are going into large-scale production.
  • Heater. Posts: 21,230
    edited 2010-08-12 05:03
    Humanoido,

    Re: your points,

    1) OK.
    2) Excellent. Glad to hear it.
    3) You are doing well, most of my projects only have one state, the broken and under repair state:) Or at least that's how it feels sometimes.
    4) OK.
    5) Totally with you there.
    6) Check.
    7) Not quite.

    The problem with 7) is that whilst Props may be a cheap way to get a lot of independent processes/threads my post here http://forums.parallax.com/showpost.php?p=930082&postcount=33 seems to indicate that there is something even cheaper and easier to use for this particular application.

    On the other hand, as you already have one towering inferno, perhaps more of the same is the way to go rather than having to get familiar with a whole new system. Plus, when the original is in its operative state, you have the chance to run both together as one big machine.

    Now, how are all those Props communicating? How are you routing messages around the system?
  • Humanoido Posts: 5,770
    edited 2010-08-12 05:10
    Heater. wrote: »
    Humanoido "chips are being released with 40 per chip and considerably above" Can you link us to some examples of that? We have been looking at XMOS in these last few posts. They have been working with a more recent, smaller scale technology. They have not worried so hard about power consumption. They have gone all out for cores/threads and speed. The result so far is only 4 cores. Who is doing 40 per chip or considerably above?
    Heater, go over to the GreenArrays family of chips at
    http://greenarraychips.com/home/products/index.html
    and scroll down to the GA40 - a "40-computers" chip fabricated at 130 nm.

    Next you will see links to the Haypress Creek supercomputer, "created by Chuck Moore - the guy that invented Moore's Law" (see sources below).

    Haypress Creek supercomputer (3.0 x 2.5 inch circuit board) 9 S40 multi-computer chips, each with 40 c18 computers. Total of 360 computers running at 700 Mips...

    And this is what is being blogged at the forum:

    [FONT=&quot]http://mackys.livejournal.com/803843.html [/FONT]"The TL;DR version is that Chuck Moore, the guy who made Moore's Law, decided that it would be cool to make a low-power, many-core CPU. So he designed this s40c18 chip, which has 40 cores (ten times more than what Intel is shipping) and got this company called Intellasys to make the actual silicon. But the guy above wasn't happy with "only" 40 cores, so he made a circuit board with 9 of the chips on it, for a total of 360 cores. Which is shockingly big iron. Usually computers with that many CPUs are still big enough to fill a whole room, even with today's technology. This one fits on a circuit board about the size of a large paperback book. Also, power consumption is shockingly low. Intel's top end CPUs are taking ~130W of power per CPU, or about 40W per core in a 4 core chip. An s40c18 running hard will only take about 3W for the whole chip. It's like 0.07W per core. (Quote from the above URL: "Hey DARPA, here's your 1 W supercomputer!") It's not quite an apples/apples comparison to Intel CPUs though. First, the s40c18 only runs Forth, a programming language that almost nobody knows. (It's a very easy language to learn, though.) And it runs at ~0.9 GHz, as compared to 3 GHz for Intel's top of the line. No hardware memory protection for multi-process security. No heavy duty caches to speed up memory access. It's the Lotus 7 of cpus. Small, light, stripped down to the bare essentials, and fast as hell. But no luxuries and not very comfortable, so you might not want to commute every day in it..."
    http://mackys.livejournal.com/803843.html
    "Also, preliminary pricing on the chips themselves is about $48 per (unless you buy 90 or more.)"


    I like to compare the Propeller UltraSpark 40 Supercomputer with the Haypress Creek Supercomputer. The former uses 40 prop chips for 320 computers while the latter uses 9 chips for 360 computers. On a processor for processor basis, the Propeller is lower cost.
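    The per-processor comparison works out as follows. This is a sketch using only prices mentioned in this thread (about $8 per Propeller with 8 cogs, and the preliminary figure of about $48 per 40-core chip); the function name is invented, and the comparison says nothing about the speed of each processor.

```python
def cost_per_processor(chip_price_usd, processors_per_chip):
    """Chip price spread evenly over its processors."""
    return chip_price_usd / processors_per_chip


prop = cost_per_processor(8, 8)    # $8 Propeller, 8 cogs
s40 = cost_per_processor(48, 40)   # ~$48 s40c18, 40 cores

print(f"Propeller: ${prop:.2f} per processor, "
      f"${8 * 40} of chips for 320 processors")
print(f"s40c18:    ${s40:.2f} per processor, "
      f"${48 * 9} of chips for 360 processors")
```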

    For more information, check the Propeller Supercomputer at
    http://forums.parallax.com/showthread.php?t=123828

    and the Haypress Creek Weblog at
    https://covermy.info/.i/_3D_3DQb0hmLzNXZyBXehh2Lt92YugGdy9mZy9GbvNmL3d3dv8iO75

    Humanoido
  • Heater. Posts: 21,230
    edited 2010-08-12 05:22
    Er no.

    Moore's Law comes from a paper written by Gordon E. Moore in 1965.
    That is Gordon Moore as in Intel.

    The GreenArrays chip is interesting. Is it actually possible to buy one? Not sure I have the patience for Forth though.
  • Leon Posts: 7,620
    edited 2010-08-12 05:28
    They aren't available yet, Chuck is still testing prototypes.

    Yes, Forth is something of an acquired taste. I know someone who uses it for everything.

    The problem with connecting lots of Propeller chips together is that there is no suitable language for programming them; something like the XC language used for programming XMOS chips is required. That needs to be developed first.