I don't think we can compare a Prop and GreenArrays so easily.
GreenArrays supercomputer:
360 computers running at 700 MIPS each, or about 252 GIPS in total.
Propeller UltraSpark 40:
320 computers running at 20 MIPS each, or 6.4 GIPS in total.
Makes the Prop look pretty sad, does it not?
But wait, those GreenArrays processors are only 18-bit, and each one has only 64 words of RAM. If anyone thought COGs were memory constrained, this is another world. So, like the Prop, it needs an interpreter to be able to code anything remotely complicated. In this case they use Forth. However, GreenArrays requires a handful of processors to run the interpreter.
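The headline figures above are just core count times per-core rate. A quick check in Python, using only the numbers quoted in this post (the function name is made up for illustration, and the result is a peak figure that assumes every core runs flat out):

```python
def aggregate_gips(cores: int, mips_per_core: float) -> float:
    """Peak aggregate rate in GIPS, assuming every core runs at full speed."""
    return cores * mips_per_core / 1000.0

greenarrays = aggregate_gips(360, 700)  # figures quoted in the post
ultraspark = aggregate_gips(320, 20)

print(f"GreenArrays:   {greenarrays:.1f} GIPS")
print(f"UltraSpark 40: {ultraspark:.1f} GIPS")
print(f"Ratio:         {greenarrays / ultraspark:.1f}x")
```

This says nothing about the cores tied up running an interpreter, which lowers the usable figure on both machines.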
I don't think we can compare a Prop and GreenArrays so easily. ...those GreenArrays processors are only 18-bit, and each one has only 64 words of RAM. If anyone thought COGs were memory constrained, this is another world. So, like the Prop, it needs an interpreter to be able to code anything remotely complicated. In this case they use Forth. However, GreenArrays requires a handful of processors to run the interpreter.
Heater, good observations. How we compare these supercomputers matters, since each has advantages and disadvantages. The GreenArrays machine needs extra processors to run the interpreter, which reduces real-world computational speed because those processors are occupied. Here's some other related information. Propeller UltraSpark 40 supercomputer: Controller Ports: 1,280 and Counters: 640. GreenArrays supercomputer: Controller Ports: none and/or undefined, Counters: none and/or undefined. That makes the Prop more powerful on the I/O side. One idea here is robotic humanoid applications, where the Prop supercomputer is going to give other machines some challenge when it comes to managing over a thousand sensor inputs. Both these machines will have their well-suited apps.
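For anyone wondering where the 1,280 and 640 come from, they fall out of the per-chip Propeller figures (8 cogs, 32 I/O pins, 2 counters per cog). A small sketch, assuming the UltraSpark 40 is built from 40 Propeller chips, which is consistent with the 320 computers quoted earlier:

```python
# Per-chip Propeller (P8X32A) figures.
COGS_PER_CHIP = 8
IO_PINS_PER_CHIP = 32
COUNTERS_PER_COG = 2

# Assumption: 40 chips, inferred from the name and from 320 cogs / 8 cogs per chip.
CHIPS = 40

cogs = CHIPS * COGS_PER_CHIP
ports = CHIPS * IO_PINS_PER_CHIP
counters = cogs * COUNTERS_PER_COG

print(f"cogs={cogs} ports={ports} counters={counters}")
```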
The problem with connecting lots of Propeller chips together is that there is no suitable language for programming them; something like the XC language used for programming XMOS chips is required. That needs to be developed first.
Leon, good point. As far as I know, at least one person is working on a new Propeller "parallel" language for programming the UltraSpark 40 Prop supercomputer. There are candidate "parallel" languages but, like some other Prop languages, they need to be ported over to the Prop, and that takes some development time and considerable skill. Granted, if a Prop language existed for parallel architecture, then perhaps the bulk of parallel algorithms would already exist.
Forth provides an interactive programming environment. Its primary uses have been in scientific and industrial applications such as instrumentation, robotics, process control, graphics and image processing, artificial intelligence and business applications. The principal advantages of Forth include rapid, interactive software development and efficient use of computer hardware. Forth is often spoken of as a language because that is its most visible aspect. But in fact, Forth is both more and less than a conventional programming language: more in that all the capabilities normally associated with a large portfolio of separate programs (compilers, editors, etc.) are included within its range and less in that it lacks (deliberately) the complex syntax characteristic of most high-level languages.
It was mentioned here before, one spin on the idea of an alternate prop chip cluster to use for development is a VR chip. Virtual reality chips can be cloned for the price of cloning time and processor space. However, I don't know of any systems that run on the PC which can have very many operating clones in the cluster. Still, we are after a simple processor.
Here's an idea - why not take one PC and multitask a processor program, and keep opening up processors in multitasks? It will be interesting to know how many the PC will handle.
But I am not aware of how to communicate between multitasks to simulate the clustered connections by passing data, i.e. task one talks to task 3. It could be an inexpensive way (totally free) to get several virtual processors on the PC using Windows XP for example.
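One common way to pass data between tasks on a single PC is to give each simulated processor its own message queue. A minimal sketch in Python, assuming threads stand in for the "multitasks" and with all names (`virtual_processor`, `send`, `inboxes`) invented for the example:

```python
import queue
import threading

def virtual_processor(pid, inboxes, results):
    """One simulated processor: wait for a message, then record it."""
    sender, payload = inboxes[pid].get()  # blocks until a message arrives
    results[pid] = f"task {pid} got {payload!r} from task {sender}"

def send(inboxes, sender, dest, payload):
    """Simulated link: drop a message into the destination's inbox."""
    inboxes[dest].put((sender, payload))

NUM_TASKS = 4
inboxes = {pid: queue.Queue() for pid in range(1, NUM_TASKS + 1)}
results = {}

# In this sketch only task 3 is listening; task 1 is played by the main thread.
listener = threading.Thread(target=virtual_processor, args=(3, inboxes, results))
listener.start()
send(inboxes, sender=1, dest=3, payload="hello")
listener.join()

print(results[3])
```

Python threads take turns rather than running truly in parallel, so this models the connections, not the speed. Separate OS processes with `multiprocessing.Queue` would be the closer match to opening many copies of a program.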
I haven't seen anyone mention the GA144 which has 144 F18A CPUs on the one small chip. It really looks like they are doing well and are testing silicon now with production available next year.
There is even a very small footprint GA4 with "only" 4 CPUs. BTW, even though the word length is 18 bits the instruction length is only 5 bits (or less) so that they can fit up to 4 instructions in the one word. This is an army of lean mean stack machines.
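To see how four instructions can share one 18-bit word, here is a simplified packing sketch. It assumes three 5-bit slots followed by one 3-bit slot (5 + 5 + 5 + 3 = 18), which is how I understand the "5 bits (or less)" remark; the opcode values are arbitrary and the real F18A encoding has details this does not model:

```python
SLOT_WIDTHS = (5, 5, 5, 3)  # assumed slot split; sums to 18 bits

def pack(opcodes):
    """Pack up to four opcodes into one 18-bit word, first slot in the high bits."""
    if len(opcodes) > len(SLOT_WIDTHS):
        raise ValueError("too many opcodes for one word")
    padded = list(opcodes) + [0] * (len(SLOT_WIDTHS) - len(opcodes))
    word = 0
    for op, width in zip(padded, SLOT_WIDTHS):
        if not 0 <= op < (1 << width):
            raise ValueError(f"opcode {op} does not fit in {width} bits")
        word = (word << width) | op
    return word

def unpack(word):
    """Split an 18-bit word back into its four slot values."""
    ops = []
    for width in reversed(SLOT_WIDTHS):
        ops.append(word & ((1 << width) - 1))
        word >>= width
    return ops[::-1]

word = pack([0b10101, 0b00011, 0b11111, 0b101])
print(f"{word:018b}", unpack(word))
```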
Each of its F18A computers can perform a basic ALU function in about 1.5 nanoseconds while only consuming about 7 picojoules.
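Those two numbers translate into a per-core rate and power draw with a little arithmetic. A rough sketch, assuming a core executes back-to-back basic operations at 1.5 ns each (a best case, since real code will not sustain that):

```python
ALU_TIME_S = 1.5e-9       # "about 1.5 nanoseconds" per basic operation
ENERGY_PER_OP_J = 7e-12   # "about 7 picojoules" per operation
CORES = 144               # F18A computers on one GA144

ops_per_second = 1 / ALU_TIME_S
power_per_core_w = ENERGY_PER_OP_J * ops_per_second
chip_gops = CORES * ops_per_second / 1e9

print(f"per core: {ops_per_second / 1e6:.0f} Mops/s, {power_per_core_w * 1e3:.2f} mW")
print(f"whole chip: {chip_gops:.0f} Gops/s peak")
```

That works out to roughly 667 million operations per second and under 5 mW per core when running flat out.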
22 of the F18A computers have one or more I/O pins, plus additional circuitry such as:
one or two GPIO pins
analog I/O
specialized digital I/O like high-speed SERDES, an SPI bus, GPIO to analog nodes for sync of sampling, two 18-bit parallel buses and four GPIO pins used, e.g., to control external memory chips but available for other uses
Interesting, but it looks complex to get anything decent done. Coupled with that, you are required to specify the contents of the ROM for each core, so there will be minimum order quantities and it will be expensive.
Yep, as I have said before, the GA144 looks like a good coprocessor for the Prop2, or the other way around.
There is obviously a lot of compute power on it, but in my estimation it is throttled by the lack of memory and that 18-bit word width.
The mask ROM can optionally be customized but otherwise several different boot methods are supported including serial Flash. I suppose that if someone wanted a custom mask ROM for the Propeller too that they would only have to cough up enough dollars.
The minimum 10-pack is for advance orders, but my general feeling about this venture is that it is tight and driven, and I really do believe that they will have chips available mid-2011 as they say. As for the 18-bit word length, you could look at it as being two bits more than 16 bits! Given the simplicity of the individual CPUs, which is very much reminiscent of the Propeller's COGs yet taken to an extreme in terms of simplicity and speed, I find that this is more like a customizable "processor". Those of us who are on the Propeller bandwagon clearly understand how you can "use" 8 CPUs, as some may just be interfacing to a device as a virtual peripheral etc. Similarly, the F18 CPUs on the GreenArrays chips take this to the next step by factoring this into smaller and simpler CPUs which, because they run at such high speeds, would be able to emulate perhaps even the Propeller (not that I want to). But sometimes you don't want to devote a whole COG to a job; you just need to factor out part of it, but on the Prop there is no easy way.
Take something like the FullDuplexSerial object, for instance: it resorts to JMPRET to achieve a kind of factoring of the COG, but at a cost of both speed and timing precision. There is no such problem when you can devote one F18 to receive timing, another to receive buffering and handshaking perhaps, and so on, and the kind of speed that it would be able to achieve would make our poor Prop hide in the corner. Even in terms of emulating a virtual CPU and running an HLL, I can see various F18 cores handling instruction fetches and pipelining, predetermining jumps etc., while some F18s are dedicated to VM instruction decode and so on. Doing this all in parallel means precious few cycles are lost, so we are talking about VM execution speeds of hundreds of MIPS. The VM doesn't have to be 18 bits wide either.
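The JMPRET trick is a form of cooperative multitasking: the receive and transmit routines share one COG by handing control to each other at chosen points. A rough Python analogy using generators (this is not a translation of FullDuplexSerial; the task bodies and names are invented for illustration):

```python
def receiver(log):
    """Stand-in for the receive routine: one step of work, then give up the core."""
    for step in range(3):
        log.append(f"rx step {step}")
        yield  # plays the role of JMPRET: hand control to the other task

def transmitter(log):
    """Stand-in for the transmit routine, structured the same way."""
    for step in range(3):
        log.append(f"tx step {step}")
        yield

def run_on_one_core(tasks):
    """Round-robin between the tasks until every one has finished."""
    tasks = list(tasks)
    while tasks:
        for task in list(tasks):
            try:
                next(task)
            except StopIteration:
                tasks.remove(task)

log = []
run_on_one_core([receiver(log), transmitter(log)])
print(log)
```

Each task only runs when the other one yields, which is where the speed and timing-precision cost comes from. With a separate core per task, as suggested for the F18s, neither has to wait for the other.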
BTW, the F18 is clockless and each F18 typically executes an instruction in 1.5ns, maybe faster, maybe slower, but still a lot faster than having a clock slow them all down to the slowest one's speed plus a good margin.
But you have gotta love the Prop: it's the best of both worlds in a simple package with a worldwide network of support, and it even has its own cheer squad: Gimme a P, gimme an R, gimme a .....
Comments
Humanoido
Humanoido
FORTH is a good candidate for a parallel programming language. FORTH already runs on the Prop. There's a parallel programming version too.
http://home.nvg.org/~sk/lang/lang.html
Humanoido
Just like tripe. Some people love it.
Humanoido
http://forums.parallax.com/showthread.php?124492-Alternate-Prop-Chips/page3
but at that time the GA144 was not available.
Is the GA144 chip even available now? Apparently not.
It seems to be in "advanced ordering" mode.
But the quoted price (works out to about $.14 per computer) is a tease.
http://www.greenarrays.com/home/products/index.html
The minimum order ten pack would give you 1,440 computers
if the chip was available.
The quoted figure is 96 billion ops per second.
http://www.greenarrays.com/home/documents/greg/PB001-100503-GA144-1-10.pdf
Doug