multi core processor design suggestion
Chris Micro
Posts: 160
Hello everyone,
now that I'm playing around a little bit with the Propeller, some ideas have come to mind on how to improve the Propeller design a little.
1. The Propeller has 8 cores which can be used to implement peripheral functions like an RS-232 or keyboard interface. For this type of interface an 8-bit core would be sufficient. So why not build a Propeller with 4 x 32-bit cores and 16 x 8-bit cores?
2. The Propeller's global memory access is kind of slow. So why not divide the memory system into two separate memory systems? The memory access speed would double.
I know that these two suggestions come at some cost to homogeneity, but with a gain in speed.
chris
Comments
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Nyamekye,
When you end up playing with the Prop "a lot more" then you will appreciate it for the way it is. Sure, sometimes you might just need 8-bit power but then you may as well have specialized hardware such as UARTs etc. Sure you could split the memory system but now you need extra hardware and software for the two halves to communicate. The power of the Prop is in its elegant simplicity and that each cog is identical and all I/Os are identical etc., because then it's just a matter of software. Anyway, designing and fabricating silicon is a little (a lot) different from PCBs so it's not just a matter of saying "let's try this". There is this thing called "money" and also "time", these two are bad news for engineers (the things I'd love to try!).
Chip's a smart cookie and he has thought it through very well, think of him as your "Zen master" (don't get too big a head Chip) and meditate on the way it has been done and you should come to the same conclusion yourself. Sure there's room for improvement, that's why he is working on Prop II.
*Peter*
4x32 and 16x8 means you'd have 20 COGs, which decreases HUB RAM access speed because they all have to share access time! It would then be 2.5 times slower.
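The 2.5x figure can be checked with a little arithmetic. This is only a sketch: it assumes the hub serves cogs in strict round-robin order and that each cog's slot lasts 2 system clocks (which gives the familiar 16-clock rotation with 8 cogs). The function name and the `clocks_per_slot` parameter are made up for illustration.

```python
def hub_rotation_clocks(num_cogs, clocks_per_slot=2):
    """Clocks the hub needs to visit every cog once (strict round-robin).

    clocks_per_slot=2 is an assumption chosen so that 8 cogs give a
    16-clock rotation.
    """
    return num_cogs * clocks_per_slot


current = hub_rotation_clocks(8)    # today's 8 cogs
proposed = hub_rotation_clocks(20)  # 4 x 32-bit + 16 x 8-bit cogs
slowdown = proposed / current

print(current, proposed, slowdown)
```

Under these assumptions each cog waits 40 clocks instead of 16 between hub windows, which is where the factor of 2.5 comes from.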
Do you know what a native 8-bit COG means? Completely different handling of the COG internal RAM. Misalignment of the opcodes is bad because it makes decoding of opcodes, source and destination registers more complex.
Global memory is slower, but not too bad. Even in the worst case scenario you can have bulk transfer of ~12MB/sec - did I mention it's "PER COG"? COG to COG communication can be much faster. From my point of view Parallax did a very good job, and keeping the design simple but useful for lots of different needs is one of the benefits.
The other thought I had is that it would be nice to have a hardware multiplier. I'm guessing that multiplies are done in software using a shift-and-add algorithm, which would take about 100 cycles for a 32-bit multiply. Even if only one cog had a multiplier it would greatly improve the speed of the Propeller for doing DSP algorithms.
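For anyone who has not seen it, here is a minimal sketch of the shift-and-add multiply being guessed at above. It is written in Python for readability and is not the Propeller's actual routine; the function name and the step counter are just for illustration.

```python
MASK32 = 0xFFFFFFFF


def shift_add_multiply(a, b):
    """Multiply two unsigned 32-bit values using only shifts and adds.

    Returns (product, steps): the full 64-bit product and the number
    of loop iterations that were needed.
    """
    a &= MASK32
    b &= MASK32
    product = 0
    steps = 0
    for _ in range(32):      # one iteration per bit of the multiplier
        if b & 1:            # lowest bit set: add the shifted multiplicand
            product += a
        a <<= 1              # multiplicand moves up one bit position
        b >>= 1              # next multiplier bit moves into bit 0
        steps += 1
    return product, steps


product, steps = shift_add_multiply(123456789, 987654321)
print(product == 123456789 * 987654321, steps)
```

The loop always runs 32 times, so with a few instructions per iteration the total lands in the neighbourhood of the estimate above, depending on how the loop is coded.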
That being said, I am very impressed by the propeller and the development software.
Dave
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Nyamekye,
Probably it could be useful to implement it in only some but not all COGs to reduce cost.
You have to see that this processor has very little program memory so the code has to be broken in pieces that can work independently. You can have several serial interfaces in 1 COG, keyboard and mouse drivers in 1 COG. The processes that really need high speed are not that many. The filters you talk about can be implemented especially if your factors are compatible with mul/div by powers of 2.
There are easy and very fast ways to multiply by a constant using shifts and adds. If the constant is configurable, the code can be generated dynamically.
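A rough sketch of the idea, in Python rather than Propeller assembly: the list of shift amounts plays the role of the dynamically generated code, and it only has to be rebuilt when the constant changes. Both function names are made up for illustration.

```python
def shift_add_terms(constant):
    """Bit positions that are set in a non-negative constant.

    x * constant equals the sum of (x << s) over the returned positions.
    """
    if constant < 0:
        raise ValueError("this sketch only handles non-negative constants")
    return [s for s in range(constant.bit_length()) if (constant >> s) & 1]


def multiply_by_constant(x, terms):
    """Apply a precomputed shift/add recipe to x."""
    return sum(x << s for s in terms)


terms = shift_add_terms(10)               # 10 = 0b1010, so bits 1 and 3
print(terms, multiply_by_constant(7, terms))
```

For the constant 10 this comes down to two shifts and one add, `(x << 1) + (x << 3)`, instead of a full 32-step multiply.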
Of course it is possible to speed up the multiply operation with some limitations on the constants. But for flexibility it would be nice to have a fully working multiply operation. For instance, let's assume you want to implement an IIR band-pass filter and you want to alter the center frequency and the bandwidth dynamically. How do you do it with fixed constants? For advanced DSP operations multipliers are necessary.
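To make the point concrete, here is a sketch of a tunable band-pass filter in floating-point Python. It uses the widely used "audio EQ cookbook" biquad band-pass formulas (constant 0 dB peak gain), which is my choice of design and not something from this thread; a Propeller version would use fixed-point math. The function and parameter names are illustrative.

```python
import math


def bandpass_coefficients(center_hz, bandwidth_hz, sample_rate_hz):
    """Normalized biquad band-pass coefficients (cookbook-style design)."""
    w0 = 2.0 * math.pi * center_hz / sample_rate_hz
    q = center_hz / bandwidth_hz
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b = (alpha / a0, 0.0, -alpha / a0)                     # feed-forward taps
    a = (-2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0)     # feedback taps
    return b, a


def biquad(samples, b, a):
    """Run samples through a direct-form I biquad section."""
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        # Each coefficient below is a general multiply, not a fixed shift.
        y = b[0] * x + b[1] * x1 + b[2] * x2 - a[0] * y1 - a[1] * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out


b, a = bandpass_coefficients(1000.0, 200.0, 8000.0)
step_response = biquad([1.0] * 2000, b, a)
```

Every time the center frequency or bandwidth changes, the coefficients take on new, arbitrary values, so each output sample needs several true multiplies that cannot be replaced by a fixed pattern of shifts.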
For a while I have been thinking about ways to create a microprocessor with the smallest number of gates possible. As far as I know the first ARMs needed around 30,000 gates, which is not much for a 32-bit processor. It seems to me as if the instruction code of the Propeller has an ARM-like structure to reduce the cost of the decoding logic.
For me the question is if it could be possible to reduce the number of gates for an 8 bit processor. If we can reduce the number of gates per processor it is possible to increase the number of processors per chip and increase the MIPS per chip.
In the example above (4 x 32-bit and 16 x 8-bit) this would lead to 4x20 + 16x20 = 400 MIPS instead of the 160 MIPS (8x20) the Propeller has.
Post Edited (Chris Micro) : 4/28/2009 1:59:05 AM GMT
but prop 2 will have the hardware multiply you want.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Need to make your prop design easier or secure? Get a PropMod has crystal, eeprom, and programing header in a 40 pin dip 0.7" pitch module with uSD reader, and RTC options.
As the number of parallel nodes rises, so does the solution complexity, unless the problem is one that is easily factored in a parallel fashion. For problems that don't factor this way, which from what I can see is most problems, intercommunication burdens will diminish the returns. In other words, you may find those 400 MIPS delivering less than the current Propeller does, after all the communication is done!
No multiply is a bummer. That's the teaser feature for Prop II!
One other thing that was said early on was breaking symmetry results in a kludge. That kludge would expand the scope of solvable problems open to Prop I. However, scaling that design will scale the kludge, thus limiting the scope of potential solvable problems for Prop II. A quick look at the Intel mess tells that story completely. Ugh...
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Propeller Wiki: Share the coolness!
Chat in real time with other Propellerheads on IRC #propeller @ freenode.net
Safety Tip: Life is as good as YOU think it is!
As for the multiply, it would be nice to have it in hardware in every cog, but even adding it as a hub function would help since multiplies are not that frequent. Have 2 hub registers to write the numbers to be multiplied and then read them back when the operation is done to get the 64 bit result. The first computer I programmed (Collins 8400) worked in a similar manner. The data to be processed was written to one or two registers and the result was read out from another.
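Something like the following toy model is what I have in mind. It is only a sketch: the class, its register names, and the read/write methods are invented for illustration and do not describe any real Propeller hardware.

```python
MASK32 = 0xFFFFFFFF


class HubMultiplier:
    """Toy model of a shared hub multiplier with two operand registers.

    A cog writes both operands, then reads the 64-bit product back as
    two 32-bit halves. All names here are hypothetical.
    """

    def __init__(self):
        self.operand_a = 0
        self.operand_b = 0

    def write_a(self, value):
        self.operand_a = value & MASK32

    def write_b(self, value):
        self.operand_b = value & MASK32

    def read_low(self):
        """Lower 32 bits of the unsigned 64-bit product."""
        return (self.operand_a * self.operand_b) & MASK32

    def read_high(self):
        """Upper 32 bits of the unsigned 64-bit product."""
        return (self.operand_a * self.operand_b) >> 32


mul = HubMultiplier()
mul.write_a(0xFFFFFFFF)
mul.write_b(0xFFFFFFFF)
print(hex(mul.read_high()), hex(mul.read_low()))
```

From the cog's side a multiply would then cost two hub writes and two hub reads, so the shared unit only pays off if that round trip is shorter than the software loop.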
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
For me, the past is not over yet.
I have also programmed the Stamp and the SX. These both use Parallax's Basic language, so it seems like this would have been another obvious choice for the Prop. This would allow Stamp and SX programmers to immediately come up to speed on the Prop. It would also make it easier to port code written for the Stamp or SX to the Prop.
My main issue is that I hate to learn yet another vendor-specific language. However, I do like the Prop, and I intend to use it in one of my next projects, so I'm willing to learn the Spin language so I can use the Prop.
Dave
With a small amount of silicon real estate, the Prop could be used in many more signal and image processing applications. There are lots of algorithms that use IIR and FIR filters, DCTs, FFTs and correlation. The Prop has more than enough MIPS to do some interesting DSP functions, but it is limited by the number of multiplies per second.
Dave
Of course, for the most part it's not the language that really matters, it's the principles. If you learn how to program well in one language, it's fairly easy to learn the syntax of another and use that. Personally, I'm glad that they stayed away from C. Why? So that if I have to learn some other microcontroller, I won't mix up the extensions of one with another.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
--Steve
Propalyzer: Propeller PC Logic Analyzer
http://forums.parallax.com/showthread.php?p=788230
I can't second this enough. When all one knows how to use is a hammer.. everything looks like a nail.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
"VOOM"?!? Mate, this bird wouldn't "voom" if you put four million volts through it! 'E's bleedin' demised!
When all you know is a hammer, everything is a thumb.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
-- Carl, nn5i@arrl.net
These classes of tasks have, in general, very different requirements. Designers historically have designed quite different hardware for them. From the days of the 709 (a vacuum-tube mainframe) right through to the newest mainframes and latest PCs, I/O has been handled by specialized processors (channels, in mainframe-talk) designed especially for I/O, and manipulation of data to create new data has been handled by more versatile processors designed for data manipulation. Not doing it that way is rather like maintaining a fleet of heavy tanks, when many trips need only a Suburban or a motorcycle.
For example, in the PC I'm using at the moment there are four general-purpose processors for doing calculations (it's a dual-Xeon server, with each Xeon having two cores). There are also two specialized display processors (VGAs) driving three displays. There is a sound processor. There are two IDE disk processors, two IDE RAID processors, and a SCSI controller. I've probably omitted some others.
That's twelve processors, of which only four are general-purpose. They could have used twelve general-purpose processors instead, but it would have been poor economy and therefore poor design.
So why such defensiveness against the idea that the Propeller's designers, as it evolves into more and more powerful versions, should reexamine the decision to make all processors identical? I don't suggest that the decision should be changed, but it's reasonable to take another look from time to time, rather than reflexively shooting down the idea of taking another look, as some have done in this thread.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
-- Carl, nn5i@arrl.net
Post Edited (Carl Hayes) : 4/28/2009 6:48:47 PM GMT
I'm not sure you'd save a lot of die space going with an 8 bit ALU. Yes, the adder & shifter is 1/4 the size because it handles only 1/4 the bits, but the control logic doesn't scale in the same way.
And how much RAM per 8-bit COG? Ever try to write an 8-bit program in 512 bytes (code+data)?
Not to mention the SPIN interpreter couldn't use the 8-bit COGs, so only 4 SPIN threads.
@Carl Hayes
A dedicated chip will always be more efficient than a general purpose chip, but will only be cost effective if a zillion of them can be made. (Which is why my TiVo costs a tenth of an HTPC and uses a tenth of the power.) But if a zillion aren't required then you either need high-cost custom hardware or low cost general purpose hardware.
And Chip is revisiting his assumptions for PropII. Things like the 64K HUB RAM limitation, and the number of cogs. But changes to those assumptions trickle down to implementation details like how COGINIT works and the PASM interpreter.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Composite NTSC sprite driver: http://forums.parallax.com/showthread.php?p=800114
Of course Chip -- is that a nickname and his real name Silicon? or perhaps Wafer? -- of course Chip, being a man of versatile and inquiring mind, will reexamine every assumption and every decision every time as a normal part of his intellectual functioning. It's not Chip who surprised me -- it's the knee-jerk defensive reactions, which certainly didn't, and couldn't, come from Chip.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
-- Carl, nn5i@arrl.net
Post Edited (Carl Hayes) : 4/28/2009 8:38:37 PM GMT
Now we have GPUs that are just amazing many-core processors, and we can use them for graphics or for parallel computing (CUDA, OpenCL, etc.). These special purpose cards are getting more general purpose, with the next step in the evolution looking to be similar to the Larrabee project (en.wikipedia.org/wiki/Larrabee_(GPU)), where it's basically a set of 24/32/48 P54C cores (depending on die yield).
So, coming from the graphics programming arena, the Propeller looks like it is ahead of the curve [8^)
Jonathan
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
lonesock
Piranha are people too.
It may have been false, too. There were too many 370/158, a very popular model, for all the channels to be any particular earlier model of 360 or 370. But the fact that it was believable shows that the difference between special-purpose and general-purpose processors need not be a wide gulf. Still there can be savings, for you can leave stuff out of a specialized processor.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
-- Carl, nn5i@arrl.net
Post Edited (Carl Hayes) : 4/29/2009 12:10:59 AM GMT
-phar
Post Edited (mctrivia) : 4/29/2009 3:30:42 AM GMT