Propeller chip what if ??
SteveW719
Posts: 40
I am pretty new to the Propeller and I love its structure and elegance. The Prop 2 looks to be an impressive step forward. Maybe some of the Parallax chip fab experts, or others who know about that sort of work, could satisfy my curiosity.
Let's pretend a multinational company calls up and offers to take 3 million of the Prop 2 per year if the following changes could be made.
1. Clock speed minimum 500 MHz, preferably 1 GHz
2. Hub RAM 256K longs
3. Cog RAM 2K longs
4. Smart hub that could bypass unused cogs or cogs that are busy, thus allowing a bigger window per revolution for cogs needing access to hub memory.
Would these things be possible? How much bigger would the die have to be? What would be the cost implications?
Thanks for indulging my curiosity
Comments
Cog address range is limited to 9-bit addresses, due to the instruction word's 32-bit size and, hence, the size of the source and destination fields.
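To make the 9-bit limit concrete, here is a minimal sketch that splits a 32-bit instruction word into fields. The layout (6-bit opcode, 4 flag bits, 4 condition bits, 9-bit destination, 9-bit source) is the Propeller 1 encoding as I understand it; the function name `decode_p1` is just for illustration.

```python
# Propeller 1 instruction layout as assumed here (32 bits total):
#   [31:26] opcode (6) | [25:22] ZCRI flags (4) | [21:18] condition (4)
#   [17:9]  destination (9) | [8:0] source (9)

def decode_p1(instr: int) -> dict:
    """Split a 32-bit instruction word into its fields."""
    return {
        "opcode": (instr >> 26) & 0x3F,
        "zcri": (instr >> 22) & 0xF,
        "cond": (instr >> 18) & 0xF,
        "dest": (instr >> 9) & 0x1FF,
        "src": instr & 0x1FF,
    }

# A 9-bit field can name 2**9 = 512 cog locations, no more.
COG_LONGS = 1 << 9
```

The field widths add up to exactly 32 bits, so a larger cog address range has to take bits from somewhere else or widen the word.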
Smart hub that could bypass unused cogs or cogs that are busy thus allowing a bigger window per revolution for cogs needing access to hub memory.
This would violate determinacy, a hallmark of the Propeller architecture.
-Phil
#2 and #3 are prohibitive in terms of chip area and its effect on price and yield. You're talking about approximately quadrupling the chip area for what is already a large chip.
#4 has been rejected multiple times. It violates the determinacy that makes the Propeller particularly useful.
First, let's change the definition of "long" from 32 bits to 40 bits. Now the source and destination fields can each grow from 9 to 9+4 = 13 bits, so the native COG can address 8K 40-bit longs.
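The arithmetic behind the 40-bit idea can be sketched as follows. It assumes all extra bits are split evenly between the destination and source fields, which is my reading of the post rather than a stated design; both function names are made up for the example.

```python
def field_bits_for(word_bits: int, base_word: int = 32, base_field: int = 9) -> int:
    """Width of each D/S field if every extra bit is split evenly between them.

    Assumption: the opcode, flag and condition bits stay as they are.
    """
    extra = word_bits - base_word
    return base_field + extra // 2


def addressable_longs(field_bits: int) -> int:
    """Number of cog locations a field of this width can address."""
    return 1 << field_bits


# 32-bit word: 9-bit fields; 40-bit word: 13-bit fields
current = addressable_longs(field_bits_for(32))
widened = addressable_longs(field_bits_for(40))
```

Here `current` is 512 and `widened` is 8192, which matches the "8K 40-bit longs" figure.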
Next, the COG access to the Hub could have programmable priority. That is, each COG could get even access like now, or maybe some COGs get half as many accesses, or maybe some get none. If it was programmable, then the deterministic way it is now could just be the default.
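One way to picture programmable priority is a fixed slot table that the hub walks in order. This is only a sketch of the idea, not anything Parallax has described; the table contents and the helper `worst_case_wait` are hypothetical.

```python
# Default: eight slots, one per cog, in round-robin order (today's behaviour).
DEFAULT_TABLE = [0, 1, 2, 3, 4, 5, 6, 7]

# Hypothetical custom table: cog 0 gets two slots, cog 7 gets none.
CUSTOM_TABLE = [0, 1, 2, 3, 0, 4, 5, 6]


def worst_case_wait(slot_table, cog):
    """Longest gap, in slots, between consecutive hub accesses for a cog.

    Returns None if the cog never appears in the table.
    """
    slots = [i for i, owner in enumerate(slot_table) if owner == cog]
    if not slots:
        return None
    n = len(slot_table)
    gaps = []
    for k, start in enumerate(slots):
        nxt = slots[(k + 1) % len(slots)]
        gap = (nxt - start) % n
        gaps.append(gap if gap else n)  # a single slot repeats every n slots
    return max(gaps)
```

Because the table is fixed once programmed, every cog still has a known worst-case wait: 8 slots for any cog in the default table, 4 slots for cog 0 in the custom one.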
As for so much memory, I think that is more of a technology question. That is, if the current full-custom physical design process was changed to an ASIC flow where external chip foundries did the synthesis and physical design, then the 90nm and 62nm options would open up. Yes, the NRE cost would be high, but by passing RTL over to the chip houses the project schedule would be much shorter. I think that is how you would get the speed and memory capacity.
It is certainly possible if the money is available. Which usually means the market.
I think it is a great idea!
Extend the simple, and don't just add bells and whistles, I say.
rich
Let's see:
A certain multi-core micro-controller company in England has such devices running at 500MHz. So I guess this is doable.
The same multi-core micro-controller company in England has such devices with a total of 256K Bytes on board. Is that sufficient? Let's say that's doable.
Except that the Prop's HUB RAM is quite complex, if I understand correctly, with multiple ports. So this may not be possible.
This requires an awful, ugly hack to the architecture, like bank switching areas of COG RAM. Please God, let's not do that.
Or it requires a more elegant solution, like widening the instructions to 40 bits as richaj45 points out. This is something I would dearly love to see. When the time comes, it had better be at least 48 bits :)
Either way this is no longer a Prop is it, or is it?
Smart hub that could bypass unused cogs or cogs that are busy thus allowing a bigger window per revolution for cogs needing access to hub memory.
Would these things be possible? How much bigger would the die have to be? What would be the cost implications?
Thanks for indulging my curiosity
This isn't as pretty, but if you needed more bandwidth, you could have other cogs feed their slots over the pinless I/Os to the cog that needs more bandwidth. I believe 32 unpinned I/Os are planned to make buses like this.
Is this true?
Seems to me if I want to move data from one COG to another via pins, I have to reserve at least one bit as a flag to indicate data ready, and perhaps another to indicate data consumed. By the time I'm done with setting and clearing flags and masking the flags out of the data I'm reading from the pins, I might as well have used the time for a simple HUB access or two.
Not to mention that often the time "wasted" by a COG waiting for its HUB access slot can be used to do some other useful instructions.
Wow, what a lot of negative responses you received. The answer would depend on how deep your pockets are... which means, would Parallax take you seriously enough to do it.
All the things you mention, with issues to be resolved of course, are nevertheless doable. Just look at the Intel processors with >1B transistors and >2GHz, etc.
Now let's break it down....
500MHz or 1GHz = yes
Hub Ram 256K longs = expected now?
Cog ram 2k longs = yes (*but see below)
Bypass unused hub access = yes (*but see below)
Die size = 4x size (but smaller geometry would be required I think)
Cost = unknown (you need deep pockets)
Now, let's discuss this rationally.....
Hub access:
Everyone gets upset about losing determinism. However, why not have a flag that can enable this, with the default being normal operation? Often there is one large program running that is not timing sensitive (deterministic) but wants to grab any hub bandwidth available. So this cog could steal the unused cogs' accesses. I have suggested this quite a bit before.
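A toy model of the flag idea, to show what "stealing unused accesses" could look like. Everything here is an assumption for illustration: the function `simulate_hub`, the idea that each slot has a fixed owner, and that only one nominated cog may steal.

```python
def simulate_hub(requests, greedy_cog=None, n_cogs=8):
    """Return which cog is granted hub access in each slot (None if idle).

    requests: one set per slot, holding the cogs that want hub access then.
    Each slot belongs to cog (slot % n_cogs). If the owner does not want it
    and greedy_cog is set and waiting, greedy_cog takes the slot instead.
    """
    grants = []
    for slot, wanting in enumerate(requests):
        owner = slot % n_cogs
        if owner in wanting:
            grants.append(owner)        # owner always wins its own slot
        elif greedy_cog is not None and greedy_cog in wanting:
            grants.append(greedy_cog)   # unused slot is stolen
        else:
            grants.append(None)         # slot goes unused
    return grants


# Cog 0 always wants the hub; cog 3 wants it too.
wanted = [{0, 3}] * 8
strict = simulate_hub(wanted)                 # flag off: plain round robin
stealing = simulate_hub(wanted, greedy_cog=0)  # flag on for cog 0
```

In this model `strict` is `[0, None, None, 3, None, None, None, None]` and `stealing` is `[0, 0, 0, 3, 0, 0, 0, 0]`. Cog 3 gets its slot at the same time in both runs, so only the stealing cog gives up its own predictable timing.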
Cog Ram 2Kx32:
Firstly, from the info we have, the cogs may now be 50% of the die (more specs in an older thread), and the cog RAM takes most of this space. Cog RAM x4 = 2Kx32, so we are increasing roughly half of the chip by a factor of 4. Therefore, the whole die grows to somewhere between 2x and 2.5x its current size, depending on how much of each cog is RAM. This is only a cost vs yield issue.
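The die-size estimate can be written out as a small calculation. The 50% cog share comes from the post; the fraction of a cog that is RAM is not given, so the values tried below are guesses, and the function name is made up.

```python
def scaled_die_area(cog_fraction, ram_fraction_of_cog, ram_scale):
    """Relative die area after scaling only the cog RAM.

    cog_fraction: share of the current die taken by the cogs (0..1)
    ram_fraction_of_cog: share of each cog that is RAM (0..1), a guess
    ram_scale: how many times larger the cog RAM becomes
    """
    ram_area = cog_fraction * ram_fraction_of_cog
    return 1.0 + ram_area * (ram_scale - 1)


# Cogs are half the die; RAM grows 4x (512 -> 2K longs).
all_ram = scaled_die_area(0.5, 1.0, 4)     # upper bound: the cog is all RAM
mostly_ram = scaled_die_area(0.5, 0.7, 4)  # "most" taken as 70%, an assumption
```

Under these assumptions `all_ram` is 2.5 and `mostly_ram` is about 2.05, so "roughly double or a bit more" is a fair summary.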
500MHz-1GHz:
This would most likely mean a finer geometry, and that reduces die size, but increases power consumption. This is just an economics exercise.
Now, let's look differently at the problem...
Mostly, we require one main program and lots of other parallel programs. Therefore, why not one super-cog? Could this be done - Of course, as anything is relatively possible.
So let's make 1 cog a "super cog":
- Give it access to unused hub windows, via a flag, so we can either preserve determinism and run slower, or run faster without it.
- Let's give it 4Kx40 bits (8x current)
- This cog is started in 32 bit mode and operates as a normal cog. A special flag can be set that enables the 40bit mode and then a small bootloader loads up the code in 40bit mode, and begins to execute.
- add 3 bits for D & S
- add 2 bits for the instruction set or whatever. 1 bit could indicate super???
- The alternative is 32bit wide and some form of bank switching and relative jump/calls.
While you did specify this as a Prop II, I believe this could/would not happen for the Prop II. The Prop II is way too far down the track. Back to reality....
However, for a Prop III, or for a custom chip for your company only, if you were serious, you should contact Parallax offline. I am sure we would love to chip in for some features.
However, with 48 bit instructions we could have 16bit each for D & S, and 16 for the instruction+ bits which gives us 2 more bits... 1 for the instruction and 1 for ???
Now, this means the movd and movs become 16 bits, nice for building data, and the movi becomes 12 bits. Now we only need a movb for building bytes.
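The 48-bit split can be sketched as a pack/unpack pair. The bit positions below (control word on top, then D, then S) are a hypothetical layout chosen for the example; the post only gives the field widths.

```python
FIELD_MASK = 0xFFFF  # each of the three fields is 16 bits wide

# The 32-bit word spends 6 + 4 + 4 = 14 bits on opcode, flags and condition,
# so a 16-bit control field leaves 2 spare bits.
SPARE_CTRL_BITS = 16 - (6 + 4 + 4)


def pack48(ctrl: int, dest: int, src: int) -> int:
    """Pack three 16-bit fields into one 48-bit word (hypothetical layout)."""
    for name, value in (("ctrl", ctrl), ("dest", dest), ("src", src)):
        if not 0 <= value <= FIELD_MASK:
            raise ValueError(f"{name} does not fit in 16 bits: {value}")
    return (ctrl << 32) | (dest << 16) | src


def unpack48(word: int):
    """Return (ctrl, dest, src) from a 48-bit word."""
    return (word >> 32) & FIELD_MASK, (word >> 16) & FIELD_MASK, word & FIELD_MASK
```

With 16-bit D and S fields, each can name 65536 locations, and a movd/movs-style write would replace a whole 16-bit field at once.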
I'm not sure I see what you mean.
In the Propeller's COG architecture there is no differentiation between memory and registers. If you want to dedicate some of that COG space to "registers" for the convenience of your compiler, for example, then you can already do so.
Perhaps by "another kind of processor" you mean being able to use registers as pointers perhaps with auto increment/decrement etc and extra addressing modes. Yes that would be nice, and in the Prop architecture it would have to apply to all COG locations. Cluso has arranged for an extra bit in the instruction width to make this possible.
If you mean "normal kind of other processor", i.e. without the elegant simplicity and regularity of the current instruction set, then there is not much point; everyone and his dog is working on that.
http://www.youtube.com/watch?v=u5um8QWWRvo
I think you suffered the wrath of the octogenarians (those with more than 8000 posts) because you framed your question as a change to the Prop 2. If you labeled it as the Prop 3 they might be more open to your suggestions. I think what you proposed is doable. There are obvious solutions to increasing the hub RAM, such as bank switching, relative jumps or increasing the instruction size. The argument about determinacy is silly. Determinacy is important for a small subset of applications, and in those cases a mode bit and a bit of logic would provide that requirement.
Most of us on this forum love the prop, but we should keep an open mind. Especially when we're just talking about a hypothetical scenario.
Dave
What a great YouTube video.
For the most part I agree.
I was only using the term as a joke, because everybody I know thinks I am too "realistic" about life (as defined in the video).
cheers,
rich
Some of the answers were informative and I learned some things.
I was also thinking about just eliminating all hardware peripherals and giving each cog a whack of FPGA to create any peripheral you may need. That may really push the die size sky high.
very good link, thanks
My thoughts exactly. I'd love to try (I can draw a little at least), but I don't know the subject well enough to do it well. I recommend a search for RSA Animate on YouTube for lots of other interesting animated talks; Ken Robinson's talk on education is a particular favorite.
Graham
The problem is that most people don't think in odd bit lengths (not a power of 2). I personally think the Propeller cog would make a great piece for an array processor like the http://greenarrays.com/ chip. Which, coincidentally, bugs me because it is an 18-bit machine. I find myself asking, if only it were 32 bit....
Anyway, if we had the best of both worlds, say a 4x4 grid of cogs with up, down, left, right ports (with directions consistent) that automatically stall, handshake and all of that, along with a round-robin 16-port hub like we have now, I think this architecture would rock! As SGI found out a while back, many problems bottleneck at bandwidth, not computational cycles.
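For what it's worth, the wiring of such a grid is easy to sketch. This assumes cogs are numbered row by row from 0 and that the edges do not wrap around; neither detail is in the post, and `neighbors` is just a name for the example.

```python
def neighbors(cog: int, width: int = 4, height: int = 4) -> dict:
    """Map each port direction to the neighboring cog number.

    Cogs are numbered row by row; edge cogs simply lack the missing ports.
    """
    if not 0 <= cog < width * height:
        raise ValueError(f"cog {cog} is outside the {width}x{height} grid")
    row, col = divmod(cog, width)
    ports = {}
    if row > 0:
        ports["up"] = cog - width
    if row < height - 1:
        ports["down"] = cog + width
    if col > 0:
        ports["left"] = cog - 1
    if col < width - 1:
        ports["right"] = cog + 1
    return ports
```

A corner cog such as 0 ends up with two ports and an interior cog such as 5 with four, so the consistent up/down/left/right directions hold everywhere.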
Just my 10 bits
Doug