Propeller chip what if ??

SteveW719 · 2011-02-10 19:52

I am pretty new to the Propeller and I love its structure and elegance. The Prop 2 looks to be an impressive step forward. Maybe some of the Parallax chip fab experts, or others who know about that sort of work, could satisfy my curiosity.

Let's pretend a multinational company calls up and offers to take 3 million Prop 2 chips per year if the following changes could be made.

Clock speed minimum 500 MHz, preferably 1 GHz
Hub Ram 256K longs
Cog ram 2k longs
Smart hub that could bypass unused cogs or cogs that are busy thus allowing a bigger window per revolution for cogs needing access to hub memory.

Would these things be possible? How much bigger would the die have to be? What would be the cost implications?

Thanks for indulging my curiosity

Phil Pilgrim (PhiPi) · 2011-02-10 20:04

Cog ram 2k longs
Cog address range is limited to 9-bit addresses, due to the instruction word's 32-bit size and, hence, the size of the source and destination fields.
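
As a rough illustration of that limit, the sketch below splits a 32-bit word into fields, assuming the Propeller 1 layout of a 6-bit opcode, 4 flag bits, a 4-bit condition, and 9-bit destination and source fields. The names FIELDS, decode and bits_needed are made up for this example.

```python
# Assumed Propeller 1 instruction layout, most significant field first.
FIELDS = [("instr", 6), ("zcri", 4), ("cond", 4), ("dest", 9), ("src", 9)]

def decode(word):
    """Split a 32-bit instruction word into its named fields."""
    fields = {}
    shift = 32
    for name, width in FIELDS:
        shift -= width
        fields[name] = (word >> shift) & ((1 << width) - 1)
    return fields

def bits_needed(longs):
    """Address bits required to reach every one of `longs` locations."""
    return (longs - 1).bit_length()

# Hypothetical word built field by field, only to exercise decode().
example = (0b101000 << 26) | (0b0010 << 22) | (0b1111 << 18) | (0x1F0 << 9) | 0x005
print(decode(example))
print(bits_needed(512), bits_needed(2048))
```

With 9-bit fields an instruction can name only 512 locations. 2K longs would need 11-bit fields, which is 4 more bits than a 32-bit word has to spare.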

Smart hub that could bypass unused cogs or cogs that are busy thus allowing a bigger window per revolution for cogs needing access to hub memory.
This would violate determinacy, a hallmark of the Propeller architecture.

-Phil

Ravenkallen · 2011-02-10 20:08

I'm no expert, but I think if there are ARM processors out there that are capable of having megabytes of memory, gigahertz speeds and a TON of peripherals, then yeah, I think it would be possible. Probable, No! Cost effective, No!... The imaginary company would probably just use an ARM and save themselves a lot of time/money and not waste Parallax's time... Plus, what Phil said about the necessary architecture changes would make it harder to implement.

Mike Green · 2011-02-10 20:13

Forget it! The amount of work required to do #1 is prohibitive and would result in a fundamentally different chip in terms of power consumption, heat production, manufacturing process, etc. This is something that would require a complete redesign.

#2 and #3 are prohibitive in terms of chip area and its effect on price and yield. You're talking about approximately quadrupling the chip area for what is already a large chip.

#4 has been rejected multiple times. It violates the determinacy that makes the Propeller particularly useful.

richaj45 · 2011-02-10 21:30

There seems to be a lot of negativism. After all, this is just a thought problem.

First, let's change the definition of "long" from 32 bits to 40 bits. With the 8 extra bits split between the source and destination fields, the native COG can now address 9+4 = 13 bits, or 8K 40-bit longs.
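
The arithmetic behind that claim can be sketched as follows, assuming the 14 non-address bits of the current instruction word (opcode, flags, condition) stay as they are and any extra bits are split evenly between the source and destination fields. NON_ADDRESS_BITS and cog_longs are hypothetical names.

```python
NON_ADDRESS_BITS = 14  # assumption: opcode + flag + condition bits are unchanged

def cog_longs(instruction_bits):
    """Return (bits per address field, addressable cog locations)."""
    field_bits = (instruction_bits - NON_ADDRESS_BITS) // 2
    return field_bits, 1 << field_bits

for width in (32, 40, 48):
    field_bits, longs = cog_longs(width)
    print(f"{width}-bit instructions: {field_bits}-bit fields, {longs} locations")
```

Under those assumptions a 40-bit word gives 13-bit fields and 8192 locations, which matches the 9+4 figure above.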

Next, the COG access to the Hub could have programmable priority. That is, each COG could get equal access as it does now, or maybe some COGs get half as many accesses, or maybe some get none. If it were programmable, then the deterministic way it works now could just be the default.

As for so much memory, I think that is more of a technology question. That is, if the current full-custom physical design process were changed to an ASIC flow in which external chip foundries did the synthesis and physical design, then the 90nm and 65nm options would open up. Yes, the NRE cost would be high, but by passing RTL over to the chip houses the project schedule would be much shorter. I think that is how you would get the speed and memory capacity.

It is certainly possible if the money is available, which usually means the market.
I think it is a great idea!
Extend the simple, and don't just add bells and whistles, I say.

rich

Heater. · 2011-02-10 22:32

SteveW719

Let's see:

Clock speed minimum 500 MHz, preferably 1 GHz

A certain multi-core micro-controller company in England has such devices running at 500MHz. So I guess this is doable.

Hub Ram 256K longs

The same multi-core micro-controller company in England has such devices with a total of 256K Bytes on board. Is that sufficient? Let's say that's doable.

Except that the Prop's HUB RAM is quite complex, if I understand correctly, with multiple ports. So this may not be possible.

Cog ram 2k longs

This requires an awful, ugly hack to the architecture, like bank switching areas of COG RAM. Please God let's not do that.

Or it requires a more elegant solution, like widening the instructions to 40 bits as richaj45 points out. This is something I would dearly love to see. When the time comes, it had better be at least 48 bits :)

Either way this is no longer a Prop is it, or is it?

Smart hub that could bypass unused cogs or cogs that are busy thus allowing a bigger window per revolution for cogs needing access to hub memory.

No, no, no. In your opening statement you say "...i love its structure and elegance" as do we all.
Your proposal is a subtle and seductive change. But I believe it has dire consequences and might destroy that elegance we crave.

Consider:

1) I write a super-duper high res graphics engine or high speed communications object or whatever widget.
2) I have an app that goes with it that uses one more COG.
3) Because I want the high res or high speed coms I write that driver object so that it relies on there only being two active COGS in the system. That way I get the high HUB bandwidth using your proposed HUB access mechanism.
4) Fine I'm happy, I've done something that would not be possible with the normal HUB access slots.

Now:

1) You take my high speed widget and drop it into your app that has three or more COGS active
2) Oh dear, it does not work. There is not enough HUB bandwidth in your application.

I believe that if such irregularness (sorry for the ugly word) were allowed in the Prop it would get used a lot as I describe. With the result that combining objects into apps becomes fraught with problems.
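
The scenario above can be sketched with a toy model that counts hub slots per cog, once with the fixed rotation (idle cogs still own their slot) and once with a hypothetical "skip idle cogs" rotation. The function hub_grants and its parameters are illustrative assumptions, not a description of any real hub implementation.

```python
def hub_grants(active_cogs, total_cogs=8, slots=800, skip_idle=False):
    """Count how many hub slots each active cog is granted over `slots` slot times."""
    grants = {cog: 0 for cog in active_cogs}
    # Fixed rotation visits every cog; the proposed hub visits only active ones.
    order = sorted(active_cogs) if skip_idle else list(range(total_cogs))
    if not order:
        return grants
    for t in range(slots):
        owner = order[t % len(order)]
        if owner in grants:
            grants[owner] += 1
    return grants

for cogs in ([0, 1], [0, 1, 2]):
    print(cogs, hub_grants(cogs), hub_grants(cogs, skip_idle=True))
```

In this model a driver tuned for two active cogs sees its share of slots drop by a third when a third cog starts, while under the fixed rotation its share does not depend on what else is running.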

In short:

1) Total symmetry of COGs and pins.
2) Total timing determinism of code execution.
3) The absence of interrupts.
4) Anyone think of more here?

are what define a Propeller. These things are fundamental to Propellerness and are what make building apps out of bits and pieces from OBEX etc. so phenomenally easy. The Prop's best selling point.

P.S.

My dream Prop is like so:

1) 40, 48 or more bit instructions and LONGS.
2) Hugely increased COG space, now allowed by the src, dst field width increase.
3) Some high speed communication channels between COGS.
4) NO HUB RAM. Not needed as the COG space is so much bigger and COGS can talk via the above coms channels.
5) More speed, more pins etc.

hinv · 2011-02-10 22:59

SteveW719 wrote: »

Smart hub that could bypass unused cogs or cogs that are busy thus allowing a bigger window per revolution for cogs needing access to hub memory.

Would these things be possible? How much bigger would the die have to be? What would be the cost implications?

Thanks for indulging my curiosity

This isn't as pretty, but if you need more bandwidth, you can have other cogs feed their slots over the pinless I/Os to the cog that needs more bandwidth. I believe 32 unpinned I/Os are planned to make buses like this.

Heater. · 2011-02-10 23:09

hinv,

This isn't as pretty, but if you need more bandwidth, you can have other cogs feed their slots over the pinless I/Os to the cog that needs more bandwidth.

Is this true?

Seems to me if I want to move data from one COG to another via pins I have to reserve at least one bit as a flag to indicate data ready, perhaps another to indicate data consumed. By the time I'm done with setting and clearing flags and masking the flags out of the data I'm reading from the pins, I might as well have used the time in a simple HUB access or two.
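
A minimal sketch of that overhead, assuming a 32-bit internal port with bit 31 used as a "data ready" flag and bit 30 as a "data consumed" flag. The bit positions and the helper names send, receive and transfers_for are invented for illustration.

```python
READY = 1 << 31          # assumed flag: sender has placed valid data on the port
ACK = 1 << 30            # assumed flag: receiver has consumed the data
PAYLOAD_BITS = 30        # what is left of the 32 bits once the flags are reserved
PAYLOAD_MASK = (1 << PAYLOAD_BITS) - 1

def send(port, payload):
    """Drive the payload and raise READY, leaving the receiver's ACK bit alone."""
    return (port & ACK) | READY | (payload & PAYLOAD_MASK)

def receive(port):
    """Return the payload with the flags masked off, or None if nothing is ready."""
    if not port & READY:
        return None
    return port & PAYLOAD_MASK

def transfers_for(value_bits):
    """Port writes needed to move a value of the given width (ceiling division)."""
    return -(-value_bits // PAYLOAD_BITS)

print(hex(receive(send(0, 0x2ABCDEF))), transfers_for(32))
```

With two flag bits reserved, a full 32-bit long no longer fits in one write, and every read costs an extra test and mask, which is the cost being weighed against a plain HUB access.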

Not to mention that often the time "wasted" by a COG waiting for its HUB access slot can be used to do some other useful instructions.

jazzed · 2011-02-10 23:22

Heater. wrote: »

Not to mention that often the time "wasted" by a COG waiting for its HUB access slot can be used to do some other useful instructions.

Then there's the "Quad" transfer type ... 4 longs at a time every 100ns (50ns?).

Ale · 2011-02-11 00:13

If you are able to split your workload into several threads, then many of those features can be achieved in other devices by other methods. You can also use one of those devices Heater mentioned; they are quite affordable. But they lack special circuitry for video and are not that easy to set up and get running on a breadboard. The "Prop is up and running in 30 seconds" experience is very appealing; I mean a fast device that is easy to set up. I hope the Prop 2 has such a feature.

Cluso99 · 2011-02-11 01:13

Steve: Firstly, welcome.

Wow, what a lot of negative response you received. The response would depend on how deep are your pockets... which means, would Parallax take you seriously enough to do it.

All the things you mention, with issues to be resolved of course, are nevertheless doable. Just look at the Intel processors with >1B transistors and >2GHz, etc.

Now let's break it down....

500MHz or 1GHz = yes
Hub Ram 256K longs = expected now?
Cog ram 2k longs = yes (*but see below)
Bypass unused hub access = yes (*but see below)
Die size = 4x size (but smaller geometry would be required I think)
Cost = unknown (you need deep pockets)

Now, let's discuss this rationally.....
Hub access:
Everyone gets upset about losing determinism. However, why not have a flag that enables this, with the default being normal operation? Often one large program runs that is not timing sensitive (deterministic) but wants to grab any hub bandwidth available. So this cog could steal the unused cogs' accesses. I have suggested this quite a bit before.

Cog Ram 2Kx32:
Firstly, from the info we have, the cogs may now be 50% of the die (more specs in an older thread), and the cog RAM takes most of this space. So quadrupling the cog RAM to 2Kx32 grows that half of the chip by a factor of up to 4, and the whole die ends up between roughly 2x and 2.5x its current size. This is only a cost vs yield issue.
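
The estimate can be checked with a one-line area model: the part of the die that is cog RAM scales, the rest stays put. The function name scaled_die_area and the fractions tried below are assumptions for illustration; the thread gives only "about 50%" for the cogs as a whole.

```python
def scaled_die_area(ram_fraction, ram_scale):
    """Relative die area after scaling the RAM share of the die by `ram_scale`."""
    return (1.0 - ram_fraction) + ram_fraction * ram_scale

for fraction in (1 / 3, 0.4, 0.5):
    print(f"cog RAM = {fraction:.0%} of die -> {scaled_die_area(fraction, 4):.2f}x area")
```

Taking the 50% figure at face value gives 2.5x. An exact doubling would correspond to cog RAM being about one third of the die.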

500MHz-1GHz:
This would most likely mean a finer geometry, and that reduces die size, but increases power consumption. This is just an economics exercise.

Now, let's look differently at the problem...
Mostly, we require one main program and lots of other parallel programs. Therefore, why not one super-cog? Could this be done - Of course, as anything is relatively possible.

So let's make 1 cog a "super cog":

Give it access to unused hub windows, via a flag, so we can either preserve determinism and run slower, or run faster without it.
Let's give it 4Kx40 bits (8x the current cog RAM depth)
This cog is started in 32 bit mode and operates as a normal cog. A special flag can be set that enables the 40bit mode and then a small bootloader loads up the code in 40bit mode, and begins to execute.
add 3 bits for D & S
add 2 bits for the instruction set or whatever. 1 bit could indicate super???
The alternative is 32bit wide and some form of bank switching and relative jump/calls.

While you did specify this as a Prop II, I believe this could/would not happen for the Prop II. The Prop II is way too far down the track.

Back to reality....
However, for a Prop III, or for a custom chip for your company only, if you were serious, you should contact Parallax offline. I am sure we would love to chip in for some features.

Cluso99 · 2011-02-11 01:23

heater: I absolutely disagree that with larger cog RAM we could dispense with the hub RAM.
However, with 48-bit instructions we could have 16 bits each for D & S, and 16 for the instruction+ bits, which gives us 2 more bits... 1 for the instruction and 1 for ???
Now, this means the movd and movs become 16 bits, nice for building data, and the movi becomes 12 bits. Now we only need a movb for building bytes.

Ale · 2011-02-11 02:11

Cluso: Extending the instruction size seems like a good option, but I think that another kind of processor, I mean one with registers and a stack, may be better suited for larger COG memory.

Heater. · 2011-02-11 02:53

Ale,

I'm not sure I see what you mean.

In the Propeller's COG architecture there is no differentiation between memory and registers. If you want to dedicate some of that COG space to "registers" for the convenience of your compiler, for example, then you can already do so.

Perhaps by "another kind of processor" you mean being able to use registers as pointers perhaps with auto increment/decrement etc and extra addressing modes. Yes that would be nice, and in the Prop architecture it would have to apply to all COG locations. Cluso has arranged for an extra bit in the instruction width to make this possible.

If you mean "normal kind of other processor", i.e. one without the elegant simplicity and regularity of the current instruction set, then there is not much point; everyone and his dog is working on that.

Graham Stabler · 2011-02-11 03:06

Quite a few complaints about negativity, what is wrong with stating what you believe is the truth?

http://www.youtube.com/watch?v=u5um8QWWRvo

Dave Hein · 2011-02-11 04:23

SteveW719,

I think you suffered the wrath of the octogenarians (those with more than 8000 posts) because you framed your question as a change to the Prop 2. If you had labeled it as the Prop 3 they might be more open to your suggestions. I think what you proposed is doable. There are obvious solutions to increasing the cog RAM, such as bank switching, relative jumps or increasing the instruction size. The argument about determinacy is silly. Determinacy is important for a small subset of applications, and in those cases a mode bit and a bit of logic would provide that requirement.

Most of us on this forum love the prop, but we should keep an open mind. Especially when we're just talking about a hypothetical scenario.

Dave

richaj45 · 2011-02-11 07:35

@Graham:

What a great YouTube video.
For the most part I agree.
I was only using the term as a joke, because everybody I know thinks I am too "realistic"
about life (as defined in the video).

cheers,
rich

SteveW719 · 2011-02-11 09:40

Hey, thanks to all who answered. I only used the Prop 2 as a jumping-off point; I did not mean to imply it should be changed. What prompted this was the discussion about the future of the Propeller and its acceptance in the commercial world. It made me think about the feature sets of some of the widely adopted processors and whether they could be translated to the Propeller model.

Some of the answers were informative and I learned some things.

I was also thinking about just eliminating all hardware peripherals and giving each cog a whack of FPGA to create any peripheral you may need. That may really push the die size sky high.

HShanko · 2011-02-11 11:05

Grrrr! Some people are too artistic (referring to that great YouTube video). That artist could make a great "How to quickly understand the Prop" lecture in animation form.

hinv · 2011-02-13 05:16

Graham Stabler wrote: »

Quite a few complaints about negativity, what is wrong with stating what you believe is the truth?

http://www.youtube.com/watch?v=u5um8QWWRvo

very good link, thanks

Graham Stabler · 2011-02-13 05:34

HShanko wrote: »

That artist could make a great "How to quickly understand the Prop" lecture in animation form.

My thoughts exactly, I'd love to try (I can draw a little at least) but I don't know the subject well enough to do it well. I recommend a search for RSA animate on youtube for lots of other interesting animated talks, Ken Robinson's talk on education is a particular favorite.

Graham

hinv · 2011-02-13 05:47

As for 40 or 48 bit longs.... I would rather not. It wouldn't fit well with external memory, peripherals, etc. and would cause a lot of confusion.
The problem is that most don't think in odd bit lengths (not a power of 2). I personally think the Propeller cog would make a great piece for an array processor like the http://greenarrays.com/ chip. Which, coincidentally, bugs me because it is an 18-bit machine. I find myself asking if only it were 32-bit....
Anyway, if we had the best of both worlds, say a 4x4 grid of cogs with up, down, left, right ports (with directions consistent) that automatically stall, handshake and all of that, along with a round robin 16 port hub like we have now, I think this architecture would rock! As SGI found out a while back, many problems bottleneck at bandwidth, not computational cycles.

Just my 10 bits
Doug

Cluso99 · 2011-02-13 16:29

The PropII will have autoincrement and autodecrement AFAIK.
