Propeller II update - BLOG

David Betz · 2013-12-01 14:25

Bill Henning wrote: »

I agree, with one addition:

- what will be done with the 20% of die space freed up by getting rid of the video ring?

At one point Chip suggested using the extra transistors for AUX memory.

That makes perfect sense to me - it would allow for significantly bigger stacks, which is great for LMM, and is a low-risk way of using the transistors.

The alternatives would be more hub ram (nice, but may not be a convenient size), and more cog ram - which is WAY TOO BIG a can of worms to open.

Due to the process change, Beau will be busy changing transistors, so discussion - perhaps oriented to the P3 - is not a waste of time.

Concentrating on SERDES, USB/CRC helper instructions etc is more important - and lower risk.

I would prefer more hub RAM. It is of more general use than AUX RAM. I'm not convinced that even double the amount of AUX RAM will make it useful as a C stack. You could, of course, have two different code generators for PropGCC, one that uses AUX RAM as a stack and has the associated limitations on function nesting and the other using hub memory as a stack. This is probably a big development effort though and I wonder how often the AUX RAM version would be used.
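Here is David's point about nesting limits in numbers. This is a minimal C sketch under two assumptions. The first is that AUX RAM holds 256 longs per cog, the size commonly cited for this P2 iteration. The second is that every call pushes a fixed-size frame of longs. The helper name `max_call_depth` and the 8-long frame are hypothetical.

```c
#include <assert.h>

/* Assumption: AUX RAM holds 256 longs (32-bit entries) per cog. */
#define AUX_LONGS 256u

/* Deepest call chain that fits when every call pushes frame_longs
   longs (return address, saved registers, locals) onto an AUX-based
   stack. Hypothetical helper, for illustration only. */
static unsigned max_call_depth(unsigned aux_longs, unsigned frame_longs)
{
    assert(frame_longs > 0);
    return aux_longs / frame_longs;
}
```

With 8-long frames, `max_call_depth(AUX_LONGS, 8)` gives 32 levels. Doubling the AUX RAM gives 64 levels, and quadrupling it gives 128. A hub-stack code generator has no comparable hard ceiling.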

Cluso99 · 2013-12-01 14:29

Bill Henning wrote: »

I agree, with one addition:

- what will be done with the 20% of die space freed up by getting rid of the video ring?

At one point Chip suggested using the extra transistors for AUX memory.

That makes perfect sense to me - it would allow for significantly bigger stacks, which is great for LMM, and is a low-risk way of using the transistors.

The alternatives would be more hub ram (nice, but may not be a convenient size), and more cog ram - which is WAY TOO BIG a can of worms to open.

Due to the process change, Beau will be busy changing transistors, so discussion - perhaps oriented to the P3 - is not a waste of time.

Concentrating on SERDES, USB/CRC helper instructions etc is more important - and lower risk.

I also wondered about a small block of memory (say 16 * 32+1 bits where the 1bit is set when written, reset when read) in the centre of the die which could be shared by all cogs. Use similar to PortD but simpler.
One particular use would be for an SDRAM driver cog to pass data directly to another cog.
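A rough C model of that shared block follows. It assumes 16 slots, each with 32 data bits plus one "full" bit that a write sets and a read clears. The names `mbox_write` and `mbox_read` are illustrative assumptions, as is the choice to refuse a write to a slot whose previous value is still unread. None of this is a proposed hardware interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MBOX_SLOTS 16

/* One 33-bit cell: 32 data bits plus a flag set on write, cleared on read. */
typedef struct {
    uint32_t data;
    bool     full;
} mbox_cell;

static mbox_cell mbox[MBOX_SLOTS];   /* shared by all cogs in this model */

/* Returns false (and writes nothing) if the previous value is unread. */
static bool mbox_write(unsigned slot, uint32_t value)
{
    assert(slot < MBOX_SLOTS);
    if (mbox[slot].full)
        return false;
    mbox[slot].data = value;
    mbox[slot].full = true;
    return true;
}

/* Returns false if nothing new has been written since the last read. */
static bool mbox_read(unsigned slot, uint32_t *value)
{
    assert(slot < MBOX_SLOTS);
    if (!mbox[slot].full)
        return false;
    *value = mbox[slot].data;
    mbox[slot].full = false;
    return true;
}
```

In silicon, each test-and-update of the flag would have to be atomic per slot. This single-threaded C model ignores that concurrency issue.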

IMHO it's simple to implement, but I guess the naysayers would say this is too hard to manage: every cog might want it for its own private use rather than common use, and we are not smart enough to arbitrate its use ourselves.

ctwardell · 2013-12-01 14:29

potatohead wrote: »

I think "I'll just go to another processor" is crappy, for what it's worth, particularly when we all were completely jazzed about the chip that got botched.

The last part of that statement isn't really worth much in that the past is the past.

Today is today and we have a potential solution to a long term issue and it seems the primary holdup is a concern over OBEX users.

C.W.

Bill Henning · 2013-12-01 14:31

Most microcontrollers, programmed in C, have a stack of 1k or less. Often much less.

They normally use 16 bit addresses, so that is 512 entries or less.

Doubling the AUX gets us to parity.

I like more HUB too, but it is precisely because of C code I think more AUX may be more useful.

Single cycle access to parameters and local variables, without having to cache it in registers, is mighty tempting.

However this is a theoretical debate, as I am perfectly happy with more hub ram, or more aux memory - whichever Chip prefers or finds more practical.

For the P2, I do NOT want more cog memory - WAY too big a can of worms at this point.

What I want is real P2 chips, as soon as practical (given transistor re-layout, video dac changes necessitated by transistor size change)

David Betz wrote: »

I would prefer more hub RAM. It is of more general use than AUX RAM. I'm not convinced that even double the amount of AUX RAM will make it useful as a C stack. You could, of course, have two different code generators for PropGCC, one that uses AUX RAM as a stack and has the associated limitations on function nesting and the other using hub memory as a stack. This is probably a big development effort though and I wonder how often the AUX RAM version would be used.

jmg · 2013-12-01 14:34

ctwardell wrote: »

What if we got rid of the 'any slot' cases and went with the pairs. A hub sharing pair cannot benefit or hinder any other COGS.

I assume you are fine with OBEX items that use multiple COGS, why not let this be a case of using multiple COGS?

Sounds fine to me.

That even works in 8 slot mode, if users accept SW handshake, or just want to sacrifice a COG-HUB path, to boost another.

The (optional) 9th slot proposed gives a HW means to allow the lesser of the pair to still have deterministic moderate HUB bandwidth, whilst the other enjoys the deterministic Boosted HUB bandwidth.
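The pairing idea can be made concrete with a hypothetical slot map. Cog N normally owns hub slot N. In a pair, the boosted cog also takes its partner's slot. In this C sketch, the optional 9th slot is modelled as an extra rotation position handed back to the donor. The struct, field names, and 9th-slot behaviour are assumptions, not Chip's or jmg's specification.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_COGS 8

/* Hypothetical hub-slot pairing: 'boosted' takes over 'donor's slot. */
typedef struct {
    int  boosted;     /* cog that gets the extra bandwidth, or -1 for none */
    int  donor;       /* cog that gives up its slot */
    bool nine_slots;  /* optional 9th slot, handed back to the donor */
} hub_pairing;

/* Number of slots in one hub rotation under this pairing. */
static unsigned rotation_length(const hub_pairing *p)
{
    return (p->boosted >= 0 && p->nine_slots) ? NUM_COGS + 1 : NUM_COGS;
}

/* Which cog owns hub slot 'slot' within one rotation. */
static int slot_owner(const hub_pairing *p, unsigned slot)
{
    assert(slot < rotation_length(p));
    if (slot == NUM_COGS)                       /* only exists with nine_slots */
        return p->donor;
    if (p->boosted >= 0 && (int)slot == p->donor)
        return p->boosted;                      /* donor's slot goes to partner */
    return (int)slot;                           /* every other cog is untouched */
}
```

In the 8-slot case, every cog outside the pair keeps exactly its own slot, which is ctwardell's "cannot benefit or hinder" property. In this model, the 9th slot stretches the rotation from 8 to 9 clocks for every cog. Whether the real proposal would do that is not stated here.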

Cluso99 · 2013-12-01 14:37

potatohead wrote: »

Cluso, you said you would always push it, right? So you've got your advocacy out there. I encourage you strongly to continue it, as others will in like kind.

I wanted to declare my potential bias.

I think "I'll just go to another processor" is crappy, for what it's worth, particularly when we all were completely jazzed about the chip that got botched.

Wrong context - I will be staying (love the P1 & P2 whatever form). Otherwise my project would have been done with another processor.
My context was from "another engineer's" perspective, not mine - I want a simple sw design and multiple cogs deliver this!!!

Heck, one P1 only scans an 18-key keypad, drives an LCD, controls contrast etc, and communicates serially. Not exactly rocket science here, an AVR would do just fine. This way I use one set of common components and easy programming, and this means more than the cost of this project.

T Chap · 2013-12-01 14:38

The practical thing in a case like this is for Chip to set a deadline for arguments and suggestions, then take what he wants and end the discussion. Maybe Ken would appreciate a deadline for discussion as well. There is a certain magic that happens when there is a fixed deadline. For the current debate, I say add an advanced OBEX and list of gotchas, caveats, workarounds, etc.

David Betz · 2013-12-01 14:38

ctwardell wrote: »

The last part of that statement isn't really worth much in that the past is the past.

Today is today and we have a potential solution to a long term issue and it seems the primary holdup is a concern over OBEX users.

C.W.

I'm not really an OBEX user and not particularly a Propeller user except in the process of testing PropGCC, but it seems to me that the OBEX is often claimed as one of the biggest advantages of the Propeller. Would Parallax be wise to ignore that without at least a plan to provide some similar set of tested objects that can be easily deployed with a minimum of concern over interactions between objects?

ctwardell · 2013-12-01 14:38

potatohead wrote: »

Re: Pairs

I think that gets us to a 4 COG chip. We use multi-cog code now, and I think Chip's initial observation that it should be a private affair between cooperating objects speaks to this one.

I'm on the fence about it frankly.

I appreciate you being open, or at least on the fence.

I can see the case where someone who doesn't know what they are doing uses a pair to solve a speed issue when there was a better solution available, but that is an education issue just like anything else.
I think the chance of having the OBEX become full of unnecessarily paired objects is minimal if we have a reasonable vetting process which we should have anyway.

C.W.

jmg · 2013-12-01 14:41

David Betz wrote: »

I would prefer more hub RAM. It is of more general use than AUX RAM.

Hub ram should be the first bite of the extra space.
Would be good to see the relative numbers of

* All used for extended HUB RAM - where does that get us up to ? 200k+ ?

* Split to extend HUB ram, and (double?) AUX memory

Bill Henning · 2013-12-01 14:41

That is why I was asking Chip about relative transistor cost - a small 8-port memory, shared by cogs, could be extremely useful. Also, to start the discussion of divvying up the transistors.

Edit: yes, I saw your earlier post, and such a 16x32+1 buffer would be very useful, though I'd prefer at least 32x32

I also wish either Chip's, yours, or my hub slot strategies would go in, and I am not worried about obex issues - newbies would never use it, as they stick to Spin, and there is absolutely no reason for Spin to have access to setting hub slot modes. Obex could forbid hub slot modifying programs, or clearly flag them with a big flashy graphic. Personally, I'd love to have (any one of Chip's, yours or mine) hub slot control schemes, as I could sure use the extra bandwidth for high bandwidth uses.

I still have not heard transistor budgets from Chip, but at a guess, the 20% die area would buy us maybe another 64KB of hub space, or about 4KB more AUX for each cog. Or there could be a trade off - 2KB more aux per cog, and 32KB more hub. I am itching to know the transistor budgets...
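Bill's figures are guesses pending Chip's real transistor budget. This small C sketch only makes the trade-off explicit. It assumes the freed area is worth 64 KB of hub RAM. It also assumes AUX RAM costs about twice the area per bit of hub RAM. That ratio is inferred from his two figures (64 KB of hub versus 8 × 4 KB = 32 KB of AUX) and is not a known silicon number. The function name is hypothetical.

```c
#include <assert.h>

#define NUM_COGS 8

/* budget_hub_kb: freed area, expressed as KB of hub RAM it could buy.
   hub_kb:        how much of that is spent on extra hub RAM.
   aux_cost:      area of 1 KB of AUX relative to 1 KB of hub (assumed ~2).
   Returns the extra AUX KB each cog gets from the remainder. */
static unsigned aux_kb_per_cog(unsigned budget_hub_kb, unsigned hub_kb,
                               unsigned aux_cost)
{
    assert(hub_kb <= budget_hub_kb && aux_cost > 0);
    return (budget_hub_kb - hub_kb) / (aux_cost * NUM_COGS);
}
```

Under these assumptions, spending nothing on hub gives 4 KB more AUX per cog. Spending 32 KB on hub gives 2 KB per cog, which matches Bill's mixed option.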

Cluso99 wrote: »

I also wondered about a small block of memory (say 16 * 32+1 bits where the 1bit is set when written, reset when read) in the centre of the die which could be shared by all cogs. Use similar to PortD but simpler.
One particular use would be for an SDRAM driver cog to pass data directly to another cog.

IMHO it's simple to implement, but I guess the naysayers would say this is too hard to manage: every cog might want it for its own private use rather than common use, and we are not smart enough to arbitrate its use ourselves.

potatohead · 2013-12-01 14:45

The last part is relevant, and it's relevant for perspective. Truth is, we got to talking about very high throughput, which is unrealistic given where things are right now. So a high expectation got set, and now coming back to a real expectation sucks, as always happens.

However, any of you are completely free to think what you will of that, and I'll back you in saying it. Best I can do. Nothing personal. I started down that road and removed the posts yesterday, because I mean it when I say it's not personal; I'm not just saying it.

I'm not sure that's a primary concern at all. (obex)

Generally speaking, it's the COG as core unit of reuse and the value that has to all users, not just OBEX users.

re: "I focus on Bandwidth and Chips"

First, I think it's really about throughput, but a focus on chips boils things down to a lower common denominator. Props are props because COGS are COGS. That's a very significant differentiator being ignored. You get to do that, but I'm not going to. Everybody else can toss their points into the mix as well.

Really, I'm thinking about known revenue as opposed to unknown revenue. Given the loooong dev cycle, it's best to get this one in the can so that it can fund things that will need to happen. Parallax education and the body of existing customers are going to be the primary revenue.

Commercial may be significant! However, I don't see all the components to nail that in place right now, and I'm talking about business to business sales, which will cost Parallax more than they currently spend now. A solid design can fund this, and if it's funded then we will come to learn a lot more about where the chip will play out and what it's worth. We may come to learn a derivative makes sense too.

If so, I don't think that's a long cycle or a big risk once there is much better data behind it. One thing I do know is a bunch of us sitting around saying, "this or that is a must have" is better than nothing, but until the device is properly positioned, pitched to qualified prospects well aligned with what it offers, we really don't know anywhere near as much as we think we do.

All of which is why I was dead serious about considering two chips. One funds the next, and a part of that is nailing down what will play out commercially.

I know most of you don't think much of all that, but I've seen it play out over and over and over for about 20 years or so, and when it's done right, chance of success is very high. When it's not, those chances are very low.

So then, building in ways that may well dilute well understood revenue potentials doesn't make much sense, and that's apart from what we all think about it, or how cool it is, etc...

Remember, we are invited to the party, but it's not our dollars getting spent here. I'm gonna respect that. Again, none of you have to do that, but I'm going to. And I like the people I've come to know, and I want them to succeed well enough for this whole thing to continue on nicely.

That is my primary consideration in just about everything I've ever written here.

ctwardell · 2013-12-01 14:46

Bill Henning wrote: »

I still have not heard transistor budgets from Chip, but at a guess, the 20% die area would buy us maybe another 64KB of hub space, or about 4KB more AUX for each cog.

We didn't really gain a full 20% because the core takes more room than it used to, that was what got him looking at killing the DAC bus in the first place.
He mentioned that the easiest option was to double or quadruple the AUX RAM.

C.W.

potatohead · 2013-12-01 14:47

Cluso,

Noted. And I stand corrected.

@All, please ignore my "crappy" comment.

I'm perfectly happy to be wrong, on record, about it.

Cluso99 wrote: »

I wanted to declare my potential bias.

Wrong context - I will be staying (love the P1 & P2 whatever form). Otherwise my project would have been done with another processor.
My context was from "another engineer's" perspective, not mine - I want a simple sw design and multiple cogs deliver this!!!

Heck, one P1 only scans an 18-key keypad, drives an LCD, controls contrast etc, and communicates serially. Not exactly rocket science here, an AVR would do just fine. This way I use one set of common components and easy programming, and this means more than the cost of this project.

Bill Henning · 2013-12-01 14:50

Thanks. That makes sense. I still want to know the transistor budget and cost (per type of memory), as I am sitting on the fence regarding hub, aux, or some mix.

ctwardell wrote: »

We didn't really gain a full 20% because the core takes more room than it used to, that was what got him looking at killing the DAC bus in the first place.
He mentioned that the easiest option was to double or quadruple the AUX RAM.

C.W.

localroger · 2013-12-01 14:53

potatohead wrote: »

Cars are shipped to end users speed limited for precisely the same reasons we are discussing here.

What? In what world are cars shipped with speed limiters? I know of some commercial vehicles that are, but cars?

My former boss (he recently sold the company) just bought a Porsche Panamera. Back in the 1980's he owned a 928 and I guess he was feeling nostalgic. So, one impulse purchase and a check for $160,000 later, he drives off with a car that can go -- wait for it -- 190 miles per hour. That's about 300 km/hr for you metric folks. At a certain point the suspension changes and wings emerge to keep it stable. Has he been trained to drive it? HA. Did I mention that he is 85 years old? Sure he's in good health, but he has just taken delivery of a personal rocket ship that can go a kilometer in 12 seconds. I'm only about to turn 50 and I wouldn't trust my reflexes with that kind of power.

But the flip side is that G. doesn't have to drive his car at its maximum speed. People like powerful things even when the practical utility isn't all that evident.

User Name · 2013-12-01 14:57

And to think. All I ever wanted from a PII originally was an order of magnitude faster execution speed and 256K RAM and 64 pins....

Unfortunately, heater, if you stick with that plan you'll have a processor that will already be six years old the day it hits the streets.

Final solution...

Chip: Implement it. Don't tell anyone.

OBEX: Ban the use of the instruction(s).

Problem solved, let's move on.

PERFECT!

jmg · 2013-12-01 14:57

Bill Henning wrote: »

Cluso99 wrote:

I also wondered about a small block of memory (say 16 * 32+1 bits where the 1bit is set when written, reset when read) in the centre of the die which could be shared by all cogs.

Edit: yes, I saw your earlier post, and such a 16x32+1 buffer would be very useful, though I'd prefer at least 32x32

True 8 port memory is silicon costly, and still cannot quite be multi-port, as you cannot allow two cogs to write the same cells on the same clock. Read is OK. (HUB ram solves this via time slots.)
So in use, most DP-RAM is agreement-split, so one cog reads 0..N, writes N..M, and the other does the converse, in mail box fashion.
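jmg's mailbox convention can be written down as an ownership rule. In this minimal C sketch, the names and the example 32-word, half-and-half split are assumptions. The point is only that disjoint write regions rule out same-cell write collisions between the two ports.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical split of a dual-port RAM of 'size' words between two cogs.
   Side 0 reads [0, split) and writes [split, size);
   side 1 does the converse. */
typedef struct {
    unsigned size;
    unsigned split;
} dp_split;

/* True if 'side' is allowed to write word 'addr'. */
static bool may_write(const dp_split *r, int side, unsigned addr)
{
    assert(side == 0 || side == 1);
    if (addr >= r->size)
        return false;
    bool upper = addr >= r->split;
    return side == 0 ? upper : !upper;
}
```

For every address, exactly one side may write. Two cogs following the agreement therefore never hit the same cell on the same clock, and no hardware arbitration is needed.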

Bill Henning wrote: »

... and I am not worried about obex issues ... Personally, I'd love to have (any one of Chip's, yours or mine) hub slot control schemes, as I could sure use the bandwidth for high bandwidth uses.

Agreed, the benefits are there for serious use, and Starter OBEXs are unlikely to be impacted.

potatohead · 2013-12-01 15:02

Yep. Speed limiters.

Found on most cars in the US. There are a lot of companies selling chips to circumvent these and overall RPM limiters too. This has been true for a while.

Here in Oregon, circumventing these on little Honda type cars got popular. Kids running nitros and turbos going 160 in a Honda Civic! I would sometimes go out on the freeway to watch 'em zing by! Scary stuff, but I'm sure they got one heck of a buzz off doing it!

We now have the "speed demon" law. It's OK to mod your car, because it's yours. But, if you are caught over 100, they will take your car and crush it.

Anyway, yes. They are shipped with speed limiters, and optional ones that work for, say, fleet vehicles. Maybe capping speeds at 80 mph or something.

cgracey · 2013-12-01 15:04

In trying to find a Prop2 die picture, I stumbled onto some Prop 2 marketing that none of us knew existed:

http://www.funnyordie.com/videos/1eb5f61e51/what-is-prop-2-what-isnt-prop-2-answers-based-on-real-life-people-from-dominikawolski

potatohead · 2013-12-01 15:06

Excellent!

Bill Henning · 2013-12-01 15:06

LOL!

cgracey wrote: »

In trying to find a Prop2 die picture, I stumbled onto some Prop 2 marketing that none of us knew existed:

http://www.funnyordie.com/videos/1eb5f61e51/what-is-prop-2-what-isnt-prop-2-answers-based-on-real-life-people-from-dominikawolski

Cluso99 · 2013-12-01 15:07

Bill Henning wrote: »

Most microcontrollers, programmed in C, have a stack of 1k or less. Often much less.

They normally use 16 bit addresses, so that is 512 entries or less.

Doubling the AUX gets us to parity.

I like more HUB too, but it is precisely because of C code I think more AUX may be more useful.

Single cycle access to parameters and local variables, without having to cache it in registers, is mighty tempting.

However this is a theoretical debate, as I am perfectly happy with more hub ram, or more aux memory - whichever Chip prefers or finds more practical.

For the P2, I do NOT want more cog memory - WAY too big a can of worms at this point.

What I want is real P2 chips, as soon as practical (given transistor re-layout, video dac changes necessitated by transistor size change)

Wonder if we should take this to a separate thread as it may get lost here? Quite happy to delete this post if you agree.

I thought Chip had said that the hub can really only be expanded in largish blocks.
Initially I thought hub was more important, but with the new aux ram instructions, I think it is now perhaps more valuable here, as I don't see another 64KB of hub space fitting.

Because the aux ram block would need to be redone, I drew an alternative possibility many posts ago.

Current design is 1.5 port aux ram. The video gen reads the aux via the 0.5 port. The other r/w port is shared between hub and cog, but the hub can also r/w the quad-long cache register. For video, this design is great because it can get unlimited aux read access even while aux is being updated.

I wondered if the aux ram could be 2.5 port, with a r/w port to the cog, and separate read and write ports where the r/w goes to the hub and the read part goes to the video. That means that the hub is now on the video side instead of the cog side.
Why?
Now there could be simultaneous aux access for hub, cog and video gen.
When aux is used for video, then data flow is write to aux from hub and read from aux to video.
When aux is not used for video, then data flows r/w between aux and hub, and r/w between aux and cog. Both can be simultaneous - the only issue to solve is what happens when both hub and cog write to the same aux location on the same clock.

Remember, this block gets replicated 8 times, and it is in turn a replication of a "long" block. So IMHO it's not a big deal to change the port accessing - Chip and Beau will know this anyway. So this is a suggestion only.
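Cluso leaves open what happens when hub and cog write the same AUX address on the same clock. One possible policy gives the cog priority and makes the hub side retry on the next clock. It is sketched here in C purely as an assumption, since neither Chip nor Cluso specified one. The 256-long AUX size, the struct, and `aux_clock` are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define AUX_LONGS 256u   /* assumed AUX size per cog */

typedef struct {
    bool     valid;      /* a write is requested this clock */
    unsigned addr;
    uint32_t data;
} aux_write;

/* One clock of a hypothetical dual-write AUX. Both writes land unless
   they target the same address; then the cog's write wins.
   Returns false only when the hub write collided and must be retried. */
static bool aux_clock(uint32_t aux[AUX_LONGS], aux_write cog, aux_write hub)
{
    assert(!cog.valid || cog.addr < AUX_LONGS);
    assert(!hub.valid || hub.addr < AUX_LONGS);
    if (cog.valid)
        aux[cog.addr] = cog.data;
    if (!hub.valid)
        return true;                     /* nothing pending from the hub */
    if (cog.valid && cog.addr == hub.addr)
        return false;                    /* collision: hub retries next clock */
    aux[hub.addr] = hub.data;
    return true;
}
```

Alternatives include hub priority or a stall signalled to the cog. Which policy is cheapest in silicon is for Chip and Beau to judge.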

potatohead · 2013-12-01 15:13

Bill Henning wrote: »

I agree, with one addition:

- what will be done with the 20% of die space freed up by getting rid of the video ring?

Totally agreed. I don't have an opinion formed. I lean toward HUB memory, because it seems to always end up being about HUB memory...

Cluso99 · 2013-12-01 15:16

potatohead wrote: »

Cluso,

Noted. And I stand corrected.

@All, please ignore my "crappy" comment. I'm perfectly happy to be wrong, on record, about it.

I am happy with your comment. I just wanted you to know it's not my view. I love the prop and Parallax too. We all do here. That's the reason the debate is so lively. Otherwise we wouldn't care about the result. In the end, we are all going to use the P2.

Cluso99 · 2013-12-01 15:23

jmg wrote: »

True 8 port memory is silicon costly, and still cannot quite be multi-port, as you cannot allow two cogs to write the same cells on the same clock. Read is OK. (HUB ram solves this via time slots.)
So in use, most DP-RAM is agreement-split, so one cog reads 0..N, writes N..M, and the other does the converse, in mail box fashion.

Haven't gone into details.
I was not thinking of multiple ports. Just one cog at a time, or perhaps 2 as in one read and one write. Priority is next available cog but no fixed slots.
Can be discussed if others think its a good idea. Even willing to listen if it could totally replace the PortD?
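Cluso's "next available cog, no fixed slots" priority can be illustrated with a round-robin arbiter. In this C sketch, round-robin is an assumed choice, and the request bitmask and `next_grant` are hypothetical. Each access is granted to the next requesting cog after the one served last, so no cog is starved while others keep requesting.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_COGS 8

/* 'requests' has one bit per cog. Starting just after 'last' (the cog
   served most recently, or -1 at reset), return the next requesting
   cog, or -1 if no cog is requesting. */
static int next_grant(uint8_t requests, int last)
{
    assert(last >= -1 && last < NUM_COGS);
    for (int i = 1; i <= NUM_COGS; i++) {
        int cog = (last + i) % NUM_COGS;
        if (requests & (1u << cog))
            return cog;
    }
    return -1;
}
```

Unlike the fixed hub rotation, an idle cog costs nothing here. The price is that access latency is no longer deterministic, which matters for code that relies on exact timing.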

potatohead · 2013-12-01 15:24

Cluso99 wrote: »

I am happy with your comment. I just wanted you to know it's not my view. I love the prop and Parallax too. We all do here. That's the reason the debate is so lively. Otherwise we wouldn't care about the result. In the end, we are all going to use the P2.

I'm sorry I doubted. I really do know better. Whew. Hate it when I have to deal with a gaffe. Sucks.

Cluso99 · 2013-12-01 15:28

Thanks for the link Chip - you brightened up my day!

Heater. · 2013-12-01 15:39

Just a reminder. My concern is not that newbie or naive users will misuse any HUB slot sharing scheme. They won't, they will pull ready made objects from OBEX and elsewhere that implement the functionality they want. They will complete their project in Spin. After all it's those objects that fill the void of the Prop not having any dedicated peripherals in silicon. This is a big feature of the Propeller.

What I don't want to see is such a user, selecting a bunch of software "parts" and then finding that they do not play together, in mysterious ways, because of weird HUB timing requirements or excessive HUB band width demands.

If such users are no longer a concern then go ahead make it as complicated and impenetrable as you like.

User Name,

Unfortunately, heater, if you stick with that plan you'll have a processor that will already be six years old the day it hits the streets.

Quite so. I realise that. Sadly I could argue that that is also true of the Prop II as it stands.

For example, I'm not taken by all the video capabilities, as impressive as they are. As far as I can see, very soon there won't be a VGA screen to drive from it anywhere round here. Besides, when I want whizzy graphics I now have a Raspberry Pi with a huge GPU providing accelerated OpenGL into HDMI for 30 dollars.

jmg · 2013-12-01 15:41

Cluso99 wrote: »

Haven't gone into details.
I was not thinking of multiple ports. Just one cog at a time, or perhaps 2 as in one read and one write. Priority is next available cog but no fixed slots.
Can be discussed if others think its a good idea. Even willing to listen if it could totally replace the PortD?

A FIFO model might be better, as it maps into a narrower footprint, but still gives good transport BW.
Writes would be MUX selected (so a write says which COG to go to) and each COG has a read-fifo.
Message tag Flags are useful, which could map onto C,Z flags at low cost ?

That leaves the question: do you need to tag the source COG on every write? That costs more bits, but allows mixed writes.
Or, do you gate at the write end, so only one COG can fill a FIFO, others have to wait ?
The latter simplifies message reads, and queues writes.
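jmg's FIFO variant might look like the following C model, in the form that tags the source cog on every write. Writes are selected by destination cog, and each cog pops only its own read FIFO. The depth of 4, the field names, and the full/empty return conventions are assumptions. The `c` and `z` fields only stand in for his suggestion that message tag flags could map onto the C and Z flags.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_COGS   8
#define FIFO_DEPTH 4    /* assumed depth */

/* One message: 32 data bits, the sending cog, and two tag flags. */
typedef struct {
    uint32_t data;
    uint8_t  src;
    bool     c, z;
} msg;

typedef struct {
    msg      slot[FIFO_DEPTH];
    unsigned head, count;
} rx_fifo;

static rx_fifo fifo[NUM_COGS];   /* one read FIFO per cog */

/* Write is MUX-selected by destination cog; fails if that FIFO is full. */
static bool fifo_send(unsigned src, unsigned dst, uint32_t data, bool c, bool z)
{
    assert(src < NUM_COGS && dst < NUM_COGS);
    rx_fifo *f = &fifo[dst];
    if (f->count == FIFO_DEPTH)
        return false;
    msg m = { data, (uint8_t)src, c, z };
    f->slot[(f->head + f->count) % FIFO_DEPTH] = m;
    f->count++;
    return true;
}

/* A cog pops only its own FIFO; fails if it is empty. */
static bool fifo_recv(unsigned self, msg *out)
{
    assert(self < NUM_COGS);
    rx_fifo *f = &fifo[self];
    if (f->count == 0)
        return false;
    *out = f->slot[f->head];
    f->head = (f->head + 1) % FIFO_DEPTH;
    f->count--;
    return true;
}
```

The "gate at the write end" alternative would drop the `src` field and instead let only one sender at a time own a destination FIFO.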
