David Betz wrote: »
Why do you say only 25% faster?
Bill Henning wrote: »
I run a real consulting / design company - that is my "day" job, and I do not reveal sales numbers. It is none of anyone's business.
With all due respect, what have you contributed? What right do you have to say that Ross' or my advice is valueless?
I've contributed LMM, many optimizations, a lot of contribution to the P2 design, and was partially responsible for saving at least one shuttle run. I have made a product with 256 color high resolution bitmapped graphics with 20MB/sec external memory on a Propeller 1 - something that was considered impossible. I am not going to waste time going into more detail, as I do not have to justify anything to you.
Ross has also contributed a LOT over the years, including the Catalina C compiler.
In the forums, the normal rule is, make technical arguments, not ad hominem attacks.
mindrobots wrote: »
It was not meant as an unfair attack at Ross. I greatly respect Ross for the selfless and thankless contributions he makes to the community, Catalina is a brilliant piece of work and were I a C programmer, I would be using it. The terms of "now", "soon", "easy" and other ambiguous terms have been thrown around by many, he was the last poster I read using those terms. I should have prefaced my definitions with "This is what I think" and closed it with "What do you mean by those terms?" If that caused Ross any offense, I am sorry for not wrapping my post properly. I am sorry for getting frustrated by all the technical bickering.
koehler wrote: »
Bill, how many thousand units/yr can Parallax then expect you to order?
Ross's thread (I assume he started it) is a pretty useful measure, as anyone for/against on the forum can say aye/nay.
However, both it, and your opinion as to what Parallax should actually commit to doing, and spending another $50-100,000 on, are equally value-less.
Actually, unless you actually do routinely order large numbers of P1's, or will do so with P2's, Ross's thread has more merit on sales alone.
What really matters, ethically speaking, is what the customers who make up the bulk of Parallax's revenue are looking for, correct?
If this were Atmel or Microchip with the deep pockets, then it'd be fine.
Parallax is at least somewhat capital-constrained, so having them make a decision based on what is probably less than a couple thousand units is not in their best long-term interests.
Especially if anyone wants an eventual P3.
koehler wrote: »
Bill, for someone who's detailed repeatedly how logical you are, you seem to have missed the actual context of my comment. EDIT- Actually, I messed up.
However, both it, and your opinion should have read:
However, both it, and your opinion, mine, and everyone else's.
I'm smart enough at least to know that you probably do know the Prop inside out, sideways, and backwards and forwards.
That's one reason I find your exchanges with rmg so interesting to follow. You both are obviously professionals in a sea of enthusiasts.
However, my specific comment is I think, still valid.
For the most part, yours, Ross's, mine, and everyone else's 'opinion' on what path is best is useless to Parallax.
I did CYA though, insofar as and unless someone here were in fact buying commercial quantities, as (1) and only (1) person in this thread has attested to. And since you don't want to reveal whether or not you are such a customer, you can't complain that you're not accounted as one either.
You may 'want' to have the next product be a 4,5,6 P2 Cog device, that's fine. That doesn't mean that it's actually the best plan for Parallax.
It's quite possible a lot of the commercial quantity customers find the downgrade to 4 Cogs and the addition of multi-threading to be enough reason not to embrace it.
What does Parallax do then? What of comparable value do all the forumista pushing for P2 or death lose?
You have no problem dishing out 'sarcasm' and making a mountain out of a molehill regarding what was really a fair, open-to-everyone poll.
But someone dare ask you a legitimate question in a bit of a cheeky manner, and it's an all-out ad hominem attack...
I think if you read the next post after that one, you'd see where I also commented that someone's post about Ross seemed a rather poor strawman ambush on someone who has demonstrably given real value back to the Prop community.
I think a number of folks including myself have already given the advice that Ken was already probably going to be following next week anyway.
Disregard consensus polls, and opinions on the forum, and see what path your current large customers are willing to go along with.
Feel free to ignore my posts in the future.
RossH wrote: »
Yes, I thought Bill had established that the P16X32B would be between 2 and 5 times faster. We can argue whether it is closer to 2 than 5, but it is certainly not just "25%"
In any case, after reading through the last page or two of posts here, I believe I can honestly say two things:
I am not offended by anyone's posts. Everybody here has the right to disagree with my opinions, no matter how silly that makes them appear!
The P16X32B would appear to be a perfect fit for the immediate needs Parallax's customers have evinced - i.e. a code-compatible P1 with faster speed, extra RAM, extra I/O pins and better analog capabilities.
Bill Henning wrote: »
I gave a detailed response to David's question (sorry, I don't have the link handy; I will try to find it for you later) showing why a simple hubexec (no cache lines in hardware, only a 32-bit-wide hub bus, exactly like on a P1, which is the stated goal of P16E32) could only be slightly faster than LMM. Quite simply, it is due to hub windows and no cache.
Ken Gracey wrote: »
Those customers have asked for:
- more RAM
- faster speed
- code protect
. . .and those who didn't use it due to language choices would like efficient use of C.
pedward wrote: »
There is tons of code right now that depends on having 8 resources. Cutting them in half, then having to figure out how to mix 4 of your previously separate cogs into 1 cog is just making life more difficult for developers, the customers who buy the product. Let's not forget that if you divide the cogs by 2, you get half the counters, so for applications that previously required a lot of counters, they may not fit the 4 cog P2.
RossH wrote: »
I don't know what a P16E32 is. I am talking about Chip's proposed 200 MHz P16X32B. That will easily be much more than twice as fast as the P1.
pedward wrote: »
And this is why I'm going to say again: Reduce the complexity of the P2 COG so that you can reduce power footprint.
I can't imagine the power envelope wasn't modeled before the last shuttle run, so if the last shuttle run was PEP compatible with the old package, why the heck are we talking 4 cogs and potentially 4-5W at this time?
It seems so obvious to me that all the cheerleading has led the chip off course and into an area where all of these neat theoretical features have hamstrung the rest of the development objective.
I strongly recommend paring back the logic to just have hubex with a single cache line, get rid of the hubex logic for the other 3 threads, get rid of the task switching, and rollback any of the instruction complications that compromise manufacturability.
Right now there is talk of 4 cogs for the P2, 16 cogs for the P1B, this dichotomy is unreal. Yes the P2 has multi-threading, but do I have to remind everyone that this is achieved by interleaving the 4 pipeline stages? You divide the base instruction clock by 4, plus overhead due to threading (jumps, etc).
Hubex with 1 thread makes C really accessible and allows code to be generated easily -- making efficient use of multiple pipeline threads with compiled C code is going to be much more difficult and may not be achievable with the current GCC resources available. (I said *efficient*)
Having 4 cogs is going to mean just 4 processes for the vast majority of users, who just want to use the chip. Accessing the threading from a high level language will be possible, but it won't be clean looking at all.
I am quite disheartened that the P2 development seems to have derailed due to a crazy amount of suggestions since the last shuttle run. The objective of this period was to *fix* the P2 from the last shuttle run and remedy a couple of shortcomings.
I was initially against the hubex because I told Chip it would sideline development for 4 months to get it right. Well, in those 4 months a *lot* more has happened than just hubex, the kitchen sink made it into the P2, and now you can use it as a coffee warmer!!!
Yeah, Bill is going to complain about what I've written above, maybe jmg will weigh in too, but the bottom line is that the P2 as envisioned right now is *NOT MANUFACTURABLE*.
Simply cutting off appendages to make it fit a PEP is cutting away the trademark that made the Propeller special: 8 cogs. There is tons of code right now that depends on having 8 resources. Cutting them in half, then having to figure out how to mix 4 of your previously separate cogs into 1 cog is just making life more difficult for developers, the customers who buy the product. Let's not forget that if you divide the cogs by 2, you get half the counters, so for applications that previously required a lot of counters, they may not fit the 4 cog P2.
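As a rough illustration of the interleaving point in the post above (dividing the base instruction rate among the tasks sharing a cog, then subtracting threading overhead), here is a minimal sketch. The function name, the 160 MHz clock, the one-clock-per-instruction figure and the 10% overhead are all hypothetical values chosen for the arithmetic; none of them come from the thread or from a P2 specification.

```python
def per_task_mips(clock_mhz, clocks_per_instruction=1.0, tasks=4, overhead=0.0):
    """Estimate the instruction rate (MIPS) seen by ONE task.

    Assumes the tasks share the pipeline in equal, round-robin slots.
    `overhead` is the fraction of a task's slots lost to threading costs
    (jumps, etc.); it is an illustrative parameter, not a measured value.
    """
    if tasks < 1:
        raise ValueError("tasks must be at least 1")
    if not 0.0 <= overhead < 1.0:
        raise ValueError("overhead must be in [0, 1)")
    cog_mips = clock_mhz / clocks_per_instruction  # whole-cog instruction rate
    return cog_mips / tasks * (1.0 - overhead)     # one task's share


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only.
    print(per_task_mips(160, tasks=1))                # one task owns the cog
    print(per_task_mips(160, tasks=4))                # four equal tasks
    print(per_task_mips(160, tasks=4, overhead=0.1))  # four tasks, 10% lost
```

Under these assumptions each of four tasks runs at a quarter of the cog's rate or less, so four multi-tasked cogs offer the same aggregate throughput as before but split into slower streams.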
Bill Henning wrote: »
Roughly 2.25x to 2.5x faster than p1 lmm using simple hubexec (no new hub jump/call/ret instructions, no cache, 32 bit wide long hub access, just like p1)
Everybody here has the right to disagree with my opinions, no matter how silly that makes them appear!
RossH wrote: »
Ok, 2.5 for "bare-bones" LMM. I still think it will be more with caching and for CMM, but I'm not going to argue it since we can't actually benchmark it.
But 2.5x means it is significantly faster than the P1, and so it would seem that the P16X32B is exactly what Parallax's customers are clamoring for.
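For a sense of scale on the figures being traded here, a back-of-the-envelope sketch: if the instruction timing stays P1-like and only the clock changes, the speedup is just the clock ratio. The 200 MHz figure is the proposed clock mentioned in the thread; the 80 MHz P1 clock is an assumption (a common P1 configuration, not stated by any poster), and nothing here claims this is how the 2.25x to 2.5x estimate was actually derived.

```python
def clock_speedup(new_mhz, old_mhz):
    """Speedup from clock rate alone, assuming identical clocks per instruction."""
    if old_mhz <= 0 or new_mhz <= 0:
        raise ValueError("clock rates must be positive")
    return new_mhz / old_mhz


if __name__ == "__main__":
    # 200 MHz: proposed P16X32B clock quoted in the thread.
    # 80 MHz: ASSUMED P1 clock, used only for illustration.
    print(clock_speedup(200, 80))  # 2.5
```

Any caching or hub-access improvements would move the real number away from this clock-only ratio, which is why it cannot stand in for a benchmark.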
koehler wrote: »
Bill, I'm a big boy, I have no problem admitting my mistakes.
However, let's be honest, because literal-mindedness doesn't explain your post.
If it did, it would have been a bit different, as it's pretty clear from my post that my comment related to the value of posters vis-a-vis the big picture for Parallax, i.e. revenue and big capital outlays.
It's clear in the sentence directly after my naming yourself and Ross, and twice again in the paragraph below that.
Being literal, the sentence where I also explicitly say that it is so, unless you happen to be a big commercial customer, would have driven that home.
More likely what happened is you're one of the top guys on the forum.
It's acceptable to you to publicly dis someone else's poll as rigged or duplicitous.
However, if someone dares to question you, you take umbrage.
If you are literal-minded, you should recognize this as a bit double-standard-ish.
No worries, I fly off the handle sometimes before fully comprehending something too.
We can discuss/argue/ignore off thread.
No...AUX is needed for...
The premise of the Prop has been simplicity, and interruptless, single-threading functionality.
The P2 option now available appears to jettison that simplicity, and require multi-threading. Without useful interrupts at that.
Roy Eltham wrote: »
I'm really bummed right now, because going down to 4 cogs pretty much kills the P2 for me.
I really hate the idea of having to do the multitasking thing to get enough parallel stuff happening. Squeezing 4 cog drivers into 1 cog's memory is a drag. You also have to constantly be aware of the multitasking issues and limitations. It's going to be really un-fun to code.
Most of the "real" P1 projects I have worked on (or am working on) utilize 6-8 cogs, and several of them are using most or all of the cog memory. I'd MUCH prefer nuking a bunch of features and reducing HUB memory to keep 8 cogs on the P2.
For me 8 versus 4 cogs is the difference between coding on the P2 being fun versus it being a chore (that I don't want to do, and probably won't).
cgracey wrote: »
I agree. Four cogs feels very claustrophobic. It can be argued that they are so much more powerful than Prop1 cogs, but you'd have to carefully mix programs into them - which would NOT be fun. I like the feeling of being able to fire up another cog without any other contingencies.
cgracey wrote: »
Four cogs feels very claustrophobic. It can be argued that they are so much more powerful than Prop1 cogs, but you'd have to carefully mix programs into them - which would NOT be fun...
...That thing just needs a way smaller process to be viable. Trying to do it in 180nm is a mess of compromises.
jmg wrote: »
Then why not add some P1E Cogs to a group of P2 COGS?
Users get a superset of P1 they can easily ramp, and power users get some P2 COGs.
Hubexec comes in the P2, so it is not needed in the P1E, keeping things simple.
Brian Fairchild wrote: »
To be a viable chip the P2 has to move onto a smaller process and making that move will cost $$$.
Heater. wrote: »
Mixing up P1 and PII style COGS is a total nightmare of an idea.
Coley wrote: »
rogloh wrote: »
I've been wondering the same recently but there are some issues to consider....
For example, if you had a device with 4 P2 + 4 P1 COGs combined:
Pros:
- Could potentially reuse some existing P1 codebase, giving an instant step up for P2. Though I/O needs consideration and complicates this a whole lot.
- Extra P1 COGs could still do all deterministic I/O stuff, no hubexec, no tasking, leaving P2 for more powerful things
- Eases/delays the transition to P2 for existing P1 users
Cons:
- How do you boot the thing?
- Tools could be a total nightmare to manage if not integrated well, needing both P1 and P2 objects in your app using two instruction sets, which is very weird indeed
- Non-uniform system, requires careful planning to partition between P1 and P2 COGs for best performance
- New P2 users have to learn old P1 stuff as well as the new P2 stuff, quite a lot of potential baggage to deal with
- Might be a very complex hardware development to integrate COGs together, with more opportunities for mistakes/problems
- Still may not fit die size/budget
I think the cons do significantly outweigh the pros there.
jmg wrote: »
You have missed the biggest issue: 8 x P2 COGS is not a solution and 8 x P1 COGs is not a large enough step.
None of the cons you list are brick walls, in the same way a Power Envelope is a brick wall.
COGS timeslice into the HUB just like they do now. A P2 cog does not care if its neighbour is P1E or P2.
Yes it is non uniform, that is the strength - it allows a device to exist, that otherwise would not.
Yes, Software does the housekeeping stuff, that is what it is good at.