Oh there you go! Moving the goalposts! Great! Keep that Smile up, and there really will end up being a free lunch! Bet the powers that be temporally ban that too. Watch what you think!
I loved the BBC series! Very well done. NPR used to air it here, from time to time. I snagged it on cassette as a kid. Yes, maybe I will go get a copy. Really great radio theatre. Having read the books close to that time, the series had a great impact. Hmmm... where is that towel?
Actually, it will be able to do a lot of them. Skip MAME, just emulate the specific hardware.
@David: I enjoy your language related comments and often learn something new each time. Please reconsider, in moderation of course. The language wars here have frequently been enlightening to me personally. I enjoy the perspective.
I wonder if Chip has read Adams? Might have to ask one of these days, but not just now.
In my old age I'm beginning to think all languages are pretty much the same anyway so there is little reason to argue about them.
Oh well. Somebody will, and there is good stuff between the cracks of those things. Entertainment too! That's just how I roll.
I just want the FPGA. Chip will get it done, and then we can go for a test drive.
I suppose so. Maybe I'm just getting tired of everything. With these endless iterations of a never-ending set of obscure features it's hard to maintain interest or have any confidence that things will converge.
...I'm beginning to think all languages are pretty much the same anyway...
There speaks a guru.
Maybe I'm just getting tired of everything. With these endless iterations of a never-ending set of obscure features it's hard to maintain interest or have any confidence that things will converge.
Yep, know what you mean. Perhaps that's why we have devolved into the bizarre exchange that has been going on here.
And it's part of why I'm gonna kick off a picture contest for P1 this weekend. We need a distraction, or at least some cool stuff to consider. Maybe that can be one for some of us.
In any case, I feel all that too, and I felt it really strong when the other design proved out crappy given the process physics. We were close!! (and I still think that one needs a go in a suitable process)
There is one little difference on this one. For the first time, Chip is expressing some need to resolve it this go around. Notice how not all the decisions are getting completely fleshed out? Dialogs here among us haven't changed, but his interaction has. And he's expressed flow, and I think we all know what that is. I love flow, and when it happens, things get done. Often done well.
The most tiresome thing for me is sorting out what Chip is really doing from the massive amount of commentary surrounding all of that. I realize I get more frustrated over the idea of some things than there exists a reality of them happening this time around too.
Overall, it was the best when we had the other design running GCC, things were starting to click, and then...
I don't have a lot of support for this opinion, but what I do have is here. Not much. But I really do think it's gonna click this time. The hard learning is done, lots of great pieces to work from, process understood. Maybe we are in for some good times a few weeks from now.
Some tuning out is likely very good right now.
@Heater: Definitely true for me. Some humor, fun thoughts, musings? Good food for the soul. If I've been of any service, you are quite welcome. That was my intent, and I assumed it was yours and others too.
Honestly, this thing has taken me into ugly places recently. I almost never do that. And I like everybody here, so doing that sucks big. Some, actually most of us, have been building things, exchanging code, cracking jokes, and sharing thoughts for way too long for it to go bad. I would hate it personally.
I suppose so. Maybe I'm just getting tired of everything. With these endless iterations of a never-ending set of obscure features it's hard to maintain interest or have any confidence that things will converge.
I know exactly what you mean. I haven't been so close to giving up on the Propeller anytime in the last 5 years.
Yes, this is definitely a drawback of HALTP - If we draw down on future determinism now, we are actually increasing the entropy of the future universe faster - effectively shortening the life of the universe as a result.
In fact, I've just done a quick calculation, and if Parallax sells as many of these new chips as they are currently forecasting, the universe would effectively end around 2017.
Still - think of all the cool stuff we could do in the meantime! "Live for today" is my motto!
pedward: I tried to dissuade Chip from pursuing Hub execution and DDR RAM on the P2.
Atta boy. Me too. It almost feels like the technology available to us (DDR RAM, SRAM, manufacturing processes) continues to change and that if we try to keep up with it we will never finish. However, just like buying a computer it saves a bunch to be a half-generation or more behind the latest available process. The idea about crowd funding $2M through Kickstarter for a faster process was an interesting one, however.
I thought you were in favor of hub execution because it increased C performance which is one of the things your customers asked for. I guess I'm completely confused about what Parallax wants from this chip.
I'm not sure what happened here. It looks like several people including Ken deleted posts leaving my reply making little sense. Sorry.
Don't you hate when that happens. People don't realize that they cannot be so fast as to prevent posts being read and replied to ten times before they delete them on this forum. Still, it made sense to me.
Potatohead,
A little play seems the right thing to do.
Yep. There have been many disturbingly heated exchanges in these PII discussions as people propose, attack, defend, rinse, repeat, their preferred design choices. Getting far too personal at times.
Like Ross I can't help goofing off and putting up bizarro suggestions for light relief. Then we get attacked as being frivolous a holes.
Speaking of bizarro. Chip and Roy's new HUB arbiter scheme is so far removed from any kind of normality I just love it.
I want to think of a P1 project to work on that will help me forget P2 for a while...
Stared at that diagram for a while, just thinking... Of all things! Can't wait to give it a go. I think we all have concerns. Thing is, we all have or had concerns about the round robin too, and any of the schemes presented.
There are always concerns.
There will always be concerns too.
So, I'm going to roll with it. Get the FPGA, write some code and give it a go. Did that with the P1 long ago, and once it clicked, it clicked! Bet this one works the same way, just different enough to require an exercise or two to get sorted.
Some of the "fix" proposals mentioned are about maximizing it in some cases. Frankly, a lot of the more frustrating discussions have been or are surrounding some idea to maximize something or other. In the end, nearly every time, Chip has chosen some balanced, basic thing out of all of it.
Seems to me, this "mix master" scheme is common to every single thing mentioned! And it's fundamental, which I see as very important to Chip. He appreciates fundamental things when they can be realized. And by fundamental, I may mean "pure", in that the round robin was pure. It can get no simpler, and is a singular idea. This thing can get no simpler, and is a singular idea; namely, going from one serial round robin point to a parallel approach where all round robin points act together.
Brilliant, if a bit daunting to consider right now.
It's almost as if all the "make it work in parallel" discussion triggered a basic realization that the core of the P1 isn't actually parallel, but sequential. Fixed it in this one, didn't he?
Most of the time, I find myself thinking in terms of a singular event, my code on a COG, but the reality is we need to think in multiple events more, because that is what we will have going on now. So it gets slower on random accesses, bytes, words, longs, but every single COG is working at the same time now! 16 times more, "somewhat slower" things are going on concurrently!
Bet it plays out well. Hoping so. But we've got to bang on it some, just like we did the P1.
Speaking of bizarro. Chip and Roy's new HUB arbiter scheme is so far removed from any kind of normality I just love it
As do I. The devil is in the details, though. Chip said it required the same amount of silicon as two cogs to do the commutation -- I guess that means gobs of multiplexers. I can't begin to imagine the topological complexity of simulating a rotating memory on silicon that doesn't actually move! ... Hmmm ... I wonder if he's considered a MEMS implementation?
I'm not sure what happened here. It looks like several people including Ken deleted posts leaving my reply making little sense. Sorry.
The essence of your post and Ken's reply to pedward sort of explain a lot though.
After the Uber-P2 broke at 5W, the 'Way Forward' was proclaimed to be the 16 Core P1x, and the world was good.
Then, though there had been weeks of discussion and hundreds/thousands of posts on addressing the lack of bandwidth, it seems like going around the process put out by Ken caused this latest revision.
I get a very distinct 1984 impression at this point.
"You are a slow learner, Winston."
"How can I help it? How can I help but see what is in front of my eyes? Two and two are four."
"Sometimes, Winston. Sometimes they are five. Sometimes they are three. Sometimes they are all of them at once. You must try harder. It is not easy to become sane."
Well, it doesn't seem to be hubexec that is complicating things at this point. It's the new hub architecture. I'm sure it will be great though. Chip seems to have a good feel for what makes a good mix of features so I guess we'll have to wait to see what he comes up with.
That's the funny thing. Before, it was a feature that was potentially impacting; now it's the underlying infrastructure that is, and there are undoubtedly going to be more and more 'issues' that either require h/w fixes, or s/w snapcount widgets.....
Ken has a good feel for what are features that customers want and will pay for.
Unfortunately he acceded that he doesn't control Chip, and the rest is, well, history.
potatohead, LOL, I've been trying and sometimes doing the same. I feel your pain....
Not trying to be offensive, however I think this is one area where your video bias is only going to be able to effectively reach the choir.
You're a video guy through and through. It's your passion, no problem.
You've been arguing non-stop that all that matters is that P: Ship the P16 now, B/W doesn't really matter, and everything in the P2 must behave as expected time/determinism-wise as in the P1.
Now that huge theoretical B/W numbers have been shown, most of what you were arguing for, you seem to be arguing no longer matters. You're now even fine if it adds a year to product....
For a video guy that's to be expected, just as thinking in terms of 16 more 'normal' Cores makes me think Cluso's idea is more useful/profitable.
It's like arguing politics, we're just never going to change each other's minds.
In my old age I'm beginning to think all languages are pretty much the same anyway so there is little reason to argue about them.
I've long argued that all computer languages *ARE* the same, being constrained as they are by the underlying architecture. In many ways it's like spoken languages; they're only used by humans who all work the same way and who all have the same basic needs. Sure, the finer points of implementation might change a bit but they all express the same concepts. And current thinking seems to be that they all derive from a common source anyway.
I know exactly what you mean. I haven't been so close to giving up on the Propeller anytime in the last 5 years.
I am very close to giving up, sticking my dev kits on eBay and moving on.
Having watched the P2 development from the sidelines I *really* thought that an end was in sight, with silicon soon, when Chip made his '5W' post and 6 days later his 'New 16-Cog, 512KB, 64 analog I/O Propeller Chip' post.
The very harsh reality is that I, along with everyone else, am spoilt for choice in the 32-bit uC market. Chips that a few years ago were just a pipedream can now be had for $5 supported by $10 dev boards and free tools. Not a day goes by without an email from a distributor announcing a new chip and a new sub $10 board.
What's kept me interested is that the P1+, as originally outlined, was a perfect fit for how I design. My boards tend to have multiple uCs; need to drive a character LCD and read a keypad? Stick a small $1 20-pin uC in there and talk to it over SPI/I2C/USART. It's a great way to partition designs so that things can be developed incrementally and in a way that isolates functions from each other. As such the proposed 16-core chip looked great as it would allow me to keep the same design philosophy and use just one chip.
Ship the P16 now, B/W doesn't really matter, and everything in the P2 must behave as expected time/determinism-wise as in the P1.
I've argued it needs to get done. I've argued that we don't change the basic design idea of COG code always running the same on COGS. I didn't argue that P2 must behave as P1 did specifically.
Just to nit-pick a bit, I didn't say B/W didn't matter. What I did say was it could be had by using the COGS in parallel. Two different things.
The most potent argument I made against the various hub sharing schemes had to do with the fact that they would break one very basic strength of the Propeller and that is COG code runs on COGS equally. If you go back and read my words earlier, about the time Chip put tasks into the COG, Heater and I centered on the core unit of reuse being the COG. And that's where I've been at the whole time. Code running in one COG should not modulate code running in another COG, and with the round robin it doesn't. With the various table schemes, it does, because COGS perform differently, etc...
The most potent counter argument made was having the core unit of reuse be universal isn't worth the throughput, bandwidth and/or latency costs inherent in a 16 COG design. That is a very strong argument! A secondary, and very strong argument, was one of individual control over having the silicon be more solid in its behavior. Maximize the special case.
Finally, there was an overall desire to compromise put out there. I consider that a strong argument as well, and that's why I was looking to agree where I felt I could, and Cluso's pairing fell in to that idea, as did the super-cog. Though I liked mooch, it was problematic. No worries. The one time I put big text here, "I'm against all of it." was an expression of frustration over how complicated and painful HUB discussion had become.
This scheme is very interesting!
COG code still runs on COGS equally. There have been a couple of pathological cases shown where that's maybe not true, and I see that as a compromise!
For the most part, following a few basic rules Roy put out there means we can write code and put it on a COG, and it's going to perform just fine on any COG. Somebody can get down and dirty, one might say detailed, or pathological depending on how you want to characterize it, and create code that might only work well on a specific COG. Guess what? There is the individual control out there for those who want to do it. Funny how that works, isn't it?
Finally, this scheme takes what was a serial core nature and turns it into a parallel nature. Instead of just one COG having access, all of them round robin at the same time! That opens the door for throughput and bandwidth, though latency is needing some code work to validate IMHO.
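For anyone trying to picture "all of them round robin at the same time", here is a toy model in Python. It is only a sketch under my own assumptions, not Chip and Roy's actual design: I assume hub RAM is split into 16 banks, that the low bits of the long index pick the bank, and that on every clock cog c is connected to bank (c + clock) mod 16. The names `bank_for` and `clocks_until_access` are made up for illustration.

```python
N_COGS = 16  # assumed: 16 cogs and 16 hub banks, one bank per cog per clock


def bank_for(cog: int, clock: int) -> int:
    """Bank that `cog` is connected to at `clock` (assumed rotation direction)."""
    return (cog + clock) % N_COGS


def clocks_until_access(cog: int, clock: int, long_index: int) -> int:
    """Clocks `cog` must wait, starting at `clock`, before the bank
    holding `long_index` rotates around to it (0 means no wait)."""
    wanted_bank = long_index % N_COGS  # assumed: low bits select the bank
    return (wanted_bank - bank_for(cog, clock)) % N_COGS


if __name__ == "__main__":
    # Every cog has its own bank on every clock, so nobody blocks anybody.
    for clock in range(3):
        print(clock, [bank_for(cog, clock) for cog in range(N_COGS)])
    # A random access can cost up to 15 clocks of waiting in this model,
    # but consecutive longs line up with the rotation once you are in step.
    print(clocks_until_access(cog=3, clock=0, long_index=2))
```

In this model a random access waits anywhere from 0 to 15 clocks, while a run of consecutive longs arrives one per clock once the first one lands. That is the trade being discussed: somewhat slower random access, but on all COGS at once.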
Hubex code can go fast as well! Nice bonus. And, unlike the slot schemes, it can go fast on every single COG at the same time, if somebody wants to do that. Spiffy!
Sum that up, and it's a winner!
Nobody got everything they wanted, everybody gets a lot of what they wanted, and that takes me to the final bit:
I also made a strong argument about how doing things in parallel instead of serially is a core design idea behind how the Propeller works. Use the COGS together, etc... This has been both affirmed and rebuffed, depending on how people want to do things. We get a lot more COGS now, and we get the speed to actually use them in parallel too. Dynamically, if we want to. Additionally, that has been done successfully, and often easily on P1 for a long time now. It works.
Well, this design very strongly emphasizes parallel operation in that all of the COGS can be doing stuff all of the time. So we might get some slower code in cases where we could fine tune with the round robin, but those all happen at the same time now!
It's a 16x "level up" on using COGS together, and I can't wait to see how it plays out. I think it's going to play out in precisely the way Roy said, and that is we put the instructions NEEDED in with HUB operations, instead of packing them to the max as we do with P1. And the trade off for that slop is the fact that we get to do it at the same time on all the COGS, not possible on the P1, which frankly kicks the Smile out of the various slot sharing schemes in terms of just what can get done and how fast.
Yes. I like video related things. It's been a hobby of mine since the 80's. Notice I'm not hammering on Chip for advanced features this time around? Read that again, it's important.
There is a reason for that, and the reason is what he's got planned is modest and it will serve the purposes he's mentioned and that others here express either desire for, or use regularly. That's good enough.
Given how aggressive the other engine was, I think poking fun at me for doing video related things is unwarranted overall. And everybody thought it was possible in the process too. Can't forget that.
I will note others here have asked for some rather aggressive and expensive features over and over and over and over. Just saying. I didn't do that, and have rarely done that, for what should be obvious reasons.
Video isn't the only thing I do with Propellers, but it's one that I do share here and collaborate with others on, because it's fun. Part of being here is fun. Part of it is learning things. Part of it is to learn to get stuff I want done that I don't share here, and I've got my reasons. Many people do.
That all said, "so what?"
I'm here sharing what I think same as anybody else. We get to do that. Generally, when we are not beating one another up, we are better for doing that too.
May I remind you AGAIN, of that poll? It's near 3:1 against "normal cores" on that other thread, so you can fixate on me if you want to, but it's not very productive. That one is just not in the majority at present.
As for the HUB scheme, now that I've clarified my core arguments, along with other worthy ones out there, and how that scheme tends to mesh with them, it's time to let the FPGA get done and we write some code and check it out.
Until then, no amount of thrashing around will resolve anything. Have an open mind. We may well find this is excellent!
I was very excited by the Uber-P2 only to see it burn in the fires of power consumption.
Now I am not so sure what we are getting. Seems every time we turn around it is just some hack to work around some limitation in the original P1 that is a bottleneck instead of just FIXING the issue.
What is probably the #1 issue with the P1? Memory limitations. Primarily in the Cog. So just make cog ram bigger! Easy solution that every other micro does... Nope, can't, because we have a locked-in instruction design where everything has to be in a 32 bit register. Why? So we can have 'fun' self modifying code? The limitation is the D and S with 9 bits inline in the instruction. Just add real D and S registers and now that limitation goes away. Boom we can have cogs with more direct program usable memory. Chop the hub back to being a data sharing pool where you don't need to run programs out of it. Then you can go back to a simple round robin access method and not this Whirling cyclone of Data we now have.
Problem 2 is I/O pins. That is an easy fix: just add 32 (or better yet 48) more with the simple A/D that was asked for. But no, now we have to go with the ever expanding smart pins. They are a cool concept but they grow in complexity every day it seems.
Instead of eliminating bottlenecks by going with simple, and often tried and true solutions, we have another Prop monster waiting to hatch that will probably put us right back to where we were with Uber-P2.
I think Chip is brilliant! Problem is, as often is the case with people that amazingly smart, they need solid goal based management or they get so wrapped up in all of their ever expanding ideas that they never finish what they start.
The P2 development after all these years is a perfect example.
I would have been happy with a faster P1 core with a few new instructions (bit and pin control operators mainly like we had on the SX) and 16K or 32K cog ram for programs/data and more I/O with simple analog capabilities. Shared Hub ram could be 16K or 32K.
This is turning out to be the best thread on the P2 forum. I've got mixed feelings about a couple things:
- The new HUB scheme seems pretty cool but makes me a little nervous. It sounds like we can get very fast LMM (+/- hubexec)
- I really like composite video because the monitors are so cheap, but I'm afraid of the tangents and complexity to support it...
Overall, I think that once there is a chip on the way, Parallax will need to pay much, much more attention to software. Since SPIN will likely be compiled rather than interpreted, I'd like to see memory usage and procedural compatibility with GCC, so we can use objects interchangeably between languages. I like SPIN, but C is the lingua franca. We cannot afford to have multiple OBEX or code libraries.
Just add real D and S registers and now that limitation goes away. Boom we can have cogs with more direct program usable memory.
I'm curious to know what you mean exactly by that.
On every processor architecture we have instructions which contain multiple fields. These fields indicate the operation to be performed and one or more operands that the operations should apply to. Often two operands, source and destination.
These operand fields need to exist in every single instruction, or often a large majority of them.
So, when you say "add real D and S registers" I'd be interested to see what your instruction encoding format would look like.
For sure it is a totally different machine to the Propeller.
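To make the field discussion concrete, here is a small Python sketch of the existing P1 instruction layout as I understand it from the Propeller manual: a 6-bit opcode, the four Z/C/R/I flag bits, a 4-bit condition, then 9-bit D and S fields. The function names are mine and purely illustrative. The point is that the 9-bit D and S fields are exactly what caps a cog at 512 directly addressable registers.

```python
def encode_p1(opcode: int, zcri: int, cond: int, d: int, s: int) -> int:
    """Pack the five fields into one 32-bit P1-style instruction word."""
    assert 0 <= opcode < 64 and 0 <= zcri < 16 and 0 <= cond < 16
    assert 0 <= d < 512 and 0 <= s < 512  # 9 bits each: the cog RAM limit
    return (opcode << 26) | (zcri << 22) | (cond << 18) | (d << 9) | s


def decode_p1(instr: int) -> dict:
    """Split a 32-bit instruction word back into its fields."""
    return {
        "opcode": (instr >> 26) & 0x3F,  # bits 31..26
        "zcri": (instr >> 22) & 0xF,     # bits 25..22
        "cond": (instr >> 18) & 0xF,     # bits 21..18
        "d": (instr >> 9) & 0x1FF,       # bits 17..9
        "s": instr & 0x1FF,              # bits 8..0
    }


if __name__ == "__main__":
    word = encode_p1(opcode=0x20, zcri=0b0010, cond=0b1111, d=0x1F0, s=0x005)
    print(hex(word), decode_p1(word))
```

Every bit of the 32 is already spoken for, so widening D and S means either a wider instruction word or some form of indirection.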
Since SPIN will likely be compiled rather than interpreted, I'd like to see memory usage and procedural compatibility with GCC, so we can use objects interchangeably between languages. I like SPIN, but C is the lingua franca. We cannot afford to have multiple OBEX or code libraries.
I agree with this and I think it should be possible if there is a will to do it.
First, let me state that I am not arguing in favor of this. I would be quite happy with the current 16 cogs and the hub that Chip proposed. IMHO once folks get familiar with it we will have a repeat of the amazing things the P1 was found to be capable of. Just saying it's not that hard to do.
A cog could address up to 64K of registers by having a 16 bit address counter and using the D or S field as a pointer to one of the first 512 memory locations, and that register would contain 16 bit D and S values for the instruction. It does not even slow down execution since fetching the register data can overlap with the instruction decode. Yes, it would need some other changes, but they would be relatively simple.
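A rough Python sketch of that indirection, just to show the idea. The packing is my assumption (D address in the upper 16 bits of the pointed-to register, S address in the lower 16), and `resolve_operands` is a made-up name; the post above does not specify either.

```python
COG_LONGS = 1 << 16   # 64K registers reachable with 16-bit addresses
POINTER_RANGE = 512   # a 9-bit instruction field can only name the first 512


def resolve_operands(cog_ram: list, pointer9: int) -> tuple:
    """Follow a 9-bit field to a register holding the real 16-bit D and S.

    Assumed packing: bits 31..16 = D address, bits 15..0 = S address.
    """
    if not 0 <= pointer9 < POINTER_RANGE:
        raise ValueError("pointer must fit in the 9-bit instruction field")
    word = cog_ram[pointer9]
    d_addr = (word >> 16) & 0xFFFF
    s_addr = word & 0xFFFF
    return d_addr, s_addr


if __name__ == "__main__":
    cog_ram = [0] * COG_LONGS
    # Register 5 describes an operation on registers far beyond the first 512.
    cog_ram[5] = (0x1234 << 16) | 0xABCD
    d_addr, s_addr = resolve_operands(cog_ram, 5)
    print(hex(d_addr), hex(s_addr))
```

The cost is one extra register read per instruction, which the post suggests could be overlapped with instruction decode.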
Half Life 3 confirmed!
Been done - see here. Didn't work out too well. Explosion of sun only just avoided. Best not to mess with this parallel universe stuff.
Ross.
I know exactly what you mean. I haven't been so close to giving up on the Propeller anytime in the last 5 years.
Ross.
Guru! What a great name for a new programming language!
Completely unexpected! I had the same response!
-Phil
The essence of your post and Ken's reply to pedward sort of explain a lot though.
After the Uber-P2 broke at 5W, the 'Way Forward' was proclaimed to be the 16 Core P1x, and the world was good.
Then, though there had been weeks of discussion and hundreds/thousands of posts on addressing the lack of bandwidth,
seems like going around the process put out by Ken caused this latest revision.
I get a very distinct 1984 impression at this point.
"You are a slow learner, Winston."
"How can I help it? How can I help but see what is in front of my eyes? Two and two are four."
"Sometimes, Winston. Sometimes they are five. Sometimes they are three. Sometimes they are all of them at once. You must try harder. It is not easy to become sane."
Doesn't matter in the scheme of things. Sorry.
That's the funny thing. Before, it was a feature that was potentially impacting; now it's the underlying infrastructure that is, and there are undoubtedly going to be more and more 'issues' that either require h/w fixes, or s/w snapcount widgets.....
Ken has a good feel for what are features that customers want and will pay for.
Unfortunately he conceded that he doesn't control Chip, and the rest is, well, history.
potatohead, LOL, I've been trying and sometimes doing the same. I feel your pain....
Not trying to be offensive, however I think this is one area where your video bias is only going to be able to effectively reach the choir.
You're a video guy through and through. It's your passion, no problem.
You've been arguing non-stop that all that matters is that P: Ship the P16 now, B/W doesn't really matter, and everything in the P2 must behave as expected time/determinism-wise as in the P1.
Now that huge theoretical B/W numbers have been shown, most of what you were arguing for, you seem to be arguing no longer matters. You're now even fine if it adds a year to product....
For a video guy that's to be expected, just as being someone who wants 16 more 'normal' Cores makes me think Cluso's idea is more useful/profitable.
It's like arguing politics, we're just never going to change each other's minds.
I've long argued that all computer languages *ARE* the same, being constrained as they are by the underlying architecture. In many ways it's like spoken languages; they're only used by humans who all work the same way and who all have the same basic needs. Sure, the finer points of implementation might change a bit but they all express the same concepts. And current thinking seems to be that they all derive from a common source anyway.
I am very close to giving up, sticking my dev kits on eBay and moving on.
Having watched the P2 development from the sidelines I *really* thought that an end was in sight, with silicon soon, when Chip made his '5W' post and 6 days later his 'New 16-Cog, 512KB, 64 analog I/O Propeller Chip' post.
The very harsh reality is that I, along with everyone else, am spoilt for choice in the 32-bit uC market. Chips that a few years ago were just a pipedream can now be had for $5, supported by $10 dev boards and free tools. Not a day goes by without an email from a distributor announcing a new chip and a new sub $10 board.
What's kept me interested is that the P1+, as originally outlined, was a perfect fit for how I design. My boards tend to have multiple uCs; need to drive a character LCD and read a keypad? Stick a small $1 20-pin uC in there and talk to it over SPI/I2C/USART. It's a great way to partition designs so that things can be developed incrementally and in a way that isolates functions from each other. As such the proposed 16-core chip looked great as it would allow me to keep the same design philosophy and use just one chip.
I think this stems from the fact that they don't know.
I've argued it needs to get done. I've argued that we don't change the basic design idea of COG code always running the same on COGS. I didn't argue that P2 must behave as P1 did specifically.
Just to nit-pick a bit, I didn't say B/W didn't matter. What I did say was it could be had by using the COGS in parallel. Two different things.
The most potent argument I made against the various hub sharing schemes had to do with the fact that they would break one very basic strength of the Propeller and that is COG code runs on COGS equally. If you go back and read my words earlier, about the time Chip put tasks into the COG, Heater and I centered on the core unit of reuse being the COG. And that's where I've been at the whole time. Code running in one COG should not modulate code running in another COG, and with the round robin it doesn't. With the various table schemes, it does, because COGS perform differently, etc...
The most potent counter argument made was that having the core unit of reuse be universal isn't worth the throughput, bandwidth and/or latency costs inherent in a 16 COG design. That is a very strong argument! A secondary, and very strong argument, was one of individual control over having the silicon be more solid in its behavior. Maximize the special case.
Finally, there was an overall desire to compromise put out there. I consider that a strong argument as well, and that's why I was looking to agree where I felt I could, and Cluso's pairing fell in to that idea, as did the super-cog. Though I liked mooch, it was problematic. No worries. The one time I put big text here, "I'm against all of it." was an expression of frustration over how complicated and painful HUB discussion had become.
This scheme is very interesting!
COG code still runs on COGS equally. There have been a couple of pathological cases shown where that's maybe not true, and I see that as a compromise!
For the most part, following a few basic rules Roy put out there means we can write code and put it on a COG, and it's going to perform just fine on any COG. Somebody can get down and dirty, one might say detailed, or pathological depending on how you want to characterize it, and create code that might only work well on a specific COG. Guess what? There is the individual control out there for those who want to do it. Funny how that works, isn't it?
Finally, this scheme takes what was a serial core nature and turns it into a parallel nature. Instead of just one COG having access, all of them round robin at the same time! That opens the door for throughput and bandwidth, though latency is needing some code work to validate IMHO.
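To make the "everybody round robins at the same time" idea a bit more concrete, here is a rough Python sketch. It assumes (my guess at the mechanism, not a spec from Chip) that hub RAM is split into 16 banks interleaved by long address, and that on every clock each COG is connected to a different bank, with the pairing rotating by one each clock. The names `bank_for` and `wait_clocks` are made up for illustration.

```python
NUM_COGS = 16  # COG count from the thread; one hub bank per COG is an assumption


def bank_for(cog, clock):
    """Bank that `cog` is connected to on `clock` (assumed rotation)."""
    return (cog + clock) % NUM_COGS


def wait_clocks(cog, clock, long_addr):
    """Clocks `cog` waits at `clock` until the bank holding `long_addr` comes around."""
    target = long_addr % NUM_COGS  # assumed: banks interleaved by long address
    return (target - bank_for(cog, clock)) % NUM_COGS


# On every clock the 16 COGs sit on 16 different banks, so all can transfer at once.
for clock in range(NUM_COGS):
    assert len({bank_for(c, clock) for c in range(NUM_COGS)}) == NUM_COGS

# Sequential longs: after the first wait, each next address is ready on the next clock.
first = wait_clocks(3, 0, 100)
sequential = [wait_clocks(3, first + i, 100 + i) for i in range(8)]

# Random access: the wait can be anywhere from 0 to 15 clocks.
worst = max(wait_clocks(3, 0, addr) for addr in range(64))
```

Under those assumptions, block transfers stream with no waiting once they are lined up, while a single random access can wait up to 15 clocks. That is the "somewhat slower, but on every COG at once" trade, and the latency side is the part that needs real code on the FPGA to validate.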
Hubex code can go fast as well! Nice bonus. And, unlike the slot schemes, it can go fast on every single COG at the same time, if somebody wants to do that. Spiffy!
Sum that up, and it's a winner!
Nobody got everything they wanted, everybody gets a lot of what they wanted, and that takes me to the final bit:
I also made a strong argument about how doing things in parallel instead of serially is a core design idea behind how the Propeller works. Use the COGS together, etc... This has been both affirmed and rebuffed, depending on how people want to do things. We get a lot more COGS now, and we get the speed to actually use them in parallel too. Dynamically, if we want to. Additionally, that has been done successfully, and often easily on P1 for a long time now. It works.
Well, this design very strongly emphasizes parallel operation in that all of the COGS can be doing stuff all of the time. So we might get some slower code in cases where we could fine tune with the round robin, but those all happen at the same time now!
It's a 16x "level up" on using COGS together, and I can't wait to see how it plays out. I think it's going to play out in precisely the way Roy said, and that is we put the instructions NEEDED in with HUB operations, instead of packing them to the max as we do with P1. And the trade off for that slop is the fact that we get to do it at the same time on all the COGS, not possible on the P1, which frankly kicks the Smile out of the various slot sharing schemes in terms of just what can get done and how fast.
Yes. I like video related things. It's been a hobby of mine since the 80's. Notice I'm not hammering on Chip for advanced features this time around? Read that again, it's important.
There is a reason for that, and the reason is what he's got planned is modest and it will serve the purposes he's mentioned and that others here express either desire for, or use regularly. That's good enough.
Given how aggressive the other engine was, I think poking fun at me for doing video related things is unwarranted overall. And everybody thought it was possible in the process too. Can't forget that.
I will note others here have asked for some rather aggressive and expensive features over and over and over and over. Just saying. I didn't do that, and have rarely done that, for what should be obvious reasons.
Video isn't the only thing I do with Propellers, but it's one that I do share here and collaborate with others on, because it's fun. Part of being here is fun. Part of it is learning things. Part of it is to learn to get stuff I want done that I don't share here, and I've got my reasons. Many people do.
That all said, "so what?"
I'm here sharing what I think same as anybody else. We get to do that. Generally, when we are not beating one another up, we are better for doing that too.
May I remind you AGAIN, of that poll? It's near 3:1 against "normal cores" on that other thread, so you can fixate on me if you want to, but it's not very productive. That one is just not in the majority at present.
As for the HUB scheme, now that I've clarified my core arguments, along with other worthy ones out there, and how that scheme tends to mesh with them, it's time to let the FPGA get done and we write some code and check it out.
Until then, no amount of thrashing around will resolve anything. Have an open mind. We may well find this is excellent!
---says, "spud" that crazy video guy. lol
Now I am not so sure what we are getting. Seems every time we turn around it is just some hack to work around some limitation in the original P1 that is a bottleneck instead of just FIXING the issue.
What is probably the #1 issue with the P1? Memory limitations. Primarily in the Cog. So just make cog ram bigger! Easy solution that every other micro does... Nope, can't, because we have a locked-in instruction design where everything has to be in a 32 bit register. Why? So we can have 'fun' self modifying code? The limitation is the D and S with 9 bits inline in the instruction. Just add real D and S registers and now that limitation goes away. Boom, we can have cogs with more direct program usable memory. Chop the hub back to being a data sharing pool where you don't need to run programs out of it. Then you can go back to a simple round robin access method and not this Whirling cyclone of Data we now have.
Problem 2 is I/O pins. That is an easy fix: just add 32 (or better yet 48) more with the simple A/D that was asked for. But no, now we have to go with the ever expanding smart pins. They are a cool concept but they grow in complexity every day it seems.
Instead of eliminating bottlenecks by going with simple, and often tried and true solutions, we have another Prop monster waiting to hatch that will probably put us right back to where we were with Uber-P2.
I think Chip is brilliant! Problem is, as often is the case with people that amazingly smart, they need solid goal based management or they get so wrapped up in all of their ever expanding ideas that they never finish what they start.
The P2 development after all these years is a perfect example.
I would have been happy with a faster P1 core with a few new instructions (bit and pin control operators mainly like we had on the SX) and 16K or 32K cog ram for programs/data and more I/O with simple analog capabilities. Shared Hub ram could be 16K or 32K.
- The new HUB scheme seems pretty cool but makes me a little nervous. It sounds like we can get very fast LMM (+/- hubexec)
- I really like composite video because the monitors are so cheap, but I'm afraid of the tangents and complexity to support it...
Overall, I think that once there is a chip on the way, Parallax will need to pay much, much more attention to software. Since SPIN will likely be compiled rather than interpreted, I'd like to see memory usage and procedural compatibility with GCC, so we can use objects interchangeably between languages. I like SPIN, but C is the lingua franca. We cannot afford to have multiple OBEX or code libraries.
On every processor architecture we have instructions which contain multiple fields. These fields indicate the operation to be performed and one or more operands that the operations should apply to. Often two operands, source and destination.
These operand fields need to exist in every single instruction, or often a large majority of them.
So, when you say "add real D and S registers" I'd be interested to see what your instruction encoding format would look like.
For sure it is a totally different machine to the Propeller.
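For reference, here is a small Python sketch of how the P1 splits its 32 bit instruction word: 6 bits of opcode, 4 flag bits (ZCRI), 4 condition bits, then the 9 bit destination and 9 bit source fields under discussion. The helper names and the example values are mine, picked only to exercise the fields.

```python
INSTR_BITS = 32
# P1 field layout, most significant field first: (name, width in bits)
FIELD_WIDTHS = [("opcode", 6), ("zcri", 4), ("cond", 4), ("dest", 9), ("src", 9)]


def encode(**fields):
    """Pack the named fields into one 32 bit instruction word."""
    instr = 0
    for name, width in FIELD_WIDTHS:
        value = fields[name]
        if value < 0 or value >> width:
            raise ValueError(f"{name} does not fit in {width} bits")
        instr = (instr << width) | value
    return instr


def decode(instr):
    """Split a 32 bit instruction word back into its fields."""
    fields = {}
    shift = INSTR_BITS
    for name, width in FIELD_WIDTHS:
        shift -= width
        fields[name] = (instr >> shift) & ((1 << width) - 1)
    return fields


# Arbitrary example values, not meant to be any particular instruction.
word = encode(opcode=0b100000, zcri=0b0010, cond=0b1111, dest=0x1F0, src=0x005)
fields = decode(word)

addressable_registers = 1 << 9              # what a 9 bit field can reach
bits_left_with_16_bit_d_and_s = INSTR_BITS - 2 * 16
```

A 9 bit field reaches 512 registers, which is the cog RAM limit being complained about. Widening both D and S to 16 bits inline would use all 32 bits and leave nothing for the opcode, flags, or condition, which is why the encoding question matters.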
A cog could address up to 64K of registers by having a 16 bit address counter and using the D or S field as a pointer to one of the first 512 memory locations, and that register would contain 16 bit D and S values for the instruction. It does not even slow down execution since fetching the register data can overlap with the instruction decode. Yes, it would need some other changes, but they would be relatively simple.
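A toy Python model of that indirection might look like the following. It is only a sketch of one reading of the idea: the 9 bit field selects a pointer register in the first 512 locations, and that register holds a 16 bit D in its upper half and a 16 bit S in its lower half. The packing order, the function names, and the toy ADD are all assumptions for illustration.

```python
COG_LONGS = 1 << 16      # 64K registers reachable through 16 bit addresses
POINTER_LONGS = 1 << 9   # the 9 bit field can only select one of the first 512


def fetch_operands(mem, ptr_field):
    """Return the (D, S) addresses packed in the pointer register (assumed layout)."""
    if not 0 <= ptr_field < POINTER_LONGS:
        raise ValueError("pointer field must fit in 9 bits")
    packed = mem[ptr_field]
    return (packed >> 16) & 0xFFFF, packed & 0xFFFF


def exec_add(mem, ptr_field):
    """Toy ADD: mem[D] += mem[S], with D and S fetched through the pointer register."""
    d, s = fetch_operands(mem, ptr_field)
    mem[d] = (mem[d] + mem[s]) & 0xFFFFFFFF


mem = [0] * COG_LONGS
mem[0x9000] = 40                   # destination operand, far beyond the first 512
mem[0xFFFF] = 2                    # source operand at the very top of the 64K
mem[7] = (0x9000 << 16) | 0xFFFF   # pointer register 7 holds the packed D and S
exec_add(mem, 7)
result = mem[0x9000]
```

The cost shows up in `fetch_operands`: every instruction needs one extra register read before it can touch its real operands, which is the fetch the post suggests could overlap with instruction decode.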