Lots of COGs is a much better fit with the 'soft-peripheral microcontroller' model than a small number of more powerful COGs. You want a USART; it's a COG. You want a 32-bit timer; it's a COG. You want an ADC; it's a COG.
It is still smarter to do bit level stuff in Silicon, not software.
Hopefully the P2 timer can drop in, to give true PWM and capture. (to 200MHz)
Power is still going to matter on this part, and 32 COGs have a budget of just ~100mW each, before you blow past 3W.
Code size is still limited to ~500 longs per cog, and you want small code and HW waits to keep the average power down.
I did not follow the Video section. I presume we are talking about P1 video, but with inbuilt DACs?
If it's the P1 video, we can already use the video driver to output serial. Could the video hardware be modified simply so that it can read serially too? Then we have a very basic SERDES. Postedit: It's not necessary though, just perhaps nice. This chip is so much faster that dedicating a cog to serialise the input is quite simple.
I'm guessing that in the usage cases where video will be used with a Propeller that large screen, hi-rez (1080P) monitors are not the most common target. Let the chips with built-in HDMI handle those.
I'd agree, Moving pictures, and > 1000 pixels, can be HDMI.
What about for lower video resolutions, such as VGA (640x480) or WVGA (800x480), could they be handled by one cog?
...
Small displays have really come down in price. And from the ads I've seen from suppliers, the VGA interface is FAR FROM DEAD! I'd be surprised if it wasn't currently growing.
As well as the VGA standard, the obvious next step is direct LCD connection, so the Prop effectively swallows/includes all the functionality of a SSD1963.
ie Design it to drive 800x480 LCD (and all smaller ones) directly.
The Prop1 cogs are super minimal and you don't have to feel like you're wasting one by having it do almost nothing. I'm thinking that if there are 32 of them, people will get over their inhibitions about using them for small jobs. When they WAIT, they take no power, too, so they're economical.
There are also choices between 16 and 32, should Die area, or Power ceilings come to spoil the party. (as they tend to do)
eg I would rather have more RAM, and 24 COGS, than 32 COGS.
JRetSapDoog, the original P1 could handle VGA 640x480, drawing 4 different coloured pixels per 320x480, and the new one is 5x faster, though the hub speed-up is only half that, so at 2.5x faster it's still fast enough for 640x480 with a different colour per pixel.
Chip, to go for the 32 cogs, could you do 2x16 cogs with 256KB each? That way they still get the 1:16 access and have an inner-prop comms channel. Maybe have cog 16 in each group with a direct port; it won't matter if we throw two cogs out of 32 at inner-prop comms, acting like the SDRAM driver cog we had initially in P2.
I don't know... I sleep over night, and you guys went crazy in the opium den! haha but I'm liking the outcome!
Right, but as I recall, the stated design for the video circuitry of the P2 over the last year or two didn't allow feeding multiple pins for one color. The video circuitry took the data straight to a DAC connected with one pin each for R, G and B (and one pin for H, I think, with V being handled separately in software). There is no provision to break the data out into chunks of 8 or however many bits for direct connection (i.e., w/o D/A), unlike on the P1. I was a little confused about what style of video we were talking about in this latest speculation, hence my question. I agree that it'd be nice to also allow direct RGB connections to bypass the need for VGA boards or LCD driver boards like the SSD1963 (which is quite popular but somewhat in need of a revamp). Edit: I'm talking about when using waitvid, of course.
It'd be interesting to know what a P1 COG, in the OnSemi process, consumes. I guess this is one of the most important questions Chip needs to ask them before we go too far down this route.
Yes, it's been a hectic day!!
BTW With 16 cogs we still get hub access every 8 instructions so the whole enchilada is 5* faster
With 32 cogs there are some alternate possibilities so I have put up some ideas.
I am even more excited than when Chip got Hubexec on P2 working! Not sure if it is because this suits me better, or that we could get it imminently.
I kind of hate to make multiple hubs, but 1/32 hub access is pretty slow. Maybe THIS is where some hub-priority assignments are absolutely needed. A universal 512KB hub would be great.
Wow. I picked a really bad time to stop reading the forums! What a crazy, crazy turn of events! I, for one, would love to see this 32-cog P1 with analog I/O. As for the hub access, could you use an approach like I discuss in one of my blog entries? You lose determinism, but are still guaranteed fairness and a predictable worst-case scenario.
But just how imminent is "imminently"? That's the question. Seriously, that is a good question!
I wonder what the best-case (eh, best reasonable case) turn-around time would be for such a chip. Those fab time windows for samples are kind of infrequent. But, for example, if a chip "had to" be designed and sampled within, say, four months, could it be done? Okay, okay, by the end of the year? With all the experience gained and design work accomplished, Chip/Parallax might well be able to crank a chip out. And if it's positioned between the P1 and the P2 (P3?), it would diversify the product catalog of Parallax Semiconductor.
I'd be willing to (paid) pre-order 100 chips if I were reasonably confident that they'd work or perhaps be given the option for store credit if a new chip wasn't released within a given time frame (that might be an alternative to a Kickstarter-type of crowd-sourcing).
I'm liking the 100 pin TQFP: very manageable both in size and production, and 64 I/O is more than sufficient, whereas 32 has many times just fallen short. When it comes to video there will NEVER be enough RAM, but 512K is good too. 16 cogs should be more than sufficient; however, the throwaway nature of a cog when they are had in abundance translates into simplified coding. If 32 is just as possible and practical then let it be.
Now, there is one other thing that I don't know if it's ever been touched upon, and that's the OR'ing of the I/O. Many signals are by nature active low, an inheritance from TTL days I believe and this is at odds with the OR'ing nature of the I/O. If one cog deasserts a signal such as a chip select it must take it high however this means that no other cog can have access to that pin now because it cannot pull it low. Would there be a mechanism that exists that would either allow us to specify OR or AND for a particular pin? (or maybe just a global DIR). I know also with serial transmit that I would like another cog to also transmit data but it can't pull the line low etc. Is there any possibility of this OR or AND or has that already been taken care of?
Sure there's a long list of desirables but I won't go there, this has got to be reasonable and practical for real silicon to materialize sometime soon.
Peter, surely any cog having control of an I/O pin as per P1, would be the only one to be able to affect the 0/1 state of the pin, as the other cogs setting 0 won't make it go to 0, if your driving cog has set it to one! Or am I just mis-understanding your point?
I use DIR to set a pin low, with a pullup high. That way, a number of cogs can alternate the use of a shared resource such as SRAM.
But, it seems we might get the I/O features of the P2 which means there is inversion available and pullups/pulldowns with various strengths. I'm not sure if Chip has thought this thru' yet.
There is no commitment for this yet, but it could certainly be realised quickly.
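A minimal Python sketch of the pin-sharing point above, assuming the P1-style model where each cog's OUT bit is gated by its DIR bit and the gated outputs of all cogs are ORed onto the pin. The `pin_state` helper and the external pull-up are illustrative assumptions, not anything confirmed for the new chip.

```python
def pin_state(cogs, pullup=True):
    """Resolve one pin from a list of per-cog (dir_bit, out_bit) pairs.

    Assumed model: a cog drives the pin only while its DIR bit is 1, and
    the driven level is the OR of the OUT bits of all driving cogs.
    With no cog driving, an external pull-up (if fitted) reads as 1.
    """
    driving = [out for direction, out in cogs if direction]
    if not driving:
        return 1 if pullup else None  # None = floating pin
    return 1 if any(driving) else 0


# Scheme 1: cog A parks an active-low chip select high by driving OUT=1.
# Cog B then tries to pull it low, but the OR keeps the pin high.
print(pin_state([(1, 1), (1, 0)]))  # 1

# Scheme 2: every cog leaves OUT=0 and toggles DIR instead.
print(pin_state([(0, 0), (0, 0)]))  # 1 (all released, pull-up wins)
print(pin_state([(0, 0), (1, 0)]))  # 0 (cog B asserts the select)
```

The second scheme behaves like an open-drain bus, which is why several cogs can take turns on a shared active-low line.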
Sorry for being sarcastic, but all this seems to me as a classical case of second system effect. I love the propeller, most of my products contain one or more. I love this forum, it has been very helpful to find solutions. But it's definitely not a good base for making decisions that have a considerable impact on the future of the whole Parallax company.
My advice:
1. If you have to make a major design or marketing decision, don't ask the public. While I don't doubt that there are many clever experts here, this usually leads to long discussions (26 pages!) that makes the decision more difficult instead of easier. You can't please them all and you don't have to. If you want to, ask 10 of your best customers or who you think could be a potential customer for the future product. But even then, if you think they tell nonsense, do what your intuition tells you.
2. If a decision is difficult because all options have nearly equal advantages and disadvantages, then it doesn't matter which option you take. There's only one big mistake you can make: doing nothing. The P1 has been out for ~10 years now. It is still superior to many other µPs in some applications. But the competitors don't sleep. In another ten years there will surely be some enthusiasts who still use it. But you won't make much money with it any longer.
My personal opinion about the original subject:
5W for running all cogs at 160MHz or more is no issue. If you really need computing power you usually have power line supply. And a small heatsink or heat spreader is acceptable as long as you don't need a fan. Battery powered applications can still reduce clock speed to trade speed against current draw. For small embedded systems the number of different supply voltages you need costs more than a watt more or less.
BGA packages are also no problem as long as the pitch is not too small. 1.27 or 1mm pitch can still be soldered by hand (hot air gun) or inexpensive P&P machines. Most of my boards are 4 layer. You can reach the outer 4 ball rows without the need for filled micro vias and >4 layers, so up to say 256 pin BGA is OK if the most important signals are in the outer 4 rows. For breadboard setups and prototyping DIP-modules can be used.
So please don't cripple the design. Don't make any changes not absolutely necessary and get something produced as soon as possible. I currently buy ~2k P1s per year and as long as the performance/price ratio is right and it has the same "fun-factor" programming it like I'm used to with the P1 I'd love to choose it for new designs.
I don't mean to be rude, however this is becoming a sore point in both the wasted bandwidth and rather flippant royal "we"-ness, which is really rather denigrating to those at Parallax who've actually been doing the hard work and sweating it.
It's great to hear how 'we' all did the hard part, discussing options on the forum.
Chip? Oh he just waves his hands and everything comes together all magic-like....
Did someone say shrink? Heck yeah, put me down for a shrink too.
We'll just have Parallax mortgage the firm, or get a kickstarter going....yeah, that's the ticket.
Between this, and people pushing for retro-grade P1B from a previous decade, or cutting the P2 into 4 Core eunuchs, it's like people have stopped taking their Adderall.
Before we have any real data to work with, people who are not IC Designers, but now play them on this forum, are deciding what to slice off of the current design. And this is after a few posters made some pretty simple observations about what will and what will not actually save 'power'.
There are a couple of experts at Parallax, and a couple on this forum.
Maybe people who want to push for their favorite P1B, or go on at length on all sorts of tangents could start their own threads?
We could then perhaps keep this one a bit more narrowly focused on the issue at hand, and give Chip a break from numerous questiona arcana, and basically get on with dealing with the problem instead of inventing all sorts of useless scenarios?
I'd apologize again, however I won't.
I get the distinct impression from a number here that their main/only concern is that their personal hobby is going to be impacted.
My impression is that Parallax may be facing a more existential threat than that.
The Prop, on account of the power draw, is on the verge of being booted out of any uController design by most if not all such designers. Its power draw places it squarely in the PSOC family, and there is arguably more competition there than in the uC space. Even if it's only 3W, how it ever got to that point shows a design process failure.
A 5W uController is quite likely to FAIL. Not be a slow seller, or low margin, just a $0 revenue dud.
It doesn't mean squat if several hundred devotees buy 1, 2 or 10 if they want to show their support.
That won't be enough to order another run of production wafers.
How some of you expect Parallax to remain healthy and hale is beyond me, unless their other products can fill the revenue hole.
Yeah, if it's enough. I'm totally up for a shrink, and in fact, think the shrink is the necessary thing to continue with volume in hobby land. I don't think it's needed for other opportunities.
With it shrunk, will it be attractive enough?
That discussion is as important as the other ones. And it also means some feature commits and a date too. Things all necessary at this point.
Hi Seairth,
Yes, you missed a lot today! Looks like we may get a 512kb, 32 cog, 64 I/O pin (w/ADC) QuadProp. The rest we can do in software until the P2 comes along.
However, I think losing the determinism is not such a good idea at this point - it was a major feature of the P1, and it is a feature that is very hard to compensate for in software.
While all of this P1 variant discussion is interesting does it not miss a critical point?
The reason that the P2 sucks so much power is the number of logic elements and the clock speed. If you fill up the same area with the same number of elements, with 32 or whatever P1 cogs, and then bump the clock up to 200MHz would we not end up back exactly where we started with the P2?
In order to save power you need fewer elements and/or lower clock speed.
We can already go lower clock speed with our uber P2.
This whole discussion reminds me of the old days with people panicking and talking about yanking entire mechanical rooms off of buildings to build 'something else' because they could not keep the building comfortable at peak loads. The real solution was often to just look at the individual zones and control the usage there (going from constant air flow to variable air volume flow at each zone). Suddenly, by not wasting resources where they did not need to go, we had plenty of capacity to keep the building warm/cool without a major system redesign.
This is exactly where the P2 is now. It is a constant flow system (all elements suck power even if not needed) and just needs to be optimized to be a variable flow system where Chip gates things so that only elements really used in an instruction get powered. From an efficiency standpoint this is how it should be anyway. Just be careful, I bet there is an ARM patent out there somewhere for 'power gating'...
Yes, you missed a critical post by Chip earlier in this thread, which is that the P2 currently requires about 16x more logic elements than the P1. So 4 much simpler P1 "cores" would still only be around 1/4 the complexity of the P2, and at the same clock speeds would presumably only consume about 1/4 the power (yes, I know it's more complex than this, but as a "back of an envelope" guesstimate it's probably within the ballpark).
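To put numbers on that back-of-envelope guess, here is a small Python sketch. It assumes the usual first-order rule that dynamic power scales with the amount of switching logic times the clock rate, and takes the 16x logic figure from Chip's post as quoted above. The function name is made up for illustration, and leakage, gating and voltage are ignored.

```python
def relative_power(logic_ratio, clock_ratio=1.0):
    """First-order estimate: dynamic power ~ switched logic x clock rate."""
    return logic_ratio * clock_ratio


P2_LOGIC_IN_P1_UNITS = 16  # from the thread: P2 needs ~16x the logic of a P1

# Four P1-sized "cores" at the same clock as the P2
quad_p1 = relative_power(4 / P2_LOGIC_IN_P1_UNITS)
print(quad_p1)  # 0.25 of the P2 estimate

# Filling the same logic budget with P1 cogs saves nothing at equal clock
same_logic = relative_power(16 / P2_LOGIC_IN_P1_UNITS)
print(same_logic)  # 1.0

# Four P1-sized cores clocked at 200 MHz versus a 160 MHz P2
quad_p1_fast = relative_power(4 / P2_LOGIC_IN_P1_UNITS, 200 / 160)
print(quad_p1_fast)  # 0.3125
```

Both posts fit this model: the saving comes from having less logic, not from which kind of cog fills the die.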
Step1: Denial. This has to be an April fools joke, right?
Step2: Ok, never mind...
Personally, I like the idea of a simple, super P1 with more cores, more memory, and more I/O and near term availability.
Some of the things that I would like to interface with (HDMI transmitters, RGB DACs, LCD displays) need more I/O pins.
And, they don't benefit from the analog capabilities of P2.
If it could have 512kB, that would really open some doors for graphical displays.
And, sounds like very little learning curve to get started.
Frankly, I am now stopping all P2 work I've been doing until this settles down.
Same here. And the first, of five, prototype PCB boards for my new design are showing up today. It is the 'temporary' brain board that uses a P1 + Raspberry Pi to simulate what I want to do with a P2 system... The orders for the other 4 boards (I/O) will get placed on hold until the P2 is resolved or I can convince myself that just staying with the P1+Pi will be 'good enough'.
It looks like from the recent posts this morning that a consensus is coming around to solving the P2 problem. Your customers will always design a product for you if you listen to them. The reason I always liked the P1 was that no matter the limitations, it was simple, low power, and flexible. Technology is fleeting, so adding external ICs to cover for new peripherals/technology makes sense.
Not being a chip designer I will not comment on specific die design, etc. I do the software thing to make new products. So give me as many I/O and memory as you can, and I will always find a way to use all of it. I like the specs listed in post 498. The most important things for me are memory, I/O, and ADC/DAC. I do a good amount of battery based products, so heat and power usage is (somewhat) important. Speed, price, and die size, not so much.
I'm thinking more than 16 cogs is excessive for only 64 I/O pins. So, what about this:
100 pin 14x14mm exposed thermal pad TQFP package (Tja=20) with internal down-bonds to GND, so no pins needed for GND
16 x 1.8V VDD pins, at four per side, with internal down-bonds for GND
XI, XO, RESn, BOEn pins
64 I/O pins with a unique 3.3V VDD pin for every 4 pins (and internal down-bond for GND) - this is important for analog and high-speed switching
(that makes 100 pins, not including 32 internal down-bonds for GND)
16 two-clocks-per-instruction Prop1 cogs
256KB hub RAM with simple round-robin cog access - this maintains the same 8:1 instruction:hub-cycle ratio as the current Prop1
200MHz clock - cogs run at 100 MIPS, CTRs at 200 MHz
1600 total MIPS - 10x faster than current Prop1 chip
Center of die is only 4.3 x 4.3mm. Die is 5.4 x 5.4mm with pad frame, or 29.16 square mm (54% of current projected Prop2 die size)
This chip could be called P16X32B, as per current convention.
Manufacturing cost would be ~$1.90, assuming $950/wafer, 950 die per wafer, 80% yield, $0.50 per package, $0.15 total testing. 1K piece price would be ~$6.00.
This chip would behave like Prop1 with twice the cogs, all running 5x faster, and with 8x the hub RAM, plus 2x the I/O pins with new analog capabilities.
This would be really easy to make happen and the chance of latent bugs would be very low.
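The cost and throughput figures in the list above can be re-derived with a few lines of Python. This is only a check of the arithmetic as posted; the variable names are illustrative, and the current Prop1 baseline of 8 cogs at 20 MIPS each (80MHz, four clocks per instruction) is the usual P1 figure rather than something stated in this post.

```python
def cost_per_good_die(wafer_cost, die_per_wafer, yield_fraction):
    """Wafer cost spread over the die that actually work."""
    return wafer_cost / (die_per_wafer * yield_fraction)


die = cost_per_good_die(950.0, 950, 0.80)  # $1.25 per good die
unit_cost = die + 0.50 + 0.15              # plus package and test
print(round(unit_cost, 2))                 # 1.9

clock_mhz = 200
clocks_per_instruction = 2
cogs = 16
mips_per_cog = clock_mhz / clocks_per_instruction  # 100 MIPS
total_mips = cogs * mips_per_cog                   # 1600 MIPS
print(total_mips)

p1_total_mips = 8 * 20             # assumed current Prop1 baseline
print(total_mips / p1_total_mips)  # 10.0, the "10x faster" claim

print(round(5.4 * 5.4, 2))         # 29.16 square mm of die
```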
GREAT!! Would it be possible to select the voltage for the I/O pins? (Like having banks with different voltages, to avoid external voltage translators.)
At first I was thinking about the idea of a mixed P1 and P2 that Ariba and Peter Jakacki put forward. My proposal: two independent groups, 2 x P2 cogs and 8 x P1B cogs, each with their own hub cycles and I/O pins. This retains the best of both worlds: a faster 8 x P1 that is 100% binary compatible, plus two new (and independent) P2 cogs. But maybe it is too complex, and as power hungry as the current design. So I like your last proposal more.
About Video and RAM: I consider the minimum useful size to be 800x480x3 (1,152,000 bytes) 24 bpp. This is consistent with current low cost $45 US displays at sizes between 5"~ 9".
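As a quick check of that figure against the 512KB hub being discussed, here is a short Python sketch. The `framebuffer_bytes` helper is made up for illustration, and the 8bpp case is added only for comparison.

```python
def framebuffer_bytes(width, height, bits_per_pixel):
    """Size of one uncompressed frame in bytes."""
    return width * height * bits_per_pixel // 8


HUB_BYTES = 512 * 1024  # the 512KB hub size floated in this thread

full_colour = framebuffer_bytes(800, 480, 24)
print(full_colour, full_colour <= HUB_BYTES)  # 1152000 False

indexed = framebuffer_bytes(800, 480, 8)
print(indexed, indexed <= HUB_BYTES)          # 384000 True
```

So a full 24bpp WVGA frame would need external memory, while an 8bpp frame fits in hub RAM with room left over.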
I'd agree, except the bean counters would want to recoup all P1B and P2 development costs before going ahead with P2, thus 5-10 year delay before we see P2.
Possibly end of P2 and Parallax if P1B does not see huge sales.
And it will not see huge sales.
Parallax Semiconductor has been promising P2 for years now... with commercial customers waiting to see what happens. Another year or two of delay, or an "interim chip" with indeterminate delay before P2 sees light of day - and they will write Parallax off, due to not making P2 as promised.
Parallax has been talking P2 even longer - what is it now, eight years?
The sudden reversal in direction will look like Parallax decided they could not do it.
Don't get me wrong, P1B would be an interesting chip for many uses - it is just not the promised P2.
Technical comparison between P32X32B @ 100MIPS (200MHz) and P2 @ 160MHz
Maximum Hub Bandwidth per Cog:
P1B: 200/32 * 4 = 25MB/sec
P2: 160/8 * 32 = 640MB/sec
Winner: P2 by a factor of 25.6!
Total hub bandwidth per chip:
P1B: 32*25 = 800MB/sec
P2: 640*8 = 5,120MB/sec
Winner: P2 by a factor of 6.4!
MIPS per cog
P1B: 100MIPS
P2: 160MIPS
Winner: P2 by a factor of 1.6-10 depending on instruction mix.
MIPS per chip
P1B: 3,200 max mips
P2: 1,280 max mips
Winner: Depends on instruction mix
For simple cog only code not needing incrementing addresses, P1.
For LMM/hubexec, code that no longer has to self modify, hub using code, P2.
*NOTE: P2 can do a lot more work per cycle, has auto increment address modes, mul, div, cordic etc
Hub bandwidth makes a huge difference. Compiled code would run much faster on P2. Overall, P2 system throughput will be faster.
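The bandwidth and MIPS numbers above follow from the stated clocks, cog counts and bytes per hub access. A small Python sketch, with illustrative names, that reproduces them:

```python
def hub_mb_per_sec_per_cog(clock_mhz, cogs, bytes_per_access):
    """One hub slot per cog per rotation, one clock per slot (as posted)."""
    return clock_mhz / cogs * bytes_per_access


p1b_cog = hub_mb_per_sec_per_cog(200, 32, 4)   # 25.0 MB/sec
p2_cog = hub_mb_per_sec_per_cog(160, 8, 32)    # 640.0 MB/sec
print(p1b_cog, p2_cog, p2_cog / p1b_cog)       # ratio 25.6

p1b_chip = 32 * p1b_cog                        # 800.0 MB/sec
p2_chip = 8 * p2_cog                           # 5120.0 MB/sec
print(p1b_chip, p2_chip, p2_chip / p1b_chip)   # ratio 6.4

p1b_mips = 32 * 100                            # 3200 peak MIPS
p2_mips = 8 * 160                              # 1280 peak MIPS
print(p1b_mips, p2_mips)
```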
Video Limits
1080p60, 8 bits per R,G,B
165MHz dot clock
2,073,600 pixels per frame (1920*1080, 8bpp) ==> 124.42MB/sec (no clut, so RRRGGGBB is best possible color) ==> NOT POSSIBLE ON P1B (maybe using 5 cogs)
6,220,800 bytes per frame (24bpp) ==> 373.25MB/sec @ 60Hz ==> NOT POSSIBLE ON P1B (maybe using 15 cogs)
NOT POSSIBLE ON P1B because it CANNOT read even an 8 bit 1080p60 bitmap from SDRAM fast enough to refresh the screen!
On P2, possible as 1/4 of a cog (1 task)
Winner: P2 by 20:1 (8bpp) 60:1 (24bpp) ... a landslide
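The video figures can be reproduced the same way. This Python sketch assumes the 25MB/sec per-cog hub bandwidth worked out earlier in the post and simply divides the pixel stream across cogs, ignoring any sync or buffering overhead. The helper name is made up for illustration.

```python
import math


def stream_mb_per_sec(width, height, bytes_per_pixel, fps):
    """Raw pixel data rate in MB/sec (1 MB = 1,000,000 bytes)."""
    return width * height * bytes_per_pixel * fps / 1_000_000


P1B_COG_MB_PER_SEC = 25.0  # per-cog hub bandwidth from the comparison above
P2_COG_SHARE = 0.25        # "1/4 of a cog (1 task)" on the P2

rate_8bpp = stream_mb_per_sec(1920, 1080, 1, 60)   # 124.416
rate_24bpp = stream_mb_per_sec(1920, 1080, 3, 60)  # 373.248

cogs_8bpp = math.ceil(rate_8bpp / P1B_COG_MB_PER_SEC)    # 5
cogs_24bpp = math.ceil(rate_24bpp / P1B_COG_MB_PER_SEC)  # 15
print(cogs_8bpp, cogs_24bpp)

# Cog-for-cog advantage claimed for the P2
print(cogs_8bpp / P2_COG_SHARE, cogs_24bpp / P2_COG_SHARE)  # 20.0 60.0
```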
With 32 mini cogs, I think it ends up like a GreenArrays chip, and that one is not a success. If you then still add everything else the P2 has, I think you end up with something that is a kludge.
I think I would rather have a 4cog P2 (call it quad-core P2) and as it can slice those 4 cogs by 4 you end up with 16 cogs useful when you have many but lighter tasks.
Comments
RC5 in SW ? sure.
That still leaves
There are also choices between 16 and 32, should Die area, or Power ceilings come to spoil the party. (as they tend to do)
eg I would rather have more RAM, and 24 COGS, than 32 COGS.
Oh, absolutely. The trick is going to be to find the sweet-spot where the balance between all the competing elements come together.
It'd be interesting to know what a P1 COG, in the OnSemi process, consumes. I guess this is one of the most important questions Chip needs to ask them before we go too far down this route.
Agreed.
It'll be interesting to see what the compiler guys think when they get to read these posts.
C.W.
My advice:
1. If you have to make a major design or marketing decision, don't ask the public. While I don't doubt that there are many clever experts here, this usually leads to long discussions (26 pages!) that makes the decision more difficult instead of easier. You can't please them all and you don't have to. If you want to, ask 10 of your best customers or who you think could be a potential customer for the future product. But even then, if you think they tell nonsense, do what your intuition tells you.
2. If a decision is difficult because all options have nearly equal adavntages and disadvantages, then it doesn't matter which option you take. There's only one big mistake you can make: doing nothing. The P1 has been out for ~10 years now. It is still superior to many other µPs in some applications. But the competitors don't sleep. In another ten years there will surely be some enthusiasts who still use it. But you won't make much money with it any longer.
My personal opinion about the original subject:
5W for running all cogs at 160MHz or more is no issue. If you really need computing power you usually have power line supply. And a small heatsink or heat spreader is accepatble as long as you don't need a fan. Battery powered applications can still reduce clock speed to trade speed against current draw. For small embedded systems the number of different supply voltages you need cost more than a watt more or less.
BGA packages are also no problem as long as the pitch is not too small. 1.27 or 1mm pitch can still be soldered by hand (hot air gun) or inexpensive P&P machines. Most of my boards are 4 layer. You can reach the outer 4 ball rows without the need for filled micro vias and >4 layers, so up to say 256 pin BGA is OK if the most important signals are in the outer 4 rows. For breadboard setups and prototyping DIP-modules can be used.
So please don't cripple the design. Don't make any changes not absolutely necessary and get something produced as soon as possible. I currently buy ~2k P1s per year and as long as the performance/price ratio is right and it has the same "fun-factor" programming it like I'm used to with the P1 I'd love to choose it for new designs.
It's great to hear how 'we' all did the hard part, discussing options on the forum.
Chip? Oh he just waves his hands and everything comes together all magic-like....
Did someone say shrink? Heck yeah, put me down for a shrink too.
We'll just have Parallax mortgage the firm, or get a kickstarter going....yeah, that's the ticket.
Between this, and people pushing for retro-grade P1B from a previous decade, or cutting the P2 into 4 Core eunuchs, it's like people have stopped taking their Adderall.
Before we have any real data to work with, people who are not IC Designers, but now play them on this forum, are deciding what to slice off of the current design. And this is after a few posters made some pretty simple observations about what will and what will not actually save 'power'.
There are a couple of experts at Parallax, and a couple on this forum.
Maybe people who want to push for their favorite P1B, or go on at length on all sorts of tangents could start their own threads?
We could then perhaps keep this one a bit more narrowly focused on the issue at hand, and give Chip a break from numerous arcane questions, and basically get on with dealing with the problem instead of inventing all sorts of useless scenarios?
I'd apologize again, however I won't.
I get the distinct impression from a number here that their main/only concern is that their personal hobby is going to be impacted.
My impression is that Parallax may be facing a more existential threat than that.
The Prop, on account of the power draw, is on the verge of being booted out of any uController design by most if not all such designers. Its power draw places it squarely in the PSOC family, and there is arguably more competition there than in the uC space. Even if it's only 3W, how it ever got to that point shows a design process failure.
A 5W uController is quite likely to FAIL. Not be a slow seller, or low margin, just a $0 revenue dud.
It doesn't mean squat if several hundred devotees buy 1, 2 or 10 if they want to show their support.
That won't be enough to order another run of production wafers.
How some of you expect Parallax to remain healthy and hale is beyond me, unless their other products can fill the revenue hole.
Yes, you missed a lot today! Looks like we may get a 512kb, 32 cog, 64 I/O pin (w/ADC) QuadProp. The rest we can do in software until the P2 comes along.
However, I think losing the determinism is not such a good idea at this point - it was a major feature of the P1, and it is a feature that is very hard to compensate for in software.
Ross.
The reason that the P2 sucks so much power is the number of logic elements and the clock speed. If you fill up the same area with the same number of elements, with 32 or whatever P1 cogs, and then bump the clock up to 200MHz, would we not end up back exactly where we started with the P2?
In order to save power you need fewer elements and/or lower clock speed.
We can already go lower clock speed with our uber P2.
This whole discussion reminds me of the old days with people panicking and talking about yanking entire mechanical rooms off of buildings to build 'something else' because they could not keep the building comfortable at peak loads. The real solution was often to just look at the individual zones and control the usage there (going from constant air flow to variable air volume flow at each zone). Suddenly, by not wasting resources where they did not need to go, we had plenty of capacity to keep the building warm/cool without a major system redesign.
This is exactly where the P2 is now. It is a constant flow system (all elements suck power even if not needed) and just needs to be optimized to be a variable flow system where Chip gates things so that only elements really used in an instruction get powered. From an efficiency standpoint this is how it should be anyway. Just be careful, I bet there is an ARM patent out there somewhere for 'power gating'...
Yes, you missed a critical post by Chip earlier in this thread, which is that the P2 currently requires about 16x more logic elements than the P1. So 4 much simpler P1 "cores" would still only be around 1/4 the complexity of the P2, and at the same clock speeds would presumably only consume about 1/4 the power (yes, I know it's more complex than this, but as a "back of an envelope" guesstimate it's probably within the ballpark).
Ross.
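The back-of-an-envelope estimate above can be written out in a few lines. This sketch assumes power scales linearly with the number of logic elements and with clock speed, which ignores leakage, memory, I/O and any clock gating; the function name and the 160MHz P2 reference clock are assumptions for illustration only.

```python
# Assumption: relative power ~ (logic ratio) x (clock ratio), with the
# current P2 design as the reference point (1.0).

P2_LOGIC_VS_P1 = 16   # "about 16x more logic elements than the P1"
P1_BLOCKS = 4         # four P1-sized blocks, i.e. 4 x 8 = 32 cogs

def relative_power(logic_ratio, clock_ratio=1.0):
    """Very rough power estimate relative to the reference design."""
    return logic_ratio * clock_ratio

# Four P1-sized blocks at the same clock as the P2.
quad_p1 = relative_power(P1_BLOCKS / P2_LOGIC_VS_P1)
print(quad_p1)          # 0.25

# The opposite case raised earlier: fill the same area with the same
# number of elements and raise the clock from 160MHz to 200MHz.
same_area_faster = relative_power(1.0, 200 / 160)
print(same_area_faster)  # 1.25
```

Under this crude model the two posts do not contradict each other: the saving comes entirely from using fewer logic elements, not from the cogs being P1-style.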
Step1: Denial. This has to be an April fools joke, right?
Step2: Ok, never mind...
Personally, I like the idea of a simple, super P1 with more cores, more memory, and more I/O and near term availability.
Some of the things that I would like to interface with (HDMI transmitters, RGB DACs, LCD displays) need more I/O pins.
And, they don't benefit from the analog capabilities of P2.
If it could have 512kB, that would really open some doors for graphical displays.
And, sounds like very little learning curve to get started.
1) I like the idea of a P32X32B with 64 I/O and 512KB, BUT ONLY IF IT DOES NOT KILL THE P2
Note that with hardware tasks, the 8 cog P2 is already capable of 32 hardware tasks, granted at 50MHz not 100MHz, BUT
What would the OnSemi estimate be for the 32 cog 512KB hub 200MHz 2 cycle P1B? 5W? More?
2) The proposed P1B would still require LMM, and thus be significantly slower for compiled code.
3) I am not at all convinced it would save power
Having said that
The solution I propose for sharing the 512KB hub bandwidth is as follows:
A 64-entry, five-bit table and a six-bit counter incremented each clock cycle.
The hub slot is assigned by the content of the table, indexed by the counter.
Allows TOTAL flexibility in allocating slots in a deterministic fashion.
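A minimal Python sketch of that slot table, to show how the allocation stays deterministic. The function names and the example allocation are hypothetical; the bandwidth helper additionally assumes one hub slot per clock and 4 bytes per access, which is not part of the proposal itself.

```python
NUM_SLOTS = 64   # 64-entry table, indexed by a 6-bit counter
NUM_COGS = 32    # each entry is 5 bits wide: a cog number 0..31

def hub_owner(table, clock):
    """Cog that owns the hub on a given clock cycle (counter wraps at 64)."""
    return table[clock & (NUM_SLOTS - 1)]

def slots_per_cog(table):
    """How many of the 64 slots each cog receives."""
    counts = [0] * NUM_COGS
    for cog in table:
        counts[cog] += 1
    return counts

def bandwidth_mb_s(table, cog, clock_mhz=200, bytes_per_slot=4):
    """Assumed: one slot per clock, bytes_per_slot bytes moved per slot."""
    return clock_mhz * slots_per_cog(table)[cog] / NUM_SLOTS * bytes_per_slot

# Plain round robin: every cog gets 2 of the 64 slots.
round_robin = [i % NUM_COGS for i in range(NUM_SLOTS)]

# Example custom allocation: cog 0 gets every other slot, and the
# remaining 32 slots are shared among cogs 1..31.
custom = [0 if i % 2 == 0 else 1 + (i // 2) % 31 for i in range(NUM_SLOTS)]

print(slots_per_cog(round_robin)[0])   # 2
print(slots_per_cog(custom)[0])        # 32
print(bandwidth_mb_s(round_robin, 0))  # 25.0
print(bandwidth_mb_s(custom, 0))       # 400.0
```

Because the table is fixed once loaded, every cog still knows exactly which clock cycles it owns, so timing remains deterministic even when the split is uneven.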
But remember
P2 is the path we were on and has many excellent HLL and PASM features.
As a stepping stone, the revised "simpler" chip is not a horrible idea.
If it kills the P2/P3, it is a TERRIBLE idea.
Can Parallax afford to do both?
Is it really smart to change designs at the last minute, over unproved power envelope fears?
Frankly, I am now stopping all P2 work I've been doing until this settles down.
Same here. And the first of five prototype PCBs for my new design is showing up today. It is the 'temporary' brain board that uses a P1 + Raspberry Pi to simulate what I want to do with a P2 system... The orders for the other 4 boards (I/O) will get placed on hold until the P2 is resolved or I can convince myself that just staying with the P1+Pi will be 'good enough'.
I like the new ideas....512 RAM rocks...it satisfies my needs: more pins, more speed, more memory, and more cogs...BUILD IT:)
Plus fewer items to test out...
Not being a chip designer I will not comment on specific die design, etc. I do the software thing to make new products. So give me as many I/O and memory as you can, and I will always find a way to use all of it. I like the specs listed in post 498. The most important things for me are memory, I/O, and ADC/DAC. I do a good amount of battery based products, so heat and power usage is (somewhat) important. Speed, price, and die size, not so much.
GREAT !! Could it be possible to select the voltage for I/O pins? (like having different voltage banks, to avoid external voltage translators).
At first I was thinking about the idea of a mixed P1 and P2 that Ariba and Peter Jakacki put forward. My proposal: two independent groups, 2x P2 cogs and 8x P1B cogs, each with their own hub cycles and I/O pins. This retains the best of both worlds: a faster 8x P1 that is 100% binary compatible, plus two new (and independent) P2 cogs. But maybe it is too complex, and as power hungry as the current design. So I prefer your last proposal.
About Video and RAM: I consider the minimum useful size to be 800x480 at 24 bpp (800x480x3 = 1,152,000 bytes). This is consistent with current low cost $45 US displays at sizes between 5" and 9".
Possibly end of P2 and Parallax if P1B does not see huge sales.
And it will not see huge sales.
Parallax Semiconductor has been promising P2 for years now... with commercial customers waiting to see what happens. Another year or two of delay, or an "interim chip" with indeterminate delay before P2 sees light of day - and they will write Parallax off, due to not making P2 as promised.
Parallax has been talking P2 even longer - what is it now, eight years?
The sudden reversal in direction will look like Parallax decided they could not do it.
Don't get me wrong, P1B would be an interesting chip for many uses - it is just not the promised P2.
Maximum Hub Bandwidth per Cog:
P1B: 200/32 * 4 = 25MB/sec
P2: 160/8 * 32 = 640MB/sec
Winner: P2 by a factor of 25.6!
Total hub bandwidth per chip:
P1B: 32*25 = 800MB/sec
P2: 640*8 = 5,120MB/sec
Winner: P2 by a factor of 6.4!
MIPS per cog
P1B: 100MIPS
P2: 160MIPS
Winner: P2 by a factor of 1.6-10 depending on instruction mix.
MIPS per chip
P1B: 3,200 max mips
P2: 1,280 max mips
Winner: Depends on instruction mix
For simple cog only code not needing incrementing addresses, P1.
For LMM/hubexec, code that no longer has to self modify, hub using code, P2.
*NOTE: P2 can do a lot more work per cycle, has auto increment address modes, mul, div, cordic etc
Hub bandwidth makes a huge difference. Compiled code would run much faster on P2. Overall, P2 system throughput will be faster.
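The hub bandwidth figures above can be reproduced with a short Python check. It uses the same simplification as the post (clock divided evenly among cogs, times bytes moved per hub access); the function name is made up for illustration.

```python
def per_cog_mb_s(clock_mhz, cogs, bytes_per_access):
    """Peak hub bandwidth per cog, assuming slots are shared evenly."""
    return clock_mhz / cogs * bytes_per_access

p1b_cog = per_cog_mb_s(200, 32, 4)    # 25.0 MB/sec
p2_cog = per_cog_mb_s(160, 8, 32)     # 640.0 MB/sec

p1b_chip = p1b_cog * 32               # 800.0 MB/sec
p2_chip = p2_cog * 8                  # 5120.0 MB/sec

print(p1b_cog, p2_cog, round(p2_cog / p1b_cog, 1))     # 25.0 640.0 25.6
print(p1b_chip, p2_chip, round(p2_chip / p1b_chip, 1))  # 800.0 5120.0 6.4
```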
Video Limits
1080p60, 8 bits per R,G,B
165MHz dot clock
2,073,600 pixels per frame (1920*1080, 8bpp) ==> 124.42MB/sec (no clut, so RRRGGGBB is best possible color) ==> NOT POSSIBLE ON P1B (maybe using 5 cogs)
6,220,800 bytes per frame (24bpp) ==> 373.25MB/sec @ 60Hz ==> NOT POSSIBLE ON P1B (maybe using 15 cogs)
NOT POSSIBLE ON P1B because it CANNOT read even an 8 bit 1080p60 bitmap from SDRAM fast enough to refresh the screen!
On P2, possible as 1/4 of a cog (1 task)
Winner: P2 by 20:1 (8bpp) 60:1 (24bpp) ... a landslide
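For anyone who wants to check the video numbers, here is a small Python sketch of the arithmetic. It takes the 25MB/sec per-cog P1B figure from the hub bandwidth comparison as given; the function names are hypothetical.

```python
import math

def frame_bytes(width, height, bytes_per_pixel):
    """Size of one uncompressed frame in bytes."""
    return width * height * bytes_per_pixel

def stream_mb_s(width, height, bytes_per_pixel, fps):
    """Sustained read rate needed to refresh the screen, in MB/sec."""
    return frame_bytes(width, height, bytes_per_pixel) * fps / 1e6

def cogs_needed(required_mb_s, per_cog_mb_s=25.0):
    """Cogs needed if each one delivers per_cog_mb_s (P1B estimate)."""
    return math.ceil(required_mb_s / per_cog_mb_s)

rate_8bpp = stream_mb_s(1920, 1080, 1, 60)    # about 124.4 MB/sec
rate_24bpp = stream_mb_s(1920, 1080, 3, 60)   # about 373.2 MB/sec

print(frame_bytes(1920, 1080, 1), cogs_needed(rate_8bpp))    # 2073600 5
print(frame_bytes(1920, 1080, 3), cogs_needed(rate_24bpp))   # 6220800 15
```

The 5 and 15 cog counts only cover moving the pixel data; they say nothing about whether the P1B could actually read that data from SDRAM in time.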
Signal Capture / Generation Limits
P1B: 100MHz, interleaving 5 cogs, to write/read 32 bits @ 100MHz to/from hub.
P2: 160MHz, one cog
Winner: P2 by 8:1 (5*1.6)
SDRAM
P1B: has to be bit banged, best guess 50MB/sec, 20 pins left after 16 bit SDRAM
P2: 320MB/sec with 16 bit interface (640MB/sec with 32 bit interface), 48 user pins left after 16 bit SDRAM
Winner: P2 by 12.8:1
LMM vs HUBEXEC
P1B Cog: 25MIPS
P2 Cog: 160MIPS
Winner: P2 by 6.4:1
Pins
P1B: 64 less flash + serial, so 58 user pins
P2: 92 less flash + serial, so 86 user pins
Winner: P2 by 28 pins
Power consumption
Roughly the same
And much much more.
P1B should be made AFTER P2 as an I/O expander for P2, and for lower end applications.
I know bean counters. If the P1B is made first, we will NEVER see the P2.
I think I would rather have a 4-cog P2 (call it quad-core P2); as it can slice each of those 4 cogs into 4 tasks, you end up with 16 "cogs", useful when you have many but lighter tasks.