Maybe, but it seems the $100 DE0-Nano would have life again with 1P2+NP1 COGs, and the DE2 emulation could be even bigger.
Do we then have to have COGNEW1 and COGNEW2 instructions to load a new P1 or P2 COG? And both instruction sets would have to include these instructions. Is that the plan?
Seems to me mixing P1 and P2 COGs on the same chip would be pretty confusing to the marketplace. Outside of a portion of the hobbyist community, or maybe a handful of low-volume users, most potential customers will be using a compiler and really won't be digging very deep into the chip, so the intricacies of the assembly language will be hidden. But then it becomes an issue of the compiler having to say this code is for this COG and can't be shared with that other COG.
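One way the toolchain could handle the "this code is for this COG" problem is to tag each compiled image with the cog type it targets and have the loader refuse a mismatch. A minimal sketch, in C, with all names (`cog_type_t`, `cog_image_t`, `cog_start`, and the COGNEW1/COGNEW2 split itself) invented for illustration since no such chip or toolchain exists:

```c
#include <stdio.h>

/* Hypothetical sketch: each code image carries the cog type it was
   compiled for, so a mixed P1/P2-cog chip can reject a wrong load. */
typedef enum { COG_P1, COG_P2 } cog_type_t;

typedef struct {
    cog_type_t  target;   /* cog type the image was compiled for */
    const char *name;
} cog_image_t;

/* Returns 0 on success, -1 if the image does not match the cog type. */
static int cog_start(cog_type_t cog, const cog_image_t *img)
{
    if (img->target != cog) {
        fprintf(stderr, "refusing to load %s on a %s cog\n",
                img->name, cog == COG_P1 ? "P1" : "P2");
        return -1;
    }
    /* a real loader would issue the hypothetical COGNEW1/COGNEW2 here */
    return 0;
}
```

The check costs nothing at run time if the compiler does it statically; the point is that the housekeeping is a linker/loader attribute, not something every user has to track by hand.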
When in doubt KISS
KISS is great, but comes in at 8W.
So, a more pragmatic solution is needed.
Designers resource map all the time already, so this housekeeping is not a brick wall. Software steadily improves.
If serious consideration is given to having mixed cogs on the chip, then the C compiler guys need to be consulted to make sure this mix can be integrated into the C tools; otherwise, Parallax will be back to having the flagship chip unsupported by any C tools. That would be a complete fail. There needs to be complete, as-simple-as-possible integration of the two cog types into C.
Do we then have to have COGNEW1 and COGNEW2 instructions to load a new P1 or P2 COG? And both instruction sets would have to include these instructions. Is that the plan?
I'd like to spend more time listening to possible feedback on the mix ideas from Chip than waste time on details that may eventually have no value. But since you brought it up: yes, assuming mixed cogs, it would mean some differences, except for when the P2-disable fuse Ariba mentioned is set.
To me the alternative is continuing to use P1, using the P16x32b if available, or moving on to something like an STM32F4xx permanently. Although I'm sure some folks here would love to see me out of their <adjective here> hair ;-)
You mention just a 25% improvement for hubexec vs. LMM on P1 cogs. Since the added complexity makes hubexec seem not worth supporting, would more COG RAM and some fast memory-copy mechanism improve LMM performance on the P1 cog?
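For readers who haven't used it: LMM is the software trick where a tiny kernel loop in cog RAM fetches one instruction at a time from hub RAM and executes it, which is what hardware hubexec would replace. The real kernel is a few PASM instructions (roughly rdlong / add pc / execute); the C analogy below is only a sketch of the fetch-dispatch structure, with the "hub" modeled as an array of function pointers and all names invented:

```c
#include <stdint.h>

static int32_t acc;                 /* stand-in for a cog register */

static void op_add5(void) { acc += 5; }
static void op_dbl(void)  { acc *= 2; }
static void op_halt(void) { }       /* sentinel: stop the kernel loop */

typedef void (*lmm_op)(void);

/* The LMM kernel: fetch, advance the PC, execute.  Every pass through
   this loop is overhead relative to native (or hubexec) execution,
   which is why the speedup from hardware hubexec is bounded. */
static int32_t lmm_run(const lmm_op *hub, int32_t start_acc)
{
    acc = start_acc;
    for (const lmm_op *pc = hub; *pc != op_halt; pc++)
        (*pc)();                    /* dispatch one "hub instruction" */
    return acc;
}
```

Since the dominant cost on a real P1 is waiting for the hub access window rather than the dispatch itself, a faster block-copy into cog RAM (effectively FCACHE-style overlays) attacks the same overhead from the other direction.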
I don't like the idea of mixing COG types on the same chip because it seems complicated. A final successful design should be simple and elegant.
Of course, everyone would love 8 * P2 COGS - Problem is, Physics says no @ 180nm.
Chip is already looking at removing the big Mathops from a P2 COG, into a common area, which will shrink the P2 COG. (at the cost of total peak Math OPS)
A P1 COG is too simple, and needs some changes, so some elements of P2 will be needed, and that takes time.
P2 is FPGA proven, so whilst the idea may seem less purist, it does take what we have, to make what we want.
Which is what engineering is all about.
Any Smarter/Smaller mix of COGs would share a common software base, and be able to fit the power envelope.
P2 is FPGA proven, so whilst the idea may seem less purist, it does take what we have, to make what we want.
Which is what engineering is all about.
I beg to differ. The P2 is totally unproven. We have only just received the FPGA code that incorporates the major changes made since November 2013. While we did receive an interim version in February, I don't believe it has been widely used. And Chip has countered that the latest version is only a preliminary version, and the docs have not been updated to tell us what/how to test.
At least the P1+ is based on a known working model and is an order of magnitude simpler to test.
Please stop distorting the facts. I want the P2 as much as the next person, but reality has to take its place, and the reality is that currently the P16+X32B makes the most sense.
So how about refining that slightly to give Chip a handle on what we might be happy with?
Chip & Ken will ultimately decide, so let's give them the best help we can, without such blatant bias.
I beg to differ. The P2 is totally unproven. We have only just received the FPGA code that incorporates the major changes made since November 2013. While we did receive an interim version in February, I don't believe it has been widely used. And Chip has countered that the latest version is only a preliminary version, and the docs have not been updated to tell us what/how to test.
At least the P1+ is based on a known working model and is an order of magnitude simpler to test.
Please stop distorting the facts. I want the P2 as much as the next person, but reality has to take its place, and the reality is that currently the P16+X32B makes the most sense.
So how about refining that slightly to give Chip a handle on what we might be happy with?
Chip & Ken will ultimately decide, so let's give them the best help we can, without such blatant bias.
Wow, I missed the link to an FPGA release of the P16+X32B?! Do you have that?
Without that, clearly the P2 IS more proven than any P16+X32B !
The P2 has FPGA code builds, in the field, and it has On Semi Power Sim results.
The P16+X32B has neither of those. P16+X32B is not even properly defined.
The physics deny us the current design at any clock speed we find broadly acceptable.
I would run it at 60 MHz, and get along just fine, but that time resolution is crappy for a lot of use cases. It's crappy for a couple of mine I had planned, frankly. Not gonna happen.
We know a ton now, and we know how some things play out too. We did test out a lot of things. That P2 COG isn't far from the mark. If it's got a bug or two, that's all it's got.
I think the smartest thing is to take the P1 COG, add some things we know are a slam dunk, and move to the same testing we were going to do for the P2 design. Doing this is more favorably aligned with the process physics. No getting around that one.
There will be a testing round, and there will be one because Chip is going to use the fuses and I/O developed already no matter what. Those are proven and high value and are a sunk cost at this point.
That means no matter what, we will be integrating those with the P1 COGS and we will be maximizing the P1 COGS given what is available to us, which boils down to precisely what JMG said: it does take what we have, to make what we want.
And I'm fine with all of that. It's important that we maximize this opportunity so that future ones are funded. We get to continue on and we get a new device that we can all go and profit from.
Wow, I missed the link to an FPGA release of the P16+X32B?! Do you have that?
Without that, clearly the P2 IS more proven than any P16+X32B !
The P2 has FPGA code builds, in the field, and it has On Semi Power Sim results.
The P16+X32B has neither of those. P16+X32B is not even properly defined.
There is no bias here, just simple reality.
I trust Chip when he says he had the 2-clock P1 version (presumably 8 cogs) working on FPGA 6 years ago. Validation of the P1 code is simple; the P2 code is extremely complex, as you well know.
A combo P2 + P1x chip, while complicating the tool chain, would not be a marketing problem. ARM is currently BRAGGING about their big.LITTLE multicore designs. But that is the problem: since they are ARM, they probably have a patent on it.
For industrial process control the ideal chip would actually be 2 x P2 cogs (with 1 MB memory) + 16 P1x cogs (a few quality-of-life instructions added... i.e. bit manipulation etc.) + 96 I/O. Not all of the I/O needs to be full analog. I know that breaks the traditional Prop layout, but I would rather have 96 I/O, with a bank of plain digital pins for connecting external memory plus 48 to 64 useful I/O, than have 64 do-everything pins that leave 20 or fewer free after connecting external RAM + boot serial ROM + primary serial com.
If the P1x cogs had a subset of the P2 cogs instructions then creating a compiler to work with both should not be that hard. I would not want to see a P1x included that had a different instruction map/set.
With that configuration you could have 1 P2 cog running primary business logic, 1 P2 cog running the HMI with nice resolution graphics and 16 P1x cogs connected to the I/O getting the work done!
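The subset idea above is exactly what makes a shared compiler tractable: common code uses only the subset, and a compile-time switch enables richer implementations where the extra instructions exist. A minimal sketch in C, where `COG_HAS_P2_EXT` is an invented macro standing in for "compiling for the big cog" (on a real toolchain the builtin would lower to a single hardware instruction on the P2-style cog, and the loop to plain subset instructions on the P1x cog):

```c
#include <stdint.h>

/* One source, two cog targets: the same function compiles for either
   cog type, using the richer instruction set only when it exists. */
static unsigned popcount32(uint32_t x)
{
#ifdef COG_HAS_P2_EXT
    return (unsigned)__builtin_popcount(x);  /* "big math" cog: one op */
#else
    unsigned n = 0;               /* subset cog: plain shift/mask loop */
    while (x) {
        n += x & 1u;
        x >>= 1;
    }
    return n;
#endif
}
```

Both paths return identical results, so a library built this way behaves the same on either cog; only speed differs. That is the property the C tools people would need guaranteed.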
A combo P2 + P1x chip, while complicating the tool chain, would not be a marketing problem. ARM is currently BRAGGING about their big.LITTLE multicore designs. But that is the problem: since they are ARM, they probably have a patent on it.
For industrial process control the ideal chip would actually be 2 x P2 cogs (with 1 MB memory) + 16 P1x cogs (a few quality-of-life instructions added... i.e. bit manipulation etc.) + 96 I/O. Not all of the I/O needs to be full analog. I know that breaks the traditional Prop layout, but I would rather have 96 I/O, with a bank of plain digital pins for connecting external memory plus 48 to 64 useful I/O, than have 64 do-everything pins that leave 20 or fewer free after connecting external RAM + boot serial ROM + primary serial com.
If the P1x cogs had a subset of the P2 cogs instructions then creating a compiler to work with both should not be that hard. I would not want to see a P1x included that had a different instruction map/set.
With that configuration you could have 1 P2 cog running primary business logic, 1 P2 cog running the HMI with nice resolution graphics and 16 P1x cogs connected to the I/O getting the work done!
I believe that big.LITTLE consists of compatible processors though. Code can be compiled that runs on either. That won't be the case with a P1+P2 chip.
Has anyone noticed Chip today?
Maybe he's preparing a P16X32B FPGA code for the DE0/DE2
Or maybe he's getting some much needed rest. Parallax needs to make a major decision within the next few days, and they should be well-rested and thinking with clear minds when they make this decision. On the one hand they can figure out ways to resolve the P2 power issue, and continue to move forward with a competitive chip. Or they can decide to go backwards and make an improved P1 chip, and hope they can stave off competition from other chips that have kept pace with the technology. I think the biggest issue is whether they stick with 180nm or move to 65nm. I believe there is a path where they can do this that limits the risks, and I have mentioned it a few times.
The conservative approach would be to make the improved P1 chip. This has the lowest risks, and will get a new chip out earlier. However, I believe that is a very short-sighted approach that will help Parallax for the next couple of years, but harm them further down the road. Perhaps Parallax's strategy is to remain small. They don't seem to be interested in external investment or ever going public. So if that is their business strategy, then going with an improved P1 may fit well into their plans. We'll see which road they take in the next few days.
In the Cortex ARMs, most multiprocessor parts consist of an M4 with floating point and M0s that do not have it.
Also, the M0 runs Thumb (16-bit opcodes) while the M4 uses Thumb-2, which is a superset of Thumb. So you can share code between the two, though most toolsets I've seen are pretty clunky in multiprocessor support.
In the ARM9/11 cases the multiprocessors are usually identical.
Or maybe he's getting some much needed rest. Parallax needs to make a major decision within the next few days, and they should be well-rested and thinking with clear minds when they make this decision. On the one hand they can figure out ways to resolve the P2 power issue, and continue to move forward with a competitive chip. Or they can decide to go backwards and make an improved P1 chip, and hope they can stave off competition from other chips that have kept pace with the technology. I think the biggest issue is whether they stick with 180nm or move to 65nm. I believe there is a path where they can do this that limits the risks, and I have mentioned it a few times.
The conservative approach would be to make the improved P1 chip. This has the lowest risks, and will get a new chip out earlier. However, I believe that is a very short-sighted approach that will help Parallax for the next couple of years, but harm them further down the road. Perhaps Parallax's strategy is to remain small. They don't seem to be interested in external investment or ever going public. So if that is their business strategy, then going with an improved P1 may fit well into their plans. We'll see which road they take in the next few days.
Hi Dave,
The most obvious outcome, would of course be to do both. Developing a 65nm P2 will take time and cost money. Developing the PX16X32B would be relatively quick and raise money - which could then be used to fund the 65nm P2.
In the end, Parallax would have a family of three superb chips that all have quite a long life ahead of them.
Developing a P1+ will take 6 months minimum, and probably more like a year given the time spent on the P2. So this would delay the P2 by 6 months to a year. The P1+ will require support resources beyond just the chip development effort. So if you factor in all the resources required to develop and support the P1+ I doubt if Parallax would raise sufficient funds from it to drive the P2 development. They will need to sell the P1+ for one or two years before they even break even.
The most obvious outcome, would of course be to do both. Developing a 65nm P2 will take time and cost money. Developing the PX16X32B would be relatively quick and raise money - which could then be used to fund the 65nm P2.
In the end, Parallax would have a family of three superb chips that all have quite a long life ahead of them.
Ross.
@Ross, That's what I'm thinking as well. I have to admit I find it amusing that people here who were thrilled by the power and capabilities of the P1 as little as a year ago can't live with a "P1b" with twice as many I/O pins (capable of analog in & out), 16 cogs, 8x+ hub RAM and a 160 to 200 MHz clock!!
In the Cortex ARMs most multiple processors consist of an M4 with floating point and M0s that do not have it.
Also M0 run Thumb (16 bit opcodes) while M4 use Thumb2 which is a superset of Thumb codes. So you can share code between the 2, though most toolsets I've seen are pretty clunky in multiprocessor support.
In the ARM9/11 cases the multiprocessors are usually identical.
I'm aware that not all ARM cores are code-compatible, but I believe all of the ones paired in a big.LITTLE arrangement are. The ARM site talks about scheduling threads dynamically on the different CPUs, so it must be possible to compile code that will run on both the big and the little cores.
... Developing a 65nm P2 will take time and cost money. Developing the PX16X32B would be relatively quick and raise money - which could then be used to fund the 65nm P2...
I'm not sure that I get this. At a minimum, won't manufacturing either chip take $$$ even at the 180nm process? So the P1+ vs P2 would cost the same to make. I'm also not sure that Parallax could crank out a P1+ in much less time than it takes to finish a P2. After all, they'll have to figure out all the "extras" to add.
I believe that big.LITTLE consists of compatible processors though. Code can be compiled that runs on either. That won't be the case with a P1+P2 chip.
True, however if the P1x is a true subset of the P2 instruction set then the core logic code would be sharable between them. The P2 enhancements would be instructions for the higher-power business logic like full hubexec, external SDRAM, big math, etc., along with nice video. All of your standard I/O logic would program exactly the same on either cog. Yes, users would have to think about where their programs are going, but right now what are their options? Do I/O with a Prop and then try to kludge something together with a Pi or STM/PIC for logic, user interface and acceptable video? I am doing that right now as an interim to getting an all-in-one solution from Parallax. It is NOT commercial grade due to the compromises needed to kludge it together. It is only a test bed.
I strongly believe the NEXT Parallax chip has to be able to deliver complete system-level capabilities in order to really be successful. That means instruction sets that have the creature comforts of other chips, the ability to do serious business logic, and an integral user interface. That is what the P2 would have delivered, but at too high a power cost with 8 killer cogs.
The old saying is "Lead, Follow or Get Out of the Way". The P2 is LEAD, the P1x is Follow, and the latter will happen if Parallax ends up delaying for too long.
@Ross, That's what I'm thinking as well. I have to admit I find it amusing that people here who were thrilled by the power and capabilities of the P1 as little as a year ago can't live with a "P1b" with twice as many I/O pins (capable of analog in & out), 16 cogs, 8x+ hub RAM and a 160 to 200 MHz clock!!
Yes, it's odd isn't it - the answer is so obvious that it makes me wonder why it has taken us all so long to realize it. The "shiny bauble" of the P2 seems to have had us all mesmerized for the past year or two.
It seems in retrospect that the problem is that Parallax let the P2 design process get out of hand, and instead of developing the P2 they ended up developing a P3 that requires too much power at the 180nm process. Parallax needs to wind the clock back a bit and build the P2 first in the 180nm process, and then build the P3 design that we've ended up with in the 65nm process.
Exactly! That's why so many people now want the P16X32B - it offers pretty much exactly what everyone wanted from the P2 in the first place! And if they are quick about producing it, the P16X32B can still benefit from whatever business revenue Parallax originally thought the P2 would generate for them.
Then the P3 can come along when Parallax can afford a 65nm chip - maybe next year, maybe the year after.
When in doubt KISS
Then you will be happy to wait for the 65nm version, if you find a choice of two COGs inside one device a culture shock.
Or, you can never enable the full P2 features and pretend they are all the same P2S cores, and you will be fine as well.
KISS is great, but comes in at 8W.
So, a more pragmatic solution is needed.
Designers resource map all the time already, so this housekeeping is not a brick wall. Software steadily improves.
To me the alternative is continuing to use P1, using the P16x32b if available, or moving on to something like an STM32F4xx permanently. Although I'm sure some folks here would love to see me out of their <adjective here> hair ;-)
I will welcome the 65nm version of the P2 if I am still alive when it ships. In the meantime, I hope to enjoy programming and using the P16X32B.
Ross.
You mention just a 25% improvement for hubexec vs. LMM on P1 cogs. Since the added complexity makes hubexec seem not worth supporting, would more COG RAM and some fast memory-copy mechanism improve LMM performance on the P1 cog?
Of course, everyone would love 8 * P2 COGS - Problem is, Physics says no @ 180nm.
Chip is already looking at removing the big Mathops from a P2 COG, into a common area, which will shrink the P2 COG. (at the cost of total peak Math OPS)
A P1 COG is too simple, and needs some changes, so some elements of P2 will be needed, and that takes time.
P2 is FPGA proven, so whilst the idea may seem less purist, it does take what we have, to make what we want.
Which is what engineering is all about.
Any Smarter/Smaller mix of COGs would share a common software base, and be able to fit the power envelope.
At least the P1+ is based on a known working model and is an order of magnitude simpler to test.
Please stop distorting the facts. I want the P2 as much as the next person, but reality has to take its place, and the reality is that currently the P16+X32B makes the most sense.
So how about refining that slightly to give Chip a handle on what we might be happy with?
Chip & Ken will ultimately decide, so let's give them the best help we can, without such blatant bias.
The P2 is not ready yet. No SERDES, no USB support, no fixed instruction set.
Now we produce a mixed chip. What about the missing instructions of the P2? Can we NOW fix the instruction set for the P2 in order to put one in the P1-whatever?
I do not think so.
Delay both until the P2 is done? Have three compilers?
This has to be a P1+, not an early release of a P2 beta.
Enjoy!
Mike
Wow, I missed the link to an FPGA release of the P16+X32B?! Do you have that?
Without that, clearly the P2 IS more proven than any P16+X32B !
The P2 has FPGA code builds, in the field, and it has On Semi Power Sim results.
The P16+X32B has neither of those. P16+X32B is not even properly defined.
There is no bias here, just simple reality.
I would run it at 60 MHz, and get along just fine, but that time resolution is crappy for a lot of use cases. It's crappy for a couple of mine I had planned, frankly. Not gonna happen.
We know a ton now, and we know how some things play out too. We did test out a lot of things. That P2 COG isn't far from the mark. If it's got a bug or two, that's all it's got.
I think the smartest thing is to take the P1 COG, add some things we know are a slam dunk, and move to the same testing we were going to do for the P2 design. Doing this is more favorably aligned with the process physics. No getting around that one.
There will be a testing round, and there will be one because Chip is going to use the fuses and I/O developed already no matter what. Those are proven and high value and are a sunk cost at this point.
That means no matter what, we will be integrating those with the P1 COGS and we will be maximizing the P1 COGS given what is available to us, which boils down to precisely what JMG said: it does take what we have, to make what we want.
And I'm fine with all of that. It's important that we maximize this opportunity so that future ones are funded. We get to continue on and we get a new device that we can all go and profit from.
Maybe he's preparing a P16X32B FPGA code for the DE0/DE2
For industrial process control the ideal chip would actually be 2 x P2 cogs (with 1 MB memory) + 16 P1x cogs (a few quality-of-life instructions added... i.e. bit manipulation etc.) + 96 I/O. Not all of the I/O needs to be full analog. I know that breaks the traditional Prop layout, but I would rather have 96 I/O, with a bank of plain digital pins for connecting external memory plus 48 to 64 useful I/O, than have 64 do-everything pins that leave 20 or fewer free after connecting external RAM + boot serial ROM + primary serial com.
If the P1x cogs had a subset of the P2 cogs instructions then creating a compiler to work with both should not be that hard. I would not want to see a P1x included that had a different instruction map/set.
With that configuration you could have 1 P2 cog running primary business logic, 1 P2 cog running the HMI with nice resolution graphics and 16 P1x cogs connected to the I/O getting the work done!
The conservative approach would be to make the improved P1 chip. This has the lowest risks, and will get a new chip out earlier. However, I believe that is a very short-sighted approach that will help Parallax for the next couple of years, but harm them further down the road. Perhaps Parallax's strategy is to remain small. They don't seem to be interested in external investment or ever going public. So if that is their business strategy, then going with an improved P1 may fit well into their plans. We'll see which road they take in the next few days.
Also, the M0 runs Thumb (16-bit opcodes) while the M4 uses Thumb-2, which is a superset of Thumb. So you can share code between the two, though most toolsets I've seen are pretty clunky in multiprocessor support.
In the ARM9/11 cases the multiprocessors are usually identical.
Hi Dave,
The most obvious outcome, would of course be to do both. Developing a 65nm P2 will take time and cost money. Developing the PX16X32B would be relatively quick and raise money - which could then be used to fund the 65nm P2.
In the end, Parallax would have a family of three superb chips that all have quite a long life ahead of them.
Ross.
@Ross, That's what I'm thinking as well. I have to admit I find it amusing that people here who were thrilled by the power and capabilities of the P1 as little as a year ago can't live with a "P1b" with twice as many I/O pins (capable of analog in & out), 16 cogs, 8x+ hub RAM and a 160 to 200 MHz clock!!
I'm not sure that I get this. At a minimum, won't manufacturing either chip take $$$ even at the 180nm process? So the P1+ vs P2 would cost the same to make. I'm also not sure that Parallax could crank out a P1+ in much less time than it takes to finish a P2. After all, they'll have to figure out all the "extras" to add.
True, however if the P1x is a true subset of the P2 instruction set then the core logic code would be sharable between them. The P2 enhancements would be instructions for the higher-power business logic like full hubexec, external SDRAM, big math, etc., along with nice video. All of your standard I/O logic would program exactly the same on either cog. Yes, users would have to think about where their programs are going, but right now what are their options? Do I/O with a Prop and then try to kludge something together with a Pi or STM/PIC for logic, user interface and acceptable video? I am doing that right now as an interim to getting an all-in-one solution from Parallax. It is NOT commercial grade due to the compromises needed to kludge it together. It is only a test bed.
I strongly believe the NEXT Parallax chip has to be able to deliver complete system-level capabilities in order to really be successful. That means instruction sets that have the creature comforts of other chips, the ability to do serious business logic, and an integral user interface. That is what the P2 would have delivered, but at too high a power cost with 8 killer cogs.
The old saying is "Lead, Follow or Get Out of the Way". The P2 is LEAD, the P1x is Follow, and the latter will happen if Parallax ends up delaying for too long.
Yes, it's odd isn't it - the answer is so obvious that it makes me wonder why it has taken us all so long to realize it. The "shiny bauble" of the P2 seems to have had us all mesmerized for the past year or two.
Only because too many people here can't seem to leave well enough alone!
Exactly! That's why so many people now want the P16X32B - it offers pretty much exactly what everyone wanted from the P2 in the first place! And if they are quick about producing it, the P16X32B can still benefit from whatever business revenue Parallax originally thought the P2 would generate for them.
Then the P3 can come along when Parallax can afford a 65nm chip - maybe next year, maybe the year after.
Ross.