Mark this as a major milestone for the Propeller II
We have a tapeout submission on the 5th of February, 2013 that we are ready for ...
Finally! You and Chip have been lazing about for far too long, and it's good to hear you've gotten around to getting some work done!
Seriously, we'll be keeping fingers crossed, burning incense, helping old ladies cross the street, anything we need for a successful outcome. And I'm sure it will be. You guys have been very careful and diligent in getting it done right.
So this means once the P2 is Gold, Beau has time to do the LVS for the P8X32B, right???
Yes, we need to get around to that. Beau will be working on some other stuff while the chip is being fab'd, but we need to get that LVS work done, once and for all.
Since you did the 2 power pins per 8 bit port on the P2, I'd guess you'd probably do the same on the P8X32B.
10*8 = 80 pins for data and power
XI
XO
BOEn
RESn
That adds up to 84, which is perfect for a PLCC, but you'd probably trim 4 power pins off to fit it into a QFP 80 package.
That would be about 18mm square.
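For anyone who wants to check the pin arithmetic above, here is a rough Python sketch. The constant and function names are made up for illustration, and the "2 power pins per port" figure is simply the guess from the post, not a confirmed P8X32B pinout.

```python
# Pin budget for a hypothetical 64-I/O P8X32B, following the post's arithmetic.
PORTS = 8                 # eight 8-bit ports = 64 I/O
IO_PER_PORT = 8
POWER_PER_PORT = 2        # assumption: same 2-power-pins-per-port scheme as the P2
OTHER_PINS = ["XI", "XO", "BOEn", "RESn"]

def pin_budget(power_trimmed=0):
    """Total package pins, optionally dropping some power pins."""
    io = PORTS * IO_PER_PORT
    power = PORTS * POWER_PER_PORT - power_trimmed
    return io + power + len(OTHER_PINS)

print(pin_budget())                  # full count, PLCC-style
print(pin_budget(power_trimmed=4))   # with 4 power pins trimmed for an 80-pin QFP
```

The first call gives the 84-pin total and the second the trimmed 80-pin total.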
How about putting EEPROM on the die, since you've got more room (I'm pretty sure I asked you before whether that was possible)?
Instead of adding RAM, which would break compatibility, just put some EEPROM on the chip, hanging out where the usual EEPROM does, but then you have a 64 I/O 1 chip solution!!!
I suppose this is a dumb question, but what is the P8X32B? A version of the P1 with 64 I/O pins (dirb/inb/outb implemented)?
@Chip - I'm sure that we could LVS the P8X32B today. The problem that I remember is that hierarchy was broken in too many places between the Layout and the Schematic, causing the LVS tool to lose track of where it needed to be. At the time we had another guy working the layout on this and I wasn't an available resource because I had my hands tied in Propeller II.
@pedward - EEPROM or NVRAM is more difficult than you think and requires different process layers. It's not a direct drop-in for existing memory, and most of the real estate gained is not proportioned where you think it would be, or at least not optimally without reworking most of the interconnections. Take, for example, a 3x3 die with an area of 9 versus a 4x4 die with an area of 16... Yes, you have about a 78% gain in area, but placing the original 3x3 core within the 4x4 only gives you a border of about 0.5 units around the 3x3 core. This gain isn't always where you would need it to be.
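The 3x3 versus 4x4 numbers can be checked with a small Python sketch. The `growth` helper is hypothetical and assumes a square core centred in a square die, which is only the simplified picture used in the post.

```python
def growth(core_side, die_side):
    """Return (area gain in percent, border width) for a square core centred in a square die."""
    core_area = core_side ** 2
    die_area = die_side ** 2
    gain_pct = 100.0 * (die_area - core_area) / core_area
    border = (die_side - core_side) / 2.0   # extra width on each side of the core
    return gain_pct, border

gain_pct, border = growth(3, 4)
print(f"{gain_pct:.0f}% more area, border {border} units wide")
```

The extra area is real, but it arrives as a thin ring around the core rather than as one usable block, which is the point being made.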
Could always do a stacked die if native silicon isn't possible. The A is 7x7mm, correct? Assuming no process change and using the same fab, and you've got an 18mmx18mm package, wouldn't it be feasible to just make a bigger die, maybe rectangular? Say 7x10mm?
I know those EEPROM dies are super tiny, heck they can fit them in an SOT-23!
I don't want to be a naysayer, it's just that there are many caveats along the way that can make these proposals infeasible.
A stacked die needs to be planned from the beginning and is similar to BGA, only at the silicon level... In the areas of the interconnects there is a 'keepout' of at least 80 microns for any active circuitry.
As far as the die cut, a "princess cut" or perfect square is the best real-estate-to-$$ ratio.
Since you did the 2 power pins per 8 bit port on the P2, I'd guess you'd probably do the same on the P8X32B.
10*8 = 80 pins for data and power
XI
XO
BOEn
RESn
That adds up to 84, which is perfect for a PLCC, but you'd probably trim 4 power pins off to fit it into a QFP 80 package.
That would be about 18mm square.
I've just seen a data sheet revision from Korean ABOV, and they tweaked an 80-pin uC to move from a smaller package to a larger one
(I've also noticed Renesas is moving to offer two-pitch choices too.)
The new ABOV changed
80-Pin LQFP-1414 Package (0.65mm) => 80-Pin MQFP-1420 Package (0.8mm)
- that will have been either price, or customer demand ? (my guess is customer demand, following Renesas lead )
So I would suggest the same 80-Pin MQFP-1420 Package (0.8mm).
They also offer a 64-Pin LQFP-1010 Package (0.5mm) which has the advantage of being the same Gull-outline as the present Prop.
Only question is if the 64 I/O die would still fit in a 10mm package ?
and on the topic of trends, I see Infineon have a new 32 bit uC, in TSSOP 16/28/38, they claim in 65nm process, and this part is spec'd over the wide supply between 1.8V and 5.5V
Clearly, a fine process no longer need exclude 5V i/o, & now would be a good time to question FABs on this ?
How about putting EEPROM on the die, since you've got more room (I'm pretty sure I asked you before whether that was possible)?
EE memory is not practical within the same wafer, but some companies do dual-die, and bond at lead-frame-time.
So from the outside users do not know it is two die.
This approach raises questions of Size of Memory, How to disable, or is that two part-codes ? (one with EE-inside and one without ?)
A stacked die needs to be planned from the beginning and is similar to BGA, only at the silicon level... In the areas of the interconnects there is a 'keepout' of at least 80 microns for any active circuitry.
I think some vendors mount the second die on the lead frame, and wire bond ? So that has less die constraints than true stacked die.
Becomes more a packaging-time question - and how small IS the die on a Serial EE these days ?
I'm glad Parallax is still planning to pursue the P8X32B. That will be a great solution for simpler low-power systems that need fast external memory and I/O at the same time.
Maybe ~60KB hub ram + ~4KB hub rom (with similar monitor rom code as P2)
Use SPI Flash for boot
Use alternate method for extra IO other than using C bit
Simple ADC on I/O
Any chance to increase clock ?
Any chance to increase hub access to 1 in 8 ?
Any chance to decrease instruction clock cycles ? (probably not without major changes)
Any chance to tighten the onboard oscillator so that it could be more widely used instead of an xtal (store calibration in flash/EEPROM) ?
Any additional power saving possible ?
Consider process geometry
From what has been said about the tools/design flow, some of that would be optimistic.
Think along the lines of PCB design a generation ago & MCAD packages.
Thus copy/paste of more IO is lowish-risk, and maybe if the larger IO ring bumps up the inner die area, then an expanded RAM block could paste-in. I think they are locked-into a process node, given the MCAD nature of the design and especially test/verify.
Things like SPI boot, ADC pins, etc. would likely go into the risk-outweighs-advantage basket.
At 64 I/O, some means to level shift could broaden the device's appeal, and as I mentioned above, if even some I/O could be 5V, that would be great. The 1.8V-5V wide supply spec is becoming a pretty common standard now.
Cluso99, others:
Remember, the Prop1 was laid out and built by hand. It's not a VHDL change and resynthesis to get new features, like with the Prop2.
Anything beyond the second I/O bank is going to be outside the realm of the feasible. Even adding more memory would require new address lines. Changing ROM for RAM is also different on the Prop1, since they are separate blocks. I also think it would be a serious mistake to change the Prop1 setup of ROM/RAM and so forth in this version. It's just a Prop1 with the second I/O bank enabled and exposed to pins.
Also, all we've heard is Chip say he'd like to get it squared away, but I'm not sure that means he's talked with Ken and the rest of the Parallax folks to decide if it really makes sense for the company to do it. Plus, they are likely putting their resources into Prop 2 for the time being; we don't want to interfere with that.
It sounded from Chip's last post that the P8X32B was, if not on the schedule, on the agenda. I would not expect a lot -- or really any -- changes from P8X32A other than the second I/O bank implemented as described in the docs, 32K RAM, current ROM, carry bit select for port B and all. Of course this might be overly conservative; if they're talking about VHDL they may have gotten the P1 design into more modern tools. I just like the idea of having the same CPUx8 we have now but with the extra I/O.
Seconded. If nothing at all changed but for the addition of PORTB, it would be an excellent development. We've seen that a Prop can use all the pins it gets!
It sounded from Chip's last post that the P8X32B was, if not on the schedule, on the agenda. I would not expect a lot -- or really any -- changes from P8X32A other than the second I/O bank implemented as described in the docs, 32K RAM, current ROM, carry bit select for port B and all. Of course this might be overly conservative; if they're talking about VHDL they may have gotten the P1 design into more modern tools. I just like the idea of having the same CPUx8 we have now but with the extra I/O.
A VHDL variant does exist, it is called the Prop 2
A VHDL Prop 1 will likely have a larger die area and may also be slower, and I think just finding out the speed would cost them expensive external route+sim time.
Depending on what the extra IO does to the die-area, it may be practical to fit more RAM.
The native address reach is 32 bits, so it is a couple more latch/mux bits and wire channels per COG, plus the memory cell itself.
The adr lines may even be already routed to the memory cell edges.
I wonder how hard it would be to implement the 2nd half of the 64K hub memory as a RAM-shadowed ROM:
- After booting (which would be exactly identical to booting P1, i.e. download 32K from EEPROM), reading from 0x8000-FFFF results in reading from ROM
- Writing to 0x8000-FFFF would always write to RAM so with a simple RDLONG/WRLONG (to the same address) loop you can copy ROM to RAM
- After "flipping a bit" somewhere (are there any unused register bits anywhere?), reading from 0x8000-FFFF results in reading from RAM instead of ROM.
This would be fully backwards compatible with all P8X32A software (unless it inadvertently flips the shadow-bit), but it would provide up to 32K extra hub RAM for software that doesn't need the ROM.
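To make the shadowing idea concrete, here is a toy Python model. It is only a sketch of the scheme described in the bullets above: the class name, the `shadow_enabled` flag, and byte-wide access are all made up for illustration (a real cog would use RDLONG/WRLONG on longs), and nothing here reflects actual P8X32A or P8X32B hardware.

```python
class ShadowedHub:
    """Toy model of a RAM-shadowed ROM in the upper 32K of a 64K hub."""
    ROM_BASE = 0x8000
    SIZE = 0x10000

    def __init__(self, rom_image):
        assert len(rom_image) == self.SIZE - self.ROM_BASE
        self.rom = bytes(rom_image)
        self.ram = bytearray(self.SIZE)   # lower 32K: normal RAM, upper 32K: shadow RAM
        self.shadow_enabled = False       # the hypothetical "flipped bit"

    def read(self, addr):
        # Upper-half reads come from ROM until the shadow bit is set.
        if addr >= self.ROM_BASE and not self.shadow_enabled:
            return self.rom[addr - self.ROM_BASE]
        return self.ram[addr]

    def write(self, addr, value):
        # Writes always land in RAM, even while reads still see ROM.
        self.ram[addr] = value & 0xFF

    def copy_rom_to_ram(self):
        # Read-then-write to the same address copies ROM into the shadow RAM.
        for addr in range(self.ROM_BASE, self.SIZE):
            self.write(addr, self.read(addr))


rom = bytes((i * 7) & 0xFF for i in range(0x8000))   # stand-in ROM contents
hub = ShadowedHub(rom)
hub.copy_rom_to_ram()
hub.shadow_enabled = True       # from here on, the upper 32K behaves as RAM
hub.write(0x8000, 0x55)
print(hex(hub.read(0x8000)))
```

Software that never touches the flag keeps seeing the ROM, which is where the backwards compatibility in the proposal comes from.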
By the way, in the case of my current project, a Propeller 1 with 64 ports would definitely make the Propeller 2 less interesting: the main reason that I'm interested in eventually switching to the Prop2 is the extra I/O lines. I'm pretty sure that from a business perspective, redirecting attention away from the Prop2 (in which I'm sure they invested a lot of money over the last couple of years) might not be the best idea for Parallax. I have to speculate that a P8X32B is not going to happen any time soon.
Just adding the additional I/O would make a P8X32B very interesting; I have run out of I/O's many times.
Like others suggested, I'd love to get rid of the ROM and replace it with RAM - and a P2 style monitor and loader, perhaps using an SPI flash instead of an eeprom, in a TQFP-80 or PLCC-84 package.
PLCC-84 would be hobbyist friendly, as through-hole sockets for PLCC-84 are readily available.
More ram and SPI flash support would make it very PropGCC / Catalina / LMM friendly.
*** BUT ***
While I'd love the above, Parallax may not wish to expand the time/resources to do this - remember, getting the P2 out and promoting it will take a lot of resources, and while I'd love to see a P1B, I don't want it if it would adversely affect the P2.
Correct.
If it were me, I would not do a P1B, especially not if the P2 hits a low price. ( ~ 1.5x P1 has been waved about, even < 2x P1 is cheaper per io)
That does not leave much elbow room for P1B sales.
A P1B will be a larger die, with smaller sales volumes, and so push toward a P2 price - a P1B that costs the same or more than a P2 would sell how many ?
(and remember it is likely possible to drop a P2 die into a 80-Pin MQFP-1420 Package (0.8mm) mentioned above )
Of course, if P2 is delayed significantly, or the price jumps, then there might be space for a P1B... a few weeks will tell us
Mark this as a major milestone for the Propeller II
We have a tapeout submission on the 5th of February, 2013 that we are ready for ... The foundry requires 15 days for tooling and proper design setup, so the actual shuttle run will be on the 20th of February.
Everyone wish us luck and cross your fingers. This initial run will be a small batch of Propeller II's (40 or so). Approximately three weeks after the 20th of February (some time around mid-March), we will have chips back from the fab so that we can test various key parameters of the Propeller II. If all goes well, then full-production chips should be underway.
Nice work, you guys have put your hearts and souls into this chip and it will be great to finally see it on the market.
I'm glad Parallax is still planning to pursue the P8X32B. That will be a great solution for simpler low-power systems that need fast external memory and I/O at the same time.
I thought at one time Chip made passing mention of a P1 running at 160 MHz (based on the success he was having with P2). I can't find it now, despite the fact that I remember posting about how helpful that would be. I think either a 160 MHz P1 or an 80 MHz P1 with 64 I/O would have a great many uses for low-power systems.
P1B or P8XxxB:
A lot depends on precisely what can be reused of the prior work done on this, and what can be done simply and cheaply. It may be that putting the P1B into VHDL is easy, and if so, then a few easy things without risk could be done, such as increasing hub RAM in place of the existing ROM (which means no additional address pins are required) or, as suggested, shadowing the extra RAM over the ROM.
Obviously a faster P1B would be nice - we already know we can overclock the P1 significantly (I always run at 104MHz) so there may just be a couple of simple tricks to make it run much faster.
It would also be worth looking at seeing if the hub access could be increased to 1 in 8 instead of the current 1 in 16. This would be a big boost for spin and LMM based programs.
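As a back-of-the-envelope check on the hub suggestion, here is a short Python sketch. It assumes "1 in 16" means each cog gets one hub access slot every 16 system clocks, and it assumes an 80 MHz clock; the function name is invented for illustration.

```python
def hub_slots_per_second(clock_hz, clocks_per_slot):
    """Upper bound on hub accesses per cog per second for a given slot spacing."""
    return clock_hz // clocks_per_slot

CLOCK_HZ = 80_000_000   # assumption: a typical 80 MHz P1 system clock

current = hub_slots_per_second(CLOCK_HZ, 16)
proposed = hub_slots_per_second(CLOCK_HZ, 8)
print(current, proposed, proposed / current)
```

Under those assumptions the ceiling on hub accesses per cog doubles, which is why it would matter most for hub-bound code such as Spin and LMM.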
Certainly worth looking at using SPI flash instead of I2C eeprom - but this will require an extra pin.
Also worth looking at a 64pin package vs 84 pin package - only 56 IO on my calcs but makes for a smaller pcb. If the die can be made small enough it could also be used in a new 40 & 44 pin P1 replacement by not using the extra IO. Just a thought.
BUT, remember the P1 uses less power than the P2 will use. Any P1B should keep this low power as that is a benefit of the existing P1.
Perhaps there could be a reason to migrate to one level smaller geometry.
Are there any other ideas learnt from the P2 that could be put into a P1B ???
Ultimately, we can only speculate here because we do not know what is involved. Obviously, for Parallax to do this, there needs to be minimal effort and risk. 3 years ago the P1B, in its form at the time, was worthwhile. Unfortunately a lot of us just wanted the P2 (me included). A lot of changes have taken place since, and the P1B has less of a lead than it once would have had. I think it is certainly worth a look by Chip & Co to see just what could be done with P1B and at what cost.
A VHDL variant does exist, it is called the Prop 2
I'd like to point out that this isn't really the case; P2 has drifted far from P1 in a lot of ways, mostly good of course but a few bad. The biggest problems are P2 needing 1.8V power for the core and essentially not being capable of low power long-battery operation due to leakage in the 160 nm process. P2 is also going to require very different optimization due to its more efficient pipeline, the availability of the CLUT, and other nifty features. P1B would be mostly object code compatible (we'd have to check a lot of carry bits on port access) but for some applications it would be a much better fit.
I'm less intrigued by a 160 MHz P1 than by a P1B that can go to microamp low-battery mode in software. Speed is what the P2 is about. A P1 with the same clock speed and power requirements but 32 more I/O would be a very useful thing in roles the P2 will not be able to fill.
Incidentally, on the scaling-it-in-VHDL thing: at the last UPEW, Chip teased us with the possibility of rescaling P2 to 45nm and creating chips that can run at 1 GHz. Of course they'd also require 100 watt power supplies because of the leakage...
Comments
Finally! You and Chip have been lazing about for far too long, and it's good to hear you've gotten around to getting some work done!
Seriously, we'll be keeping fingers crossed, burning incense, helping old ladies cross the street, anything we need for a successful outcome. And I'm sure it will be. You guys have been very careful and diligent in getting it done right.
-- Gordon
"I love the smell of hot silicon....it reminds me of Propeller I"
So, how's the preliminary design on the Propeller III coming?
Yes, we need to get around to that. Beau will be working on some other stuff while the chip is being fab'd, but we need to get that LVS work done, once and for all.
10*8 = 80 pins for data and power
XI
XO
BOEn
RESn
That adds up to 84, which is perfect for a PLCC, but you'd probably trim 4 power pins off to fit it into a QFP 80 package.
That would be about 18mm square.
How about putting EEPROM on the die, since you've got more room (I'm pretty sure I asked you before whether that was possible)?
Instead of adding RAM, which would break compatibility, just put some EEPROM on the chip, hanging out where the usual EEPROM does, but then you have a 64 I/O 1 chip solution!!!
Great work everyone over at Parallax! : D
@pedward - EEPROM or NVRAM is more difficult than you think and requires different process layers. It's not a direct drop-in for existing memory, and most of the real estate gained is not proportioned where you think it would be, or at least not optimally without reworking most of the interconnections. Take, for example, a 3x3 die with an area of 9 versus a 4x4 die with an area of 16... Yes, you have about a 78% gain in area, but placing the original 3x3 core within the 4x4 only gives you a border of about 0.5 units around the 3x3 core. This gain isn't always where you would need it to be.
Yup!
Could always do a stacked die if native silicon isn't possible. The A is 7x7mm, correct? Assuming no process change and using the same fab, and you've got an 18mmx18mm package, wouldn't it be feasible to just make a bigger die, maybe rectangular? Say 7x10mm?
I know those EEPROM dies are super tiny, heck they can fit them in an SOT-23!
A stacked die needs to be planned from the beginning and is similar to BGA, only at the silicon level... In the areas of the interconnects there is a 'keepout' of at least 80 microns for any active circuitry.
As far as the die cut, a "princess cut" or perfect square is the best real-estate-to-$$ ratio.
I've just seen a data sheet revision from Korean ABOV, and they tweaked an 80-pin uC to move from a smaller package to a larger one
(I've also noticed Renesas is moving to offer two-pitch choices too.)
The new ABOV changed
80-Pin LQFP-1414 Package (0.65mm) => 80-Pin MQFP-1420 Package (0.8mm)
- that will have been either price, or customer demand ? (my guess is customer demand, following Renesas lead )
So I would suggest the same 80-Pin MQFP-1420 Package (0.8mm).
They also offer a 64-Pin LQFP-1010 Package (0.5mm) which has the advantage of being the same Gull-outline as the present Prop.
Only question is if the 64 I/O die would still fit in a 10mm package ?
and on the topic of trends, I see Infineon have a new 32 bit uC, in TSSOP 16/28/38, they claim in 65nm process, and this part is spec'd over the wide supply between 1.8V and 5.5V
Clearly, a fine process no longer need exclude 5V i/o, & now would be a good time to question FABs on this ?
EE memory is not practical within the same wafer, but some companies do dual-die, and bond at lead-frame-time.
So from the outside users do not know it is two die.
This approach raises questions of Size of Memory, How to disable, or is that two part-codes ? (one with EE-inside and one without ?)
I think some vendors mount the second die on the lead frame, and wire bond ? So that has less die constraints than true stacked die.
Becomes more a packaging-time question - and how small IS the die on a Serial EE these days ?
P8X32B ---> P8x64B
If there is quite a bit of work to fix the work done, might it be opportune to look at a few other options:
From what has been said about the tools/design flow, some of that would be optimistic.
Think along the lines of PCB design a generation ago & MCAD packages.
Thus copy/paste of more IO is lowish-risk, and maybe if the larger IO ring bumps up the inner die area, then an expanded RAM block could paste-in. I think they are locked-into a process node, given the MCAD nature of the design and especially test/verify.
Things like SPI boot, ADC pins, etc. would likely go into the risk-outweighs-advantage basket.
At 64 I/O, some means to level shift could broaden the device's appeal, and as I mentioned above, if even some I/O could be 5V, that would be great. The 1.8V-5V wide supply spec is becoming a pretty common standard now.
Remember, the Prop1 was laid out and built by hand. It's not a VHDL change and resynthesis to get new features, like with the Prop2.
Anything beyond the second I/O bank is going to be outside the realm of the feasible. Even adding more memory would require new address lines. Changing ROM for RAM is also different on the Prop1, since they are separate blocks. I also think it would be a serious mistake to change the Prop1 setup of ROM/RAM and so forth in this version. It's just a Prop1 with the second I/O bank enabled and exposed to pins.
Also, all we've heard is Chip say he'd like to get it squared away, but I'm not sure that means he's talked with Ken and the rest of the Parallax folks to decide if it really makes sense for the company to do it. Plus, they are likely putting their resources into Prop 2 for the time being; we don't want to interfere with that.
Roy
One addition to that.
PORTC for internal communication between COG's
Coulda had that with PORTB, but didn't happen. There isn't a register free for PORTC, unless some liberties are taken.
A VHDL variant does exist, it is called the Prop 2
A VHDL Prop 1 will likely have a larger die area and may also be slower, and I think just finding out the speed would cost them expensive external route+sim time.
Depending on what the extra IO does to the die-area, it may be practical to fit more RAM.
The native address reach is 32 bits, so it is a couple more latch/mux bits and wire channels per COG, plus the memory cell itself.
The adr lines may even be already routed to the memory cell edges.
- After booting (which would be exactly identical to booting P1, i.e. download 32K from EEPROM), reading from 0x8000-FFFF results in reading from ROM
- Writing to 0x8000-FFFF would always write to RAM so with a simple RDLONG/WRLONG (to the same address) loop you can copy ROM to RAM
- After "flipping a bit" somewhere (are there any unused register bits anywhere?), reading from 0x8000-FFFF results in reading from RAM instead of ROM.
This would be fully backwards compatible with all P8X32A software (unless it inadvertently flips the shadow-bit), but it would provide up to 32K extra hub RAM for software that doesn't need the ROM.
By the way, in the case of my current project, a Propeller 1 with 64 ports would definitely make the Propeller 2 less interesting: the main reason that I'm interested in eventually switching to the Prop2 is the extra I/O lines. I'm pretty sure that from a business perspective, redirecting attention away from the Prop2 (in which I'm sure they invested a lot of money over the last couple of years) might not be the best idea for Parallax. I have to speculate that a P8X32B is not going to happen any time soon.
===Jac
Like others suggested, I'd love to get rid of the ROM and replace it with RAM - and a P2 style monitor and loader, perhaps using an SPI flash instead of an eeprom, in a TQFP-80 or PLCC-84 package.
PLCC-84 would be hobbyist friendly, as through-hole sockets for PLCC-84 are readily available.
More ram and SPI flash support would make it very PropGCC / Catalina / LMM friendly.
*** BUT ***
While I'd love the above, Parallax may not wish to expand the time/resources to do this - remember, getting the P2 out and promoting it will take a lot of resources, and while I'd love to see a P1B, I don't want it if it would adversely affect the P2.
Yes, sockets are somewhat available, but I suspect the package itself is heading up the EOPL price curve.
Correct.
If it were me, I would not do a P1B, especially not if the P2 hits a low price. ( ~ 1.5x P1 has been waved about, even < 2x P1 is cheaper per io)
That does not leave much elbow room for P1B sales.
A P1B will be a larger die, with smaller sales volumes, and so push toward a P2 price - a P1B that costs the same or more than a P2 would sell how many ?
(and remember it is likely possible to drop a P2 die into a 80-Pin MQFP-1420 Package (0.8mm) mentioned above )
Of course, if P2 is delayed significantly, or the price jumps, then there might be space for a P1B... a few weeks will tell us
Nice work, you guys have put your hearts and souls into this chip and it will be great to finally see it on the market.
I thought at one time Chip made passing mention of a P1 running at 160 MHz (based on the success he was having with P2). I can't find it now, despite the fact that I remember posting about how helpful that would be. I think either a 160 MHz P1 or an 80 MHz P1 with 64 I/O would have a great many uses for low-power systems.
A lot depends on precisely what can be reused of the prior work done on this, and what can be done simply and cheaply. It may be that putting the P1B into VHDL is easy, and if so, then a few easy things without risk could be done, such as increasing hub RAM in place of the existing ROM (which means no additional address pins are required) or, as suggested, shadowing the extra RAM over the ROM.
Obviously a faster P1B would be nice - we already know we can overclock the P1 significantly (I always run at 104MHz) so there may just be a couple of simple tricks to make it run much faster.
It would also be worth looking at seeing if the hub access could be increased to 1 in 8 instead of the current 1 in 16. This would be a big boost for spin and LMM based programs.
Certainly worth looking at using SPI flash instead of I2C eeprom - but this will require an extra pin.
Also worth looking at a 64pin package vs 84 pin package - only 56 IO on my calcs but makes for a smaller pcb. If the die can be made small enough it could also be used in a new 40 & 44 pin P1 replacement by not using the extra IO. Just a thought.
BUT, remember the P1 uses less power than the P2 will use. Any P1B should keep this low power as that is a benefit of the existing P1.
Perhaps there could be a reason to migrate to one level smaller geometry.
Are there any other ideas learnt from the P2 that could be put into a P1B ???
Ultimately, we can only speculate here because we do not know what is involved. Obviously, for Parallax to do this, there needs to be minimal effort and risk. 3 years ago the P1B, in its form at the time, was worthwhile. Unfortunately a lot of us just wanted the P2 (me included). A lot of changes have taken place since, and the P1B has less of a lead than it once would have had. I think it is certainly worth a look by Chip & Co to see just what could be done with P1B and at what cost.
I'd like to point out that this isn't really the case; P2 has drifted far from P1 in a lot of ways, mostly good of course but a few bad. The biggest problems are P2 needing 1.8V power for the core and essentially not being capable of low power long-battery operation due to leakage in the 160 nm process. P2 is also going to require very different optimization due to its more efficient pipeline, the availability of the CLUT, and other nifty features. P1B would be mostly object code compatible (we'd have to check a lot of carry bits on port access) but for some applications it would be a much better fit.
I'm less intrigued by a 160 MHz P1 than by a P1B that can go to microamp low-battery mode in software. Speed is what the P2 is about. A P1 with the same clock speed and power requirements but 32 more I/O would be a very useful thing in roles the P2 will not be able to fill.
Incidentally, on the scaling-it-in-VHDL thing: at the last UPEW, Chip teased us with the possibility of rescaling P2 to 45nm and creating chips that can run at 1 GHz. Of course they'd also require 100 watt power supplies because of the leakage...