Here are some current measurements on the old vs. new silicon:
Wow. This thread has been an exciting read! It looks like the updated P2 is exceeding everyone's expectations before even a single board has shipped!
My vague recollection about the v1 chip was that the hub RAM (or was it the eggbeater?) was a significant factor in power consumption. How much has that changed with the v2 chip?
The 4-bit post divider is outside the closed loop of the PLL, right? I've been using that divider, by setting it to /2, to test the top frequency of the PLL of the v1 chip. The result I got was around 415 MHz at room temp.
Hi evanh
During the tests you've done, to determine the top frequency of the PLL, are you sure you were using a post divider of /2 (%PPPP = %0000) and not /1 (%PPPP = %1111, thus, post divider bypassed)?
IIRC (and also based on the linked post, from Chip) if one wants to exercise the top limits of the PLL, the post divider needs to be bypassed, hence the /1.
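For reference, the two encodings mentioned above can be captured in a small helper. This is only a sketch: the thread confirms %0000 as /2 and %1111 as /1 (bypass); the rule used for the values in between, (PPPP + 1) * 2, is an assumption based on my reading of the P2 clock-mode documentation, and the function names are made up for illustration.

```python
def post_divider(pppp: int) -> int:
    """Return the post-VCO divide ratio for a 4-bit %PPPP field.

    Confirmed in this thread: %0000 -> /2 and %1111 -> /1 (divider bypassed).
    Assumed here: the remaining values divide by (PPPP + 1) * 2.
    """
    if not 0 <= pppp <= 0b1111:
        raise ValueError("PPPP is a 4-bit field")
    if pppp == 0b1111:
        return 1           # divider bypassed, sysclock equals the VCO frequency
    return (pppp + 1) * 2  # assumed rule for %0000..%1110


def sysclock_hz(vco_hz: float, pppp: int) -> float:
    """Sysclock frequency given the VCO frequency and the %PPPP field."""
    return vco_hz / post_divider(pppp)
```

With %PPPP = %0000 the sysclock is half the VCO frequency, which is why a /2 setting still exercises the PLL at twice the speed the core sees.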
Anyway, I can give the PLL impossibly-high-frequency settings and it goes as fast as it can and becomes temperature dependent, since it can't lock. This makes me think that the VCO divider is faster than the VCO, which is better than the opposite, since things top out safely.
I doubt there is a way to do this, but I will ask anyway: is there a way of knowing that the VCO is locked? Is there a register that would hold the VCO lock status?
I made a program tonight to check for any remaining race conditions between DIR and OUT signals, which caused pins to glitch low when going from high to float on the first silicon. On the new silicon, DIR is supposed to transition before OUT, to avoid glitching.
On the new silicon, pin 16 still exhibits a race condition, but only slightly. All the other pins are clean. I need to ask Wendy why this could be, since we made timing constraints to assure that DIR transitions before OUT on all the pins. It's not a big deal, but in future versions of P2 chips, we'll want this problem gone, all the way.
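To make the race easier to picture, here is a toy model of the high-to-float transition. It is a sketch under stated assumptions, not the actual test program: it assumes a simple three-state pin (DIR = 0 floats the pin, otherwise OUT drives it) and that both DIR and OUT go from 1 to 0 during the transition, one slightly after the other. All names are hypothetical.

```python
def pin_state(dir_bit: int, out_bit: int) -> str:
    """Simplified pin model: DIR=0 floats the pin, otherwise OUT drives it."""
    if dir_bit == 0:
        return "float"
    return "high" if out_bit else "low"


def high_to_float_states(dir_first: bool) -> list:
    """Pin states seen while DIR and OUT both go 1 -> 0, one after the other."""
    dir_bit, out_bit = 1, 1
    states = [pin_state(dir_bit, out_bit)]
    order = ("dir", "out") if dir_first else ("out", "dir")
    for signal in order:
        if signal == "dir":
            dir_bit = 0
        else:
            out_bit = 0
        states.append(pin_state(dir_bit, out_bit))
    return states


def glitches_low(dir_first: bool) -> bool:
    """True if the pin is actively driven low at any point in the transition."""
    return "low" in high_to_float_states(dir_first)
```

In this model the pin is only ever driven low when OUT falls before DIR, which matches the intent of making DIR transition first.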
Be interesting to see how different pin16 is in the OnSemi reports.
Did you run this test across all the mounted glob-top samples you have? It may even vary with each device and temperature.
It may be better to include an explicit Verilog delay, so the drive and float do not both try to happen on the exact same clock edge.
Even a latch that holds for half a clock period would give a definite margin.
As errata go, this one is very minor...
I agree, it's minor.
When I looked into this on the previous silicon, P15 seemed worst and P16 was second worst. There seemed to be a possible relationship between the test pin functions and how bad the glitches were.
Hi ke4pjw
Although there are many ways, analog or digital, to detect that a PLL has acquired lock, neither silicon version of the P2 implements one that can be accessed from outside the PLL itself.
I only have one sample chip running on a board. So, my sample lot is ONE. I'm pretty sure this P16 glitch is just baked into the design.
I cooled the part with freeze spray, but the 250mV spike amplitude hardly budged, at all.
All power consumption is way down. Not sure, yet, on how the reduction is distributed. It's so low, anyway, that the chip can be running 320MHz and it's barely even warm.
Yes, the 4-bit post-VCO divider is outside of the PLL loop.
Chip, in the very early days with OnSemi I remember you teasing us that once you had P2V1, it might just be a matter of paying for another run of the layout software to make a design at some stupidly small pitch like 20 nm that would run at 1 GHz but would be a toaster oven due to the insulation leakage, and we all went OOOH we want that. Sounds like we're a good fraction of the way to that performance at this design pitch and it's not even a toaster oven. That's some serious exceedance of expectations there.
I got my credit card out and am waving it for when the new eval boards become available.
And that is a glob-top chip which we saw ran hotter than the proper packaged chips!
I think that heat is due more to the available spread on the P2ES PCB vs. the P2V2 PCB, which is much more compact. The previous glob-tops were all soldered to P2D2s.
It makes me wonder what kind of speed we could really achieve at 20nm. I hope we get to find out.
From my reading, the finer geometries do reduce leakage per transistor, as well as current draw. It's the fact that there are usually way more transistors on a finer geometry. You'll note that the latest CPUs from Intel use quite a bit less power than the previous generation, despite there being lots more transistors.
I wonder where the next sweet spot is these days, e.g. 90nm or ~50nm?
P2 is already fairly 'topped out' at the speeds a processor pin in CMOS can manage.
The monster parts that are in 20nm and below, are all using GHz differential LVDS style connection schemes, and none of those are what could be called 'microcontrollers'.
I am perfectly happy with 320MHz, even 160. Mine is running 24/7 with whatever program I last loaded, sometimes for days while I do other work.
No glitch no fail, perfectly stable.
Programming the beast is challenging. One has to dig deep into Chip's way of thinking, and then one ends up with an astonishingly small number of lines needed to do it. Simple serial? A couple of lines, no library needed. Those smart pins are wonderful.
But with Parallax-style documentation this will be solved. Once you get it, it's quite easy, logical and understandable. I learn by coding it myself, and so far I have gone through smart pins and the streamer, and now I am trying to grok XBYTE and SKIPping.
Like the P1 was different from all others, the P2 is different from all others. So far I really like it.
Good, so my testing proves that the v1 PLL can go faster than the synthesised core. I'm guessing the v2 PLL can go even faster, therefore the PLL is not the limiting factor for 390 MHz at room temp.
An XDIVP of 2 means the PLL's VCO is internally running at double the sysclock frequency. That means we can keep operating the chip at half the PLL's VCO frequency and see at what point the achieved sysclock frequency stops matching the calculated/specified frequency.
EDIT: Attached an example terminal capture, and the source that produced it.
EDIT2: Updated diag-init.spin2 to show sub-MHz sysclock setting. Example now using XDIVP = 4.
EDIT3: Fixed a bug in fractional comport divider calculation when at very low sysclock rates.
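The idea of the sweep can be sketched roughly like this. It is not the attached diag-init.spin2 source, just an illustration of the comparison it relies on: step the requested sysclock upward, compare it with what was actually measured, and note the last VCO frequency at which the two still agreed. The function name, the sample format and the 1% tolerance are all assumptions.

```python
def find_pll_limit(samples, xdivp=2, tolerance=0.01):
    """Return the last VCO frequency (MHz) at which the measured sysclock
    still matched the requested sysclock, or None if no sample matched.

    samples: iterable of (requested_sysclock_mhz, measured_sysclock_mhz),
             in increasing order of requested frequency.
    xdivp:   post-VCO divide ratio, so VCO = sysclock * xdivp.
    """
    last_good_vco = None
    for requested, measured in samples:
        if abs(measured - requested) > tolerance * requested:
            break  # the PLL no longer tracks the request
        last_good_vco = requested * xdivp
    return last_good_vco
```

Feed it the (requested, measured) pairs from a terminal capture; the returned value is the highest VCO frequency that still locked in that run, so it is only as fine-grained as the step size of the sweep.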
Programming the beast is challenging. One has to dig deep into Chip's way of thinking, and then one ends up with an astonishingly small number of lines needed to do it. Simple serial? A couple of lines, no library needed. Those smart pins are wonderful.
Mike
This chip IS a beast! I've been doing most of my testing at 160MHz, but oftentimes I end up running RCFast just so I can get LA caps! Really exciting to see the power reduction in the respin. I would have been happy with just the sign-extension fix, tbh.
I'd love to see what this thing could do at 90nm, although could it still be a 3v3 chip? It would be really nice, even if the pins are the limiting factor here. I remember there was some import of the custom ring of the chip, would that require an entire manual layout or could the shrink be done without completely redesigning the custom part?
I'm loving programming with fastspin and the inline assembly. P2asm is just wicked and I haven't even hardly touched most of the new features. Those smartpins, WOW. I loved the counters and now having 64 smartpins, instead of just 16 counters. AND FINALLY 64 PINS!!!
I'm slightly confused. Chip has mentioned the IQ modulator problem in the rev1 chip, so I'm thinking all the pixel mixing instructions are borked? I'm about at a place where they'd come in handy!!
The P2 ?notUniPad? is coming along (slowly) and I can't wait till I'm adding radios and RTC and everything!
The IQ modulator has been fixed; see the color NTSC output for an example of it in use. Previously NTSC needed to be greyscale due to the IQ modulator issue.
The hardware encoder counting was also similarly affected, and has been fixed in this respin.
Yes, I think that was also sign-bug related; it fails in ES1 and is OK in ES2. I.e., if you want IQ or quadrature counting, you need ES2, but they look shippable.
I guess my question isn't properly stated, sorry. I'm wondering if the Pixel Mixer instructions work properly in REVa silicon? I haven't tried them and my understanding of video is poor tbh. I'm not even sure 100% how these work but was looking for a fun project to play around on while taking a break from debugging the ton of code I just poorly wrote, lol...
Those instructions have always worked. They are not related to the old sign-extension problem.
Hi evanh
Thanks for the explanation and the link to Cluso's post; just now the coin has dropped inside my head...
Unfortunately, the attached .zip was swallowed by the forum's software. Hovering the mouse cursor over it says max-PLL.zip carries 10kB worth of information, but saving it resulted in a 0-length file that can't be unzipped.
I feel like I've sufficiently checked out the new silicon and it is good.
Speed is way higher and power consumption is much lower than I anticipated. We've got all the old silicon bugs fixed and all the new enhancements seem to work fine. Also, our I/O pins seem to synchronously interface all the way to the top end at 390MHz at room temperature, which was quite unexpected. Maybe tightening their propagation times to within 1ns of each other helped them work way beyond the timing spec of 175MHz at 150C junction temperature.
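To put those numbers in perspective, the clock periods work out as below. This is just arithmetic on the figures quoted above (390MHz, 175MHz, 1ns); the helper names are made up.

```python
def period_ns(freq_mhz: float) -> float:
    """Clock period in nanoseconds for a frequency given in MHz."""
    return 1000.0 / freq_mhz


def skew_fraction(skew_ns: float, freq_mhz: float) -> float:
    """Pin-to-pin skew expressed as a fraction of one clock period."""
    return skew_ns / period_ns(freq_mhz)


# 390 MHz -> about 2.56 ns per clock; 175 MHz -> about 5.71 ns per clock.
# A 1 ns spread between pins is therefore about 39% of a clock at 390 MHz,
# versus about 17.5% of a clock at the 175 MHz spec point.
```

So at the top end a 1ns spread already eats well over a third of the cycle, which gives a feel for why keeping the pins that tightly matched could matter.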
The only problem I can find is this P16 high-to-float 250mV downward glitch, which is pretty minor.
We need to get every die packaged that we can from these initial six engineering wafers. There should be at least 1,000 and maybe up to 2,400 chips from them. We are waiting to hear from ON Semi about when that can happen. We will build an initial run of 400 P2 Eval boards with more, as needed. The full mask set is ready to go, and was used to make these six engineering wafers.
As we transition from "engineering" to "production" at On Semi, there are some testing details to work out, but that all looks very promising.
I keep wracking my brain, trying to figure out if there's anything else I need to be worrying about, at this point. I'll probably transition back to working on Spin2 tomorrow.
P.S. VonSzarvas has got the new P2 Eval board thoroughly tested for all kinds of operating and fault conditions. It is going to be a very solid platform for the new P2 silicon. Looks simple, too, without all the jumpers. We're ready to build those at Parallax as soon as we get Amkor-packaged parts from ON Semi.
Comments
Hi evanh
During the tests you've done, to determine the top frequency of the PLL, are you sure you were using a post divider of /2 (%PPPP = %0000) and not /1 (%PPPP = %1111, thus, post divider bypassed)?
IIRC (and also based on the linked post, from Chip) if one wants to exercise the top limits of the PLL, the post divider needs to be bypassed, hence the /1.
https://forums.parallax.com/discussion/comment/1466494/#Comment_1466494
Henrique
Be interesting to see how different pin16 is in the OnSemi reports.
Did you run this test across all the mounted glob-top samples you have? It may even vary with each device and temperature.
It may be better to include an explicit Verilog delay, so the drive and float do not both try to happen on the exact same clock edge.
Even a latch that holds for half a clock period would give a definite margin.
As errata go, this one is very minor...
I agree, it's minor.
When I looked into this on the previous silicon, P15 seemed worst and P16 was second worst. There seemed to be a possible relationship between the test pin functions and how bad the glitches were.
https://forums.parallax.com/discussion/comment/1447922/#Comment_1447922
Hi ke4pjw
Although there are many ways, analog or digital, to detect that a PLL has acquired lock, neither silicon version of the P2 implements one that can be accessed from outside the PLL itself.
Henrique
I realized it doesn't really matter, because OUT is AND'd with DIR before the signals even leave the cog.
P2 assembler is a dream if you forget about all the other instructions to do some quirky faster things. I want those quirky faster things