New P2 Silicon

evanh · 2019-08-07 14:29

Rayman wrote: »

Re. Pin16:
Does drvh followed by fltl do the same thing as drvh followed by flth?

The way I'm reading the scope trace, yes, one dip per FLTx when output high. I've forgotten why it happens even when the OUT isn't changing.

Seairth · 2019-08-07 15:43

cgracey wrote: »

Here are some current measurements on the old vs. new silicon:

Wow. This thread has been an exciting read! It looks like the updated P2 is exceeding everyone's expectations before even a single board has shipped!

My vague recollection about the v1 chip was that the hub RAM (or was it the eggbeater?) was a significant factor in power consumption. How much has that changed with the v2 chip?

Yanomani · 2019-08-07 17:21

evanh wrote: »

The 4-bit post divider is outside the closed loop of the PLL, right? I've been using that divider, by setting it to /2, to test the top frequency of the PLL of the v1 chip. The result I got was around 415 MHz at room temp.

Hi evanh

During the tests you've done, to determine the top frequency of the PLL, are you sure you were using a post divider of /2 (%PPPP = %0000) and not /1 (%PPPP = %1111, thus, post divider bypassed)?

IIRC (and also based on the linked post, from Chip) if one wants to exercise the top limits of the PLL, the post divider needs to be bypassed, hence the /1.

https://forums.parallax.com/discussion/comment/1466494/#Comment_1466494

Henrique
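
For anyone following the %PPPP discussion, here is a minimal Python sketch of the post-divider encoding. The two end points (%0000 = /2 and %1111 = /1, divider bypassed) are the ones quoted above; the rule (PPPP + 1) × 2 for the values in between is my reading of the P2 clock-setup documentation and should be treated as an assumption. The function name is made up for illustration.

```python
def post_divider(pppp: int) -> int:
    """Return the VCO post-divider ratio selected by the 4-bit %PPPP field."""
    if not 0 <= pppp <= 15:
        raise ValueError("PPPP is a 4-bit field (0..15)")
    # %1111 bypasses the divider (/1).
    # Assumed for all other values: ratio = (PPPP + 1) * 2.
    return 1 if pppp == 0b1111 else (pppp + 1) * 2


for pppp in (0b0000, 0b0001, 0b1110, 0b1111):
    print(f"%{pppp:04b} -> /{post_divider(pppp)}")
```

With this encoding a /2 test and a /1 test are two different field values, which is the distinction the question above is about.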

ke4pjw · 2019-08-07 19:19

cgracey wrote: »

Anyway, I can give the PLL impossibly-high-frequency settings and it goes as fast as it can and becomes temperature dependent, since it can't lock. This makes me think that the VCO divider is faster than the VCO, which is better than the opposite, since things top out safely.

I doubt there is a way to do this, but I will ask anyway:

Is there a way of knowing that the VCO is locked? Is there a register that would hold the VCO lock status?

jmg · 2019-08-07 20:37

cgracey wrote: »

I made a program tonight to check for any remaining race conditions between DIR and OUT signals, which caused pins to glitch low when going from high to float on the first silicon. On the new silicon, DIR is supposed to transition before OUT, to avoid glitching.

On the new silicon, pin 16 still exhibits a race condition, but only slightly. All the other pins are clean. I need to ask Wendy why this could be, since we made timing constraints to assure that DIR transitions before OUT on all the pins. It's not a big deal, but in future versions of P2 chips, we'll want this problem gone, all the way.

Be interesting to see how different pin16 is in the OnSemi reports.

Did you run this test across all the mounted glob-top samples you have? It may even vary with each device and with temperature.
A better approach may be to include an explicit Verilog delay, so the drive and float do not both try to happen on the exact same clock edge?
Even a latch that holds for half a clock period would give a definite margin?
As errata go, this one is very minor...
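
To make the ordering argument concrete, here is a rough Python model of a high-to-float transition. It is only a sketch: it assumes the OUT signal reaching the pad also falls when the pin is floated, and that the pad simply drives OUT whenever DIR is high. The function names are hypothetical and nothing here reflects the actual pad circuit.

```python
def pad_state(dir_bit: int, out_bit: int) -> str:
    """What the pin does for the DIR/OUT levels currently seen at the pad."""
    if dir_bit == 0:
        return "float"
    return "high" if out_bit else "low"


def high_to_float(dir_first: bool) -> list:
    """Pad states while going from driven high (DIR=1, OUT=1) to float (DIR=0, OUT=0)."""
    dir_bit, out_bit = 1, 1
    states = [pad_state(dir_bit, out_bit)]
    order = ("dir", "out") if dir_first else ("out", "dir")
    for signal in order:
        if signal == "dir":
            dir_bit = 0
        else:
            out_bit = 0
        states.append(pad_state(dir_bit, out_bit))
    return states


print("DIR arrives first:", high_to_float(dir_first=True))
print("OUT arrives first:", high_to_float(dir_first=False))
```

In this model the pin is only ever driven low when OUT changes before DIR, which is why forcing DIR to transition first (by constraint, delay or latch) removes the glitch.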

Tubular · 2019-08-07 21:51

jmg wrote: »

cgracey wrote: »

I made a program tonight to check for any remaining race conditions between DIR and OUT signals, which caused pins to glitch low when going from high to float on the first silicon. On the new silicon, DIR is supposed to transition before OUT, to avoid glitching.

On the new silicon, pin 16 still exhibits a race condition, but only slightly. All the other pins are clean. I need to ask Wendy why this could be, since we made timing constraints to assure that DIR transitions before OUT on all the pins. It's not a big deal, but in future versions of P2 chips, we'll want this problem gone, all the way.

Be interesting to see how different pin16 is in the OnSemi reports.

Did you run this test across all the mounted glob-top samples you have? It may even vary with each device and with temperature.
A better approach may be to include an explicit Verilog delay, so the drive and float do not both try to happen on the exact same clock edge?
Even a latch that holds for half a clock period would give a definite margin?
As errata go, this one is very minor...

I agree, it's minor.

When I looked into this on the previous silicon, P15 seemed worst and P16 was second worst. There seemed to be a possible relationship between the test pin functions and how bad the glitches were.

https://forums.parallax.com/discussion/comment/1447922/#Comment_1447922

Rayman · 2019-08-07 22:23

Looks minor at 5 µs/div. But does it actually go to zero on the ns time scale?

Yanomani · 2019-08-08 00:11

ke4pjw wrote: »

cgracey wrote: »

Anyway, I can give the PLL impossibly-high-frequency settings and it goes as fast as it can and becomes temperature dependent, since it can't lock. This makes me think that the VCO divider is faster than the VCO, which is better than the opposite, since things top out safely.

I doubt there is a way to do this, but I will ask anyway: Is there a way of knowing that the VCO is locked? Is there a register that would hold the VCO lock status?

Hi ke4pjw

Although there are many ways, analog and/or digital, to detect that a PLL has acquired lock, neither silicon version of the P2 implements one that can be accessed from outside the PLL itself.

Henrique

cgracey · 2019-08-08 00:33

jmg wrote: »

cgracey wrote: »

I made a program tonight to check for any remaining race conditions between DIR and OUT signals, which caused pins to glitch low when going from high to float on the first silicon. On the new silicon, DIR is supposed to transition before OUT, to avoid glitching.

On the new silicon, pin 16 still exhibits a race condition, but only slightly. All the other pins are clean. I need to ask Wendy why this could be, since we made timing constraints to assure that DIR transitions before OUT on all the pins. It's not a big deal, but in future versions of P2 chips, we'll want this problem gone, all the way.

Be interesting to see how different pin16 is in the OnSemi reports.

Did you run this test across all the mounted glob-top samples you have? It may even vary with each device and with temperature.
A better approach may be to include an explicit Verilog delay, so the drive and float do not both try to happen on the exact same clock edge?
Even a latch that holds for half a clock period would give a definite margin?
As errata go, this one is very minor...

I only have one sample chip running on a board. So, my sample lot is ONE. I'm pretty sure this P16 glitch is just baked into the design.

I cooled the part with freeze spray, but the 250mV spike amplitude hardly budged at all.

cgracey · 2019-08-08 00:35

Seairth wrote: »

cgracey wrote: »

Here are some current measurements on the old vs. new silicon:

Wow. This thread has been an exciting read! It looks like the updated P2 is exceeding everyone's expectations before even a single board has shipped!

My vague recollection about the v1 chip was that the hub RAM (or was it the eggbeater?) was a significant factor in power consumption. How much has that changed with the v2 chip?

All power consumption is way down. Not sure, yet, on how the reduction is distributed. It's so low, anyway, that the chip can be running 320MHz and it's barely even warm.

cgracey · 2019-08-08 00:36

Rayman wrote: »

Re. Pin16:
Does drvh followed by fltl do the same thing as drvh followed by flth?

I realized it doesn't really matter, because OUT is AND'd with DIR before the signals even leave the cog.
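
A small Python sketch of that gating, assuming the usual DIR/OUT bit effects of drvh, fltl and flth (drvh sets both bits, fltl clears both, flth clears DIR and sets OUT). The table and function names are made up for illustration.

```python
# (DIR, OUT) register bits left behind by each instruction (assumed semantics)
PIN_OPS = {
    "drvh": (1, 1),  # drive the pin high
    "fltl": (0, 0),  # float the pin, OUT bit low
    "flth": (0, 1),  # float the pin, OUT bit high
}


def signals_leaving_cog(op: str) -> tuple:
    """Return (DIR, OUT AND DIR), the pair of signals sent on to the pin."""
    dir_bit, out_bit = PIN_OPS[op]
    return dir_bit, out_bit & dir_bit


for op in PIN_OPS:
    print(op, signals_leaving_cog(op))
```

Because the OUT bit is masked by DIR, fltl and flth produce the same pair of signals, so either one following drvh presents the pin with the same transition.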

cgracey · 2019-08-08 00:37

evanh wrote: »

The 4-bit post divider is outside the closed loop of the PLL, right? I've been using that divider, by setting it to /2, to test the top frequency of the PLL of the v1 chip. The result I got was around 415 MHz at room temp.

Yes, the 4-bit post-VCO divider is outside of the PLL loop.

localroger · 2019-08-08 01:06

cgracey wrote: »

All power consumption is way down. Not sure, yet, on how the reduction is distributed. It's so low, anyway, that the chip can be running 320MHz and it's barely even warm.

Chip, in the very early days with OnSemi I remember you teasing us that once you had P2V1, it might just be a matter of paying for another run of the layout software to make a design at some stupidly small pitch like 20 nm that would run at 1 GHz but would be a toaster oven due to the insulation leakage, and we all went OOOH we want that. Sounds like we're a good fraction of the way to that performance at this design pitch and it's not even a toaster oven. That's some serious exceedance of expectations there.

I got my credit card out and am waving it for when the new eval boards become available.

Cluso99 · 2019-08-08 01:11

cgracey wrote: »

Seairth wrote: »

cgracey wrote: »

Here are some current measurements on the old vs. new silicon:

Wow. This thread has been an exciting read! It looks like the updated P2 is exceeding everyone's expectations before even a single board has shipped!

My vague recollection about the v1 chip was that the hub RAM (or was it the eggbeater?) was a significant factor in power consumption. How much has that changed with the v2 chip?

All power consumption is way down. Not sure, yet, on how the reduction is distributed. It's so low, anyway, that the chip can be running 320MHz and it's barely even warm.

And that is a glob-top chip which we saw ran hotter than the proper packaged chips!

Tubular · 2019-08-08 01:17

I think that heat is more due to the available spread on the P2ES PCB vs the P2V2 PCB, which is much more compact. The previous glob tops were all soldered to P2D2s.

cgracey · 2019-08-08 01:29

localroger wrote: »

cgracey wrote: »

All power consumption is way down. Not sure, yet, on how the reduction is distributed. It's so low, anyway, that the chip can be running 320MHz and it's barely even warm.

Chip, in the very early days with OnSemi I remember you teasing us that once you had P2V1, it might just be a matter of paying for another run of the layout software to make a design at some stupidly small pitch like 20 nm that would run at 1 GHz but would be a toaster oven due to the insulation leakage, and we all went OOOH we want that. Sounds like we're a good fraction of the way to that performance at this design pitch and it's not even a toaster oven. That's some serious exceedance of expectations there.

I got my credit card out and am waving it for when the new eval boards become available.

It makes me wonder what kind of speed we could really achieve at 20nm. I hope we get to find out.

Cluso99 · 2019-08-08 01:38

localroger wrote: »

cgracey wrote: »

All power consumption is way down. Not sure, yet, on how the reduction is distributed. It's so low, anyway, that the chip can be running 320MHz and it's barely even warm.

Chip, in the very early days with OnSemi I remember you teasing us that once you had P2V1, it might just be a matter of paying for another run of the layout software to make a design at some stupidly small pitch like 20 nm that would run at 1 GHz but would be a toaster oven due to the insulation leakage, and we all went OOOH we want that. Sounds like we're a good fraction of the way to that performance at this design pitch and it's not even a toaster oven. That's some serious exceedance of expectations there.

I got my credit card out and am waving it for when the new eval boards become available.

From my reading, the finer geometries do reduce leakage per transistor, as well as current draw. It’s the fact that there are usually way more transistors on a finer geometry. You’ll note that the latest CPUs from Intel use quite a bit less power than the previous generation, despite there being lots more transistors.

Cluso99 · 2019-08-08 01:41

cgracey wrote: »

localroger wrote: »

cgracey wrote: »

All power consumption is way down. Not sure, yet, on how the reduction is distributed. It's so low, anyway, that the chip can be running 320MHz and it's barely even warm.

Chip, in the very early days with OnSemi I remember you teasing us that once you had P2V1, it might just be a matter of paying for another run of the layout software to make a design at some stupidly small pitch like 20 nm that would run at 1 GHz but would be a toaster oven due to the insulation leakage, and we all went OOOH we want that. Sounds like we're a good fraction of the way to that performance at this design pitch and it's not even a toaster oven. That's some serious exceedance of expectations there.

I got my credit card out and am waving it for when the new eval boards become available.

It makes me wonder what kind of speed we could really achieve at 20nm. I hope we get to find out.

I wonder where the next sweet spot is these days, e.g. 90nm, ~50nm?

jmg · 2019-08-08 01:44

cgracey wrote: »

It makes me wonder what kind of speed we could really achieve at 20nm. I hope we get to find out.

P2 is already fairly 'topped out' at the speeds a processor pin in CMOS can manage.
The monster parts that are in 20nm and below, are all using GHz differential LVDS style connection schemes, and none of those are what could be called 'microcontrollers'.

msrobots · 2019-08-08 02:15

I am perfectly happy with 320MHz, even 160. Mine is running 24/7 with whatever program I loaded last, sometimes for days while I do other work.

No glitch no fail, perfectly stable.

Programming the beast is challenging. One has to dig deep into Chip's way of thinking, and then one ends up needing an astonishingly small number of lines to do it. Simple serial? A couple of lines, no library needed. Those smart pins are wonderful.

But with Parallax-style documentation this will be solved. Once you get it, it's quite easy, logical and understandable. I learn by coding it myself; so far I've gone through smart pins and the streamer, and now I'm trying to grok XBYTE and SKIPping.

Like the P1 was different from all others, the P2 is different from all others. So far I really like it.

Mike

Cluso99 · 2019-08-08 03:09

P1 assembler was a dream.

P2 assembler is a dream if you forget about all the other instructions to do some quirky faster things. I want those quirky faster things.

evanh · 2019-08-08 03:35

cgracey wrote: »

evanh wrote: »

The 4-bit post divider is outside the closed loop of the PLL, right? I've been using that divider, by setting it to /2, to test the top frequency of the PLL of the v1 chip. The result I got was around 415 MHz at room temp.

Yes, the 4-bit post-VCO divider is outside of the PLL loop.

Good, so my testing proves that the v1 PLL can go faster than the synthesised core. I'm guessing the v2 PLL can go even faster, therefore the PLL is not the limiting factor for 390 MHz at room temp.

evanh · 2019-08-08 03:51

Yanomani wrote: »

Hi evanh
During the tests you've done, to determine the top frequency of the PLL, are you sure you were using a post divider of /2 (%PPPP = %0000) and not /1 (%PPPP = %1111, thus, post divider bypassed)?

I used XDIVP = 2 with cluso's macros - https://forums.parallax.com/discussion/comment/1452025/#Comment_1452025

IIRC (and also based on the linked post, from Chip) if one wants to exercise the top limits of the PLL, the post divider needs to be bypassed, hence the /1.

The XDIVP of 2 means the PLL internally is going double the frequency of the sysclock. Which means we can keep operating the chip at half the PLL's VCO frequency to see at what point the achieved sysclock frequency stops matching the calculated/specified frequency.

EDIT: Attached an example terminal capture, and the source that produced it.
EDIT2: Updated diag-init.spin2 to show sub-MHz sysclock setting. Example now using XDIVP = 4.
EDIT3: Fixed a bug in fractional comport divider calculation when at very low sysclock rates.
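
The idea behind the test can be sketched in a few lines of Python. This is not the attached diag code, just an illustration of the arithmetic: with a post divider of XDIVP the VCO has to run at sysclock × XDIVP, and the PLL limit shows up as the first point where the measured sysclock no longer matches the requested one. The function names are hypothetical, and the numbers in the example are placeholders, NOT measurements from any chip.

```python
def vco_mhz(sysclock_mhz: float, xdivp: int) -> float:
    """VCO frequency needed for a given sysclock when the post divider is /xdivp."""
    return sysclock_mhz * xdivp


def first_mismatch(requested_mhz, measured_mhz, tolerance=0.01):
    """Return the first requested sysclock whose measurement is off by more than
    `tolerance` (as a fraction of the request), or None if every point matches."""
    for want, got in zip(requested_mhz, measured_mhz):
        if abs(got - want) > tolerance * want:
            return want
    return None


# Placeholder values for illustration only, not real measurements.
requested = [100.0, 150.0, 200.0, 250.0]
measured = [100.0, 150.0, 199.9, 230.0]

limit = first_mismatch(requested, measured)
print("sysclock stops tracking at", limit, "MHz")
print("VCO frequency requested at that point:", vco_mhz(limit, 2), "MHz")
```

Running the core at half (or a quarter) of the VCO frequency keeps the rest of the chip well inside its limits, so any mismatch can be blamed on the PLL rather than on the synthesised logic.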

cheezus · 2019-08-08 04:17

msrobots wrote: »

Programming the beast is challenging. One has to dig deep into Chip's way of thinking, and then one ends up needing an astonishingly small number of lines to do it. Simple serial? A couple of lines, no library needed. Those smart pins are wonderful.

Mike

This chip IS a beast! I've been doing most of my testing at 160MHz, but oftentimes I end up running RCFast just so I can get LA caps! Really exciting to see the power reduction in the respin. I would have been happy with just the sign-extension fix, tbh.

I'd love to see what this thing could do at 90nm, although could it still be a 3v3 chip? It would be really nice, even if the pins are the limiting factor here. I remember there was some import of the custom ring of the chip, would that require an entire manual layout or could the shrink be done without completely redesigning the custom part?

I'm loving programming with fastspin and the inline assembly. P2asm is just wicked and I haven't even hardly touched most of the new features. Those smartpins, WOW. I loved the counters and now having 64 smartpins, instead of just 16 counters. AND FINALLY 64 PINS!!!

I'm slightly confused. Chip has mentioned the IQ modulator problem in the rev1 chip, so I'm thinking all the pixel mixing instructions are borked? I'm about at a place where they'd come in handy!!

The P2 ?notUniPad? is coming along (slowly) and I can't wait till I'm adding radios and RTC and everything!

Tubular · 2019-08-08 04:20

The iq modulator has been fixed - see the color NTSC output for an example of it in use. Previously NTSC needed to be greyscale due to the iq modulator issue

The hardware encoder counting was also similarly affected, and has been fixed in this respin.

jmg · 2019-08-08 04:21

cheezus wrote: »

I'm slightly confused. Chip has mentioned the IQ modulator problem in the rev1 chip, so I'm thinking all the pixel mixing instructions are borked? I'm about at a place where they'd come in handy!!

Yes, I think that was also sign-bug related, and fails in ES1 and is ok in ES2 ? ie if you want IQ, or quadcounting, you need ES2 - but they look shippable

cheezus · 2019-08-08 04:28

Tubular wrote: »

The iq modulator has been fixed - see the color NTSC output for an example of it in use. Previously NTSC needed to be greyscale due to the iq modulator issue

The hardware encoder counting was also similarly affected, and has been fixed in this respin.

I guess my question isn't properly stated, sorry. I'm wondering if the Pixel Mixer instructions work properly in REVa silicon? I haven't tried them and my understanding of video is poor tbh. I'm not even sure 100% how these work but was looking for a fun project to play around on while taking a break from debugging the ton of code I just poorly wrote, lol...

cgracey · 2019-08-08 04:44

cheezus wrote: »

Tubular wrote: »

The iq modulator has been fixed - see the color NTSC output for an example of it in use. Previously NTSC needed to be greyscale due to the iq modulator issue

The hardware encoder counting was also similarly affected, and has been fixed in this respin.

I guess my question isn't properly stated, sorry. I'm wondering if the Pixel Mixer instructions work properly in REVa silicon? I haven't tried them and my understanding of video is poor tbh. I'm not even sure 100% how these work but was looking for a fun project to play around on while taking a break from debugging the ton of code I just poorly wrote, lol...

Those instructions have always worked. They are not related to the old sign-extension problem.

Yanomani · 2019-08-08 05:00

evanh wrote: »

Yanomani wrote: »

Hi evanh
During the tests you've done, to determine the top frequency of the PLL, are you sure you were using a post divider of /2 (%PPPP = %0000) and not /1 (%PPPP = %1111, thus, post divider bypassed)?

I used XDIVP = 2 with cluso's macros - https://forums.parallax.com/discussion/comment/1452025/#Comment_1452025

IIRC (and also based on the linked post, from Chip) if one wants to exercise the top limits of the PLL, the post divider needs to be bypassed, hence the /1.

The XDIVP of 2 means the PLL internally is going double the frequency of the sysclock. Which means we can keep operating the chip at half the PLL's VCO frequency to see at what point the achieved sysclock frequency stops matching the calculated/specified frequency.

EDIT: Attached an example terminal capture, and the source that produced it.
EDIT2: Updated diag-init.spin2 to show sub-MHz sysclock setting. Example now using XDIVP = 4.

Hi evanh

Thanks for the explanation and for linking Cluso's post; the coin has just dropped inside my head...

Unfortunately, the attached .zip was swallowed by the forum's software. Hovering the mouse cursor over it says max-PLL.zip carries 10kB worth of information, but saving it resulted in a 0-length file that can't be unzipped.

cgracey · 2019-08-08 05:00

I feel like I've sufficiently checked out the new silicon and it is good.

Speed is way higher and power consumption is much lower than I anticipated. We've got all the old silicon bugs fixed and all the new enhancements seem to work fine. Also, our I/O pins seem to synchronously interface all the way to the top end at 390MHz at room temperature, which was quite unexpected. Maybe tightening their propagation times to within 1ns of each other helped them work way beyond the timing spec of 175MHz at 150C junction temperature.
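
To put those numbers side by side, here is a quick Python calculation of the clock period at the 175MHz spec point and at 390MHz, and how large a 1ns spread is relative to each. It is plain arithmetic on the figures above; the function name is just for illustration.

```python
def period_ns(freq_mhz: float) -> float:
    """Clock period in nanoseconds for a frequency given in MHz."""
    return 1000.0 / freq_mhz


for freq in (175, 390):
    period = period_ns(freq)
    share = 100.0 / period  # 1 ns expressed as a percentage of the period
    print(f"{freq} MHz: period {period:.2f} ns, 1 ns is {share:.1f}% of it")
```

At 390MHz the period is only about 2.6ns, so a 1ns spread between pins is a large slice of the available window, which is why the tight matching plausibly matters.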

The only problem I can find is this P16 high-to-float 250mV downward glitch, which is pretty minor.

We need to get every die packaged that we can from these initial six engineering wafers. There should be at least 1,000 and maybe up to 2,400 chips from them. We are waiting to hear from ON Semi about when that can happen. We will build an initial run of 400 P2 Eval boards with more, as needed. The full mask set is ready to go, and was used to make these six engineering wafers.

As we transition from "engineering" to "production" at On Semi, there are some testing details to work out, but that all looks very promising.

I keep wracking my brain, trying to figure out if there's anything else I need to be worrying about, at this point. I'll probably transition back to working on Spin2 tomorrow.

P.S. VonSzarvas has got the new P2 Eval board thoroughly tested for all kinds of operating and fault conditions. It is going to be a very solid platform for the new P2 silicon. Looks simple, too, without all the jumpers. We're ready to build those at Parallax as soon as we get Amkor-packaged parts from ON Semi.
