There was also that analog only test chip from way back... Presumably, that had this same problem and I don't remember Chip noticing this problem with them... Also, I think OnSemi looked over the layout and beefed up some things, but didn't see this problem...
ON Semi has a lot of tools that run various checks on the design, but this n-well problem never got caught. When it comes to full-custom I/O, you're on your own, somewhat.
ON Semi confirmed the other day that the pad frame DID pass antenna checks, along with the rest of the chip.
That is more reassuring.
Now, how does ON Semi plan to screen out the potentially problematic dies? Do they have a sure way to check this? It wouldn't be good to have samples blowing out in the wild and taking PCBs with them (burnt and lifted pads, lifted traces, etc.).
Kind regards, Samuel Lourenço
They need to set the VIO current limit to 70mA on their tester. Then they can test dies without worrying about burning any more probe card pins.
PS: It was difficult to get accurate temperature readings at higher temperatures. I think convection flow on the top side was creating a gradient, so I roughly packed some insulation over the top side to improve the measurements.
PPS: The temperature measurement point is really the bottom centre of the Eval board, not the thermal pad as I said earlier.
EDIT: Here's photos of the test jig and added top side insulation
Chiller packs from the freezer. One on top, one underneath, like a sandwich, plus a rag wrapped around. For measurements above zero I'll lift the top one off to allow a gentle rise.
EDIT: Higher temps are with a hair dryer blowing on the underside - which is why it's upside down in the vice. I have to restrict the air flow to get it to the highest temps. Basically hold it so close I'm blocking the nozzle with the PCB.
Any speculation as to why there's more than a 20 °C spread from /1 to /20? I would have put it down to differences in local hot spots or similar if it was only a couple of degrees, but 23-odd seems way too much for that.
I'm unclear on the question ?
/1 is running out of processor speed first, so it is quite different from the other /2 /4 /10 /20.
When you look at those, the difference is now not so great, and you would expect less-die-heat to favour higher MHz.
The delta here looks ~ 11'C, which is not large ?
Provided the PLL does not limit the core, then where the PLL finally fails, is mostly academic.
Those measurements do nicely show the PLL does not limit the core.
Chip needs to send you a P2 ES2 so you can repeat those measurements on that, tho I guess that should be a 'proper package' part
/1 is running out of processor speed first, so it is quite different from the other /2 /4 /10 /20.
Where the core is not running out, 380 MHz and below, it demonstrates the full 20+ degrees.
When you look at those, the difference is now not so great, and you would expect less-die-heat to favour higher MHz.
The delta here looks ~ 11'C, which is not large ?
Even that's enough to question what is happening.
Provided the PLL does not limit the core, then where the PLL finally fails, is mostly academic.
That was true before I found a difference between post-PLL dividers. Now, it looks like there is something post-PLL, in the source select mux or XDIVP divider, that is also a limit. Maybe THE limit, maybe the VCO inverters aren't the limit at all.
The PLL frequency does limit the core in most of those tests; only /1 below 30 °C is where the core logic caves first. Admittedly that's the most common config. On the whole, it does kind of agree with Chip's v2 silicon finding that with a small improvement, ie: v2 silicon, the core logic will outpace the PLL. Yeah, I'd love to measure /1 with higher PLL frequencies at lower temps with the v2 silicon. See it fill in the left column to prove this.
PS: I estimate for /1 at 390 MHz, if the core had held on one more degree I'd have seen the PLL lose lock first. Which might be very close to what Chip has observed with the v2 silicon. The two silicon versions may not be far apart at all, ie: much closer than the XDIVP spreads.
IIRC the P2 target MHz was nudged up, but not by a lot, at synthesis time.
With larger than /2, yes, of course the PLL will limit SysCLK MHz, I was meaning at /1, where the PLL is better than the core, but not by much - basically, PLL can run 9' hotter, when the core SysCLK is /2.
When I tested EFM8 parts, where the core speed specs 25MHz min, I was able to feed 200MHz into the XIN pin, and it worked fine in the divider, and provided I kept core <= 25MHz, the core ran too.
The logic in EFM8 is much simpler, it has a tapped binary divider, which could even be a ripple counter. It does show a large margin between simple-silicon-toggle and core (flash limited)
In P2 that margin is smaller, (as you say, it may be small enough to 'vanish' on the slightly better P2ES2)
P2 has longer counters in the PLL, and they are not ripple counters but /N, up to 10 bits, but the core needs 32 bit counters and adders, so you would still expect 10b to 32b carry chain differences.
Did you see any signs of a failing VCO-divider (ie big jumps in timing) or it is just the VCO itself 'tops out' roughly 10MHz above the core ?
It depends on the temperature. At higher temperatures, with XDIVP = 1, the PLL still tops out lower frequency than the core.
I have no idea what a big jump might look like, or when or how it would show up. One detail I did note: The simplicity of the program's toggle loop meant the program can continue running substantially above the normally viable max sysclock. ie: The HUBSET instruction crashes without something like a 20 °C cooler margin below the loop crash frequency.
That's probably due to initial frequency overshoot as the PLL seeks lock. If you were to approach the target frequency in maybe 1MHz increments, allowing 100us per each 1MHz adjustment, you could probably get there without needing the temperature overhead.
I'm pretty certain it'll be on the source-select step, the second HUBSET, after the 100 ms RCFAST pause.
Ah, yes. What was I thinking?
I don't know why you would need the 20 C temp overhead, though I think I've witnessed the same thing.
For what it's worth, could you step up in 1MHz increments, without leaving PLL-select mode? I don't know if that should make a difference, but in case something like an initial fast pulse is extra stressful, it could maybe close the temperature gap.
Tried both 1 MHz steps with a huge 500 ms RCFAST and also 1 MHz steps without any RCFAST source switching. With RCFAST at 25 °C it fails at 387 MHz. Without any switching at 25 °C it fails at 383 MHz.
EDIT: Here's the source (Configured as without source switching):
con
        XDIV = 20

dat     org
        call    #sysclksetfirst
loop
        call    #sysclkset
        mov     pa, #500
toggle
        drvnot  #0                      'toggle P0 at 1/1000th the clock frequency
        waitx   #500-8
        djnz    pa, #toggle
        ijnz    xtalmul, #loop
        jmp     #$

xtalmul         long    350
clk_mode        long    0

sysclkset
'       andn    clk_mode, #%11          'clear the two select bits to force RCFAST selection
'       hubset  clk_mode                '**IMPORTANT** Switches to RCFAST using known prior mode
        mov     clk_mode, xtalmul       'replace old with new ...
        sub     clk_mode, #1            'range 1-1024
        shl     clk_mode, #8
        or      clk_mode, ##(1<<24 + (XDIV-1)<<18 + %1111_1000)
'       hubset  clk_mode                'setup PLL mode for new frequency (still operating at RCFAST)
        or      clk_mode, #%11          'add PLL as the clock source select
'       waitx   ##22_000_000/2          '~500ms (at RCFAST) for PLL to stabilise and scope to capture each burst
        waitx   ##350_000_000/2         '~500ms for scope to capture each burst
 _ret_  hubset  clk_mode                'engage! Switch back to newly set PLL

sysclksetfirst
        mov     clk_mode, xtalmul       'replace old with new ...
        sub     clk_mode, #1            'range 1-1024
        shl     clk_mode, #8
        or      clk_mode, ##(1<<24 + (XDIV-1)<<18 + %1111_1000)
        hubset  clk_mode                'setup PLL mode for new frequency (still operating at RCFAST)
        or      clk_mode, #%11          'add PLL as the clock source select
        waitx   ##22_000_000/100        '~10ms (at RCFAST) for PLL to stabilise
 _ret_  hubset  clk_mode                'engage! Switch back to newly set PLL
EDIT: Added back in a commented piece for the RCFAST switching method.
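For anyone wanting to check the mode words on a PC, here is a rough Python sketch of the same arithmetic that sysclkset does. It is not part of the test program. The helper names pll_mode and pll_hz are made up for illustration, and the 20 MHz crystal is an assumption taken from the comment in the /10 snippet further down the thread. With XDIV = 20 that gives 1 MHz per multiplier step, which matches the 350 starting value.

```python
XTAL_HZ = 20_000_000  # assumed crystal frequency (from the "/10" snippet's comment)


def pll_mode(xdiv: int, xmul: int, select_pll: bool = True) -> int:
    """Build the HUBSET value the way sysclkset does.

    Low byte %1111_1000 is copied from the source: XDIVP field %1111 (/1)
    and %10 in the next two bits. The bottom two bits are the clock source
    select; %11 selects the PLL, %00 leaves the chip on RCFAST.
    """
    if not 1 <= xdiv <= 64:
        raise ValueError("xdiv must be 1..64")
    if not 1 <= xmul <= 1024:
        raise ValueError("xmul must be 1..1024")
    mode = (1 << 24) | ((xdiv - 1) << 18) | ((xmul - 1) << 8) | 0b1111_1000
    if select_pll:
        mode |= 0b11
    return mode


def pll_hz(xdiv: int, xmul: int, xtal_hz: int = XTAL_HZ) -> int:
    """PLL output frequency for a given divider and multiplier (XDIVP = /1)."""
    return xtal_hz * xmul // xdiv


if __name__ == "__main__":
    # Walk the same 1 MHz steps as the test loop, starting at 350 MHz.
    for mul in range(350, 354):
        print(f"{pll_hz(20, mul) // 1_000_000} MHz -> ${pll_mode(20, mul):07X}")
```

Calling pll_mode(20, 350, select_pll=False) gives the intermediate value written by the first HUBSET in sysclksetfirst, before the source select bits are added.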
When we were doing the dry ice testing with 8 cogs spinning Fozzie on P2D2, we also had a /1000 output being measured with the multimeter's frequency mode.
After a while of self heating the PLL started to slip back a bit, maybe to somewhere around 368 MHz, but the video kept doing its thing regardless, so it wasn't immediately obvious the PLL was slipping.
Anyway just mentioning this as a potential alternative test strategy to finding where the PLL tops out, without having to do small increments and/or wait for thermal equilibrium. It seemed to be just related to the temperature - you could blow on it and the MHz would increase
After a while of self heating the PLL started to slip back a bit ...
That pretty much is the basis of this testing, just with variations for certain tests.
PS: I hadn't paid much attention to the behaviour until I noticed that the post-PLL divider (XDIVP) impacted it. I didn't expect that to make a diff.
EDIT: Eg: Here's the code used for building all the /10 column in that table. All I had to do was control the temperature and note down the temperature when the kHz freq crossed each target.
dat     org
' /2 x44 /10 (20 MHz crystal, 440 MHz PLL, 44 MHz sysclock)
        hubset  ##%1_000001_0000101011_0100_10_00
        waitx   ##20_000_000/10
        hubset  ##%1_000001_0000101011_0100_10_11
loop    drvnot  #0                      'toggle P0 at 1/1000th the PLL frequency (1/100th clock frequency)
        waitx   #50-8
        jmp     #loop
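As a cross-check of the "/2 x44 /10" comment, here is a small Python sketch that pulls the fields back out of a HUBSET mode word. The function name decode_hubset is hypothetical. The field positions, and the rule that XDIVP %1111 means /1 while other values mean (P+1)*2, are my reading of the mode words in this thread, so treat them as assumptions rather than a definitive reference.

```python
def decode_hubset(mode: int, xtal_hz: int = 20_000_000) -> dict:
    """Split a PLL mode word into its fields (layout assumed, see lead-in)."""
    source_select = mode & 0b11           # %11 = PLL, %00 = RCFAST
    osc_bits = (mode >> 2) & 0b11         # %10 in both snippets in this thread
    p_field = (mode >> 4) & 0xF           # post-PLL divider field (XDIVP)
    xmul = ((mode >> 8) & 0x3FF) + 1      # 10-bit multiplier, stored minus 1
    xdiv = ((mode >> 18) & 0x3F) + 1      # 6-bit crystal divider, stored minus 1
    pll_enabled = bool((mode >> 24) & 1)

    xdivp = 1 if p_field == 0b1111 else (p_field + 1) * 2
    vco_hz = xtal_hz * xmul // xdiv
    return {
        "pll_enabled": pll_enabled,
        "xdiv": xdiv,
        "xmul": xmul,
        "xdivp": xdivp,
        "osc_bits": osc_bits,
        "source_select": source_select,
        "vco_hz": vco_hz,
        "sysclk_hz": vco_hz // xdivp,
    }


if __name__ == "__main__":
    fields = decode_hubset(0b1_000001_0000101011_0100_10_11)
    print(fields)
```

Fed the second HUBSET value from the snippet, this reports a 440 MHz PLL and a 44 MHz sysclock, which agrees with the comment at the top of that code.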
It looks like with the higher temps, where I need to use the hair dryer, I'm getting two or three degrees of variation depending on how I'm holding the hair dryer.
Letting the full flow cover the whole board, bouncing off in all directions, seems to require going to a higher temperature reading to bring the PLL frequency reading down to target. Restricting the flow or directing it across the PCB in one direction seems to have the opposite effect.
I thought it was a convection issue yesterday. But it hasn't made any diff with the insulated pad covering the topside. I find myself constantly tweaking the numbers.
They are not completely sure about the nature of the problem, though the latch-up theory seems to be likely true.
We are going to have another meeting next week.
Meanwhile, I've asked them if they could limit their VIO tester current to 70mA and see how many dies they can get out of the first few wafers. I explained that we really need a few hundred, if possible, to update our customers/developers. We need to keep momentum.
I really want to get the new silicon to whoever wants it here because it is just way better than the first version.
They sent me some curve traces to check, though I'm not clear, yet, on what they represent. Here they are:
Sorry, I don't like it...
We got the first batch without this tester, right? And, seems like just one or two out of a hundred were bad, right?
And, maybe playing with the power supply jumpers was the issue...
I agree you should try to get chips out of them. I think something is wrong with their tester...
Yes, it is frustrating. I've been really skeptical of the tester. I'd like to know what the chip is experiencing on the tester, from start to finish. If the unstoppable-latch-up theory is true, the tester may be fine.
Next week, we are going to build seven new P2 Eval boards with our seven remaining v2 chips. I'm hoping most of them work.
Can't enlarge images enough to distinguish much useful information about X and Y axis...
It seems that ON Semi did do some of its homework, and even attempted to power VIO-fed nodes while VDD was kept at 0 V and, in another case, ramped up VIO while VDD was kept floating.
There is another plot, showing VDD limited to 2.5 V
Is the above a reasonable description of what the graphs are showing?
They're semiconductor curve trace plots, they include the negative region
I've always thought the P2 would make a neat curve tracer, with the ability of its ADCs to read a bit above and below the rails.
It'd be kind of neat to get the prop to generate those same curves, code up P2ES1 to check another P2ESx under test, while plotting the results to a monitor
Meanwhile, I've asked them if they could limit their VIO tester current to 70mA and see how many dies they can get out of the first few wafers. I explained that we really need a few hundred, if possible, to update our customers/developers. We need to keep momentum.
I really want to get the new silicon to whoever wants it here because it is just way better than the first version.
They sent me some curve traces to check, though I'm not clear, yet, on what they represent. Here they are:
Are these with lower current limiting enabled ? To what level ?
Have they given any ESD/latch up test results yet for ES1 and ES2 ?
Injecting currents into clamp diodes, while watching IDD, IIO on a I limited supply with minimal Cd, should show where latch up triggers.
Plots, reordered into pairs:
a Vdd 1pt8V_Sweep VIO to 3pt5V.PNG :: IIO = 831pA At 1.8V (~50%)
b VIO swept to 2pt5V_Vdd 1.8V.PNG :: IIO = 275uA at 2.475V
c Vdd open_Sweep VIO to 3pt5V.PNG :: IIO = 4.75mA at 1.8V, IIO ~ 10mA at 3v5
d VIO swept to 2pt5V_Vdd float.png :: IIO = 6.74mA at 2.475V
e Virgin Part_VIO 3V_Vdd swept to 1.8V.PNG :: Idd = 20mA at 1v8
The a,b) seem fine, and show a very low IIO at partial Vio, and a tolerable IIO at 2.475V
c,d) are one supply floating, so are of limited info, but seem to show a lateral powering effect, or is it just skewed biasing ? (currents here are >> a,b)
If there is lateral current of some mA, then floating Vdd may not be a great idea, as it could float above 1.8V which could be fatal.
e) is unclear, if that is the SAME silicon as a,b, or a different part. e) is the only plot that gives Idd, but I'd expect static Idd on a 'good' part to be uA not mA.
What is the RCFAST current and does RCFAST run in their test mode ? (was this test run in Test or normal, and was RST active ?)
Missing is Idd info (Vdd=1.8V) as VIO sweeps up.
Idd in e) starts to ramp just above 200mV and hits a very nicely regulated ~20mA at Vdd >= 400mV thru to 1.8V
What does a P2 draw when RST = active, and RCFAST is running ?
The very flat / regulated 20mA is rather a surprise, as any switching current should be Vdd proportional
Id = Cpd * Vcc * Fi + Istatic, but if there is no switching (RCFAST = OFF) then that 20mA seems high ?
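To put rough numbers on why a flat 20mA looks odd, here is a small Python sketch of that formula. Every value in it is hypothetical: the 22 MHz switching frequency is only a guess at an RCFAST-like clock, and Cpd is simply chosen so the dynamic term comes to 20mA at 1.8V. The point is only the shape of the curve, not the absolute numbers.

```python
def supply_current(cpd_f: float, vdd_v: float, f_hz: float,
                   i_static_a: float = 0.0) -> float:
    """Id = Cpd * Vcc * Fi + Istatic, in amps."""
    return cpd_f * vdd_v * f_hz + i_static_a


# Hypothetical values, for illustration only.
F_HZ = 22_000_000                 # assumed switching frequency
CPD_F = 20e-3 / (1.8 * F_HZ)      # picked so that Id = 20 mA at Vdd = 1.8 V

i_at_1v8 = supply_current(CPD_F, 1.8, F_HZ)
i_at_0v4 = supply_current(CPD_F, 0.4, F_HZ)
ratio = i_at_1v8 / i_at_0v4

if __name__ == "__main__":
    print(f"Id at 1.8 V: {i_at_1v8 * 1e3:.2f} mA")
    print(f"Id at 0.4 V: {i_at_0v4 * 1e3:.2f} mA")
    print(f"ratio: {ratio:.2f}")
```

If the 20mA were purely switching current at a fixed frequency, this model says it should drop by a factor of 1.8/0.4 = 4.5 between 1.8V and 0.4V, whereas plot e) stays flat over that range.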
Next week, we are going to build seven new P2 Eval boards with our seven remaining v2 chips. I'm hoping most of them work.
Maybe build 5 and send 2 (or 3) to Peter ? - so he can mount on P2D2 - then that brings another supply pairing into the mix, as P2D2 does not use the same SMPS / Linear regs.
Comments
I noticed that I was getting differences between differing XDIVP values so I made a table of them:
Thanks for testing this stuff Evanh
That sounds right.