Automated Testing of P2's

Rayman · 2019-08-29 13:09

There was also that analog only test chip from way back... Presumably, that had this same problem and I don't remember Chip noticing this problem with them... Also, I think OnSemi looked over the layout and beefed up some things, but didn't see this problem...

cgracey · 2019-08-29 13:21

Rayman wrote: »

There was also that analog only test chip from way back... Presumably, that had this same problem and I don't remember Chip noticing this problem with them... Also, I think OnSemi looked over the layout and beefed up some things, but didn't see this problem...

ON Semi has a lot of tools that run various checks on the design, but this n-well problem never got caught. When it comes to full-custom I/O, you're on your own, somewhat.

cgracey · 2019-08-29 13:23

ON Semi confirmed the other day that the pad frame DID pass antenna checks, along with the rest of the chip.

samuell · 2019-08-29 14:19

cgracey wrote: »

ON Semi confirmed the other day that the pad frame DID pass antenna checks, along with the rest of the chip.

That is more reassuring.

Now, how does OnSemi plan to screen out the potentially problematic dies? Do they have a sure way to check this? It wouldn't be good to have samples blowing out in the wild, and taking PCBs with them (burnt and lifted pads, lifted traces, etc).

Kind regards, Samuel Lourenço

cgracey · 2019-08-29 14:32

samuell wrote: »

cgracey wrote: »

ON Semi confirmed the other day that the pad frame DID pass antenna checks, along with the rest of the chip.

That is more reassuring.

Now, how does OnSemi plan to screen out the potentially problematic dies? Do they have a sure way to check this? It wouldn't be good to have samples blowing out in the wild, and taking PCBs with them (burnt and lifted pads, lifted traces, etc).

Kind regards, Samuel Lourenço

They need to set the VIO current limit to 70mA on their tester. Then they can test dies without worrying about burning any more probe card pins.

samuell · 2019-08-29 15:13

Well, 70mA seems to be a good value. They will still be able to detect latch up.

Kind regards, Samuel Lourenço

evanh · 2019-08-29 15:16

Chip,
I noticed that I was getting differences between differing XDIVP values so I made a table of them:

Measured PLL frequency limits as a function of
temperature and post-PLL divider (XDIVP)
===============================================
PLL freq| Temperatures (oC) at XDIVP dividers
 (MHz)  |   /1     /2     /4    /10    /20
--------+--------------------------------------
  430   |                        -3     -2
  420   |          -1      4      9     10
  410   |          11     16     20     21
  400   |    x     23     28     33     34
  390   |    *     36     43     46     47
  380   |   39     48     57     60     61
  370   |   54     64     71     76     77
  360   |   67     78     83
  350   |   83     92

 x = Cog crashed outright at initial -4 oC
 * = Cog crashed at 23 oC (PLL held lock)

evanh · 2019-08-29 15:22

PS: It was difficult to get accurate temperature readings at higher temperatures. I think convection flow on the top side was creating a gradient, so I roughly packed some insulation over the top side to improve the measurements.

PPS: The temperature measuring is really the bottom centre of the Eval board, not the thermal pad I said earlier.

EDIT: Here's photos of the test jig and added top side insulation

cgracey · 2019-08-29 16:04

Evanh, Are you chilling the chip somehow? I see some really low oC numbers.

evanh · 2019-08-29 16:18

Chiller packs from the freezer. One on top, one underneath, like a sandwich, plus a rag wrapped around. For measurements above zero I'll lift the top one off to allow a gentle rise.

EDIT: Higher temps are with a hair dryer blowing on the underside - which is why it's upside down in the vice. I have to restrict the air flow to get it to the highest temps. Basically hold it so close I'm blocking the nozzle with the PCB.

evanh · 2019-08-30 00:06

Any speculation as to why more than 20 oC spread from /1 to /20? I would have put it down to differences in local hot spots or similar if it was only a couple of degrees, but 23 odd seems way too much for that.
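For anyone wanting to check the spread figures, here is a rough Python sketch that transcribes the table and subtracts one divider column from another. The names `DIVIDERS`, `TABLE` and `spread` are made up for illustration; only the numbers come from the measurements, and cells with no reading (blank, `x` or `*`) are entered as `None`.

```python
# Temperatures (oC) at which each PLL frequency limit was observed,
# transcribed from the table. None = no reading (blank cell or cog crash).
DIVIDERS = ("/1", "/2", "/4", "/10", "/20")
TABLE = {
    430: (None, None, None, -3, -2),
    420: (None, -1, 4, 9, 10),
    410: (None, 11, 16, 20, 21),
    400: (None, 23, 28, 33, 34),
    390: (None, 36, 43, 46, 47),
    380: (39, 48, 57, 60, 61),
    370: (54, 64, 71, 76, 77),
    360: (67, 78, 83, None, None),
    350: (83, 92, None, None, None),
}


def spread(low, high):
    """Temperature difference between two divider columns, per PLL frequency.

    Only rows where both columns have a reading are included.
    """
    i, j = DIVIDERS.index(low), DIVIDERS.index(high)
    return {
        mhz: row[j] - row[i]
        for mhz, row in TABLE.items()
        if row[i] is not None and row[j] is not None
    }


one_to_twenty = spread("/1", "/20")
two_to_twenty = spread("/2", "/20")

for mhz, delta in one_to_twenty.items():
    print(f"{mhz} MHz: /1 to /20 spread = {delta} oC")
for mhz, delta in two_to_twenty.items():
    print(f"{mhz} MHz: /2 to /20 spread = {delta} oC")
```

With the table as posted, /1 and /20 only overlap at 380 and 370 MHz (22 and 23 degrees), while /2 to /20 comes out at 10 to 13 degrees across the rows where both were measured.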

jmg · 2019-08-30 00:19

evanh wrote: »

Any speculation as to why more than 20 oC spread from /1 to /20? I would have put it down to differences in local hot spots or similar if it was only a couple of degrees, but 23 odd seems way too much for that.

I'm unclear on the question ?
/1 is running out of processor speed first, so it is quite different from the other /2 /4 /10 /20.
When you look at those, the difference is now not so great, and you would expect less-die-heat to favour higher MHz.
The delta here looks ~ 11'C, which is not large ?

Provided the PLL does not limit the core, then where the PLL finally fails, is mostly academic.
Those measurements do nicely show the PLL does not limit the core.

Chip needs to send you a P2 ES2 so you can repeat those measurements on that, tho I guess that should be a 'proper package' part

evanh · 2019-08-30 00:45

jmg wrote: »

/1 is running out of processor speed first, so it is quite different from the other /2 /4 /10 /20.

Where the core is not running out, 380 MHz and below, it demonstrates the full 20+ degrees.

When you look at those, the difference is now not so great, and you would expect less-die-heat to favour higher MHz.
The delta here looks ~ 11'C, which is not large ?

Even that's enough to question what is happening.

Provided the PLL does not limit the core, then where the PLL finally fails, is mostly academic.

That was true before I found a difference between post-PLL dividers. Now, it looks like there is something post-PLL, in the source select mux or XDIVP divider, that is also a limit. Maybe THE limit, maybe the VCO inverters aren't the limit at all.

The PLL frequency does limit the core in most of those tests, just /1 below 30 oC is where the core logic caves first. Admittedly the most common config. On the whole, it does kind of agree with Chip's v2 silicon finding that with a small improvement, ie: v2 silicon, the core logic will outpace the PLL. Yeah, I'd love to measure /1 with higher PLL frequencies at lower temps with the v2 silicon. See it fill in the left column to prove this.

PS: I estimate for /1 at 390 MHz, if the core had held on one more degree I'd have seen the PLL lose lock first. Which might be very close to what Chip has observed with the v2 silicon. The two silicon versions may not be far apart at all, ie: much closer than the XDIVP spreads.

jmg · 2019-08-30 01:16

evanh wrote: »

The PLL frequency does limit the core in most of those tests, just /1 below 30 oC is where the core logic caves first. Admittedly the most common config. On the whole, it does kind of agree with Chip's v2 silicon finding that with a small improvement, ie: v2 silicon, the core logic will outpace the PLL. Yeah, I'd love to measure /1 with higher PLL frequencies at lower temps with the v2 silicon. See it fill in the left column to prove this.

PS: I estimate for /1 at 390 MHz, if the core had held on one more degree I'd have seen the PLL lose lock first. Which might be very close to what Chip has observed with the v2 silicon. The two silicon versions may not be far apart at all, ie: much closer than the XDIVP spreads.

IIRC the P2 target MHz was nudged up, but not by a lot, at synthesis time.
With larger than /2, yes, of course the PLL will limit SysCLK MHz, I was meaning at /1, where the PLL is better than the core, but not by much - basically, PLL can run 9' hotter, when the core SysCLK is /2.

When I tested EFM8 parts, where the core speed specs 25MHz min, I was able to feed 200MHz into the XIN pin, and it worked fine in the divider, and provided I kept core <= 25MHz, the core ran too.
The logic in EFM8 is much simpler, it has a tapped binary divider, which could even be a ripple counter. It does show a large margin between simple-silicon-toggle and core (flash limited)

In P2 that margin is smaller, (as you say, it may be small enough to 'vanish' on the slightly better P2ES2)
P2 has longer counters in the PLL, and they are not ripple counters but /N, up to 10 bits, but the core needs 32 bit counters and adders, so you would still expect 10b to 32b carry chain differences.

Did you see any signs of a failing VCO-divider (ie big jumps in timing), or is it just that the VCO itself 'tops out' roughly 10MHz above the core ?

evanh · 2019-08-30 05:08

It depends on the temperature. At higher temperatures, with XDIVP = 1, the PLL still tops out at a lower frequency than the core.

I have no idea what, when or how a big jump might be or look like. One detail I did note: The simplicity of the program's toggle loop meant the program can continue running substantially above the normally viable max sysclock. ie: The HUBSET instruction crashes without something like a 20 oC cooler margin below the loop crash frequency.

cgracey · 2019-08-30 05:18

evanh wrote: »

It depends on the temperature. At higher temperatures, with XDIVP = 1, the PLL still tops out at a lower frequency than the core.

I have no idea what, when or how a big jump might be or look like. One detail I did note: The simplicity of the program's toggle loop meant the program can continue running substantially above the normally viable max sysclock. ie: The HUBSET instruction crashes without something like a 20 oC cooler margin below the loop crash frequency.

That's probably due to initial frequency overshoot as the PLL seeks lock. If you were to approach the target frequency in maybe 1MHz increments, allowing 100us per each 1MHz adjustment, you could probably get there without needing the temperature overhead.

evanh · 2019-08-30 05:33

There is a 100 ms RCFAST pause. Are you saying the PLL may not settle in that time if running on the edge?

cgracey · 2019-08-30 05:39

evanh wrote: »

I'm pretty certain it'll be on source select second HUBSET, after the 100 ms RCFAST pause.

Ah, yes. What was I thinking?

I don't know why you would need the 20 C temp overhead, though I think I've witnessed the same thing.

For what it's worth, could you step up in 1MHz increments, without leaving PLL-select mode? I don't know if that should make a difference, but in case something like an initial fast pulse is extra stressful, it could maybe close the temperature gap.

evanh · 2019-08-30 06:18

Tried both 1 MHz steps with a huge 500 ms RCFAST and also 1 MHz steps without any RCFAST source switching. With RCFAST at 25 oC it fails at 387 MHz. Without any switching at 25 oC it fails at 383 MHz.

EDIT: Here's the source (Configured as without source switching):

con
	XDIV = 20


dat	org

	call	#sysclksetfirst
loop
	call	#sysclkset

	mov	pa, #500
toggle
	drvnot	#0		'toggle P0 at 1/1000th the clock frequency
	waitx	#500-8
	djnz	pa, #toggle

	ijnz	xtalmul, #loop
	jmp	#$



xtalmul		long	350
clk_mode	long	0



sysclkset
'		andn	clk_mode, #%11			'clear the two select bits to force RCFAST selection
'		hubset	clk_mode			'**IMPORTANT**  Switches to RCFAST using known prior mode

		mov	clk_mode, xtalmul		'replace old with new ...
		sub	clk_mode, #1			'range 1-1024
		shl	clk_mode, #8
		or	clk_mode, ##(1<<24 + (XDIV-1)<<18 + %1111_1000)
'		hubset	clk_mode			'setup PLL mode for new frequency (still operating at RCFAST)

		or	clk_mode, #%11			'add PLL as the clock source select
'		waitx	##22_000_000/2			'~500ms (at RCFAST) for PLL to stabilise and scope to capture each burst
		waitx	##350_000_000/2			'~500ms for scope to capture each burst
	_ret_	hubset	clk_mode			'engage!  Switch back to newly set PLL


sysclksetfirst
		mov	clk_mode, xtalmul		'replace old with new ...
		sub	clk_mode, #1			'range 1-1024
		shl	clk_mode, #8
		or	clk_mode, ##(1<<24 + (XDIV-1)<<18 + %1111_1000)
		hubset	clk_mode			'setup PLL mode for new frequency (still operating at RCFAST)

		or	clk_mode, #%11			'add PLL as the clock source select
		waitx	##22_000_000/100		'~10ms (at RCFAST) for PLL to stabilise
	_ret_	hubset	clk_mode			'engage!  Switch back to newly set PLL

EDIT: Added back in a commented piece for the RCFAST switching method.
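As a cross-check on the listing, here is a small Python sketch of the same clock-mode arithmetic, so the value handed to HUBSET and the expected reading on P0 can be worked out on a PC. It assumes a 20 MHz crystal (as stated for the /10 test), that the PLL holds lock, and that `%1111` in the post-divider field means /1. The function names `clk_mode_for` and `expected_pin_hz` are illustrative only.

```python
XDIV = 20  # crystal divider from the listing: 20 MHz / 20 = 1 MHz PLL reference


def clk_mode_for(xtalmul, xdiv=XDIV):
    """Build the clock mode value the way sysclkset does for a given multiplier."""
    mode = (xtalmul - 1) << 8                                  # multiplier field (range 1-1024)
    mode |= (1 << 24) | ((xdiv - 1) << 18) | 0b1111_1000       # PLL on, divider, %1111_1000
    mode |= 0b11                                               # select PLL as the clock source
    return mode


def expected_pin_hz(xtalmul, xtal_hz=20_000_000, xdiv=XDIV):
    """Frequency expected on P0 if the PLL really reaches its target.

    The toggle loop flips P0 every 500 clocks, so one full cycle on the
    pin takes 1000 clocks (1/1000th of the system clock).
    """
    sysclk_hz = xtal_hz * xtalmul // xdiv
    return sysclk_hz // 1000


for mul in (350, 383, 387):
    print(f"xtalmul={mul}: clk_mode=${clk_mode_for(mul):08X}, P0 ~ {expected_pin_hz(mul) / 1000:.0f} kHz")
```

With XDIV = 20 the multiplier is the sysclock in MHz, so a frequency counter on P0 reads the same number in kHz for as long as the PLL keeps up.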

Tubular · 2019-08-30 06:24

When we were doing the dry ice testing with 8 cogs spinning Fozzie on P2D2, we also had a /1000 output being measured with the multimeter's frequency mode.

After a while of self heating the PLL started to slip back a bit, maybe to somewhere around 368 MHz, but the video kept doing its thing regardless, so it wasn't immediately obvious the PLL was slipping.

Anyway just mentioning this as a potential alternative test strategy to finding where the PLL tops out, without having to do small increments and/or wait for thermal equilibrium. It seemed to be just related to the temperature - you could blow on it and the MHz would increase

Thanks for testing this stuff Evanh

evanh · 2019-08-30 06:31

Tubular wrote: »

After a while of self heating the PLL started to slip back a bit ...

That pretty much is the basis of this testing, just with variations for certain tests.

PS: I hadn't paid much attention to the behaviour until I noticed that the post-PLL divider (XDIVP) impacted it. I didn't expect that to make a diff.

EDIT: Eg: Here's the code used for building the whole /10 column in that table. All I had to do was control the temperature and note down the temperature when the kHz freq crossed each target.

dat	org

'		         /2        x44  /10       (20 MHz crystal, 440 MHz PLL, 44 MHz sysclock)
	hubset	##%1_000001_0000101011_0100_10_00
	waitx	##20_000_000/10
	hubset	##%1_000001_0000101011_0100_10_11

loop	drvnot	#0		'toggle P0 at 1/1000th the PLL frequency (1/100th clock frequency)
	waitx	#50-8
	jmp	#loop
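To double-check the comment against the binary constant, here is a hedged Python sketch that splits the HUBSET value into its fields. The field layout is inferred from the underscore grouping and comments in the listings (PLL enable, 6-bit crystal divider, 10-bit multiplier, 4-bit post-divider, 2-bit oscillator mode, 2-bit source select). The post-divider rule (`%1111` = /1, otherwise (PPPP+1)*2) is an assumption that happens to agree with the "/10" in the comment. `decode_clock_mode` is a made-up helper name.

```python
def decode_clock_mode(mode, xtal_hz=20_000_000):
    """Split a P2 clock mode value into fields and derive the frequencies."""
    pll_on = (mode >> 24) & 1
    xdiv = ((mode >> 18) & 0x3F) + 1       # crystal divider, stored minus one
    xmul = ((mode >> 8) & 0x3FF) + 1       # PLL multiplier, stored minus one
    pppp = (mode >> 4) & 0xF               # post-PLL divider code
    # Assumed encoding: %1111 means /1, anything else means (PPPP + 1) * 2.
    xdivp = 1 if pppp == 0b1111 else (pppp + 1) * 2
    vco_hz = xtal_hz * xmul // xdiv
    return {
        "pll_on": bool(pll_on),
        "xdiv": xdiv,
        "xmul": xmul,
        "xdivp": xdivp,
        "source": mode & 0b11,
        "vco_hz": vco_hz,
        "sysclk_hz": vco_hz // xdivp,
    }


fields = decode_clock_mode(0b1_000001_0000101011_0100_10_11)
# The loop toggles P0 every 50 clocks, so the pin runs at sysclk / 100.
pin_hz = fields["sysclk_hz"] // 100
print(fields, pin_hz)
```

Under those assumptions the constant decodes to /2, x44, /10, matching the comment, and the pin frequency is 1/1000th of the PLL frequency.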

Tubular · 2019-08-30 07:24

Ok, neat. Yes we never looked at any post divide, only 1. Good to know this stuff

evanh · 2019-08-30 10:40

It looks like with the higher temps, where I need to use the hair dryer, I'm getting two or three degrees of variation depending on how I'm holding the hair dryer.

Letting the full flow cover the whole board, bouncing off in all directions, seems to require going to a higher temperature reading to bring the PLL frequency reading down to target. Restricting the flow or directing it across the PCB in one direction seems to have the opposite effect.

I thought it was a convection issue yesterday. But it hasn't made any diff with the insulated pad covering the topside. I find myself constantly tweaking the numbers.

cgracey · 2019-08-31 01:32

We had a conference call with ON Semi today.

They are not completely sure about the nature of the problem, though the latch-up theory seems to be likely true.

We are going to have another meeting next week.

Meanwhile, I've asked them if they could limit their VIO tester current to 70mA and see how many dies they can get out of the first few wafers. I explained that we really need a few hundred, if possible, to update our customers/developers. We need to keep momentum.

I really want to get the new silicon to whoever wants it here because it is just way better than the first version.

They sent me some curve traces to check, though I'm not clear, yet, on what they represent. Here they are:

Rayman · 2019-08-31 01:43

Sorry, I don't like it...
We got the first batch without this tester, right? And, seems like just one or two out of a hundred were bad, right?
And, maybe playing with the power supply jumpers was the issue...

I agree you should try to get chips out of them. I think something is wrong with their tester...

cgracey · 2019-08-31 01:58

Rayman wrote: »

Sorry, I don't like it...
We got the first batch without this tester, right? And, seems like just one or two out of a hundred were bad, right?
And, maybe playing with the power supply jumpers was the issue...

I agree you should try to get chips out of them. I think something is wrong with their tester...

Yes, it is frustrating. I've been really skeptical of the tester. I'd like to know what the chip is experiencing on the tester, from start to finish. If the unstoppable-latch-up theory is true, the tester may be fine.

Next week, we are going to build seven new P2 Eval boards with our seven remaining v2 chips. I'm hoping most of them work.

Yanomani · 2019-08-31 02:19

That pesky Android thing in my hands...

Can't enlarge images enough to distinguish much useful information about X and Y axis...

It seems that OnSemi did do some of its homework, and even attempted to power VIO-fed nodes while VDD was kept at 0 V, and, in another case, they ramped up VIO while VDD was kept floating.

There is another plot, showing VDD limited to 2.5 V

Is the above a reasonable description of what the graphs are showing?

cgracey · 2019-08-31 03:04

Yanomani wrote: »

That pesky Android thing in my hands...

Can't enlarge images enough to distinguish much useful information about X and Y axis...

It seems that OnSemi did do some of its homework, and even attempted to power VIO-fed nodes while VDD was kept at 0 V, and, in another case, they ramped up VIO while VDD was kept floating.

There is another plot, showing VDD limited to 2.5 V

Is the above a reasonable description of what the graphs are showing?

That sounds right.

Tubular · 2019-08-31 03:34

They're semiconductor curve trace plots, they include the negative region
I've always thought the P2 would make a neat curve tracer, with the ability of its ADCs to read a bit above and below the rails.
It'd be kind of neat to get the prop to generate those same curves, code up P2ES1 to check another P2ESx under test, while plotting the results to a monitor

jmg · 2019-08-31 06:46

cgracey wrote: »

Meanwhile, I've asked them if they could limit their VIO tester current to 70mA and see how many dies they can get out of the first few wafers. I explained that we really need a few hundred, if possible, to update our customers/developers. We need to keep momentum.

Very true.

cgracey wrote: »

I really want to get the new silicon to whoever wants it here because it is just way better than the first version.

They sent me some curve traces to check, though I'm not clear, yet, on what they represent. Here they are:

Are these with lower current limiting enabled ? To what level ?

Have they given any ESD/latch up test results yet for ES1 and ES2 ?
Injecting currents into clamp diodes, while watching IDD, IIO on a I limited supply with minimal Cd, should show where latch up triggers.

Reordered to pair:
a Vdd 1pt8V_Sweep VIO to 3pt5V.PNG   ::  IIO = 831pA At 1.8V (~50%)
b VIO swept to 2pt5V_Vdd 1.8V.PNG    ::  IIO = 275uA at 2.475V

c Vdd open_Sweep VIO to 3pt5V.PNG   :: IIO = 4.75mA at 1.8V,  IIO ~ 10mA at 3v5
d VIO swept to 2pt5V_Vdd float.png  :: IIO = 6.74mA at 2.475V

e Virgin Part_VIO 3V_Vdd swept to 1.8V.PNG :: Idd = 20mA at 1v8

The a,b) seem fine, and show a very low IIO at partial Vio, and a tolerable IIO at 2.475V

c,d) are one supply floating, so are of limited info, but seem to show a lateral powering effect, or is it just skewed biasing ? (currents here are >> a,b)
If there is lateral current of some mA, then floating Vdd may not be a great idea, as it could float above 1.8V which could be fatal.

e) is unclear, if that is the SAME silicon as a,b, or a different part. e) is the only plot that gives Idd, but I'd expect static Idd on a 'good' part to be uA not mA.
What is the RCFAST current and does RCFAST run in their test mode ? (was this test run in Test or normal, and was RST active ?)

Missing is Idd info (Vdd=1.8V) as VIO sweeps up.

Idd in e) starts to ramp just above 200mV and hits a very nicely regulated ~20mA at Vdd >= 400mV thru to 1.8V
What does a P2 draw when RST = active, and RCFAST is running ?
The very flat / regulated 20mA is rather a surprise, as any switching current should be Vdd proportional
Id = Cpd * Vcc * Fi + Istatic, but if there is no switching (RCFAST = OFF) then that 20mA seems high ?
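To put numbers on why a flat 20mA looks odd, here is a minimal Python sketch of that formula. The capacitance and frequency values are purely hypothetical placeholders, not P2 figures; the point is only the scaling with Vdd.

```python
def supply_current(cpd_farads, vdd_volts, freq_hz, i_static_amps=0.0):
    """Id = Cpd * Vcc * Fi + Istatic (switching current plus static current)."""
    return cpd_farads * vdd_volts * freq_hz + i_static_amps


CPD = 1e-9      # hypothetical effective switched capacitance, farads
F_CLK = 20e6    # hypothetical clock frequency, Hz

# If switching dominated, current should scale with Vdd across the sweep.
i_low = supply_current(CPD, 0.4, F_CLK)
i_high = supply_current(CPD, 1.8, F_CLK)
ratio = i_high / i_low

# With no switching at all, only the static term is left at any Vdd.
i_static_only = supply_current(CPD, 1.8, 0.0, i_static_amps=0.020)

print(f"switching current at 0.4V: {i_low * 1e3:.1f} mA")
print(f"switching current at 1.8V: {i_high * 1e3:.1f} mA (x{ratio:.1f})")
print(f"static only: {i_static_only * 1e3:.1f} mA")
```

Whatever the real Cpd and Fi are, a switching-dominated current would grow about 4.5 times between 400mV and 1.8V, so a trace that stays flat over that range points at the static term.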

cgracey wrote: »

Next week, we are going to build seven new P2 Eval boards with our seven remaining v2 chips. I'm hoping most of them work.

Maybe build 5 and send 2 (or 3) to Peter ? - so he can mount on P2D2 - then that brings another supply pairing into the mix, as P2D2 does not use the same SMPS / Linear regs.
