OK, I can see the "CLK output mode" being useful when using an external HDMI encoder too...
For HDMI you get lucky, as that is a more complex interface: it includes a clock PLL, and the clock is sent at a lower rate (data rate / 10).
It's the simpler synchronous interfaces like SPI that need CLK at the data rate.
I'd add a look at 'pin pulldown' to the list. It's been observed that floating inputs tend toward '1' rather than '0', perhaps because the interleaved positive VIO and VDD pins are closest.
I know an analog pad block respin isn't on the cards for the next iteration, and that's absolutely fine, but we do have other options, such as whether to engage the 150 kohm pulldown resistors, or a GND "guard ring" on the PCB that might tend things back toward '0'.
This is all really low priority; if nothing at all is done, all we have to do is manage user expectations as to why their inputs show '1' when nothing is connected. But while we're making a list, it may as well go on it.
Any SW fix is only a partial answer to the floating-pin issue, as during RESET=L the P2 pins are floating, and until the reset-exit delays are done, plus the serial ROM load time, things are undefined.
Then, do you pull down, or pull up?
SPI chip selects are active LOW, and so are UARTs and I2C, so it is common for MCUs to reset with light pullups: light enough that any pin that needs to be LOW during reset can be pulled down with a resistor.
Guys, the floating pull-up current is due to incidental leakage, and I'm sure it measures in mere nanoamps. There's always going to be leakage, and either VIO leakage or GND leakage will dominate.
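A quick back-of-envelope check supports this. The sketch below (leakage figures are illustrative assumptions, not P2 measurements) shows how little voltage nanoamp-scale leakage develops across the 150 kohm pulldown:

```python
# Voltage a given leakage current develops across the 150 kohm pulldown
# (V = I * R). Leakage values here are illustrative assumptions.
R_PULLDOWN = 150e3  # ohms

for leakage in (1e-9, 10e-9, 100e-9):  # 1 nA, 10 nA, 100 nA
    v_pin = leakage * R_PULLDOWN
    print(f"{leakage * 1e9:5.1f} nA -> {v_pin * 1e3:7.3f} mV at the pin")
```

Even 100 nA only lifts the pin to about 15 mV, orders of magnitude below any CMOS input threshold, so an engaged pulldown would define the level decisively.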
I just finished getting the PTRx behavior straightened out for SETQ(2)+RD/WR/WMLONG.
It works like this:
SETQ #16-1 'ready to transfer 16 longs
RDLONG base,PTRA 'read at PTRA
SETQ #10-1 'ready to transfer 10 longs
RDLONG base,++PTRB 'read at PTRB+10<<2, PTRB += 10<<2
SETQ #100-1 'ready to transfer 100 longs
RDLONG base,--PTRA 'read at PTRA-100<<2, PTRA -= 100<<2
SETQ #8-1 'ready to transfer 8 longs
RDLONG base,PTRA++ 'read at PTRA, PTRA += 8<<2
SETQ #5-1 'ready to transfer 5 longs
RDLONG base,PTRB-- 'read at PTRB, PTRB -= 5<<2
Only the MSB of the encoded index is used to increment or decrement PTRx by the block size. This way, you can keep loading or storing memory sequentially.
One more thing off the list.
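For readers following along, the five cases above can be condensed into a small model (Python standing in for the hardware behavior; `block_xfer` is a made-up helper name, not a real instruction):

```python
def block_xfer(mode, ptr, n):
    """Return (first_hub_address, new_ptr) for SETQ #n-1 + RDLONG/WRLONG,
    per the five PTRx modes listed above. Block size is (Q+1)*4 = n*4."""
    blk = n << 2                 # n longs of 4 bytes each
    if mode == "PTRx":           # no pointer update
        return ptr, ptr
    if mode == "++PTRx":         # pre-increment: access at PTRx+blk
        return ptr + blk, ptr + blk
    if mode == "--PTRx":         # pre-decrement: access at PTRx-blk
        return ptr - blk, ptr - blk
    if mode == "PTRx++":         # post-increment: access at PTRx
        return ptr, ptr + blk
    if mode == "PTRx--":         # post-decrement: access at PTRx
        return ptr, ptr - blk
    raise ValueError(mode)

# Mirror two of the examples above:
assert block_xfer("--PTRx", 0x1000, 100) == (0x1000 - 400, 0x1000 - 400)
assert block_xfer("PTRx++", 0x1000, 8)   == (0x1000, 0x1000 + 32)
```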
Am I understanding this correctly for --PTRx?
SETQ #100-1 'ready to transfer 100 longs
RDLONG base,--PTRA 'read at PTRA-100<<2, PTRA -= 100<<2
First location read is
RDLONG base, PTR-100<<2
next is
RDLONG base+2, PTR-99<<2
etc, and when done
PTR=PTR-100<<2
That's right. It will track based on block size, which is (Q+1)×4.
Wait, your example is wrong, in that the first register written is base, followed by base+1, etc.
Multiple 'SETQ+RDLONG base,PTRA++' operations will read blocks in order.
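The corrected sequence can be sketched like this, with a Python dict standing in for hub RAM (addresses and contents are made up for illustration):

```python
# 100 longs stored just below PTRA; hub[addr] holds the long at that address.
ptra = 0x1000
n = 100
hub = {ptra - (n << 2) + 4 * i: i for i in range(n)}

start = ptra - (n << 2)                          # first address read: PTRA-100<<2
regs = [hub[start + 4 * i] for i in range(n)]    # written to base, base+1, ...
ptra = start                                     # PTRA -= 100<<2

assert regs[0] == 0 and regs[-1] == n - 1        # ascending register order
assert ptra == 0x1000 - 400
```

Note that registers fill in ascending order (base, base+1, ...) from ascending hub addresses; the pointer moves down, but the data within the block is not reversed.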
ON comes to Parallax in ten days with their ECO and associated agreement ($$$$). Their schedule will depend on your schedule.
How much time are you allocating to customer testing once the P2 Eval Board is shipped?
How much time do you need to complete the list of changes you posted? When you make changes, will they need to be tested in FPGA by users?
Only when these two variables are known will we be able to consider signing their agreement and pressing them for a schedule. We need to be ready to talk specifics and provide a schedule when they arrive at our office.
Ken Gracey
Thanks Chip. Corrected the error in my post (and here too).
It's as I thought you explained. A bit confusing to effectively reverse the data, but there is probably good reason, and we can use that reversal to advantage elsewhere too I'm sure.
Think of 'SETQ(2)+RDLONG base,--PTRA' as a POP and it makes sense.
'SETQ(2)+WRLONG base,PTRA++' is like a PUSH.
You've also got PTRA, ++PTRA, and PTRA-- to use. They all conform to the block size of (Q+1)*4.
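The PUSH/POP analogy can be sketched as follows (Python again standing in for the PASM sequences; `push` and `pop` are hypothetical helper names):

```python
hub = {}        # hub RAM, addressed by byte address of each long
ptra = 0x2000   # stack pointer

def push(longs):
    """Like SETQ #len-1 + WRLONG base,PTRA++ : write block at PTRA, then advance."""
    global ptra
    for i, v in enumerate(longs):
        hub[ptra + 4 * i] = v
    ptra += len(longs) << 2

def pop(n):
    """Like SETQ #n-1 + RDLONG base,--PTRA : retreat PTRA, then read block."""
    global ptra
    ptra -= n << 2
    return [hub[ptra + 4 * i] for i in range(n)]

push([1, 2, 3])
push([4, 5])
assert pop(2) == [4, 5]        # last block pushed comes off first
assert pop(3) == [1, 2, 3]
```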
I talked to Wendy at ON Semi today, who is doing the synthesis and place-and-route.
She said that even though our cell instance count has gone from 630k to 780k in the new silicon, the max-power test I gave her is showing a reduction from 1.2W to 1.0W. This is due to clock-gating she enabled in the synthesis tool. It makes the clock tree more complicated, but allows a lot of flops to lose their enable circuits which mux the Q output back into the D input. And 180MHz is still no problem.
I'm waiting for her to take a simulated power measurement during downloading, when only one cog is enabled. I think we'll see the current 77mA drop to under 10mA.
Clock gating means the chip will take power in proportion to the functionality in use, while the current silicon dissipates almost all of its power in the clock tree itself. The new clock tree will have many levels and take a lot less power.
That would be a great improvement. I hope that 250MHz is still an achievable overclock too.
So in that max-power test, are all clocks gated ON?
If so, it's nice that everything-running power has fallen slightly (to 83.3% of before), even though the cell count has increased by 23.8%.
Has that come at any cost to the indicated MHz, relative to the P2-ES?
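Those percentages check out against the figures quoted above:

```python
# Sanity-check the ratios against the figures in Chip's post.
power_ratio = 1.0 / 1.2                # new max power vs old: ~83.3%
cell_growth = 780_000 / 630_000 - 1    # instance-count growth: ~23.8%
print(f"power: {power_ratio:.1%}, cell growth: {cell_growth:.1%}")
```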
I thought it would be helpful to list all changes made to the P2 source Verilog, so that everyone could anticipate what is coming next. I will maintain this list.
...
(18) Be able to output system CLK via smart pins, must explore with ON Semi.
Did (18) make the cut?
No.
We are skew-banding the DIR, OUT, and IN signals to within 1ns across the chip, though.
It will be nice to see that reduction in power when fewer functions/cogs are in use. I've never used the full functionality of P1, so I am not expecting to on P2.
If anything causes a respin of the Verilog, WAITPAT could do with a look.
P2 waits for the pattern to go from not met to met before setting the event, whereas IIRC the P1 would return immediately if the pattern was already met. Therefore I am having to hand-code the check instead of using WAITPAT.
We had a conference call today with ON Semi to discuss the tapeout status.
With all the new logic in the next P2, ON has been having a very hard time closing timing.
Wendy did some compiles last night at 180, 170, and 160 MHz Fmax targets, in order to see how different speed goals affect the instance count.
Get a load of this... By dropping the goal from 180MHz to 160MHz, the instance count went from ~780k to ~680k, which is less than the current silicon contains. And we were only reaching 172MHz, anyway, with those ~780k instances. Those extra ~100k instances were certainly buffers to speed up signalling between flops. Those buffers were not only taking power, but increasing the routing congestion, which was the main impediment to meeting timing. A big metal-density/cell-scarcity hurricane-eye pattern was forming in the middle of the die from 100% routing utilization, pushing cells out of the middle of the logic area.
I told them today to just go with 160MHz, since 180MHz was not going to happen and even 172MHz was taking ~100k instances of extra cells.
So, 160MHz will close timing fine and it will take considerably less power at any frequency. If the chip runs cooler, it can go faster, so I think we'll probably be not much slower than the current silicon is. In addition to these ~100k cells going away, clock-gating is going to make a huge contribution to power reduction.
Over the next few days, simulations will reveal what the power levels will be.
Cluso,
We already know, and have the hard evidence. If the die temperature is held low then it can go faster. It's called over-clocking.
I think the question relates more to a commercial use specification rather than what one sample, from one run, might be able to achieve on a test bench.
Volume users need guaranteed specifications to work to.
That was always a risk of the newly added logic...
Comments
Yes. Just added that.
And data not reversed.
Sounds really good! Do we have a rough ETA to first samples if all goes smoothly from here forward?
If anything, the next P2 should run even a little faster, because it will self-heat less.
Are they still trying to get 250MHz as a formal spec point, even if that is at a reduced Tmax, and maybe 5% Vdd specs?
Same target as before: 180MHz from -40C to +85C at 1.8V +- 5%.
I thought you were going to shoot for 200MHz? Did I get this wrong or was that not on the table?
We tried, but the biggest inhibitor was the hub memory.
When done, is it possible to ask what a lesser spec would give as Fmax?
Say 0C to +70C at 1.8V +- 2.5%.
Thanks for sharing this, Chip.
What numbers does 165MHz give?