Due to the presence of the thermal pad, board and enclosure heatsinking will matter a lot: the better it is, the smaller the relative temperature rise. Heatsink design will be a factor in the Prop2's usable clock-rate spec, where it wasn't for the Prop1.
Okay. Wendy sent me the current-silicon power test simulation that shows 1.2W:
                  Internal   Switching  Leakage    Total
Power Group       Power      Power      Power      Power      (   %)   Attrs
--------------------------------------------------------------------------------
clock_network     0.8340     0.2442     1.136e-06  1.0783     (87.15%)  i
register          0.0147     1.255e-03  2.577e-05  0.0160     ( 1.29%)
combinational     0.0162     0.0430     9.582e-05  0.0594     ( 4.80%)
sequential        4.971e-04  0.0000     1.453e-06  4.985e-04  ( 0.04%)
memory            0.0152     3.047e-03  1.950e-04  0.0185     ( 1.49%)
io_pad            0.0000     0.0646     8.822e-05  0.0647     ( 5.23%)
black_box         0.0000     0.0000     3.884e-14  3.884e-14  ( 0.00%)

Net Switching Power  =  0.3561     ( 28.78%)
Cell Internal Power  =  0.8807     ( 71.18%)
Cell Leakage Power   =  4.074e-04  (  0.03%)
                        ---------
Total Power          =  1.2372     (100.00%)
And for comparison, here is the future-silicon test that shows 790mW:
                  Internal   Switching  Leakage    Total
Power Group       Power      Power      Power      Power      (   %)   Attrs
--------------------------------------------------------------------------------
clock_network     0.3084     0.1488     3.312e-06  0.4572     (57.93%)  i
register          8.300e-03  3.107e-03  2.573e-05  0.0114     ( 1.45%)
combinational     0.0361     0.0769     7.545e-05  0.1131     (14.33%)
sequential        4.566e-04  2.212e-05  1.270e-06  4.800e-04  ( 0.06%)
memory            0.1056     1.972e-03  1.950e-04  0.1078     (13.65%)
io_pad            0.0000     0.0992     8.938e-05  0.0993     (12.58%)
black_box         0.0000     0.0000     9.151e-14  9.151e-14  ( 0.00%)

Net Switching Power  =  0.3301     ( 41.82%)
Cell Internal Power  =  0.4589     ( 58.13%)
Cell Leakage Power   =  3.902e-04  (  0.05%)
                        ---------
Total Power          =  0.7894     (100.00%)
I just remembered that the current-silicon test executes from cog RAM, while the future-silicon test executes from hub, activating the hub RAMs. So the total power difference is ~90 mW greater than shown. That means the current-silicon power should be about 1.33 W, which gets reduced by ~40% in the future silicon.
Good, that 1.2 W simulation result is close to the reported typical for the P2es.
The reports still show some changes, though?
                     Current silicon             Future silicon
                     Total                       Total
Power Group          Power      (   %)   Attrs   Power      (   %)   Attrs
---------------------------------------------------------------------------
clock_network        1.0783     (87.15%)  i      0.4572     (57.93%)  i
register             0.0160     ( 1.29%)         0.0114     ( 1.45%)
combinational        0.0594     ( 4.80%)         0.1131     (14.33%)
sequential           4.985e-04  ( 0.04%)         4.800e-04  ( 0.06%)
memory               0.0185     ( 1.49%)         0.1078     (13.65%)
io_pad               0.0647     ( 5.23%)         0.0993     (12.58%)
black_box            3.884e-14  ( 0.00%)         9.151e-14  ( 0.00%)

Net Switching Power  =  0.3561     (28.78%)      0.3301     (41.82%)
Cell Internal Power  =  0.8807     (71.18%)      0.4589     (58.13%)
Cell Leakage Power   =  4.074e-04  ( 0.03%)      3.902e-04  ( 0.05%)
                        ---------                ---------
Total Power          =  1.2372    (100.00%)      0.7894    (100.00%)
Memory power has increased quite a bit; why would that be?
Clock_network power has fallen (-0.6211 W!), partly offset by an increase (+0.0537 W) in combinational as the clock gating moves to a different reporting column.
Overall, that is now 64% of the previous power value, which is quite a gain (1.57 x MHz for the same power).
Seems almost too good to be true?
I modified my post above. It's actually even better, because the future-silicon test executes from hub, while the current-silicon test executed from cog RAM. We need to add 89mW to the current-silicon power.
Future power is 59.5% of current power, yielding 1.68 x MHz for the same power.
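For anyone who wants to follow the arithmetic, here is a quick sketch of it in Python, using the two simulation totals and the ~89 mW hub-RAM adder quoted above (the adder is Chip's estimate; nothing else is assumed):

# Sanity check of the power ratios quoted above.
current_total = 1.2372    # W, current-silicon test (executes from cog RAM)
future_total  = 0.7894    # W, future-silicon test (executes from hub RAM)
hub_ram_adder = 0.089     # W, hub-RAM activity missing from the cog-RAM test

raw_ratio      = future_total / current_total                    # ~0.638 -> "64%"
adjusted_ratio = future_total / (current_total + hub_ram_adder)  # ~0.595 -> "59.5%"
print(f"{raw_ratio:.3f} -> {1/raw_ratio:.2f} x MHz for the same power")
print(f"{adjusted_ratio:.3f} -> {1/adjusted_ratio:.2f} x MHz for the same power")
# prints 0.638 / 1.57 x and 0.595 / 1.68 x, matching the figures in this thread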
Here is the current-silicon power test that runs from cog RAM:
dat
                orgh    0
'
' Launch all cogs with test program.
'
                org
.loop           coginit cognum,#@pgm    'last iteration relaunches cog 0
                djnf    cognum,#.loop
cognum          long    7
'
' Toggle 8 pins, get long from hub, start cordic command
'
                org
pgm             cogid   x               'which cog am I, 0..7?
                shl     x,#3
                rep     @.p,#8          'start pwm pins
                wrpin   #%01_01001_0,x
                wxpin   pat,x
                wypin   #1,x
                dirh    x
                add     x,#1
.p
.loop           rflong  y
                qrotate y,y
                jmp     #.loop
pat             long    $0010_0001
x               res     1
y               res     1
Here is the future-silicon power test that runs from hub RAM (more power):
dat
                orgh    0
'
' Launch all cogs with test program.
'
                org
.loop           coginit cognum,#@pgm    'last iteration relaunches cog 0
                djnf    cognum,#.loop
cognum          long    7
'
' Toggle 8 pins, get long from hub, start cordic command
'
                org
pgm             cogid   x               'which cog am I, 0..7?
                shl     x,#3
                rep     @.p,#8          'start pwm pins
                wrpin   #%01_01001_0,x
                wxpin   pat,x
                wypin   #1,x
                dirh    x
                add     x,#1
.p
                wrlong  .loop+0,##$400  'copy the two-instruction loop into hub RAM
                wrlong  .loop+1,##$404
                jmp     #$400           'jump into hub RAM, so the loop runs in hub exec
.loop           qrotate y,y
                jmp     #.loop
pat             long    $0010_0001
x               res     1
y               res     1
Actually, the current-silicon test was doing an RFLONG, which was exercising the FIFO, but not nearly as much as hub exec does in the future-silicon version. So I imagine the future-silicon power is maybe only 58% of the current-silicon power.
And remember that doesn't include the 3.3V I/O power.
Using a quick Cpd comparison, I get that an 8-COG, all-pins P2+ needs to run at ~61 MHz SysCLK to match the power drain of an 80 MHz P1.
So if your numbers are right, jmg, that would basically mean we can run ~50% more PASM instructions on a P2 than on a P1 for the same core input power. And many of these P2 instructions are more flexible/powerful than the existing P1 instructions. Still seems like a net gain.
It will be interesting to see what current draw we could end up with on the P2 at low-frequency RC clock rates, for sleep modes etc., when hopefully many clocks can be gated off. Can that be estimated yet from these numbers?
You can use Cpd to calculate the 1-COG, 20 kHz number of around 36 uA + static Icc (which should be similar to the P2es static/leakage?), or measure a P2es with the clock stopped, then at 20 kHz, and scale the dynamic Icc by ~0.58.
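As a rough illustration of that kind of Cpd-style estimate, here is a minimal Python sketch. Only the linear scaling of dynamic current with frequency and the ~0.58 factor come from this thread; the 20 MHz reference current below is a made-up placeholder, not a measurement:

# Dynamic supply current scales roughly linearly with clock frequency
# (I_dyn ~ Cpd * Vdd * f), so a measured dynamic Icc can be scaled to
# another frequency, then multiplied by ~0.58 for the revised silicon.

FUTURE_VS_CURRENT = 0.58              # assumed dynamic-power ratio from this thread

def scale_dynamic_icc(icc_dyn_ref, f_ref, f_target, silicon_scale=1.0):
    """Linearly scale a measured dynamic Icc from f_ref to f_target."""
    return icc_dyn_ref * (f_target / f_ref) * silicon_scale

icc_dyn_20MHz = 10e-3                 # A at 20 MHz: running Icc minus stopped-clock Icc (placeholder)
sleep_icc = scale_dynamic_icc(icc_dyn_20MHz, 20e6, 20e3, FUTURE_VS_CURRENT)
print(f"estimated dynamic Icc at 20 kHz: {sleep_icc * 1e6:.1f} uA")   # ~5.8 uA here
# The measured static/leakage current is then added on top of this.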
So, just extending jmg's basic numbers:
A P2 running at ~61 MHz can execute >30M basic instructions per second, equivalent to >1.5x a P1's capability using the same set of basic instructions.
And a P2 has twice the pin count for interacting with surrounding logic.
Calculating... 1.5 x 2 = 3, so a P2 can outrun three P1s without resorting to a single Smart Cell, LUT sharing, or any Streamer-related features.
Even a lot more than that, because all the P2's features can be coordinated internally, something that costs a lot of time and resources when attempted with a 3 x P1 setup.
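A rough check of those factors, assuming 2-clock P2 instructions and 4-clock P1 instructions (hub-access timing ignored), in Python:

# Where the "1.5 x" and "3 x" factors above come from.
p2_mips = 61e6 / 2                  # ~30.5 MIPS per cog at ~61 MHz (2 clocks/instruction)
p1_mips = 80e6 / 4                  # 20 MIPS per cog at 80 MHz (4 clocks/instruction)
speed_factor = p2_mips / p1_mips    # ~1.53
pin_factor   = 64 / 32              # P2 has 64 I/O pins vs. 32 on the P1
print(speed_factor, speed_factor * pin_factor)   # ~1.53 and ~3.05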
I just went over the final approval docs and timing constraints with Wendy, and I signed off on everything. So the reticles are going to be made now and it will go into fab.
Fifteen weeks from now will be July 20, when the prototypes should be arriving. A month after that we should have 1,000 chips and new P2 Eval boards.
I've been working on the new Spin compiler, backing into the PUB/PRI section after updating all the ancillary functions. Hopefully, it will show signs of life soon. I'm happy to be working on tools, finally, like ersmith and RossH have been.
When you are back working on Spin2, please look at the enhancements @ersmith has already made in his Spin2 version.
Some of the things he did are very useful; the biggest ones are optional parameters when calling a function, multiple return values, @ @ @, include paths for objects, and #define/#ifdef.
I have not used the last ones, but they make a lot of sense for writing code that can run on both the P1 and the P2.
But fastspin does compile to native code, and even with 512K a bytecode interpreter will surely be needed. HDMI will use a lot of RAM.
Anyway, welcome back from the world of HDLs to the world of us normal(?) programmers. You may not even have realized how much fun it is to program the P2, having been stuck in its development for so long.
This brainchild of yours is just incredible. I am currently (re)writing a full-duplex driver, and every time I start over, the thing gets smaller and faster, even if I still fight to understand/use most of the changed/new instructions.
Nice power reduction! 0.8 W is quite manageable, and that is more or less a maximum, too. Most real-life use will be significantly less.
Hi rogloh,
You did beat me. I'm such a slow writer.
Fingers crossed this revision will be solid. Thank you, Chip.
And 1k eval boards? And/or chips? PARTY IN ROCKLIN
Thanks for bringing fun back into programming,
Mike