List of Changes in Next P2 Silicon

Comments

  • Due to the presence of the thermal pad, the board and enclosure heatsinking will matter a lot. The better it is, the smaller the relative temperature rise will be. The design of the heatsinking will be a factor in the Prop2 usable clock-rate spec, where it wasn't for the Prop1.
  • cgraceycgracey Posts: 11,697
    edited 2019-04-04 - 00:26:35
    Okay. Wendy sent me the current-silicon power test simulation that shows 1.2W:
                            Internal  Switching  Leakage    Total
    Power Group             Power     Power      Power      Power   (     %)  Attrs
    --------------------------------------------------------------------------------
    clock_network              0.8340    0.2442 1.136e-06    1.0783 (87.15%)  i
    register                   0.0147 1.255e-03 2.577e-05    0.0160 ( 1.29%)  
    combinational              0.0162    0.0430 9.582e-05    0.0594 ( 4.80%)  
    sequential              4.971e-04    0.0000 1.453e-06 4.985e-04 ( 0.04%)  
    memory                     0.0152 3.047e-03 1.950e-04    0.0185 ( 1.49%)  
    io_pad                     0.0000    0.0646 8.822e-05    0.0647 ( 5.23%)  
    black_box                  0.0000    0.0000 3.884e-14 3.884e-14 ( 0.00%)  
    
      Net Switching Power  =    0.3561   (28.78%)
      Cell Internal Power  =    0.8807   (71.18%)
      Cell Leakage Power   = 4.074e-04   ( 0.03%)
                             ---------
    Total Power            =    1.2372  (100.00%)
    

    And for comparison, here is the future-silicon test that shows 790mW:
                            Internal  Switching  Leakage    Total
    Power Group             Power     Power      Power      Power   (     %)  Attrs
    --------------------------------------------------------------------------------
    clock_network              0.3084    0.1488 3.312e-06    0.4572 (57.93%)  i
    register                8.300e-03 3.107e-03 2.573e-05    0.0114 ( 1.45%)  
    combinational              0.0361    0.0769 7.545e-05    0.1131 (14.33%)  
    sequential              4.566e-04 2.212e-05 1.270e-06 4.800e-04 ( 0.06%)  
    memory                     0.1056 1.972e-03 1.950e-04    0.1078 (13.65%)  
    io_pad                     0.0000    0.0992 8.938e-05    0.0993 (12.58%)  
    black_box                  0.0000    0.0000 9.151e-14 9.151e-14 ( 0.00%)  
    
      Net Switching Power  =    0.3301   (41.82%)
      Cell Internal Power  =    0.4589   (58.13%)
      Cell Leakage Power   = 3.902e-04   ( 0.05%)
                             ---------
    Total Power            =    0.7894  (100.00%)
    

    I just remembered that the current-silicon test executes from cog, while the future-silicon test executes from hub, activating the hub RAMs. So, the total power difference is ~90mW greater than shown. That means the current-silicon power should be about 1.33W, which gets reduced by 40% in the future silicon.
  • jmgjmg Posts: 13,902
    cgracey wrote: »
    Okay. Wendy sent me the current-silicon power test simulation that shows 1.2W:
    Good, that's close to the reported typical of P2es

    The reports still show some changes?
                              Total                      Total
    Power Group               Power   (     %)  Attrs    Power   (     %)  Attrs
    -------------------------------------------------- --------------------------
    clock_network              1.0783 (87.15%)  i         0.4572 (57.93%)  i
    register                   0.0160 ( 1.29%)            0.0114 ( 1.45%)  
    combinational              0.0594 ( 4.80%)            0.1131 (14.33%)  
    sequential              4.985e-04 ( 0.04%)         4.800e-04 ( 0.06%)  
    memory                     0.0185 ( 1.49%)            0.1078 (13.65%)  
    io_pad                     0.0647 ( 5.23%)            0.0993 (12.58%)  
    black_box               3.884e-14 ( 0.00%)         9.151e-14 ( 0.00%)  
                                                       
      Net Switching Power  =    0.3561   (28.78%)      0.3301   (41.82%)
      Cell Internal Power  =    0.8807   (71.18%)      0.4589   (58.13%)
      Cell Leakage Power   = 4.074e-04   ( 0.03%)   3.902e-04   ( 0.05%)
                             ---------              ---------
    Total Power            =    1.2372  (100.00%)      0.7894  (100.00%)
    

    Memory power has increased quite a bit. Why would that be?
    Clock_network has fallen (-0.6211W!), but that is partly offset by an increase (+0.0537W) in combinational, as the clock gating moves between reporting columns.
    Overall, that is now 64% of the previous power value, which is quite a gain (1.57 x MHz for the same power).
    Seems almost too good to be true?
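
    The deltas and the 64% figure quoted above can be rechecked with a short script. This is only a sketch: the dictionary and function names are illustrative, and the values are copied from the two reports (in watts).

```python
# Total power per group in watts, copied from the two simulation reports.
current = {
    "clock_network": 1.0783,
    "register": 0.0160,
    "combinational": 0.0594,
    "sequential": 4.985e-04,
    "memory": 0.0185,
    "io_pad": 0.0647,
    "black_box": 3.884e-14,
}
future = {
    "clock_network": 0.4572,
    "register": 0.0114,
    "combinational": 0.1131,
    "sequential": 4.800e-04,
    "memory": 0.1078,
    "io_pad": 0.0993,
    "black_box": 9.151e-14,
}


def group_deltas(before, after):
    """Return {group: after - before} in watts."""
    return {name: after[name] - before[name] for name in before}


deltas = group_deltas(current, future)
total_current = 1.2372  # "Total Power" line, current silicon
total_future = 0.7894   # "Total Power" line, future silicon
ratio = total_future / total_current

for name, delta in deltas.items():
    print(f"{name:15s} {delta:+.4f} W")
print(f"future/current = {ratio:.1%}, same-power clock gain = {1 / ratio:.2f}x")
```

    The "same-power clock gain" line assumes that power scales linearly with clock rate, which is the same simplification the post makes.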
  • Jmg,

    I modified my post above. It's actually even better, because the future-silicon test executes from hub, while the current-silicon test executed from cog RAM. We need to add 89mW to the current-silicon power.

    Future power is 59.5% of current power, yielding 1.68 x MHz for the same power.
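
    The corrected ratio can be reproduced from the two report totals and the 89mW adjustment. This is a hedged sketch: the constant and function names are made up for illustration, and treating clock rate as inversely proportional to power is a simplification.

```python
# Values in watts. The 0.089 W correction is the figure quoted in the post.
# It is close to the memory-group difference between the two reports
# (0.1078 - 0.0185 = 0.0893); that is an observation, not a confirmed derivation.
CURRENT_COG_EXEC_W = 1.2372
FUTURE_HUB_EXEC_W = 0.7894
HUB_EXEC_CORRECTION_W = 0.089


def same_power_clock_gain(current_w, future_w):
    """Clock multiplier at equal power, assuming power scales linearly with clock."""
    return current_w / future_w


corrected_current_w = CURRENT_COG_EXEC_W + HUB_EXEC_CORRECTION_W
ratio = FUTURE_HUB_EXEC_W / corrected_current_w
gain = same_power_clock_gain(corrected_current_w, FUTURE_HUB_EXEC_W)

print(f"corrected current-silicon power = {corrected_current_w:.3f} W")
print(f"future/current = {ratio:.1%}, same-power clock gain = {gain:.2f}x")
```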
  • jmgjmg Posts: 13,902
    cgracey wrote: »
    I modified my post above. It's actually even better, because the future-silicon test executes from hub, while the current-silicon test executed from cog RAM. We need to add 89mW to the current-silicon power.
    Future power is 59.5% of current power, yielding 1.68 x MHz for the same power.
    Ah, that explains the Memory power change :)


  • Now up to 68% more fuel economy from batteries! YMMV :smiley:
  • cgraceycgracey Posts: 11,697
    edited 2019-04-04 - 00:47:42
    Here is the current-silicon power test that runs from cog RAM:
    dat
    		orgh	0
    '
    ' Launch all cogs with test program.
    '
    		org
    
    .loop		coginit	cognum,#@pgm	'last iteration relaunches cog 0
    		djnf	cognum,#.loop
    
    cognum		long	7
    '
    ' Toggle 8 pins, get long from hub, start cordic command
    '
    		org
    
    pgm		cogid	x		'which cog am I, 0..7?
    		shl	x,#3
    
    		rep	@.p,#8		'start pwm pins
    		wrpin	#%01_01001_0,x
    		wxpin	pat,x
    		wypin	#1,x
    		dirh	x
    		add	x,#1
    .p
    
    .loop		rflong	y
    		qrotate	y,y
    		jmp	#.loop
    
    
    pat		long	$0010_0001
    
    x		res	1
    y		res	1
    


    Here is the future-silicon power test that runs from hub RAM (more power):
    dat
    		orgh	0
    '
    ' Launch all cogs with test program.
    '
    		org
    
    .loop		coginit	cognum,#@pgm	'last iteration relaunches cog 0
    		djnf	cognum,#.loop
    
    cognum		long	7
    '
    ' Toggle 8 pins, get long from hub, start cordic command
    '
    		org
    
    pgm		cogid	x		'which cog am I, 0..7?
    		shl	x,#3
    
    		rep	@.p,#8		'start pwm pins
    		wrpin	#%01_01001_0,x
    		wxpin	pat,x
    		wypin	#1,x
    		dirh	x
    		add	x,#1
    .p
    
    		wrlong	.loop+0,##$400
    		wrlong	.loop+1,##$404
    		jmp	#$400
    
    .loop		qrotate	y,y
    		jmp	#.loop
    
    
    pat		long	$0010_0001
    
    x		res	1
    y		res	1
    


    Actually, the current-silicon test was doing an RFLONG, which was exercising the FIFO, but not nearly as much as hub-exec in the future-silicon version. So, I imagine that the future-silicon power is maybe only 58% of the current-silicon power.
  • Nice going. Those energy saving efforts seem worthwhile.
  • I noticed the extra memory power, and then read it's because of hubexec.

    Nice power reduction! 0.8W is quite manageable, and that is sort of max too. Most real-life will be significantly less :smiley:
  • Cluso99 wrote: »
    I noticed the extra memory power, and then read it's because of hubexec.

    Nice power reduction! 0.8W is quite manageable, and that is sort of max too. Most real-life will be significantly less :smiley:

    And remember that doesn't include the 3.3V I/O power.
  • jmgjmg Posts: 13,902
    cgracey wrote: »
    And remember that doesn't include the 3.3V I/O power.
    Using a quick Cpd compare, I get that an 8-COG, all-pins P2+ needs to run at ~61MHz SysCLK to match the power drain of an 80MHz P1.


  • jmg wrote: »
    cgracey wrote: »
    And remember that doesn't include the 3.3V I/O power.
    Using a quick Cpd compare, I get that an 8-COG, all-pins P2+ needs to run at ~61MHz SysCLK to match the power drain of an 80MHz P1.

    So if your numbers are right, jmg, that would basically mean that we can run ~50% more PASM instructions on a P2 than on a P1 for the same core input power. And many of these P2 instructions are more flexible/powerful than the existing P1 instructions. Still seems like a net gain.

    Be interesting to see what current draw we could end up with on the P2 at low frequency RC clock rates for sleep modes etc when hopefully many clocks can get gated off. Can that be estimated yet from these numbers?
  • jmgjmg Posts: 13,902
    rogloh wrote: »
    Be interesting to see what current draw we could end up with on the P2 at low frequency RC clock rates for sleep modes etc when hopefully many clocks can get gated off. Can that be estimated yet from these numbers?
    You can use Cpd to calculate the 1-COG, 20kHz number, which comes to around 36uA + static Icc (should be similar to the P2es static/leakage?). Or measure a P2es with the clock stopped, then at 20kHz, and scale the dynamic Icc by ~0.58.
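
    The measure-and-scale approach described above could be sketched as follows. The function name and the example readings are hypothetical placeholders, not measurements. The sketch assumes leakage stays the same between silicon revisions and that only the dynamic part scales by the ~0.58 suggested in the thread.

```python
DYNAMIC_SCALE = 0.58  # ratio suggested in the thread, not a measured value


def estimate_future_icc(static_ua, clocked_ua, scale=DYNAMIC_SCALE):
    """Estimate future-silicon Icc (uA) from two P2es measurements.

    static_ua:  Icc with the clock stopped (leakage only)
    clocked_ua: Icc at the slow clock of interest, e.g. 20 kHz
    Assumes leakage is unchanged and only the dynamic part scales.
    """
    if clocked_ua < static_ua:
        raise ValueError("clocked current should not be below static current")
    dynamic_ua = clocked_ua - static_ua
    return static_ua + scale * dynamic_ua


# Placeholder readings purely for illustration; substitute real measurements.
example = estimate_future_icc(static_ua=100.0, clocked_ua=200.0)
print(f"estimated Icc: {example:.1f} uA")
```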

  • jmg wrote: »
    Using a quick Cpd compare, I get that an 8-COG, all-pins P2+ needs to run at ~61MHz SysCLK to match the power drain of an 80MHz P1.

    So, only extending those basic numbers:

    A P2, running at ~61 MHz, is able to execute >30 M basic instructions per second, thus equivalent to >1.5 x P1 capabilities, using the same set of basic instructions.

    And a P2 has twice the pin count, taken as a means of interacting with surrounding logic.

    Calculating... 1.5 x 2 = 3; so a P2 can outrun three P1s without resorting to a single Smart Cell, LUT sharing, or Streamer-related feature.

    Even a lot more, because all P2 features can be coordinated internally, a strategy that costs a lot of time and resources when attempted with a 3 x P1 setup.
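
    The arithmetic behind those figures can be spelled out as a rough sketch. The clocks-per-instruction values (4 on the P1, 2 on the P2) and the pin counts (32 and 64) are assumptions stated here rather than in the thread, and all names are illustrative.

```python
P1_CLOCKS_PER_INSTRUCTION = 4  # assumption: typical P1 instruction timing
P2_CLOCKS_PER_INSTRUCTION = 2  # assumption: typical P2 instruction timing
P1_PINS = 32                   # assumption: P1 I/O pin count
P2_PINS = 64                   # assumption: P2 I/O pin count


def mips_per_cog(sysclk_mhz, clocks_per_instruction):
    """Basic instructions per microsecond for one cog at the given clock."""
    return sysclk_mhz / clocks_per_instruction


p1_mips = mips_per_cog(80, P1_CLOCKS_PER_INSTRUCTION)
p2_mips = mips_per_cog(61, P2_CLOCKS_PER_INSTRUCTION)
throughput_ratio = p2_mips / p1_mips
pin_ratio = P2_PINS / P1_PINS

print(f"P1 @ 80 MHz: {p1_mips:.1f} MIPS per cog")
print(f"P2 @ 61 MHz: {p2_mips:.1f} MIPS per cog")
print(f"throughput ratio = {throughput_ratio:.3f}, pin ratio = {pin_ratio:.0f}")
print(f"combined figure = {throughput_ratio * pin_ratio:.2f}")
```

    Multiplying the throughput ratio by the pin ratio follows the post's own reasoning. It is a loose figure of merit, not a benchmark.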

    Hi rogloh

    You beat me to it. I'm such a slow writer. :lol:
  • cgraceycgracey Posts: 11,697
    edited 2019-04-05 - 19:21:13
    I just went over the final approval docs and timing constraints with Wendy and I signed off on everything. So, the reticles are going to be made now and it will go into fab.

    Fifteen weeks from now will be July 20, when the prototypes should be arriving. A month after that we should have 1,000 chips and new P2 Eval boards.

    I've been working on the new Spin compiler, backing into the PUB/PRI section after updating all the ancillary functions. Hopefully, it will show signs of life soon. I'm happy to be working on tools, finally, like ersmith and RossH have been.
  • potatoheadpotatohead Posts: 9,785
    edited 2019-04-05 - 19:30:20
    Awesome!

    Fingers crossed this revision will be solid. Thank you Chip.

    And 1k eval boards? And/or chips? PARTY IN ROCKLIN
  • msrobotsmsrobots Posts: 2,909
    edited 2019-04-07 - 17:45:11
    @cgracey,

    When you are back working on Spin2, please look at the enhancements @ersmith has already made to his Spin2 version.

    Some things he did are very useful. The biggest ones are optional parameters when calling a function, multiple return values, @ @ @, a path for objects to include, and #define/#ifdef.

    I have not used the last ones, but they will make a lot of sense for writing code able to run on both the P1 and the P2.

    But fastspin compiles to native code, and even with 512K a bytecode interpreter will surely be needed. HDMI will use a lot of RAM.

    Anyway, welcome back from the world of HDLs to the world of us normal(?) programmers. Maybe you have not even realized how much fun it is to program the P2, having been stuck in its development for so long.

    This brainchild of yours is just incredible. I am currently (re)writing a full-duplex driver, and every time I start over the thing gets smaller and faster, even if I still struggle to understand/use most of the changed/new instructions.

    Thanks for bringing fun back into programming,

    Mike