List of Changes in Next P2 Silicon

Comments

  • Cluso99 Posts: 15,233
    edited 2019-03-30 - 01:54:14
    WOW, that is huge for the clock tree :(

    290mW @ 1V8 = 161mA @ 200 MHz just for the clock tree ???

    That doesn't match the 40mA tho' I presume this is done at ~25MHz so maybe half that is the clock tree ???
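The 161mA figure above follows from I = P / V. A minimal sketch of that arithmetic, assuming the 1.8V core supply quoted in the post; the function name is only illustrative:

```python
def current_ma(power_mw: float, volts: float) -> float:
    """Return supply current in mA for a given power in mW (I = P / V)."""
    return power_mw / volts


# 290 mW of clock-tree power on a 1.8 V core rail.
clock_tree_ma = current_ma(290.0, 1.8)
print(f"clock tree: {clock_tree_ma:.0f} mA")
```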
    My Prop boards: P8XBlade2, RamBlade, CpuBlade, TriBlade
    Prop OS (also see Sphinx, PropDos, PropCmd, Spinix)
    Website: www.clusos.com
    Prop Tools (Index) , Emulators (Index) , ZiCog (Z80)
  • Cluso99 Posts: 15,233
    edited 2019-03-30 - 01:58:42
    Don't know if this can be done...

    Can the clock tree be divided into 8 sections first, one for each cog? Then the relevant trees will be inactive when the cog(s) are inactive. This also applies to the hub access per cog.

    This may cut the clock tree power to almost 1/8th when only 1 cog is running.
    If so, at 20MHz it may shave 10mA, which translates to maybe 100mA at 200MHz.
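The scaling in this estimate can be sketched as follows. It assumes dynamic current grows linearly with clock frequency (leakage ignored) and that an idealized per-cog split leaves no shared part of the clock tree; both are simplifying assumptions, and the function names are hypothetical:

```python
def scale_with_clock(current_ma: float, f_from_mhz: float, f_to_mhz: float) -> float:
    """Scale a dynamic current linearly from one clock frequency to another."""
    return current_ma * f_to_mhz / f_from_mhz


def gated_tree_fraction(active_cogs: int, total_cogs: int = 8) -> float:
    """Fraction of an ideally per-cog-gated clock tree that is still toggling."""
    return active_cogs / total_cogs


# 10 mA saved at 20 MHz, scaled up to 200 MHz.
print(scale_with_clock(10.0, 20.0, 200.0), "mA saved at 200 MHz")
# One cog running out of eight.
print(gated_tree_fraction(1), "of the clock tree active")
```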
  • Cluso99 wrote: »
    Don't know if this can be done...

    Can the clock tree be divided into 8 sections first, one for each cog? Then the relevant trees will be inactive when the cog(s) are inactive. This also applies to the hub access per cog.

    This may cut the clock tree power to almost 1/8th when only 1 cog is running.
    If so, at 20MHz it may shave 10mA, which translates to maybe 100mA at 200MHz.

    This clock gating was slipped into the design. It was not done with prior intent. We let the tools do it as they saw fit. Tapout is going on right now, so it's too late to do anything else.
  • Well we have a nice improvement anyways. And I was happy with the first silicon so this is a bonus :smiley:
  • jmg Posts: 13,784
    Cluso99 wrote: »
    Don't know if this can be done...

    Can the clock tree be divided into 8 sections first, one for each cog? Then the relevant trees will be inactive when the cog(s) are inactive. This also applies to the hub access per cog.

    This may cut the clock tree power to almost 1/8th when only 1 cog is running.
    If so, at 20MHz it may shave 10mA, which translates to maybe 100mA at 200MHz.

    That would be ideal, but based on previous Icc and Chip's estimates here, a single COG will be roughly 1/3.6 of the old Icc - so yes, short of the ideal 1/8, but still better.
    Hopefully all 8 COGs running, are not much different in Peak ICC ?
  • Hopefully a flat out P2 doesn't consume the same as before plus the extra 161mA. I would expect a flat out P2 will still have sections not clocking when idle to make up for this.
  • Cluso99 wrote: »
    Hopefully a flat out P2 doesn't consume the same as before plus the extra 161mA. I would expect a flat out P2 will still have sections not clocking when idle to make up for this.

    It will certainly take LESS power than the current silicon.
  • Cluso99 Posts: 15,233
    edited 2019-03-30 - 08:31:49
    I’m still hoping we can overclock to 300-360MHz.
    400MHz would be incredible ;)

    I’m sure they never expected 180nm would achieve these speeds when they first got to 180nm. Cannot believe they were only at 1.2um 30 years ago. And now they are discussing standards for 2nm!
  • LOL. It's funny that 180nm is almost a 20 year old process now! Size wise I guess it's closer to 1.2um than it is to current tech. Not dissin' the P2 tho.
  • Roy Eltham Posts: 2,642
    edited 2019-03-30 - 09:48:14
    Pentium 4's from 2000-2002 were 180nm and ran up to 2GHz internally (they had a 400MHz external bus clock).
    Of course, those things required a lot of power, and really good active cooling.
  • Cluso99 Posts: 15,233
    edited 2019-03-30 - 11:17:21
    @Roy
    Didn’t realise that. Wonder what the process differences were that achieved 2GHz internally and then why can’t we get that on P2? Imagine even a 1GHz P2....

    Postedit & OT
    A little research. Pentium III was the first x86 micro to use 180nm and ran at 500MHz. Later the Pentium 4 eventually reached 2GHz before going to 130nm. These internal speeds were reached using pipelines up to 20 stages.
    So, 1GHz or more isn’t going to happen for our P2 :(
  • Probably not process differences so much as tight pipelining and highly optimized signal paths.
    Moving a signal across a chip costs time as the FET resistances and interconnect resistances are high
    and even the femtofarad levels of stray capacitance at these size scales are an issue. You can select
    the gate geometry in each FET to tune its input capacitance and output resistance, and add inverters
    every so often in long signal paths to keep rise and fall times tight. You can also add more layers of
    interconnect (extra process steps) to improve signal routing and reduce signal resistances.

    And then there's the fact 180nm isn't the only parameter - strained silicon, exotic dielectrics,
    and exotic metallization mean there's a lot more than one "180nm process".

    Lots of tricks are played in complex processors to increase the level of pipelining, like using redundant
    arithmetic techniques to shorten/remove carry chains. Anything to reduce the critical path delay,
    even if it means doubling the transistor count...

    The resources thrown into the pentium 4 design process must be many orders of magnitude higher
    than Parallax have to hand!
  • cgracey Posts: 11,511
    edited 2019-03-30 - 12:38:06
    Yeah, they minimize logic levels between flops to just several. The P2 has maybe up to 30 levels of gates between flops. That runs slower, but doesn't need deep pipelines (more flops, more MHz, ball of fire). Also, we are bumping into the speed limits of the big hub RAMs. As long as they are part of the design and we don't multi-cycle them, there's no point in designing the logic to clock any faster.
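The trade-off between logic depth and clock rate can be sketched with a simple timing model: the clock period must cover the longest flop-to-flop path. The per-gate delay and flop overhead below are made-up round numbers chosen only to make the arithmetic easy; they are not measured values for the P2 or for any real 180nm process:

```python
def max_clock_mhz(logic_levels: int, gate_delay_ns: float, flop_overhead_ns: float) -> float:
    """Estimate the maximum clock from the longest flop-to-flop path."""
    period_ns = logic_levels * gate_delay_ns + flop_overhead_ns
    return 1000.0 / period_ns


GATE_DELAY_NS = 0.2     # hypothetical average delay per logic level
FLOP_OVERHEAD_NS = 0.4  # hypothetical clock-to-Q plus setup time

# Deep logic between flops versus a heavily pipelined design.
for levels in (30, 6):
    mhz = max_clock_mhz(levels, GATE_DELAY_NS, FLOP_OVERHEAD_NS)
    print(f"{levels} levels between flops -> about {mhz:.0f} MHz")
```

In this model, fewer levels per stage raise the clock rate, but each added pipeline stage costs more flops and more clock-tree load, which is the power penalty described above.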
  • ..
    Roy Eltham wrote: »
    Pentium 4's from 2000-2002 were 180nm and ran up to 2GHz internally (they had a 400MHz external bus clock).
    Of course, those things required a lot of power, and really good active cooling.

    I did not know that! I thought we were pushing this process way hard. We might just be in the pocket, sweet.

    Do not taunt Happy Fun Ball! @opengeekorg ---> Be Excellent To One Another SKYPE = acuity_doug
    Parallax colors simplified: https://forums.parallax.com/discussion/123709/commented-graphics-demo-spin
  • jmg Posts: 13,784
    cgracey wrote: »
    Yeah, they minimize logic levels between flops to just several. The P2 has maybe up to 30 levels of gates between flops. That runs slower, but doesn't need deep pipelines (more flops, more MHz, ball of fire). Also, we are bumping into the speed limits of the big hub RAMs. As long as they are part of the design and we don't multi-cycle them, there's no point in designing the logic to clock any faster.

    How much skew is there between large RAM and the core speed ?
    ie how much gain could a COG only system deliver ?
    There could be ways to nudge the core speed, but match slower HUB in a P3 ? (or P2++)
  • jmg wrote: »
    There could be ways to nudge the core speed, but match slower HUB in a P3 ? (or P2++)

    Even the P1 already has something like this, the hub runs at half the speed of the cogs.
    Whether speeding up the cogs without speeding up the hub really gives a tangible performance increase depends on the application, though (e.g. anything using hubexec and/or lots of non-FIFO memory access wouldn't feel a difference).
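One way to put numbers on this is an Amdahl's-law style estimate: only the part of the run time that is not waiting on the hub benefits from a faster cog clock. This is a rough sketch, not a model of the actual P2 hub timing; the hub-bound fraction is something you would have to measure for your own application:

```python
def overall_speedup(hub_bound_fraction: float, cog_speedup: float) -> float:
    """Amdahl's law: speedup when only the non-hub-bound part runs faster."""
    if not 0.0 <= hub_bound_fraction <= 1.0:
        raise ValueError("hub_bound_fraction must be between 0 and 1")
    return 1.0 / (hub_bound_fraction + (1.0 - hub_bound_fraction) / cog_speedup)


# Hypothetical workloads with the cogs clocked twice as fast as the hub.
for fraction in (0.0, 0.5, 1.0):
    print(f"{fraction:.0%} hub-bound -> {overall_speedup(fraction, 2.0):.2f}x")
```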
  • cgracey wrote: »
    Tapout is going on right now ...
    So will there be a second chance to get a prototype board this summer, or will there be just a few samples for your internal testing?

  • Ramon wrote: »
    cgracey wrote: »
    Tapout is going on right now ...
    So will there be a second chance to get a prototype board this summer, or will there be just a few samples for your internal testing?

    You're asking about the Propeller 2 Evaluation Board, Revision B. Yes, we're planning on 200+ units being made the minute the chips arrive at Parallax. It's already been designed and we've got the components on order.

    Ken Gracey
  • localroger Posts: 3,241
    edited 2019-03-31 - 03:01:31
    cgracey wrote: »
    This clock gating was slipped into the design. It was not done with prior intent. We let the tools do it as they saw fit. Tapout is going on right now, so it's too late to do anything else.

    As long as the P2 doesn't start expressing an interest in some guy named John Connor we should be golden.
  • As long as the P2 doesn't start expressing an interest in some guy named John Connor we should be golden.

    and on (insert date here) the Propeller 2 became self-aware.....

    Melbourne, Australia
  • The P2 won't kill me. But I might die trying.

    Ken and Chip,

    Thanks


  • rjo__ wrote: »
    The P2 won't kill me. But I might die trying.

    Ken and Chip,

    Thanks

    Thank you and you're welcome. Chip used his brain and I used my mouth and brute force business skills. I don't know who's more ready to see this launched.

    We will see you in Ottawa at a future Propeller Expo, I'm sure.

    Ken Gracey
  • jmg Posts: 13,784
    Ken Gracey wrote: »
    Ramon wrote: »
    cgracey wrote: »
    Tapout is going on right now ...
    So will there be a second chance to get a prototype board this summer, or will there be just a few samples for your internal testing?

    You're asking about the Propeller 2 Evaluation Board, Revision B. Yes, we're planning on 200+ units being made the minute the chips arrive at Parallax. It's already been designed and we've got the components on order.

    Ken Gracey

    Is there a final change list for Rev B PCB ?
    Did it gain more footprint choices for oscillators ? I think SD moved to reduce tracks, which would give more room for OSC choices ?
    Did the SMPS blocks shrink to something smaller than P2 itself ?
  • Ken and Chip,
    I’ve gone grey waiting for the P2.
    Now I’ve gone white playing with it ;)

    You never can please everyone, but this P2 is surely a crowd pleaser! Cannot believe all the features in this. I’m learning new tricks every day. And they are not just tricks, but real performance enhancers too.

    Just hope there are some real volume apps for this. You guys really deserve it!!!

    And I have to say it’s been a real eye opener to see what is involved in the design of a chip. Thanks immensely for allowing us to be involved and sharing all those details. No other company has ever shared this process with its users, and been so open about it.
  • jmg wrote: »

    Is there a final change list for Rev B PCB ?
    Did it gain more footprint choices for oscillators ? I think SD moved to reduce tracks, which would give more room for OSC choices ?
    Did the SMPS blocks shrink to something smaller than P2 itself ?

    sd moved - yes
    osc choices - no
    smps - yes

    schematic and imagery will be shared, but due to holidays maybe a couple weeks from now.

    osc choice footprint request noted. *if* we could drop in 1 extra footprint (or hybrid footprint), what would be preferred?
  • evanh Posts: 7,522
    edited 2019-03-31 - 08:27:51
    I assume JMG is talking about using different frequency crystals. This will be coming from the instability of using XDIV values greater than two.

    However, since this is meant to be fixed in the prop2 respin, I doubt anyone will be going to the effort of changing out the crystal when it's so easy to program the desired frequency without negative side effects.
    "... peers into the actual workings of a quantum jump for the first time. The results
    reveal a surprising finding that contradicts Danish physicist Niels Bohr's established view
    —the jumps are neither abrupt nor as random as previously thought."
  • jmg Posts: 13,784
    VonSzarvas wrote: »
    osc choice footprint request noted. *if* we could drop in 1 extra footprint (or hybrid footprint), what would be preferred?

    A picture is worth 100 words, so attached is a flexible layout I did for a Si5351, that supports
    * Std Xtal footprint (right hand side)
    * Clipped Sine Oscillator (left hand, 3 sizes, manually solderable) that uses 2 xtal pads as capacitor coupling (select XTAL 0pF mode)
    * CMOS Oscillator that uses resistor or capacitor coupling
    * Choice of Clipped Sine/Osc sizes, from 2.5 x 2.0mm up to 5.0 x 3.2mm - the 5.0 x 3.2mm is there to support Murata 200ppb XTCHH VCTCXO oscillators, now stocked at Digi-Key

    Pin 1 of OSC here connects to a DAC, with optional pullup, to allow VCTCXO control - for P2, maybe a jumper to a pin makes sense ?
    An external 1pps can then calibrate the oscillator, and GPS also reports the (Satellite-TCXO) error in an ASCII string, so GPS(Serial+1pps)+SW in P2 should give low ppb timing standards.
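The 1pps calibration idea can be sketched as follows: count oscillator ticks between successive 1pps edges and compare the count with the nominal frequency. This is only an illustration of the arithmetic. It treats the 1pps edge as exact (real GPS receivers have some jitter, which averaging over more seconds reduces), and the tick counts below are made up:

```python
def oscillator_error_ppb(counted_ticks: int, nominal_hz: float, gate_seconds: float = 1.0) -> float:
    """Frequency error in parts per billion from a tick count over a known gate time."""
    expected_ticks = nominal_hz * gate_seconds
    return (counted_ticks - expected_ticks) / expected_ticks * 1e9


def average_error_ppb(counts_per_pps: list[int], nominal_hz: float) -> float:
    """Treat N consecutive one-second counts as a single N-second gate."""
    return oscillator_error_ppb(sum(counts_per_pps), nominal_hz, gate_seconds=len(counts_per_pps))


# Hypothetical counts from a nominal 26 MHz oscillator over three 1pps intervals.
counts = [26_000_005, 26_000_004, 26_000_006]
print(f"{average_error_ppb(counts, 26e6):.1f} ppb fast")
```

A longer gate time improves resolution: one tick over one second at 26MHz is about 38ppb, so reaching low-ppb figures needs averaging over many seconds.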

    The centre footprint is an optional AC coupling to a local MCU, not really needed on P2

    I have tested Clipped sine Oscillators of 26MHz and 38.4MHz with P2, (without removing existing Xtal, so the CAP of that will degrade things a little).
    26MHz is fine, and 38.4MHz also works, but has measurably lower amplifier output amplitude, as the Xtal amplifier is rolling off more at 38.4MHz
    Source impedance of these is a little under 200 ohms, and they need AC coupling to a XTAL stage. (so are quite simple to use)
    High volumes (GPS) make the clipped sine units quite low cost, and high precision, as well as low power and low RFI.


    [Attached image: oscillator footprint layout, 725 x 259]
  • Ken Gracey wrote: »
    You're asking about the Propeller 2 Evaluation Board, Revision B. Yes, we're planning on 200+ units being made the minute the chips arrive at Parallax. It's already been designed and we've got the components on order.

    Yes. That's great !! Thanks.

  • Ken Gracey wrote: »
    rjo__ wrote: »
    The P2 won't kill me. But I might die trying.

    Ken and Chip,

    Thanks

    Thank you and you're welcome. Chip used his brain and I used my mouth and brute force business skills. I don't know who's more ready to see this launched.

    We will see you in Ottawa at a future Propeller Expo, I'm sure.

    Ken Gracey

    Is the date for the Ottawa Propeller Expo set?
    In science there is no authority. There is only experiment.
    Life is unpredictable. Eat dessert first.
  • cgracey Posts: 11,511
    edited 2019-04-02 - 17:21:43
    Wendy sent me the power reports for the download test (1 cog at 20MHz) and the power torture test, where all 8 cogs are executing from hub and all 64 smart pins are running PWM at a clock frequency of 182MHz:

    Here is the download power test. This is how much power will be dissipated when the chip is waiting for a download or actively downloading, where one cog is active. The current silicon dissipates 136mW, while the new silicon will dissipate 75mW:
    ****************************************
    Report : Averaged Power
    Design : CHIP
    Version: M-2016.12-SP1
    Date   : Fri Mar 29 12:07:38 2019
    ****************************************
    
                            Internal  Switching  Leakage    Total
    Power Group             Power     Power      Power      Power   (     %)  Attrs
    --------------------------------------------------------------------------------
    clock_network              0.0292    0.0154 3.322e-06    0.0446 (59.10%)  i
    register                3.302e-03 1.622e-04 2.591e-05 3.490e-03 ( 4.63%)  
    combinational           7.393e-03    0.0132 7.511e-05    0.0207 (27.44%)  
    sequential              3.650e-04 3.531e-06 1.264e-06 3.698e-04 ( 0.49%)  
    memory                  1.819e-03 1.972e-03 1.950e-04 3.986e-03 ( 5.28%)  
    io_pad                     0.0000 2.225e-03 8.824e-05 2.313e-03 ( 3.07%)  
    black_box                  0.0000    0.0000 9.160e-14 9.160e-14 ( 0.00%)  
    
      Net Switching Power  =    0.0330   (43.72%)
      Cell Internal Power  =    0.0421   (55.77%)
      Cell Leakage Power   = 3.889e-04   ( 0.52%)
                             ---------
    Total Power            =    0.0754  (100.00%)
    

    And here's the power torture test. The old silicon dissipated 1.2W, while the new silicon will dissipate 790mW:
    ****************************************
    Report : Averaged Power
    Design : CHIP
    Version: M-2016.12-SP1
    Date   : Thu Mar 28 12:17:02 2019
    ****************************************
    
                            Internal  Switching  Leakage    Total
    Power Group             Power     Power      Power      Power   (     %)  Attrs
    --------------------------------------------------------------------------------
    clock_network              0.3084    0.1488 3.312e-06    0.4572 (57.93%)  i
    register                8.300e-03 3.107e-03 2.573e-05    0.0114 ( 1.45%)  
    combinational              0.0361    0.0769 7.545e-05    0.1131 (14.33%)  
    sequential              4.566e-04 2.212e-05 1.270e-06 4.800e-04 ( 0.06%)  
    memory                     0.1056 1.972e-03 1.950e-04    0.1078 (13.65%)  
    io_pad                     0.0000    0.0992 8.938e-05    0.0993 (12.58%)  
    black_box                  0.0000    0.0000 9.151e-14 9.151e-14 ( 0.00%)  
    
      Net Switching Power  =    0.3301   (41.82%)
      Cell Internal Power  =    0.4589   (58.13%)
      Cell Leakage Power   = 3.902e-04   ( 0.05%)
                             ---------
    Total Power            =    0.7894  (100.00%)
    

    Note that these power figures are for CORE logic and memory, and do not include the 3.3V I/O pads.
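As a quick cross-check of the two reports, the sketch below computes the saving of the new silicon relative to the old, the equivalent core current, and the clock network's share of the torture-test total. The power values are copied from the post above; the 1.8V core supply is the figure quoted earlier in the thread, and the variable names are only illustrative:

```python
CORE_VOLTS = 1.8  # core supply, as quoted earlier in the thread

# Total core power in watts: (old silicon, new silicon).
TESTS = {
    "download (1 cog, 20MHz)": (0.136, 0.0754),
    "torture (8 cogs, 182MHz)": (1.2, 0.7894),
}

# Per-group total power (W) from the torture-test report; black_box is negligible.
TORTURE_GROUPS = {
    "clock_network": 0.4572,
    "register": 0.0114,
    "combinational": 0.1131,
    "sequential": 4.800e-04,
    "memory": 0.1078,
    "io_pad": 0.0993,
}


def reduction_percent(old_w: float, new_w: float) -> float:
    """Percentage by which the new silicon's power is lower than the old."""
    return (old_w - new_w) / old_w * 100.0


for name, (old_w, new_w) in TESTS.items():
    print(f"{name}: {reduction_percent(old_w, new_w):.1f}% less power, "
          f"about {new_w / CORE_VOLTS * 1000:.0f} mA at {CORE_VOLTS} V")

group_sum = sum(TORTURE_GROUPS.values())
clock_share = TORTURE_GROUPS["clock_network"] / group_sum * 100.0
print(f"torture groups sum to {group_sum:.4f} W; clock network is {clock_share:.1f}% of that")
```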