List of Changes in Next P2 Silicon

Comments

  • jmg wrote: »
    Volume users need guaranteed specifications to work to.

    160 MHz.

    "There's no huge amount of massive material
    hidden in the rings that we can't see,
    the rings are almost pure ice."
  • jmg wrote: »
    That was always a risk of the new added logic.... :(
    Chip then went on to say how it had impacted the P2ES synthesis as well. Only by doing the comparison have they realised that spec'ing for 180 MHz was pushing too hard.

    I'm really keen to see how much this reduces heating. Which in turn makes for easy over-clocking without the need for active cooling. That would be a neat outcome.
  • jmg (Posts: 13,263)
    evanh wrote: »
    jmg wrote: »
    Volume users need guaranteed specifications to work to.

    160 MHz.

    Yes, that is one point on the TVP curve.
    Cluso99 was asking for another point (as other vendors often specify).
    In his case, it was:
    "When done, is it possible to ask what a lesser spec would give as fmax?
    Say 0C to +70C at 1.8V +- 2.5%"
  • You might get 5 MHz more that way - Which will be rounded back down to 160 MHz.
  • jmg (Posts: 13,263)
    evanh wrote: »
    You might get 5 MHz more that way - Which will be rounded back down to 160 MHz.

    That's one guess. (ie 3.125%)

    Another litmus test is to look at Atmel, who already do this on their parts...
    Their Commercial is 0~70 and 5% Vcc and Industrial is 0~85 and 10% Vcc
    They state that a 10ns 'I' grade is ~ a 7ns 'C' grade, so that's a 30% gain for a 15°C drop in TMax and half the Vcc spread.
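
As a rough check on the two estimates in this exchange, the percentages can be worked out directly. This is only a sketch: the function names are made up for illustration, and the 10 ns and 7 ns figures are taken as quoted and treated as delays, so the "30%" reads as a shorter delay while the equivalent speed gain comes out somewhat higher.

```python
def percent_of_base(base, extra):
    """Extra amount expressed as a percentage of the base value."""
    return 100.0 * extra / base


def delay_reduction_percent(slow_ns, fast_ns):
    """How much shorter the faster delay is, relative to the slower one."""
    return 100.0 * (slow_ns - fast_ns) / slow_ns


def speed_gain_percent(slow_ns, fast_ns):
    """Gain in speed (1/delay) when the delay drops from slow_ns to fast_ns."""
    return 100.0 * (slow_ns / fast_ns - 1.0)


if __name__ == "__main__":
    # 5 MHz on top of a 160 MHz spec, as guessed in the thread
    print(f"{percent_of_base(160, 5):.3f}%")          # 3.125%
    # 10 ns industrial grade vs 7 ns commercial grade, as quoted
    print(f"{delay_reduction_percent(10, 7):.1f}%")   # 30.0%
    print(f"{speed_gain_percent(10, 7):.1f}%")        # 42.9%
```

Whether a gain of that size would carry over to the P2 is not something this arithmetic can say; it only puts the two numbers in the thread on the same footing.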
  • Slap whatever you like on.
  • evanh wrote: »
    You might get 5 MHz more that way - Which will be rounded back down to 160 MHz.

    A specification is a specification, not a guess!
    My Prop boards: P8XBlade2, RamBlade, CpuBlade, TriBlade
    Prop OS (also see Sphinx, PropDos, PropCmd, Spinix)
    Website: www.clusos.com
    Prop Tools (Index) , Emulators (Index) , ZiCog (Z80)
  • It wasn't said as a guess. It is a statement that you get the same result.
  • Wendy reminded me yesterday that timing closure for the new P2 is set from -55C to +150C junction temperature. Our package Tja is ~20C/W. The anticipated power dissipation was ~2.25W. This would result in a ~45C (20×2.25) rise in junction temperature over ambient temperature, which affords us a -55C to +85C packaged temperature range with ~20C (150-45-85) allowance for local hot spots on the die.

    After the tapeout is complete, we will be able to generate a graph of temperature vs Fmax.

    Since power is going to be lower than originally planned for, we should be able to get a higher stated Fmax than before. Also, if the customer constrains ambient temperature to, say, 70C, that will allow for an even higher stated Fmax.

    We will have simulation data soon that will indicate what this curve will look like.
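
The temperature budget described in this post can be sketched as a small calculation. The figures are the ones quoted above (about 20 C/W, about 2.25 W, 150C junction limit); the function names are hypothetical, and the simple "rise = thermal resistance times power" model is an assumption that ignores airflow and board effects.

```python
def junction_rise_c(theta_ja_c_per_w, power_w):
    """Temperature rise of the junction above ambient, in degrees C."""
    return theta_ja_c_per_w * power_w


def hot_spot_allowance_c(tj_max_c, ambient_max_c, theta_ja_c_per_w, power_w):
    """Margin left below the junction limit at the hottest allowed ambient."""
    return tj_max_c - ambient_max_c - junction_rise_c(theta_ja_c_per_w, power_w)


if __name__ == "__main__":
    theta_ja = 20.0   # C/W, package figure quoted in the post
    power = 2.25      # W, anticipated dissipation quoted in the post
    tj_max = 150.0    # C, upper junction temperature used for timing closure

    print(junction_rise_c(theta_ja, power))                     # 45.0
    print(hot_spot_allowance_c(tj_max, 85.0, theta_ja, power))  # 20.0
    print(hot_spot_allowance_c(tj_max, 70.0, theta_ja, power))  # 35.0
```

Lowering either the dissipated power or the maximum ambient widens the margin, which is the headroom the post suggests could be traded for a higher stated Fmax.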
  • Sounds great Chip. Looks like that'll answer all the questions on power.

    Apologies to Cluso,
    There wasn't any good reason for me to jump on you for this question. It wasn't my problem to solve. It's just that the question seemed to be asking for extra attempts at minuscule variations of parameters and that struck a nerve in me.

    Hopefully I've had enough trolling for the moment. :)
  • Excellent news Chip :smiley:

    It will be great to see what the fmax will be with lower spec requirements like say 70C and tighter 1V8.

    And then for us to run the silicon and see what we can push it to before it breaks :wink:

    No worries Evan. It's all good.
  • Wendy just told me that she was able to wrap up timing across all corners at 175MHz, after all.

    She is going to send me a final report that has more data in it. I'm curious to know what this did to the instance count.
  • jmg (Posts: 13,263)
    cgracey wrote: »
    Wendy just told me that she was able to wrap up timing across all corners at 175MHz, after all.

    She is going to send me a final report that has more data in it. I'm curious to know what this did to the instance count.

    Sounds great!!
    It always seems best 'to approach routing problems from below'.
    If the instance count improves speed, it does not matter so much, until it starts to not fit into the die.
  • But does it really improve speed in any significant way? We're only talking a few MHz in the parameters. I'd rather get the speed without needing active cooling.
  • jmg (Posts: 13,263)
    evanh wrote: »
    But does it really improve speed in any significant way? We're only talking a few MHz in the parameters. I'd rather get the speed without needing active cooling.

    Keep in mind that better drive improves the slews of the driven lines, and that reduces the Icc effect from the transition current peak.
    So whilst you have more devices, you also have better slew, which means you would need to dig deep into the spice results to decide which of those effects dominates.
    The real test will be to compare the mA/MHz Cpd figures for P2es and P2+
    P2+ should be well ahead of P2es, in all cases except where all 8 COGS are fully operational.
    Even with 8 COGS running, P2+ may have clock gate savings on the smart pin cells too, depends how far OnSemi went.
  • The real test is how many extra gates are used to get so little extra MHz.
  • I sadly have very little time for my P2-es, but compared to the P1 this thing runs fast. I have no fan installed yet so I stay at 180 or so and it does not even get warm running all COGs. But still it is at least 4 times faster than the P1 and those smart pins save a lot of code.

    I can't wait to get my hands on more than one...

    Mike
    I am just another Code Monkey.
    A determined coder can write COBOL programs in any language. -- Author unknown.
    Press any key to continue, any other key to quit

    The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this post are to be interpreted as described in RFC 2119.
  • jmg (Posts: 13,263)
    evanh wrote: »
    The real test is how many extra gates are used to get so little extra MHz.

    Not really, you or I or other users do care about mA/MHz specs, not about how many gates are inside the package.
  • Lol, we're saying the same thing JMG.
  • I did a little test with some cogs driving constant cordic rotate ops, so each cog does a sine and cosine
    every 8 clocks, ie a 40Mflops equivalent at 160MHz, counting sine and cosine separately :)

    At 160MHz 1 cog took 0.26A, 7 cogs took 0.38A (at 5V), so 20mA per cog for 40Mflops, or put another
    way 2Gflops/amp (Ignoring the constant 0.24A drain).

    [ I'm treating flop to mean "fixpoint operations" of course ]

    What this is at the 1.8V rail I don't know, I guess about 700Mflops/A, perhaps 400Mflop/W is a better
    way to state it.

    In practice it's not possible to use both sine/cosine results driven every 8 clocks, but it's an interesting
    performance figure as it relates to actual computation rather than just clock frequency. A more
    plausible practical figure is 200Mflop/W. [My unrolled loop code for FFT calculation roughly agrees
    with this.]

    The comparable value for integer ops is presumably about twice this as 4 instructions every 8 clocks is
    typical, 400MIPS/W

    So assuming the high constant power drain is fixed in the next silicon, it feels pretty competitive
    https://en.wikipedia.org/wiki/Performance_per_watt#Examples

    (and assuming fixed point is adequate for your application!)
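
The arithmetic in this post can be reproduced with a short sketch. The measured currents and clock rate are the ones quoted above; the function name and dictionary keys are made up for illustration, and the 1.8V figure assumes lossless conversion from the 5V supply, which a real regulator will not achieve.

```python
def cordic_efficiency(clock_hz=160e6, clocks_per_op=8, results_per_op=2,
                      amps_one_cog=0.26, amps_seven_cogs=0.38,
                      supply_v=5.0, core_v=1.8):
    """Per-cog throughput and incremental efficiency from two current readings."""
    ops_per_cog = clock_hz / clocks_per_op * results_per_op   # sine + cosine counted separately
    amps_per_cog = (amps_seven_cogs - amps_one_cog) / 6       # six extra cogs between readings
    constant_amps = amps_one_cog - amps_per_cog               # drain not attributed to any cog
    core_amps_per_cog = amps_per_cog * supply_v / core_v      # assumes lossless 5V -> 1.8V
    return {
        "ops_per_cog": ops_per_cog,
        "amps_per_cog": amps_per_cog,
        "constant_amps": constant_amps,
        "ops_per_amp_at_supply": ops_per_cog / amps_per_cog,
        "ops_per_watt": ops_per_cog / (amps_per_cog * supply_v),
        "ops_per_amp_at_core": ops_per_cog / core_amps_per_cog,
    }


if __name__ == "__main__":
    for name, value in cordic_efficiency().items():
        print(f"{name}: {value:.3g}")
```

With the quoted readings this gives roughly 40M operations per second per cog, 20mA per cog, 2G operations per amp at 5V, 400M per watt, and about 720M per amp at 1.8V, in line with the rounded figures in the post. These are incremental figures that leave out the constant 0.24A.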
  • jmg (Posts: 13,263)
    Mark_T wrote: »
    ..
    At 160MHz 1 cog took 0.26A, 7 cogs took 0.38A (at 5V), so 20mA per cog for 40Mflops, or put another
    way 2Gflops/amp (Ignoring the constant 0.24A drain).
    ..
    So assuming the high constant power drain is fixed in the next silicon, it feels pretty competitive

    You can't quite 'Ignore the constant 0.24A drain', as that is not a static load, but comes from the clock tree.
    What you can expect is that 1 COG is closer to 1/7 that of 7 COGs, and 7 COGs will be lower than 8 COGs, but the peak mA/MHz for everything running will likely not change much.

  • Yes, I guess I knew that, so perhaps half the performance I'd like to imagine once this is factored in, but still.
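
To see what factoring the 0.24A back in does, here is a minimal sketch using the figures quoted from Mark_T's measurement (40M operations per second and about 20mA per cog, at 5V). The function name is hypothetical, and the "fixed drain plus a per-cog increment" model is an assumption; as noted above, the 0.24A comes from the clock tree and may not stay fixed on the next silicon.

```python
def all_in_ops_per_watt(active_cogs, ops_per_cog=40e6, constant_amps=0.24,
                        amps_per_cog=0.02, supply_v=5.0):
    """Operations per watt when the constant drain is charged to the running cogs."""
    total_amps = constant_amps + active_cogs * amps_per_cog
    return active_cogs * ops_per_cog / (total_amps * supply_v)


if __name__ == "__main__":
    for cogs in (1, 7, 8):
        print(f"{cogs} cog(s): {all_in_ops_per_watt(cogs) / 1e6:.0f} Mops/W")
```

Under this model the all-in figure rises with the number of active cogs but stays well below the 400M per watt incremental figure, since the constant drain dominates the total current.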
  • I read ignoring as putting it aside. Ie: Leaving it out for purpose of demonstrating a linearity.