New P2 Silicon Observations
cgracey
Posts: 14,206
I've got the P2 running on my bench at home now and I'm going through it. Haven't figured out, yet, what's up with the ROM code and pull-up sensing, but I'm trying other stuff out.
Using a 20MHz crystal and the PLL, I'm running a power torture test at 180MHz:
a) All 8 cogs are running in hubexec.
b) Each cog executes a 2-instruction loop which causes the FIFO to reload continuously.
c) Each cog issues a CORDIC command on each loop.
d) All 64 smart pins are doing PWM with a 16-clock frame, so the pins toggle every 8 clocks, on average.
e) VDD = 1.80V, VIO = 3.30V, F=180MHz
Results:
VDD current = 920mA
VIO current = 38mA
VDD power = 1.80V * 920mA = 1.66W
VIO power = 3.30V * 38mA = 0.125W
Total power = 1.79W
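The arithmetic above can be reproduced with a short Python sketch. The helper name `rail_power` is only illustrative; the voltages and currents are the measured values listed in the results.

```python
def rail_power(volts, milliamps):
    """Return power in watts for a rail at `volts` drawing `milliamps`."""
    return volts * milliamps / 1000.0

vdd_w = rail_power(1.80, 920)   # core rail
vio_w = rail_power(3.30, 38)    # I/O rail
total_w = vdd_w + vio_w

print(f"VDD {vdd_w:.3f} W, VIO {vio_w:.3f} W, total {total_w:.3f} W")
```

Unrounded, the two rails sum to roughly 1.78W; the 1.79W figure comes from adding the already-rounded 1.66W and 0.125W.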
The chip is a little too hot to keep your finger on, but I think with a solid ground plane in a 4-layer PCB, it could be brought way down. As it is, the back of the PCB is only slightly less hot than the top of the chip, so the Amkor package and the via plane on Peter's 2-layer PCB are doing their job well.
We will design a really simple 4-layer board with internal ground/heat plane and high-current power regulators. This will be needed to discover the chip's real speed limits. My current setup uses a dual-output bench supply and some wires with grabbers. At this VDD current level, I'm dropping ~200mV just over the wires going to the board, so I need to output 2.00V to get 1.80V onto the board. Peter's board has regulators, potentially, but I'm just using it as a carrier board for the chip. This way, I can measure currents easily.
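As a rough check on the wire drop, here is a minimal sketch that treats the supply leads as a plain series resistance. That model, the function names, and the 1.2A example load are assumptions for illustration; only the ~200mV drop, the 920mA current, and the 1.80V target come from the measurements above.

```python
def lead_resistance(drop_volts, amps):
    """Series resistance implied by a measured voltage drop at a given current."""
    return drop_volts / amps

def supply_setpoint(target_volts, amps, r_leads):
    """Bench-supply voltage needed to land target_volts at the board."""
    return target_volts + amps * r_leads

r_leads = lead_resistance(0.200, 0.920)                 # about 0.22 ohm
setpoint_now = supply_setpoint(1.80, 0.920, r_leads)    # the 2.00 V quoted above
setpoint_hyp = supply_setpoint(1.80, 1.200, r_leads)    # hypothetical 1.2 A load

print(f"leads ~{r_leads:.3f} ohm, setpoint {setpoint_now:.2f} V, "
      f"at 1.2 A about {setpoint_hyp:.2f} V")
```

Under that assumption the required setpoint moves with load current, which is one reason on-board regulation close to the chip is attractive for speed testing.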
Here is the power torture test code:
dat             orgh    0
'
' Launch all cogs with test program.
'
                org
                hubset  ##%1_000000_0000001000_1111_10_00       'enable crystal+PLL, stay in 20MHz+ mode
                waitx   ##20_000_000/100                        'wait ~10ms for crystal+PLL to stabilize
                hubset  ##%1_000000_0000001000_1111_10_11       'now switch to PLL running at 180MHz
.loop           coginit cognum,#@pgm                            'last iteration relaunches cog 0
                djnf    cognum,#.loop
cognum          long    7
'
' Set 8 pins (per cog) to PWM mode, jump to hubexec loop
'
                org
pgm             cogid   x                                       'which cog am I, 0..7?
                shl     x,#3
                rep     @.p,#8                                  'start pwm pins
                wrpin   #%01_01001_0,x
                wxpin   pat,x
                wypin   #1,x
                dirh    x
                add     x,#1
.p              jmp     #hubloop
pat             long    $0010_0001
x               res     1
'
' Hubexec loop
'
                orgh    $400
hubloop         qrotate x,x
                jmp     #hubloop
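To see how the second HUBSET value yields 180MHz from a 20MHz crystal, here is a hedged Python sketch that splits the constant into fields. The field layout (E, D, M, P, CC, SS from high bits to low) and the divider rules are assumptions taken from the documented production-P2 clock setup, not something stated in this post; the function name is illustrative.

```python
def decode_hubset(value, xi_hz):
    """Split a HUBSET clock word into fields (assumed layout) and compute the PLL output."""
    ss = value & 0b11              # assumed: clock source select, %11 = PLL
    cc = (value >> 2) & 0b11       # assumed: crystal/oscillator configuration
    pppp = (value >> 4) & 0xF      # assumed: post-divider, %1111 = VCO/1
    m = (value >> 8) & 0x3FF       # assumed: VCO multiplier minus 1
    d = (value >> 18) & 0x3F       # assumed: crystal input divider minus 1
    e = (value >> 24) & 1          # assumed: PLL enable
    vco_hz = xi_hz * (m + 1) // (d + 1)
    post_div = 1 if pppp == 15 else 2 * (pppp + 1)
    return {"e": e, "d": d, "m": m, "pppp": pppp, "cc": cc, "ss": ss,
            "pll_hz": vco_hz // post_div}

clock = decode_hubset(0b1_000000_0000001000_1111_10_11, 20_000_000)
print(clock)
```

With these assumptions the word decodes to a crystal divide of 1 and a multiply of 9, so 20MHz * 9 = 180MHz, which agrees with the comment in the listing.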
Comments
When watching live space missions I think this sounds like what they refer to as "nominal", meaning as-expected
Are you thinking linear regs? There were linear regs on those old smartpack FPGA boards you used to make
VDD current = 1200mA
VIO current = 48mA
VDD power = 1.80V * 1200mA = 2.16W
VIO power = 3.30V * 48mA = 0.158W
Total power = 2.32W
This is roughly proportional to the 180MHz power. At 240MHz we are running 33% faster and taking 30% more power.
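A quick sketch of that comparison, assuming the second set of readings was taken at 240MHz (which is what "33% faster" than 180MHz implies). The power totals are the rounded figures quoted in the thread; the variable names are illustrative.

```python
f_low_mhz, p_low_w = 180, 1.79     # first measurement
f_high_mhz, p_high_w = 240, 2.32   # second measurement (240 MHz assumed)

freq_ratio = f_high_mhz / f_low_mhz
power_ratio = p_high_w / p_low_w

mw_per_mhz_low = 1000 * p_low_w / f_low_mhz
mw_per_mhz_high = 1000 * p_high_w / f_high_mhz

print(f"frequency x{freq_ratio:.2f}, power x{power_ratio:.2f}")
print(f"{mw_per_mhz_low:.2f} mW/MHz vs {mw_per_mhz_high:.2f} mW/MHz")
```

Both points land a little under 10mW per MHz, consistent with power in this test scaling close to linearly with clock frequency.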
I don't know the integrity of execution, though. The smart pins are the only thing giving indication of function, but all paths were optimized up to the goal point, so the cogs should fail at the same frequency that the smart pins fail at.
I see anyone with dreams of writing an emulator for the P2, pretty much falling off their chair at this point !
:-O
...and what kind of USB performance could be achieved at 280MHz :-)
J
This all sounds very promising for having 200MHz+ as an option, similar to how we often use the P1 at 96MHz+ vs the spec'd 80MHz.
I'm guessing more typical use cases will pull a fraction of that power and run cool.
That's well above what was modeled, correct?
I'm just wondering if it could run at 280MHz (with an appropriate crystal) and then be dynamically switched to RCFAST as an "alive but in Sleep mode"?
j
Sure, but 280MHz will be very iffy. Just getting a part with slow process corner characteristics would likely dash that expectation. To be safe, stick with the 180MHz. Maybe 200MHz is reasonable for indoor environments.
Regardless of the MHz, could RCFAST be used as a dynamic low power mode?
j
I saw Total power = 1.79W and thought "P2 Hot, part deux", but a 4-layer board should take care of that. Even the 240MHz wattage is half the P2 Hot. This is going in the right direction! Searching for good flights to Rocklin in April.
You may not even need to go to 28nm for a substantial speed improvement!
Check this out! 90nm challenging the performance of 7nm!
https://spectrum.ieee.org/nanoclast/semiconductors/processors/the-foundry-at-the-heart-of-darpas-plan-to-let-old-fabs-beat-new-ones
j
Margins are sounding ok, in this simplest of tests.
You mean smart pins run to 280MHz, when doing a /16 PWM ?
The rest of the code is not really doing any integrity verification. Before everyone mentally locks in 250+ MHz, you should run some memory write/read verifies, some Mul/Div verifies, and some CORDIC result verifies.
Yes, the PLL has reasonable M/N control, so you can scale SysCLK to whatever the PLL manages.
It might be there are some opcodes that are 'Canary opcodes', that can be used to check as 'too fast' indicators, in which case you could scale SysCLK and check you were ok.
Such a design would probably also need an external watchdog part, that stored the last MHz and reset the P2, should it ever freeze stone dead.
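The ramp-and-check idea could look something like the following host-side Python sketch. Everything here is hypothetical: `self_test` stands in for whatever sets the clock and runs the memory, Mul/Div and CORDIC verifies (or a canary opcode), and the simulated failure point is an invented number, not a measurement.

```python
def find_safe_clock(self_test, start_mhz, stop_mhz, step_mhz, margin_pct):
    """Ramp the clock upward and return a derated operating frequency.

    self_test(mhz) is a hypothetical callable that sets the clock, runs the
    verification routines, and returns True only if every result checked out.
    Returns None if even the starting frequency fails.
    """
    last_pass = None
    mhz = start_mhz
    while mhz <= stop_mhz:
        if not self_test(mhz):
            break                  # first failure ends the ramp
        last_pass = mhz
        mhz += step_mhz
    if last_pass is None:
        return None
    # Back off from the last passing frequency by the chosen margin.
    return last_pass * (100 - margin_pct) / 100


def fake_self_test(mhz):
    # Simulated part that starts failing above 262 MHz (invented number).
    return mhz <= 262


safe_mhz = find_safe_clock(fake_self_test, 180, 300, 10, 10)
print(safe_mhz)
```

A real version would also have to survive the failing step itself, which is where the external watchdog mentioned above comes in.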
Interesting bench-testing stuff, but I'm not sure you would deploy it in machines running in the field!
Still 12MHz ? - as that's the USB clock speed.
The next step of 480MHz is well outside P2 ability, but maybe someone will connect a HS-USB PHY to a P2 one day ?
It would be quite interesting to load the USB code, and ramp the SysCLK, and see how clk tolerant the USB code is. (FPGA tested at 80MHz)
There is an XCL220 switching regulator footprint on the P2D2, but that is a 1A part, with a >1.3A current limit, so it is looking marginal for push-testing.
You might want to select a high efficiency, 3~6A region switcher (with PowerGOOD) + conservative inductor, so the power loss it adds is minimal.
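To put numbers on "marginal", here is a small sketch comparing the VDD currents measured earlier in the thread against the 1A rating quoted for the XCL220. The helper name is illustrative, and the comparison ignores regulator efficiency and thermal derating.

```python
def headroom_amps(rating_amps, load_amps):
    """Positive when the load is under the rating, negative when over."""
    return rating_amps - load_amps

rating_a = 1.0                                   # rated output quoted above
loads_a = {"180MHz": 0.920, "higher-clock run": 1.200}

headroom = {name: headroom_amps(rating_a, amps) for name, amps in loads_a.items()}
for name, amps in headroom.items():
    print(f"{name}: {amps:+.2f} A headroom")
```

The torture test leaves about 0.08A of headroom at 180MHz and exceeds the rating by about 0.2A at the higher clock, which supports choosing a regulator in the 3A to 6A range for this kind of testing.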
Now that synthesized layout is locked in, there's likely to be a fastest and slowest cog
So perhaps you could have a 'canary cog' as well as a canary instruction
Maybe not 480MHz - but the thermal loading will be less with 1 COG, so you could expect to gain a few extra %, up the temperature curve.
I don't think there is any means to check die temperature in P2 ?
Inference from RCSLOW/RCFAST tempcos was talked about, but IIRC there is no capture path on those ?
MUL and DIV could also be some of the 'warmest' opcodes, as they likely have the most active silicon ?
All paths are optimized only to the goal point. This means a huge chunk of paths are splat against the timing wall, and those paths are certainly spread among all cogs. So, there won't be a cog that's faster than another. Prop1 was different, since it was laid out by hand and there were effective timing groupings.