New P2 Silicon Observations

jmg · 2018-10-05 21:18

cgracey wrote: »

That 280mA at 5V (before switching regulator) gives you 720 MIPS of 32-bit operations. It's got to cost something, in terms of power.

Running off a 20MHz crystal at 1MHz (using the PLL) takes 3mA at 1.8V. That's for 8 MIPS.

The RCSLOW mode runs at 20.6KHz and takes 100uA at 1.8V.

Quiescent current is only 34uA at 1.8V. You can park the P2 there, until reset.

With the PLL granularity you have many choices (though a larger post-divider would have been nice, to cover the 20kHz~1MHz area).

The Cpd corner frequency (where dynamic Icc = Iq) is ~11.7kHz, so RCSLOW adds ~63uA, and an 8.192kHz external clock (1.5uA) would add ~25uA.
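As a rough cross-check of those numbers: dynamic current is modelled here as Cpd * Vcc * f, so the corner frequency is where that equals Iq. The sketch below back-derives an effective Cpd from the ~11.7kHz corner and the 34uA quiescent figure, then applies it to the two clocks mentioned. Treating Cpd as a single constant is an assumption. With it, the results (~60uA and ~24uA) land about 5% below the ~63uA and ~25uA quoted, which is within the precision of these back-of-envelope figures.

```python
# Back-derive an effective Cpd from the quoted corner frequency, then estimate
# how much dynamic current a given SysCLK adds on top of Iq.
# Assumption: dynamic current is modelled purely as Cpd * Vcc * f.

VCC = 1.8          # core supply, volts
IQ = 34e-6         # quiescent current quoted above, amps
F_CORNER = 11.7e3  # frequency where dynamic Icc == Iq, Hz

CPD = IQ / (VCC * F_CORNER)  # effective power-dissipation capacitance, farads


def dynamic_current(f_hz: float) -> float:
    """Dynamic supply current (amps) at clock frequency f_hz."""
    return CPD * VCC * f_hz


for name, f in [("RCSLOW 20.6kHz", 20.6e3), ("ext 8.192kHz", 8.192e3)]:
    print(f"{name:>15}: +{dynamic_current(f) * 1e6:.1f} uA over Iq")
print(f"effective Cpd: {CPD * 1e9:.2f} nF")
```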

There are low-cost 32.768kHz oscillators in large numbers of variants, and some suppliers offer /2^N variants of those.

Note that in RCSLOW mode, anything faster than RCSLOW/2 will be unable to clock a Smart Pin cell (to keep precise time).
That said, you could use the ExtOSC as the low-speed SysCLK and use that to calibrate RCFAST+PLL for reasonable clock precision (2 pins connected).
It seems RCFAST is able to drive the VCO/PLL OK?

Some choices above 32kHz and below 1MHz, or with better precision:

Above 32kHz:
Part                           Maker    Stock   Price @3k  Type  Freq        Output                Supply        Stability  Temp range      Current
SIT1569AC-J3-33E-0393.216000Q  SiTime   5,000   $1.43208   -     393.216kHz  LVCMOS                3.3V          ±50ppm     -20°C ~ 70°C    10µA
SIT1569AC-J3-33E-0075.000000Q  SiTime   5,000   $1.63296   -     75kHz       LVCMOS                3.3V          ±50ppm     -20°C ~ 70°C    4.6µA

Better precision:
Part                           Maker    Stock   Price @3k  Type  Freq        Output                Supply        Stability  Temp range      Current
NZ2520SB-32.768KHZ-RNA3045A    NDK      -       $0.77000   OSC   32.768kHz   CMOS                  1.8V          ±5ppm      -40°C ~ 85°C    220µA
SIT1552AI-JE-DCC-32.768E       SiTime   18,000  $1.21523   TCXO  32.768kHz   LVCMOS                1.5V ~ 3.63V  ±5ppm      -40°C ~ 85°C    1.52µA
KT3225T32768DAW33T             Kyocera  -       $3.12500   TCXO  32.768kHz   CMOS, Enable/Disable  3.3V          ±3.8ppm    -40°C ~ 85°C    5.5µA

P2 Idd at those frequencies, each divided by 2^10:
Fi = 32768/2^10    = 32Hz           P2_Id = Cpd * Vcc * Fi = 99nA + Iq
Fi = 75k/2^10      = 73.2421875Hz   P2_Id = Cpd * Vcc * Fi = 227nA + Iq
Fi = 393.216k/2^10 = 384Hz          P2_Id = Cpd * Vcc * Fi = 1.189uA + Iq

The /2^10 could be a 74AHC1G4210 used as the 32k divider, which also allows the 20.6kHz RCSLOW choice...
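The Idd figures above can be reproduced with a few lines of Python. This is a sketch under one assumption: an effective Cpd of about 1.72nF, back-derived from the posted results rather than taken from any spec. The oscillator frequencies come from the part list above, and DIVIDE models the /2^10 stage.

```python
# Estimate P2 core current when SysCLK is an external oscillator divided by
# 2**10 (e.g. through a /1024 divider stage).
# Assumption: Cpd ~= 1.72 nF, back-derived from the Idd figures above.

VCC = 1.8       # core supply, volts
CPD = 1.72e-9   # effective power-dissipation capacitance, farads (assumed)
IQ = 34e-6      # quiescent current, amps
DIVIDE = 2 ** 10

oscillators = {"32.768kHz": 32_768, "75kHz": 75_000, "393.216kHz": 393_216}


def divided_idd(f_osc_hz: float) -> tuple[float, float]:
    """Return (divided clock frequency in Hz, total Idd in amps)."""
    f_in = f_osc_hz / DIVIDE
    return f_in, CPD * VCC * f_in + IQ


for name, f in oscillators.items():
    f_in, idd = divided_idd(f)
    print(f"{name:>11}: Fi = {f_in:10.4f} Hz, dynamic = {(idd - IQ) * 1e9:7.1f} nA + Iq")
```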

jmg · 2018-10-05 21:25

cgracey wrote: »

Question:

Would you guys like clock-gating if it meant a 10% reduction in Fmax?

What exactly does that mean in the Idd tables?

Does that mean each COG now gets its own clock-gated domain, which could have quite an impact on routing?
Would those 8 SysCLK domains then include local WAIT gating, so that no cog has SysCLK enabled while one waits on a timer/event?

That would look more like a P1, and push WAIT power down to well under 1/8th of where it is now.

Are the RAMs gated too, so that no access from any COG means no chip-select on the RAMs?

Rayman · 2018-10-05 21:36

One good thing about the current power draw situation is that one won't find themselves with power issues as they add code and cogs to their projects...

Roy Eltham · 2018-10-05 21:56

Seems like we should just do the minimum changes that will fix the known issues.

potatohead · 2018-10-05 22:01

That is my thought, Roy.

I hate the thought of being here again, this close. Gating it is gonna introduce a new set of potential problems. Not gating it means we've got the actual problems, can deal with them, and are likely done!

But... OnSemi may have confidence in it, and reasons for it, that we don't.

DiodeRed · 2018-10-05 22:38

Really excited to see the P2 coming along. Definitely some rather interesting capabilities.

Regarding the crosstalk from DIR causing a glitch on OUT, I find myself wondering whether the reverse can happen, that is, a glitch in DIR when OUT is flipped. Coupling between the signals should happen both ways if the drive strength, capacitance, and sensitivity in the pad are similar.

If so, one might be able to cause a glitch by either:
A) Have DIR set the pin in float state, and toggle OUT a few times, perhaps seeing a glitch manifest with the pin briefly coming out of float toward one of the rails.

B) Have DIR set the pin in a driven state, and toggle OUT a few times, perhaps seeing a glitch manifest with the pin briefly going high impedance. Of course... a pin going high impedance at the exact moment of an output state transition isn't going to look like much. Maybe if you had a quite strong mid-voltage pull applied to the pin you could notice a very subtle little oddity in slew rate?

pedward · 2018-10-05 22:38

I'm curious what the power characteristics of the P1 are in comparison to the P2 beta?

Unless Chip is supremely confident in clock gating (or finds another reason why it needs to be done), I'd think it would be best to leave the gating alone and keep the P1 as the "low power" choice.

The P2 seems to be close to the same MIPS/Watt as the rPi series of devices, so it's not really much less efficient than accepted products.

rPi 3b+ @ 4 cores = 5.1W / 2800MIPS = 1.82W per KMIPS
Prop 2β @ 8 cores = 1.2W / 720MIPS  = 1.67W per KMIPS
rPi 3b  @ 4 cores = 3.7W / 2440MIPS = 1.52W per KMIPS
rPi 2b  @ 4 cores = 2.1W / 1822MIPS = 1.15W per KMIPS
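For anyone wanting to redo the comparison with other boards, the arithmetic is just watts divided by thousands of MIPS. The power and MIPS figures below are the ones listed above; nothing else is assumed.

```python
# Efficiency figures from the list above, expressed as watts per 1000 MIPS.
systems = {
    "rPi 3b+ (4 cores)": (5.1, 2800),
    "Prop 2 beta (8 cogs)": (1.2, 720),
    "rPi 3b (4 cores)": (3.7, 2440),
    "rPi 2b (4 cores)": (2.1, 1822),
}


def watts_per_kmips(watts: float, mips: float) -> float:
    """Power per thousand MIPS; lower is more efficient."""
    return watts / mips * 1000


for name, (w, mips) in systems.items():
    print(f"{name:<22} {watts_per_kmips(w, mips):.2f} W per KMIPS")
```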

Tubular · 2018-10-05 22:54

Nice comparison. I hadn't really crunched the numbers to realise it was that close a race.

jmg · 2018-10-05 22:55

pedward wrote: »

I'm curious what the power characteristics of the P1 are in comparison to the P2 beta?

Short answer: poles apart (in range, at least).

P2 has Cpd of 1.68nF + 80pF/COG

P1 has these modes in the datasheet:

Vcc = 3.3V
WAIT (CNT/PEQ/PNE):  Id = 1mA,  Fi = 100MHz, Cpd = Id/(Vcc*Fi) = 3.03pF (that's quite good for a waiting state)
Assembly loop (JMP): Id = 6mA,  Fi = 100MHz, Cpd = Id/(Vcc*Fi) = 18.18pF
Spin loop (REPEAT):  Id = 13mA, Fi = 100MHz, Cpd = Id/(Vcc*Fi) = 39.4pF

8 COGs of P1 come to ~315pF in the highest-slope mode, which is close to 1/6 of a P2.

- Note that the currents above are at 3.3V for P1 and 1.8V for P2. If we assume an SMPS to keep the energy down, 13mA*3.3V*8 = 343.2mW for P1 is 190mA at 1.8V; after a 10% SMPS loss that's ~174mA at 1.8V, which corresponds to a P2 at ~44MHz for equivalent power, with all cogs running flat out.
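The same Cpd = Id/(Vcc*Fi) arithmetic can be scripted, together with the equal-power comparison. Assumptions in this sketch: the 10% SMPS loss is modelled as dividing the P1 power by 1.1, and the P2 Cpd is taken as 1.68nF + 80pF for each of 8 cogs. With those choices it lands near 41.5MHz, a little below the ~44MHz estimate; the difference comes down to which P2 Cpd value is assumed.

```python
# P1: derive Cpd from datasheet currents, Cpd = Id / (Vcc * Fi).
P1_VCC = 3.3     # volts
P1_FI = 100e6    # Hz
p1_modes_ma = {"WAIT": 1, "JMP loop": 6, "REPEAT loop": 13}
p1_cpd = {m: i * 1e-3 / (P1_VCC * P1_FI) for m, i in p1_modes_ma.items()}

# Equal-power comparison with all 8 cogs running flat out.
p1_power = 13e-3 * P1_VCC * 8            # watts, REPEAT loop on 8 cogs
P2_VCC = 1.8                              # volts
SMPS_LOSS = 0.10                          # assumed 10% converter loss
p2_current = p1_power / (1 + SMPS_LOSS) / P2_VCC   # amps available at 1.8 V
p2_cpd = 1.68e-9 + 8 * 80e-12             # assumed: 1.68 nF + 80 pF/COG, 8 cogs
p2_freq = p2_current / (p2_cpd * P2_VCC)  # Hz giving that current

for m, c in p1_cpd.items():
    print(f"P1 {m:<12} Cpd = {c * 1e12:.2f} pF")
print(f"P1 8-cog power = {p1_power * 1e3:.1f} mW")
print(f"P2 budget      = {p2_current * 1e3:.0f} mA at 1.8 V")
print(f"P2 equivalent  ~ {p2_freq / 1e6:.1f} MHz")
```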

Of course in idle modes, the P1 can drop way back in power.

DiodeRed · 2018-10-05 23:26

As far as comparing the P2 to other processors like above with the RPi thing... one thing I find amusing is comparing the Propeller 2 to a Pentium II.

- Both could be said to be a "P2"
- The late generation mobile Pentium II was also done on a 180nm process node (larger for earlier Pentium IIs)
- If you include Pentium II's L2 cache they have a similar ballpark transistor count
- Similar ballpark MIPS ratings
- Vaguely similar-ish ballpark clock speeds (sure, lower standard speed, though that overclocking test in the other thread reached similar clock frequencies)
- 1.6V VCore for the 180nm Pentium II (though higher for other models such as a 2.0V VCore for the 2nd Gen desktop part)

TonyB_ · 2018-10-06 00:07

Welcome DiodeRed - and the biggest difference is that Intel is outside the Propeller 2!

msrobots · 2018-10-06 00:23

I remember replacing a 386 board with a Pentium one on a Novell NetWare server (3.12?). Damn, that thing was fast. 400 MHz? I also remember the uptimes on those Novell server screens. Uptimes of 2 years, 3 months, 5 days were nothing special. At that time in the last millennium ('90s-ish) I installed lots of Novell servers. The most painless networking ever. They just worked.

Maybe it is because I am getting old, maybe it is because the current update frenzy is definitely wasting tons of my time, but I really want my computers back. At home and at all my customer sites it is the same effing problem: constantly newer versions of OS, hardware, software. I spend more time just fixing up already-running systems than I do developing new products.

The only stable stuff in my career is COBOL, but I hope the P2 will be powerful enough to set up some stable environment, without the need for constant changes.

We will find out.

Mike

Cluso99 · 2018-10-06 00:30

Cobol runs on the P1

And it will on P2 soon too

msrobots · 2018-10-06 01:17

Cluso99 wrote: »

Cobol runs on the P1
And it will on P2 soon too

I know, @Cluso99. I tried to buy a working CP/M system from you but, alas, you never answered the PM. I even found COBOL89 for Z80-CP/M.

But that long-running thread somehow lost me at the point where external memory was used. Maybe someone will build a Z80 emulator for the P2 and CP/M will be possible again. Actually, even MP/M with bank switching and networking...

Enjoy!

Mike

rogloh · 2018-10-06 01:29

Right now, after looking at the data Chip posted earlier in this thread, we sort of have the situation (very simplified) where the 1.8V current drawn follows something like this equation:

Idd(@Vcore) = Anf + Bf + nC + L

where n is the number of "active" COGs doing useful work, f is the frequency of the P2, and A, B, C, L are coefficients that depend on the chip design, fab process, etc.

(L) is the fixed or quiescent current and relative to the other components is small (so I will ignore it), similarly I am assuming there cannot really be some significant static (frequency independent) current consumed for each active COG vs an inactive COG at the core voltage (i.e. treat C=0).

(A) and (B) are frequency scaling coefficients which seem to work out to about 0.14 mA/MHz/COG and 3 mA/MHz respectively, taken over the voltage and frequency range he tested (20-220MHz).

So right now it looks like B ~= 22 x A, meaning this B component of the current basically dwarfs the current ever drawn by a COG. In comparison, an active COG will only add about 4.5% of this B current, so shutting off a COG can only save you 4.5% (at best) instead of the ideal 12.5% in an 8-COG setup.
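To see why stopping a cog saves so little, rogloh's model can be evaluated directly. This sketch uses only the A and B values quoted above, with C = 0 and L ignored as in the post. Frequency cancels out of the fractional saving, so the example frequency is arbitrary.

```python
# rogloh's simplified core-current model: Idd = A*n*f + B*f + n*C + L,
# with C = 0 and L ignored, as stated in the post.
A = 0.14   # mA per MHz per active cog
B = 3.0    # mA per MHz, independent of cog count


def idd_ma(n_cogs: int, f_mhz: float) -> float:
    """Approximate 1.8 V core current in mA for n_cogs active cogs."""
    return A * n_cogs * f_mhz + B * f_mhz


def saving_from_stopping_one(n_cogs: int, f_mhz: float) -> float:
    """Fraction of core current saved by stopping one of n_cogs active cogs."""
    before = idd_ma(n_cogs, f_mhz)
    return (before - idd_ma(n_cogs - 1, f_mhz)) / before


f_mhz = 100.0  # arbitrary example frequency; it cancels in the ratio
for n in (1, 8):
    print(f"stop 1 of {n} active cogs: saves {saving_from_stopping_one(n, f_mhz):.1%}")
```

With all eight cogs running, stopping one saves only about 3.4% of the core current in this model, versus the 12.5% you would expect if B were zero.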

I think the quiescent and the full-power P2 currents are fine. But for me it would be better to have these A and B constants much closer together in the equation above, so the inactive COGs (and also COGs stopped by waiting on PINs/counters/events) can reduce the core current appreciably. I guess that requires clock gating. Whether that is too risky to attempt, or would reduce the achievable frequency too much, I imagine will be up to Chip.

jmg · 2018-10-06 04:13

cgracey wrote: »

The RCSLOW mode runs at 20.6KHz and takes 100uA at 1.8V.

Quiescent current is only 34uA at 1.8V. You can park the P2 there, until reset.

How does RCSLOW vary with temperature?
That may be predictable enough to use as a die temperature sense, but measuring it could be tricky?
Other MCUs have RCSLOW feed a WDT or WUT (wake-up timer) interrupt, so they can sleep/wake/measure their low-power oscillator.
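One common way to measure a slow RC oscillator is to count its cycles over a gate time set by a precise reference, for example one of the 32.768kHz parts listed earlier divided by 2^10. The sketch below is hypothetical: the function names and the count are illustrative, not a P2 API, and it only shows the arithmetic, including why short gates give coarse resolution.

```python
# Hypothetical measurement sketch: estimate the RCSLOW frequency by counting
# its cycles during a gate derived from a precise reference
# (here 32.768 kHz / 2**10 = 32 Hz). The count would come from hardware;
# here it is just a number.

F_REF = 32_768 / 2 ** 10  # reference gate frequency, Hz
GATE_PERIODS = 1          # number of reference periods to count over


def rcslow_from_count(count: int, periods: int = GATE_PERIODS) -> float:
    """Estimated RCSLOW frequency in Hz from a cycle count over the gate."""
    gate_s = periods / F_REF
    return count / gate_s


def resolution_ppm(f_est: float, periods: int = GATE_PERIODS) -> float:
    """One-count quantization step, in ppm, for the given gate length."""
    expected_count = f_est * periods / F_REF
    return 1e6 / expected_count


nominal_count = round(20.6e3 / F_REF)  # about 644 counts at the quoted 20.6 kHz
print(f"estimate: {rcslow_from_count(nominal_count):.0f} Hz")
print(f"{resolution_ppm(20.6e3):.0f} ppm per count with a 1-period gate")
print(f"{resolution_ppm(20.6e3, 32):.0f} ppm per count with a 32-period gate")
```

Lengthening the gate improves resolution proportionally, at the cost of a slower measurement.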

evanh · 2018-10-06 04:46

I gather the hubram suggestion has been discounted, then? The suggestion being that turning off whole cogs would give a big power saving. Groups of smartpins and even the cordic would also be candidates. Certainly, shutting down parts of hubram doesn't sound viable to me.

I'm half of a mind to say leave the power-reduction thinking for a future, smaller Prop2 model.

KeithE · 2018-10-06 05:20

>Idd(@Vcore) = Anf + Bf + nC + L
>this B component of the current basically dwarfs the current ever drawn by a COG

I guess that Chip could comment on the structure of the clock tree and how that relates to this Bf term.

Is hub RAM never clock gated regardless of n?

Dynamic power simulations should reflect this if they were done.

evanh · 2018-10-06 05:25

KeithE wrote: »

Is hub RAM never clock gated regardless of n?

Nothing is gated. Adding any clock gating would be a whole new feature with a collection of config to test out.

cgracey · 2018-10-06 06:07

Wendy reviewed some simulation the other day and saw that, indeed, the enables on the memories were toggling, not stuck 'on'.

Since we pushed the Fmax really high, the tools did not insert clock gating on many flops. Instead, they just used data enables on most of the flops, with the clock always toggling.

By lowering the Fmax, the tool would be able to insert real clock gates on most flops.

There should be no risk in doing this, as logical equivalency is maintained.

Part of me is happy with the way things are, though, as we have the highest performance, just not very flexible power consumption.
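Chip's point about data enables versus clock gates can be illustrated with a toy model. This is a Python sketch, not RTL, and not how the synthesis tool actually works: both versions of an enabled register produce the same output sequence, but only the gated one stops delivering clock edges when the enable is low. Counting edges is used as a rough stand-in for clock-net switching power.

```python
# Toy model of two ways to implement an enabled register:
#  - data enable: the clock toggles every cycle; a mux feeds Q back when EN=0
#  - clock gating: the clock reaching the flop is suppressed when EN=0
# Both give the same Q sequence; only the clock edges delivered differ.
import random


def data_enable_flop(d_seq, en_seq):
    q, edges, out = 0, 0, []
    for d, en in zip(d_seq, en_seq):
        edges += 1              # clock always toggles
        q = d if en else q      # mux: load D or recirculate Q
        out.append(q)
    return out, edges


def clock_gated_flop(d_seq, en_seq):
    q, edges, out = 0, 0, []
    for d, en in zip(d_seq, en_seq):
        if en:                  # gated clock only fires when enabled
            edges += 1
            q = d
        out.append(q)
    return out, edges


random.seed(1)
cycles = 1000
d = [random.randint(0, 1) for _ in range(cycles)]
en = [random.random() < 0.1 for _ in range(cycles)]  # enabled ~10% of cycles

q1, e1 = data_enable_flop(d, en)
q2, e2 = clock_gated_flop(d, en)
print("same outputs:", q1 == q2)
print("clock edges, data enable :", e1)
print("clock edges, clock gating:", e2)
```

With the enable active about 10% of the time, the gated version sees roughly a tenth of the clock edges while the outputs match exactly, which is the "logical equivalency" Chip mentions.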

evanh · 2018-10-06 06:18

cgracey wrote: »

There should be no risk in doing this, as logical equivalency is maintained.

How does that work?

evanh · 2018-10-06 06:19

I thought it would be some sort of block On/Off switching.

K2 · 2018-10-06 06:20

cgracey wrote: »

Part of me is happy with the way things are, though, as we have the highest performance, just not very flexible power consumption.

That describes me...happy with the way things are. But if it's decided that reduced power consumption is more important, I'll be sure to stockpile a few v.1 chips for those times when nothing else will do.

evanh · 2018-10-06 06:29

Hmm, well, that certainly changes my view of the risks. I'm thinking it's worth doing now. Just to compare the real impact. It seems like there is plenty of headroom in max frequency.

Cluso99 · 2018-10-06 06:29

cgracey wrote: »

Wendy reviewed some simulation the other day and saw that, indeed, the enables on the memories were toggling, not stuck 'on'.

Since we pushed the Fmax really high, the tools did not insert clock gating on many flops. Instead, they just used data enables on most of the flops, with the clock always toggling.

By lowering the Fmax, the tool would be able to insert real clock gates on most flops.

There should be no risk in doing this, as logical equivalency is maintained.

Part of me is happy with the way things are, though, as we have the highest performance, just not very flexible power consumption.

Yep, it's a dilemma. At least it's a nice one to have.

If I had a crystal ball, and knew there would be a P2 family, then perhaps I'd opt to have...

1. A smaller P2: 4 cogs, 128KB, 32 I/O, slower and lower power, plus perhaps 4x P1-style cogs added

2. The current P2: fastest, but more power

cgracey · 2018-10-06 06:32

evanh wrote: »

I thought it would be some sort of block On/Off switching.

We could involve special clock-gating IP at the Verilog source level, but with relaxed enough timing, the tool will implement some clock gating on its own.

Cluso99 · 2018-10-06 06:34

Chip,
Is Wendy really sure that the hubs are not active if there is no hub access going on ???

If the P2 is powered on and nothing happens, then the ROM should shut down the P2. In this case, hub shouldn't be toggling.

rogloh · 2018-10-06 06:35

cgracey wrote: »

Wendy reviewed some simulation the other day and saw that, indeed, the enables on the memories were toggling, not stuck 'on'.

Since we pushed the Fmax really high, the tools did not insert clock gating on many flops. Instead, they just used data enables on most of the flops, with the clock always toggling.

By lowering the Fmax, the tool would be able to insert real clock gates on most flops.

There should be no risk in doing this, as logical equivalency is maintained.

Part of me is happy with the way things are, though, as we have the highest performance, just not very flexible power consumption.

If a COG is stopped or waiting on some condition or event, yet its flops are still being clocked, does that cause a lot of its flop states to continue to somehow transition and draw some power? I'm wondering where all this power is going. Is it just the long clock net itself and all its capacitance, or does it involve some other things?

Also, with the egg-beater model, does a stopped COG still somehow continue to do any cog or hub RAM accesses in the background, adding to the power draw, or cause any other hub-related activity to take place?

cgracey · 2018-10-06 06:39

Cluso99 wrote: »

Chip,
Is Wendy really sure that the hubs are not active if there is no hub access going on ???

If the P2 is powered on and nothing happens, then the ROM should shut down the P2. In this case, hub shouldn't be toggling.

At some point in the simulation, she saw the hub memory enables toggle, per our test code.

evanh · 2018-10-06 07:00

Here's a graph of the Prop2 drawn to a similar scale as the one in the Prop1 datasheet, which I've included for comparison.
