We're looking at 5 Watts in a BGA!

potatohead · 2014-04-02 22:36

Now, each time there is a push/pop/call/ret, all of these flops toggle. If the LIFO used a SRAM and a pointer, no doubt a lot less registered bits toggle. Could this result in a reasonable power saving ???

Remember the earlier design was somewhere near 2 watts. Things like this are going to bend the curve some, but not seriously change it. That's obvious from the info put here and some of our earlier data points in terms of the number of flops, etc...

We might have to do that stuff, but we first need to understand whether or not the design, overall, actually makes enough things possible to fund continuing this whole affair. We can't really engineer that away without surrendering a ton of time, and worse, we will have done it with no hard goal, other than "optimize everything" which is a crappy goal.

jmg · 2014-04-02 22:45

Cluso99 wrote: »

Yesterday you told us it scales linearly uW/MHz.
Now you are using a factor of 3.7.
Something is not right here.

It is all in
Pt = SumOf(Cpd * Ft * Vc^2)
That does scale as uW/MHz at a fixed Vcc
- but you do not need to fix Vcc: if you drop the MHz, you can drop Vcc, Icc also drops, and so you can save (significant) power.

Atmel figures Clock corners vs Vcc
* < 2 MHz @ >1.7V
* < 4 MHz @ >1.8V
* < 10 MHz @ >2.7V
* < 16 MHz @ >4.5V
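
A rough sketch of what that formula implies, using the two extreme Atmel corners listed above. The capacitance value is a made-up placeholder (it cancels out of the ratios), and a single lumped Cpd stands in for the sum over all nodes, so treat this as an illustration rather than a model of any real part.

```python
def dynamic_power(cpd_farads, freq_hz, vcc_volts):
    """Dynamic power for one lumped capacitance: P = Cpd * F * V^2."""
    return cpd_farads * freq_hz * vcc_volts ** 2

# Hypothetical lumped power-dissipation capacitance; it cancels in the ratios.
CPD = 1e-9

# Corner points taken from the Atmel list above.
fast = dynamic_power(CPD, 16e6, 4.5)          # 16 MHz at 4.5 V
slow_same_vcc = dynamic_power(CPD, 2e6, 4.5)  # 2 MHz, Vcc left at 4.5 V
slow_low_vcc = dynamic_power(CPD, 2e6, 1.7)   # 2 MHz, Vcc dropped to 1.7 V

ratio_fixed_vcc = fast / slow_same_vcc   # frequency scaling only
ratio_scaled_vcc = fast / slow_low_vcc   # frequency and voltage scaling

print(f"fixed Vcc:  {ratio_fixed_vcc:.1f}x less power at 2 MHz")
print(f"scaled Vcc: {ratio_scaled_vcc:.1f}x less power at 2 MHz")
```

At a fixed Vcc the saving is just the frequency ratio, which is the linear uW/MHz behaviour. Letting Vcc fall as well multiplies that by the square of the voltage ratio, which is where a factor larger than the frequency ratio can come from.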

Tubular · 2014-04-02 23:17

Just for kicks, and data points, I loaded and measured the current on the DE0-Nano running OzPropDev's Invaders this arvo.

Anyone want to hazard a guess what kind of power that consumes, running in a 60nm, 80MHz Cyclone IV like it does? 1 cog, remember...

potatohead · 2014-04-02 23:22

Do tell!

(and I just want the data point)

Tubular · 2014-04-02 23:25

Oh no you have to guess... even if I have to lock you in a room and only let you out once the guess is "done" with.

Yes I'm prepared for you to hate me for not just giving you the data

ozpropdev · 2014-04-02 23:40

I've been running my Nano + DE2 flat out for a week now chasing a bug, so I can't cheat and do Tubular's test.
My guess is >1W.

potatohead · 2014-04-02 23:54

2.5W

No worries man! It's a good exercise.

Heater. · 2014-04-03 01:04

Off topic stuff...

@Clock Loop,

I thought the Raspberry Pi Foundation's retro-themed April Fools' Day joke was great. Pretty much everyone commenting there loved it too, playing along by posting ASCII art and such.

Don't forget they were also switching over to a new web site design at the same time so it's quite likely links would have gone missing anyway. Remember when that happened during the Parallax Forum upgrade? Took quite a while to get everything back in order.

Shame you were inconvenienced a bit. Perhaps it's a little reminder that the Pi Foundation is a charitable organization and does not have to behave like a corporate supplier. Such suppliers can mess you around just as badly anyway.

@rjo__

...so that programming a dual P2 would be almost as easy as programming a single P2. These things should be there anyway it just makes too much sense.

A desirable goal. Perhaps not as easy as it sounds. XMOS have done it. You can program a network of xcores by connecting to just one of the devices. When you write your code the "multiple-chips" is seamless, it's just a single program. How many Prop users are going to want to build a multi-prop system though?

Heater. · 2014-04-03 01:08

@everyone.

A year or so ago the P2 as it existed then went to a shuttle run which sadly failed. Whilst waiting on a slot for the next shuttle run the opportunity was taken to do some "tweaks" to the design.

Many months later we now have the "P3", a totally different device, and it seems to be a bit power hungry.

My questions:

1) What was the power consumption of that original P2 projected to be? I assume it was taken for granted that it was reasonable as nobody ever talked about it much.

2) WTF happened in the meantime? How come the power consumption is so much more than expected of that old P2 design? Is it simply due to having more transistors (static power, leakage), or having a faster chip (dynamic, switching power), or due to a process change or geometry shrink?

I don't know how we can confidently suggest solutions or express preferences among solutions without knowing more about exactly where the problems lie.

Personally I think pedalling backwards to a P1b is not attractive. The PII has so much more going for it. And it is yet another design and yet another year of development time.

I'm in Bill Henning's camp here. "Publish and be damned". I'm sure Bill is right that in any normal application power consumption will not be a worry. There is no reason the same device can't be characterized for use at different frequencies.

The only thing that worries me about Bill's approach is that frequency and COG usage are under software control. Potentially, misprogramming my P2 collapses my power supply or fries my board!

There is one thing I DON'T WANT TO SEE: Since this news came out we have already had a dozen suggested changes to the P2 architecture. No more changes please! Not unless they are really, show stoppingly, necessary.

RossH · 2014-04-03 01:32

Heater. wrote: »

The only thing that worries me about Bill's approach is that frequency and COG usage are under software control. Potentially, misprogramming my P2 collapses my power supply or fries my board!

There is one thing I DON'T WANT TO SEE: Since this news came out we have already had a dozen suggested changes to the P2 architecture. No more changes please! Not unless they are really, show stoppingly, necessary.

Exactly. Can you imagine the press Parallax would get when word got around that you could fry your entire system by accidentally running your software too fast!

Not to mention that the P2 could never be used in a commercial application without a 5W power supply, heatsinks and an adequately ventilated case.

Parallax might survive such a debacle, but this would destroy the P2 in the marketplace.

Ross.

Heater. · 2014-04-03 01:40

RossH,

I guess there are ways to enforce the frequency/power/cog characterization so that software can't cause a fire storm.

1) Whole COGs could be disabled, same chip just shipped as a 4 COG device for example.
2) Frequency multipliers could be limited.

In this way designs could be made bullet proof. Perhaps those fuse bits could be used to characterise devices.

ozpropdev · 2014-04-03 01:52

Having different speed P2's opens up a new dilemma with OBEX as well.
I think one model is best.

Heater. · 2014-04-03 01:55

Good point ozpropdev.

RossH · 2014-04-03 01:57

ozpropdev wrote: »

I think one model is best.

One model to rule them all, One model to find them
One model to bring them all, and in the shuttle-run bind them
In the land of Parallax, where Despondency lies.

jmg · 2014-04-03 02:17

Heater. wrote: »

The only thing that worries me about Bill's approach is that frequency and COG usage are under software control. Potentially, misprogramming my P2 collapses my power supply or fries my board!

I'm not quite following the logic here: even without power control features, the PLL is already under software control, and if that runs above the legal core speed, misclocking can result.
I have suggested some ways to confirm before switch over, (which I think Chip is doing), but you will always have ways to have programming go into invalid operating areas.

Likewise, if someone adds processor control of Vcore, they can save significant power - should they not do that, just in case they might draw more power sometimes ?

I think the P2 will need COG temperature measurement, but again, software will run to check that, and 'misprogramming' covers a very wide range. You cannot exclude that, just because it might be misread.

A window watchdog can catch a glitch-disturbed device, and do so well before any thermal issues arise. Reset is then in a safe sequence.
There will be supply voltage ranges where the P2 is always within the thermal envelope, but those will still have PLL ceilings.
So you can avoid 'fries my board' fears, but you cannot eliminate user error.

Heater. · 2014-04-03 03:38

jmg,

...you will always have ways to have programming go into invalid operating areas.

Ha, my code spends most of its life in "invalid operating areas", especially when under development.

The logic is that it's pretty hard to destroy a Propeller, or most other devices, via a software error. I can set the PLL badly and it won't run. I can set a pin to output that will conflict with something else, which is not usually fatal.

That is a bit different to finding out that my chip dies completely, or the PCB gets scorched or the house has burned down whilst I'm away!

I guess the safeguards you suggest might do the trick though.

Aside: On chips like the ARM SoC used in the Raspberry Pi you are quite free to tweak the core voltage, clocks, etc. People overclock the 700MHz device to over a gigahertz. However, extreme settings will permanently flip a bit in the chip and void your warranty.

mindrobots · 2014-04-03 03:54

Finish it, test it, print it, sell it.

Commit to the new architecture and "Propellerisms" and start developing hardware that shows it off (albeit running a bit warm) start developing software that shows it off and stretches the capabilities.

A P1B doesn't give all the smart hardware designers a chance to create the 'Next Big Thing'. The new design and features take time to understand and exploit. Not everybody is going to be dumb enough to buy a $700 DE2 package to play with 1/2 of a P2.

There's a lot to be tested still, and there's a lot to be developed still before it is a viable system to use and sell. Retrenching won't change those facts. If you redesign, there is still the same amount of testing and development to do. You'll have a new design, you'll have added 2 years to the clock, you may have a similar or worse situation, and you'll still have a year or more on top of that to get to market....that makes everyone 3 years older, 3 years less enchanted, 3 years distracted with other shiny things that are doing the job for them. 3 years is a LONG time.

Get something in the hands of the people that can use it. Get prototypes out, get dev boards out, get chips out. Ok, so the dev boards run warm, they let people actually get hands-on experience, they let software people actually test and implement their neat ideas. Software dreams can become reality. Hardware designers that understand the power relationships can start designing real hardware that will work with a P2A and will just work better with a P2B.

The cost of entry in the P2 club is already high (for developers, enthusiasts and Parallax). More significant delays will make it higher for everyone: folks will start walking away, folks will start getting more disillusioned. Opportunity costs are still growing, and nothing that's happening, or that will be done to resolve it, will lower them. Only going to a smaller technology will really open up more doors (but that comes at a significant real cost).

Not to throw out more doom and gloom but we don't know what the next hurdle will be....the power issue took everyone by surprise...it still hasn't gone to fab yet....is there another hurdle? Is there a TLOCK/TFREE bug out there? Is there a timing bug in the video? It's a complex beastie...it needs thorough testing by lots of people. If it gets redesigned, chopped up, reframed, etc. it will be a DIFFERENT but still complex beastie that needs lots of testing.....and it will be a year or so later.

Tubular · 2014-04-03 05:18

Heater. wrote: »

1) What was the power consumption of that original P2 projected to be? I assume it was taken for granted that it was reasonable as nobody ever talked about it much.

2) WTF happened in the meantime? How come the power consumption is so much more than expected of that old P2 design? Is it simply due to having more transistors (static power, leakage), or having a faster chip (dynamic, switching power), or due to a process change or geometry shrink?

Here's a thread that answers 1) - follow the links from Seairth back to find the original etc
http://forums.parallax.com/showthread.php/145968-Propeller-2-core-current-draw-(1.8v)

For 2) you've pretty much answered it yourself. A few more transistors, a process change, and taking a "worst case" value rather than a typical one (the last perhaps most responsible for forumista angst).

evanh · 2014-04-03 05:27

cgracey wrote: »

By putting the RAMs into Verilog, they will be able to pull them into their synthesis, using their own generated RAMs. This will help them better estimate power and area.

Their generated RAMs are well-characterized and amenable to auto-routing, unlike the ones we laid out by hand for 180nm.

I once asked Beau if MRAM would be an option and his answer at the time was no, due to it requiring four metal layers. I didn't pursue any further questions but I had wanted to ask why four metal layers was a problem.

OnSemi, presumably, could also easily factor in using MRAM for HubRAM in a few of these estimates.

Bean · 2014-04-03 05:56

Somebody please tell me this is an "April fools" joke...

Please...

Pretty please, with sugar on top...

Bean

ozpropdev · 2014-04-03 06:28

Tubular wrote: »

Just for kicks, and data points, I loaded and measured the current on the DE0-Nano running OzPropDev's Invaders this arvo.

Anyone want to hazard a guess what kind of power that consumes, running in a 60nm, 80MHz Cyclone IV like it does? 1 cog, remember...

Hey Tubular,
Potatohead & I had a guess, are we HOT or COLD?

mindrobots · 2014-04-03 06:37

Tubular wrote: »

Just for kicks, and data points, I loaded and measured the current on the DE0-Nano running OzPropDev's Invaders this arvo.

Anyone want to hazard a guess what kind of power that consumes, running in a 60nm, 80MHz Cyclone IV like it does? 1 cog, remember...

1/2 W - I'm just guessing so we have enough guesses that you tell us. (It's hard to tell under its acrylic blanket)

My nano is a bit warm but nowhere near as warm as my Mac Mini while my P1 QuickStart running one COG is room temp.

mindrobots · 2014-04-03 06:44

5W??? Pish Posh!!

I'm a big fan of hot dogs, so if I can get a dev system something like this that contains a couple of P2s, I think it's a win/win!!

Don't look at the power issue as defeat, look at it as unrecognized opportunity!!

Maybe this would be an option for packaging??

Chin up, chaps!! The end may be near but it's not HERE!!

Tubular · 2014-04-03 06:50

mindrobots wrote: »

1/2 W - I'm just guessing so we have enough guesses that you tell us. (It's hard to tell under its acrylic blanket)

My nano is a bit warm but nowhere near as warm as my Mac Mini while my P1 QuickStart running one COG is room temp.

Ok so it's taking around 390mA at 5v when running DE0 invaders. Between 375 and 405mA, approximately, ie about 2 watts all up. This includes the resistor DACs of the DE0 breakout board. Tomorrow I'll test without this.

Some programs such as the DE0 demo loading code are down around 120mA, so you're looking at an incremental of about 1.4 Watts for running a single cog on the DE0 at 80 MHz.

"idle state" / waiting for a program to be loaded in is actually taking more current at about 180mA.
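
For anyone wanting to check the arithmetic, here is a small sketch that turns the current readings above into watts. It assumes every reading was taken at the 5 V board input, so the figures include regulator losses and anything else on the board, not just the FPGA core. The variable names are mine.

```python
SUPPLY_VOLTS = 5.0  # board input voltage, as stated in the post

def board_watts(current_ma):
    """Convert a board-level current reading in mA to watts at 5 V."""
    return SUPPLY_VOLTS * current_ma / 1000.0

invaders_w = board_watts(390)  # running DE0 invaders (375 to 405 mA observed)
loader_w = board_watts(120)    # DE0 demo loading code
idle_w = board_watts(180)      # waiting for a program to be loaded

# Extra power attributed to one busy cog, relative to the loader baseline.
incremental_w = invaders_w - loader_w

print(f"invaders:    {invaders_w:.2f} W")
print(f"loader:      {loader_w:.2f} W")
print(f"idle:        {idle_w:.2f} W")
print(f"incremental: {incremental_w:.2f} W")
```

The incremental figure comes out slightly under the quoted "about 1.4 Watts", which is within the spread of the 375 to 405 mA readings.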

mindrobots · 2014-04-03 07:16

Tubular wrote: »

Ok so it's taking around 390mA at 5v when running DE0 invaders. Between 375 and 405mA, approximately, ie about 2 watts all up. This includes the resistor DACs of the DE0 breakout board. Tomorrow I'll test without this.

Some programs such as the DE0 demo loading code are down around 120mA, so you're looking at an incremental of about 1.4 Watts for running a single cog on the DE0 at 80 MHz.

"idle state" / waiting for a program to be loaded in is actually taking more current at about 180mA.

Good info! So, 8 * 1.4 Watts = .........ok, so let's look at this another way.

Do the power numbers of the FPGA emulation in any way relate to the P2 on real silicon? If it's 1 to 1, we're scre.... in trouble. If there is no relation, then we're where we have been.

potatohead · 2014-04-03 07:37

What is the max power draw of a P1?

I'm asking, because "Invaders" is a great P1 project. It can do it. That amount of work = how many watts? Processes are similar.

In a sense, running invaders on one COG on the P2, is asking for a lot of what a P1 can do. How does that relate to power and our overall expectations?

Would be fun to put invaders out of tasking mode and onto all the COGS in the DE2 and measure that.

potatohead · 2014-04-03 07:43

And I'm coming to a realization. This process has its heat / compute profile. Bill brought up CPU power ratings in it, and scaled to what we are doing, they are appropriate.

What we did is we made a CPU, and we made it lean and mean for this process. It can get a LOT done, and its heat budget for the process, given what it can do, is apparently rational. So then, we deal with the P2 as we would a CPU, not so much a micro-controller, when it's at full clip.

As a micro controller, we clock it down and package it for that purpose.

Know what I think?

Let's treat it like a CPU. Get the on-chip dev tools done. Then port the P1 dev tools. People can get all Parallax, if they want to. That's cool. They can build bigger things too. That's really cool. We have three P1 chips on boards, external RAM, and all kinds of attempts at bigger things now. If people do them with P2, they get a lot bigger, up to the max!

Then we package it down and make it safe for education. No frying the board. It will still rule.

Then we sell the Smile out of it at its various ranges / capabilities.

Regarding multiple clocks in OBEX. We've written multi-clock code before and this community is pretty good at it. Baggers and I right now are making multi-clock video code. Others can do the same. Next.

We fund a shrink as fast as we can, while designing P3.

We get the shrink done, we sell more of the P2.

We become Intel like.

Always working on the next thing, always improving what we have out there. Release often as we can.

That's a rational business approach. I say we do it. There isn't any magic fix for this process. It is what it is. We can hope Chip and OnSemi get it optimal for us.

cgracey · 2014-04-03 07:57

Cluso99 wrote: »

Chip,
Is it possible to use the same die in an FPGA BGA package and a QFP package? I am not sure if the FPGA BGA die gets connected to the pga by vias.

The QFP144 package could be rated for slower clock speeds. Perhaps a central ground pad on the QFP144 version would help (jmg's? suggestion) - only necessary for power over xW.

For the FPGA256 BGA256 (presuming 1.0mm pitch) would we get away with only requiring connections from the outer 3 rows (60 pins outer, then 52 and 44 = 156 pins total)?

One die design could be packaged many different ways to accommodate various numbers of I/O's and different power levels.
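
The outer-row ball counts quoted above (60, then 52, then 44, for 156 total) can be checked with a short sketch. It assumes the 256-ball package is a fully populated 16 x 16 grid, which the thread does not confirm, and the function name is mine.

```python
def ring_ball_counts(side, rings):
    """Ball count for each concentric ring of a full side x side grid,
    starting from the outermost ring and working inward."""
    counts = []
    for i in range(rings):
        n = side - 2 * i  # edge length of the i-th ring
        if n <= 0:
            break
        # A ring of edge n has 4n - 4 balls; a 1 x 1 centre is a single ball.
        counts.append(1 if n == 1 else 4 * n - 4)
    return counts

# Assumption: BGA256 is a fully populated 16 x 16 array.
outer_three = ring_ball_counts(16, 3)
print(outer_three, sum(outer_three))

# Sanity check: all eight rings together account for every ball.
all_rings = ring_ball_counts(16, 8)
print(sum(all_rings))
```

Under that assumption the outer three rows do hold 156 balls, leaving 100 in the inner rings.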

rjo__ · 2014-04-03 08:04

Heater@

A desirable goal. Perhaps not as easy as it sounds. XMOS have done it. You can program a network of xcores by connecting to just one of the devices. When you write your code the "multiple-chips" is seamless, it's just a single program. How many Prop users are going to want to build a mult-prop system though?

When I asked the question "what is wrong with a 4Cog P2 with 40MHz cogs?" One of the answers was (to pare a phrase:) " there are lots of plans and code that would have to be scrapped."

We can already load a P1 from another P1… it is just a piece of software. It seems to me that putting that in hardware on the P2 and giving it ritualized incantations at the tool level would be a solution to this kind of objection… and it opens up the possibility of easily creating massively parallel systems…. it wouldn't take very many 1024 P2 system orders to make Ken a very happy camper.

So the answer to your question is…. "those guys that would have to rewrite all of their software to fit it on a 4 Cog P2" :)

A 4Cog P2 would be twice as fast and a 2X4CogP2 would have twice as many pointers!!!

I don't want to have to reach for a torque wrench every time I start my P2… I just want the torque to be there when I step on the gas.

Do you want 40MHz Cogs?
Do you want twice the hub ram for each Cog?
Do you want more pointers?
How many metaphors should come in the demo kit?

Heater. · 2014-04-03 08:08

potatohead,

This process has its heat / compute profile.

What, you didn't notice that over the past decades of microprocessor development?

My limited understanding is:

1) Chips suck power even if they are doing nothing. Ideally, if a CMOS gate is on or off its dissipation is zero. Unfortunately these little transistors leak current and dissipate power just sitting there. It seems the smaller our processes become, the worse this leakage issue becomes. Lowering the operating voltage helps, I guess.

2) Chips suck more power when they are working. Power is required to get these gates to change state, 0 to 1 or 1 to 0. Capacitances have to be charged, resistance overcome, whatever. So the faster you do that, the more power you need.

So we have a static power drain and on top of that a frequency dependent power drain.

Question then is, using the process we have for the P2 currently at the proposed operating speeds what proportion of power consumption is static and what is dynamic?

If the static power were very low then not using half your COGS would save half the power. Running at half speed would save half the power. You only "pay" for what you use.

If the static power is high then stopping or throttling things won't help you so much. You have to pay all the time.
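
To make the two cases concrete, here is a toy model. Only the 5 W total and the 8 COGS come from this thread; the static/dynamic splits are invented for illustration, and the model assumes dynamic power scales linearly with the number of running COGS and with clock speed, which ignores the hub and anything else that keeps switching.

```python
TOTAL_COGS = 8
FULL_LOAD_W = 5.0  # headline full-load figure from the thread

def chip_power(static_w, active_cogs, clock_fraction):
    """Toy model: fixed static power plus dynamic power that scales
    linearly with active COGS and with clock speed (an assumption)."""
    dynamic_full_w = FULL_LOAD_W - static_w
    return static_w + dynamic_full_w * (active_cogs / TOTAL_COGS) * clock_fraction

# Hypothetical split A: mostly dynamic (0.5 W static of the 5 W).
low_static_half_cogs = chip_power(0.5, 4, 1.0)

# Hypothetical split B: mostly static (4.0 W static of the 5 W).
high_static_half_cogs = chip_power(4.0, 4, 1.0)

print(f"low static, 4 of 8 COGS:  {low_static_half_cogs:.2f} W")
print(f"high static, 4 of 8 COGS: {high_static_half_cogs:.2f} W")
```

With the mostly dynamic split, stopping half the COGS removes a large share of the 5 W. With the mostly static split it removes very little, which is the "you have to pay all the time" case.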

I really don't have a feeling for where the PII is on this static vs dynamic power spectrum.

Problem is that nowadays people will be comparing to things like STM32 that drink 260uA per megahertz and run up to 180MHz. That's 0.2 watts!
So we have 8 COGS, that would be only 1.6 watts.
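
A quick sketch of that comparison, for reference. The 260 uA/MHz and 180 MHz figures are from the post; the 3.3 V supply is my assumption, since no voltage is given, and the result moves in proportion to whatever voltage applies.

```python
UA_PER_MHZ = 260.0   # from the post
CLOCK_MHZ = 180.0    # from the post
ASSUMED_VDD = 3.3    # assumption: supply voltage is not stated in the post
COGS = 8

current_ma = UA_PER_MHZ * CLOCK_MHZ / 1000.0       # uA -> mA
power_w = current_ma / 1000.0 * ASSUMED_VDD        # mA -> A, then times volts
eight_of_them_w = COGS * power_w                   # eight such cores

print(f"current at full clock: {current_ma:.1f} mA")
print(f"power at {ASSUMED_VDD} V: {power_w:.3f} W")
print(f"eight of them: {eight_of_them_w:.2f} W")
```

At the assumed 3.3 V this lands a little below the rounded 0.2 W per core, so 1.6 W for eight is, if anything, on the generous side.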

We're looking at 5 Watts in a BGA!
