P2 - Minimising power usage in a COG (cpu core)
Cluso99
in Propeller 2
Chip,
How can we minimise the power used by a cog?
Suppose we have a cog running a driver for some external interface (e.g. an SD card driver). What steps should we take to minimise power when it's idle?
In P1 we had waitcnt, waitpxx, etc.
Typically we will be waiting on a command via hub.
Some ideas...
* Use a mix of rdbyte and waitx - introduces some latency but may not be an issue
* Have the calling cog use the COGATN instruction, and wait for it in a POLLATN or WAITATN loop
* Use a LOCK and wait for the lock to become "locked"
* Anything else? Or does it not matter?
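One rough way to weigh the first idea (RDBYTE plus WAITX) is to treat each loop pass as a short check of the hub byte followed by a wait. The Python sketch below is a back-of-envelope model only: `poll_tradeoff` and its cycle counts are hypothetical parameters, not measured P2 timings. It also says nothing about actual current, which depends on how much the clock gating saves while the cog sits in WAITX.

```python
def poll_tradeoff(sysclk_hz: float, wait_clocks: int, check_clocks: int) -> tuple[float, float]:
    """Hypothetical model of a 'read command byte, then WAITX' idle loop.

    Assumes each pass spends `check_clocks` reading/testing the byte and
    `wait_clocks` parked in the wait. Cycle counts are illustrative, not P2 specs.
    Returns (worst-case extra latency in seconds, fraction of clocks spent checking).
    """
    if sysclk_hz <= 0 or wait_clocks < 0 or check_clocks <= 0:
        raise ValueError("clock rate and check cost must be positive")
    period = wait_clocks + check_clocks
    # A command written just after the byte is sampled is only seen on the
    # next pass, roughly one full loop period later.
    worst_latency_s = period / sysclk_hz
    # Share of each pass spent outside the wait instruction.
    active_fraction = check_clocks / period
    return worst_latency_s, active_fraction


latency, active = poll_tradeoff(180e6, wait_clocks=18_000, check_clocks=20)
print(f"worst-case latency ~{latency * 1e6:.1f} us, active {active:.2%}")
```

Lengthening the wait trades response time for fewer active clocks; COGATN/WAITATN avoids that trade-off by blocking until the caller signals.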
Comments
So how low can we go, waiting at RCFAST with one cog?
Curious,
Mike
Evanh, would you mind comparing VDD current between these two different cog programs?
EDIT: It's because of an automatic implementation feature called "clock gating", which ON Semi enabled at Chip's request.
Chip once found and linked an article on a subject called "dark silicon". It was about reorienting around, and taking advantage of, the problem of integrated circuits becoming so densely packed with compute that thermal management had become the limiting factor.
The premise was that RAM blocks, in particular, do not consume power the way logic does, mainly because only one location is accessed at a time; the rest sit idle awaiting their turn and draw little current. This naturally low-power state is termed "dark".
In effect, clock gating brings an auto-darkness mode to the processor logic, or more accurately to the clocking circuits of the intervening flip-flops.
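For context, the usual first-order model of CMOS power shows where clock gating helps. Here α is the average fraction of the switched capacitance C that toggles per cycle, V_DD the core supply, f the clock frequency and I_leak the leakage current. A clock net toggles every cycle, so gating the clock to idle flip-flops lowers the effective α of the dynamic term, while the leakage term is unaffected. This is a generic textbook relation, not a P2-specific figure.

$$P \approx \underbrace{\alpha\, C\, V_{DD}^{2}\, f}_{\text{dynamic}} + \underbrace{V_{DD}\, I_{\text{leak}}}_{\text{static}}$$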
Of course, when you're designing compute chips, and the power consumption is slashed in half for you, you then immediately add more compute to use all available power again.
I wish there were a way to measure the RCSLOW frequency. There may be a way using an external RC circuit on some pin: measure its response (time constant) while running from the crystal, then repeat while on RCSLOW. We could use the 1 kΩ DAC mode, so all you need is an external reference capacitor.
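The arithmetic behind that RC idea is a simple ratio. Time the same external event (e.g. the RC node charging to a pin threshold) once in clocks of a known reference and once in clocks of the unknown RCSLOW. If the event takes the same real time in both runs, ref_ticks / ref_hz = test_ticks / test_hz. The sketch below assumes the RC interval does not drift between the two runs; the function names and the example tick counts are hypothetical.

```python
def estimate_clock_hz(ref_hz: float, ref_ticks: int, test_ticks: int) -> float:
    """Estimate an unknown clock from two timings of the same RC interval.

    Assumes the external interval lasted the same real time in both runs,
    so ref_ticks / ref_hz == test_ticks / test_hz.
    """
    if ref_hz <= 0 or ref_ticks <= 0 or test_ticks <= 0:
        raise ValueError("all inputs must be positive")
    return ref_hz * test_ticks / ref_ticks


def count_resolution(ticks: int) -> float:
    """Relative uncertainty from a +/-1 tick quantisation error."""
    return 1.0 / ticks


# Hypothetical example: a 100 ms RC interval timed at a 20 MHz crystal clock
# and again on a clock of unknown rate.
f_est = estimate_clock_hz(20_000_000, ref_ticks=2_000_000, test_ticks=2_000)
print(f"estimate {f_est:.0f} Hz, +/-{count_resolution(2_000):.2%} from counting")
```

Because the slow clock only counts a few thousand ticks over a practical interval, the ±1 count error on that side dominates; a longer RC time constant improves it proportionally.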
A one-off calibration should be possible using a USART.
E.g. at the UB3's lowest rate of 367 baud, you can calibrate to 0.2% per 0x00 character, limited by the auto-calibration jitter of the crystal-less bridges.
If you need low-ppm calibration or tracking, use a crystal-based UART and a short burst of characters.
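To see what a single 0x00 character gives you, note that with 8N1 framing the start bit and all eight data bits are low, so the line stays low for 9 bit-times (9/367 s, about 24.5 ms, at 367 baud). Counting RCSLOW ticks across that low pulse gives the clock rate directly. The sketch assumes 8N1 framing and uses a nominal RCSLOW of about 20 kHz only as an example; the tick count is hypothetical. At that rate the ±1 tick counting error alone is about 0.2%, the same order as the figure above, although the post attributes its limit to the bridge's own jitter.

```python
ZERO_CHAR_LOW_BITS = 9  # 8N1 framing: start bit + eight 0 data bits are all low


def low_pulse_seconds(baud: float, low_bits: int = ZERO_CHAR_LOW_BITS) -> float:
    """Duration of the low period of a 0x00 character at the given baud rate."""
    return low_bits / baud


def clock_from_low_pulse(ticks: int, baud: float, low_bits: int = ZERO_CHAR_LOW_BITS) -> float:
    """Clock rate implied by counting `ticks` during the 0x00 low pulse."""
    return ticks / low_pulse_seconds(baud, low_bits)


def quantisation_error(ticks: int) -> float:
    """Relative error from a +/-1 tick miscount."""
    return 1.0 / ticks


# Hypothetical measurement: 490 RCSLOW ticks counted across one 0x00 at 367 baud.
ticks = 490
print(f"pulse {low_pulse_seconds(367) * 1e3:.2f} ms, "
      f"clock ~{clock_from_low_pulse(ticks, 367):.0f} Hz, "
      f"+/-{quantisation_error(ticks):.2%}")
```

Averaging over a burst of characters reduces the counting error, which is the same reason a crystal-based UART and a short burst help for low-ppm work.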
Thanks Evan for doing that.
Would you mind adding figures for, say, 180 MHz and 360 MHz please?
This would give us some nice comparisons.
AFAIK the clock gating is helping the logic use less power. The RAM blocks that ON provide most likely already include clock gating, although IIRC the hub blocks were still being enabled even when not required.
Adding this test running in hubexec would be a nice comparison too. Just the 180 and 360 would be fine.
I remember Chip posted a couple of numbers that indicated roughly 45% of those for the revB silicon.
EDIT: Here are some new measurements of those single-cog loops above (note the slightly higher RCFAST figure: I've switched to a higher-capacity meter for these and retested RCFAST with it).
Thanks, Evanh. I don't know why, but I remember the difference being really big. It seems to make little difference, though.
Is it possible to add a test with Pgm1 running in all 8 cogs, to get an all-cog baseline number?
I don't think the clock gating is smart enough to shut down all the logic of not-yet-started cogs, so hopefully the current per added cog is not large.
PGM3 in all 8 COGS would bump the current quite a bit, given the added activity.
Any idea why V6063 is needed?
Has anybody tried this?
Ok, I see. I'm wondering if P2 can go to even lower power by removing all VIO. Perhaps powering them back up at intervals to check for the need to wake...
If you are not using any of the analog functions, I thought the static VIO overhead per small VIO group was single-digit µA (i.e. not significant)?
EDIT: Yep, I'm pretty certain I've got that one verified. With program #1, one cog running on the revB chip at 360 MHz uses 405 mA, while eight cogs use only 333 mA. That's pretty much the same as one cog at 300 MHz.
EDIT2: And on the revA chip they are practically the same: both almost 1 A, with the eight-cog program a few mA higher. I tried 180 MHz and it's the same story, both around 520 mA.
I guess the revA is not a surprise, but the revB chip sure is. It appears an idle cog uses less power than a stopped cog on the revB silicon.
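Putting a rough number on the revB result: using the 360 MHz figures reported above (405 mA with one cog running program #1, 333 mA with eight), the sketch below spreads the difference evenly over the seven extra cogs. That even split is an assumption made only for the arithmetic; it does not explain the mechanism, and `per_cog_delta_ma` is a hypothetical helper.

```python
def per_cog_delta_ma(one_cog_ma: float, all_cogs_ma: float, cogs_total: int = 8) -> float:
    """Average change in supply current per extra running cog.

    A negative result means each extra cog running the program *reduced*
    total current. Assumes the change is shared evenly by the extra cogs.
    """
    extra = cogs_total - 1
    if extra <= 0:
        raise ValueError("need at least two cogs")
    return (all_cogs_ma - one_cog_ma) / extra


delta = per_cog_delta_ma(405.0, 333.0)
print(f"{delta:.1f} mA per additional cog (revB, 360 MHz)")
```

On these numbers each cog that is started and idling appears to draw roughly 10 mA less than when it is left stopped.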