
Propeller Current Consumption Oddities -- and a Conclusion



Phil Pilgrim (PhiPi)
01-11-2009, 03:49 PM
A question came up in a different thread about why Spin programs draw more current than assembly programs (according to the graphs in the datasheet). One hypothesis had it that it was because the interpreter was always busy with hub interactions. I hypothesized that, because of the hub interactions (and the consequent idle time waiting for the hub), Spin programs should consume less current, not more, than non-hub-interacting assembly programs. So I decided to see for myself.

The Propeller Demo Board has a handy jumper that can be removed that breaks the Vdd connection for hooking up an ammeter (or low-ohmic resistor and voltmeter). Here are the programs I tried, along with their Vdd current draw, as measured on the Demo Board:

A. Spin loop only (14.2mA)




CON

  _clkmode = xtal1 + pll16x     ' 5 MHz crystal x 16 PLL = 80 MHz system clock
  _xinfreq = 5_000_000

PUB main

  repeat                        ' empty Spin loop: only the interpreter is running




B. Assembly loop only (10.1mA)




CON

  _clkmode = xtal1 + pll16x
  _xinfreq = 5_000_000

PUB main

  cognew(@loop, 0)              ' start the assembly loop in a new cog

DAT

loop          jmp     #loop     ' tight loop, no hub access




C. Assembly loop with RDLONG from RAM (10.1mA)




CON

  _clkmode = xtal1 + pll16x
  _xinfreq = 5_000_000

PUB main

  cognew(@loop, 0)

DAT

loop          rdlong  x,addr    ' hub read on every pass
              jmp     #loop

addr          long    $1000     ' hub RAM address ($0000-$7FFF is RAM)

x             res     1




D. Assembly loop with RDLONG from ROM (10.7mA)




CON

  _clkmode = xtal1 + pll16x
  _xinfreq = 5_000_000

PUB main

  cognew(@loop, 0)

DAT

loop          rdlong  x,addr    ' hub read on every pass
              jmp     #loop

addr          long    $8000     ' hub ROM address ($8000-$FFFF is ROM)

x             res     1




E. Assembly loop with RDLONG from ROM + NOP (13.4mA)




CON

  _clkmode = xtal1 + pll16x
  _xinfreq = 5_000_000

PUB main

  cognew(@loop, 0)

DAT

loop          rdlong  x,addr
              nop               ' pads the loop so RDLONG hits the hub "sweet spot"
              jmp     #loop

addr          long    $8000     ' hub ROM address

x             res     1




Observations so far:

    1. Assembly programs require less current than Spin programs, confirming the datasheet.

    2. It takes more current to read from ROM (D) than from RAM (C).

    3. Programs that hit the hub "sweet spot" (E) require more current than programs that wait for hub access (D).

    4. Programs that hit the hub "sweet spot" (E) require more current than programs that don't access the hub (B).

Now I know Chip writes some amazingly tight code. But how can his interpreter draw more current than an assembly loop that hits the hub sweet spot every time (reading from ROM, no less)?

This required more testing. Maybe some instructions require more current than others. After all, NOPs and JMPs are pretty much "free" instructions, effortwise. So I added an ADD to the loop in B. Aha!

F. Assembly loop + ADD (15.1mA)




CON

  _clkmode = xtal1 + pll16x
  _xinfreq = 5_000_000

PUB main

  cognew(@loop, 0)
  'cognew(@loop, 0)

DAT

loop          add     x,#0
              jmp     #loop

addr          long    $8000

x             res     1




G. Assembly loop with RDLONG from ROM + ADD (14.3mA)




CON

  _clkmode = xtal1 + pll16x
  _xinfreq = 5_000_000

PUB main

  cognew(@loop, 0)
  'cognew(@loop, 0)

DAT

loop          rdlong  x,addr
              add     x,#0
              jmp     #loop

addr          long    $8000

x             res     1




Conclusions:

Okay, now we're comparing apples with apples and can revise a couple of observations:

    1. Assembly programs require less current than Spin programs, confirming the datasheet. See #5.

    2. It takes more current to read from ROM (D) than from RAM (C).

    3. Programs that hit the hub "sweet spot" (E) require more current than programs that wait for hub access (D).

    4. Programs that hit the hub "sweet spot" (E) require more current than programs that don't access the hub (B). See #6.

    5. Realistic assembly programs require just as much current as Spin programs, if not more.

    6. Hub accesses decrease the average current requirements due to the waiting time (even the single-cycle wait when hitting the "sweet spot").

So there you have it. It all makes sense now, at least to me. I think the datasheet is a little misleading in this regard, since it gives the impression that Spin programs are more current-consumptive than assembly programs. But this is only because the example program (JMP only) cited is not doing any real work, whereas the Spin interpreter is. So the difference, it turns out, has little to do with hub accesses.
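For readers who want the numbers side by side, here is a minimal Python sketch that tabulates the measurements quoted above against the JMP-only baseline (B). The readings are the ones from this post; the names `measurements_ma`, `baseline` and `deltas` are made up for the illustration.

```python
# Vdd current readings (mA) quoted in the post, keyed by test label.
measurements_ma = {
    "A": 14.2,  # Spin loop only
    "B": 10.1,  # assembly loop only (JMP)
    "C": 10.1,  # RDLONG from RAM
    "D": 10.7,  # RDLONG from ROM
    "E": 13.4,  # RDLONG from ROM + NOP
    "F": 15.1,  # ADD + JMP
    "G": 14.3,  # RDLONG from ROM + ADD
}

baseline = measurements_ma["B"]

# Difference of each test from the JMP-only baseline, rounded to 0.1 mA.
deltas = {label: round(ma - baseline, 1) for label, ma in measurements_ma.items()}

for label in sorted(measurements_ma):
    print(f"{label}: {measurements_ma[label]:5.1f} mA  ({deltas[label]:+.1f} mA vs. B)")
```

Laid out this way, F (ADD, no hub access) sits above both A (Spin) and G (the same ADD plus a hub read), which is what conclusions 5 and 6 say.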

-Phil

Post Edited (Phil Pilgrim (PhiPi)) : 1/11/2009 9:10:05 AM GMT

mpark
01-11-2009, 04:44 PM
Nicely done!

heater
01-11-2009, 04:48 PM
Seems clear to me. Doing something like ADD wiggles more transistors and so sucks more switching current. May even depend on the number of ones within the operands for example or whether a carry is propagated a long way. Think what a multiply would do.

Are you up to testing current consumption of instructions other than ADD whilst you're at it? Or ADD/SUB with different operand values. Just curious.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
For me, the past is not over yet.

Post Edited (heater) : 1/11/2009 8:55:02 AM GMT

Phil Pilgrim (PhiPi)
01-11-2009, 04:58 PM
Heater,

Maybe, if I have time tomorrow, I could test some other instructions. The main discrepancy that was bugging me has been resolved, though, so I may just leave it and move on.

-Phil

Cluso99
01-11-2009, 05:02 PM
Brilliant observations and what results :)

Now, in hindsight, I guess I am not surprised.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Prop Tools under Development or Completed (Index)
http://forums.parallax.com/showthread.php?p=753439

My cruising website http://www.bluemagic.biz

Ariba
01-11-2009, 11:34 PM
Thank you Phil

I asked this question in the forum, when the first datasheet was released, but got no answer.

I must say that the Assembly current consumption curve in the datasheet has no relevance in this case!

Andy

kwinn
01-12-2009, 12:22 AM
Nicely done, Phil. I always knew doing real work took more energy than faking it, and you have verified it... at least for CPUs!

QuattroRS4
01-12-2009, 02:56 AM
Nicely Broken down Phil ! ... Now as it has been demystified - with measured results 'Oddities' can be removed from the thread title ! lol

Regards,
John

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
'Necessity is the mother of invention'

Those who can, do. Those who can’t, teach.

Paul Baker
01-12-2009, 04:27 AM
Ariba said...

I must say that the Assembly current consumption curve in the datasheet has no relevance, in this case !


Why do you say this Andy? It demonstrates the case where there is no wait state, no hub access, and no I/O. This represents a floor current consumption for a cog running assembly which never enters a wait state.

I think people are expecting too much from the datasheet. As Phil's experiments have shown, measured current consumption depends heavily on what the chip is doing. You will get different results depending on what you do, and this isn't even taking into account I/O with varying impedance loads. The datasheet only attempts to provide a generalized picture of current consumption through a few select cases.

Arriving at the level of detail where all mysteries are resolved would require calculating how many coulombs each and every instruction takes (with hub accesses having many entries depending on how many clock cycles were used). But who would really use this information (whose use would require computing a summation series hundreds of entries long and then dividing by execution time) when all you have to do is repeat Phil's real-world measurement of the system in action?
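To make the arithmetic concrete, here is a minimal Python sketch of that summation: add up the charge of every executed instruction, then divide by the execution time. No per-instruction charge figures exist in this thread, so the values in `example_trace` are hypothetical placeholders and `average_current_ma` is a name made up for the illustration. The 80 MHz clock matches Phil's test programs (5 MHz crystal, pll16x).

```python
CLOCK_HZ = 80_000_000  # 5 MHz crystal x 16 PLL, as in the test programs


def average_current_ma(trace, clock_hz=CLOCK_HZ):
    """Average current in mA for a sequence of executed instructions.

    trace: list of (charge_in_coulombs, clock_cycles) pairs, one pair per
    executed instruction. The charge values are assumed inputs; nothing
    in this thread supplies them.
    """
    total_charge = sum(charge for charge, _ in trace)            # coulombs
    total_time = sum(cycles for _, cycles in trace) / clock_hz   # seconds
    return 1000.0 * total_charge / total_time                    # A -> mA


# Hypothetical two-instruction loop, 4 clocks each (not measured data):
# 0.5 nC in 50 ns is 10 mA, and 1.0 nC in 50 ns is 20 mA.
example_trace = [(0.5e-9, 4), (1.0e-9, 4)]
print(round(average_current_ma(example_trace), 3))
```

A real program would need one such entry for every instruction it executes, and a separate entry for each possible hub wait length, which is why the bench measurement is the practical route.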

BTW nice job Phil.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Paul Baker (mailto:pbaker@parallax.com)


Post Edited (Paul Baker) : 1/11/2009 9:02:07 PM GMT

Tracy Allen
01-12-2009, 04:43 AM
3 or 4 mA added to the base of 10 is not much, but significant. Curious minds want to know. Maybe the ALU is sleeping until needed. Looking at the die photo (http://forums.parallax.com/attachment.php?attachmentid=49689), I don't see the COG ALUs explicitly circled. Is the ALU a separate entity within each COG? 4 mA is a lot for one addition! If the loop is synced to a scope to monitor current, does it peak when the addition is executing, as Paul intimated, as coulombs per instruction, or does the mere presence of an addition in the loop bump up the current overall? How about the effect of a simple i/o pin access? Sorry, idle speculation. Thanks for insightful testing, Phil!

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Tracy Allen
www.emesystems.com (http://www.emesystems.com)

Paul Baker
01-12-2009, 04:53 AM
While I can't directly answer your question, Tracy, I can say that Chip has used gated clocks in quite a few circumstances. This is how he arrives at low power consumption values. While an ALU does not have any clocked signals inside itself, it is quite possible that the clocks to the dual-port memory are being gated. That would account for a larger than expected difference in power consumption, since on the clock cycles where the dual-port memory isn't accessed it is effectively put to sleep.


Phil Pilgrim (PhiPi)
01-12-2009, 05:59 AM
Tracy,

It would be hard to measure instruction-by-instruction current variations without removing all the bypass caps. Even then, any residual inductance would tend to blur the demarcations between instructions.

It looks like the determining factor for invoking the ALU gets complicated by the conditionals. Here are some example code fragments:

H. Loop with NOP (11.8mA)




loop              nop
                  jmp     #loop




I. Loop with ADD zero (15.5mA)




loop              add     x,#0
                  jmp     #loop




J. Loop with ADD one (15.7 - 16.1mA)




loop              add     x,#1
                  jmp     #loop




K. Loop with ADD zero not performed (14.0mA)




loop    if_never  add     x,#0
                  jmp     #loop




L. Loop with ADD #1 not performed (14.4mA)




loop    if_never  add     x,#1
                  jmp     #loop




M. Loop with ADD #511 not performed (15.6mA)




loop    if_never  add     x,#511
                  jmp     #loop




N. Loop with ADD $8000_0000 (15.9mA)




loop              add     x,adder         'adder is long $8000_0000
                  jmp     #loop




O. Loop with ADD $8000_0000 not performed (14.2mA)




loop    if_never  add     x,adder         'adder long $8000_0000
                  jmp     #loop




P. Loop with ADD $8000_0000 nr (14.4mA)




loop              add     x,adder nr      'adder long $8000_0000
                  jmp     #loop




Q. Loop with ADD $8000_0000 not performed and nr (14.mA)




loop    if_never  add     x,adder nr      'adder long $8000_0000
                  jmp     #loop




Conclusions:

    7. There's more happening when a conditional is not met than just substituting a NOP. (Strictly speaking, any instruction with if_never is a NOP, but the one the assembler inserts for the NOP mnemonic has low current consumption.) My guess is that the instruction is still performed, but with writes to the result, flags, and program counter blocked.

    8. Current consumption depends on the number of ones in the instruction's source field (compare K, L, and M).

    9. Current consumption varies with the instruction's result value.

In general, it looks like determining the average current consumption of a given program to any degree of precision a priori would be an exercise in frustration.
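As a quick illustration of conclusion 8, here is a minimal Python sketch that counts the ones in the 9-bit immediate source field for tests K, L and M and prints them next to the currents reported above. The helper name `ones_in_source` is made up for the illustration. It covers only the immediate-operand cases, because in N through Q the source field holds the register address of `adder` rather than its value.

```python
def ones_in_source(value, width=9):
    """Count the set bits in a PASM immediate source field (9 bits wide)."""
    mask = (1 << width) - 1
    return bin(value & mask).count("1")


# (test label, immediate operand, measured mA) taken from the fragments above.
skipped_adds = [("K", 0, 14.0), ("L", 1, 14.4), ("M", 511, 15.6)]

for label, immediate, ma in skipped_adds:
    print(f"{label}: #{immediate:<3} -> {ones_in_source(immediate)} ones, {ma} mA")
```

The count goes 0, 1, 9 while the reading goes 14.0, 14.4, 15.6 mA, so the trend is monotonic across these three points, though three points are far too few to fit a per-bit cost.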

-Phil

Post Edited (Phil Pilgrim (PhiPi)) : 1/12/2009 3:09:58 AM GMT

Cluso99
01-12-2009, 10:50 AM
Are you digging for gold Phil? Really great detective work here. No doubt Chip is scratching his head !

I am playing with an FPGA doing a Cog emulation and in the 2nd cycle I have decoded the cccc and c & z flags to determine if the instruction will be executed. If not, then this translates into a nop. From your tests (as you state), it implies that the prop performs the operation and prevents the update.

Chip: good idea to save current on the propII (if you haven't already). I've also worked out (in theory) how to have no penalty cycles in non-taken jumps.
