Propeller Current Consumption Oddities -- and a Conclusion — Parallax Forums


Phil Pilgrim (PhiPi) Posts: 23,514
edited 2009-01-12 02:50 in Propeller 1
A question came up in a different thread about why Spin programs draw more current than assembly programs (according to the graphs in the datasheet). One hypothesis was that the interpreter is always busy with hub interactions. My own hypothesis was the opposite: because of those hub interactions (and the consequent idle time spent waiting for the hub), Spin programs should consume less current, not more, than assembly programs that don't touch the hub. So I decided to see for myself.

The Propeller Demo Board has a handy removable jumper in the Vdd connection, which makes it easy to hook up an ammeter (or a low-ohm resistor and a voltmeter). Here are the programs I tried, along with their Vdd current draw as measured on the Demo Board:

A. Spin loop only (14.2mA)

CON

  _clkmode      = xtal1 + pll16x
  _xinfreq      = 5_000_000

PUB main

  repeat

B. Assembly loop only (10.1mA)

CON

  _clkmode      = xtal1 + pll16x
  _xinfreq      = 5_000_000

PUB main

  cognew(@loop, 0)              ' run the loop in a new cog; main then returns and cog 0 shuts down

DAT

loop  jmp       #loop

C. Assembly loop with RDLONG from RAM (10.1mA)

CON

  _clkmode      = xtal1 + pll16x
  _xinfreq      = 5_000_000

PUB main

  cognew(@loop, 0)

DAT

loop    rdlong    x,addr
        jmp       #loop

addr    long      $1000             ' an address in hub RAM

x       res       1

D. Assembly loop with RDLONG from ROM (10.7mA)

CON

  _clkmode      = xtal1 + pll16x
  _xinfreq      = 5_000_000

PUB main

  cognew(@loop, 0)

DAT

loop    rdlong    x,addr
        jmp       #loop

addr    long      $8000             ' hub ROM starts at $8000

x       res       1

E. Assembly loop with RDLONG from ROM + NOP (13.4mA)

CON

  _clkmode      = xtal1 + pll16x
  _xinfreq      = 5_000_000

PUB main

  cognew(@loop, 0)

DAT

loop    rdlong    x,addr
        nop                         ' pads the loop to one 16-clock hub cycle (the "sweet spot")
        jmp       #loop

addr    long      $8000

x       res       1

Observations so far:

1. Assembly programs require less current than Spin programs, confirming the datasheet.

2. It takes more current to read from ROM than from RAM (D vs. C).

3. Programs that hit the hub "sweet spot" (E) require more current than programs that wait for hub access (D).

4. Programs that hit the hub "sweet spot" (E) require more current than programs that don't access the hub (B).
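The comparisons behind observations 1 to 4 are simple differences between the readings above. Here is a minimal Python sketch that tabulates them; the dictionary only restates the measured values (no new data), and the names `measured_ma`, `delta_ma` and `comparisons` are illustrative.

```python
# Vdd currents (mA) for programs A to E, exactly as reported above
# (Propeller Demo Board, 5 MHz crystal with pll16x = 80 MHz system clock).
measured_ma = {
    "A": 14.2,  # Spin: empty repeat loop
    "B": 10.1,  # assembly: jmp only
    "C": 10.1,  # assembly: rdlong from hub RAM ($1000), jmp
    "D": 10.7,  # assembly: rdlong from hub ROM ($8000), jmp
    "E": 13.4,  # assembly: rdlong from hub ROM, nop, jmp (hub "sweet spot")
}


def delta_ma(a: str, b: str) -> float:
    """Current of program a minus program b, rounded to the 0.1 mA of the readings."""
    return round(measured_ma[a] - measured_ma[b], 1)


# (description, program, baseline) for each pairwise comparison in observations 1 to 4
comparisons = [
    ("Spin loop vs. bare assembly loop", "A", "B"),
    ("ROM read vs. RAM read", "D", "C"),
    ("sweet spot vs. waiting for the hub", "E", "D"),
    ("sweet spot vs. no hub access", "E", "B"),
]

for label, a, b in comparisons:
    print(f"{label} ({a} - {b}): {delta_ma(a, b):+.1f} mA")
```

All four differences come out positive: +4.1, +0.6, +2.7 and +3.3 mA respectively.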

Now I know Chip writes some amazingly tight code. But how can his interpreter draw more current than an assembly loop that hits the hub sweet spot every time (reading from ROM, no less)?

This required more testing. Maybe some instructions require more current than others. After all, NOPs and JMPs are pretty much "free" instructions, effortwise. So I added an ADD to the loop in B. Aha!

F. Assembly loop + ADD (15.1mA)

CON

  _clkmode      = xtal1 + pll16x
  _xinfreq      = 5_000_000

PUB main

  cognew(@loop, 0)
  'cognew(@loop, 0)

DAT

loop    add       x,#0              ' real ALU work, even though x never changes
        jmp       #loop

addr    long      $8000

x       res       1

G. Assembly loop with RDLONG from ROM + ADD (14.3mA)

CON

  _clkmode      = xtal1 + pll16x
  _xinfreq      = 5_000_000

PUB main

  cognew(@loop, 0)
  'cognew(@loop, 0)

DAT

loop    rdlong    x,addr
        add       x,#0
        jmp       #loop

addr    long      $8000

x       res       1

Conclusions:

Okay, now we're comparing apples with apples and can revise a couple of observations:

1. Assembly programs require less current than Spin programs, confirming the datasheet. See #5.

2. It takes more current to read from ROM than from RAM (D vs. C).

3. Programs that hit the hub "sweet spot" (E) require more current than programs that wait for hub access (D).

4. Programs that hit the hub "sweet spot" (E) require more current than programs that don't access the hub (B). See #6.

5. Realistic assembly programs (F) require just as much current as Spin programs (A), if not more.

6. Hub accesses decrease the average current requirements due to the waiting time, even the single-cycle wait when hitting the "sweet spot" (G vs. F).
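Observation 6 and the F/G comparison can be made a little more concrete with a rough timing model. This is a hypothetical Python sketch, not something measured in the thread, and it rests on assumptions stated in its comments: every ordinary instruction takes 4 clocks, an RDLONG that arrives on its cog's hub window takes 8 clocks, and that window comes around every 16 clocks. The function names are illustrative only.

```python
import math

# Assumed Propeller 1 timing; these figures are inputs to the model, not measurements.
CLKFREQ_HZ = 5_000_000 * 16  # _xinfreq = 5 MHz with pll16x -> 80 MHz
ORDINARY_CLOCKS = 4          # ADD, NOP, JMP, ...
HUB_MIN_CLOCKS = 8           # RDLONG that arrives exactly on its hub window
HUB_WINDOW_CLOCKS = 16       # each cog gets one hub slot every 16 clocks


def hub_loop_clocks(n_ordinary: int) -> int:
    """Clocks per iteration of a loop holding one RDLONG plus n ordinary
    instructions, once the loop has locked onto its hub window."""
    busy = HUB_MIN_CLOCKS + ORDINARY_CLOCKS * n_ordinary
    return HUB_WINDOW_CLOCKS * math.ceil(busy / HUB_WINDOW_CLOCKS)


def hub_stall_clocks(n_ordinary: int) -> int:
    """Clocks per iteration the RDLONG waits beyond its 8-clock minimum."""
    return hub_loop_clocks(n_ordinary) - (HUB_MIN_CLOCKS + ORDINARY_CLOCKS * n_ordinary)


def executions_per_second(loop_clocks: int) -> float:
    """How many times per second each instruction in the loop body runs."""
    return CLKFREQ_HZ / loop_clocks


loops = {
    "D: rdlong, jmp": hub_loop_clocks(1),
    "E: rdlong, nop, jmp": hub_loop_clocks(2),
    "F: add, jmp": 2 * ORDINARY_CLOCKS,  # no hub access at all
    "G: rdlong, add, jmp": hub_loop_clocks(2),
}

for name, clocks in loops.items():
    rate_m = executions_per_second(clocks) / 1e6
    print(f"{name:20s} {clocks:2d} clocks/iteration, {rate_m:4.1f} M iterations/s")

print("RDLONG stall clocks per iteration: D =", hub_stall_clocks(1), ", E and G =", hub_stall_clocks(2))
```

Under these assumptions D's RDLONG stalls 4 clocks per iteration while E and G stall none, and F executes its ADD 10 million times per second against G's 5 million. So part of the drop from F (15.1mA) to G (14.3mA) may simply be that the hub access dilutes the ADDs, which is consistent with observation 6.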

So there you have it. It all makes sense now, at least to me. I think the datasheet is a little misleading in this regard, since it gives the impression that Spin programs are more current-consumptive than assembly programs. But this is only because the example program (JMP only) cited is not doing any real work, whereas the Spin interpreter is. So the difference, it turns out, has little to do with hub accesses.

-Phil

Post Edited (Phil Pilgrim (PhiPi)) : 1/11/2009 9:10:05 AM GMT

Comments

  • mpark Posts: 1,305
    edited 2009-01-11 08:44
    Nicely done!
  • heater Posts: 3,370
    edited 2009-01-11 08:48
    Seems clear to me. Doing something like ADD wiggles more transistors and so sucks more switching current. May even depend on the number of ones within the operands for example or whether a carry is propagated a long way. Think what a multiply would do.

    Are you up to testing the current consumption of instructions other than ADD whilst you're at it? Or ADD/SUB with different operand values? Just curious.

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    For me, the past is not over yet.

    Post Edited (heater) : 1/11/2009 8:55:02 AM GMT
  • Phil Pilgrim (PhiPi) Posts: 23,514
    edited 2009-01-11 08:58
    Heater,

    Maybe, if I have time tomorrow, I could test some other instructions. The main discrepancy that was bugging me has been resolved, though, so I may just leave it and move on.

    -Phil
  • Cluso99 Posts: 18,069
    edited 2009-01-11 09:02
    Brilliant observations, and what results! :)

    Now, in hindsight, I guess I am not surprised.

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    Prop Tools under Development or Completed (Index)
    http://forums.parallax.com/showthread.php?p=753439

    My cruising website http://www.bluemagic.biz
  • Ariba Posts: 2,687
    edited 2009-01-11 15:34
    Thank you Phil

    I asked this question in the forum, when the first datasheet was released, but got no answer.

    I must say that the Assembly current consumption curve in the datasheet has no relevance, in this case !

    Andy
  • kwinn Posts: 8,697
    edited 2009-01-11 16:22
    Nicely done, Phil. I always knew doing real work took more energy than faking it, and you have verified it... at least for CPUs!
  • QuattroRS4 Posts: 916
    edited 2009-01-11 18:56
    Nicely broken down, Phil! ... Now that it has been demystified with measured results, 'Oddities' can be removed from the thread title! lol

    Regards,
    John

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    'Necessity is the mother of invention'

    Those who can, do. Those who can’t, teach.
  • Paul Baker Posts: 6,351
    edited 2009-01-11 20:27
    Ariba said...

    I must say that the Assembly current consumption curve in the datasheet has no relevance, in this case !

    Why do you say this, Andy? It demonstrates the case where there is no wait state, no hub access, and no I/O. This represents the floor current consumption for a cog running assembly that never enters a wait state.

    I think people are expecting too much from the datasheet. As Phil's experiments have shown, current consumption is highly context-dependent: you will get different results depending on what you do, and this isn't even taking into account I/O driving loads of varying impedance. The datasheet only attempts to provide a generalized picture of current consumption through a few select cases.

    Arriving at the level of detail where all mysteries are resolved would require calculating how many coulombs each and every instruction takes (with hub accesses having many entries, depending on how many clock cycles were used). But who would really use this information, whose use would require summing a series hundreds of entries long and then dividing by execution time, when all you have to do is what Phil did: measure the real-world system in action?

    BTW nice job Phil.

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    Paul Baker


    Post Edited (Paul Baker) : 1/11/2009 9:02:07 PM GMT
  • Tracy Allen Posts: 6,662
    edited 2009-01-11 20:43
    3 or 4 mA added to the base of 10 is not much, but significant. Curious minds want to know. Maybe the ALU is sleeping until needed. Looking at the die photo, I don't see the COG ALUs explicitly circled. Is the ALU a separate entity within each COG? 4 mA is a lot for one addition! If the loop is synced to a scope to monitor current, does it peak when the addition is executing, as Paul intimated, as coulombs per instruction, or does the mere presence of an addition in the loop bump up the current overall? How about the effect of a simple i/o pin access? Sorry, idle speculation. Thanks for insightful testing, Phil!

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    Tracy Allen
    www.emesystems.com
  • Paul Baker Posts: 6,351
    edited 2009-01-11 20:53
    While I can't directly answer your question, Tracy, I can say that Chip has used gated clocks in quite a few circumstances; this is how he arrives at low power consumption values. While an ALU has no clocked signals inside itself, it is quite possible that the clocks to the dual-port memory are gated. That would account for a larger-than-expected difference in power consumption, since the clock cycles in which the dual-port memory isn't accessed effectively put it to sleep.

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    Paul Baker
  • Phil Pilgrim (PhiPi) Posts: 23,514
    edited 2009-01-11 21:59
    Tracy,

    It would be hard to measure instruction-by-instruction current variations without removing all the bypass caps. Even then, any residual inductance would tend to blur the demarcations between instructions.

    It looks like the determining factor for invoking the ALU is complicated by the conditionals. Here are some example code fragments:

    H. Loop with NOP (11.8mA)

    loop          nop 
                  jmp       #loop

    I. Loop with ADD zero (15.5mA)

    loop          add       x,#0 
                  jmp       #loop

    J. Loop with ADD one (15.7 - 16.1mA)

    loop          add       x,#1 
                  jmp       #loop

    K. Loop with ADD zero not performed (14.0mA)

    loop if_never add       x,#0 
                  jmp       #loop

    L. Loop with ADD #1 not performed (14.4mA)

    loop if_never add       x,#1 
                  jmp       #loop

    M. Loop with ADD #511 not performed (15.6mA)

    loop if_never add       x,#511
                  jmp       #loop

    N. Loop with ADD $8000_0000 (15.9mA)

    loop          add       x,adder                 'adder is long $8000_0000
                  jmp       #loop

    O. Loop with ADD $8000_0000 not performed (14.2mA)

    loop if_never add       x,adder                 'adder long $8000_0000 
                  jmp       #loop

    P. Loop with ADD $8000_0000 nr (14.4mA)

    loop          add       x,adder nr              'adder long $8000_0000 
                  jmp       #loop

    Q. Loop with ADD $8000_0000 not performed and nr (14.mA)

    loop if_never add       x,adder nr              'adder long $8000_0000 
                  jmp       #loop

    Conclusions:

    7. There's more happening when a conditional is not met than just substituting a NOP. (Strictly speaking, any instruction with if_never is a NOP, but the NOP the assembler emits happens to be one with low current consumption.) My guess is that the instruction is performed, but with writes to the result, flags, and program counter blocked.

    8. Current consumption depends on the number of ones in the instruction's source field.

    9. Current consumption varies with the instruction's result value.

    In general, it looks like determining the average current consumption of a given program to any degree of precision a priori would be an exercise in frustration.

    -Phil

    Post Edited (Phil Pilgrim (PhiPi)) : 1/12/2009 3:09:58 AM GMT
  • Cluso99 Posts: 18,069
    edited 2009-01-12 02:50
    Are you digging for gold, Phil? Really great detective work here. No doubt Chip is scratching his head!

    I am playing with an FPGA doing a Cog emulation, and in the 2nd cycle I decode the cccc condition field against the C and Z flags to determine whether the instruction will be executed. If not, it is turned into a NOP. Your tests (as you state) imply that the Prop performs the operation and just prevents the update.

    Chip: this would be a good way to save current on the Prop II (if you haven't already). I've also worked out (in theory) how to have no penalty cycles in non-taken jumps.

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    Prop Tools under Development or Completed (Index)
    http://forums.parallax.com/showthread.php?p=753439

    My cruising website http://www.bluemagic.biz
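Paul Baker's coulombs-per-instruction framing can be tied to Phil's follow-up readings with a little arithmetic. This Python sketch assumes each fragment runs as a two-instruction, 8-clock loop at 80 MHz (100 ns per iteration) and treats the readings as whole-chip Vdd currents, so the charge figures include everything else the chip draws. It converts each reading to charge per iteration (Q = I * t) and lines up the three immediate-operand if_never cases against the number of ones in their 9-bit source field (observation 8). For N to P the source field holds the register address of adder rather than $8000_0000, so those are left out of the ones count; J is omitted because it was reported as a range, and Q because its reading is incomplete. All names in the sketch are illustrative.

```python
CLKFREQ_HZ = 5_000_000 * 16              # _xinfreq = 5 MHz with pll16x -> 80 MHz
LOOP_CLOCKS = 2 * 4                      # assumed: two 4-clock instructions per iteration
LOOP_SECONDS = LOOP_CLOCKS / CLKFREQ_HZ  # 100 ns per iteration

# Whole-chip Vdd readings (mA) from the follow-up post. J is left out because it
# was reported as a range (15.7 - 16.1 mA); Q because its reading is incomplete.
readings_ma = {
    "H": 11.8,  # nop
    "I": 15.5,  # add x,#0
    "K": 14.0,  # if_never add x,#0
    "L": 14.4,  # if_never add x,#1
    "M": 15.6,  # if_never add x,#511
    "N": 15.9,  # add x,adder            (adder long $8000_0000)
    "O": 14.2,  # if_never add x,adder
    "P": 14.4,  # add x,adder nr
}

# Immediate values held in the 9-bit source field of fragments K, L and M.
immediate_source = {"K": 0, "L": 1, "M": 511}


def charge_per_iteration_nc(current_ma: float) -> float:
    """Average charge drawn per loop iteration, Q = I * t, in nanocoulombs."""
    return current_ma * 1e-3 * LOOP_SECONDS * 1e9


def source_field_ones(value: int) -> int:
    """Number of one bits in a 9-bit PASM source field."""
    return bin(value & 0x1FF).count("1")


for name, ma in readings_ma.items():
    print(f"{name}: {ma:4.1f} mA -> {charge_per_iteration_nc(ma):.3f} nC per iteration")

# Extra charge per iteration when H's NOP is replaced by I's ADD.
extra_nc = charge_per_iteration_nc(readings_ma["I"]) - charge_per_iteration_nc(readings_ma["H"])
print(f"add x,#0 instead of nop: about {extra_nc:.2f} nC more per iteration")

for name, src in immediate_source.items():
    print(f"{name}: {source_field_ones(src)} one bit(s) in the source field, {readings_ma[name]} mA")
```

On these assumptions, swapping the NOP for an ADD costs roughly 0.37 nC per 100 ns iteration, and the three if_never readings rise along with the count of ones (0, 1, 9), though three points are hardly enough for a model.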