Testing current consumption while running primes.c - A comparison between P2 Eval RevA and RevB

samuell · 2019-11-23 13:49

So, I redid the low power consumption tests. The results are pretty much the same, except for the relative absence of noise. You can see the files attached, as usual.

Kind regards, Samuel Lourenço

evanh · 2019-11-23 14:03

Programs 2 and 3 are crafted to use 100% of the cordic capacity. Program 3 is additionally crafted to use 100% of hubram read bandwidth, all with random data so there isn't any idling of those data paths. Both effects are unrealistically brutal on circuits known to be power hungry.

I didn't try writing lots to hubram. Maybe that would be worse, dunno.

samuell · 2019-11-23 15:15

evanh wrote: »

Programs 2 and 3 are crafted to use 100% of the cordic capacity. Program 3 is additionally crafted to use 100% of hubram read bandwidth, all with random data so there isn't any idling of those data paths. Both effects are unrealistically brutal on circuits known to be power hungry.

I didn't try writing lots to hubram. Maybe that would be worse, dunno.

Ok, that might explain it. That probably would cause the first P2 chip to get quite hot.

Kind regards, Samuel Lourenço

evanh · 2019-11-23 15:25

By "first", do you mean the revA silicon? The revB result is more dramatic since it starts from a lower idle power. There is only about a 10% difference in power between revA and revB at the top end.

samuell · 2019-11-23 16:12

evanh wrote: »

By "first", do you mean the revA silicon?...

Yes.

evanh wrote: »

...The revB result is more dramatic since it starts from a lower idle power. There is only about a 10% difference in power between revA and revB at the top end.

But shouldn't the RevA silicon consume almost double the current? I mean, 672mA versus 365mA, having all eight cogs at 160mA.

Kind regards, Samuel Lourenço

samuell · 2019-11-23 21:41

Ok, I did another run using a slightly modified program that doesn't output individual numbers (code is attached). Since this program runs much faster when calculating small numbers, I had to use a different procedure: prime number calculations were first done between 1 and 1000000 (notice the extra zero), and then between 1000000000 and 1000100000.

I was expecting the current consumption to increase, since the program no longer has to wait for the serial port. In practice, the current measured at a given point was about the same, but the program obviously ran faster in both cases.

Kind regards, Samuel Lourenço

evanh · 2019-11-23 23:47

samuell wrote: »

But shouldn't the RevA silicon consume almost double the current? I mean, 672mA versus 365mA, having all eight cogs at 160mA.

That's the general expectation, yes. But it's actually workload dependent. When I said top end, I meant the top end of the prop2's capabilities. If a revB is being well utilised, then the bulk of the silicon is actively switching. It's lit up all over. The 50-60% power saving from clock gating almost vanishes. RevB still has the potential to run hot even if it doesn't in many tests.

samuell · 2019-11-24 00:02

evanh wrote: »

samuell wrote: »

But shouldn't the RevA silicon consume almost double the current? I mean, 672mA versus 365mA, having all eight cogs at 160mA.

That's the general expectation, yes. But it's actually workload dependent. When I said top end, I meant the top end of the prop2's capabilities. If a revB is being well utilised, then the bulk of the silicon is actively switching. It's lit up all over. The 50-60% power saving from clock gating almost vanishes. RevB still has the potential to run hot even if it doesn't in many tests.

I see. So clock gating reduces power consumption, and that is one of the differences between RevA and RevB silicon. Now I remember.

Kind regards, Samuel Lourenço

evanh · 2019-11-24 10:04

evanh wrote: »

Cluso99 wrote: »

Did you actually test the VIO to the P2 pins using up to ~6.5V?

Yep, only the once so far. Now that I've been reminded, I'm intending to have another shot at lower temperature. I suspect the curve will move left quite a ways.

Hmm, did some low temperature testing of this yesterday and things went weird. I might have to change what is powered because it clearly isn't making sense with my existing method.

One part of the weirdness was that the current draw drifted dramatically, presumably as the temperature changed inside the cooler bag that the Eval Board sat inside of. These measurements were somewhat arbitrarily chosen after I got fed up with how long I was waiting for the drifting to stop. By then the temperature of the circuit board was rising again.

Here's the result of yesterday's effort anyway ...
VIO quiescent(-8degC).png

jmg · 2019-11-24 18:43

evanh wrote: »

...Now that I've been reminded, I'm intending to have another shot at lower temperature. I suspect the curve will move left quite a ways.

Hmm, did some low temperature testing of this yesterday and things went weird. I might have to change what is powered because it clearly isn't making sense with my existing method.

One part of the weirdness was that the current draw drifted dramatically, presumably as the temperature changed inside the cooler bag that the Eval Board sat inside of.

I'd expect some parts of the curve to move more than others.
The upper/higher voltage area is more 'zener like' in nature, so that would not move so much.
The lower sub-uA zone would be more variable. That said, the first plot crosses the 1 uA threshold at 5.2 V, but this one crosses at 3.2 V?!
- Are those the same boards/pins under test? (Maybe this one is after a 6V+ stress test, showing permanent damage?)

It would be interesting to see the Xtal Osc and PLL Icc numbers (but not over such extreme voltage ranges).

samuell · 2019-11-24 21:45

Hi,

Now that the tests are pretty much done, I took the opportunity to compare the behavior with the P1 (P8X32A). I tested a board of mine, the Prop S, in a similar way. The board was already programmed, so the test basically consisted of plugging in the board and calculating primes from 1 to 100000 and from 1000000000 to 1000010000.

I did this because I was suspicious of the P2's strange behavior: its current consumption actually decreases during the calculations. The P1 shows the opposite behavior, as expected. My question is: why does the current consumption decrease when the P2 is calculating primes? The program is essentially the same, with some adaptations to work with the programming tools we have for the P2.

Kind regards, Samuel Lourenço

evanh · 2019-11-25 01:45

jmg wrote: »

The upper/higher voltage area is more 'zener like' in nature, so that would not move so much.
The lower sub-uA zone would be more variable. That said, the first plot crosses the 1 uA threshold at 5.2 V, but this one crosses at 3.2 V?!

- Are those the same boards/pins under test? (Maybe this one is after a 6V+ stress test, showing permanent damage?)

Yep, it's always the revA board since it is the only one with the jumpers for isolating the VIO supply pins. I've repeatedly retested in each of these testing rounds. The behaviour hasn't changed. They still function fine.

At low temperature even 2 V was able to draw hundreds of microamps at power-up. It almost seemed like a large capacitor being charged, with the current slowly dropping away, although the voltage didn't change. Then all of a sudden it would drop to the sub-uA area. So the variability was there.

jmg · 2019-11-25 01:59

evanh wrote: »

jmg wrote: »

The upper/higher voltage area is more 'zener like' in nature, so that would not move so much.
The lower sub-uA zone would be more variable. That said, the first plot crosses the 1 uA threshold at 5.2 V, but this one crosses at 3.2 V?!

- Are those the same boards/pins under test? (Maybe this one is after a 6V+ stress test, showing permanent damage?)

Yep, it's always the revA board since it is the only one with the jumpers for isolating the VIO supply pins. I've repeatedly retested in each of these testing rounds. The behaviour hasn't changed. They still function fine.

At low temperature even 2 V was able to draw hundreds of microamps at power-up. It almost seemed like a large capacitor being charged, with the current slowly dropping away, although the voltage didn't change. Then all of a sudden it would drop to the sub-uA area. So the variability was there.

Floating pins (or floating nodes) could give that symptom.
Do you define all the port pins before running these tests?

evanh · 2019-11-25 02:01

VDD = 0 Volts. I'm only powering the one VIO pair.

jmg · 2019-11-25 02:06

evanh wrote: »

VDD = 0 Volts. I'm only powering the one VIO pair.

Hmm, then at least pull-downs/pull-ups on the pins would be needed.
I'm not sure whether RESN does anything to the IO cells if VDD = 0?

samuell · 2019-11-25 11:32

I wonder if Chip (@cgracey) can answer my question about the weird behavior I'm seeing (as reported in post #41). I mean, what causes the P2 to consume less power when the program is doing actual calculations? The P2 consumes significantly more current when the program is waiting for user input. That doesn't make much sense.

Kind regards, Samuel Lourenço

cgracey · 2019-11-25 12:26

samuell wrote: »

I wonder if Chip (@cgracey) can answer my question about the weird behavior I'm seeing (as reported in post #41). I mean, what causes the P2 to consume less power when the program is doing actual calculations? The P2 consumes significantly more current when the program is waiting for user input. That doesn't make much sense.

Kind regards, Samuel Lourenço

I'm not sure, Samuel. It might be that more signals are toggling during the wait than during the calculations.

samuell · 2019-11-25 22:31

One can't deny it is counter-intuitive, Chip. Is there any chance that the CORDIC consumes less power than a cog doing the calculations, and that the CORDIC is indeed being used?

A per-element power consumption figure (say cog/CORDIC/IO) would be needed to analyze this further. One solution could be to simply decrease the clock frequency while the program is waiting for the user, but that may mess up the baud rate. Another solution would be to enable and disable cogs on demand, but on the P1, once a cog was disabled it couldn't be re-enabled. I wonder if that is the same on the P2?

Kind regards, Samuel Lourenço

cgracey · 2019-11-25 23:02

samuell wrote: »

One can't deny it is counter-intuitive, Chip. Is there any chance that the CORDIC consumes less power than a cog doing the calculations, and that the CORDIC is indeed being used?

A per-element power consumption figure (say cog/CORDIC/IO) would be needed to analyze this further. One solution could be to simply decrease the clock frequency while the program is waiting for the user, but that may mess up the baud rate. Another solution would be to enable and disable cogs on demand, but on the P1, once a cog was disabled it couldn't be re-enabled. I wonder if that is the same on the P2?

Kind regards, Samuel Lourenço

While you are in a GETQX/GETQY instruction, the CORDIC is busy, but the cog is stalled, cutting cog power. Maybe that's why you are seeing lower power during the math.

jmg · 2019-11-25 23:04

samuell wrote: »

One can't deny it is counter-intuitive, Chip. Is there any chance that the CORDIC consumes less power than a cog doing the calculations, and that the CORDIC is indeed being used?

A per-element power consumption figure (say cog/CORDIC/IO) would be needed to analyze this further. One solution could be to simply decrease the clock frequency while the program is waiting for the user, but that may mess up the baud rate. Another solution would be to enable and disable cogs on demand, but on the P1, once a cog was disabled it couldn't be re-enabled. I wonder if that is the same on the P2?

Ideally, any wait opcode would clock gate to the max and slash the power consumption by disabling much of the COG clock tree.
P1 does this well, but it was a manual design.
I think the automated clock gating that was done in P2 is not quite smart enough to 'see' that scope, so a wait will not show the same significant change.

Also note that code location is likely to affect current drain: an idle loop that straddles an address boundary will change more address bits, and that will change Icc.
I'm not sure of the magnitude of that effect in P2.

evanh · 2019-11-26 09:26

Samuell,
I've just run primes.c and, at a supply voltage of 5.2 V, I get around 151 mA while running the search loop ... but, amusingly, while waiting for keyboard input it draws 163.1 mA.

With the printf() removed and running the 1 to 10,000,000 range, the current draw slowly falls as the numbers get larger, ending up below 143 mA. So what little maths there is, is also diminishing. And as Chip said, with how it's written, the cog always pauses during each cordic operation.

It probably isn't possible to run them concurrently without turning the whole search loop into a single encompassing assembly function.

PS: My measurements don't include the USB load since I'm powering from AUX-5V rather than USB. The USB chip will add some milliAmps to your readings compared to mine.

samuell · 2019-11-26 10:54

cgracey wrote: »

While you are in a GETQX/GETQY instruction, the CORDIC is busy, but the cog is stalled, cutting cog power. Maybe that's why you are seeing lower power during the math.

That makes sense.

jmg wrote: »

Ideally, any wait opcode would clock gate to the max and slash the power consumption by disabling much of the COG clock tree.
P1 does this well, but it was a manual design.
I think the automated clock gating that was done in P2 is not quite smart enough to 'see' that scope, so a wait will not show the same significant change.

Also note that code location is likely to affect current drain: an idle loop that straddles an address boundary will change more address bits, and that will change Icc.
I'm not sure of the magnitude of that effect in P2.

This could be a suggestion to apply to a possible future chip, maybe the P3.

evanh wrote: »

Samuell,
I've just run primes.c and, at a supply voltage of 5.2 V, I get around 151 mA while running the search loop ... but, amusingly, while waiting for keyboard input it draws 163.1 mA.

With the printf() removed and running the 1 to 10,000,000 range, the current draw slowly falls as the numbers get larger, ending up below 143 mA. So what little maths there is, is also diminishing. And as Chip said, with how it's written, the cog always pauses during each cordic operation.

It probably isn't possible to run them concurrently without turning the whole search loop into a single encompassing assembly function.

PS: My measurements don't include the USB load since I'm powering from AUX-5V rather than USB. The USB chip will add some milliAmps to your readings compared to mine.

Yup, if we let "primes-silent.c" run from 1 to 100000000, for example, we should see an asymptote-like curve: the current decreases toward a certain value.

Kind regards, Samuel Lourenço

evanh · 2019-11-26 14:59

samuell wrote: »

jmg wrote: »

Ideally, any wait opcode would clock gate to the max and slash the power consumption by disabling much of the COG clock tree.
P1 does this well, but it was a manual design.
I think the automated clock gating that was done in P2 is not quite smart enough to 'see' that scope, so a wait will not show the same significant change.

Also note that code location is likely to affect current drain: an idle loop that straddles an address boundary will change more address bits, and that will change Icc.
I'm not sure of the magnitude of that effect in P2.

This could be a suggestion to apply to a possible future chip, maybe the P3.

It's already there and working fine. That's why the power is reduced at all. One cog's full share of the cordic (7 active stages) is hungrier than a fully occupied cog. You're only using one stage at a time, though.

PS: Chip's spiral demo makes good use of filling up the cordic pipeline to get the speed it has. One operation takes 55 clocks to come out the other end, but he's feeding a bunch of calculations through the cordic in parallel, in rapid fire.

The tricky part is that you have to be ready for the results as they become available. Otherwise the next result overwrites the previous one and it's lost. The final cordic result for each cog is retained indefinitely.

samuell · 2019-11-27 19:16

Hi evanh ( @evanh ),

I wanted to ask you how I can use your program to test the P2 at its maximum settings and processing power, to get the maximum dissipation possible. I'm curious to see how much heat it dissipates.

Kind regards, Samuel Lourenço

evanh · 2019-11-27 22:47

Use the constants at the top to choose which test program to run (BURN_TYPE), the number of cogs to use (COGS_TO_BURN) and what clock rate to run at (XMUL/XDIV). Then assemble and download as per normal.

So, for program #3, set BURN_TYPE to 3. For eight cogs, set COGS_TO_BURN to 8. For 360 MHz, set XMUL to 36.

Be warned: you can't do this powered only from the PC-USB socket. If your AUX-USB supply isn't strong enough, the board will power down, and you'll likely have to unplug the USB to recover.

evanh · 2019-11-27 23:06

LOL, oops, you also need to edit the clock setup in the source. It's currently set up to use RCSLOW only.

Change

{
		hubset	clk_mode			'setup PLL mode for new frequency (still operating at RCFAST)
		waitx	##25_000_000/100		'~10ms (at RCFAST) for PLL to stabilise
		or	clk_mode, #XSEL			'add PLL as the clock source select
		hubset	clk_mode			'engage!  Switch to newly set PLL
}
'		hubset	#0		'RCFAST (~20 MHz)
 		hubset	#1		'RCSLOW (~20 KHz)

to


		hubset	clk_mode			'setup PLL mode for new frequency (still operating at RCFAST)
		waitx	##25_000_000/100		'~10ms (at RCFAST) for PLL to stabilise
		or	clk_mode, #XSEL			'add PLL as the clock source select
		hubset	clk_mode			'engage!  Switch to newly set PLL

'		hubset	#0		'RCFAST (~20 MHz)
' 		hubset	#1		'RCSLOW (~20 KHz)

evanh · 2019-11-27 23:30

Just checked the 5 volt supply current and that test exceeds the 900 mA limit of USB3.0. So it's entering USB 3.1 territory.

rogloh · 2019-11-28 00:09

What's the 1.8V current in this test case? Can you measure that easily evanh?

evanh · 2019-11-28 00:11

https://forums.parallax.com/discussion/comment/1482622/#Comment_1482622

Almost 2 A, although I think it drops to below 1.9 A as the prop2 heats up. I never leave it running long, so I haven't really seen how bad it gets.

EDIT: Actually, without the inline current meter in place, the VDD volts and current are likely to be a little higher than measured. The volt measurements in that table are from downstream of the current meter while at 360 MHz.
