P2ME2 - (was Bypass capacitors needed for P2)

rogloh · 2021-09-08 07:14

Back to P2ME2 HW finally...

I'm seeing some strange behavior with some testing I did today. Looks like I might have a thermal and/or power issue here.

I wanted to run a supply torture test by spawning some SPIN2 COGs that just toggled different pin groups while operating at P2 clock frequencies of over 200MHz. After a few minutes of doing this with just 2 COGs, the P2 stopped working and the 3.3V supply current dropped back to 28mA or so, then later down to 9mA, which is probably just the LED on my board (so I am guessing the P2 chip first went into reset, then sleep). Until it stopped, the P2 was getting warm to the touch, with the total 3.3V supply current measured at about 250mA. Even if most of this current went to the 1.8V switcher and the regulator was ~90% efficient, that would only be about 410mA on the 1.8V line, or roughly 750mW drawn by the P2 core. It seems a little surprising that this would create enough heat to shut down the P2.
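The estimate above can be checked with a few lines of arithmetic. This is only a sketch: the 90% efficiency is the guess from the post, not a measured value, and it assumes every milliamp of the 3.3V input feeds the 1.8V switcher (in practice some of it drives the I/O pins and the LED).

```python
# Rough power budget for the figures quoted in the post.
V_IN = 3.3          # volts, board input rail
I_IN = 0.250        # amps, measured total 3.3V current
EFFICIENCY = 0.90   # ASSUMED buck converter efficiency (guess from the post)
V_CORE = 1.8        # volts, P2 core rail


def core_budget(v_in, i_in, efficiency, v_core):
    """Return (core_power_watts, core_current_amps) if all input power fed the switcher."""
    p_in = v_in * i_in
    p_core = p_in * efficiency
    return p_core, p_core / v_core


p_core, i_core = core_budget(V_IN, I_IN, EFFICIENCY, V_CORE)
print(f"input power : {V_IN * I_IN * 1000:.1f} mW")
print(f"core power  : {p_core * 1000:.1f} mW")
print(f"core current: {i_core * 1000:.1f} mA")
```

This gives an upper bound on core power for that test, since any 3.3V current that bypasses the switcher reduces it.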

I probed the DC supply voltages with a multimeter and it looks like the 1.8V regulator supply keeps up under load and remains stable at 1.8VDC across the bypass caps. I should have added a better test point for this voltage rail to make it easier to probe (lesson learned). I'll also try to scope it while under load when I get a better test setup on the bench to see if I observe any other nasty effects. There is a PG output on the regulator which can also potentially reset the P2 so I need to probe that too. If that has nothing to do with it then I sort of suspect that hand soldered ground hole I used may just not be sufficient for heatsinking the P2 on my PCB. So then I'd need to compare this with a proper solder pasted P2 chip on the next board I make. If that board also has this same issue then the PCB likely cannot dissipate nearly enough heat and I then may have to move to a 4 layer board.

I also ran Cluso's pin test again on this board to make sure no pins were shorted and they did not appear to be.

This was the SPIN code I ran which toggles a subset of pins on a port. I toggled port A bits 7-0 on one COG and 15-8 on another COG. Eventually I'll add all COGs and all pins to try to hammer it, but I'm not there yet.

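PUB coga(pinmask)
    dira := pinmask                 ' make the selected pins outputs
    outa := pinmask & $55555555     ' start with alternate pins high
    repeat
        outa := !outa               ' invert this COG's OUTA on every pass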

rogloh · 2021-09-08 07:45

Interesting, some more details...

Because I wanted to load my board up and use more power in this test, I had swapped out my USB powered 3.3V LDO (clean) supply I've been working with during the initial bring up for a larger switch mode DC power supply capable of delivering 3.3V up to 5A.

That little USB supply can't reliably source more than 500mA but that is enough for this particular test of ~250mA. So I changed it back to the USB power supply and have been running for over 20 minutes now and it seems fine. The P2 and regulator were warm to the touch but not hot.

I also observed that when it was failing before with my larger power supply, the reset pin was indeed being driven low (very probably by the regulator's PG signal, unlikely the prop-plug doing it then). So something about my larger supply powering my P2 board is affecting that 1.8V regulator's PG output. Maybe some noise spikes exceeding the input range on that regulator? I need to probe it more and read more about what triggers it.

Maybe the board's thermals aren't quite as bad as I thought...yet.

evanh · 2021-09-09 00:45

Some off-the-shelf switchmode power supplies are shockingly noisy. They're only good as a battery charger.

evanh · 2021-09-09 00:50

A Prop2 overheating won't reset either. It might crash but the one sure thing it will do is run slow. EDIT: Assuming the internal PLL is the clock source of course.

jmg · 2021-09-09 01:13

@rogloh said:
I also observed that when it was failing before with my larger power supply, the reset pin was indeed being driven low (very probably by the regulator's PG signal, unlikely the prop-plug doing it then). So something about my larger supply powering my P2 board is affecting that 1.8V regulator's PG output. Maybe some noise spikes exceeding the input range on that regulator? I need to probe it more and read more about what triggers it.

As above, some external power supplies are very noisy, so you could unplug the prop plug, to check if cross-talk type noise is firing the reset transistor, or if it really is 1v8 PG driven.

rogloh · 2021-09-09 01:14

@evanh said:
Some off-the-shelf switchmode power supplies are shockingly noisy. They're only good as a battery charger.

Yeah, I think I'll need to filter it a bit and add a bulk cap on the P2 end of the power cable, which is ~40-50cm long. Also, the power leads I ran from the supply's terminals could be somewhat inductive, and with high switching current loads that could cause voltage spikes which might trip the PG output of the regulator. I might have to shorten them a bit. These leads were probably about 30cm shorter with my USB supply.

Also I read that this 1.8V buck regulator module drives its PG output low (which would reset the P2) if its junction temp reaches 150C. Although I think I'd feel that sort of heat unless the module is really well insulated on my board and retains its heat internally. The regulator barely feels warm, while the P2 does get rather warm in this test but not so hot I need to remove my finger when touching it.

@evanh said:
A Prop2 overheating won't reset either. It might crash but the one sure thing it will do is run slow. EDIT: Assuming the internal PLL is the clock source of course.

In this test I was using the PLL clock source, yes. I printed an incrementing counter every second over serial and they seemed to come out at the same rate. When it failed it would just stop outputting after a minute or two and I think this is also when the reset pulse is observed on my scope but I need to prove that happens at the same time and not later.

rogloh · 2021-09-09 01:16

@jmg said:
As above, some external power supplies are very noisy, so you could unplug the prop plug, to check if cross-talk type noise is firing the reset transistor, or if it really is 1v8 PG driven.

Yes I want to prove the reset pulse is not coming from the prop plug - it did look like a short pulse that the prop plug typically generates.

rogloh · 2021-09-09 02:19

Might have been the power leads. I shortened them by 30cm and have not seen the issue yet. Also in an attempt to overload it more I boosted the current up to 440mA @ 3.3V by running 6 COGs with the repeat loop above (leaving 1 main COG and one serial COG outputting a keep alive counter every 1 second to the console). After 8 minutes still running fine but the P2 is very warm/hot now, almost too hot to touch for very long. 1.45W going into my board, most of which dissipates in the P2. What is the most current a P2 will draw? I wonder how I can torture it more.

Update: with 8 COGs running the pin toggle code, I can get 700mA drawn at 3.3V at 350MHz. Will try for more later.

evanh · 2021-09-09 02:26

/me goes digging for old tests ... almost 2 Amps on 1v8. So you're close. Err, well I measured post-regulation so the 2 Amps doesn't include converter losses.
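To compare that post-regulation figure with the 3.3V readings earlier in the thread, the core-rail current can be referred back to the input. A rough sketch, assuming the ~90% converter efficiency guessed at earlier and ignoring the 3.3V current drawn directly by the I/O pins (which will not be negligible when pins are toggling):

```python
V_IN, V_CORE = 3.3, 1.8
EFFICIENCY = 0.90  # ASSUMPTION: not measured on either board


def input_current_for_core(i_core):
    """3.3V input current needed to deliver i_core amps on the 1.8V rail."""
    return i_core * V_CORE / (EFFICIENCY * V_IN)


def core_current_for_input(i_in):
    """Upper bound on 1.8V current if all of i_in at 3.3V fed the converter."""
    return i_in * V_IN * EFFICIENCY / V_CORE


i_in_for_2a = input_current_for_core(2.0)
i_core_at_700ma = core_current_for_input(0.700)
print(f"2 A on 1.8V  -> about {i_in_for_2a:.2f} A at 3.3V")
print(f"700 mA at 3.3V -> at most about {i_core_at_700ma:.2f} A on 1.8V")
```

On these assumptions, 2 A on the core rail corresponds to roughly 1.2 A at the 3.3V input.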

Yanomani · 2021-09-09 02:28

Adding to that: use the weirdest (random) transitioning pin patterns one can come up with, while torturing Hub RAM with random long-data writes/reads, perhaps through the streamers, so as not to waste too much time coding.

The other way, is random addressing, with CT-long data, but timing-granularity would be different.

Perhaps both, on alternate cogs...

Keep away from any "antenna" ears, that fixture is unshielded!!!

evanh · 2021-09-09 02:50

Just looking at my old test code now and realised that random data, as I did use, is not worst case at all. Random would only be the median case for power consumption.

A crafted sequence for maximum toggle rate for every bit of both address and data would draw more.
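The difference between random and crafted data can be illustrated by counting bit transitions between consecutive 32-bit words. This is a host-side sketch, not P2 code; it only models toggles on a 32-bit data bus and says nothing about actual current draw.

```python
import random


def toggles(seq, width=32):
    """Count bit transitions between consecutive words in seq."""
    mask = (1 << width) - 1
    return sum(bin((a ^ b) & mask).count("1") for a, b in zip(seq, seq[1:]))


N = 10_000
rng = random.Random(1)  # fixed seed so the run is repeatable
random_words = [rng.getrandbits(32) for _ in range(N)]
crafted_words = [0x5555_5555 if i % 2 == 0 else 0xAAAA_AAAA for i in range(N)]

random_rate = toggles(random_words) / (N - 1)
crafted_rate = toggles(crafted_words) / (N - 1)
print(f"random data : {random_rate:.2f} bit toggles per word")
print(f"crafted data: {crafted_rate:.2f} bit toggles per word")
```

Random words flip about half the bits on each step, while alternating $5555_5555 and $aaaa_aaaa flips all 32 every time.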

Yanomani · 2021-09-09 04:44

Good catch, evanh!

Reviewing my own concepts, I believe you got it right, since it would maximize the transition current flow, and any eventual subtle ringing, on each and every group of paralleled lanes, and also in between them, all of the time.

The same should apply to address lines, and because they are not interspersed with the data lanes, not any relationship between both signal buses would really matter.

The only thing missing is an understanding of where the byte-enable control lanes are positioned, and whether their respective data-lane groups will continue toggling, though unused, or whether they are simply ruled out, left at their former data levels, or zero/one filled.

rogloh · 2021-09-09 05:46

I was hoping for a fast IO toggle rate and core rate as well as something that exercised the streamer/FIFO with HUB accesses. Because I used flexspin, the SPIN code was assembling down to this version with FCACHE, so once the loop is set up I don't think the hub RAM is being accessed. I probably need something that runs in hub exec, jumps in the loop to keep reloading from the FIFO path, and toggles IO pins as well (maybe via the streamer). Actually no, I can't use the streamer and FIFO at the same time.

_coga
    mov dira, arg01
    and arg01, ##1431655765
    mov outa, arg01
    loc pa, #(@LR__0007-@LR__0005)
    call    #FCACHE_LOAD_
LR__0005
    rep @LR__0008, #0
LR__0006
    not arg01, outa
    mov outa, arg01
LR__0007
LR__0008
_coga_ret
    ret

evanh · 2021-09-09 19:21

I started to update but haven't finished. What I did originally was set up each streamer as part of the cog burn. I had a selection of programs for different levels of workout. The cog loop was using a tight REP block of four instructions with one being a QMUL to also fill the cordic pipeline. I'm gonna do some combination testing of that too now.

Anyway here's the newer streamer init before the cog starts its loop:

hubadr      long    (256*1024 - 8)      ' For a single 16 longword block crossing the 256 kB address transition
revcmp      long    $50ad0021       'revA (D = $42a01290  S = $50ad0021), revB (D = $84908405  S = $62690201)
m_revA      long    (%0011_0000_1000_0001<<16) | $ffff  'revA silicon, full sysclock/1 continuous FIFO reading
m_revB      long    X_RFLONG_32P_4DAC8 | X_PINS_ON | $ffff  'revB silicon, full sysclock/1 continuous FIFO reading
pat55       long    $5555_5555
pataa       long    $aaaa_aaaa


program3
'Fill one hubRAM FIFO block with toggle pattern
        waitx   #500
        wrfast  #1, hubadr      ' 16 longword cycling block
        rep @.rend, #8
        wflong  pat55
        wflong  pataa
.rend

'Silicon Rev A/B detect
'       mov pa, ptra
'       rdlut   inb, ptra++
'       cmp ptra, pa    wz  ' Rev A PTRA remains unchanged (Z = 1)

        mov pa, #1          ' set seed
        xoro32  pa          ' revA (D = $42a01290  S = $50ad0021), revB (D = $84908405  S = $62690201)
        cmp revcmp, 0-0 wz  ' Z set if revA silicon

'Continuously burst, every clock cycle, reading the toggle pattern
        rdfast  #1, hubadr      ' 16 longword cycling block
    if_z    xinit   m_revA, #0
    if_nz   xinit   m_revB, #0

Yanomani · 2021-09-10 02:38

By connecting a 64-channel micro-logic-analyzer to the guts of the P2 fabric, and distributing its probes in groups of eight among all eight banks that compose the Hub memory, this is what I believe it would show.

Note: At the actual 8-Cog P2 incarnation, Bankn_A0 = Cog/Streamer_A3, Bn_A1->C/S_A4, and so forth.

evanh · 2021-09-10 02:52

The FIFO is only a fixed increment addressing. And smallest block length is 16 longwords. So fastest I could make it cross a large bitwise address transition is every 8 clocks. Speaking of which, I guess it can be configured to rollover from 1 MB back to start of hubRAM ...

Yanomani · 2021-09-10 02:56

@evanh said:
The FIFO is only a fixed increment addressing. And smallest block length is 16 longwords. So fastest I could make it cross a large bitwise address transition is every 8 clocks. Speaking of which, I guess it can be configured to rollover from 1 MB back to start of hubRAM ...

Yes, for sure. One's code would need to team-up all eight cog/streamers, in order to be able to hit all banks, all times, with Sysclk granularity.

rogloh · 2021-09-10 03:02

@evanh said:
I started to update but haven't finished. What I did originally was set up each streamer as part of the cog burn. I had a selection of programs for different levels of workout. The cog loop was using a tight REP block of four instructions with one being a QMUL to also fill the cordic pipeline. I'm gonna do some combination testing of that too now.

Yes, I thought about that too yesterday. I want to use the CORDIC once the streamer is being used to get the IO pins toggling. Maybe keeping the CORDIC pipeline full of operations from all COGs is important too. Are there any other non-CORDIC instructions known to hammer the COG core harder than others? What do we think would toggle the most flops in a rep loop?

Also I wonder how hot the P2 can actually get before it toasts itself and does some permanent damage? I can already feel the chip getting hot when I load the 3.3V supply to 750mA and run at 350MHz for a minute or more. It will only get worse from here....

jmg · 2021-09-10 03:17

@rogloh said:
Also I wonder how hot the P2 can actually get before it toasts itself and does some permanent damage? I can already feel the chip getting hot when I load the 3.3V supply to 750mA and run at 350MHz for a minute or more. It will only get worse from here....

Is the first board 2 layer and 1 oz ?
4 layer board is one option to lower the peak temperatures, by getting more copper to spread the heat over the whole board area.

rogloh · 2021-09-10 03:48

@jmg said:

@rogloh said:
Also I wonder how hot the P2 can actually get before it toasts itself and does some permanent damage? I can already feel the chip getting hot when I load the 3.3V supply to 750mA and run at 350MHz for a minute or more. It will only get worse from here....

Is the first board 2 layer and 1 oz ?
4 layer board is one option to lower the peak temperatures, by getting more copper to spread the heat over the whole board area.

2 Layers yes, I think it would be 1 oz but @Tubular would know what was ordered.

With 8 COGs doing QMUL in a rep loop and simultaneously streaming to IO pins with evanh's code above, I briefly (for a few seconds) had the power supply up to 900mA at 3.3V or ~3W (but now it doesn't load the P2.... oh no did I fry it?)
EDIT: no, the P2 was still working once I removed the extreme load COGs, but it wouldn't run that same test program and draw my 900mA again when I reloaded it. Not sure why yet. It's not hot right now.
Update2: Looks like that power good/reset problem came back at that level of load. I see another reset pulse generated with 8 COGs loaded, which shuts down the P2, but not with 1 COG loaded.

evanh · 2021-09-10 03:57

@evanh said:
... Speaking of which, I guess it can be configured to rollover from 1 MB back to start of hubRAM ...

Right, first step, make sure the FIFO rolls over:

        getct   pb
'pre-erase
        setq    ##-1            ' max length (hardware truncated to 18 bits)
        wrlong  #0, #0          ' zero fill hubRAM

'report time taken
        getct   pa
        sub pa, pb
        call    #itod
        call    #putnl


'rollover fill test
        wrfast  #1, hubadr
        rep @.rend, #8
        wflong  pat55
        add pat55, #1
        wflong  pataa
        sub pataa, #1
.rend


'read back content
        mov ptra, hubadr
.loop
        call    #putsp
        mov pa, ptra
        call    #itoh
        call    #putsp

        rdlong  pa, ptra++
        call    #itoh
        call    #putnl

        djnz    count, #.loop

        jmp #$


hubadr      long    (1024*1024 - 8*4)   ' For a single 16 longword block crossing the address transition
pat55       long    $5555_5555
pataa       long    $aaaa_aaaa

And the results:

Total smartpins = 64   1111111111111111111111111111111111111111111111111111111111111111
Rev B silicon.  Sysclock  4.0000 MHz
   262159
   000fffe0   55555555
   000fffe4   aaaaaaaa
   000fffe8   55555556
   000fffec   aaaaaaa9
   000ffff0   55555557
   000ffff4   aaaaaaa8
   000ffff8   55555558
   000ffffc   aaaaaaa7
   00000000   55555559
   00000004   aaaaaaa6
   00000008   5555555a
   0000000c   aaaaaaa5
   00000010   5555555b
   00000014   aaaaaaa4
   00000018   5555555c
   0000001c   aaaaaaa3
   00000020   00000000
   00000024   00000000
   00000028   00000000
   0000002c   00000000

That's all good.
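The first number in the output (262159) is the tick count for the zero fill. A small sketch of how that figure breaks down, assuming that SETQ with a count of N transfers N+1 longs and that the block write moves one long per clock; the 18-bit truncation is taken from the code comment, and the split into transfer and overhead ticks is an interpretation, not something the test reports.

```python
SYSCLK_HZ = 4_000_000      # "Sysclock 4.0000 MHz" from the output
MEASURED_TICKS = 262_159   # first number printed by the test
SETQ_BITS = 18             # per the code comment: count truncated to 18 bits

setq_value = -1 & ((1 << SETQ_BITS) - 1)   # ##-1 after truncation
longs_written = setq_value + 1             # ASSUMED: SETQ N transfers N+1 longs
overhead_ticks = MEASURED_TICKS - longs_written
fill_time_ms = MEASURED_TICKS / SYSCLK_HZ * 1000

print(f"longs written : {longs_written}")
print(f"overhead ticks: {overhead_ticks}")
print(f"fill time     : {fill_time_ms:.2f} ms at {SYSCLK_HZ / 1e6:.1f} MHz")
```

The measured count is consistent with one long per clock plus a handful of ticks for the surrounding instructions.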

evanh · 2021-09-10 04:13

And adding two more iterations to the REP should show the counting pattern overwriting the first four addresses:

        wrfast  #1, hubadr
        rep @.rend, #10
        wflong  pat55
        add pat55, #1
        wflong  pataa
        sub pataa, #1
.rend

   262159
   000fffe0   5555555d
   000fffe4   aaaaaaa2
   000fffe8   5555555e
   000fffec   aaaaaaa1
   000ffff0   55555557
   000ffff4   aaaaaaa8
   000ffff8   55555558
   000ffffc   aaaaaaa7
   00000000   55555559
   00000004   aaaaaaa6
   00000008   5555555a
   0000000c   aaaaaaa5
   00000010   5555555b
   00000014   aaaaaaa4
   00000018   5555555c
   0000001c   aaaaaaa3
   00000020   00000000
   00000024   00000000
   00000028   00000000
   0000002c   00000000

Perfect!
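The wrap-and-overwrite behaviour in the two listings can be reproduced with a small host-side model. This is a sketch of the observed behaviour only: it assumes the FIFO cycles within one 16-longword block and that addresses wrap at 1 MB, as the output shows, and it is not a model of the real FIFO hardware.

```python
HUB_SPAN = 1024 * 1024      # addresses wrap at 1 MB, per the output
BLOCK_LONGS = 16            # WRFAST #1: one 64-byte (16 longword) cycling block
START = HUB_SPAN - 8 * 4    # hubadr from the listing


def simulate(rep_count):
    """Model the wrapping FIFO writes; returns {byte_address: value}."""
    mem = {}
    pat55, pataa = 0x5555_5555, 0xAAAA_AAAA
    slot = 0
    for _ in range(rep_count):
        for value in (pat55, pataa):
            addr = (START + 4 * (slot % BLOCK_LONGS)) % HUB_SPAN
            mem[addr] = value
            slot += 1
        pat55 = (pat55 + 1) & 0xFFFF_FFFF   # add pat55, #1
        pataa = (pataa - 1) & 0xFFFF_FFFF   # sub pataa, #1
    return mem


mem = simulate(10)           # REP count of 10, as in the second test
for k in range(20):
    addr = (START + 4 * k) % HUB_SPAN
    print(f"   {addr:08x}   {mem.get(addr, 0):08x}")
```

Calling `simulate(8)` instead corresponds to the first test, where the block is filled exactly once with no overwrite.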

jmg · 2021-09-10 04:25

@rogloh said:
EDIT: no the P2 is still working when I removed the extreme load COGs, but it wouldn't run that same test program and draw my 900mA again when I reload that same program. Not sure why yet. It's not hot right now.
Update2: Looks like that power good/reset problem came back at that level of load. I see another reset pulse generated with 8 COGs loaded, which shuts down the P2, but not with 1 COG loaded.

Is that from the PG signal ? Could be either temperature, or VIN sag ? Data says typical 150 °C rising and typical hysteresis TJ falling 20 °C (so 130 °C restores)
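The thermal hysteresis quoted from the datasheet behaves like a simple two-threshold latch. A minimal sketch using the typical figures above (150 °C trip, 20 °C hysteresis); the temperature profile is made up for illustration, and whether the real part trips exactly at or just above each threshold is an assumption.

```python
def thermal_shutdown_states(temps_c, trip=150.0, hysteresis=20.0):
    """Return a list of booleans, True while the regulator is in thermal shutdown."""
    shutdown = False
    states = []
    for t in temps_c:
        if not shutdown and t >= trip:
            shutdown = True                 # rising through the trip point
        elif shutdown and t <= trip - hysteresis:
            shutdown = False                # must cool to trip - hysteresis to restore
        states.append(shutdown)
    return states


profile = [120, 145, 150, 140, 131, 130, 125, 149]   # HYPOTHETICAL junction temps, deg C
states = thermal_shutdown_states(profile)
for t, s in zip(profile, states):
    print(f"{t:4d} C  {'shutdown (PG low)' if s else 'running'}")
```

Once tripped, the latch stays in shutdown through 140 °C and 131 °C and only releases at 130 °C.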

evanh · 2021-09-10 04:26

Lol, found a hardware quirk with the FIFO. It's working data wise, but the FIFO's "PTR" is out of step with its own workings.

Which suggests it's a fake register. I remember it was only added on request. The FIFO was already working long before that register existed.

Same data reported twice, both are correct data but the address from GETPTR should wrap with its data:
- First using RDLONG (shows the zero'd RAM off the end)
- Then RFLONG (shows wrapped 16 longword block)

Total smartpins = 64   1111111111111111111111111111111111111111111111111111111111111111
Rev B silicon.  Sysclock  4.0000 MHz
   262159
   000fffe0   55555555
   000fffe4   aaaaaaaa
   000fffe8   55555556
   000fffec   aaaaaaa9
   000ffff0   55555557
   000ffff4   aaaaaaa8
   000ffff8   55555558
   000ffffc   aaaaaaa7
   00000000   55555559
   00000004   aaaaaaa6
   00000008   5555555a
   0000000c   aaaaaaa5
   00000010   5555555b
   00000014   aaaaaaa4
   00000018   5555555c
   0000001c   aaaaaaa3
   00000020   00000000
   00000024   00000000
   00000028   00000000
   0000002c   00000000
   00000030   00000000
   00000034   00000000
   00000038   00000000
   0000003c   00000000

   000fffe0   55555555
   000fffe4   aaaaaaaa
   000fffe8   55555556
   000fffec   aaaaaaa9
   000ffff0   55555557
   000ffff4   aaaaaaa8
   000ffff8   55555558
   000ffffc   aaaaaaa7
   00000000   55555559
   00000004   aaaaaaa6
   00000008   5555555a
   0000000c   aaaaaaa5
   00000010   5555555b
   00000014   aaaaaaa4
   00000018   5555555c
   0000001c   aaaaaaa3
   00000020   55555555
   00000024   aaaaaaaa
   00000028   55555556
   0000002c   aaaaaaa9
   00000030   55555557
   00000034   aaaaaaa8
   00000038   55555558
   0000003c   aaaaaaa7

Source code:

        getct   pb
'pre-erase
        setq    ##-1            ' max length (hardware truncated to 18 bits)
        wrlong  #0, #0          ' zero fill hubRAM

'report time taken
        getct   pa
        sub pa, pb
        call    #itod
        call    #putnl


'rollover fill test
        wrfast  #1, hubadr
        rep @.rend, #8
        wflong  pat55
        add pat55, #1
        wflong  pataa
        sub pataa, #1
.rend


'read back content without FIFO
        mov count, #24
        mov ptra, hubadr
.loop1
        call    #putsp
        mov pa, ptra
        call    #itoh
        call    #putsp

        rdlong  pa, ptra++
        call    #itoh
        call    #putnl

        djnz    count, #.loop1

        call    #putnl


'read back content with FIFO
        mov count, #24
        rdfast  #1, hubadr
.loop2
        call    #putsp
        getptr  pa
        call    #itoh
        call    #putsp

        rflong  pa
        call    #itoh
        call    #putnl

        djnz    count, #.loop2

        jmp #$


hubadr      long    (1024*1024 - 8*4)   ' For a single 16 longword block crossing the address transition
pat55       long    $5555_5555
pataa       long    $aaaa_aaaa

evanh · 2021-09-10 04:51

@rogloh said:
With 8 COGs doing QMUL in a rep loop and simultaneously streaming to IO pins with evanh's code above, I briefly (for a few seconds) had the power supply up to 900mA at 3.3V or ~3W (but now it doesn't load the P2.... oh no did I fry it?)

Show them muscles.

evanh · 2021-09-10 04:53

I remember Chip freaking out when the simulations said that Prop2-Hot was going to reach 5 Watts at rated clock.

rogloh · 2021-09-10 04:57

@jmg said:
Is that from the PG signal ? Could be either temperature, or VIN sag ? Data says typical 150 °C rising and typical hysteresis TJ falling 20 °C (so 130 °C restores)

I am trying to pinpoint it down to being PG or PropPlug by removing the PropPlug from the reset source after download but every time I yank the PropPlug out it generates a reset spike! I'll have to try to flash it I guess.

I should also measure the VIN at the regulator input. At 900mA I might lose a little voltage. They are not thick power lead wires. I might also boost the input voltage a fraction to see if it helps. I hope the P2 can take 3.6V.

evanh · 2021-09-10 05:00

@evanh said:
Which suggests its a fake register. I remember it was only added on request. The FIFO was already working long before that register existed.

Huh, just tested it in the middle of hubRAM and it still does the same. GETPTR doesn't wrap around with the block wrapping. It only increments. Looks like that may have been a known quirk at design. I don't see it mentioned in the two main google docs though.
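The quirk can be summarised as two counters that disagree: the data follows the block wrap, while the reported pointer just keeps incrementing. A host-side sketch that reproduces the RFLONG listing above, assuming a 16-longword block and a pointer that only wraps at 1 MB (both taken from the output, not from documentation):

```python
HUB_SPAN = 1024 * 1024      # reported address wraps at 1 MB, per the listing
BLOCK_LONGS = 16            # RDFAST #1: one 16 longword cycling block
START = HUB_SPAN - 8 * 4    # hubadr from the listing


def fifo_readback(block, count):
    """Return (reported_ptr, data) pairs as seen in the RFLONG listing."""
    rows = []
    for n in range(count):
        reported_ptr = (START + 4 * n) % HUB_SPAN   # keeps incrementing, ignores block wrap
        data = block[n % BLOCK_LONGS]               # data does wrap within the block
        rows.append((reported_ptr, data))
    return rows


# Block contents as written by the rollover fill test (REP count of 8).
block = []
for i in range(8):
    block += [0x5555_5555 + i, 0xAAAA_AAAA - i]

rows = fifo_readback(block, 24)
for ptr, data in rows:
    print(f"   {ptr:08x}   {data:08x}")
```

If the pointer tracked the block, row 16 would report 000fffe0 again rather than 00000020.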

evanh · 2021-09-10 05:05

@rogloh said:

@jmg said:
Is that from the PG signal ? Could be either temperature, or VIN sag ? Data says typical 150 °C rising and typical hysteresis TJ falling 20 °C (so 130 °C restores)

I am trying to pinpoint it down to being PG or PropPlug by removing the PropPlug from the reset source after download but every time I yank the PropPlug out it generates a reset spike! I'll have to try to flash it I guess.

I should also measure the VIN at the regulator input. At 900mA I might lose a little voltage. They are not thick power lead wires. I might also boost the input voltage a fraction to see if it helps. I hope the P2 can take 3.6V.

My AUX USB switch died early on, so I ended up using a linear lab power supply directly into 5V_Common. No problems with drooping rail after that.

rogloh · 2021-09-10 05:19

Just tried an experiment by removing the reset signal from PropPlug cable path and only downloading at initial board power up. About 100ms after the program runs the CORDIC+streamer pin loading test with the 8 COGs I see a 1.2ms active low reset pulse and it shuts down. So the reset must be coming from that PG output pin. It cannot be heating up that fast and going into thermal shutdown, so it must be droop or noise. It is meant to have a 20us deglitch filter so perhaps it is droop on the output voltage getting back into the FB pin?

I'm going to set up my scope to trigger on this PG pulse and monitor the 1.8V rail at the output cap to see if I find anything there.
