USB Testing

garryj · 2019-01-27 02:22

jmg wrote: »

When you send each TX chunk, does the USB send always align to some start-phase, or does the TX resync to the new TX-time (i.e. the edges move)?
If it does resync every chunk sent, you would expect small delays to help, and there could also be multiple delay sweet spots.
E.g. in the above, 3 sysclks at 120MHz is 30% of one 12MHz period, so 13 sysclks would give 1 period + 30%.
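
A quick way to check that arithmetic for other delays and clock rates, as a rough sketch only. The helper name `delay_in_bit_periods` is made up for illustration, and it converts a raw count of sysclks; it does not model any instruction overhead on the P2.

```python
USB_FS_BIT_RATE_HZ = 12_000_000  # full-speed USB bit rate


def delay_in_bit_periods(sysclks, sysclk_hz):
    """Express a delay of `sysclks` system clocks as a multiple of one 12 MHz bit period."""
    return sysclks * USB_FS_BIT_RATE_HZ / sysclk_hz


if __name__ == "__main__":
    for n in (3, 13):
        bits = delay_in_bit_periods(n, 120e6)
        # The fractional part is the phase offset within a bit period.
        print(f"{n} sysclks @ 120 MHz = {bits:.2f} bit periods")
```

At a different sysclk the same sysclk count lands on a different fraction of a bit, which is one reason a fixed delay could behave differently as the clock is raised.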

The transmitted SOP is the SYNC field, so I think that makes each packet its own start-phase? It doesn't matter where they start but, like I said before, it does matter when they will finish.
From the USB 2.0 spec:

8.2 SYNC Field
All packets begin with a synchronization (SYNC) field, which is a coded sequence that generates a
maximum edge transition density. It is used by the input circuitry to align incoming data with the local
clock. A SYNC from an initial transmitter is defined to be eight bits in length for full/low-speed and 32 bits
for high-speed. Received SYNC fields may be shorter as described in Chapter 7. SYNC serves only as a
synchronization mechanism and is not shown in the following packet diagrams (refer to Section 7.1.10).
The last two bits in the SYNC field are a marker that is used to identify the end of the SYNC field and, by
inference, the start of the PID.

jmg · 2019-01-27 02:41

garryj wrote: »

...
The transmitted SOP is the SYNC field, so I think that makes each packet its own start-phase?

Looking at
http://www.jmaxwellusb.com/Articles/USB-Specifications.aspx
The issue I am getting at here is the SOF to SOF should be an exact multiple of 1/12MHz times - & P2 should manage that ok.
The microframes they mention: what happens if they are not 1/12MHz aligned? They may be ok within the microframe, but if they are not also 1/12MHz aligned within SOF, I can imagine some USB slave devices could cope better than others, i.e. until new edges are used to resync, a device would be using the old sample point timings.
hence my question of
When you send each TX chunk, does the USB send always align to some (1/12MHz tick) start-phase, or does the TX resync to the new TX-time (ie the edges move) ?

dgately · 2019-01-27 03:26

garryj wrote: »

Yes. With a waitx of #3 between the akpin and wypin, from ~120MHz..180MHz the runs are squeaky-clean, analyzer-wise. Above 180MHz, the OUT CRC error starts showing up at ~192MHz, but the transfers may recover and pass within the six retries I have now. Bump the sysclock a little more, and the recoveries disappear. I've upped the retry count to 50+ tries with no recoveries.

Just an FYI... I get really good results with an 8GB Patriot thumbdrive. Not sure if this info helps but here's a dump from that drive, formatted on macOS as FAT32 (Did you expect scanfat to have a different result? I tried adding a wait #3, got the same result):

A:\>
USB low/full speed minimal host & bulk-only mass storage driver v0.03.
Debug output to terminal is off.

#:\>
<Full-Speed device connected.>

Vendor ID:
Product ID: Patriot Memory
Version level: PMAP
Media is removable
Device does not claim conformance to any SPC standard
Highest LBA: 15646719
Sector size: 512
Checking media for a FAT file system...

Partition type: 0x0B
Cluster size: 4096
Volume base sector: 2048
Reserved sector count: 32
FSInfo base sector (in reserved): 2049
FAT region base sector: 2080
Sector count of one FAT: 15249
FAT region sector count: 30498
RootDir base sector: 32578
RootDir cluster#: 2
Dir/file/data base sector: 32578
Count of data region clusters: 1951767
Count of free clusters: 1951540
FSInfo next free cluster: 209
Count of data region sectors: 15614142
Count of volume sectors: 15644672
FAT32 volume mounted.

A:\>TEST
Media present and ready

A:\>DIR

Volume label: P2USB
Serial number: F9B5-18F0

 Directory of A:\

01/26/2019  05:13 PM    <DIR>         SPOTLI~1
01/26/2019  05:13 PM    <DIR>         FSEVEN~1
01/17/2019  01:04 PM             8116 USDECL~1.TXT
01/26/2019  05:15 PM             4096 _USDEC~1.TXT

              2 File(s)          12212 bytes
              2 Dir(s)      7993507840 bytes free

A:\>DIRW
Volume label: P2USB
Serial number: F9B5-18F0

 Directory of A:\

[SPOTLI~1]      [FSEVEN~1]      USDECL~1.TXT    _USDEC~1.TXT
              2 File(s)          12212 bytes
              2 Dir(s)      7993507840 bytes free


A:\>TYPE USDECL~1.TXT
In Congress, July 4, 1776.
                          The unanimous Declaration of the thirteen united States of America, When in
the Course of human events, it becomes necessary for one people to dissolve the political bands which
have connected them with another, and to assume among the powers of the earth, the separate and equal
st
...

A:\>TGLDBG
Debug output to terminal is on.
HWStack return address:

00000016

A:\>SCANFAT
Scanning FAT region..............................

Free cluster count adjustment: +0
HWStack return address:

00000016

A:\>GETSEC 1000
04BD8: 00 00 00 00  00 00 00 00  00 00 00 00  00 00 00 00   '................'
04BE8: 00 00 00 00  00 00 00 00  00 00 00 00  00 00 00 00   '................'

GETSEC with any parameter, starts at: '04BD8'???

dgately

garryj · 2019-01-27 05:17

dgately wrote: »

Just an FYI... I get really good results with an 8GB Patriot thumbdrive. Not sure if this info helps but here's a dump from that drive, formatted on macOS as FAT32 (Did you expect scanfat to have a different result? I tried adding a wait #3, got the same result):

Everything looks as expected.
For scanfat, "+0" is good, as that means its free cluster count matches the one that's stored in the FSInfo data structure.
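
The other numbers in the mount dump can be cross-checked against each other the same way. This is only a sketch using the values printed in the dump; the variable names are made up, and the FAT count of 2 is an assumption inferred from 30498 / 15249.

```python
# Values copied from the mount dump posted above.
volume_base = 2048          # Volume base sector
reserved = 32               # Reserved sector count
sectors_per_fat = 15249     # Sector count of one FAT
num_fats = 2                # assumption: inferred from 30498 / 15249
volume_sectors = 15644672   # Count of volume sectors
sector_size = 512
cluster_size = 4096
free_clusters = 1951540     # Count of free clusters

fat_base = volume_base + reserved                        # FAT region base sector
fat_region = num_fats * sectors_per_fat                  # FAT region sector count
data_base = fat_base + fat_region                        # Dir/file/data base sector
data_sectors = volume_sectors - reserved - fat_region    # Count of data region sectors
data_clusters = data_sectors // (cluster_size // sector_size)
free_bytes = free_clusters * cluster_size                # "bytes free" shown by DIR

print(fat_base, fat_region, data_base, data_sectors, data_clusters, free_bytes)
```

Each derived value agrees with the corresponding line of the dump, including the bytes-free figure from DIR.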

GETSEC with any parameter, starts at: '04BD8'???

Yes, that's correct, and yes, it's confusing. It's the start address of the hub RAM buffer read data. I'm currently using the ROM monitor's hex dump routine and it only shows local RAM addresses. Down the road a bit this will have the media addresses.

Thanks very much for giving it a spin!

garryj · 2019-01-27 06:06

jmg wrote: »

garryj wrote: »

...
The transmitted SOP is the SYNC field, so I think that makes each packet its own start-phase?

Looking at
http://www.jmaxwellusb.com/Articles/USB-Specifications.aspx
The issue I am getting at here is the SOF to SOF should be an exact multiple of 1/12MHz times - & P2 should manage that ok.
The microframes they mention: what happens if they are not 1/12MHz aligned? They may be ok within the microframe, but if they are not also 1/12MHz aligned within SOF, I can imagine some USB slave devices could cope better than others, i.e. until new edges are used to resync, a device would be using the old sample point timings.
hence my question of
When you send each TX chunk, does the USB send always align to some (1/12MHz tick) start-phase, or does the TX resync to the new TX-time (ie the edges move) ?

Full-speed (thank heaven) doesn't do microframes. And mass storage is all bulk transfers, which are about as loose as it gets in regard to bus management. My packet scheduler is very simple -- if it looks like the transfer/transaction will fit within a frame, send it. In practice, the majority of the devices I've tested with are not very picky about when a transfer gets plonked on the bus. I'm doing this in my spare time, so for me, perfection is not a requirement. If I can keep the analyzer happy, I'm happy :-D

jmg · 2019-01-27 06:19

garryj wrote: »

.... My packet scheduler is very simple -- if it looks like the transfer/transaction will fit within a frame, send it. In practice, the majority of the devices I've tested with are not very picky about when a transfer gets plonked on the bus. I'm doing this in my spare time, so for me, perfection is not a requirement. If I can keep the analyzer happy, I'm happy :-D

Yes, on paper the self-sync nature of USB should tolerate any phase, and quite large clock movements, but it would be nice to know what breaks things in those cases that seem less tolerant, and where adding a fraction of a bus-bit in tweak delays improves things (until it breaks on the other side).

garryj · 2019-01-27 17:50

jmg wrote: »

garryj wrote: »

.... My packet scheduler is very simple -- if it looks like the transfer/transaction will fit within a frame, send it. In practice, the majority of the devices I've tested with are not very picky about when a transfer gets plonked on the bus. I'm doing this in my spare time, so for me, perfection is not a requirement. If I can keep the analyzer happy, I'm happy :-D

Yes, on paper the self-sync nature of USB should tolerate any phase, and quite large clock movements, but it would be nice to know what breaks things in those cases that seem less tolerant, and where adding a fraction of a bus-bit in tweak delays improves things (until it breaks on the other side).

At this point, the "on paper the self-sync nature of USB" is working for me. I, too, think it would be nice to know what breaks things in those cases, but that is not my priority. Perhaps you would like to take up that mantle, and let us know what you find?

evanh · 2019-01-28 23:39

Garry,
I got the P2ES Accessory Set today and was setting up for testing with the supplied USB adaptors but sadly seem to have destroyed the prop2 itself in the process.

Rayman · 2019-01-28 23:41

Ouch! Any idea what happened? 5V from USB get on a P2 pin?

evanh · 2019-01-28 23:56

Something to do with I/O power jumpers. I changed V2431 from linear regulator to VIO supply (first use) and the bench power supply beeped (I'm using +5v aux input from a fixed 5v linear bench supply). It surprised me, never heard it do that before. I tried some more times and by the time I thought to run tests it was already too late.

All I know at the moment is the programming is not responding. I've re-plugged the USB and double checked against the FPGA board.

evanh · 2019-01-29 00:11

Hmm, 0.3 ohms to GND on V2431 with jumper removed. The linear regulator is fried too.

Guess it's been that way all along. First thing I did when opening the P2ES-EVAL board was shift all VIO jumpers over to the linear regulators so VIO hadn't been used by any I/O pins until now.

jmg · 2019-01-29 00:12

evanh wrote: »

Something to do with I/O power jumpers. I changed V2431 from linear regulator to VIO supply (first use) and the bench power supply beeped (I'm using +5v aux input from a fixed 5v linear bench supply). It surprised me, never heard it do that before. I tried some more times and by the time I thought to run tests it was already too late.

All I know at the moment is the programming is not responding. I've re-plugged the USB and double checked against the FPGA board.

Was the power on, or off, when you moved jumpers ? The Linear 3v3 LDOs I think have less power margin than the SMPS, so maybe just that died ?

evanh · 2019-01-29 00:30

All the others are fine, I can change the jumpers at will. I'm guessing there is a short on V2431 and it was just unfortunate I picked it to put the USB adaptor on first.

VIO is stable at 3.31 volts.

cgracey · 2019-01-29 00:47

So, Evanh, are you sure your P2 chip is dead?

The P2 won't work without V2431 being ~3.3V, as it won't be able to regulate a replica of the 1.8V supply that the internal 20MHz+ RC oscillator is powered from.

garryj · 2019-01-29 00:48

evanh wrote: »

Hmm, 0.3 ohms to GND on V2431 with jumper removed. The linear regulator is fried too.

Guess it's been that way all along. First thing I did when opening the P2ES-EVAL board was shift all VIO jumpers over to the linear regulators so VIO hadn't been used by any I/O pins until now.

Ow. That's depressing. I sure hope it is a one-off. I've been using VIO since day one, and different I/O pins in the v1623 and v2431 blocks.

evanh · 2019-01-29 01:05

cgracey wrote: »

So, Evanh, are you sure your P2 chip is dead?

The P2 won't work without V2431 being ~3.3V, as it won't be able to regulate a replica of the 1.8V supply that the internal 20MHz+ RC oscillator is powered from.

Oh, so then the linear regulator, and V2431, must have been operational before the incident. I did change the jumper with power on. So, somehow I've destroyed this section of the I/O ring.

PS: And this dependence on V2431 for booting would explain the lack of comms.

evanh · 2019-01-29 01:28

The prop wasn't set to boot any Flash/SD at the time, it would have been waiting for comms. The power was on is all.

I presume this means it would be operating from the 20 MHz RC oscillator when the jumper was changed.

jmg · 2019-01-29 03:55

evanh wrote: »

Oh, so then the linear regulator, and V2431, must have been operational before the incident. I did change the jumper with power on...

Having seen how those LDOs quietly oscillate with no jumper, I've been quite careful (so far) to power off when changing a jumper.
Even so, a stone-dead result seems unexpected?

cgracey · 2019-01-29 04:24

jmg wrote: »

evanh wrote: »

Oh, so then the linear regulator, and V2431, must have been operational before the incident. I did change the jumper with power on...

Having seen how those LDOs quietly oscillate with no jumper, I've been quite careful (so far) to power off when changing a jumper.
Even so, a stone-dead result seems unexpected?

The question is whether the P2 power shorted inside the chip to GND, or it was something outside the chip that shorted to GND. The latter means the P2 chip may be okay.

By the way, I've always moved the jumpers around with the power on. I see no reason why it should be dangerous to do so.

evanh · 2019-01-29 04:47

I'll see if I can get someone in town here that can carefully remove the chip.

Cluso99 · 2019-01-29 08:58

evanh wrote: »

I'll see if I can get someone in town here that can carefully remove the chip.

Which chip? Not the P2 I hope! You should try everything else first.

evanh · 2019-01-29 09:20

Yeah, the prop2. And true, I won't do it now.

With the V2431 supply jumper removed, I've just been running a controlled 300 mA into V2431 to get it up to 100 mV. I first tried at the jumpers and had trouble seeing a voltage gradient anywhere, partly because the mid point of that rail is the jumpers. I then moved the 300 mA onto V2431 of the 12 pin header and could see a clear voltage drop from there to the jumper pins and a further smaller drop to C117/C118 next to P2ES chip.

I'm convinced the short is inside the chip.

Chip,
I'm more than happy to send the whole board back to you. Do you want to investigate?

T Chap · 2019-01-29 13:46

Evanh, send me your address and I’ll FedEx you my board. I won’t have time to mess around with it for a few weeks at least, so you’ll make better use of it.

cgracey · 2019-01-29 18:27

T Chap wrote: »

Evanh send me your address I’ll FedEx you my board. I won’t have time to mess around with it for a few weeks at least so you’ll make better use of it.

T Chap, that's very generous of you, but Parallax can fix this problem.

Evanh, are you sure that microscopic little 3.3 volt regulator didn't short out? I think the way to isolate the problem would be to lift the V2427 and V2831 power pins on the P2 chip to see if either relieves the problem.

Well, wait a minute. It's simpler than that. Without the jumper, are you seeing that those two Vxxxx pins which are tied together are shorted to ground? I think you said you are. That would definitely indicate that the chip shorted out internally. In that case, you are going to need a new P2 chip.

Peter, are you equipped to solder one of your uncommitted P2 chips onto Evan's board? Do you have the means to safely remove it?

We still have, I think, four glob-top packages that we could solder on to boards.

So, if Peter's game, he could do the work, or we can do it at Parallax. Logistics would probably be cheaper if Peter could do it, since he lives in your area of the world. Or, just send us that thing and we'll get on it.

evanh · 2019-01-30 02:50

cgracey wrote: »

Without the jumper, are you seeing that those two Vxxxx pins which are tied together are shorted to ground? I think you said you are. That would definitely indicate that the chip shorted out internally. In that case, you are going to need a new P2 chip.

That's the one.

garryj · 2019-02-01 02:52

Got some time to test media writes of ~33.5GB non-stop. The same type of CRC error I am getting with reads also affects writes, but not in the way I expected. I'm doing the writes in 32KB blocks and that takes 513 OUTs back-to-back, with each receiving an ACK handshake byte from the device, and one IN to the device to request the command status wrapper. I had posited that I would see more of the CRC errors, but that didn't happen. When a CRC error did happen, it was always in the command block wrapper for a new 32KB of data (see attached image).

It doesn't happen at every new block -- there could be runs of many consecutive blocks without a single error. Applying the same "fix" used with reads that added more wait cycles between the tx byte routine's AKPIN #DP and WYPIN #DM got the CRC errors to disappear. Given that the remaining 512 OUTs for the transfer complete without raising a CRC error, it's hard to imagine that there could be some random bus glitch that's causing this. There is something very mysterious happening during the rx->tx transition where the device returns the command status and the host sends another data block. If a bad CRC happens, it's always here.

Edit: BTW, I received my ES accessory set today and the serial host add-on works like a charm! But maybe in a future revision, both host and device boards could have test points for D-/D+?
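
For anyone wondering where the 513 OUTs figure comes from, here is a rough sketch of the count. It assumes the usual 64-byte maximum packet size for a full-speed bulk endpoint, and the function name is made up for illustration.

```python
MAX_PACKET = 64          # assumption: full-speed bulk endpoint max packet size
BLOCK_BYTES = 32 * 1024  # one 32KB write


def write_transactions(block_bytes, max_packet=MAX_PACKET):
    """Return (OUT count, IN count) for one bulk-only write of `block_bytes`."""
    data_outs = -(-block_bytes // max_packet)  # ceiling division
    cbw_outs = 1   # one OUT carrying the command block wrapper
    csw_ins = 1    # one IN returning the command status wrapper
    return cbw_outs + data_outs, csw_ins


outs, ins = write_transactions(BLOCK_BYTES)
print(outs, ins)  # 513 1
```

So each 32KB write is one command block wrapper OUT followed by 512 data OUTs, and the CRC error only ever shows up on the first of those 513.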

cgracey · 2019-02-01 03:23

garryj wrote: »

Got some time to test media writes of ~33.5GB non-stop. The same type of CRC error I am getting with reads also affects writes, but not in the way I expected. I'm doing the writes in 32KB blocks and that takes 513 OUTs back-to-back, with each receiving an ACK handshake byte from the device, and one IN to the device to request the command status wrapper. I had posited that I would see more of the CRC errors, but that didn't happen. When a CRC error did happen, it was always in the command block wrapper for a new 32KB of data (see attached image).

It doesn't happen at every new block -- there could be runs of many consecutive blocks without a single error. Applying the same "fix" used with reads that added more wait cycles between the tx byte routine's AKPIN #DP and WYPIN #DM got the CRC errors to disappear. Given that the remaining 512 OUTs for the transfer complete without raising a CRC error, it's hard to imagine that there could be some random bus glitch that's causing this. There is something very mysterious happening during the rx->tx transition where the device returns the command status and the host sends another data block. If a bad CRC happens, it's always here.

Edit: BTW, I received my ES accessory set today and the serial host add-on works like a charm! But maybe in a future revision, both host and device boards could have test points for D-/D+?

Garryj, it seems to me that there is some delay required in the USB protocol that the smart pin does not generate on its own, so you are having to add it yourself instead.

jmg · 2019-02-01 03:35

garryj wrote: »

.... When a CRC error did happen, it was always in the command block wrapper for a new 32KB of data (see attached image).

It doesn't happen at every new block -- there could be runs of many consecutive blocks without a single error. Applying the same "fix" used with reads that added more wait cycles between the tx byte routine's AKPIN #DP and WYPIN #DM got the CRC errors to disappear.

Could that be related to when bit-stuff action is taken? Do your highlighted 0x9d & 0x49 differ by a single inserted bit?

cgracey · 2019-02-01 03:54

Could there be some error in the smart pin's USB bit-stuffing logic, maybe?
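
For reference, the USB bit-stuffing rule in question: the transmitter inserts a 0 after every six consecutive 1s in the data stream before NRZI encoding, and the receiver drops it again. Below is a minimal Python sketch of that rule only. It is not the smart pin's logic, and the function names are made up for illustration.

```python
def byte_to_bits(value):
    """USB sends each byte least-significant bit first."""
    return [(value >> i) & 1 for i in range(8)]


def bit_stuff(bits):
    """Insert a 0 after every run of six consecutive 1s."""
    out, ones = [], 0
    for b in bits:
        out.append(b)
        ones = ones + 1 if b else 0
        if ones == 6:
            out.append(0)  # stuffed bit
            ones = 0
    return out


def bit_unstuff(bits):
    """Drop the bit that follows every run of six consecutive 1s."""
    out, ones, skip = [], 0, False
    for b in bits:
        if skip:           # this is the stuffed bit, discard it
            skip, ones = False, 0
            continue
        out.append(b)
        ones = ones + 1 if b else 0
        if ones == 6:
            skip = True
    return out


if __name__ == "__main__":
    print(bit_stuff(byte_to_bits(0xFF)))
```

A dropped or extra stuffed bit shifts every bit after it, so a byte would then show up on the analyzer as a different value and the CRC would fail.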

jmg · 2019-02-01 04:16

garryj wrote: »

.... There is something very mysterious happening during the rx->tx transition where the device returns the command status and the host sends another data block. If a bad CRC happens, it's always here.

What is the approximate failure rate you see, and does it change if you warm up the P2?
What is the overall bad/good USB packet ratio for that ~33.5GB non-stop run?
