USB Testing

Martin Hodge · 2016-06-14 05:10

Looking good! :thumb:

garryj · 2016-06-14 17:11

Rayman wrote: »

Got new circuit boards and stencil

It looks good to me too, and I hope everything checks out OK!

Tubular · 2016-06-14 21:02

Very nice, Ray.

You might need to take the solder off the ball pads with some wick, if you'll be adding more with the stencil. I'm no expert on this but just seems like HASL + stencil solder + balls would be too much solder

Looks like a nice sharp stencil too

jmg · 2016-06-14 21:23

Tubular wrote: »

You might need to take the solder off the ball pads with some wick, if you'll be adding more with the stencil. I'm no expert on this but just seems like HASL + stencil solder + balls would be too much solder

Ball removal sounds risky to me. I'd keep the balls, but not overdo the paste - just enough to ensure thermal path from pad-ball, and reflow on a hot pad should work.

garryj · 2016-06-14 22:25

I'm to the point where I've got Control and Bulk IN/OUT transactions sorted, with CRC16 calcs/checks on all DATAx packets. It took me longer than expected to be able to handle DATAx packets if their payload size was less than the maximum allowed, as during device enumeration there are times the host is required to ask for more data than it's likely to receive. Distinguishing payload bytes from the CRC bytes in a time-sensitive way when there's a short packet put a little wrinkle in the process when it came to verifying the incoming CRC...
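
(Side note on the short-packet wrinkle: the USB spec describes a residual check, where the CRC16 is run over every byte after the PID, CRC bytes included, and the register is compared against a fixed residual once EOP arrives. That avoids having to know in advance where the payload ends. What follows is only a Python sketch of that general technique, not the PASM used in this project, and the function names are made up for illustration.)

```python
# CRC16 as used on USB DATAx packets, in bit-reflected (LSB-first) form:
# polynomial 0x8005 (reflected 0xA001), register preset to 0xFFFF,
# and the transmitted CRC is the inverted register, low byte first.

def crc16_usb_register(data, reg=0xFFFF):
    """Run the raw CRC register over data, without the final inversion."""
    for byte in data:
        reg ^= byte
        for _ in range(8):
            reg = (reg >> 1) ^ 0xA001 if reg & 1 else reg >> 1
    return reg

def crc16_usb(payload):
    """CRC16 value a transmitter would append to the payload."""
    return crc16_usb_register(payload) ^ 0xFFFF

def data_packet_ok(payload_and_crc):
    """Residual check: feed payload AND the two CRC bytes, then compare.

    A good packet always leaves 0xB001 in the register, whatever the
    payload length, so a short packet needs no special handling.
    """
    return crc16_usb_register(payload_and_crc) == 0xB001

if __name__ == "__main__":
    payload = bytes([0x80, 0x06, 0x00, 0x01, 0x00, 0x00, 0x08, 0x00])
    packet = payload + crc16_usb(payload).to_bytes(2, "little")
    print(data_packet_ok(packet))
```

The same check works for a zero-length DATAx packet, where the two CRC bytes are the only bytes fed in.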

Anyway, I got past that little hurdle, but along the way I had time to evaluate how the receiver status flags are implemented. Though initially ambivalent, I now think that the receiver status bits should reflect, as closely as possible, the state of the bus as applied by the transmitter. If I recall, the change discussion was about how many bit periods should follow the EOP->J transition before setting the J (K) status flag, in order to make inter-packet delay application more "automatic".

If I'm interpreting the USB spec properly, the transmitting device is allowed one full J (K) period to allow it time to turn off its output buffer. Given this, I'm now of the opinion that the receiver status J (K) flags should also reflect this, if possible. What I propose is this:

 [15:8] byte             - last byte received
 [7]    byte toggle      - cleared on SOP, toggled on each byte received
 [6]    error            - cleared on SOP, set on bit-unstuff error, EOP SE0 > 3 bits, or SE1
 [5]    EOP in           - cleared on SOP or 1+ bits of J or K, set on EOP (P2v9x is 7+)
 [4]    SOP in           - cleared on EOP, set on SOP (P2v9 is 7+; is a false SOP possible?)
 [3]    SE1 in (illegal) - cleared on !SE1, set on 1+ bits of SE1
 [2]    SE0 in (RESET)   - cleared on !SE0, set on 1+ bits of SE0
 [1]    K in (RESUME)    - cleared on !K, set on 1+ bits of K (P2v8+ is 7+ bits of K)
 [0]    J in (IDLE)      - cleared on !J, set on 1+ bits of J (P2v8+ is 7+ bits of J)
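
(For anyone reading along, here is a small Python sketch of how a 16-bit status word laid out as above would be unpacked. The field names are just my shorthand for the rows of the table, and 0x7784 is only a sample value.)

```python
from typing import NamedTuple

class RxStatus(NamedTuple):
    byte: int          # [15:8] last byte received
    byte_toggle: bool  # [7]
    error: bool        # [6]
    eop: bool          # [5]
    sop: bool          # [4]
    se1: bool          # [3]
    se0: bool          # [2]
    k: bool            # [1]
    j: bool            # [0]

def decode_rx_status(word):
    """Split a 16-bit receiver status word into its named fields."""
    def bit(n):
        return bool((word >> n) & 1)
    return RxStatus(
        byte=(word >> 8) & 0xFF,
        byte_toggle=bit(7), error=bit(6), eop=bit(5), sop=bit(4),
        se1=bit(3), se0=bit(2), k=bit(1), j=bit(0),
    )

if __name__ == "__main__":
    print(decode_rx_status(0x7784))
```

With the sample value, the data byte is 0x77 and only the byte toggle and SE0 bits are set.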

Since inter-packet delay and transaction response turn-around time begins at the EOP->J transition, I think that detecting the bus IDLE state as soon as possible after transmitting a packet would be a good thing. It's exactly the place you want to be to apply an inter-packet delay or start your transaction turn-around timer, whichever is required. When you're receiving data, your byte fetch routine should be responsible for EOP detection.

At 7+ bits before J (K) flag set, it might, depending on your design approach, become something you have to work around because by the time you detect bus IDLE, you're already at the maximum allowed threshold for IP delay (6.5 - 7.5 bit periods for function, 7.5 bit periods for host), and the receiving device may time out waiting for the trailing packet. If it's determined that 1+ is too short, I would propose that it be no more than 2+, which is the minimum J periods after SE0 for an inter-packet delay.
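
(To put rough numbers on that, here is a Python sketch of the time left before the 7.5 bit maximum if the J flag only shows up after N bits of J. It treats "7+" as exactly 7 bits and ignores software latency, both of which are simplifying assumptions; the bit rates are the standard 1.5 Mbit/s for low-speed and 12 Mbit/s for full-speed.)

```python
LS_BIT_RATE = 1_500_000    # low-speed USB, bits per second
FS_BIT_RATE = 12_000_000   # full-speed USB, bits per second
MAX_IP_DELAY_BITS = 7.5    # upper limit quoted above

def bit_period_ns(bit_rate):
    """Length of one bit period in nanoseconds."""
    return 1e9 / bit_rate

def remaining_ns(flag_delay_bits, bit_rate, max_delay_bits=MAX_IP_DELAY_BITS):
    """Time left to start transmitting once the J flag has become visible."""
    return (max_delay_bits - flag_delay_bits) * bit_period_ns(bit_rate)

if __name__ == "__main__":
    for delay in (1, 2, 7):
        print(delay,
              round(remaining_ns(delay, LS_BIT_RATE), 1),
              round(remaining_ns(delay, FS_BIT_RATE), 1))
```

At 7 bits only half a bit period is left, about 333 ns at low-speed and about 42 ns at full-speed, whereas 2 bits leaves 5.5 bit periods.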

garryj · 2016-06-14 22:41

Not really USB related, but while trying to optimize my rx byte fetch routine I tried Chip's example from his doc, and I couldn't get it to work -- the Z flag would never get written. If I'm not misunderstanding how CMPX works, there may be a bug. Here's my test code for the P123-A9 board.

' Test CMPX Z flag not being written as expected.

dat
		org

		setword	dirb, ##$ffff, #0	' P123-A9 USER_LED15..0 used for feedback
		setword	outb, #0, #0
		mov	flag, #0
.status
		mov	d, status		' Fetch rx status
		setbyte	outb, status, #1	' Show rx status byte on USER_LED15..8
		shr	d, #8		wc	' Data byte to d, new byte toggle bit into C
	if_c	or	outb, #$80		' Show state of C on USER_LED7
		cmpx	flag, #1	wz	' flag = 0 + C = 1: CMPX flag, #1 -> Z, toggle flag?
						' flag = 0 + NC = 0: CMPX flag, #1 -> NZ, flag unchanged?
						' flag = 1 + C = 2: CMPX flag, #1 -> NZ, flag unchanged?
						' flag = 1 + NC = 1: CMPX flag, #1 -> Z, toggle flag?
	if_z	or	outb, #$40		' Show state of Z on USER_LED6
	if_z	xor	flag, #1		' New byte state change, so toggle flag
		or	outb, flag		' Show state of flag register
		waitx	##60_000_000
		setword	outb, #0, #0
		notb	status, #7		' Toggle new byte status flag
		jmp	#.status

status		long	$7784
flag		res	1
d		res	1
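
(One possible explanation, offered only as a guess: the later P2 instruction documentation describes the extended compare as C = borrow of D - (S + C) and Z = Z AND (D == S + C), so Z can only stay set, never become set, and the carry is added to S rather than to D. Whether the FPGA image tested above behaves that way is an assumption on my part. A tiny Python model of that reading:)

```python
def cmpx(d, s, c_in, z_in):
    """Model of an extended compare under the ASSUMED rules:
    C = borrow of D - (S + C), Z = Z AND (D == S + C).
    Returns (c_out, z_out) as 0/1 values.
    """
    rhs = s + c_in
    c_out = 1 if d < rhs else 0
    z_out = 1 if (z_in and d == rhs) else 0
    return c_out, z_out

if __name__ == "__main__":
    # The four flag/C combinations from the test loop, with Z clear on entry.
    for flag in (0, 1):
        for c in (0, 1):
            print(flag, c, cmpx(flag, 1, c, 0))
```

Under that model, every case in the test loop comes out NZ as long as Z is clear going in, which would match the symptom. A plain CMP, or setting Z before the CMPX, would be the things to try.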

Tubular · 2016-06-14 23:06

jmg wrote: »

Ball removal sounds risky to me. I'd keep the balls, but not overdo the paste - just enough to ensure thermal path from pad-ball, and reflow on a hot pad should work.

Sorry I meant the balls on the pcb, rather than the balls on the chip. It looks like the HASL on the PCB has formed balls.

I guess if they socket into the solder stencil, that reduces the paste that can effectively be applied anyway, so perhaps it's not an issue. Perhaps someone with BGA experience can enlighten us

jmg · 2016-06-14 23:11

garryj wrote: »

...
At 7+ bits before J (K) flag set, it might, depending on your design approach, become something you have to work around because by the time you detect bus IDLE, you're already at the maximum allowed threshold for IP delay (6.5 - 7.5 bit periods for function, 7.5 bit periods for host), and the receiving device may time out waiting for the trailing packet. If it's determined that 1+ is too short, I would propose that it be no more than 2+, which is the minimum J periods after SE0 for an inter-packet delay.

This may need some field-tests to check ?
Is there room to support both ? Or even room to define this value ?
2+ seems to make more sense, & could be worth testing.

garryj · 2016-06-14 23:22

jmg wrote: »

garryj wrote: »

...
At 7+ bits before J (K) flag set, it might, depending on your design approach, become something you have to work around because by the time you detect bus IDLE, you're already at the maximum allowed threshold for IP delay (6.5 - 7.5 bit periods for function, 7.5 bit periods for host), and the receiving device may time out waiting for the trailing packet. If it's determined that 1+ is too short, I would propose that it be no more than 2+, which is the minimum J periods after SE0 for an inter-packet delay.

This may need some field-tests to check ?
Is there room to support both ? Or even room to define this value ?
2+ seems to make more sense, & could be worth testing.

My thinking is that it's better to add time instead of trying to subtract it.

jmg · 2016-06-14 23:30

garryj wrote: »

My thinking is that it's better to add time instead of trying to subtract it.

Plus it is nice if code can work on both LS and FS USB with minimal changes.
(ideally, just the Baud setting ).
I don't recall all the details of rayman's delay issues.

garryj · 2016-06-15 00:30

jmg wrote: »

garryj wrote: »

My thinking is that it's better to add time instead of trying to subtract it.

Plus it is nice if code can work on both LS and FS USB with minimal changes.
(ideally, just the Baud setting ).
I don't recall all the details of rayman's delay issues.

I haven't tried to do anything at FS yet, but I keep it in mind when coding. Outside of the frame/frame number generation required by FS, I don't think there's a whole lot of difference. But to keep my brain from completely exploding, so far I've just been concentrating on the LS specs <g>.

cgracey · 2016-06-15 08:47

garryj wrote: »
I'm to the point where I've got Control and Bulk IN/OUT transactions sorted, with CRC16 calcs/checks on all DATAx packets. It took me longer than expected to be able to handle DATAx packets if their payload size was less than the maximum allowed, as during device enumeration there are times the host is required to ask for more data than it's likely to receive. Distinguishing payload bytes from the CRC bytes in a time-sensitive way when there's a short packet put a little wrinkle in the process when it came to verifying the incoming CRC...

Anyway, I got past that little hurdle, but along the way I had time to evaluate how the receiver status flags are implemented. Though initially ambivalent, I now think that the receiver status bits should reflect, as closely as possible, the state of the bus as applied by the transmitter. If I recall, the change discussion was about how many bit periods should follow the EOP->J transition before setting the J (K) status flag, in order to make inter-packet delay application more "automatic".

If I'm interpreting the USB spec properly, the transmitting device is allowed one full J (K) period to allow it time to turn off its output buffer. Given this, I'm now of the opinion that the receiver status J (K) flags should also reflect this, if possible. What I propose is this:
 [15:8] byte             - last byte received
 [7]    byte toggle      - cleared on SOP, toggled on each byte received
 [6]    error            - cleared on SOP, set on bit-unstuff error, EOP SE0 > 3 bits, or SE1
 [5]    EOP in           - cleared on SOP or 1+ bits of J or K, set on EOP (P2v9x is 7+)
 [4]    SOP in           - cleared on EOP, set on SOP (P2v9 is 7+; is a false SOP possible?)
 [3]    SE1 in (illegal) - cleared on !SE1, set on 1+ bits of SE1
 [2]    SE0 in (RESET)   - cleared on !SE0, set on 1+ bits of SE0
 [1]    K in (RESUME)    - cleared on !K, set on 1+ bits of K (P2v8+ is 7+ bits of K)
 [0]    J in (IDLE)      - cleared on !J, set on 1+ bits of J (P2v8+ is 7+ bits of J)
Since inter-packet delay and transaction response turn-around time begins at the EOP->J transition, I think that detecting the bus IDLE state as soon as possible after transmitting a packet would be a good thing. It's exactly the place you want to be to apply an inter-packet delay or start your transaction turn-around timer, whichever is required. When you're receiving data, your byte fetch routine should be responsible for EOP detection.

At 7+ bits before J (K) flag set, it might, depending on your design approach, become something you have to work around because by the time you detect bus IDLE, you're already at the maximum allowed threshold for IP delay (6.5 - 7.5 bit periods for function, 7.5 bit periods for host), and the receiving device may time out waiting for the trailing packet. If it's determined that 1+ is too short, I would propose that it be no more than 2+, which is the minimum J periods after SE0 for an inter-packet delay.

The reason I set those SOP/EOP indicators to persist for 7+ clocks is because at only 1+ clock, they would disappear too quickly. Remember that as soon as an EOP is registered, which is really 1+ clock of SE0, you need to start responding right away for full-speed turnaround. That SE0 will transition to J, then idle, in the next two to three bit clocks. For low-speed, you'd need to do some time elapsing in software, but for full-speed, just turning a response packet around ASAP, after seeing the EOP, will eat up your time budget.

We could make it so that for low-speed, the SOP/EOP signals go away on the next bit period, but full-speed needs more persistence for software response. I think it's best to leave it the way it is, just to keep things consistent. You will need to have some sense of how much time your software takes in both low-speed and full-speed modes, and program accordingly. I don't see any other practical way to deal with this.

jmg · 2016-06-15 09:07

cgracey wrote: »

The reason I set those SOP/EOP indicators to persist for 7+ clocks is because at only 1+ clock, they would disappear too quickly. Remember that as soon as an EOP is registered, which is really 1+ clock of SE0, you need to start responding right away for full-speed turnaround.

Are you saying they react faster, then persist for that 7+ time ?
That's not quite the description garryj gave then ?
React-with-persist operation is probably ok ?

garryj · 2016-06-15 16:20

cgracey wrote: »

The reason I set those SOP/EOP indicators to persist for 7+ clocks is because at only 1+ clock, they would disappear too quickly. Remember that as soon as an EOP is registered, which is really 1+ clock of SE0, you need to start responding right away for full-speed turnaround. That SE0 will transition to J, then idle, in the next two to three bit clocks. For low-speed, you'd need to do some time elapsing in software, but for full-speed, just turning a response packet around ASAP, after seeing the EOP, will eat up your time budget.

We could make it so that for low-speed, the SOP/EOP signals go away on the next bit period, but full-speed needs more persistence for software response. I think it's best to leave it the way it is, just to keep things consistent. You will need to have some sense of how much time your software takes in both low-speed and full-speed modes, and program accordingly. I don't see any other practical way to deal with this.

Thanks, Chip. I understand what you're saying in regard to EOP clearing at 7+ bits. If you have your trailing packet or response ready to go when the EOP flag clears, you're good to go. It would be the same for starting a turn-around timer -- you just need to take into account the elapsed J bits. Does my aged brain have it right?

I'm still a little fuzzy on SOP and 7+ J/K. Since SOP is cleared on EOP, that also makes it mutually exclusive with J/K.

Given that EOP is cleared after 7+ bits of J, which handles IP and response situations, is it still necessary to hold 7+ bits after SE0->J/K? For instance, if you're a sniffer, don't you want to catch J/K as quickly as possible?

Thanks!

Rayman · 2016-06-15 17:45

If we had GCC going already, we could look into porting over a Linux USB HID driver, right?

garryj · 2016-06-15 18:54

Rayman wrote: »

If we had GCC going already, we could look into porting over a Linux USB HID driver, right?

I think bus I/O will always be pasm, certainly at full-speed. But if we have a well defined IRP interface, I don't see why not.

Rayman · 2016-06-15 20:00

Got some solder paste on the HyperRam ball pads...
Maybe it will work...

cgracey · 2016-06-15 22:05

garryj wrote: »

cgracey wrote: »

The reason I set those SOP/EOP indicators to persist for 7+ clocks is because at only 1+ clock, they would disappear too quickly. Remember that as soon as an EOP is registered, which is really 1+ clock of SE0, you need to start responding right away for full-speed turnaround. That SE0 will transition to J, then idle, in the next two to three bit clocks. For low-speed, you'd need to do some time elapsing in software, but for full-speed, just turning a response packet around ASAP, after seeing the EOP, will eat up your time budget.

We could make it so that for low-speed, the SOP/EOP signals go away on the next bit period, but full-speed needs more persistence for software response. I think it's best to leave it the way it is, just to keep things consistent. You will need to have some sense of how much time your software takes in both low-speed and full-speed modes, and program accordingly. I don't see any other practical way to deal with this.

Thanks, Chip. I understand what you're saying in regard to EOP clearing at 7+ bits. If you have your trailing packet or response ready to go when the EOP flag clears, you're good to go. It would be the same for starting a turn-around timer -- you just need to take into account the elapsed J bits. Does my aged brain have it right?

I'm still a little fuzzy on SOP and 7+ J/K. Since SOP is cleared on EOP, that also makes it mutually exclusive with J/K.

Given that EOP is cleared after 7+ bits of J, which handles IP and response situations, is it still necessary to hold 7+ bits after SE0->J/K? For instance, if you're a sniffer, don't you want to catch J/K as quickly as possible?

Thanks!

Yes, a sniffer might want an instantaneous reporting of line states, but something trying to have a conversation needs things slowed down, a bit, so that it has time to register and respond.

There are some things that shouldn't happen in a normal USB connection, but the USB smart pin mode has to be ready for anything. That's why there are some clearing cases that don't seem necessary.

I just looked over the USB mode and I don't know what to change to improve it. There is probably room for improvement, but I don't know where, given the complexity of USB and my lack of experience. If I were to dive into USB programming, I would probably find some things that need tweaking, but I don't know what they are right now. If anyone here can come up with some improvement ideas, we can implement them.

garryj · 2016-06-16 00:42

Yes, so far it's been one of the more challenging projects I've taken up. After your explanation I realized I had been looking at the USB status with a reactive instead of proactive mindset. I'm currently going through my inter-packet and turn-around code with that perspective and there are definitely areas where things can be tightened up.

T Chap · 2016-06-16 15:33

Rayman, any luck soldering yet? Hope to see some pics after it's done.

Rayman · 2016-06-16 17:55

I soldered it. Think it's OK, but hard to say until I test it. Hope to have time for that soon.

Here's a view from the edge after soldering:

jmg · 2016-06-16 20:04

Rayman wrote: »

I soldered it. Think it's OK, but hard to say until I test it. Hope to have time for that soon.

Cool.
I guess a simple first test is to do a rolling value write-then-read of a fixed small 1~4 byte addresses, at full speed.
Then, start stretching various things in time, to see where it breaks.

Some think it refreshes during CS=H, but my reading is it refreshes only in the few cycles between Address & data.
I think continual raster read is ok, with the caveat you lose other RAM areas.

I guess a fast enough read, might be able to support 2 areas where one is enabled to LCD and the other 'waits' - 64ms is the time budget.
At room temp, I'd expect to have to go >> 64ms to trigger refresh fail.
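
(A rough Python sketch of the rolling write-then-read idea, run against a stand-in memory object since there is no real driver yet. FakeRam and rolling_test are hypothetical names, and the 16-bit word width is an assumption.)

```python
class FakeRam:
    """Stand-in for the real device; optionally forces some bits low."""
    def __init__(self, stuck_low_mask=0):
        self.cells = {}
        self.stuck_low_mask = stuck_low_mask

    def write(self, addr, value):
        self.cells[addr] = value & 0xFFFF & ~self.stuck_low_mask

    def read(self, addr):
        return self.cells.get(addr, 0)

def rolling_test(ram, addresses, passes=16):
    """Write a rotating one-hot 16-bit value to each address, read it back.

    Returns None if every read matches, else (addr, expected_value)
    for the first mismatch.
    """
    value = 1
    for _ in range(passes):
        for addr in addresses:
            ram.write(addr, value)
            if ram.read(addr) != value:
                return addr, value
            value = ((value << 1) | (value >> 15)) & 0xFFFF
    return None

if __name__ == "__main__":
    print(rolling_test(FakeRam(), [0, 1, 2, 3]))
    print(rolling_test(FakeRam(stuck_low_mask=0x0004), [0, 1, 2, 3]))
```

On real hardware the delay between the write and the read is the part to stretch, to find where retention starts to fail.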

Yanomani · 2016-06-17 03:58

jmg wrote: »

Rayman wrote: »

I soldered it. Think it's OK, but hard to say until I test it. Hope to have time for that soon.

Cool.
I guess a simple first test is to do a rolling value write-then-read of a fixed small 1~4 byte addresses, at full speed.
Then, start stretching various things in time, to see where it breaks.

Some think it refreshes during CS=H, but my reading is it refreshes only in the few cycles between Address & data.
I think continual raster read is ok, with the caveat you lose other RAM areas.

I guess a fast enough read, might be able to support 2 areas where one is enabled to LCD and the other 'waits' - 64ms is the time budget.
At room temp, I'd expect to have to go >> 64ms to trigger refresh fail.

Hi jmg et al.

Perhaps the following document, from Cypress, of course, could provide some clear information on HyperRam auto refresh operation.

AN209853

Based not only on it, but by reading almost all available data from Cypress and ISSI on that subject, I'm convinced of the following:

- If the device has power applied and stable, isn't subjected to an active reset (RESET# pin level is HIGH) and its CS# pin is HIGH too, then IT CAN AND WILL BE performing automatic and internally timed self refresh operations, with the following exception;
- automatic self refresh operations are suspended, and thus ALL ram contents CAN BE lost, if the device is put into its DEEP POWER DOWN mode of operation, thru the programming of a "0" value at bit CR0[15]. The ALL ... CAN BE is only conditional in the following sense:

- If the device is put into its DEEP POWER DOWN mode, SOME of its data CAN surely be lost, mainly because the total time involved in entering, then leaving, Deep Power Down mode violates the maximum time allowed before one or more rows must be refreshed, automatically or programmatically (for more information, please consult the DEEP POWER DOWN section of the target device's datasheet);
- during normal operation (CS# being actively driven by the HyperBus master device (P2 in that case)), where random read and/or write cycles are happening, two situations can occur:

- CS# is driven LOW for short periods, only enough to enable read and/or write operations to take place, then driven HIGH afterwards, thus providing enough time to enable self refresh operations to take place (the timing and conditions for it to occur, can be derived from reading target device's datasheet);

- when CS# is driven low by the HyperBus master, during the Command-Address (CA) transfer phase, AND the device was programmed to use variable latency, IF there is an auto refresh operation in progress, THEN RWDS will be brought HIGH by the device (HyperRam), during the CA phase, to signal the need to wait some more clock cycles (Latency Count), in order to allow enough time for the CURRENT INTERNALLY TIMED row refresh operation to take place or terminate, BEFORE any data can be read from or written to the device. To simplify interface logic (programming/design), a fixed latency can be programmed (see: Configuration Register 0 Bit Assignments in the device's datasheet), at the expense of spending some more time on each read or write operation.

- only one automatic refresh operation of a single row (maybe two, if page boundaries are being crossed by the refresh counter/address circuit) can be automatically inserted during each read/write operation, just after the CA phase and before the data phase.

- when fixed latency is programmed (CR0[3]=1), there is always an extra latency cycle count inserted, between the CA phase and the data phase, that can be interpreted as a "window" for the self refresh circuit to act and do its job, before actual data movement can take place;

- If one intends to do burst reads or writes whose length can be longer than the maximum allowed (tCMS, tCSM; these two are defined, and can be very confusing, across both the Cypress and ISSI datasheets) for the intended device and temperature, then it's better for the system designer to take on the burden of programmatically doing refresh operations, interspersed with all the other read and write operations, as needed.

- since the 512 words that form a row are internally transferred into an 8192 bit register each time a single word of that row is requested to be read or written by the HyperBus interface logic, and written back afterwards when operations on that row are finished, programmatically controlled refresh can be done by reading or writing at least a single word from each of the 8192 rows (8M devices), but it will surely take time, since an entire CA phase must precede each operation.

- in order to halve the number of CA phases and "speed up" programmatically controlled refresh operations, one can set up a linear burst read operation of two words, the last from one row and the first from the next row. That way, two rows are refreshed within a single read sequence, effectively halving the interface time consumed to completely refresh the whole memory.

- other than the above mentioned, no pin states or conditions are assumed/checked or will interfere in self refresh operations.

- in all my reading, there is no mention, in any of the white papers I could find, of a way to "track" where the automatic internal refresh counter points, or to interfere with its contents, nor of whether it is initialized to any specific value at power on, reset or on leaving deep power down mode.

- the clocking limits for the devices (e.g. 166 MHz) are an allowed maximum only. One can drive the clock at as low a rate as she/he wants, and also steer its rate up and down during any phase of an operation without concern, as long as the upper limits are not exceeded and the appropriate latency count is selected for the intended rate. Of course, at very low rates, the time needed to refresh the whole memory array (either automatically or programmatically) will far exceed the device's specifications and cause data loss.
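
To make the programmatic refresh arithmetic above concrete, here is a short Python sketch. It assumes word addressing, the 8192 rows of 512 words mentioned above, and the 64 ms budget jmg quoted; the function names are only illustrative, and the real figures should be checked against the datasheet of the target device.

```python
ROWS = 8192             # rows in an 8M device, as stated above
WORDS_PER_ROW = 512     # 512 words x 16 bits = the 8192 bit row register
REFRESH_BUDGET_S = 0.064  # 64 ms, the budget quoted earlier in the thread

def single_word_refresh_addresses():
    """One word address per row: one CA phase per row."""
    return [row * WORDS_PER_ROW for row in range(ROWS)]

def paired_burst_starts():
    """Start addresses for two-word bursts that straddle a row boundary.

    Each burst reads the last word of an even row and the first word
    of the following odd row, so one CA phase touches two rows.
    """
    return [row * WORDS_PER_ROW + WORDS_PER_ROW - 1 for row in range(0, ROWS, 2)]

def rows_touched(start, burst_len=2):
    """Set of row numbers covered by a linear burst starting at 'start'."""
    return {(start + i) // WORDS_PER_ROW for i in range(burst_len)}

def per_row_budget_s():
    """Average time available per row if every row must be hit in the budget."""
    return REFRESH_BUDGET_S / ROWS

if __name__ == "__main__":
    print(len(single_word_refresh_addresses()), len(paired_burst_starts()))
    print(per_row_budget_s() * 1e6, "us per row")
```

The paired scheme needs 4096 CA phases instead of 8192, and the average budget works out to about 7.8 us per row.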

Henrique

jmg · 2016-06-17 06:14

Yanomani wrote: »

Based not only on it, but by reading almost all available data from Cypress and ISSI on that subject, I'm convinced of the following:

- If the device has power applied and stable, isn't subjected to an active reset (RESET# pin level is HIGH) and its CS# pin is HIGH too, then IT CAN AND WILL BE performing automatic and internally timed self refresh operations, with the following exception;

It would be nice if that were true, as that widens use cases.
Should be clear enough when the real devices are fired up...

Sometimes I think IC vendors do such a poor job of writing data sheets, that they should always paste code that works...
