Garry,
I recommend testing with the new v33i FPGA files with the corrected SRAM dual-porting of lutRAM.
EDIT: Added v33i.
Yes, and look at the usb_turnaround.spin2 file for guidance on USB mode settings.
If you are using LUT sharing, you definitely need to try v33i.
Yeah, that's the place to start. The FPGA and silicon code are in separate branches, with the latter getting changes that eliminated the "shortcuts" I had taken due to 80MHz constraints. I've gotten behind in keeping the FPGA branch updated, so I guess I'd best get to it.
Stupid question time regarding the LUT sharing issue: can the bug ramifications have an effect outside of LUT read/write? If my USB IRP request tokens passed via LUT share were getting mangled, things would be breaking in an entirely different way.
The LUT bug only occurred when you read a location that was simultaneously being written.
If you are not suffering from the LUT bug to begin with, then keep developing on the existing silicon.
It would be good to give v33i a quick check, though, to make sure that I didn't break anything in rearranging the USB modes, in order to accommodate the new ADC and scope modes.
.... There is something very mysterious happening during the rx->tx transition where the device returns the command status and the host sends another data block. If a bad CRC happens, it's always here.
What is the approximate failure rate you see, and does it change if you warm up the P2?
What is the overall Bad/Good USB packet ratio for that ~33.5GB non-stop?
The bad CRC issue rate rises as the sysclock frequency is increased. I don't think it's due to heat, as it starts showing up around 160MHz.
.... When a CRC error did happen, it was always in the command block wrapper for a new 32KB of data (see attached image).
It doesn't happen at every new block -- there could be runs of many consecutive blocks without a single error. Applying the same "fix" used with reads that added more wait cycles between the tx byte routine's AKPIN #DP and WYPIN #DM got the CRC errors to disappear.
Could that be related to when bit-stuff action is taken? Your highlighted 0x9d & 0x49 differ by a single inserted bit?
I hadn't given it being a bit stuff issue much thought, as the write test includes a sector of all $FF bytes, so if bit stuffing was an issue, I thought it would likely show up there, too?
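For reference, a minimal Python sketch of the USB bit-stuffing rule (a 0 is inserted after six consecutive 1s) applied to an all-$FF sector like the one in the write test. The helper names are made up for illustration, and this models only the rule itself, not what the smart pin hardware does internally.

```python
def bit_stuff(bits):
    """Insert a 0 after every run of six consecutive 1 bits (USB bit-stuffing rule)."""
    out = []
    ones = 0
    for b in bits:
        out.append(b)
        if b == 1:
            ones += 1
            if ones == 6:
                out.append(0)  # stuffed bit
                ones = 0       # the stuffed 0 restarts the run count
        else:
            ones = 0
    return out


def byte_to_bits_lsb_first(value):
    """USB sends each byte least-significant bit first."""
    return [(value >> i) & 1 for i in range(8)]


# A 512-byte sector of $FF is the worst case: one stuffed bit per six data bits.
ff_sector = [bit for _ in range(512) for bit in byte_to_bits_lsb_first(0xFF)]
stuffed = bit_stuff(ff_sector)
print(len(ff_sector), "data bits ->", len(stuffed), "bits on the wire")
```

So an all-$FF sector exercises the stuffing path about as hard as it can be exercised, which is consistent with the reasoning that a pure bit-stuff problem would be expected to show up there as well.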
Stupid question time regarding the LUT sharing issue: can the bug ramifications have an effect outside of LUT read/write? If my USB IRP request tokens passed via LUT share were getting mangled, things would be breaking in an entirely different way.
Yes, those H_EVENT/D_EVENT states will be momentarily scrambled if both read and write coincide. WRLUT writes correct data (I wasn't sure of this detail at the time) but RDLUT doesn't see it correctly if they coincide on the same clock.
One way to test for the issue is renumber them so that H_READY/D_READY become non-zero. I believe this will increase chance of mangling on both state change edges.
I found that the LUT bug didn't break my event-counter based buffer test - if you are waiting for a location to increment to a particular value you seem to be safe as the glitch is a mix of old and new bits and will never erroneously match a higher value than has been written during an incrementing sequence.
Good idea, if you pass tokens as one-hot encoded, you can detect and flag any aperture effects, and if you know the order they should occur, code might even be able to proceed as it can decide the new bit.
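A rough Python sketch of the one-hot idea, assuming the glitch model described earlier in the thread (a coinciding read returns some mix of old and new bits). The token values and function names are hypothetical, not the ones used in the USB host code.

```python
def is_one_hot(value):
    """True when exactly one bit is set."""
    return value != 0 and (value & (value - 1)) == 0


def possible_mixes(old, new, width=32):
    """Every value a read could return if each bit independently comes from old or new.

    This is an assumed model of the same-clock RDLUT/WRLUT glitch, not a
    description of what the silicon actually does.
    """
    differing = old ^ new
    diff_bits = [1 << i for i in range(width) if (differing >> i) & 1]
    results = set()
    for mask in range(1 << len(diff_bits)):
        value = old
        for i, bit in enumerate(diff_bits):
            if (mask >> i) & 1:
                value = (value & ~bit) | (new & bit)
        results.add(value)
    return results


TOKEN_A = 0b0010  # hypothetical one-hot token values
TOKEN_B = 0b0100

for value in sorted(possible_mixes(TOKEN_A, TOKEN_B)):
    verdict = "valid" if is_one_hot(value) else "glitch detected"
    print(f"{value:04b} {verdict}")
```

Under that model a mangled read has either zero or two bits set, so it can never be mistaken for a different valid token. If the expected order is known, a two-bit read could be resolved by picking the bit that is not the old token.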
In your case all you needed was a trigger to say a change has occurred, I think. Corrupted trigger content almost didn't matter, just as long as it changed.
Edit: BTW, I received my ES accessory set today and the serial host add-on works like a charm! But maybe in a future revision, both host and device boards could have test points for D-/D+?
If you can get some stackable headers, you can use these to extend the pins through the plug-in ES boards, so you can still scope the P2 pins.
Got the silicon branch merged into the USB mass storage FPGA-based code. I started with the FPGA v32i/j build first to set a baseline.
From the analyzer view, there appeared to be no functional differences between v32i and v33i.
My three "flaky" v3.x usb parts were still flaky on both builds.
The rest of the devices all enumerated and the FAT volume access tests all passed, so it's looking good regarding the USB smart pin mode changes.
On v32i I hadn't been experiencing any symptoms of the LUT sharing same-clock read/write issue, and it was status quo on v33i. I guess that's good.
The silicon at 80MHz behaved the same, with one exception. I found that setting bit 16 to enable clocking when configuring the USB smart pins does have a positive effect, as one of my "flaky" devices that refused to enumerate on the FPGA did so nicely, and without error, with clocking enabled. As I expected, when I cranked silicon sysclock up to 180MHz, the OUT CRC fail remains an issue. But maybe, since the clock bit has no effect on the FPGA, what Chip described in this post could still be a factor? https://forums.parallax.com/discussion/comment/1462620/#Comment_1462620
Note that this particular CRC fail that can be reproduced on silicon has never popped up on the FPGA, but we're stuck at 80MHz there...
@garryj
Tubular and I ran some USB tests today with your code "USBBootMouseKbdLite.spin" on a P2_ES.
Using the accessory USB board, 7 out of 8 keyboards worked OK.
3 mice (1 cordless) all worked OK.
<Full-Speed device connected.>
Vendor ID: SanDisk
Product ID: Cruzer Facet
Version level: 1.00
Media is removable
SCSI version is ANSI X3.131:1994 (SCSI-2) or higher
Highest LBA: 15630335
Sector size: 512
Checking media for a FAT file system...
Partition type: 0x0B
Cluster size: 16384
Volume base sector: 32
Reserved sector count: 18
FSInfo base sector (in reserved): 33
FAT region base sector: 50
Sector count of one FAT: 3815
FAT region sector count: 7630
RootDir base sector: 7680
RootDir cluster#: 2
Dir/file/data base sector: 7680
Count of data region clusters: 488208
Count of free clusters: 861
FSInfo next free cluster: 64677
Count of data region sectors: 15622656
Count of volume sectors: 15630304
FAT32 volume mounted.
A:\>
<Full-Speed device connected.>
Vendor ID: SanDisk
Product ID: Cruzer Facet
Version level: 1.00
Media is removable
SCSI version is ANSI X3.131:1994 (SCSI-2) or higher
Highest LBA: 123174911
Sector size: 512
Checking media for a FAT file system...
Partition type: 0x0C
Cluster size: 32768
Volume base sector: 32
Reserved sector count: 14
FSInfo base sector (in reserved): 33
FAT region base sector: 46
Sector count of one FAT: 15033
FAT region sector count: 30066
RootDir base sector: 30112
RootDir cluster#: 2
Dir/file/data base sector: 30112
Count of data region clusters: 1924137
Count of free clusters: 869954
FSInfo next free cluster: 32287
Count of data region sectors: 123144800
Count of volume sectors: 123174880
FAT32 volume mounted.
A:\>
<Device disconnected>.
#:\>
<Full-Speed device connected.>
Vendor ID: SanDisk
Product ID: Cruzer Switch
Version level: 1.27
Media is removable
SCSI version is ANSI X3.131:1994 (SCSI-2) or higher
Highest LBA: 15431337
Sector size: 512
Checking media for a FAT file system...
Partition type: 0x0B
Cluster size: 16384
Volume base sector: 32
Reserved sector count: 20
FSInfo base sector (in reserved): 33
FAT region base sector: 52
Sector count of one FAT: 3766
FAT region sector count: 7532
RootDir base sector: 7584
RootDir cluster#: 2
Dir/file/data base sector: 7584
Count of data region clusters: 481992
Count of free clusters: 401636
FSInfo next free cluster: 15
Count of data region sectors: 15423754
Count of volume sectors: 15431306
FAT32 volume mounted.
A:\>
<Device disconnected>.
#:\>
<Full-Speed device connected.>
Vendor ID: Verbatim
Product ID: STORE N GO
Version level: PMAP
Media is removable
Device does not claim conformance to any SPC standard
No data...
Bulk-IN endpoint STALL...
SCSI command error: Failed
ASC: 0x28, ASCQ: 0x00
Unit requires attention: not ready to ready change, or medium may have changed
Command retry...
Highest LBA: 15653375
Sector size: 512
Checking media for a FAT file system...
Partition type: 0x0B
Cluster size: 4096
Volume base sector: 8064
Reserved sector count: 2274
FSInfo base sector (in reserved): 8065
FAT region base sector: 10338
Sector count of one FAT: 15247
FAT region sector count: 30494
RootDir base sector: 40832
RootDir cluster#: 2
Dir/file/data base sector: 40832
Count of data region clusters: 1951568
Count of free clusters: 1443573
FSInfo next free cluster: 16534
Count of data region sectors: 15612544
Count of volume sectors: 15645312
FAT32 volume mounted.
A:\>
<Device disconnected>.
#:\>
Are there any lingering mysteries about USB viability? It seems there is maybe a need to insert delays at some junctures at high speeds?
The inter-packet and turn-around delays are specified in bit times, so the higher timing resolution due to the silicon's speed has made any packet separation issues that could/would pop up at 80MHz disappear. I'm pretty sure that having the extra cycles available in silicon was why my "flaky" 3.x parts began working reliably due to more accurate IP and TAT timings.
I've only been working in the host USB mode, so the internal pull-up resistors for low/full speed device mode have not been directly tested, yet.
The one big mystery left is the "why" behind the "bad CRC on OUT". A CRC error on transmit was something I never saw during FPGA development. So far, the only thing I know for sure is that it popped up when sysclock was increased. The "fix" of increasing the delay between the AKPIN and WYPIN at byte tx is something that's outside of the realm of "regular" USB bus timings, as the packet spacing when this glitch triggers is within acceptable limits, and it's happening in a place I can't get to via code (other than the "fix"). For media writes, if the CRC error is cleared using the USB transaction retry mechanism, the 512 OUT data packets that follow are transmitted without error via the same routine.
What is the buffer size of the USB transmitter? I have been assuming that it is at least two bytes? TX is unique in that it's the only USB smartpin action where the upper pin is polled and acknowledged and the tx byte is written to the lower pin. The analyzer shows that bit(s) are definitely getting affected and the device does what it's supposed to do (ignore the corrupted packet). But I'm out of ideas regarding what I could possibly do in code other than the delay bandage?
What is the buffer size of the USB transmitter? I have been assuming that it is at least two bytes?
Chip will elaborate, but I don't think the Smart Pins have a FIFO, so you cannot write 2 bytes very closely together.
It can do continual sends, which means a shifter, and a queue buffer.
The queue buffer will load into the shifter, usually on a next-baud-clock, so you have almost a whole char time to next load the queue buffer, before underflow occurs.
You probably could load 2 bytes, spaced at least one baud-clock time apart, if you know the shifter was empty at the start.
That does suggest a mechanism where faster SysCLKs could spawn failures, as the USB baud-clock is not changing.
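To put rough numbers on that, here is a small Python sketch of how a fixed instruction gap shrinks relative to a full-speed bit time as sysclock rises. The 4-clock gap is a made-up figure for illustration, not the measured spacing between AKPIN and WYPIN, and the sketch says nothing about what actually causes the CRC failures.

```python
USB_FS_BIT_RATE = 12_000_000  # full-speed USB, bits per second


def sysclks_per_bit(sysclk_hz):
    """How many system clocks fit in one USB full-speed bit period."""
    return sysclk_hz / USB_FS_BIT_RATE


def gap_in_bit_times(gap_cycles, sysclk_hz):
    """Express a fixed number of sysclk cycles as a fraction of one USB bit time."""
    return gap_cycles / sysclks_per_bit(sysclk_hz)


HYPOTHETICAL_GAP_CYCLES = 4  # assumed spacing between two instructions

for sysclk in (80_000_000, 160_000_000, 180_000_000):
    print(
        f"{sysclk // 1_000_000:>3} MHz: "
        f"{sysclks_per_bit(sysclk):5.2f} sysclks/bit, "
        f"gap = {gap_in_bit_times(HYPOTHETICAL_GAP_CYCLES, sysclk):.3f} bit times"
    )
```

The same instruction sequence covers a smaller slice of the bit period at higher sysclock, which is one way a software path that was comfortably slow at 80MHz could end up too close to a baud-clock edge.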
The CRC error always involves data byte #4, 5, 6 or 7. So the bus sequence would be:
SOP + OUT packet (3 bytes)
Inter-packet delay in bit periods (spec is 2.0 to 7.5)
SOP + DATAx PID + data bytes (zero to 64) + packet CRC (2 bytes)
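For anyone wanting to check analyzer captures by hand, a minimal Python sketch of the CRC16 that covers the data bytes of a DATAx packet (polynomial 0x8005 processed LSB first, initial value 0xFFFF, result inverted). The payload bytes are invented for illustration and are not taken from the capture.

```python
def usb_crc16(data):
    """CRC16 over the data field of a USB DATAx packet (bit-reflected 0x8005)."""
    crc = 0xFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ 0xA001  # 0xA001 is 0x8005 bit-reversed
            else:
                crc >>= 1
    return crc ^ 0xFFFF


payload = bytes(range(8))       # hypothetical data bytes
corrupted = bytearray(payload)
corrupted[4] ^= 0x10            # flip one bit in data byte #4

print(f"good CRC: {usb_crc16(payload):04X}")
print(f"bad  CRC: {usb_crc16(bytes(corrupted)):04X}")
```

A single flipped bit anywhere in the data field always changes the CRC, so the device dropping the packet is the expected response to the corruption the analyzer shows.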
Maybe something could be happening at the NCO roll-over point? But if that were the case, you'd think CRC errors would pop up in other OUTs of the transfer, and not just in the first OUT of the transfer. That's the part that's driving me nuts :zombie:
BTW @garryj
I made a few changes to your code to get things going on the P2_ES USB accessory board.
I added a basepin constant for the desired IO group base pin,
then made a few changes to use the constant:
DM = basepin+2 ' DM is "The Brain"
DP = basepin+3 ' DP is passive
..and
HOST_ACTIVE_LED = basepin ' Blinks while in the host's main processing loop
DRIVER_ACTIVE_LED = basepin+4 ' Pulses during mouse activity
Does this CRC error also occur on the P2D2 with 12MHz oscillator?
Good question, but I don't have a P2D2 board to test with. If anyone can give it a try, this post explains how you should be able to make it break: https://forums.parallax.com/discussion/comment/1462709/#Comment_1462709
Edit: and a 16GB FAT32 volume should be big enough to trigger it.
FYI, for those wanting an easier time probing with scope leads, etc - I found the rear shell of the connector housing on the Accessory Set's Serial Host board pops off easily - then, at least the top port's leads can be latched onto.
If you suspect LUT issues, and you have the cycles to spare, you can 'read until two reads are the same', which is a SW workaround for the same-sysclk RD & WR.
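The idea can be sketched in Python like this. The `read` callable stands in for whatever performs the shared-LUT read (RDLUT on the P2), and the retry limit is an arbitrary assumption. It relies on a glitched value not being returned twice in a row.

```python
def read_until_stable(read, max_tries=16):
    """Re-read until two consecutive reads agree, then return that value.

    `read` is any zero-argument callable returning the current value.
    """
    previous = read()
    for _ in range(max_tries):
        current = read()
        if current == previous:
            return current
        previous = current
    raise RuntimeError("value never stabilised")


# Simulated reader: old value, one mangled read, then the new value.
samples = iter([0x12, 0x36, 0x34, 0x34])
print(hex(read_until_stable(lambda: next(samples))))
```

The cost is at least one extra read per access, so it only makes sense where the polling loop has cycles to spare.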
If you can get some stackable headers, you can use these to extend the pins through the plug-in ES boards, so you can still scope the P2 pins.
They sell 6-pin headers for Arduino shields, two to a pack, such as:
https://www.freetronics.com.au/products/stackable-arduino-shield-headers
USB Mass Storage:
https://forums.parallax.com/discussion/comment/1462546/#Comment_1462546
USB Mouse/Keyboard:
https://forums.parallax.com/discussion/comment/1462633/#Comment_1462633
Maybe something could be happening at the NCO roll-over point? But if that were the case, you'd think CRC errors would pop up in other OUTS of the transfer, and not just in the first OUT of the transfer. That's the part that's driving me nuts :zombie:
Using Chip's NCO formula finds these USB sweet spots (NCO error free):
96M/(2^16/round(12M/96M* 0x10000)) = 12000000
128M/(2^16/round(12M/128M* 0x10000)) = 12000000
192M/(2^16/round(12M/192M* 0x10000)) = 12000000
96 & 192 are clean binary so they have no jitter and no errors.
They could be useful SysCLK speed check points, as the bus timings should be identical for both, but the SW path delays will be 2:1?
128M is MHz exact, but achieves that by NCO /11/11/10/11/11/10 repeating, so has jitter.
180M is 9*20, for low VCO jitter, and has a 12MHz error of 15.25ppm, because one in every 4369 cycles, it divides by 16, adding 5.55ns (NCO artifact)
170M is 17*10M, and has a 12MHz error of 15.26ppm, but here it is /14 ~5/6th of the time, and /15 ~1/6th
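The same arithmetic as the formulas above, as a short Python sketch for trying other sysclock values. It assumes the 16-bit NCO scaling (0x10000) used in those formulas, and the function names are my own.

```python
USB_BIT_RATE = 12_000_000  # target full-speed bit rate in Hz
NCO_SCALE = 0x10000        # 16-bit NCO, as in the formulas above


def nco_increment(sysclk_hz):
    """Nearest integer NCO increment for a 12MHz bit clock at this sysclock."""
    return round(USB_BIT_RATE * NCO_SCALE / sysclk_hz)


def nco_bit_rate(sysclk_hz):
    """Average bit rate the NCO actually produces with that increment."""
    return sysclk_hz * nco_increment(sysclk_hz) / NCO_SCALE


def ppm_error(sysclk_hz):
    """Average bit-rate error in parts per million (negative means slow)."""
    return (nco_bit_rate(sysclk_hz) - USB_BIT_RATE) / USB_BIT_RATE * 1e6


for mhz in (96, 128, 170, 180, 192):
    sysclk = mhz * 1_000_000
    print(f"{mhz:>3} MHz: increment {nco_increment(sysclk):>5}, "
          f"error {ppm_error(sysclk):+.2f} ppm")
```

This only gives the long-term average error. It says nothing about the cycle-to-cycle jitter, such as the /11/11/10 pattern at 128M.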
I made a few changes to your code to get things going on the P2_ES USB accessory board: I added a basepin constant for the desired IO group base pin, made a few changes to use the constant, and added 1 instruction to enable the USB.
Cheers