XDIV 1 or 2 is best for low jitter as jmg said.
I'd be surprised if jitter is the culprit, but definitely worth trying...
Looks like you may be correct. No joy in Mudville, as testing with a WAITX #3 between the AKPIN #DP and WRPIN #DM at 120, 180 and 260MHz, the results were the same as my earlier testing:
120 - my troublesome 3.x parts were still flaky, the others had clean runs. I still don't know if this could be related to the OUT CRC error issue, because they are failing due to orphaned packets, or to the device rejecting both IN/OUT transactions with a count of NAKs that breaks my (rather arbitrary) limit. In the bulk-only USB world, a NAK from the device is basically an "I don't like this packet" that could mean a lot of things that I haven't had time to look into. Since it affects only three of my test devices, it's been a pretty low priority issue.
180 - this is the sweet spot. All were successful with not a single complaint from the analyzer, whether the WAITX was #3 or #20.
260 240 - eventual OUT CRC error on large consecutive sector reads. A bump to WAITX #20 is the "fix".
Regardless, I'll be changing all frequencies in my test group to use the preferred XDIV values. Hopefully this might remove a potential issue down the road.
..
260 - eventual OUT CRC error on large consecutive sector reads. A bump to WAITX #20 is the "fix".
Did you mean 240MHz here ?
/5 is certainly better than /20.
180MHz is going to have more accurate placement than 120MHz, but you would expect 240MHz to be even better still from an initial precision viewpoint; maybe up here the P2 is getting 'too quick' for other systems?
At those MHz values, do the failures occur 100% of the time, or at some lower rate?
Jitter issues are going to fail in a more statistical manner, with some good passes and some worse ones.
For some more margin testing, fine adjustment is possible via the Xtal load capacitance - below are the ppm movements I measured, at modest temperatures.
HUBSET ##%0000_000E_DDDD_DDMM_MMMM_MMMM_PPPP_CCSS   'set clock mode
where
  E          - enables the PLL
  DDDDDD     - PFD = XI/(D+1), 1..64 division of the XI pin frequency
  MMMMMMMMMM - VCO/(M+1) = PFD, 1..1024 division of the VCO frequency
  PPPP       - SysCLK from VCO, 15 == VCO/1
  CC         - Xtal pins mode
  SS         - Osc select: RCFAST : RCSLOW : XI : PLL

%CC   XI status   XO status       XI/XO Z   XI/XO caps     Measured        ppm change
%00   ignored     float           Hi-Z      OFF
%01   input       600-ohm drive   1M-ohm    OFF            1000.1439Hz     +144 ppm
%10   input       600-ohm drive   1M-ohm    15pF per pin    999.9933Hz     -6.700 ppm
%11   input       600-ohm drive   1M-ohm    30pF per pin    999.9472Hz     -53 ppm
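For quick experiments with those fields, the bit packing can be sketched in Python. The field layout is taken directly from the HUBSET line above; the 180MHz example values (20MHz crystal, /1 into the PFD, x9 feedback) are assumptions for illustration, not anyone's verified board settings:

```python
def hubset_clk(e: int, d: int, m: int, p: int, cc: int, ss: int) -> int:
    """Pack the P2 clock-mode fields into the 32-bit HUBSET value
    %0000_000E_DDDD_DDMM_MMMM_MMMM_PPPP_CCSS (layout from the post above).
    d and m are the raw register values, i.e. divide-by-(d+1) and (m+1)."""
    assert 0 <= d < 64 and 0 <= m < 1024 and 0 <= p < 16 and cc < 4 and ss < 4
    return (e << 24) | (d << 18) | (m << 8) | (p << 4) | (cc << 2) | ss

# Assumed example: 20MHz crystal, XI/1 into the PFD (D=0), x9 to a 180MHz VCO
# (M=8), VCO/1 out (PPPP=15), 15pF loading (%CC=%10), run from the PLL (%SS=%11).
mode_180 = hubset_clk(e=1, d=0, m=8, p=15, cc=0b10, ss=0b11)
print(f"{mode_180:#034b}")
```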
I'm wondering whether we shouldn't get one of OzProp's Si5351's driving into XI directly, and eliminate the PLL altogether for this kind of testing, just to rule out the basics. Once we have something stable we can try different perturbations such as dividers and temperature sweeps
I think they can generate up to 200 MHz, and we already have a P1 driver that should be easy to port to P2
I'm thinking use one of Von Szarvas's new proto boards to load the adafruit si5351a module onto, with a single link wire across to XI
I just thought of something that might make a difference: Enable clocking on the USB pins by setting bit 16 of the long value that you WRPIN to each USB pin. That will cause the OUT signal to be registered, keeping it later than the %HHHLLL signals which the USB smart pin mode controls.
In the next silicon, we will need to apply timing constraints to tightly group the %HHHLLL signals, which are always asynchronous, so that they all transition at nearly the same time. The troubles you have been having may be related to the same kind of glitch phenomenon that made the pull-up/down sensor code in the ROM misread the boot pins. We could have varying race conditions in different pins with the %HHHLLL signals.
Sadly, that nibble-sized bus glitch is still there. Putting the workaround back made things good again (with bit 16 set), so it looks like there is no negative effect, either.
Garryj, do you have a scope with which you can actually monitor the DM and DP signals?
If there are any race conditions, you should see some level glitches here and there.
Do you know if USB works equally well on every set of pins? I'm thinking that each pin has a different set of timings for those asynchronous %HHHLLL signals.
I think they can generate up to 200 MHz, and we already have a P1 driver that should be easy to port to P2
Yes, they are good to 200MHz.
Passing a 200MHz clock is more of a challenge, but you certainly could run a Si5351A to (say) 48MHz which ups the PFD even more, and a Si5351 allows some testing of clock tolerance, and it has a spread spectrum mode, which could be interesting to try.
I can generate signals up to 1GHz with very, very little phase noise from my Communications Service Monitor. Can we directly drive XI at 200MHz? What level of signal is required?
Garry,
I've just started looking for uses in your code of known issues with the hardware. I've found one potential already - lutRAM sharing. I've found that a simultaneous write by one cog and read of the same location by the other cog corrupts the data. My present workaround is to use spare smartpins as single longword mailboxes instead of any lutRAM sharing.
In your case I see there is also buffering in lutRAM. Err, just some parameters. Your "events" could be done in smartpins so that the buffer parameters can stay in lutRAM.
I can generate signals up to 1GHz with very, very little phase noise from my Communications Service Monitor. Can we directly drive XI at 200MHz? What level of signal is required?
Let me review the schematic for the ES board.
P2 has some choices on CLK in
%SS   Clock Source       Notes
%11   PLL                CC != %00 and E=1; allow 10ms for crystal+PLL to stabilize before switching to PLL
%10   XI                 CC != %00; allow 5ms for crystal to stabilize before switching to XI
%01   RCSLOW (~20kHz)    can be switched to at any time, low-power
%00   RCFAST (20MHz+)    can be switched to at any time, used on boot-up
So you could use %SS to select XI drive, and AC couple a sine generator into XI.
Probably you would also select XI/XO caps to 0pF, via %CC .
The Xtal amplifier will have decreasing gain with frequency, so it would be useful to know the mVp-p vs MHz clocking threshold.
Clipped sine oscillators have low Icc, low RFI, and low prices, and usually spec > 0.8V p-p, and come up to 48MHz or 52MHz (26MHz is a common GPS value)
Garry,
I've found one potential already - lutRAM sharing. I've found that a simultaneous write by one cog and read of the same location by the other cog corrupts the data.
Here, do you mean the read data value corrupts, so is neither = old value nor = new value, or does this mean the address corrupts, and an unexpected address is written ?
Another small, and easy, thing to try out is setting "clocked" mode on both physical I/O. This could have a significant impact on those delay adjustments you are having to preset to suit various sysclock rates. Eg:
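As a sketch of what that change amounts to: the only modification is OR-ing bit 16 (the registered/clocked I/O bit Chip mentioned) into the mode long before it is written with WRPIN. The base mode constant below is a placeholder, not the driver's actual USB pin-mode value:

```python
CLOCKED_IO = 1 << 16   # registered OUT, per Chip's bit-16 suggestion above

def make_clocked(wrpin_mode: int) -> int:
    """OR the clocked-I/O bit into a WRPIN mode long."""
    return wrpin_mode | CLOCKED_IO

# Hypothetical placeholder for the driver's USB smart-pin %MMMMM field:
USB_MODE_PLACEHOLDER = 0b11011 << 1
assert make_clocked(USB_MODE_PLACEHOLDER) & CLOCKED_IO
```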
Garry,
I've just started looking for uses in your code of known issues with the hardware. I've found one potential already - lutRAM sharing. I've found that a simultaneous write by one cog and read of the same location by the other cog corrupts the data. My present workaround is to use spare smartpins as single longword mailboxes instead of any lutRAM sharing.
In your case I see there is also buffering in lutRAM. Err, just some parameters. Your "events" could be done in smartpins so that the buffers parameters can stay in lutRAM.
I've found lut sharing to be simple, efficient and robust for mailbox style data transfers. The parameters you're speaking of do always remain in lutRAM -- cog register->lut written by the sender and lut->cog register by the reader. There is no read/write contention because after the data is written the mailbox flag is raised using a lut write event that triggers on a lut cell that only the sender can write to. The only simultaneous write contention to worry about is at cog load/start time when these two lut trigger cells get initialized. Using a lock at startup to protect the initialization writes might be a prudent thing to do, though...
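The handshake described above can be modelled in a few lines of Python. This is a toy single-writer model only; the cell addresses, the queue standing in for the lut-write event, and the parameter layout are all illustrative, not the driver's actual lutRAM map:

```python
import queue

# Each direction has its own trigger cell with exactly one writer, so writes
# never collide after initialization.
lut = [0] * 512
host_event = queue.Queue()   # stands in for the receiver's lut-write event

H_TRIGGER = 0x1FF            # hypothetical cell only the host cog writes

def host_post(event_id, params):
    lut[0:len(params)] = params   # parameter cells stay in lutRAM
    lut[H_TRIGGER] = event_id     # raising the flag...
    host_event.put(H_TRIGGER)     # ...fires the receiver's write event

def driver_wait():
    addr = host_event.get()       # WAITSE-style block until the event fires
    return lut[addr], lut[0:3]

host_post(4, [0x100, 0x200, 0x300])
ev, params = driver_wait()
assert ev == 4 and params == [0x100, 0x200, 0x300]
```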
Garryj, do you have a scope with which you can actually monitor the DM and DP signals?
If there are any race conditions, you should see some level glitches here and there.
Oy. I do have an "economy" 100MHz 2-channel scope, but it's been quite a while since I last used it, and most of the details needed to get things set up and working have pretty much vanished from my aging memory :depressed: But I'll give it a shot if it becomes necessary.
In this case, I'm not sure I would be able to detect anything meaningful (to me), because whatever is happening on the bus is very subtle. If it was babble or other significant bus disruption, the analyzer would detect it as a corrupted packet. But this is a bad packet CRC, so the analyzer is interpreting the bitstream as being valid -- it's just that the bits aren't in the proper order.
Attached are two looks at another fail/recover incident, showing something I see quite often. In this case the byte that triggers the CRC fail has the same value as the byte that follows it and the "bad" byte (always for this type of fail) is within the data byte #5, 6, 7 group. I would think that if it was a bad read from RAM I would be seeing issues in a lot of different places. Confusing, indeed.
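For offline sanity checks of captures like these, the USB data CRC16 can be recomputed on the host side. This assumes the standard CRC-16/USB parameters (reflected poly 0x8005, init and xorout 0xFFFF); since any single-byte error is an 8-bit burst, CRC16 always catches it, so the analyzer's "valid bitstream, bad CRC" verdict really does mean bits changed somewhere:

```python
def usb_crc16(data: bytes) -> int:
    """CRC-16/USB: reflected poly 0x8005 (0xA001), init 0xFFFF, xorout 0xFFFF."""
    crc = 0xFFFF
    for b in data:
        crc ^= b
        for _ in range(8):
            crc = (crc >> 1) ^ 0xA001 if crc & 1 else crc >> 1
    return crc ^ 0xFFFF

assert usb_crc16(b"123456789") == 0xB4C8   # published check value

good = bytes(range(16))
bad = bytearray(good)
bad[5] = bad[6]   # duplicate a byte inside the #5,6,7 group, as seen above
assert usb_crc16(good) != usb_crc16(bytes(bad))
```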
Do you know if USB works equally well on every set of pins? I'm thinking that each pin has a different set of timings for those asynchronous %HHHLLL signals.
I moved from pin pair 26/27 to 18/19 and got the same results, but I'll be moving pin pairs occasionally going forward, to see if something might eventually change.
Did you say earlier that you were able to prevent these problems by padding out the timing a little bit?
I am wondering if, due to race conditions, the pin pair is temporarily left in a different data state than supposed, after a transmission burst. A scope would make such a problem glaringly obvious, while the USB analyzer maybe only occasionally sees a packet-level error.
The only simultaneous write contention to worry about is at cog load/start time when these two lut trigger cells get initialized. Using a lock at startup to protect the initialization writes might be a prudent thing to do, though...
I wasn't clear enough. I'm talking about a hardware design flaw. If one cog reads on the exact clock cycle that the other cog writes, then the data is corrupted.
This most likely means your "events" are occasionally false negatives. And probably corrected on next pass. I wouldn't rule out false positives either though.
Garry, those are interesting clues.
Did you say earlier that you were able to prevent these problems by padding out the timing a little bit?
Yes. With a waitx of #3 between the ackpin and wypin, from ~120MHz..180MHz the runs are squeaky-clean, analyzer-wise. Above 180MHz, the OUT CRC error starts showing up at ~192MHz, but they may recover and pass within the six retries I have now. Bump the sysclock a little more, and the recoveries disappear. I've upped the retry count to 50+ tries with no recoveries.
I am wondering if, due to race conditions, the pin pair is temporarily left in a different data state than supposed, after a transmission burst. A scope would make such a problem glaringly obvious, while the USB analyzer maybe only occasionally sees a packet-level error.
I'll start re-reading my scope manual
Also, if you get the time to check out my USB mass storage demo: with a 32GB thumbdrive formatted FAT32, and the media successfully mounted, run the SCANFAT command. If it succeeds, you can put a waitx #0 between the #utx_byte subroutine's ackpin/wypin and run again, and the media should still mount successfully. Another "SCANFAT" command and I can almost guarantee you'll see what the attached image shows. Change that #0 to #3 and it should succeed. If not, #10 or #20 should do it. Doh!
...
Yes. With a waitx of #3 between the ackpin and wypin, from ~120MHz..180MHz the runs are squeaky-clean, analyzer-wise. Above 180MHz, the OUT CRC error starts showing up at ~192MHz, but they may recover and pass within the six retries I have now. Bump the sysclock a little more, and the recoveries disappear. I've upped the retry count to 50+ tries with no recoveries.
Given the low failure rates here, I'm wondering if some cards are fussier about total phase than others.
To check that, if you output a signal on a separate pin every USB frame, and trigger the scope on that, the ideal would be to have TX edges appear on every 12MHz picket inside that, which means all USB edges would appear with clean eyes.
Systems that resync more rapidly could tolerate more edge movement, but ones using a PLL (as USB 3 might) may have less tolerance for relative bounce.
Some USB systems may also be fussier about resync edges; if those stretch too far apart, it may affect them?
When you send each TX chunk, does the USB send always align to some start-phase, or does the TX resync to the new TX time (ie the edges move)?
If it does resync every chunk sent, you would expect small delays to help, and there could also be multiple delay sweet spots.
ie in the above, 3 sysclks at 120MHz is 30% of one 12MHz period, so 13 sysclks would give 1 period + 30%
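That arithmetic generalizes to any sysclock; a quick Python check of those delay sweet spots, assuming the 12MHz full-speed bit clock:

```python
USB_FS = 12_000_000   # full-speed bit rate, Hz

def phase_fraction(waitx_cycles: int, sysclk_hz: float) -> float:
    """Fraction of a 12MHz bit period spanned by a waitx delay."""
    return (waitx_cycles / sysclk_hz) * USB_FS

# 3 sysclks at 120MHz is 30% of one 12MHz period...
assert abs(phase_fraction(3, 120e6) - 0.30) < 1e-9
# ...so 13 sysclks gives 1 period + 30%, a candidate second sweet spot
assert abs(phase_fraction(13, 120e6) - 1.30) < 1e-9
```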
This most likely means your "events" are occasionally false negatives. And probably corrected on next pass. I wouldn't rule out false positives either though.
Oh, false positives might be more common than I first thought. Just the fact that bit0 is set in both D_READY and H_READY means that an intended state change to an even event value might glitch to an odd event value instead. Ie: it'll add +1 to whatever the intended event was. Oh, they're not #1 but #0. That's a new tool to me. I've never built constant tables in assembly before.
There is still potential for glitches there at any rate. lutRAM sharing is flawed in the hardware at the moment. Chip hasn't said much but I think has a plan to consult OnSemi about the issue.
It must be pretty darn rare (or I've been exceptionally lucky), because I haven't experienced any misinterpreted "event" postings since I've been using the P2-ES. The driver cog events pass USB I/O request packet parameters to the host and the host passes the IRP result back to the driver, and I can't say that I've run across any USB transaction fail that I could trace back to an event ID error. But now that I'm aware, I'll see if I can help find out how much it happens "in the wild", as it were. I already have an "out of range" trap in #post_hevent_sync in the driver. I'll expand it to accumulate any read of an invalid event ID, and do the same in the host. We'll see what numbers pop up.
I think they can generate up to 200 MHz, and we already have a P1 driver that should be easy to port to P2
I'm thinking use one of Von Szarvas's new proto boards to load the adafruit si5351a module onto, with a single link wire across to XI
Or, do the P123 ones still work with crystal hubsets added?
..
Here, do you mean the read data value corrupts, so is neither = old value nor = new value, or does this mean the address corrupts, and an unexpected address is written ?
Already been tried and no joy.