Maybe Chip can make an 80 MHz version? Would that help?
It would reduce the 'too many changes' effect.
Right now there are opcodes changed and MHz changed, and neither fully proven.
In the meantime, testing across FPGAs may give more information, as they are likely to have differing actual upper MHz limits.
... The USB analyzer shows both devices are getting configured correctly, but when the polling timers are set and running, it looks like their CTn = CT flags never get set, which results in the driver cog getting stuck in its main->dpoll_data->main loop.
... My USB analyzer shows that the full-speed 1ms frame interval packets transmitted by the host cog's "CTn = CT"-triggered ISR are functioning as expected, but when the first SETUP packet is transmitted, the non-ISR code calls poll_waitx to time an inter-packet delay, and that is where it looks like things go south.
... but even after waiting at least the ~36 seconds for the 120MHz system counter to roll over, no joy.
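The ~36 second figure is one wrap of a 32-bit counter at 120MHz. As a quick cross-check, this small Python sketch computes the wrap time for each sysclock mentioned in the thread. It assumes the CTn events compare against the low 32 bits of the system counter, and `rollover_seconds` is a hypothetical helper name:

```python
def rollover_seconds(sysclk_hz: float, counter_bits: int = 32) -> float:
    """Seconds for a free-running counter of `counter_bits` bits to wrap once."""
    return (1 << counter_bits) / sysclk_hz


# Sysclock rates mentioned in this thread
for mhz in (60, 80, 120):
    print(f"{mhz} MHz: {rollover_seconds(mhz * 1e6):.1f} s")
# 60 MHz: 71.6 s
# 80 MHz: 53.7 s
# 120 MHz: 35.8 s
```

So a missed CT event at 60MHz would cost roughly twice as long a wait before the next match.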
Finally, here is an odd but potentially important quirk that I can reproduce with the USBMouse.spin2 file. If the mouse works consistently over several connect, exercise and disconnect cycles, search the file for the parse_dev_desc label, comment out the NOP, and run the program again. On my system, just commenting out the NOP causes the code to never reach the device polling stage, i.e. the 8ms polling interval timer appears to never trigger.
Hmm, it'll be interesting to see how this runs across different FPGAs, as it's starting to sound like maybe 120MHz is too optimistic an overclock?
Can you do some small test cases of the suspect opcodes?
I sure hope that when the issue(s) are found, they don't include the 120MHz sysclock, because at full speed the USB smartpins are now working really well. I was able to remove the "if full-speed" special-case shortcuts, so LS and FS now use the same code, the only difference being the smartpin setup for the appropriate speed. The USB 3.0 parts that were finicky at 80MHz appear to be much less so now.
Also, I tested the low-speed devices I have with the sysclock set at 60MHz, and it works well. My recollection is that I saw the same "odd NOP quirk" mentioned above at 60MHz, and if that's true, I'd think that would make the 120MHz sysclock less of a suspect. I'll have to re-test that to be sure my memory hasn't failed me -- I'm going to have to get into the habit of taking more notes -- getting old is the pits :frown:
So far I've done a small and rather simplistic simulation of the driver cog's main loop, using the POLLCTn timers I've been suspecting, and that code works fine, so far. The next step is to add the host cog's main loop in there, and get the complete host<->driver inter-process communication routines simulated. Hopefully that will result in something popping up.
Yes, if the quirk also shows at 60MHz, that suggests MHz is less of a factor, for that specific issue.
Does this fail with no NOP, but work fine with 1, 2, 3 NOPs, etc.?
Are the instructions on either side of the NOP ones that may be time-sensitive, or any boundary cases?
Can other opcodes be used instead of NOP?
ISTR Chip has mentioned a couple of opcodes that have 'too close/too soon' type caveats on use?
I've been experiencing some "weirdness" with my code on V19 and V20 too.
I can't quite put my finger on it yet, but I have been bouncing between V19 and V20 on 3 different platforms (P123-A9,P123-A7 and DE2-115) trying to find the problem.
I have even tried pulling back to 60MHz but that made no difference either.
In my case I seem to be having problems with the DIRA register.
dirh #mypin 'mypin = 17 ***works ok***
but
bith dira,#mypin 'doesn't work
and strangely this seems to clobber DIRA too
or dira,#0
Trouble is, if I test the "problem" instructions in isolation, they work OK.
I will just have to keep digging...
I just tested the "odd NOP quirk" at 60MHz, and it happens at that speed, too. In my test cases it has been working like a switch, i.e. if the NOP is there it works; comment it out and it doesn't, and vice versa. And it doesn't have to be a NOP, nor does it have to be in a specific spot in the code. That's what makes it so odd.
Must be triggering something...
Is it code location, code delay, or delay between opcodes?
Are 2 NOPs OK? What about a NOP before the label? (That moves code in memory, but does not change the loop speed.)
When I did that little bit of Smartpin count mode testing, I noted some significant lag on the state transitions. If your code assumes a Smartpin responds to an instruction within a single instruction step, you'll probably be seeing old state information.
PS: I did all that with v19. I wouldn't know what may have been different beforehand.
I just tested all the LUT events and they work okay.
The only difference from the last release is that the LUT r/w events are now sensed one clock later than they were before. I had to use some flops to capture the ENA, WR, and ADDR, in order to make sense of them in the next cycle, as there was insufficient time to do it all in one clock. So, the LUT events are sensed one clock later than they used to be, causing events/interrupts to trigger one clock later. It seems unlikely that anyone's code would be so sensitive to such a small change.
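To picture the change Chip describes, here is a behavioral Python model (not the actual Verilog) of a LUT-write event decoder, with and without a flop stage capturing ENA/WR/ADDR before decoding. The function name, the `(ena, wr, addr)` tuple layout and the watched address are illustrative assumptions, and only writes are modelled:

```python
def lut_write_events(bus, watch_addr, registered=True):
    """Return the clock numbers on which a write to `watch_addr` raises the event.

    `bus` holds one (ena, wr, addr) tuple per clock. With `registered=True`
    the signals are first captured in flops and decoded on the following
    clock, so every event appears one clock later.
    """
    events = []
    captured = None  # flop contents: the signals seen on the previous clock
    for clk, signals in enumerate(bus):
        sample = captured if registered else signals
        if sample is not None:
            ena, wr, addr = sample
            if ena and wr and addr == watch_addr:
                events.append(clk)
        captured = signals
    return events


bus = [(0, 0, 0), (1, 1, 5), (0, 0, 0), (1, 0, 5), (0, 0, 0)]
print(lut_write_events(bus, watch_addr=5, registered=False))  # [1]
print(lut_write_events(bus, watch_addr=5, registered=True))   # [2]
```

The same write produces its event one clock later once the flop stage is added, which is the only behavioral difference described above.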
I checked all my WRLUTs and it didn't look like any I had fell into this category, but it got me thinking about my POLLCTn problem child. Recall from my test files post, one "bad behavior trigger" routine I had was this:
Putting the NOPs before the .wait label doesn't work, so it looks like maybe there's something going on at event flag set/reset time? Anything less than 10 clocks @ 120MHz was problematic. The JCTn/JNCTn instructions don't appear to be as affected by this, but I haven't explored that much at all yet.
But at least now I've actually had a full-speed keyboard/mouse combo working together on v19+ for more than zero seconds :-D I'm still getting occasional hangs, but at least I've got something much better to work with now :-D
Also, when it's working, the "odd NOP quirk" appears to disappear. Maybe the sysclock demons are adequately appeased by the NOP offering.
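For scale, the "10 clocks" threshold converts to time as below. This is plain arithmetic in Python. The 2-clock NOP cost is an assumption based on most P2 instructions taking 2 clocks, and `clocks_to_ns` is a hypothetical helper. A fixed clock count lasts twice as long in nanoseconds at 60MHz, so a quirk that tracks clock count rather than absolute time would fit it also appearing at 60MHz:

```python
NOP_CLOCKS = 2  # assumption: a P2 NOP takes 2 clocks, like most P2 instructions


def clocks_to_ns(clocks: int, sysclk_hz: float) -> float:
    """Duration of `clocks` system clocks in nanoseconds."""
    return clocks * 1e9 / sysclk_hz


threshold = 10  # clocks, from the observation above
for mhz in (120, 60):
    print(f"{threshold} clocks @ {mhz} MHz = {clocks_to_ns(threshold, mhz * 1e6):.1f} ns")
print(f"{threshold} clocks = {threshold // NOP_CLOCKS} NOPs")
# 10 clocks @ 120 MHz = 83.3 ns
# 10 clocks @ 60 MHz = 166.7 ns
# 10 clocks = 5 NOPs
```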
Woo hoo! Back in business!
Hmm, that's a little weird. How many times does this typically loop?
If they need to be post-label, all the NOPs seem to do here is make the polling coarser, and so delay the eventual exit (plus add some jitter).
If you move the NOPs to after the test/loop, just before ret, is the effect the same?
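The granularity point above can be checked with a tiny model. If the NOPs only lengthen the poll loop, the delay between the event and the loop noticing it is spread over 0 to (loop length - 1) clocks, which is the extra exit delay and jitter. A Python sketch, with hypothetical loop lengths and assuming bug-free polling:

```python
def exit_latency(event_cycle: int, loop_cycles: int) -> int:
    """Clocks from an event to the first poll at or after it.

    Assumes polls run at cycles 0, loop_cycles, 2*loop_cycles, ... and that a
    poll on the same cycle as the event sees it (bug-free behaviour).
    """
    next_poll = -(-event_cycle // loop_cycles) * loop_cycles  # integer ceiling
    return next_poll - event_cycle


# Illustrative loop lengths: a short poll loop, then the same loop padded with NOPs
for loop_cycles in (4, 8, 14):
    worst = max(exit_latency(e, loop_cycles) for e in range(loop_cycles))
    print(f"{loop_cycles}-clock loop: exit latency 0..{worst} clocks")
```

In this model, longer loops only widen the latency window; they cannot by themselves turn a hang into a working exit.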
This loop is in the host cog space and is called whenever the USB 1ms frame generator ISR is active and you need WAITX behavior.
If the NOPs are before the .wait, or after the if_nc jmp #.wait, there is a 0% success rate that a FS keyboard/mouse will make it to polling for device data. When they are in the happy spot, the success rate isn't 100%, but at least it now gets to the data polling stage, though after anywhere from a few seconds to tens of minutes it may hang again. At this point that could be due to software problems elsewhere, but at least now I've got something to work with.
I think that at this point, though, we could now postulate that this is likely a systemic problem, and not a programming/logic problem?
Here's my latest test code, for others to poke and prod, if they wish.
If it tolerates NOPs, can you toggle a pin in the loop, and another just after it, to confirm exactly where it hangs?
I probably should have said to place 2 blocks of NOPs, one before the label and the other before ret.
That duplicates possible entry and exit delays and only changes the poll granularity. A minor detail, but we are looking at long shots here.
Do pollct1 and pollct2 fail the same way?
I notice those are RMW (read-modify-write) instructions. I wonder if an event that fires at just the wrong phase is missed, but still cleared?
With a non-power-of-two NOP+other loop cycle count, if that did happen, then on the next go-around, 2^32 clocks later, the poll should miss that phase and exit OK?
Conversely, with fewer, power-of-two (binary) cycle counts, it might never exit.
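The missed-phase hypothesis above can be tried out in a small Python simulation. It uses an 8-bit counter instead of the real 32-bit one so the wrap is short, and it models only the hypothesized failure: a poll that lands on the exact clock the event fires clears the flag without reporting it. The function name and all parameters are illustrative assumptions:

```python
def wrap_on_which_loop_exits(counter_bits, loop_cycles, event_phase,
                             buggy=True, max_wraps=64):
    """Return the wrap index k on which the wait loop exits, or None.

    Model (assumption, following the hypothesis above):
      * polls execute at cycles 0, loop_cycles, 2*loop_cycles, ...
      * the counter event fires at event_phase + k * 2**counter_bits
      * buggy: a poll on the exact cycle the event fires clears the flag
        without reporting it, so that event is lost
      * otherwise the flag stays set and the next poll sees it
    """
    wrap = 1 << counter_bits
    for k in range(max_wraps):
        event_cycle = event_phase + k * wrap
        collides = event_cycle % loop_cycles == 0
        if not (buggy and collides):
            return k
    return None


print(wrap_on_which_loop_exits(8, 4, 64))               # None: power-of-two loop hangs
print(wrap_on_which_loop_exits(8, 6, 60))               # 1: phase shifts, exits next wrap
print(wrap_on_which_loop_exits(8, 4, 64, buggy=False))  # 0: no bug, exits first time
```

A loop period divides 2^n only when it is itself a power of two, so only those loop lengths can hit the bad phase on every wrap. Any other length drifts off it by the following wrap.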
Is there any thought on using an actual USB PHY to properly implement FS and HS? Does P2 support the fast turnaround needed to do HS with a PHY? All I know is that on XS-1 it took a lot of work to get a compliant HS host, including self-modifying code and such shenanigans. Would it be easier on P2?
Finally, how much would it cost to add a USB 2 HS phy (or two) into P2? It'd be a licensable 3rd party IP of course, but I wonder whether one is available for the process that'll be used and how much silicon would it use?
Life is much easier with a real USB PHY iff the read-to-write turnaround can be fast enough - it needs to be one external clock cycle to switch between reading and writing on a bunch of pins.
P2 will never support High Speed (480Mbps USB).
I'm surprised XS-1 was able to manage that.
LS/FS looks to not need anything added, so that's the P2 sweet-spot.
An external HS part might be possible, but those tend to be niche, with plenty of complexity.
Do you have any low-cost, well-stocked candidates in mind?
Better, I think, are HS-USB companion parts that manage ALL of the HS-USB side.
e.g. I've found the FT4222H, which is ~$1.50 and gives a decent boost over the 12Mbps of FS,
and also the NUC505, the lowest-priced ($1.74/1000) MCU I've found with HS-USB included.
Perhaps also possible are Ultra-USB (USB 3.0) companion parts, like the FT60x, which look to OUTPUT a 66MHz/100MHz CLK, but these seem to have no fall-back speeds.
shouldn't the 8-bit bus help in reducing data rate for HS?
Yes, that's a ULPI interface, and 8 bits does help, but it also has a 60MHz CLKOUT that the P2 must slave or lock to.
Other parts use 60/66/100MHz CLKOUTs and right now the phase tolerances of P2 are unproven.
USB3300 is cheaper (just under $1), but also has less included, so has no buffers which means the P2 must 'keep up' exactly.
I don't know if the next test silicon, has enough included to try to lock to such clocks, and check the Pin-delays. Of course, it would be very nice if P2 can talk to ULPI parts.
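The ULPI numbers above fit together as follows: HS signalling at 480Mbps over an 8-bit bus is 60M transfers per second, hence the 60MHz CLKOUT. The Python sketch below also shows how many P2 sysclocks fit in one ULPI clock at the 120MHz used in this thread. Reading that as "one 2-clock instruction per ULPI clock" is an assumption, and the helper name is hypothetical:

```python
def ulpi_budget(usb_bit_rate=480e6, bus_width=8, sysclk_hz=120e6):
    """Return (ULPI clock in Hz, ULPI period in ns, P2 sysclocks per ULPI clock)."""
    ulpi_clk_hz = usb_bit_rate / bus_width
    period_ns = 1e9 / ulpi_clk_hz
    sysclks_per_ulpi_clk = sysclk_hz / ulpi_clk_hz
    return ulpi_clk_hz, period_ns, sysclks_per_ulpi_clk


clk, period, sysclks = ulpi_budget()
print(f"ULPI clock {clk / 1e6:.0f} MHz, period {period:.2f} ns, {sysclks:.0f} sysclocks")
# ULPI clock 60 MHz, period 16.67 ns, 2 sysclocks
```

With only about one instruction per bus cycle, the one-clock read-to-write turnaround and the CLKOUT phase tolerance leave very little margin.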
FT4222 has a total Endpoint Buffer of 4160 bytes, and claims 'Up to 53.8Mbps data transfer rate in SPI master with quad mode transfer'
- so while that's not near the top end of HS USB, it is much faster than FS-USB, and looks 'relatively easy' on P2.
I'm not sure what the top possible link speeds with the NUC505 are, but 100Mbps in quad mode only needs a 25MHz SPI clock.
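The SPI figures above are raw bit rates: one bit per data line per SCLK cycle, so quad mode moves 4 bits per clock. Here is a minimal Python sketch. The helper names are hypothetical, and command, address and USB protocol overhead are ignored:

```python
FS_USB_MBPS = 12.0  # USB full-speed signalling rate


def quad_spi_mbps(sclk_mhz: float, lanes: int = 4) -> float:
    """Raw SPI data rate in Mbps: one bit per lane per SCLK cycle."""
    return sclk_mhz * lanes


def sclk_needed_mhz(target_mbps: float, lanes: int = 4) -> float:
    """SCLK frequency needed to reach `target_mbps` with `lanes` data lines."""
    return target_mbps / lanes


print(quad_spi_mbps(25.0))             # 100.0 Mbps from a 25MHz quad SPI clock
print(sclk_needed_mhz(53.8))           # 13.45 MHz for the FT4222H's claimed 53.8Mbps
print(round(53.8 / FS_USB_MBPS, 1))    # 4.5 (x the FS signalling rate)
```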
Hmm, yeah ... I'm certainly struggling to see how to slave a Streamer to that. There is very little handshaking on ULPI and without control of the clock the databus sequencing windows don't really have any chance of being met.
@Rayman, this is the P2v18->v19 code I just tested under P2v22. Since v19 had just the instruction changes, hopefully it won't be any more difficult to integrate into your project than v18 was.
Where are USB testing and the FPGA builds up to?
Last I recall, there was some issue around Low Speed only, and opcode-sampling inside WAITs needed an FPGA fix & re-build?
The problem was that I had introduced a phase error with event flag sampling and clearing when I was optimizing the ALU result mux. It was causing the POLLxxx instructions to sometimes miss the event, but clear the flag, anyway. This was affecting POLLCT1 in Garryj's code, causing things to hang until the counter wrapped around again. I made a new version with the bug fix and I posted it. Garryj tried it out and, indeed, it fixed the problem. So, there are no known issues with bugs in USB testing, anymore.
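The bug described above (event sampling and flag clearing out of phase) can be illustrated with a cycle-level Python model of a sample-and-clear poll. This is only a plausible behavioral model, not the actual RTL. In the "buggy" case, a clear on the same clock as an incoming event wins and the event is lost. In the "fixed" case, the set wins and the next poll reports it. Function and parameter names are hypothetical:

```python
def first_poll_that_sees_event(event_cycles, poll_cycles, set_wins_over_clear):
    """Return the clock of the first poll that reports the event, or None.

    Each poll samples the flag as it stood before this clock, then clears it.
    An event arriving on the same clock as a poll either survives the clear
    (set wins, the fixed behaviour) or is wiped out (clear wins, the bug).
    """
    events, polls = set(event_cycles), set(poll_cycles)
    flag = False
    for clk in range(max(events | polls) + 1):
        event_now = clk in events
        if clk in polls:
            if flag:
                return clk
            flag = event_now and set_wins_over_clear
        elif event_now:
            flag = True
    return None


polls = range(0, 16, 4)  # a 4-clock poll loop: polls at 0, 4, 8, 12
print(first_poll_that_sees_event({8}, polls, set_wins_over_clear=False))  # None
print(first_poll_that_sees_event({8}, polls, set_wins_over_clear=True))   # 12
print(first_poll_that_sees_event({6}, polls, set_wins_over_clear=False))  # 8
```

Only the event that coincides with a poll is lost, which matches the symptom of hanging until the counter wraps around and the event fires again.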
Cool, thanks. I've now found the mention of the Oct 5th P2v22, but I think the FPGA file set in post #1 is still the Sept 29th v21b?
I just updated P123 board with latest image (v21b) and loaded the code from 5 posts ago.
Not sure if it is working or not. It shows me that it is connected to a low speed device, but I don't see any data when keyboard keys are pressed or when mouse moves...
Been a while though, might need to check my hardware....
You may need the 5 Oct P2v22 build for P123, linked here? (It fixes the LS USB.)
Yeah, needs the P2v22 image. The P2v21b PX and PNut works with it.
Comments
Is it code-location, or code delay, or delay between opcodes ?
maybe odd/even COG register?
Mike
Also, when it's working, the "odd NOP quirk" appears to disappear. Maybe the sysclock demons are adequately appeased by the NOPs offering
...maybe... Good, there is progress.
Finally, how much would it cost to add a USB 2 HS phy (or two) into P2? It'd be a licensable 3rd party IP of course, but I wonder whether one is available for the process that'll be used and how much silicon would it use?
Somehow, I doubt the process would reach 480MHz HS Phy performance...
http://www.waveshare.com/usb3300-usb-hs-board.htm
shouldn't the 8-bit bus help in reducing data rate for HS?
(sorry, didn't even glance at the datasheet though)
I suspect something about these changes...
"TESTB/TESTBN and TESTP/TESTPN (were TESTIN/TESTNIN) now have flag AND/OR/XOR"
..might be the culprit.
You may need the 5 oct P2v22 build, for P123, linked here ? (fixes the LS USB)
http://forums.parallax.com/discussion/comment/1421812/#Comment_1421812
Guess we're back in business...