USB Testing


Comments

  • Maybe Chip can make an 80 MHz version? Would that help?
    Prop Info and Apps: http://www.rayslogic.com/
  • jmg Posts: 10,208
    edited July 16
    Rayman wrote: »
    Maybe Chip can make an 80 MHz version? Would that help?
    It would reduce the 'too many changes' effect.
    Right now there are opcodes changed and MHz changed, and neither fully proven.
    In the meantime, testing across FPGAs may give more information, as they are likely to have differing actual upper MHz limits.

  • jmg wrote: »
    garryj wrote: »

    .... The USB analyzer shows both devices are getting configured correctly, but when the polling timers are set and running, it looks like their CTn = CT flags never get set, which results in the driver cog getting stuck in its main->dpoll_data->main loop.

    ... My USB analyzer shows that the full-speed 1ms frame interval packets transmitted using the host cog "CTn = CT"-triggered ISR are functioning as expected, but when the first SETUP packet is transmitted, the non-ISR code will call poll_waitx to time an inter-packet delay, and that is where it looks like things go south.

    ... but after waiting at least the ~36 seconds for the 120MHz clock timer rollover, no joy.

    Finally, an odd, but potentially important quirk, that I can reproduce with the USBMouse.spin2 file, is that if the mouse consistently works over several connect, exercise and disconnect cycles, search the file for the parse_dev_desc label, comment out the NOP, and run the program again. On my system just commenting the NOP out causes the code to never get to the device polling stage, i.e. the 8ms polling interval timer appears to never trigger.
    Hmm, be interesting to see how this runs across different FPGAs, as it's starting to sound like maybe that 120MHz is too optimistic in overclocking ?
    Can you do some small test cases of the suspect opcodes ?
    I sure hope that when the issue(s) are found, they don't include the 120MHz sysclock, because at full-speed now the USB smartpins are working really well. I was able to remove the "if full-speed" special case shortcuts, so now LS and FS are using the same code, with the only difference being the smartpin setup for the appropriate speed. The USB 3.0 parts that were being finicky at 80MHz appear to be much less so now.

    Also, I tested the low-speed devices I have with the sysclock set at 60MHz and it works well. My testing recollection is that I saw the same "odd NOP quirk" mentioned above at 60MHz, and if that's true, I'd think that would mean that the 120MHz sysclock would be less of a suspect. I'll have to re-test that to be sure my memory hasn't failed me -- I'm going to have to get into the habit of taking more notes -- getting old is the pits :frown:

    So far I've done a small and rather simplistic simulation of the driver cog's main loop, using the POLLCTn timers I've been suspecting, and that code works fine, so far. The next step is to add the host cog's main loop in there, and get the complete host<->driver inter-process communication routines simulated. Hopefully that will result in something popping up.
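
    As a quick sanity check on the "~36 seconds" rollover figure quoted above, here is a small Python sketch. It assumes the CT counter is 32 bits wide and ticks once per sysclock, which is inferred from that figure rather than stated in the thread; the function name is made up for illustration.

```python
# Back-of-envelope check: how long a free-running counter takes to wrap.
# Assumption: the CT counter is 32 bits wide and increments once per sysclock.

COUNTER_BITS = 32


def rollover_seconds(sysclock_hz: float, bits: int = COUNTER_BITS) -> float:
    """Seconds for a `bits`-wide counter clocked at `sysclock_hz` to wrap."""
    return (2 ** bits) / sysclock_hz


if __name__ == "__main__":
    # The three sysclock rates mentioned in the thread.
    for mhz in (60, 80, 120):
        print(f"{mhz:>3} MHz: {rollover_seconds(mhz * 1e6):6.2f} s")
```

    Under that assumption the wrap time at 120MHz comes out just under 36 seconds, so waiting that long should cover one full pass of the counter, and roughly twice as long would be needed at 60MHz.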
    garryj
  • jmg Posts: 10,208
    garryj wrote: »
    My testing recollection is that I saw the same "odd NOP quirk" mentioned above at 60MHz, and if that's true, I'd think that would mean that the 120MHz sysclock would be less of a suspect.
    Yes, suggests MHz is less a factor, for that specific issue.
    Does this fail with no NOP, but works fine with 1,2,3 NOPs etc ?
    Are the instructions either side of the NOP, ones that may be time-sensitive, or any boundary cases ?
    Can other opcodes be used instead of NOP ?
    ISTR Chip has mentioned a couple of opcodes that have 'too close/ too soon' type caveats on use ?

  • I've been experiencing some "weirdness" with my code on V19 and V20 too.
    I can't quite put my finger on it yet, but I have been bouncing between V19 and V20 on 3 different platforms (P123-A9,P123-A7 and DE2-115) trying to find the problem.
    I have even tried pulling back to 60MHz but that made no difference either.

    In my case I seem to be having problems with the DIRA register.
    	dirh	#mypin   'mypin = 17 ***works ok****
    but
    	bith	dira,#mypin	'doesn't work
    and strangely this seems to clobber DIRA too
    	or	dira,#0
    
    Trouble is that if I test the "problem" instructions in isolation they work ok.
    I will just have to keep digging....
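
    For comparison, this is what the three instructions above would be expected to do to DIRA, modelled in Python. It assumes DIRH #pin and BITH DIRA,#pin both simply set bit `pin` of a 32-bit DIRA, and that OR with #0 leaves the register unchanged; the helper names are hypothetical and this is only a model of the intended behavior, not of the FPGA image.

```python
# Bit-level model of the intended effect on DIRA (assumed semantics, see lead-in).

MASK32 = 0xFFFFFFFF
MYPIN = 17


def dirh(dira: int, pin: int) -> int:
    """Assumed effect of `dirh #pin` for pins 0..31: set that DIRA bit."""
    return (dira | (1 << pin)) & MASK32


def bith(reg: int, bit: int) -> int:
    """Assumed effect of `bith reg,#bit`: set bit (bit mod 32) of reg."""
    return (reg | (1 << (bit & 31))) & MASK32


def or_imm(reg: int, value: int) -> int:
    """Assumed effect of `or reg,#value`: bitwise OR into reg."""
    return (reg | value) & MASK32


if __name__ == "__main__":
    start = 0x0000_0001  # arbitrary starting DIRA value for the demo
    print(hex(dirh(start, MYPIN)))
    print(hex(bith(start, MYPIN)))
    print(hex(or_imm(dirh(start, MYPIN), 0)))
```

    If the instructions behave as modelled, all three lines print the same value, which is why the reported difference between them looks like something other than a coding error.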
    Melbourne, Australia
  • jmg wrote: »
    garryj wrote: »
    My testing recollection is that I saw the same "odd NOP quirk" mentioned above at 60MHz, and if that's true, I'd think that would mean that the 120MHz sysclock would be less of a suspect.
    Yes, suggests MHz is less a factor, for that specific issue.
    Does this fail with no NOP, but works fine with 1,2,3 NOPs etc ?
    Are the instructions either side of the NOP, ones that may be time-sensitive, or any boundary cases ?
    Can other opcodes be used instead of NOP ?
    ISTR Chip has mentioned a couple of opcodes that have 'too close/ too soon' type caveats on use ?
    I just tested the "odd NOP quirk" at 60MHz, and it happens at that speed, too. In my test cases, it has been working like a switch, i.e. if the NOP is there and it works, you comment it out and it doesn't, and vice versa. And it doesn't have to be a NOP, nor does it have to be in a specific spot in the code. That's what makes it so odd.
    garryj
  • jmg Posts: 10,208
    garryj wrote: »
    I just tested the "odd NOP quirk" at 60MHz, and it happens at that speed, too. In my test cases, it has been working like a switch, i.e. if the NOP is there and it works, you comment it out and it doesn't, and vice versa. And it doesn't have to be a NOP, nor does it have to be in a specific spot in the code. That's what makes it so odd.
    Must be triggering something...
    Is it code-location, or code delay, or delay between opcodes ?
    Is 2 NOPs OK ? What about NOP before the label ? - (moves code in memory, but does not change speed)


  • evanh Posts: 3,960
    edited July 17
    When I did that little bit of Smartpin count mode testing I noted some significant lag on the state transitions. If your code is making any assumptions on a Smartpin responding to instructions within a single instruction step you'll probably be seeing old state information.


    PS: I did all that with v19. I wouldn't know what may have been different beforehand.
    $50,000 buys you a discrediting of a journalist
  • And it doesn't have to be a NOP, nor does it have to be in a specific spot in the code. That's what makes it so odd.

    maybe odd/even COG register?

    Mike
    I am just another Code Monkey.

    A determined coder can write COBOL programs in any language. -- Author unknown.

    The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this post are to be interpreted as described in RFC 2119.
  • msrobots wrote: »
    And it doesn't have to be a NOP, nor does it have to be in a specific spot in the code. That's what makes it so odd.

    maybe odd/even COG register?

    Mike
    The code block it's in is hubexec.

    garryj
  • Woo hoo! Back in business!
    jmg wrote: »
    ISTR Chip has mentioned a couple of opcodes that have 'too close/ too soon' type caveats on use ?
    I combed back through the posts and ran across this:
    cgracey wrote: »
    I just tested all the LUT events and they work okay.
    The only difference from the last release is that the LUT r/w events are now sensed one clock later than they were before. I had to use some flops to capture the ENA, WR, and ADDR, in order to make sense of them in the next cycle, as there was insufficient time to do it all in one clock. So, the LUT events are sensed one clock later than they used to be, causing events/interrupts to trigger one clock later. It seems unlikely that anyone's code would be so sensitive to such a small change.
    I checked all my WRLUTs and it didn't look like any I had fell into this category, but it got me thinking about my POLLCTn problem child. Recall from my test files post, one "bad behavior trigger" routine I had was this:
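    poll_waitx
                    getct   hct3                    ' hct3 = current system counter
                    addct3  hct3, hctwait           ' hct3 += hctwait; arm CT3 event for CT = hct3
    .wait
                    pollct3                         wc      ' C = CT3 event flag (polling clears it)
            if_nc   jmp     #.wait                  ' not fired yet, keep polling
    '                jnct3   #.wait
                    ret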
    
    There's no time between the ADDCTn and the POLLCTn, so I plugged in some NOPs and ended up with this:
    poll_waitx
                    getct   hct3
                    addct3  hct3, hctwait
    .wait
                    nop
                    nop
                    nop
                    nop
                    nop
                    pollct3                         wc
            if_nc   jmp     #.wait
    '                jnct3   #.wait
                    ret
    
    Putting the NOPs before the .wait label doesn't work, so it looks like maybe there's something going on at the event flag set/reset time? Anything less than 10 clocks @ 120MHz was problematic. The JCTn/JNCTn instructions don't appear to be as much affected by this, but I haven't explored that much at all, yet.
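
    To put the 10-clock figure in time units, here is a rough Python conversion. It assumes each NOP costs 2 sysclocks, which the thread doesn't state but which would make the five NOPs above add up to exactly 10 clocks; the function name is made up for illustration.

```python
# Convert a run of NOPs into clocks and nanoseconds.
# Assumption: each NOP costs 2 sysclocks (not stated in the thread).

CLOCKS_PER_NOP = 2


def nop_delay(nops: int, sysclock_hz: float,
              clocks_per_nop: int = CLOCKS_PER_NOP) -> tuple[int, float]:
    """Return (clocks, nanoseconds) spent executing `nops` NOPs."""
    clocks = nops * clocks_per_nop
    return clocks, clocks / sysclock_hz * 1e9


if __name__ == "__main__":
    for nops in range(1, 6):
        clocks, ns = nop_delay(nops, 120e6)
        print(f"{nops} NOP(s): {clocks:2d} clocks = {ns:5.1f} ns at 120 MHz")
```

    With that assumption, five NOPs come to 10 clocks, a little over 83 ns at 120MHz, which lines up with the threshold reported above.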

    But at least now I've actually had a full-speed keyboard/mouse combo working together on v19+ for more than zero seconds :-D I'm still getting occasional hangs, but at least I've got something much better to work with now :-D

    Also, when it's working, the "odd NOP quirk" appears to disappear. Maybe the sysclock demons are adequately appeased by the NOPs offering ;)
    garryj
  • jmg Posts: 10,208
    garryj wrote: »
    Woo hoo! Back in business!
    But at least now I've actually had a full-speed keyboard/mouse combo working together on v19+ for more than zero seconds :-D I'm still getting occasional hangs, but at least I've got something much better to work with now :-D
    Also, when it's working, the "odd NOP quirk" appears to disappear. Maybe the sysclock demons are adequately appeased by the NOPs offering ;)

    ..maybe... ;) , good there is progress.
    garryj wrote: »
    Putting the NOPs before the .wait label doesn't work, so it looks like maybe there's something going on at the event flag set/reset time? Anything less than 10 clocks @ 120MHz was problematic. The JCTn/JNCTn instructions don't appear to be as much affected by this, but I haven't explored that much at all, yet.
    Hmm, that's a little weird, how many times does this typically loop ?
    If they need to be post-label, all the NOPs seem to do here, is make the polling more granular, and so delay the eventual exit (plus add some jitter).
    If you move the NOPs to after the test/loop, just before ret, is the effect the same ?

  • jmg wrote: »
    Hmm, that's a little weird, how many times does this typically loop ?
    If they need to be post-label, all the NOPs seem to do here, is make the polling more granular, and so delay the eventual exit (plus add some jitter).
    If you move the NOPs to after the test/loop, just before ret, is the effect the same ?
    This loop is in the host cog space and is called whenever the USB 1ms frame generator ISR is active and you need WAITX behavior.
    If the NOPs are before the .wait, or after the if_nc jmp #.wait there is a 0% success rate that a FS keyboard/mouse will make it to polling for device data. When in the happy spot, the success rate isn't 100%, but at least now it gets to the data polling stage, though after a few seconds to many minutes, to tens of minutes, it may again hang. At this point, that could be due to software problems elsewhere, but at least now I've got something to work with.

    I think that at this point, though, we could now postulate that this is likely a systemic problem, and not a programming/logic problem?

    Here's my latest test code, for others to poke and prod, if they wish.
    garryj
  • jmg Posts: 10,208
    garryj wrote: »
    This loop is in the host cog space and is called whenever the USB 1ms frame generator ISR is active and you need WAITX behavior.
    If the NOPs are before the .wait, or after the if_nc jmp #.wait there is a 0% success rate that a FS keyboard/mouse will make it to polling for device data. When in the happy spot, the success rate isn't 100%, but at least now it gets to the data polling stage, though after a few seconds to many minutes, to tens of minutes, it may again hang.
    If it tolerates NOPs can you toggle a pin in the loop, & one just after, to confirm exactly where it does hang ?

    I probably should have said place 2 blocks of NOPs, one before the label, and the other before ret.
    That duplicates possible entry and exit delays and only changes the poll granularity. Minor detail but we are looking at long-shots here.

    Do pollct1 or pollct2 fail the same ?

    I notice those are RMW instructions; I wonder if an event that fires at just the wrong phase misses being seen, but still gets cleared ?

    With a non-binary NOP+other loop cycle count, if that did happen, on the next go-around, 2^32 clks later, it should miss that phase, and exit ok ?
    Conversely, with fewer and binary cycles, it might never exit.
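
    The binary vs non-binary point can be sketched numerically. The Python below assumes a 32-bit counter that matches the same target once per wrap, and a poll loop of a fixed length in clocks; the loop lengths and the starting phase are hypothetical values chosen for illustration, not measured timings of the real loop.

```python
# Where in the poll loop does each successive counter match land?
# Assumptions: 32-bit counter, fixed-length poll loop, match repeats every wrap.

ROLLOVER = 2 ** 32


def match_phases(loop_clocks: int, first_phase: int, rollovers: int = 4) -> list[int]:
    """Phase (clocks into the loop) of each successive match, one per wrap."""
    return [(first_phase + n * ROLLOVER) % loop_clocks for n in range(rollovers)]


if __name__ == "__main__":
    first_phase = 3  # hypothetical "bad" phase where the event is lost
    for loop_clocks in (4, 6, 8, 14, 16):  # hypothetical loop lengths
        phases = match_phases(loop_clocks, first_phase % loop_clocks)
        stuck = len(set(phases)) == 1
        print(f"loop of {loop_clocks:2d} clocks: phases {phases}"
              f"{'  <- same phase every wrap' if stuck else ''}")
```

    A loop length that divides 2^32 (any power of two up to that size) lands on the same phase on every wrap, so a miss at a bad phase would repeat forever. Any other length shifts the phase each wrap, so a later match would be expected to land somewhere harmless.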