I'll do another all-FPGA compile soon. Right now, I'm looking into tightening up hub timing (CORDIC, COGINIT, etc.) for smaller-than-16-cog configurations.
... the next demo drop can identify all of the USB-IF registered device/interface base class types, from the class/subclass/protocol "triad" that's in the device and interface descriptors. Subclass and protocol identification can get pretty gnarly, so getting detailed info for these is optional, but it's pretty easy to add, if desired. The demo does the full triad for HID, Mass Storage and the Bluetooth subset of the "Wireless Controller" base class.
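For readers following along, here is a minimal sketch of how a class/subclass/protocol triad lookup could be organised. It is not the demo's code (which is P2 assembly). The table holds only a subset of the USB-IF base class codes, and the names `BASE_CLASS`, `TRIAD_DETAIL` and `describe_triad` are made up for illustration.

```python
# Subset of the USB-IF base class codes (bDeviceClass / bInterfaceClass).
BASE_CLASS = {
    0x00: "Defined at interface level",
    0x01: "Audio",
    0x02: "Communications (CDC)",
    0x03: "HID",
    0x07: "Printer",
    0x08: "Mass Storage",
    0x09: "Hub",
    0x0A: "CDC-Data",
    0x0E: "Video",
    0xE0: "Wireless Controller",
    0xEF: "Miscellaneous",
    0xFE: "Application Specific",
    0xFF: "Vendor Specific",
}

# Optional detail for a few full (class, subclass, protocol) triads.
TRIAD_DETAIL = {
    (0x03, 0x01, 0x01): "HID boot keyboard",
    (0x03, 0x01, 0x02): "HID boot mouse",
    (0x08, 0x06, 0x50): "Mass Storage, SCSI transparent, Bulk-Only",
    (0xE0, 0x01, 0x01): "Bluetooth programming interface",
}


def describe_triad(cls, sub, proto):
    """Return a readable name for a descriptor's class/subclass/protocol."""
    detail = TRIAD_DETAIL.get((cls, sub, proto))
    if detail is not None:
        return detail
    # Fall back to the base class name plus the raw subclass/protocol codes.
    base = BASE_CLASS.get(cls, "Unknown class")
    return f"{base} (subclass 0x{sub:02X}, protocol 0x{proto:02X})"
```

Keeping the detailed triads in a separate table mirrors the point above: base class identification is cheap, and subclass/protocol detail can be added only where it is wanted.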
From discussion in another thread about boot choices....
Q: What is the current code size for what you have working as HID host LS/FS now ?
Q: Any educated guesses on what code size would be needed to appear as any of HID or CDC or MSD device to a PC host ?
Q: What is the current code size for what you have working as HID host LS/FS now ?
Currently the "minimal" boot protocol mouse/keyboard footprint is 2 cogs and ~11KB total code/data.
Q: Any educated guesses on what code size would be needed to appear as any of HID or CDC or MSD device to a PC host ?
Nope. But I do know that a proper HID implementation would be a very significant undertaking because you need to parse the report descriptors to determine device capabilities, and that's a very large can of worms.
I have yet to even scratch the surface as to what's required for mass storage and other device types outside of reading the configuration descriptor chain.
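To give a feel for the report descriptor "can of worms" mentioned above, here is a minimal sketch of just the first step: splitting a HID report descriptor into its items. It only tokenises. Tracking global/local state, collections and usages is where the real work lies. The function name `hid_items` is hypothetical.

```python
def hid_items(desc):
    """Yield (item_type, tag, data) for each short item in a report descriptor.

    item_type: 0 = Main, 1 = Global, 2 = Local (from the item prefix byte).
    """
    i = 0
    while i < len(desc):
        prefix = desc[i]
        if prefix == 0xFE:
            # Long item: prefix, data size byte, long tag byte, then data. Skipped here.
            i += 3 + desc[i + 1]
            continue
        size = (0, 1, 2, 4)[prefix & 0x03]   # size code 3 means 4 data bytes
        item_type = (prefix >> 2) & 0x03
        tag = prefix >> 4
        data = int.from_bytes(desc[i + 1:i + 1 + size], "little")
        yield item_type, tag, data
        i += 1 + size


# Opening and closing items of a typical mouse descriptor:
# Usage Page (Generic Desktop), Usage (Mouse), Collection (Application), End Collection
sample = bytes([0x05, 0x01, 0x09, 0x02, 0xA1, 0x01, 0xC0])
items = list(hid_items(sample))
```

A host has to run something like this and then interpret every item. A HID device only has to hand over the descriptor bytes.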
Is there much extra work for supporting USB hubs do you know? Given the per USB port COG footprint you mentioned above, it likely makes sense if you wanted say both a keyboard and a mouse device hanging off a P2 to use a USB hub to a single port, rather than burn 4 COGs to support two physical USB ports particularly now given the P2 will have 8 COGs not 16.
From what you have done and learned so far with USB, do you have any idea if there will be any issues with eventually going via a hub and then supporting multiple devices on the bus? Does it then become purely a device tree management issue, with additional configuration software and device state overhead etc., or is there something more significant to consider that may add a lot of overhead somewhere?
This hub thing also applies to the P1's USB host implementation too which is probably more of what I'm interested in at the moment (well P1V actually). There is a single physical USB port limit there right now apparently, though it appears to be software not hardware related.
I think that is a very good mindset for another reason too. The hub IC takes on the job of hardened interface chip. It takes the wallops from static discharges.
Mouse and keyboard at the same time is supported in my "USBdemo" code, as long as both devices support the boot protocol. I have several 2.4GHz wireless (not Bluetooth) keyboard/mouse combos that I test with.
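The boot protocol is attractive because its report layouts are fixed, so no report descriptor parsing is needed. A minimal sketch of decoding the two boot reports follows, assuming the standard layouts (3-byte mouse report, 8-byte keyboard report). The function names are hypothetical and this is not the demo's code.

```python
import struct


def decode_boot_mouse(report):
    """Boot mouse report: button bits, then signed 8-bit X and Y deltas."""
    buttons, dx, dy = struct.unpack_from("<Bbb", report)
    return {
        "left": bool(buttons & 0x01),
        "right": bool(buttons & 0x02),
        "middle": bool(buttons & 0x04),
        "dx": dx,
        "dy": dy,
    }


def decode_boot_keyboard(report):
    """Boot keyboard report: modifier bits, reserved byte, up to 6 key usages."""
    modifiers = report[0]
    keys = [k for k in report[2:8] if k != 0]   # 0 means "no key" in that slot
    return modifiers, keys


mouse = decode_boot_mouse(bytes([0x01, 0xFF, 0x05]))            # left button, dx=-1, dy=5
kbd = decode_boot_keyboard(bytes([0x02, 0, 0x04, 0, 0, 0, 0, 0]))  # Left Shift + usage 0x04 ('a')
```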
I haven't done anything with hubs other than some cursory spec reading. Chapter 11 in the USB 1.1 and 2.0 docs describes the hub specification. From what I've read so far, the additional physical ports provided by the hub essentially become the mouth of a funnel, and the question then becomes how big that mouth can get before the P2 and two cogs get overwhelmed. A hub handles switching and repeating, but it doesn't do anything to lighten the device count load -- the host must still manage that.
It must also buffer and adjust packet delivery timing (like a network switch) so that everything fits through the single 12 Mb/s port. That must help keep the processing speed manageable.
I suppose it could be more like a multi-drop bus (RS485). But either way, the limit for the host is still the single 12 Mb/s port. That inherently keeps the load limited.
The good thing is that the bandwidth required on the bus and the transfer sizes are effectively set by the device's various descriptors and controlled by the host polling rate. So in theory you could tell in advance whether you are going to hit the P2 limit, refuse to enable too many devices downstream from the hub, and instead just report a problem to the user when further devices are attached. That probably violates some USB spec somewhere, and you're probably meant to support the worst-case bus conditions and device limits etc., but at least it could keep existing devices operating if some processing limit of the P2 COG implementation would be exceeded by activating too many additional devices.
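The admission check described above can be sketched roughly as follows. This is only an averaged wire-bandwidth estimate built from each endpoint's wMaxPacketSize and polling interval. The 13-byte per-transaction overhead is an assumed figure, bit stuffing and low-speed devices are ignored, and it says nothing about the cog processing limit, which would need measured numbers. All names are hypothetical.

```python
FS_FRAME_BYTES = 1500      # 12 Mb/s for 1 ms = 12000 bits = 1500 bytes
PERIODIC_BUDGET = 0.90     # USB 2.0 limits FS periodic transfers to 90% of a frame
ASSUMED_OVERHEAD = 13      # ASSUMPTION: per-transaction protocol overhead, in bytes


def frame_load(endpoints):
    """Average bytes per 1 ms frame for (max_packet_size, interval_ms) endpoints."""
    return sum((mps + ASSUMED_OVERHEAD) / interval for mps, interval in endpoints)


def can_admit(active, candidate):
    """True if adding the candidate endpoint keeps the periodic load within budget."""
    load = frame_load(list(active) + [candidate])
    return load <= FS_FRAME_BYTES * PERIODIC_BUDGET


# A boot keyboard (8 bytes every 10 ms) plus a boot mouse (4 bytes every 8 ms)
# barely registers against the budget.
light = frame_load([(8, 10), (4, 8)])
```

A host could run a check like `can_admit` at enumeration time and leave a newly attached device unconfigured if it returns False, which keeps the already-running devices alive.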
Yeah there's the USB wire transfer limit but there's also a potential limit for actually scheduling all the polling requests needed when there are multiple devices on the bus. I was wondering if that limit might be reached first, even if the P2 USB low level portions can sustain wire speed on the USB bus. Eg. Consider 127 devices on a bus all needing to be polled at various times etc. Will a controlling COG be able to keep up I wonder??
What that means is that there must be built in allowance for sharing delays. Which in turn means it goes however fast it can. One moment it might be pulling the full rate to a single device, a file transfer, the next moment it might be checking out ten low speed devices at a rather pathetic average.
I was meaning flip from your current host, to a slave design, I think you answered for a HID host ?
HID device only needs to supply the report descriptors, not parse them ?
(in another thread, questions are asked about USB booting, and that needs to avoid custom drivers, so some generic USB path is needed)
Given the code size & '2 COGs' indicated, and the fact you need to configure to some Clock source (cannot use RCFAST), I think ROM support for USB boot is not practical.
Best to support that in Flash, I think. Flash parts are very cheap, and the user properly knows the Xtal/Osc used.
Oops, yes, my perspective was from the host. The HID client just supplies reports when asked.
This is my thinking, too. The two cog approach I've used separates the host USB transaction management from the "device driver". In its current form, the host's bus management, transaction scheduling and host<->driver interface is very crude. What @evanh and @rogloh have discussed is very much on point, and a robust implementation is a significant undertaking, as is defining a practical host/driver interface compatible with PASM, Spin, C/C++, Forth, etc. So far, my USB "demo" is just that -- an attempt to demonstrate that the P2 can self-host FS/LS USB. The heavy lift to make it truly functional on the P2 has yet to come...
That 'proof of hardware' step is a very significant one, do not underestimate your work... .
It would be nice to have 'whatever is simplest device' (ie USB slave) also confirmed working, just in case some timing wrinkle is uncovered in Device cases.
Since you have already coded to Host a Boot Protocol device, would the 'simplest device test' be to flip that, to have the P2 emulate the mouse/keyboard ?
Mouse and keyboard are low data rate connections, have you done any tests around the highest data throughput the code can support ?
For a P2 Eval board, as well as UART-Bridges, and small USB MCUs, a second P2 is another candidate for Debug-Bridge, but that really needs some-mega-baud links.
P2 will not compete on size or price with the sub $1 USB MCUs, but Parallax may not care about that, and may prefer to include a 'working example' of P2-USB.
The 2nd P2 could certainly deliver better test timing and capture specs, than a sub $1 MCU.
USB keyboard/mouse "demo" v0.17 for P2 FPGA image v32.i with verbose device enumeration information to serial terminal and a "lite" version with minimal serial output that's almost half the size. This version uses Cluso99's P2 ROM monitor routines for serial terminal output at a default rate of 2MBaud. Since I have the Prop123-A9, it uses some of the green LEDs for user feedback regarding host and driver cog activity. Button PB3 can be used to toggle between verbose/minimal bus activity and descriptor information output, PB2 will toggle mouse button/velocity/position data to line scroll or rewrite the current line to the terminal and pressing PB1 will route the driver cog to the ROM monitor.
This version has a startup routine that will search for a free even/odd cog pair and publishes the keyboard and mouse data to a common area of hub RAM, so it should now be a bit easier to integrate into other code.
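For anyone integrating this, the pair search amounts to the logic sketched below. This is a host-side model only, not the demo's startup code. `busy_mask` (one bit per running cog) and `find_free_cog_pair` are hypothetical names, and how the real code discovers which cogs are free is not shown in this thread.

```python
def find_free_cog_pair(busy_mask, cog_count=8):
    """Return the first (even, odd) cog pair with both cogs free, or None.

    busy_mask has bit n set when cog n is already running.
    """
    for even in range(0, cog_count, 2):
        pair_bits = 0b11 << even          # bits for cogs `even` and `even + 1`
        if busy_mask & pair_bits == 0:
            return even, even + 1
    return None


# Cog 0 is busy (it is running the caller), so the first free pair is 2/3.
pair = find_free_cog_pair(0b00000001)
```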
This week we are wrapping up verification simulations at ON Semi. Tape out in another week.
Sounds good. How long does tape-out take ?
Do they have WAIT and RUN current predictions for each COG yet, vs MHz ?
Can their tools show/simulate the crowbar/dual conduction in-rush current effects of large SRAM startup ? - or is that checked experimentally ?
I would think that because power supplies don't go from zero to final voltage in no time, but in over a millisecond, there would be no notable inrush current.
For your other questions, I don't think they have that data. Tomorrow, we will know what the expected max power is, given a brutal test.
The effect is not purely a dV/dT one, it is a CMOS logic P+N simultaneous conduction effect. See this TI app note.
"These in-rush currents are needed not only to charge the capacitances of the millions of internal components of the FPGA but also to momentarily supply current through a low resistance path to ground created by, for example,
stacked complementary transistors that are both on. So, during startup, the power rail’s point-of-load dc/dc converter must simultaneously be able to supply a large in-rush current and to maintain a monotonically ramping
output voltage with a certain dv/dt. Therefore, in addition to other components it is powering, the converter’s input power supply must be capable of supplying large load currents (load transients) for short periods of time. "
The silicon USB LS/FS smartpin host mode configuration using the internal DM/DP pull-ups is working. I've only had time to cobble up some quick/dirty tests, but basic device enumeration is looking good. So far, I've only tested at 80MHz and 180MHz, using a breakout board connection, and not the auxiliary "P2 USB" socket of the P2-ES board.
It will be a few days before I can get some decent time to explore further, but things so far are looking good.
BTW: I noticed yesterday that the P123 USB board I sent you had the USB pullups going to 5 V instead of 3.3 V. Major blunder. Surprised it worked.
A blunder only if the connector was configured as a device. For an upstream connection the solder jumpers were there to route the DM/DP pull-down resistors to ground, so no smoke.
I've also spent today with my P2-Eval, will post up some notes soon