Hmm... something like this might help make use of idle time on other cogs to accelerate the encode/decode process, but it wouldn't help reduce the cog count below three.
The peak of 3 cogs is currently needed for receiving a packet. While receiving, there are a number of tasks that have to take place:
- Receiving 1 bit every 8 clock cycles, and storing it somewhere. Should be able to store approximately 1 kilobyte total.
- Noticing the end-of-packet, so we know when to stop receiving
- Sending an ACK packet if we need to. This must happen with bounded latency after the end-of-packet signal.
- Watchdog for the receiver cog(s), so we can wake them up if a packet never arrives.
Currently these tasks are divided between three cogs, which I'm calling TX, RX1, and RX2:
- TX: Receiver watchdog
- TX: Poll in a tight loop for the end-of-packet, and send an ACK using the video generator
- RX1: Store the first 16 bits of each 32-bit word, in hub memory.
- RX2: Store the second 16 bits of each 32-bit word, in hub memory.
- RX1/2: Both also have a higher-latency end-of-packet detector, for terminating their own receive loops.
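To make the RX1/RX2 split concrete, here is a minimal Python model (not Propeller code) of two receivers that each keep alternate 16-bit halves of every 32-bit word. The function names and the LSB-first packing are assumptions made for illustration; the real driver does this in cog assembly and may pack bits differently.

```python
def split_between_receivers(bits):
    """Model of the RX1/RX2 split: RX1 keeps bits 0-15 of each 32-bit
    word and RX2 keeps bits 16-31. `bits` is a list of 0/1 values in
    arrival order."""
    rx1, rx2 = [], []
    for start in range(0, len(bits), 32):
        rx1.append(bits[start:start + 16])
        rx2.append(bits[start + 16:start + 32])
    return rx1, rx2


def merge_to_words(rx1, rx2):
    """Recombine the two halves into 32-bit integers.
    Assumption: the first bit received is the least significant bit."""
    words = []
    for low_half, high_half in zip(rx1, rx2):
        word = 0
        for position, bit in enumerate(low_half + high_half):
            word |= bit << position
        words.append(word)
    return words


# 64 bits: the first word has only its first bit set,
# the second word has only its last bit set.
stream = [1] + [0] * 31 + [0] * 31 + [1]
rx1_halves, rx2_halves = split_between_receivers(stream)
words = merge_to_words(rx1_halves, rx2_halves)
```

While one receiver is busy writing its 16 bits out, the other is sampling, which is why the model alternates halves rather than whole words.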
The only way I can think of to optimize this further is to combine the RX1 and RX2 cogs into one, but that requires receiving an entire packet using one fully unrolled loop, so you're limited to receiving pretty small packets. Even if you assume that the entirety of cog memory is used only for the receive loop and receive buffer, you would only have enough memory to receive a 31-byte packet, and that isn't enough for most USB devices.
So, as best I can tell, there's no way to do better than 3 cogs without either introducing external hardware or severely restricting the kinds of USB devices you can talk to.
--Micah
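A rough sanity check of the cog-memory limit, as a sketch only. It assumes 496 general-purpose longs per cog, two unrolled instructions (one long each) per received bit, and one buffer long per 32 received bits. The real loop has overheads this ignores, so treat the result as an order-of-magnitude figure rather than the exact calculation behind the number above.

```python
COG_LONGS = 496           # general-purpose longs in a cog (512 minus 16 special registers)
INSTRUCTIONS_PER_BIT = 2  # assumption: two unrolled instructions (one long each) per bit
BITS_PER_LONG = 32


def max_unrolled_packet_bits(cog_longs=COG_LONGS):
    """Largest packet, in bits, whose fully unrolled receive code plus
    receive buffer still fits in `cog_longs`."""
    bits = 0
    while True:
        candidate = bits + 1
        code_longs = candidate * INSTRUCTIONS_PER_BIT
        buffer_longs = -(-candidate // BITS_PER_LONG)  # ceiling division
        if code_longs + buffer_longs > cog_longs:
            return bits
        bits = candidate


max_bits = max_unrolled_packet_bits()
max_bytes = max_bits // 8
```

Under these assumptions the budget comes out at 244 bits, or about 30 bytes, which is in line with the roughly 31-byte figure quoted above.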
Let me jump in to comment here as this kernel is something I created.
The kernel is based on a selectable tick time, somewhere in the 2 to 5 (or even 10) µs neighborhood. A faster tick gives higher performance and lower latency but fewer threads; a slower tick gives the opposite. Depending on the number of threads, the task context switch takes just under half a microsecond (at 80 MHz), so it really is not directly suitable for switching on events that need a faster response than that.
Cheers,
Peter (pjv)
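For scale, a small back-of-the-envelope comparison between that context-switch time and the USB bit period. The 12 Mbit/s rate follows from the "1 bit every 8 clock cycles" at 96 MHz mentioned earlier in the thread; taking the switch time as exactly 0.5 µs is an assumption (the post says "just under").

```python
USB_BIT_RATE = 96_000_000 / 8   # 12 Mbit/s: one bit every 8 clocks at 96 MHz
CONTEXT_SWITCH_S = 0.5e-6       # assumption: upper bound for "just under half a microsecond"

bit_period_s = 1 / USB_BIT_RATE
bits_per_switch = CONTEXT_SWITCH_S / bit_period_s

print(f"bit period: {bit_period_s * 1e9:.1f} ns")
print(f"bits arriving during one context switch: {bits_per_switch:.1f}")
```

Roughly six bits arrive in the time a single task switch takes, which is why a scheduler like this cannot stand in for a dedicated receive cog.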
micah: This is really neat. Get as much working as possible. We can all chime in later to get it faster and maybe reduce cogs, but we need to know what needs to be present first. This is one great little addition to the prop. I have long wanted to be able to use a USB memory stick and USB Bluetooth on the prop. Both these items are so cheap. I have a miniature USB that has a microSD socket for uSD cards, so it is an easy way to transfer files. Thanks.
I just posted a new version from today's svn (20100405).
There's a lot of cleanup and optimization in the host controller driver. Memory usage is 949 longs, and it's down to 3 cogs. The object is a singleton now, so you can declare it in the OBJ section of each USB device class driver, and they're all sharing the same instance of the host controller.
The other big change is that this includes a basic USB storage class driver. The storage class driver should include enough functionality to be usable, though YMMV. It's seen very little testing so far, and I've seen certain data patterns trigger bugs in the host controller that will manifest as E_CRC errors. I had to change my demo to read sector 1 instead of sector 0 to work around this bug on my disk :)
I'll keep trying to debug and polish this as time permits, but I figured that what I had was a lot nicer than the original version already :)
--Micah
The implications of this are simply amazing! Three cogs makes this extremely enticing!
I take it that a USB thumbdrive driver is only a FSRW modification away?
OBC
I hope so :)
I know how to fix the CRC problems I was hitting last night. I ran some more tests, and it looks like the failing packets were those which happened to have a string of "1" bits at the end of their CRC and which happened to end with a zero on the D- pin. This confuses the pseudo-end-of-packet detector that I'm using now. But now that I have a real end-of-packet detector for sending ACKs, I can calculate the real packet length by taking timestamps at the beginning and end of the packet. I just need to write and debug that code, and hopefully it'll be a working block-level driver.
I expect the first working version to be pretty slow, though. The host controller's packet encode/decode steps haven't been optimized at all yet.
--Micah
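The timestamp idea can be sketched in a few lines of Python. This is only a model of the arithmetic, not the driver's code: `packet_bits` is a hypothetical helper, the 8 clocks per bit comes from earlier in the thread, and the 32-bit wraparound assumes the timestamps are taken from the Propeller's 32-bit system counter.

```python
CLOCKS_PER_BIT = 8          # from the thread: one bit every 8 clock cycles at 96 MHz
COUNTER_MASK = 0xFFFFFFFF   # the system counter is 32 bits wide and wraps around


def packet_bits(start_cnt, end_cnt, clocks_per_bit=CLOCKS_PER_BIT):
    """Number of bit periods between two counter timestamps,
    rounded to the nearest whole bit and safe across counter wrap."""
    elapsed = (end_cnt - start_cnt) & COUNTER_MASK
    return (elapsed + clocks_per_bit // 2) // clocks_per_bit


# A packet that spans 96 bit periods, starting at an arbitrary count.
length = packet_bits(1000, 1000 + 96 * CLOCKS_PER_BIT)

# The same calculation still works if the counter wraps mid-packet.
wrapped = packet_bits(0xFFFFFFF0, 0x00000010)
```

The result is the raw number of bit times on the wire, so it still includes the sync pattern and any stuffed bits; converting that to a payload length is left out of this sketch.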
hover1 said...
Is the code locked to 96 MHz? If it will run at 100 MHz I'll send Microcontrolled a 6.25 MHz crystal.
Jim
It's locked to 96 MHz. I need an integer number of instructions (2 or more) per USB bit period. So 96 MHz is the slowest speed it would work at, and 144 MHz is the next usable speed above that.
So yeah, sorry, but you do need a 6 MHz crystal :(
--Micah
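The clock constraint above can be checked with a little arithmetic. This sketch assumes full-speed USB at 12 Mbit/s, 4 clocks per Propeller instruction, and a crystal multiplied by the 16x PLL; the helper names are made up for illustration.

```python
USB_BIT_RATE = 12_000_000    # full-speed USB, bits per second
CLOCKS_PER_INSTRUCTION = 4   # most Propeller instructions take 4 clocks
PLL_MULTIPLIER = 16          # assumption: crystal frequency x16


def instructions_per_bit(sysclk_hz):
    """How many instructions fit in one USB bit period at this clock."""
    return sysclk_hz / (USB_BIT_RATE * CLOCKS_PER_INSTRUCTION)


def usable_clocks(count=2, min_instructions=2):
    """First `count` system clocks giving a whole number of instructions
    per bit, as (instructions, sysclk_hz, crystal_hz) tuples."""
    result = []
    n = min_instructions
    while len(result) < count:
        sysclk = USB_BIT_RATE * CLOCKS_PER_INSTRUCTION * n
        result.append((n, sysclk, sysclk / PLL_MULTIPLIER))
        n += 1
    return result


for n, sysclk, crystal in usable_clocks():
    print(f"{n} instr/bit: {sysclk / 1e6:.0f} MHz from a {crystal / 1e6:.2f} MHz crystal")

print(f"100 MHz gives {instructions_per_bit(100_000_000):.2f} instr/bit")
```

With these assumptions, 96 MHz and 144 MHz correspond to 6 MHz and 9 MHz crystals, while 100 MHz gives a non-integer 2.08 instructions per bit, so the sampling would drift against the bit stream.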
I guess Sapieha will be the only one running at 144 MHz.
Jim
I can order one. I was going to order an Ethernet chip soon anyway.
Also, I technically HAVE a 6 MHz xtal that came with a Gadget Gangster kit, but after 30+ minutes of searching I guess it is lost. :-( Now back to looking...
You need an xtal with about 18-20pF. I use both 6MHz and 6.5MHz (104MHz) xtals from DigiKey (see RamBlade thread for the part number of the 6.5MHz). Note you will require special decoupling and pcb layout on the prop to run at these speeds. I also have 13.5MHz (108MHz, pll=8) but have not completed testing.
@Holly: NICE!! AND they accept PayPal!! DigiKey, unfortunately, does not, and I have no other online payment method. With the cheap shipping, I'll order a few other things as well.
@Microcontrolled
They had those nice little NO switches that can mount easily on a breadboard for 4 cents each.
I ordered 100 of those, 100 1N4007 diodes, a couple of thousand resistors at 1 cent each, 100 PN4401.
Most small parts like resistors, ceramic caps..etc were one or two cents each.
If you're lucky the configuration of your 6MHz crystal is of no consequence. If you're not lucky, you have to follow recommended designs (definitely a requirement for high volume production).
BTW, I found the RALINK 802.11b/g adapter Linux driver source and I have a compatible device, but don't have time for it just yet.
Good progress Micah :)
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
May the road rise to meet you; may the sun shine on your back.
May you create something useful, even if it's just a hack.
Good progress Micah!
I almost forgot about a thought I had to reduce cog usage (I looked at your code before bedtime, had the thought, and forgot about it the next couple days- see what you think)
You currently use 2 cogs to receive one bit at a time every 2 instructions. After receiving 16 bits with one cog that cog has some time to write the data to hub.
Using the "mov x,ina" instruction, you can read multiple bits at once, provided that they're waiting for you on the Propeller's IO pins. Delay lines feeding multiple pins would present several consecutive bits at the same moment, so one read captures all of them. I would start with reading data into the cog's RAM and spooling it back to hub RAM when the receive is finished.
Good luck!
Hanno
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Co-author of the official Propeller Guide- available at Amazon
Developer of ViewPort, the premier visual debugger for the Propeller (read the review here, thread here), 12Blocks, the block-based programming environment (thread here)
and PropScope, the multi-function USB oscilloscope/function generator/logic analyzer
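Hanno's delay-line suggestion can be modelled in Python to show why it would cut the sampling rate. This is purely an illustration under assumed conditions: ideal delay lines of exactly one bit period per tap, with pin k carrying the input delayed by k bit periods. The function names are hypothetical and nothing here reflects real hardware timing.

```python
def read_with_delay_taps(bits, taps):
    """Simulate one multi-pin read every `taps` bit periods.
    At bit time `now`, pin k carries the bit from time now - k."""
    reads = []
    for now in range(taps - 1, len(bits), taps):
        reads.append([bits[now - k] for k in range(taps)])
    return reads


def reassemble(reads):
    """Put the captured bits back in arrival order."""
    out = []
    for pins in reads:
        out.extend(reversed(pins))  # the most-delayed pin holds the oldest bit
    return out


stream = [1, 0, 1, 1, 0, 0, 1, 0]
reads = read_with_delay_taps(stream, taps=4)
recovered = reassemble(reads)
```

With four taps the cog only needs two reads for eight bits, which is the cogs-for-pins trade discussed in the reply that follows.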
Thanks!
Delay lines would definitely help trade cogs for pins. But part of the fun IMHO is to do this with no external active components. If I'm going to buy a delay line chip, might as well make it a USB host controller chip :)
It's still really rough, but I just checked in an FT232 driver. I'm still getting CRC errors occasionally (well, maybe a bit more than occasionally) but for the most part you can use this to talk to a Prop Plug over USB :)
I've been gradually bugfixing the host controller core and making it more robust. Found a couple fairly serious bugs while working on the USB storage driver. At this point, I think the FT232 and storage drivers are pretty much complete, it's just a matter of using those drivers to bugfix, polish, and optimize the host controller itself. Storage was a bit inconvenient for debugging purposes.. the FT232 driver should make it easier to test arbitrary packet lengths and contents.
I'm making progress on a Bluetooth stack built around this host controller. So far I have support for:
- HCI (the low-level Host Controller Interface protocol that gets sent over USB)
- Device discovery (Inquiry, setting local name/class)
- ACL packets
- Basic support for L2CAP connections, echo response
- Work-in-progress SDP server (service discovery)
I've been testing it mostly using tools from Linux's BlueZ stack, especially sdptool and l2ping. The l2ping performance seems reasonable- 10-20ms latency with about 2% packet loss. I suspect all of that packet loss is due to CRC errors on the received USB packets, due to the corners I had to cut in the bit-banging USB receiver.
So, it's at about the point where I'm thinking about what other protocols to implement after I finish SDP. I think I'll create a sort of low-level socket interface that allows attaching hub-memory buffer lists to L2CAP or RFCOMM connections. I'm interested in hearing from you about what applications this stack might be useful for. My ideas so far:
- Communicating with other Propellers or small embedded systems (L2CAP + Custom protocols)
- Talking to the Wiimote (L2CAP + HID)
- Serial port emulation (RFCOMM)
- Sending/receiving files and messages? (OBEX)
What do you think? Is anyone else likely to use this Bluetooth stack? If so, what for?
I have some Bluetooth dongles and other devices and can see some interesting possibilities with audio and generic PAN wireless device communications.
Have you implemented a socket layer interface? Some Linux L2CAP examples use sockets (I haven't researched this much). Sockets would easily enable other devices such as RALINK WIFI. Are there alternatives to sockets?
Comments
I'm not really qualified to suggest this, but could the scheduler-kernel of Peter Van der Zee's PropRTOS be helpful?
T o n y
http://www.parallax.com/PropRTOS/tabid/852/Default.aspx
Thanks again for this marvelous USB object!!!
Yep, that crystal looks good.
Glad to provide something that might be useful if it works ;) Good luck!
--Micah
price is $3.29 including shipping
www.taydaelectronics.com/servlet/the-95/6.000-MHz-6-MHz/Detail
I hate to pass up a deal
The Subversion repository is at:
http://svn.navi.cx/misc/trunk/propeller/usb-host/
--Micah