Interfacing Prop to Single Board Computer (RPi, BBB)
ags
I'm starting a project that aims to use the power of a Single Board Computer (SBC, e.g. Raspberry Pi, BeagleBone Black) to take over functions that currently consume Prop resources, like a web server, SD card reader, RTC interface, etc., and push them to the SBC. The Prop would then be dedicated to the real-time functions it handles so well: decoding data, formatting (bit-banging) and sending it as output with precise timing (down to the last clock cycle), something that an SBC running Linux isn't capable of doing. The idea is to have the SBC source the data (from SD card, Ethernet, USB client, etc.) and send it to the Prop using its GPIOs. I would rely on the precise timing afforded by the Prop for correct final output timing. I'm thinking of implementing some sort of flow control so the Prop would signal when its buffers need refilling; the SBC would monitor that signal and send data as needed. Seems reasonable.
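One way the Prop could signal "buffers need refilling" is a single READY line with hysteresis. READY is raised when the fill level drops to a low watermark and dropped once it reaches a high watermark, so the SBC sends bursts instead of being asked for every byte. The sketch below is a Python model of that idea, not Prop or BBB code. `PropBuffer`, `simulate`, the buffer size, the watermarks and the burst size are all hypothetical, chosen only for illustration.

```python
from collections import deque


class PropBuffer:
    """Toy model of the Prop-side FIFO with a READY line that has hysteresis.

    READY goes high when the fill level drops to `low` and goes low once it
    reaches `high`, so the SBC sends data in bursts instead of byte-by-byte.
    """

    def __init__(self, size, low, high):
        assert 0 <= low < high <= size
        self.size, self.low, self.high = size, low, high
        self.fifo = deque()
        self.ready = True  # buffer starts empty, so ask for data

    def push(self, byte):
        """SBC side: write one byte into the Prop buffer."""
        if len(self.fifo) >= self.size:
            raise OverflowError("SBC wrote while the buffer was full")
        self.fifo.append(byte)
        if len(self.fifo) >= self.high:
            self.ready = False  # plenty buffered: tell the SBC to pause

    def pop(self):
        """Output cog side: consume one byte at the fixed output rate."""
        if not self.fifo:
            raise RuntimeError("buffer underrun: output timing would break")
        byte = self.fifo.popleft()
        if len(self.fifo) <= self.low:
            self.ready = True  # running low: ask the SBC for more
        return byte


def simulate(buf, ticks, burst):
    """Each tick the output cog consumes one byte; while READY is high the
    SBC writes up to `burst` bytes. Returns the peak fill level seen."""
    peak = 0
    for _ in range(ticks):
        sent = 0
        while buf.ready and sent < burst:
            buf.push(0x55)  # dummy payload byte
            sent += 1
        peak = max(peak, len(buf.fifo))
        buf.pop()
    return peak
```

With `simulate(PropBuffer(64, 16, 48), 1000, 4)` the fill level oscillates between the two watermarks, peaks at 48, and never underruns. The model assumes the SBC reacts to READY immediately. In practice the SBC's reaction latency (Linux scheduling, USB frames, etc.) determines how far the low watermark has to sit above empty.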
At a high level, I'm stuck figuring out what this SBC/Prop interface should be. I need to be able to stream a minimum of 5Mbps to the Prop, controlled so as to neither overflow nor starve the Propeller's relatively small (<32kB) buffer. I currently have a prototype running that is all Propeller-based. It reads data from an SD card and feeds cogs that take care of the formatting/bit-banging and timing for the outputs (at just about a 2Mbps data rate). This works today, but I need to more than double the data throughput and don't see a path there with the current Propeller-only design.
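A quick bit of arithmetic shows how much slack a given buffer buys. At a constant 5Mbps (625,000 bytes/s), a full buffer lasts only a few milliseconds to a few tens of milliseconds. That is the longest the SBC side can ever stall before the output glitches. The sketch assumes a constant output rate and a buffer that starts full. The 32kB case is the whole of hub RAM, so a real buffer will be smaller.

```python
def drain_time_ms(buffer_bytes, rate_bps):
    """Milliseconds a full buffer lasts when drained at rate_bps (bits/s)."""
    return buffer_bytes * 8 / rate_bps * 1000.0


RATE_BPS = 5_000_000  # the 5Mbps minimum from the post

for size in (2048, 4096, 16384, 32768):
    print(f"{size:6d} bytes -> {drain_time_ms(size, RATE_BPS):7.3f} ms")
```

The four sizes come out at roughly 3.3 ms, 6.6 ms, 26 ms and 52 ms.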
I'm looking for suggestions on what type of interface I should be considering between the SBC and Prop (I'm favoring the BBB as the SBC). It seems that using GPIO-based serial links (SPI, UART, I2C) will be too slow on the Prop side to sustain a 5Mbps data rate. I'm also not clear on what the implications would be for the load on the SBC. If I end up with custom drivers (not something I've done before) or LKMs (loadable kernel modules), I could bog down an otherwise powerful ARM Cortex-A8 core that spends most of its time waiting for the Propeller to signal that it's ready for more data. Clearly I have a lot of learning to do (I look forward to that).
As I said, I'm looking for suggestions about an overall architecture to pursue. Thanks in advance.
Comments
Sounds like a lot of bother to go to those lengths, as I am doing all that on a single Prop now and I still have at least 5 cogs and memory to spare! What exactly are you trying to do then?
Am I missing something here? Unlike a single-core processor, the Prop has 8 cores, so it can have precise timing on one, Ethernet on another, protocol on another and so on. My Ethernet and SD run on the main Tachyon console cog, but even with dedicated cogs there are enough to go around. You don't really need any spare; if it takes all 8 cogs to make it work, then the Prop has fulfilled its requirements. If that's not enough, perhaps just add another Prop.
I was purposefully attempting to leave the solution space as open as possible. Perhaps the question I should ask is "what is the best way to create a high-speed (>5Mbps) interface between a Propeller and a non-RTOS device?".
It seems that I can't get the speed I need (onto the Propeller) with a single-bit-wide data line. I thought of using an 8-bit-wide data bus, but then I'm not clear whether flow control (on the master side) would have too much latency (starving the Prop buffers) or drag down the single-core master. I also thought about using a standard interface (USB, perhaps), but since that ultimately ends up as a single-bit interface to the Propeller, it doesn't help. I am wondering if there is such a device as a USB client that has a large internal buffer, handles flow control with the USB host, and provides a data bus interface to the received data. I have no idea if such a thing exists, or if it is a reasonable solution.
Edit: I have found the FTDI FT245R, which is just what I was trying to describe. It provides all the timing/interface for a USB device, and an 8-bit-wide FIFO from which to read the received data. I still don't know if this is the most effective way to go, but it is a possibility. Still looking for other options, or confirmation that this is a reasonable strategy.
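As a sanity check on the FT245R option, it helps to compare the required rate with the chip's throughput ceiling. FTDI's datasheet advertises transfer rates of up to about 1 MB/s with its D2XX drivers. The sketch treats that figure as a best case, which is an assumption, and computes what fraction of it a 5Mbps stream would use.

```python
def link_utilization(required_bps, link_bytes_per_s):
    """Fraction of a byte-wide link's throughput that a bit-rate requirement uses."""
    return (required_bps / 8) / link_bytes_per_s


FT245R_BEST_CASE = 1_000_000  # bytes/s; headline datasheet figure, assumed best case
print(f"{link_utilization(5_000_000, FT245R_BEST_CASE):.1%}")
```

This prints 62.5%. That is workable on paper, but it leaves less headroom than it looks once USB frame scheduling and host-side latency eat into the best-case figure.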
I've looked into the PRUs a bit, including the instruction set (it's great that the SoC on the BBB is fully documented). I'm not sure how to hook them into a process that will be reading data from an SD card, RAM or somewhere else in userspace. I'm also not clear on how to create a parallel output port (8 bits) without writing my own driver. I'm fairly sure I still need that to achieve the necessary data rate to the Propeller, regardless of how I get it there. I've seen (maybe) some capes that appear to have done just that to drive 3D printers. I don't have experience writing Linux drivers, and that's when I thought of the USB interface.
Imagine you have an ARM chip running Linux and it has shared memory or DMA or whatever interface to an on chip Propeller core.
So now you have "firmware" to write for those prop cogs and perhaps a Linux device driver to load that firmware and subsequently talk to it.
In the case of the PRUs it's a big investment of time and effort into something totally non-portable. I don't see anyone going for it unless they have a market for millions of some custom gadget.
So if you want to use it from the Linux side you need some special driver to at least talk to it. The rest is down to creating "firmware" to run on the PRU.
Like I said, you can imagine it as an ARM with a Propeller on the same silicon. You need a Linux device driver to fire up and communicate with the Prop, and you need a Spin/PASM/C firmware to load the COGs with.
So yes, wouldn't it be cool if one day Parallax could license Propeller cores to all those guys making ARM SoCs for Linux-based embedded systems?
We could be using a "standard" Prop architecture everywhere!
The idea of writing a Linux device driver to talk to a PRU, for which you have written firmware to talk to a Propeller, for which you have also written code, starts to sound very cumbersome, though.
@Heater it sounds like a lot of fun and learning to me. And as David says, until we have an ARM/Propeller SoC, this would be a great way to have cake and eat it too.
As I said earlier, for the same purpose (sending data to a slave Propeller for bit-banging/precise timing) I was looking into using the simpler, standardized USB connectivity. That would require at least one (FTDI) chip on the slave board, and it may still have too much latency to work (for me). I just saw some specs indicating that the minimum round-trip time for USB 1.1/2.0 communication is 1ms (that's forever). I presume that on the BBB side all the polling for CTS (to the Propeller through USB) would be offloaded from the main processor, although I haven't confirmed that yet.
I was also looking into direct manipulation of the GPIO, but parallel (8 data bits + a ready bit), not serial. I can't find much on that topic. I suspect it will require a Linux driver, and a device tree (DT) overlay. Maybe more? But more importantly, I'm also concerned that it would bog down the ARM core, which would spend most of its time constantly polling to see when the Propeller is ready for more data. Unless I can use interrupts for that.
I'm working from the basic assumption that, if coded in PASM, one cog will max out at about 1MB/sec of data throughput, that is, 8 parallel bits read in and stored in hub RAM. Plus or minus. Am I missing something there? If not, that's the goal I'll set for the sustained data rate to be provided by the BBB (with no buffer under-runs on the Propeller, and a buffer size of whatever is left of the 2kB of cog RAM, or maybe up to 4kB in hub RAM if necessary).
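The 1ms round-trip figure translates directly into a minimum buffer size. While a refill request is in flight, the output keeps draining, so the buffer must hold at least the bytes consumed during one round trip. The function below computes that (rounded up) and applies a safety multiplier. The multiplier and the 10ms "Linux hiccup" case in the usage note are assumptions, not measurements.

```python
def min_buffer_bytes(rate_bps, latency_us, margin=1):
    """Bytes the output drains while one refill request is in flight,
    rounded up and multiplied by a safety margin."""
    # ceil(rate_bps * latency_us / 8 bits-per-byte / 1e6 us-per-s), integer-only
    drained = -(-rate_bps * latency_us // 8_000_000)
    return drained * margin
```

`min_buffer_bytes(5_000_000, 1_000)` gives 625 bytes for a single 1ms round trip, and `margin=2` doubles that to 1250. If the Linux side can be descheduled for, say, 10ms, `min_buffer_bytes(5_000_000, 10_000)` gives 6250 bytes, which is already more than a 4kB hub buffer.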
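To make the polling concern concrete, here is a toy model of an 8-bit bus with a STROBE line (SBC to Prop) and an ACK/ready line (Prop to SBC). It uses a two-phase (toggle) handshake, which is my own choice for the sketch; the post only specifies "8 data bits + ready bit". `ParallelLink` and `transfer` are hypothetical. On real hardware `sbc_send` would write GPIO registers and `prop_poll` would be a PASM loop. The `wasted` count stands in for the time the ARM spends spinning on ACK, which is the cost that an interrupt on the ACK edge would avoid.

```python
class ParallelLink:
    """Toy 8-bit bus with STROBE (SBC -> Prop) and ACK (Prop -> SBC) lines,
    using a two-phase handshake: each toggle of a line is one event."""

    def __init__(self):
        self.data = 0
        self.strobe = 0
        self.ack = 0
        self.received = []

    # --- SBC side ---
    def sbc_can_send(self):
        # The Prop has taken the previous byte once ACK matches STROBE.
        return self.ack == self.strobe

    def sbc_send(self, byte):
        if not self.sbc_can_send():
            raise RuntimeError("bus still busy")
        self.data = byte & 0xFF
        self.strobe ^= 1  # toggle: "new byte on the bus"

    # --- Prop side ---
    def prop_poll(self):
        if self.strobe != self.ack:
            self.received.append(self.data)
            self.ack ^= 1  # toggle back: "byte taken, send the next"
            return True
        return False


def transfer(payload, prop_every):
    """SBC busy-waits on ACK; the Prop services the bus only every
    `prop_every` SBC polls (modelling it being busy elsewhere).
    Returns (bytes the Prop received, SBC polls spent waiting)."""
    link = ParallelLink()
    wasted = 0
    step = 0
    for byte in payload:
        while not link.sbc_can_send():
            wasted += 1
            step += 1
            if step % prop_every == 0:
                link.prop_poll()
        link.sbc_send(byte)
    while link.strobe != link.ack:  # let the Prop take the final byte
        link.prop_poll()
    return bytes(link.received), wasted
```

`transfer(b"hello", 1)` delivers all five bytes with 4 wasted polls. `transfer(b"hello", 3)` delivers the same bytes but wastes 12, because the SBC's spin time grows in direct proportion to how slowly the Prop services the bus.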
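The 1MB/sec estimate can be checked against the cog's clock budget. This assumes the common 80 MHz system clock (5 MHz crystal with the x16 PLL) and 4 clocks for most PASM instructions. Hub instructions such as WRBYTE take longer because each cog gets a hub access window only once every 16 clocks.

```python
PROP_CLOCK_HZ = 80_000_000  # assumed Propeller 1 system clock (5 MHz crystal, PLL x16)
CLOCKS_PER_INSTR = 4        # most PASM instructions; hub ops (WRBYTE etc.) take longer


def clocks_per_byte(bytes_per_s, clock_hz=PROP_CLOCK_HZ):
    """System clocks available to one cog for each byte at a given rate."""
    return clock_hz / bytes_per_s


for label, rate in (("5Mbps", 625_000), ("1MB/s", 1_000_000)):
    budget = clocks_per_byte(rate)
    print(f"{label}: {budget:.0f} clocks/byte = "
          f"{budget / CLOCKS_PER_INSTR:.0f} instruction slots")
```

This gives 128 clocks (32 instruction slots) per byte at 5Mbps and 80 clocks (20 slots) at 1MB/s. A read-INA / WRBYTE / pointer-increment / handshake loop plausibly fits in that budget, even allowing for up to one 16-clock wait for the hub window per byte. That is consistent with the ~1MB/sec per cog assumption.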
BTW, in case anyone is reading this and taking what I'm writing as concrete fact, or even if a reader feels compelled to tell me I don't know what I'm talking about... I already know I don't know what I'm talking about. This is my first foray into the world of hacking Linux on a BBB. I'm open to learning from others, and welcome that of course.
...and one more thing: IIRC I read that the PRU instruction set is a subset of about 40 instructions shared with the Cortex-A8 core. I didn't bookmark that page and haven't been able to find it again, though.
Edit: Actually, there are exactly 45. http://processors.wiki.ti.com/index.php/PRU_Assembly_Instructions
What do you mean? You have lots of reading and learning and planning to do. Having the actual hardware with you will just be a distraction. :-)
I'm surprised and pleased that I was able to discover the PRUSS. The more I learn about it the better it seems - until the misery of actually using them becomes real. I hope we can collaborate on this project.
The instruction set is not shared with the ARM core; it's completely separate. It's completely deterministic, one clock for every instruction @ 200 MHz, unless you access memory on the ARM side. There is a blob device driver in Unix space that is callable from user space to communicate with the PRUSS, so you may not have to write a device driver. My experience is that there's quite a learning curve to use it from Linux, although it looks like a lot of fun to program the PRUSS itself.
Programming the PRUSS is done with a one-pass assembler, no C, and no linking, although it does have normal pre-processor directives, and you can include other files, so it can be modularized to a certain degree.
Thanks.
Well, it seems that Tachyon Forth makes the Propeller 'scary good'.
I'd like to summarize a few salient points.
A. The Propeller One (and likely the Propeller Two as well) is the ideal interface extender for any Single Board Computer or SoC.
B. And in many cases, with Peter J's Tachyon Forth code, the whole project might fit on just a Propeller.
I don't have a link; I've got it all on my computer now. If you google "am335xPruReferenceGuide.pdf" and look for other things in the same place, it should get you more than started. Beware that the newest version of the am335x spec document (not the PRU-specific one) has the PRU material stripped out, so you need to get an older one (spruh73c.pdf, not 73h).
TI has been weird about them from the beginning. They are advertised as a feature of the chip, but TI doesn't seem at all interested in supporting them in any way whatsoever. I think I remember one comment to the effect of 'not supported for hobby use', whatever that means. Since hobbyist buyers are the only ones I know of, that sounds a lot like "we just aren't supporting them".
I know that BBB and TI have ties, but it is true that this is not a TI product. That can always backfire on you, though, because you never know whether the person you turn your nose up at may be the one who launches the next product that uses your newfangled part. It's not so much that they won't build goodies for the BBB community; it's that they seem reluctant even to share the resources that exist, with a kind of "you'll shoot your eye out" attitude about it. Perhaps I am misreading the intent, but I definitely got the impression they were very stand-offish about the PRUSS. I ended up deciding that a Prop and a serial link would fill the need more easily for now.
...and now we've come full circle (or at least part way around). So while the PRUSS is interesting and I do want to learn more about it, I don't need to insist on doing everything the hard way (that seems necessary often enough without me forcing it). I was first wondering whether I could use an existing interface on the BBB to get data from the BBB to the Prop for bit-banging in real time. With the speed and memory limitations of the Propeller, I have to have pretty good flow control to keep the Propeller buffers full (avoiding starvation) without overflow (avoiding data loss). With Linux not being an RTOS, how does one effectively manage the data flow? Did you accomplish this with your serial interface, or do you have the luxury of just using wait/sleep on Linux and then wake/resume, send data and repeat without issue?
Sorry, didn't see your question earlier. No, my use of the Propeller is the other way around. The Propeller just does its thing as fast as it can, and my Linux program spits commands at it, e.g. [SN:R:0] (Sonar, Read, Sensor 0), and the Prop spits back whatever the current sonar distance reading is, e.g. [210] (21.0 inches). The other subsystem is a clock generator, so [CK:W:0:8100] means Clock, Write, channel 0, 8.1MHz. That's a fire-and-forget deal. I'm planning to add a GPIO system as well so I can say [GP:W:0:FF].... you get the picture. It's all stuff that would be sampled at 20 to 40Hz, so nothing needs more than 115200bps.
With memory-mapped userland I/O I know you can get toggle rates in the MHz range on the ARM boards, depending on the performance of the library. The issue will always be what happens when some other task stomps on the CPU for a while and your userland toggle gets the school-bus stop sign. Kernel driver?
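The bracketed command frames lend themselves to a very small parser on the Linux side. The grammar below, `[SUBSYSTEM:OP:arg:arg...]` with a two-letter subsystem, an R/W operation and hex/decimal fields, is inferred from the four examples given and is hypothetical. So are the names `FRAME`, `parse_command` and `sonar_inches`.

```python
import re

# Hypothetical grammar inferred from the examples: [SUBSYSTEM:OP:arg:arg...]
FRAME = re.compile(r"\[([A-Z]{2}):([RW])((?::[0-9A-Fa-f]+)*)\]")


def parse_command(frame):
    """Split a command frame into (subsystem, operation, [args])."""
    m = FRAME.fullmatch(frame.strip())
    if m is None:
        raise ValueError(f"malformed frame: {frame!r}")
    subsystem, op, rest = m.groups()
    args = rest.split(":")[1:] if rest else []  # drop the leading empty field
    return subsystem, op, args


def sonar_inches(reply):
    """A sonar reply like "[210]" carries tenths of an inch."""
    return int(reply.strip()[1:-1]) / 10
```

Arguments are kept as strings because their meaning depends on the subsystem. In `[CK:W:0:8100]` the value 8100 reads as decimal kHz (8.1MHz), while in `[GP:W:0:FF]` the FF looks like a hex bit mask.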
Imagine trying to write something like full duplex serial by bit-banging on the GPIO of a Pi.
Your kernel module could probably do that, provided you disabled interrupts permanently so that incoming bits and bytes are not missed and the outgoing bit timing is not delayed by being rescheduled.
But then of course you have halted the entire operating system just to do that bit banging job reliably. Even whatever program you have that might use the driver has no time to run!
Let's see.... a nibble-wide bus would need to average 1.25MHz to reach 5Mbps, so that's 0.8 microseconds (800ns) per nibble. If you can devise a way to avoid usleep or its kin to determine when to send the next nibble and clock it, that seems potentially doable.
If it has to be an async line of some kind, I don't see how you get there (5Mbps) from here. SPI can reach a pretty high clock rate on the Beagle/Pi, but I found that byte-to-byte transmission times can vary a lot and are often so long that clock rate increases are not significantly helpful. I think at 1Mbps the delay from byte to byte was close to 1/3 of the time to transmit all the bits of each byte.
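The same arithmetic for a few bus widths shows how much time each transfer gets at 5Mbps. The widths are illustrative, and the calculation ignores handshake overhead.

```python
def transfer_rate_hz(data_rate_bps, bus_width_bits):
    """Transfers per second needed to carry data_rate_bps over the bus."""
    return data_rate_bps / bus_width_bits


def period_ns(data_rate_bps, bus_width_bits):
    """Time available per transfer, in nanoseconds."""
    return 1e9 * bus_width_bits / data_rate_bps


for width in (1, 4, 8):
    print(f"{width}-bit bus: {transfer_rate_hz(5_000_000, width) / 1e6:.3f} MHz, "
          f"{period_ns(5_000_000, width):.0f} ns per transfer")
```

A 1-bit line needs 5MHz (200ns per bit), a nibble bus 1.25MHz (800ns), and a byte-wide bus 625kHz (1600ns). Even the widest option leaves a userland process only about 1.6µs per transfer, which a single scheduler hiccup will overrun.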
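The reason higher SPI clocks don't help much becomes clear if the inter-byte gap is treated as a roughly fixed amount of time (driver/DMA overhead) rather than a fixed fraction of the byte time. That is an assumption. The 2.667µs gap used below is simply the "about 1/3 of the 8µs byte time at 1Mbps" observation, not a measured constant.

```python
def effective_bps(clock_hz, gap_ns):
    """Effective SPI throughput when every 8-bit byte is followed by a
    fixed inter-byte gap (assumed to come from driver overhead)."""
    byte_time_s = 8 / clock_hz
    return 8 / (byte_time_s + gap_ns * 1e-9)


GAP_NS = 2_667  # ~1/3 of the 8 us byte time at 1 MHz, per the observation above

for clock in (1e6, 4e6, 16e6, 64e6):
    print(f"{clock / 1e6:4.0f} MHz clock -> {effective_bps(clock, GAP_NS) / 1e6:.2f} Mbps")
```

The effective rate climbs from about 0.75Mbps at 1MHz to about 2.9Mbps at 64MHz. It can never exceed 8 bits / 2.667µs, or about 3Mbps, however fast the clock. Under this assumption, 5Mbps is out of reach unless the per-byte gap shrinks.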