Using the Propeller as a Multi-Master SPI Bus Arbitrator

Wulffy · 2007-09-11 05:17

To save myself a bunch of retyping, I have included a post made to a different forum.

I decided to post here to solicit feedback from those intimately familiar with the Propeller, to see if it could indeed be a viable solution as a Multi-Master SPI Bus Arbitrator for a system topology that will have several 'Masters' and several 'Slaves'. In reading the following, I am sure that you will be able to glean that I am looking at having several MCU units as my 'Masters': they are ARM-based SBCs and Stamps. The slave devices will be items such as MicroMegaCorp's uM-FPU, GHI Electronics' uAL-FAT SD, and a blazing-fast, zero-wait, near-unlimited-write FRAM memory device.

During the course of my dialogue with several veterans in the business, the Prop came up as a possible candidate for the role of MM-SPI Bus Arbitrator. Being totally ignorant of the device, its strengths and weaknesses, and its real eligibility for this role, I'm requesting objective feedback from you kind folks as to whether you feel the Propeller would excel in this capacity.

My target topology is currently a 3-master, 3+ slave SPI bus arbitrated either by code and hardware onboard the masters or, if that is not feasible, by a dedicated SPI communications arbitrator built from a device such as the Propeller.

Please review and advise. Thanks in advance. Have a wonderful day! -t

I am sure that this thread will prove to be a bit esoteric, but I am struggling with something that I'd like to throw out to the group for review and consideration.

My issue is this: I am looking at implementing common-memory on a Multi-Master SPI bus. I am looking at having three masters. I'd like to provision for a total of four (a minimum) or more, to facilitate future growth.

I have read a lot and understand that, in its native implementation and by its very nature, SPI and a Multi-Master topology are somewhat mutually exclusive.

I plan on having 3+ SBCs in my system. With no luxury of interrupt-driven events, no RTOS capabilities, and a strong need to facilitate high-speed asynchronous transfer of data from one module to another, I have decided to try and tackle what some texts have described as "rare and awkward, and are usually limited to a single slave".

One of the hardware-layer elements that I am looking at using to effect this is SPI-based FRAM. The primary reasons for this choice are three-fold:

Longevity. First and foremost, due to the very high quantity of read/writes that I am looking at, having 10^12 to "unlimited" write cycles will serve to ensure that once deployed, I won't run into issues with reliability that Flash based devices would be faced with, possibly failing after as few as 10^5 or 10^6 write cycles.
Zero-Wait state. The data is stored as fast as I can possibly shovel it down the device's throat, without having to stall while waiting for the memory device to retain the data - no refreshes, no battery backup, no delays related to writing to slow flash.
Speed. I2C, by design, facilitates multiple masters, but I2C is slow. SPI, conversely, is darn quick, and I feel is the better choice given what I want to use this for, especially with the Speed of SPI coupled with the Zero-Wait memory - should prove to be brutally fast, if it can be made to work...
Accordingly, I have decided that I want to jump this high hurdle...

I am going to have one of my MCUs acting as a Telemetry Host - interfaced to a MaxStream Xtend transceiver. It will be responsible for querying the common-memory to retrieve the parameters that are to be transmitted. Additionally, it will also receive commands from the ground control station for the airborne subsystems and store them in the common-memory. The Telemetry Host MCU could also be the Flight Data Recorder, if I determine that it has enough 'idle' time when I do my work-load studies on the ship's systems.

I will have a 2nd MCU acting as a Navigation Computer Unit (NCU). By virtue of its name, I am sure that you can probably extrapolate its role in life: GPS/IMU interface-based 3D-Waypoint navigation.

I initially planned on also having a third MCU acting as a flight data recorder and also as a movable surfaces controller, listening to the Rx's PPM and demuxing it to the servos/actuators, and I may very well proceed in this fashion. The one thing that may cause me to reconsider this is the fact that there are some pretty simple and robust COTS solutions showing up out there with PIC-based dedicated servo controllers, or the new ASIS? that MX has a lead on (I haven't yet ping'd MX for the vitals on that - I'll let him get comfortable with the hardware first, before I go bugging him about it...). If these COTS solutions prove to be sufficient and reliable enough, then I won't re-invent the wheel and I'll try to push the flight data recording back onto the Telemetry Host MCU, and implement the COTS solution for Servo Demuxing, Control, and Failsafe. This is all going to be driven by the loop iteration rates that I can get out of each of the subsystems...

That brings me full circle to the root reason for the implementation of the common-memory device. My hardware choice and language selection don't yield a combination where I feel that I will have the resources to have coordinated synchronous communication directly between the various subsystems without substantive overhead penalties. I feel that this overhead would be unacceptable.

Being able to have each subsystem having a control loop that is NOT predicated on relying on a separate subsystem's commands or data communications will yield the most efficient and desirable results - i.e. I feel that:

Having the NCU collecting data from the GPS/IMU, doing the 3D or 4D navigation calculations, storing the needed reactions to the high-speed common-memory and moving on to do it all over again will serve to yield a very high loop rate for this subsystem
Having the Telemetry Host being able to go to the high-speed common-memory to immediately store any received commands, and then retrieve the various parameters that require transmission, without having to wait for other systems to get to the point in their loop to directly provide the data to it, will serve to ensure that the control loop for the Telemetry Host runs at an adequately high rate.
Having the Movable Surfaces controller being able to fetch the required corrections, independent of the source of said corrections, will surely serve to ensure that the control loop iteration rate is sufficiently high.

Basically, the above serves to illustrate that the various subsystems' loops can become pseudo-autonomous and independent of the other subsystems, with the storage/extraction of commands and information to/from the FRAM SPI slave device.

...

Utopia would be the availability of a SoC device that has three or four+ SPI master-input ports with multi-port FRAM, and a slave SPI bus. Again, I have searched and have yet to find such a device... Granted, I could implement something with FPGA systems, but that has a whole set of penalties that I don't think I am willing to pay.

With the zero-latency of FRAM, with the high speed of SPI, and with the inter-subsystem independence that this approach provides, I feel that this is a very compelling reason to try to implement what I am considering.

...

So, now that I have over-bloodied the frigging horse :), can those of you who may have some first-hand knowledge of Multi-Master SPI (MM-SPI/SPI-MM) arbitration implementation please reply with some suggestions as to how the successful arbitration of 3+ masters with single/multiple slave devices might be able to be achieved?

Some have mentioned that a Parallax Propeller might be the way to go with the processor array that it possesses; others have suggested FPGAs with embedded processor cores and memory. I'd like to think that there might be a way to successfully implement this without having to rely on any additional external components. SPI bus collisions need to be totally avoided. Unrecoverable system crashes are not acceptable. It seems to me, ignorance admitted, that the use of one or more GPIO lines between the Masters, and some simple check-and-balance code on each Master, should be a means with which to successfully implement said arbitration with a very high degree of reliability and a (very-?)small performance penalty...

Thanks for reading my diatribe. Please review and advise with any viable suggestions that you might have, or have experienced success with previously.

Again, thank you!

-t

p.s. In addition to the SPI FRAM device, I also have an I2C FRAM device coming, just in case I can not make the MM-SPI implementation work as needed/intended...

Skogsgurra · 2007-09-11 07:27

I surely didn't understand all that you wrote.

But I have noticed one thing: If you have a few microseconds to spare, the Propeller seems to be able to do almost anything you want it to do.

My questions:
Is it the SPI as I know it (four lines) you have in mind?
What delays are acceptable before bus access is granted?
What data rates are we talking about? GBPS, MBPS, kBPS?

If the answers are: "Yes, SPI", "a few microseconds" and "MBPS or lower", then I think (don't hold it against me) that it should be an easy matter to do what you want to do. Having a byte-wide or broader interface would certainly speed things up. The Prop-II will have 64 bits of I/O, so it is easy to have at least six or seven byte-wide busses connected.


Lawson · 2007-09-11 14:58

en.wikipedia.org/wiki/Arbiter_%28electronics%29

I understand that SPI slave devices ignore anything that happens on the bus when their CS (Chip Select) line is disabled. So for a single master to multiple slaves, another CS line for each slave is all that's needed. To have multiple masters to multiple slaves, a system is needed for one master to gain exclusive access to the SPI bus. This requires an Arbiter and a way to tell the masters when they have control and when to wait for control. No need to actually switch the clock and signal lines. ("bus arbiter" chips should exist...)

my 2 bits,
Marty

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Lunch cures all problems! have you had lunch?

Wulffy · 2007-09-12 02:43

Skogsgurra said...
I surely didn't understand all that you wrote.

But I have noticed one thing: If you have a few microseconds to spare, the Propeller seems to be able to do almost anything you want it to do.

My questions:
Is it the SPI as I know it (four lines) you have in mind?
What delays are acceptable before bus access is granted?
What data rates are we talking about? GBPS, MBPS, kBPS?

If the answers are: "Yes, SPI", "a few microseconds" and "MBPS or lower", then I think (don't hold it against me) that it should be an easy matter to do what you want to do. Having a byte-wide or broader interface would certainly speed things up. The Prop-II will have 64 bits of I/O, so it is easy to have at least six or seven byte-wide busses connected.

Thanks for the reply, Skogsgurra.

Answers are indeed 'Yes, SPI as Motorola defined it years ago (CSEL,MISO,MOSI,SCLK). Yes, a few microseconds is acceptable. And, yes, MBPS and slower (actually 600Kbps to 800Kbps).'

The deployment of (and resulting speed benefits from) byte+ wide data paths is simply not feasible with the utilization of the IO as it exists in my proof-of-concept efforts. The one thing that is frustrating with SPI is the n+3 utilization of the IO (n = number of slave devices).

If I were able to have an arbitrator handle the slave IO, and communicate with the masters via a simple bus, then that would certainly serve to free up some IO on the masters (and hog a lot of IO on the arbitrator :) )...

Again, thank you for your time and feedback. It is appreciated.

-t

Wulffy · 2007-09-12 02:48

Lawson said...
en.wikipedia.org/wiki/Arbiter_%28electronics%29

... To have multiple masters to multiple slaves, a system is needed for one master to gain exclusive access to the SPI bus. This requires an Arbiter and a way to tell the masters when they have control and when to wait for control. No need to actually switch the clock and signal lines. ( "bus arbiter" chips should exist...)...

Fully understood, Lawson. This is the root reason for my post: "Will the Propeller be a wise choice as a bus arbitrator (arbiter)?" That is what I am looking for feedback on.

Or, are you suggesting that I look away from the Propeller and look to a device that is specifically designed for said task? If so, I have looked for an arbiter specifically designed to be deployed in an SPI-based topology. As of yet, I have not had any luck locating one, hence my considering creating one using the Propeller...

-t

Post Edited (Wulffy) : 9/12/2007 2:53:01 AM GMT

Skogsgurra · 2007-09-12 06:50

It looks well then.

Make those "a few microseconds" one or two microseconds. Using one cog for each channel and a fast SRAM (plus some semaphores) and the "hub cog" to start/coordinate things plus four wires for each SPI connection would make a very efficient 7 channel SPI router/hub/arbitrator at a very low cost.

I think that Chip had this application in mind when he created the Propeller.


Chad George · 2007-09-12 12:21

I've been working on the same problem in a current project.

I toyed around with using a small PIC or CPLD, but in the end I decided that the PIC might be too slow and the CPLD might be overkill and too complicated. I've never used CPLDs before and keep wanting to try, but I think the app will have to absolutely require it before I go there.

So I resolved to use the PIC as I already knew how to do it and it would be cheap.

I was already using half of a 74HC139N to decode my slave select lines (to solve the N+3 problem that was already mentioned) and I thought of a way to use the other half to replace the PIC.
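
For anyone following along, the decoder trick for the slave-select lines can be sketched roughly like this in Python. It models one half of a 74HC139 (a 2-to-4 line decoder with an active-low enable and active-low outputs). The function names `decode_139` and `select_pins` are made up for illustration and are not part of the design described in this post; the enable pin is not counted in the pin tally.

```python
def decode_139(enable_n, a1, a0):
    """Model one half of a 74HC139: returns (Y0, Y1, Y2, Y3), all active low."""
    if enable_n:                       # enable is active low; high disables every output
        return (1, 1, 1, 1)
    selected = (a1 << 1) | a0          # the 2-bit address picks exactly one output
    return tuple(0 if i == selected else 1 for i in range(4))


def select_pins(num_slaves, decoded):
    """Master pins spent on slave select (hypothetical helper).

    Without a decoder: one CS pin per slave.
    With a decoder: only the address lines, i.e. ceil(log2(num_slaves)).
    """
    if decoded:
        return max(1, (num_slaves - 1).bit_length())
    return num_slaves


if __name__ == "__main__":
    print(decode_139(0, 1, 0))         # address 2 drives Y2 low
    print(select_pins(3, decoded=False), select_pins(3, decoded=True))
```

With three slaves this trades three chip-select pins for two address pins per master, and the saving grows with the number of slaves.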

Basically, I add two additional lines to each master, a bus request (REQ) and a bus acknowledge (ACK). Sorry the line names came from when I was using the PIC.
The REQ lines feed the address inputs of the decoder. The appropriate output of the decoder is the ACK signal. My theory (I haven't actually done it yet) is that when a master wants the bus it sets REQ, and when its ACK line goes from high to low it's got the bus. If it has the bus (REQ high) and its ACK line goes low, then someone else is asking for the bus. When it's done, it drops its REQ and the next guy will get a high-to-low transition on its ACK line.

I think it'll work and it seems to scale pretty well, but I'd be interested in any criticism or problems anyone sees in this setup.

EDIT: My own first critique was that it is susceptible to the classic "dining philosophers" problem, so I added a simple mechanism to check if the bus should be available. If it is available and the bus master doesn't get it right away, then there was a conflict and some kind of small random wait time resolution can be used to prevent locking everything up.

EDIT (again): The above fix was completely wrong, as was so kindly pointed out. I couldn't come up with a plan that shares the busy signal with the request line. Adding a third control line fixes the problem, but can be discarded if a less than optimal acquisition strategy is acceptable.

-Chad

Post Edited (Chad George) : 9/12/2007 2:38:46 PM GMT

Ken Peterson · 2007-09-12 12:50

Not sure I understand the use of the diodes and resistors. Seems to me that if you have anything other than 00 on address lines, the Y0 input goes high which pulls both address lines high through the diodes, which might contend with the Req pins on the Propeller since they are set as outputs. It might even start to oscillate. ...or am I missing something?

Ken

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔

The more I know, the more I know I don't know. Is this what they call Wisdom?

Chad George · 2007-09-12 13:06

Yep, you're right, I just saw that too.
That's not what I intended to happen. I'll fix it... somehow.

Chad George · 2007-09-12 14:42

If there are a lot of bus masters and the bus will be fairly busy then it may be best to add a third control line that can detect when the bus isn't being used by anyone. Otherwise I think it can be discarded and each master releases the REQ after a short period if it doesn't get control of the bus right away. A small random wait before trying to get the bus again should be sufficient to prevent a system lockup condition.
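
To make the "release and retry after a small random wait" idea concrete, here is a minimal Python simulation. It is only a sketch under made-up assumptions: time advances in discrete ticks, a transfer holds the bus for `hold_ticks`, a grant happens only when exactly one master is requesting a free bus, and every name in it (`simulate_backoff`, `max_backoff`, and so on) is hypothetical rather than taken from any real design.

```python
import random


def simulate_backoff(num_masters=3, transfers_each=4, hold_ticks=3,
                     max_backoff=8, seed=0, max_ticks=100_000):
    """Return (grant_order, ticks_used) for masters sharing one bus."""
    rng = random.Random(seed)              # seeded so runs are repeatable
    remaining = [transfers_each] * num_masters
    wait = [0] * num_masters               # ticks left before a master retries
    owner, busy_left = None, 0
    grants = []

    for tick in range(max_ticks):
        if owner is not None:              # current owner keeps transferring
            busy_left -= 1
            if busy_left == 0:
                remaining[owner] -= 1
                owner = None               # bus released
        if not any(remaining):
            return grants, tick            # everybody is done

        requesters = []
        for m in range(num_masters):
            if m == owner or remaining[m] == 0:
                continue
            if wait[m] > 0:
                wait[m] -= 1               # still backing off
            else:
                requesters.append(m)       # asserts REQ this tick

        if owner is None and len(requesters) == 1:
            owner = requesters[0]          # uncontested request wins the bus
            busy_left = hold_ticks
            grants.append(owner)
        else:
            for m in requesters:           # busy bus or collision: drop REQ, wait
                wait[m] = rng.randint(1, max_backoff)

    raise RuntimeError("simulation did not finish within max_ticks")


if __name__ == "__main__":
    order, ticks = simulate_backoff()
    print("grant order:", order, "ticks:", ticks)
```

In this model two masters never own the bus at once, and nobody stays locked out forever in practice, at the cost of idle ticks spent backing off. That is the "less than optimal acquisition strategy" trade-off.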

I think strategies like this are used in other multi-master protocols.

Chad George · 2007-09-12 14:51

Another idea is that if not all the possible slave select addresses are being used, then something like address 000 could be used to detect when nobody else has control of the bus, assuming the select lines are pulled to ground and masters tri-state their outputs when they don't have the bus anymore.

Not 100% foolproof, so it couldn't be used for the mutex itself, but it might be good enough to prevent most bus collisions and give an external condition to wait for when the bus is really needed (i.e. waitpeq) while not using up additional control pins.

Paul Baker · 2007-09-14 16:45

Hi Wulffy, what you want to do should be possible, though you will need to modify how SPI works to be able to include arbitration. At minimum you will need to add a feedback line for each master; this line will indicate to the master whether or not it has access to the requested resource. SPI traditionally uses a Chip Select method in order to pick a slave. If you use this method, each master will need two additional lines to enumerate the 3 slaves. This means 3 control lines in addition to the 3 SPI lines, or 18 lines for the 3 masters, and 9 lines for the 3 slaves (Chip Select is not needed; each slave has its own channel and is therefore always selected), so 27 pins are needed for this method. An alternate method would be to use a 3-bit preamble (1 start bit + 2 address bits) for the master to indicate which slave it wishes to address. This method would reduce the pin count for each master to 4 (SCLK, MISO, MOSI, Feedback) for a total of 21 pins.
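
The pin arithmetic above can be checked with a few lines of Python. This is just a sketch: the function names are invented for illustration, and generalizing "two additional lines to enumerate the 3 slaves" to ceil(log2(slaves)) address lines is an assumption, not something stated in the post.

```python
def address_lines(slaves):
    """Address lines needed to enumerate the slaves (assumed ceil(log2(n)))."""
    return max(1, (slaves - 1).bit_length())


def pins_chip_select_style(masters, slaves):
    """Each master: SCLK, MISO, MOSI + address lines + 1 feedback line.
    Each slave: SCLK, MISO, MOSI on its own channel (no CS pin)."""
    per_master = 3 + address_lines(slaves) + 1
    return masters * per_master + slaves * 3


def pins_preamble_style(masters, slaves):
    """Each master: SCLK, MISO, MOSI + feedback; the slave is named in-band."""
    per_master = 3 + 1
    return masters * per_master + slaves * 3


if __name__ == "__main__":
    print(pins_chip_select_style(3, 3))   # 27
    print(pins_preamble_style(3, 3))      # 21
```

For 3 masters and 3 slaves this reproduces the 27-pin and 21-pin totals quoted above.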

The arbiter can be written as master centric or slave centric. The easiest to implement would be a master-centric blocking arbiter, where a cog is dedicated to each master to handle communications to a requested slave. The lock bits would serve as the arbitration method, where a lock bit is assigned to each slave: if the bit is clear the slave is free for communication, and if the bit is set the requested slave is busy with another master. A cog, upon receipt of a request from its assigned master, will attempt to lock the requested slave; if it is already locked, the cog will continue to make the lock request until the slave is in its possession. Once the slave is possessed (either through the first request or a waiting process), the cog communicates that the master now owns the slave via the feedback line. The master now proceeds with its normal SPI communication, and the cog transmits the SCLK and MOSI lines from the master to the slave and transmits the MISO line from the slave to the master. When the SPI transmission is complete, the cog clears the lock bit to free the slave for use with other masters.
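
As a rough model of the master-centric blocking arbiter described above, here is a Python sketch in which one thread stands in for each cog and a `threading.Lock` stands in for each slave's lock bit. It is not Propeller code; the class and function names (`SlaveChannel`, `cog_serve`, `run_demo`), the slave labels and the workload are all assumptions made for the example, and the actual bit-shuttling is reduced to appending to a log.

```python
import threading
import time


class SlaveChannel:
    """One SPI slave plus the lock that arbitrates access to it."""
    def __init__(self, name):
        self.name = name
        self.lock = threading.Lock()   # stands in for one lock bit
        self.active = 0                # masters currently talking to this slave
        self.max_active = 0            # should never exceed 1
        self.log = []


def cog_serve(master_id, requests, slaves):
    """One 'cog' serving one master: lock, relay the transfer, unlock."""
    for slave_idx, payload in requests:
        slave = slaves[slave_idx]
        while not slave.lock.acquire(blocking=False):
            time.sleep(0)              # slave busy: keep retrying the lock
        try:
            # Here the real arbiter would raise the feedback line and
            # relay SCLK/MOSI/MISO between master and slave.
            slave.active += 1
            slave.max_active = max(slave.max_active, slave.active)
            slave.log.append((master_id, payload))
            slave.active -= 1
        finally:
            slave.lock.release()       # free the slave for other masters


def run_demo(num_masters=3, transfers=30):
    slaves = [SlaveChannel(n) for n in ("FPU", "SD", "FRAM")]
    threads = []
    for m in range(num_masters):
        work = [(i % len(slaves), f"m{m}-t{i}") for i in range(transfers)]
        threads.append(threading.Thread(target=cog_serve, args=(m, work, slaves)))
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return slaves


if __name__ == "__main__":
    for s in run_demo():
        print(s.name, "transfers:", len(s.log), "max simultaneous:", s.max_active)
```

Because the lock is per slave rather than per bus, two masters can talk to two different slaves at the same time, which is the main attraction of giving each slave its own channel.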

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Paul Baker
Propeller Applications Engineer

Parallax, Inc.

Mike Green · 2007-09-18 14:45

Paul,
Your suggestion will work for most devices, but many SPI devices use CS to initialize the data transfer, so the device can't remain selected all the time. This only increases the number of lines for the slave devices, since the arbiter can select the device whenever the slave is assigned to a master and deselect it when it's released. This would increase the number of I/O pins needed to 30, which is awkward since provisions would have to be made for the multiple use of I/O pins 28-31. Better to use a preamble for device selection, for a total I/O pin count of 24.

My personal preference for a system like this would, in the absence of an FPGA, be to use discrete data selectors for the SCLK, MOSI, and MISO lines and let the Propeller do the arbitration and control much as Paul suggested. My reason is that, once a master and slave are connected, the actual data transfer can proceed at whatever speed the two devices can handle, whereas the Propeller, while fast, would have to copy the data between the master and the slave. It would take several hundred nanoseconds to do each bit ... quite a lot of time.

You'd need a 4-input selector (74HC253) for SCLK and MOSI for each slave and a 4-input selector for MISO for each master. For 3 slaves and 3 masters, that would take 3*2 + 3*2 = 12 pins plus 3*4 = 12 pins for the request/acknowledge/select for each master plus 3 pins for the CS lines for a total of 27 I/O pins. This would require half of a dual 4-input selector for each master and both halves for each slave for a total of 5 cheap external devices usually in 16 pin packages. Expanding beyond 3 slaves would require a different scheme because of the limited number of I/O pins available.

You might also consider using the SX-48 given that it would only have to arbitrate the SPI access rather than do the transfers as well. It has 36 I/O pins and that would allow you to increase the number of slaves to 5 (5*2 + 3*2 + 3*5 + 5 = 36).
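
The pin budgets in this post can be tallied with a short Python sketch. The breakdown of each term is my reading of the arithmetic (2 select lines per 4-input selector; per master a request line, an acknowledge line and enough select bits to name a slave; one CS per slave), and the function names are hypothetical. It only reproduces the sums as written and does not check whether the selectors themselves are wide enough for a given number of slaves.

```python
def mux_arbiter_pins(masters, slaves, select_lines_per_mux=2):
    """I/O pins on the arbiter when discrete selectors carry the SPI signals."""
    slave_mux_selects = slaves * select_lines_per_mux    # SCLK/MOSI selector per slave
    master_mux_selects = masters * select_lines_per_mux  # MISO selector per master
    slave_id_bits = max(1, (slaves - 1).bit_length())    # bits to name a slave
    per_master_control = 2 + slave_id_bits               # REQ + ACK + slave number
    chip_selects = slaves                                # one CS line per slave
    return (slave_mux_selects + master_mux_selects
            + masters * per_master_control + chip_selects)


def selector_packages(masters, slaves):
    """Dual 4-input selector packages: half per master, both halves per slave."""
    halves = masters + 2 * slaves
    return (halves + 1) // 2                             # round up to whole packages


if __name__ == "__main__":
    print(mux_arbiter_pins(3, 3))     # 27
    print(mux_arbiter_pins(3, 5))     # 36
    print(selector_packages(3, 3))    # 5
```

With 3 masters and 3 slaves this gives the 27 pins and 5 packages mentioned above, and with 5 slaves it gives the 36 pins quoted for the SX-48.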

Post Edited (Mike Green) : 9/18/2007 5:17:07 PM GMT

Paul Baker · 2007-09-18 17:10

Ah, good idea Mike, I hadn't thought about using discrete muxes. It would not only drastically reduce the pin count down to just the control signals (arbitration and chip select), but it would reduce the programming complexity as well, by not having to shuttle the data through the Propeller.

