Four serial ports instead of four copies of single serial object
kwinn
I am surprised at the number of times I see people using multiple copies of FullDuplexSerial instead of one of the versions of the FullDuplexSerialFourPort object. Not only does this require a cog for each copy, it is at least as complicated, if not more so, than using the four port object.
There is almost no difference as far as using one of the four port objects when compared to the single port object. The only additional requirement for sending or receiving data is to specify which port should be used. They all have the same basic functions for sending and receiving data, and the four port objects have several additional useful functions.
Here is a tidied up and updated comparison from when I first needed more than one serial port.
Comments
I frequently use the four port serial object but it does have a downside of limiting the max baud. I've had trouble when using two ports at 115,200 with the four port serial driver.
I completely agree, knowing how to use a four port driver is very useful.
Hey Duane,
Have you established a maximum baud at which comms on all four ports can be rock solid?
I haven't. I'm pretty sure the max speed will depend on which version of the driver one uses. ksldt recently posted a faster 4-port driver. It would be interesting to see the max rates for the driver.
Mickster, it is a hard question to answer because of the way the 4-port driver sequentially bounces from port to port. The longest time spent servicing one port is about 50 clock cycles, but that happens only when the hub sync is at its worst when it hits the point of writing a received byte to the hub or fetching a byte from the hub for transmission. The more common path length is 30 or 35 clock cycles. Of course, "rock solid" depends on worst case.
If only one port is enabled for receive and transmit and without flow control, and clkfreq=80MHz, then the directions are sampled at an interval of about 650ns, but that jumps to about 1250ns if they both happen to hit the worst case sync of hub access at the same instant. Compare that to a baud rate of 115200, where one bit time is 8681ns.
(Rough estimates alert!)
If two ports are running, the worst case is more like 2500ns, and that is uncomfortably close to the 8681ns bit time at 115200. I would rather have at least three sample times within a bit time, preferably more. It is a low probability that all of the tasks (2 x rx and 2 x tx) are going to hit the worst case at the same time, but in a streaming situation, it is bound to happen.
With 4 ports active, same conditions apply. Estimate 5000ns sampling interval worst case. Even more variability as asynchronous bytes come and go. But say, allow 5 samples per bit on the fastest serial port, so 25000ns. That corresponds to a bit rate of 40kbaud. So, call 38400 and under "solid" and 9600 "rock solid".
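For what it's worth, here is a rough back-of-the-envelope sketch in Python of those same estimates. The 50-clock worst-case pass and the 80MHz clock are assumptions taken from the numbers above, not from the driver source, so treat the output as the same "rough estimates alert" as the post itself.

```python
# Rough model of the estimates above; per-pass cost and clock are assumptions.
CLKFREQ = 80_000_000   # Propeller system clock, Hz
WORST_PASS = 50        # assumed worst-case clocks per rx/tx co-routine pass

def worst_sample_interval_ns(active_ports):
    # One trip through every active co-routine (an rx and a tx task per port)
    # sets how long any one direction can go before it is looked at again.
    tasks = 2 * active_ports
    return tasks * WORST_PASS * 1e9 / CLKFREQ

def max_baud(active_ports, samples_per_bit=5):
    # Baud rate at which 'samples_per_bit' worst-case intervals still fit
    # inside one bit time.
    return 1e9 / (samples_per_bit * worst_sample_interval_ns(active_ports))

for ports in (1, 2, 4):
    print(f"{ports} port(s): ~{worst_sample_interval_ns(ports):.0f} ns worst-case "
          f"interval, ~{max_baud(ports):,.0f} baud with 5 samples per bit")
```

That prints roughly 1250ns/2500ns/5000ns for 1/2/4 ports and a 40kbaud ceiling for the four-port case, which is where the "call 38400 and under solid" rule of thumb comes from.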
Once, I had to interface to an old modem at 1200bps. As for some of the speed concerns noted in the posts, I have to admit the fastest I have had to use all four ports for is 38.4Kbps, and that has worked without problems. Building heating and lighting automation as well as SCADA applications rarely need high speeds; reliable communication is the primary consideration.
The biggest problem is the variability of the receiver's sampling. The sampling makes use of a five-instruction loop that determines whether it's time to sample. In that loop, if it's not yet time, the other co-routines are given a slice. This means that if the loop checks the time one clock before the sampling time, it can incur the worst-case delay of the other seven co-routines (four transmitters and three receivers), each taking its worst-case path. In my versions of the drivers, those paths are 51 clocks worst case, so we're looking at 7 x 51 = 357 clocks of possible delay before the sampling occurs. If the sampling was going to be at the middle of the bit time, then being 357 clocks late says we'd better have bit times of at least 2 x 357, or 714, clocks per bit. At 80MHz, that caps the rate at about 112044 baud.
Of course, we can't sample at the last possible moment, so my version of the driver pulls back the sampling window from mid-bit to 22% into the bit's time. At 115200 baud, 22% is 152 clocks. This is the earliest possible sample point. The latest possible sample point is 152 + 357 or 509 clocks, which is about 73% into a 115200 baud bit's time.
And that's the real problem: the sampling point can be anywhere between 22% and 73% of the bit's time. One must add some other latencies to this number, most of which are related to variability in having detected the start bit's falling edge. Those are a constant that must be added to both the 22% and 73% figures. As it turns out, they can be quite long, too; in fact, as high as the same 357 clocks. Fortunately, the worst case cannot happen in successive passes through the other threads; it only occurs once every 10 bit times, so if it happens during start bit detect, it cannot happen during sampling. But the effect does push the sampling out significantly.
Ideally, a UART takes three successive samples right at the 50% bit time and votes about whether the bit was a 0 or a 1. You can see that the soft UART is about as far from ideal as one could imagine. In a noisy environment it's just not going to work at baud rates anywhere near 100K. Even 57600 is questionable.
Worse yet, the same variability discussed above for the receiver is present in the transmitted data. While there's no cumulative drift in the timing of successive bits, there can be the same 357-clock variability in driving a bit onto the wire. So a Propeller using the quad port driver talking to another Propeller running the same quad port driver has a worst case where the transmitter is 357 clocks late and the receiver is sampling at only 22% of the bit time. In that case, the zero-margin minimum bit time is 357/0.22, or about 1623 clocks per bit, which works out to roughly 49,300 baud. Clearly one needs more than zero margin; I'd guess that 38400 would work reliably if your electrical environment is good.
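A minimal sketch of that worst-case arithmetic, assuming the 80MHz clock, the 51-clock worst-case co-routine path, and the 22% early sample point described above:

```python
# Sketch of the worst-case window arithmetic above (figures assumed from the post).
CLKFREQ = 80_000_000
WORST_OTHER = 7 * 51          # 4 tx + 3 rx worst-case passes = 357 clocks

def sample_window(baud, early_fraction=0.22):
    # Earliest and latest possible sample points within one bit.
    bit_clocks = CLKFREQ / baud
    earliest = early_fraction * bit_clocks
    latest = earliest + WORST_OTHER
    return earliest, latest, latest / bit_clocks

e, l, frac = sample_window(115_200)
print(f"115200 baud: sample lands between {e:.0f} and {l:.0f} clocks "
      f"(about {frac:.0%} of the bit)")

# Prop-to-prop worst case: transmitter 357 clocks late while the receiver
# samples at only 22% of the bit, so 357 clocks must fit within 22% of a bit.
min_bit = WORST_OTHER / 0.22
print(f"zero-margin bit time: {min_bit:.0f} clocks -> {CLKFREQ / min_bit:,.0f} baud")
```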
That said, it's a fairly low quality approach to a UART. I'd strongly suggest adding CRC to your transmitted data and protocol support for retries in the face of data errors. In my applications, I'm increasingly using SPI-interfaced UARTs instead of the software UART and if you want even pedestrian performance and/or want reasonable reliability, I'd suggest you do, too.
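To make the CRC suggestion concrete (this is not any poster's protocol, just an illustrative CRC-16/CCITT sketch one could append to each packet so the receiver can reject corrupted data and request a retry):

```python
def crc16_ccitt(data: bytes, crc: int = 0xFFFF) -> int:
    # Bitwise CRC-16/CCITT (poly 0x1021); small and easy to port to Spin/PASM.
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) if (crc & 0x8000) else (crc << 1)
            crc &= 0xFFFF
    return crc

def frame(payload: bytes) -> bytes:
    # Append the CRC big-endian; the receiver recomputes and compares.
    crc = crc16_ccitt(payload)
    return payload + bytes([crc >> 8, crc & 0xFF])

def check(packet: bytes) -> bool:
    return len(packet) > 2 and crc16_ccitt(packet[:-2]) == (packet[-2] << 8 | packet[-1])

assert check(frame(b"sensor reading 42"))
```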
Thanks for posting that; it's good to have this kind of information on the limits of the driver. It sounds like the reliability of my projects has resulted as much from luck as from good planning and design.
Agreed. The overhead of the transmitter and receiver when not in use is about 1/2 that of an in-use port; 12 clocks for an idle receiver and 35 clocks for an idle transmitter - so that's 47 clocks vs 102.
The worst case path for single channel operation actually is much different from that when everything is running. It's the path from receiving the last bit of a byte through buffering the character and getting back to the start bit detector. I believe you'll find that path is a bigger problem than the per-port overhead of idle ports. I can check it this evening. It's that path length that caused me to not actually receive the stop bit, but to use the stop bit's time to buffer the character. I believe that change makes my four port version faster than the old single port. That said, I have a new single port that's not quite ready for prime time that I've tested at 1.5MBaud and seems OK. I'm actively using it at 460800 with billions of bytes served and no problems found.
Again, the analysis of all this is enough to choke a horse and is another strong contributor to using external UARTs, unless one plans to figure out how to do exhaustive testing. The best plan I've put together for that is to do hundreds of hours at a higher baud rate with no problems found and then back off the baud rate for production. Yuck.
The gold standard full duplex serial does wait for the stop bit, but it does not test its validity. It allows only 1/2 bit time for the long path housekeeping, but it will enter a stream of nulls into the receive buffer in response to a Break or in the event of a shorted/floating receive line. That defect has caused people grief when prop-plugs are left disconnected. Some forumistas have turned that into a feature, to detect Break.
At least your version waits for the stop condition after it does the housekeeping in the 1.5 bit times, before it continues to look for the next start bit. With a hardware UART, you would have a framing flag to check in hardware or in a status register. Some UARTs even provide a Break flag.
I guess I've always figured that if you have a noise problem, the place to fix it is at the source of the noise, not downstream. OTOH, if your comms link is RF, all bets are off, and you do what you have to do in software to minimize the effects of the inevitable noise.
-Phil
You're right that voting doesn't eliminate certain kinds of problems, but it is a great statistical relief against many kinds of problems. In that regard, it's similar to parity or ECC. They are all statistical approaches to resolve problems at a very low level, but none are sufficient to provide truly reliable communications. Eliminating noise is, similarly, a statistical approach, but one cannot eliminate noise completely in most real world environments. Lightning strikes in airplanes, alternator noise in cars, static discharge in industrial settings ... or simply pulling the plug out for a few seconds.
When it comes to the soft UART, my point remains that the timing of sampling is pretty poor. If one were to read any article on UARTs and then look at how the quad port driver works, they'd be pretty alarmed. And doing exhaustive analysis in an attempt to determine maximum baud rates when two instances are talking to one another would likely be shocking. It works ok at pedestrian data rates; I'm not sure there's much more that can be said.
For what it's worth, I don't understand the benefit of sampling at 3x the baud rate. The common thing to do is sample three times quickly, as close to the center of the bit time as practical. That often means the receiver runs at 16x the baud clock and samples on clocks 7, 8 & 9 (about 44%, 50% & 56%). I can't see any value in sampling at 16% and 84% of a bit time if one is also sampling at 50%.
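As a hypothetical illustration of that 16x scheme (not how any of the posted Propeller drivers work), the mid-bit majority vote is just this:

```python
def vote_bit(samples_16x):
    # samples_16x: 16 line samples spanning one bit time, index 0 at the bit edge.
    # Take the three samples nearest mid-bit (clocks 7, 8, 9) and majority-vote.
    votes = samples_16x[7] + samples_16x[8] + samples_16x[9]
    return 1 if votes >= 2 else 0

# A single glitch near mid-bit gets out-voted:
print(vote_bit([1] * 7 + [1, 0, 1] + [1] * 6))   # -> 1
```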
The point I was trying to make above, though, was that in a noise-benign situation, sampling a bit once is good enough. And if that allows you to cram more soft UARTs into a single cog and maintain speed, then it has an advantage in certain situations. In short, not all software UARTs have to be perfect or even nearly so, as long as they're good enough for the application at hand.
As to the 3X-baudrate sampling, that's the minimum sampling rate required to guarantee that every sample lands inside a bit window. And it only applies when sampling for the start bit. Once the start bit is detected, your next sample occurs four sample times later, then three thereafter. If you want to be anal about it, you can resample the start bit one sample time later, then it's a three-sample-time interval for all bits thereafter. But in a noise-free environment, it's not necessary.
BTW, for transmitting, I always use 1.5 or 2 stop bits. That makes it easier on the receiver if it ever gets out of sync or starts up in the middle of a transmission.
-Phil
Yes and no.
A wider spread of the samples also places higher demands on clock jitter, but you are right that they should be spread wider than the noise.
I think that is why the industry seems to have settled on 3 samples at BaudT/16: the idea is they are spread enough to avoid a single noise impulse, but not so spread as to bump up the clock precision / jitter demands.
That effect would be something of a killer on a slave that did AutoBaud.
Is there merit, instead, in doing a two-cog multi-channel design, where one COG manages N x Tx and the other manages N x Rx?
That should give much better Transmit jitter performance, as well as improve the overall margins.
Of course it uses 2 COGs, but that is still 2 fewer than a 4-copy solution, and the 1-COG design remains for those cases that can operate at modest baud rates.
And, I still don't follow your 3x sampling point. All the posted serial drivers sample only once. There's no oversampling or voting. The sample timing is entirely computed from the cnt register: it predicts when the next bit time would occur, assuming the start bit had been detected with low latency. However, as I explained above, start bit detection is all over the map, which means that the data bit sampling window is also all over the map, further delayed by the variable latency of the other threads.
The timing debacle is illustrated in this document, which I'd posted previously.
They are in most UART implementations for a good reason.
I've been working on a one port, dual core implementation that has maximum throughput, but it's actually depressingly slow: around 1.5MBit. I'm looking at using the counters to shift data in and out, but that doesn't appear to help. The worst case paths are all about buffering data in the receiver underlapped with the stop bits. The buffering code is 39 clocks worst case, and 80,000,000/39 is only 2.051MBaud. Getting back into the start bit detector adds several more instructions; it all comes down to about 53 clocks, and I see no way to reduce that latency.
One merit I can see with split Tx/Rx is that the Tx jitter issue goes away (completely?), and it then works better with AutoBaud slaves.
Dual core? Do you mean 2 cogs here, or is this on one cog?
http://www.microchip.com/wwwproducts/Devices.aspx?product=MCP2561FD
Microchip specs ±14 kV ESD and -58V to +58V pin survival.
They spec these from 14.4kbps to 5Mbps (the lower limit is a bus-lock timeout monostable)
CAN transceivers can be used for half-duplex links; previously, the older ~1MBit ceiling meant RS485 was needed for speeds above 1M.
CAN devices have an offset threshold, so they need less power in a passive state than RS485, and they need fewer pins to connect.
Jitter isn't a problem, worst case latency is the problem. You can't get to fixed latency with anything other than a single channel, half duplex implementation - either transmit or receive. Your suggestion was about N Tx in one core, and that will continue to have variable latency.
Options for 1.5 or 2 stop bits are a carry-over from ages-old implementations that could not move data from the receive shift register into the processor without interrupt latency. They have absolutely nothing to do with anything other than managing receiver latency in unbuffered implementations. Modern UARTs have FIFOs to cover latency, and deep FIFOs at that. Even a single-byte holding register past the shift register gives 10 bits of relief on overrun. I've not seen a UART without a holding register in perhaps 20 years.
Nope. You are missing the case of continual Tx/Rx, with skewed Baud clocks.
This is quite easy to test for, just send 300~500 byte blocks, duplex, between units, and see what happens.
With a modest 1% baud skew (Rx faster than Tx), that can give an extra 3-5 whole bytes of unwanted elasticity.
Adding that 0.5 or 1 extra stop bit removes that effect entirely. Interrupt latency does not come into this at all.
We had a system with a ceramic resonator fail due to this effect. Of course, it was well within bit-level specs, but there can be more to system design than bit-level specs.
Solution was either a crystal, or added stop bits.
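The arithmetic behind that elasticity claim is quick to sketch; the frame size and skew below are the figures from this post, not measured values:

```python
# Quick sketch of accumulated slip from a baud mismatch over a continuous block.
FRAME_BITS = 10      # start + 8 data + 1 stop
SKEW = 0.01          # 1% mismatch between the two ends' baud clocks

for block_bytes in (300, 400, 500):
    slip_bits = block_bytes * FRAME_BITS * SKEW
    print(f"{block_bytes}-byte block: ~{slip_bits:.0f} bit times "
          f"(~{slip_bits / FRAME_BITS:.0f} frames) of accumulated slip")
```

That gives the 3-5 byte figure. An extra half stop bit adds 0.5 bit of guaranteed idle per frame, which more than covers the roughly 0.1 bit per frame of slip a 1% skew produces.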
Latency in byte send does not affect AutoBaud; it is the jitter on Tx that I was seeking to eliminate.
As you can see, adding just a second stop bit, cuts the resync time to about a third of what it would be with one stop bit.
But this simulation really wasn't necessary, if you think about it. Obviously, 10 stop bits would resync in one frame time. So how about 9 or 8 or 7 or 6? It should be clear that if it takes n>1 frames on average to resync with 1 stop bit, and 1 frame with 10 stop bits, the average number of frames should start to decrease somewhere between 1 and 10. It doesn't just go from n to 1 all at once.
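Phil's simulation isn't attached here, but a crude stand-in model shows the same trend. Everything in this sketch is assumed (back-to-back 8-N-x frames of random data, a receiver that drops in mid-stream and locks onto the first falling edge it sees), so only the direction of the result, not the exact numbers, should be read into it:

```python
import random

def avg_frames_to_resync(stop_bits, trials=500):
    # Hypothetical model, not Phil's simulation: back-to-back frames of random
    # data; the receiver starts mid-stream, treats the first 1 -> 0 edge it
    # finds as a start bit, and "reads" a whole frame from there.  Count how
    # many frames it takes before the edge it locks onto is a real boundary.
    frame_len = 1 + 8 + stop_bits
    total = 0
    for _ in range(trials):
        stream = []
        for _ in range(500):                         # 500 back-to-back frames
            stream += [0] + [random.getrandbits(1) for _ in range(8)] + [1] * stop_bits
        pos = random.randrange(1, frame_len)         # drop in mid-frame
        frames = 0
        while True:
            while not (stream[pos] == 0 and stream[pos - 1] == 1):
                pos += 1                             # hunt for a falling edge
            frames += 1
            if pos % frame_len == 0:                 # aligned: truly in sync
                break
            pos += frame_len                         # swallowed a bogus frame
        total += frames
    return total / trials

for sb in (1, 2, 3):
    print(f"{sb} stop bit(s): ~{avg_frames_to_resync(sb):.1f} frames to resync")
```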
-Phil
Ok, now I see what you're talking about when you say sampling. But I believe you're conflating two separate issues. One is the manner in which start bits are detected, which need not be sample-oriented at all. The other is the timing at which a bit within the data portion of the frame is determined to be a zero or a one. These two are completely distinct. And, we're using the word "sampling" in two very different ways. You're using "sample" where I'd say "test", I'm using "sample" where I really mean statistical sampling - several samples taken and used collectively to predict the correct value.
For start bit detect in a hardware design, you can just detect the falling edge. My one port receive only implementation uses a waitpne for this, so you might say the sampling is at 80MHz.
The sampling-and-vote technique to which I was referring has to do with the determination of whether a bit is a zero or one within the data portion of the frame. If you have reasonable confidence in your start bit detection (and, again, the propeller full duplex drivers are horrid here) and reasonable ability to determine the mid point of the bit (and, again, the propeller full duplex drivers are horrid here) then a single sample is good and a 2/3 vote is slightly better. I don't see any benefit in evaluating samples that are well displaced from the ideal center of the bit cell.
The example you give has horrific start bit detection, off by as much as 33% of the bit time. But still, you're not doing a sample-and-vote of the data bits; it simply delays a prescribed time and tests the bit.
Similarly, I see what you're talking about with respect to stop bits having an effect on synchronizing a receiver to a non-idle data stream. This isn't what I'd thought you were talking about at all. And yes, more stop bits will help, but they have a pretty large impact on overall throughput; even going to 1.5 stop bits adds 5% overhead. For serial data streams where you can't control the synchronization otherwise, it is a fine approach. Framing error detection also goes a long way here.
As it turns out, the performance of the transmitter in these full duplex drivers is really bad measured from stop bit to start bit as it takes many clocks and two passes through the "other" code to determine whether there's another byte to transmit. This has the effect of prolonging the stop bit even when only one stop bit is desired. At maximum baud rate, that delay is about 2.5 bits, so you get 3.5 stop bits whether you want them or not. Of course at slow baud rates, the effect is much smaller, but still, the stop bit is prolonged.
The nice thing about industry standard terms is that the reader can determine their meaning without the writer having to explain.
Regarding autobaud, you're conflating many things. First, any autobaud algorithm has many limitations. None has infinite resolution. Most of the implementations of which I'm aware do a good job of disambiguating industry standard rates. None synchronize to random rates. In the end, the algorithm has some tolerance for bit time detection and the question is whether the transmitter can meet this tolerance. As there are no standards for this, the ability of the algorithm when paired with a given transmitter running at a selected baud rate to accurately determine that baud rate isn't assured.
And, back to the question of jitter or latency, we're talking about the transmitter, where there is no jitter, only latency. The loop that shifts bits has some variable, per-bit latency that is not avoidable. That variable latency is made much, much worse when it's an N port driver, as the loop must yield time to the other cooperatively scheduled threads and those threads' path lengths are not constant. Each of these causes non-cumulative latency in the generation of a given bit. To the receiver, that variable latency is seen as jitter, but it is not the only source of jitter at the receiver.
Yes. Obviously. But we're not talking about hardware design here, or even a receive-only situation. When you're mixing transmit and receive in a software UART -- or four transmits and four receives (what this thread is about, after all) -- you don't have the luxury of waiting for anything, like an edge. You have to sample at a fixed rate, round-robin fashion, and move on.
-Phil
But for full (as in simultaneous) duplex operation, or multiple transmitters/receivers in a single cog (which was the point of this thread), WAITs are not permitted. So Phil's start bit sampling approach is exactly the right answer, and ought not be dismissed so cursorily. Horrid? Indeed!
Cheers,
Peter (pjv)