
Full Duplex Serial: Is it really full duplex?

Honest full duplex on the Propeller would require no fewer than two cogs, and likely three. Without compromise, one cog is needed whose only function is receiving data and storing it in the HUB. A second cog is then needed to transmit data without interrupting the receive routine. That is the bare minimum. Then there's a good chance you want your transmit cog running by itself, reading from a shared queue fed by a third cog, that third cog being your application. This would best mimic a full-duplex hardware UART.

My question is simple: to what degree is the Full Duplex Serial object actually full duplex? Is it really half-duplex running on two pins, such that it can only transmit or receive at any one point in time? Does it quickly flip-flop between checking the transmit queue and the receive line to see what needs to happen next? Does it start two dedicated cogs for transmit and receive?

Comments

  • Heater.Heater. Posts: 21,230
    edited 2015-08-26 19:04
    The good old original Full Duplex Serial object is indeed full duplex. It can be clocking out a character at the same time it is clocking another one in. FDS has a cunning "coroutine" mechanism, using JMPRET, in its code to make this possible.
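
    In rough C terms the idea looks something like this (a sketch only, not the actual FDS source; the state names are made up). Each routine advances one small step and hands control to the other, which is what JMPRET does in a single instruction:

        /* Two interleaved state machines standing in for the FDS
         * coroutines. JMPRET saves the resume point and jumps to the
         * other routine; here each switch/state pair plays that role. */
        static int rx_state, tx_state;      /* saved "return addresses" */

        static void rx_step(void)           /* one small slice of receive */
        {
            switch (rx_state) {
            case 0: /* watch for the start-bit edge */ rx_state = 1; break;
            case 1: /* sample one data bit */          rx_state = 0; break;
            }
        }

        static void tx_step(void)           /* one small slice of transmit */
        {
            switch (tx_state) {
            case 0: /* fetch the next byte, if any */  tx_state = 1; break;
            case 1: /* drive the next bit */           tx_state = 0; break;
            }
        }

        int main(void)
        {
            for (;;) {                      /* neither side ever blocks */
                rx_step();
                tx_step();
            }
        }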

    This is only good for up to 115200 baud as far as I can tell.

    In the prop-gcc sources you will find my example of FDS written in C. It was also limited to 115200 baud the last time I tested it.

    I believe since that time others have created a Full Duplex Serial object that is faster. In only one COG.

    What you are missing is that there is enough tolerance on the detection and generation of edges that one COG can do it all, no matter if it is busy sending an edge when a new edge comes in or vice versa.

    Whether this is "honest" or not I don't know. It may well be stretching the tolerances specified in the standards but if it works it works :)

  • Oh that's very interesting. A great solution for the Propeller. And exactly what I needed to know to make my design decisions for PropWare's buffered UART.
  • Heater.Heater. Posts: 21,230
    Speaking of buffered UART. I recall that the original FDS had 16 byte Tx and Rx FIFOs. Others have since enlarged on this. Sorry I have no links to hand.
  • That will be easily taken care of in PropWare. The buffer and size of the buffer will just be parameters to the constructor and default values will be available for simple use cases. Easy configuration.
  • Heater.Heater. Posts: 21,230
    A buffer size that is a power of two will always yield the fastest code. The input/output FIFOs of FDS are pretty cunning. Program can read or write a FIFO whilst the FDS COG can do the same. No locks required. Magic!
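
    To illustrate the power-of-two point with a sketch (hypothetical C, not FDS itself): wrapping the index takes only a single AND, with no divide or compare:

        #define BUF_SIZE 16                 /* must be a power of two */
        #define BUF_MASK (BUF_SIZE - 1)

        static unsigned head;

        static void advance(void)
        {
            head = (head + 1) & BUF_MASK;   /* 15 wraps to 0 in one AND */
        }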
  • Heater. wrote: »
    Magic!

    If FDS isn't the perfect example of Clarke's Third Law, I don't know what is!
  • Heater. wrote: »
    A buffer size that is a power of two will always yield the fastest code.

    I don't think that's the case. Tracy Allen used cmpsub, which takes care of wrapping pointers back to zero just as fast as an AND.

    I'm pretty sure buffers can be any size without a performance cost.
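
    For example (a C sketch of the same idea; cmpsub is the PASM equivalent of this compare-and-subtract):

        #define BUF_SIZE 100                /* any size, powers of two not required */

        static unsigned head;

        static void advance(void)
        {
            if (++head >= BUF_SIZE)         /* what cmpsub does in one instruction */
                head -= BUF_SIZE;
        }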

  • Heater. wrote: »
    The input/output FIFOs of FDS are pretty cunning. Program can read or write a FIFO whilst the FDS COG can do the same. No locks required. Magic!

    Oh, now you are talking magic. Hmmm.... I wonder if I'm going to have to actually poke through FDS to figure out how that works. I wrote a Queue object in PropWare for easy handling of this stuff, but it most definitely requires a lock. And there's no way to use it in between reading bytes from a UART without missing some data, so I was going to have to do something else for the receiver anyway.
  • Andrew E MileskiAndrew E Mileski Posts: 77
    edited 2015-08-26 22:10
    For what it is worth...

    A received async serial bit is supposed to be sampled at the middle of the bit period. The idea being to stay away from the noisy edges where transitions occur.

    There is really no set-in-stone amount of how far to stay away from the edges, other than the slew rate specified in the specs, and even that tends to be commonly violated at "high" bit rates like 115200 over "long" cable runs of several metres.

    At every start bit, the receiver re-syncs to the transmitter, which prevents the timing error from accumulating beyond a single frame.

    Example: If the safe area to sample is 50% of the bit period (± 25% from center) and we have 10 bits in a frame, the max sample-time error per bit must not exceed ± 2.5%.

    Start: ± 2.5%, Error = ± 2.5%
    D0: ± 2.5%, Error= ± 5%
    D1: ± 2.5%, Error= ± 7.5%
    D2: ± 2.5%, Error= ± 10%
    D3: ± 2.5%, Error= ± 12.5%
    D4: ± 2.5%, Error= ± 15%
    D5: ± 2.5%, Error= ± 17.5%
    D6: ± 2.5%, Error= ± 20%
    D7: ± 2.5%, Error= ± 22.5%
    Stop: ± 2.5%, Error = ± 25%

    The stop bit is sampled to detect framing errors (and a break signal), but sampling the stop bit can be optional. The next start bit must not be missed.

    At 115200 bps a bit period is 80 MHz / 115200 ≈ 694 clocks, so ± 2.5% works out to only about ± 17 clocks of allowable sampling error at 80 MHz. Whatever cycles are left over after receiver sampling allow processing time to handle transmitting.
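
    As a quick check of those numbers (a sketch; assumes the usual 80 MHz system clock):

        #include <stdio.h>

        int main(void)
        {
            const double sysclk = 80e6;                     /* Propeller at 80 MHz */
            const double baud   = 115200.0;
            const double bit_clocks = sysclk / baud;        /* ~694 clocks per bit */
            const double budget     = bit_clocks * 0.025;   /* +/-2.5% of a bit */

            printf("bit period %.0f clocks, +/-2.5%% budget +/-%.0f clocks\n",
                   bit_clocks, budget);                     /* ~694 and ~17 */
            return 0;
        }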

    Hence this scheme tends to fall apart at higher bit rates, when there's not enough time to do both receive and transmit within an acceptable margin of sync error!

    Sometimes one can tolerate a lot more error on transmit than receive, because real UARTs (like the one in your PC) do oversampling (typically 16x) as well as sample weighting and/or majority voting, so they can operate a lot closer to the noisy bit-period edges, and hence can tolerate more sync error.
  • Peter JakackiPeter Jakacki Posts: 10,193
    edited 2015-08-26 21:19
    FDS has a lot of jitter, which doesn't cause too much of a problem if it's not operated too fast or if it talks to a proper UART, normally an FT232-style serial-to-USB interface. If you want to talk to another FDS, especially at higher speeds (even though 115,200 isn't particularly fast) and/or over longer links, then the errors may make it unreliable.

    The decision I made in Tachyon is to dedicate a cog just to receive, and also to make sure it didn't leave all its buffering until the stop bit, as some transmissions can be back-to-back with only a single stop bit, which doesn't leave a lot of time at 3M baud. The checks and buffering are therefore interleaved with receiving data bits.

    The other decision was that transmit buffering (and unbuffering) at higher speeds can take a lot of time compared to just sending the data, and since it's possible for the application cog to "just send it", that's what it should do. In a way the application cog is a dedicated transmitter, so the timing is very precise, although the period between characters is up to the application.

    Note that Spin is unable to transmit from the Spin cog at higher speeds, although it is possible at lower speeds, but that doesn't stop us from having this capability "built in". The bit-bash opcode (emit) in Tachyon is only 10 instructions long, and I prefer higher speeds because at 2M baud, for instance, it only spends 5us sending a character while the whole opcode fetch/execute takes 6.6us. Sadly I mostly set my binaries at 115,200 for users due to the mix of SLOW terminal emulators in use. Even TeraTerm goes up to at least 921,600, although I use minicom on Linux and that has no problem with high speeds.

    I also have serial objects which are "auto-duplex": once they detect a start bit they are locked into receiving a character, and when they are transmitting a character they are locked into that. The reason is simply that most serial communication is effectively "half-duplex", in that we receive a command, for instance, and we send back a response. So once it is receiving a character it normally sits there, undisturbed by the application, to receive the whole packet, which is then processed by the application, and then a response is sent. That's being pragmatic.

    BTW, Clarke's third law really means that advanced technology, just like magic, disappears in a puff of smoke! :)

  • Heater.Heater. Posts: 21,230
    @Duane Degn,

    I do believe you are right. I was forgetting the extra magic power that comes with the Propeller. Not so easy on lesser machines.

    @SwimDude0614

    Provided your queue only has one writer and one reader, like for example a program using FDS, then you don't need any locks. I highly recommend spending some time to study FDS. It's magic but it is understandable. Even when you understand it, it's still magic, maybe more so.
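
    A minimal sketch of such a lock-free single-producer/single-consumer FIFO in C (the idea only, not the FDS code; on the Propeller, hub reads and writes are atomic, which is what makes this safe):

        #include <stdbool.h>
        #include <stdint.h>

        #define SIZE 16                         /* power of two */
        #define MASK (SIZE - 1)

        static volatile uint8_t  buf[SIZE];
        static volatile unsigned head, tail;    /* each written by one side only */

        bool fifo_put(uint8_t c)                /* producer only */
        {
            if (((head + 1) & MASK) == tail)
                return false;                   /* full */
            buf[head] = c;
            head = (head + 1) & MASK;           /* publish after the data write */
            return true;
        }

        bool fifo_get(uint8_t *c)               /* consumer only */
        {
            if (head == tail)
                return false;                   /* empty */
            *c = buf[tail];
            tail = (tail + 1) & MASK;
            return true;
        }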
  • Think of it this way. You get 20 million instructions through a core per second. There are four receiver and four transmitter threads, so you're allowed 2.5 million instructions per thread. If you're running at 115200 baud, that gives you about 21 instructions per bit. It turns out that's doable.
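
    The same arithmetic in a few lines (a sketch; 4 clocks per PASM instruction is the standard figure):

        #include <stdio.h>

        int main(void)
        {
            const double mips    = 80e6 / 4.0;   /* 20 MIPS per cog */
            const double threads = 8.0;          /* 4 RX + 4 TX */
            const double baud    = 115200.0;

            printf("instructions per bit: %.1f\n",
                   mips / threads / baud);       /* ~21.7 */
            return 0;
        }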

    There's a thread here with some detailed analysis of the baud-rate limitations of the 4x serial implementation.

    The implementation included in that thread works better than most others. However, sampling windows do move around substantially and this limits performance. I do not believe it can be further improved. The worst situations are when the module is talking to itself as there are problems in both receive and transmit. If either end of the link is a real UART, the problems are substantially reduced.

    Be aware that the comments in the OBEX implementation(s) are poor, misleading and in some cases wrong. The versions in the thread (above) have substantially better comments and better performance.

    Some random notes to clarify confusion herein:
    - There is no magic.
    - They really are full duplex.
    - There is no need for any locking with a single producer and single consumer in any language that gives you atomicity of load/store on buffer-offset-sized things.
    - The buffer sizes are arbitrary, with no performance hit.
    - The buffers are passed to the object, not declared within it.
    - Cable length is a function of driver/receiver hardware, not software.
    - These implementations do not sample the stop bit and do not detect framing errors.
    - The entire analysis, above, regarding sampling error is interesting but not applicable. What's key is that per-character accuracy is kept to within about 50% of a bit time and that there is no cumulative drift.
    - The implementation supports higher baud rates at the cost of decreased transmit throughput.
    - I know of no hardware UART that "oversamples at 16x"; most sample 3 times and vote.

    Good luck!
  • jmgjmg Posts: 15,173
    ksltd wrote: »
    I know of no hardware UART that "oversamples at 16x"; most sample 3 times and vote.

    Usually UARTs sample at 16x the baud rate but vote on samples 7, 8 and 9 to give a form of single-impulse rejection.
    They should also check that the start bit is still low after 1/2 bit time, as another noise-filter step, and start looking for the next start edge as soon as they confirm the stop bit samples (e.g. at 9/16 of the stop bit).
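
    A sketch of that scheme in C (hypothetical; rx_pin() stands in for whatever samples the line once per 1/16 bit):

        #include <stdbool.h>

        extern bool rx_pin(void);           /* assumed pin-read primitive */

        bool sample_bit(void)
        {
            int ones = 0;
            for (int tick = 0; tick < 16; tick++) {
                bool level = rx_pin();      /* one sample per 1/16 bit */
                if (tick >= 7 && tick <= 9 && level)
                    ones++;                 /* only ticks 7, 8, 9 count */
            }
            return ones >= 2;               /* majority vote rejects one impulse */
        }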


  • Oversampling is a term specific to signal processing. That's not what UARTs do.

    In order for any UART to operate, there must be some mechanism for the receiver to synchronize to the clock domain of the transmitter. In a pure digital design, this requires a sampling rate that is at least the number of bits-per-symbol faster than the bit rate. This is necessary so that the error in sampling doesn't accumulate and mis-sample the last bit(s) of the symbol.

    16x is an easy number in digital systems because it's a power of 2.

    That said, the 2-of-3 voting and the additional qualification of start and stop bits is not required of a UART; these are not things that "should" be done. Those techniques have a long and sordid history, but they have no effect on interoperability in the absence of noise and clock drift. In an era when system clocks were derived from RC circuits, they were a statistical hedge against errors. In today's world of laser-cut crystal oscillators that are accurate to a few PPM, they're just a curious oddity.

    In the presence of competent block level error checking, they have always been completely superfluous.

    In any case, the soft UART implementations for the Propeller do none of these. Neither do they support parity, break detect nor framing error detection.

    As mentioned, above, there's a fairly comprehensive analysis of the timing of the soft implementations in the other thread. From the perspective of the 16x world, the sampling of the first data bit occurs around clock 4. Sampling of the last data bit occurs as late as clock 14. This drift is unavoidable and is related to variation in the code paths of the other 7 threads.
  • Heater.Heater. Posts: 21,230
    edited 2015-08-27 20:09
    As far as I can tell there is no synchronization between clock domains of transmitter and receiver in UART communications. That is why there is an "A" in there, for "asynchronous".

    The best you can do in a UART is sample the signal fast enough to see the bits of a byte moving by, perhaps initiated by an edge on the signal, perhaps not. When you are done with an incoming character you are done. Start bit, data bits, stop bits, that is it. There is no multi-character, "block level", synchronization going on in the UART world. Certainly "oversampling" is employed by many such UARTs in an effort to minimize errors.

    What you are describing is "synchronous" communication, where a clock has to synchronize with the incoming data bits and maintain that synchronization over many hundreds or thousands of bits. This is normally done with a phase-locked loop syncing up on a packet sync sequence. In the synchronous world we see block-level checksums and error-correcting codes.

    At the end of the day we don't care. The rude and crude software UARTs in the Prop work fine, and as long as they work, that's enough. If you need communications over long distances and noisy lines, perhaps it's time to do something else.

    Personally I don't worry about parity or framing error detection much. My error checking is done after I've received a bunch of bytes that look like a message I'm interested in. If the checksum on that message does not work, the message is rejected. The only thing a parity or framing error on bytes does is perhaps make it a bit quicker to detect the error and send a NAK, or perhaps decide not to respond at all.
  • ksltd wrote: »
    In any case, the soft UART implementations for the Propeller do none of these. Neither do they support parity, break detect nor framing error detection.

    Careful with your claims there. You're talking about software, which is ever changing. I can say with 100% certainty that there is UART software which supports parity written for the Propeller. I wrote my own version and, as I was writing it, I remember running across other members here who had written their own versions.
    Careful with your claims there. You're talking about software, which is ever changing. I can say with 100% certainty that there is UART software which supports parity written for the Propeller. I wrote my own version and, as I was writing it, I remember running across other members here who had written their own versions.

    Yes, I have a parity version of the four port object and I believe Tracy's version checks framing errors.
  • Heated:
    How about if you go study the wiki articles for both synchronous and asynchronous serial communication.

    Therein you'll learn that start bit detection is exactly the receiver synchronizing to the clock domain of the transmitter. That's the entire function of the start bit - to allow the receiver to synchronize to the transmitter; it has no other purpose.

    You'll also come to learn that synchronous serial communications has no start and stop bits.

    Swim and Duane:
    Mea Culpa. In my sentence, "the" should have been "these".
  • @ksltd

    Synchronous article, second paragraph, second sentence:
    No start or stop bits are required.
  • Heater.Heater. Posts: 21,230
    edited 2015-08-27 21:42
    ksltd,

    How about if I have been involved in developing systems that use async and sync communication for decades? Been involved in the design of hardware and software that deals with both?

    You are sort of right to say that a start bit syncs some kind of timing in the receiver. Of course it does, else we would never be able to receive anything over a single serial line. That is far removed from the idea of "synchronizing clock domains".

    In a synchronous receiver a PLL will sync a clock to the incoming data. It will maintain that sync over hundreds or thousands of bits in a packet of data.

    The whole async thing falls down in the face of a very noisy line. When you cannot tell if you even have a start bit or just more noise, you need a lengthy sync sequence to get the clock synchronized and recognize the start of a packet. Then you are going to need a bunch of error detection/correction to be sure you actually got a valid packet, not just more noise.

    We can move on to discussing plesiochronous digital hierarchies if you like.
  • Phil Pilgrim (PhiPi)Phil Pilgrim (PhiPi) Posts: 23,514
    edited 2015-08-27 21:45
    The significant difference between synchronous and asynchronous comms is that, in the former, the clock is transmitted with the data, either as a separate signal or embedded in the data stream itself (e.g. self-clocking Manchester encoding). You always have to have some means of synchronizing to the symbol edges, but that has nothing to do with the clocking that distinguishes sync from async. The symbol edges can be defined by start and stop bits in either domain, or by a "magic" sequence, such as the FLAG character used in SDLC mode, which contains six consecutive 1 bits. This sequence cannot occur in the data stream, due to automatic zero-insertion.
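
    A sketch of that zero-insertion in C (send_raw_bit() is a made-up line-level primitive; real SDLC hardware does this itself):

        extern void send_raw_bit(int bit);

        void send_stuffed_bit(int bit)
        {
            static int run_of_ones;

            send_raw_bit(bit);
            if (bit) {
                if (++run_of_ones == 5) {   /* five 1s in a row... */
                    send_raw_bit(0);        /* ...stuff a 0, so data can never */
                    run_of_ones = 0;        /*    mimic the FLAG's six 1s */
                }
            } else {
                run_of_ones = 0;
            }
        }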

    One common synchronous mode that uses start and stop signaling is I2C.

    Asynchronous is called that because a received character can come at any time, irrespective of any assumed clock edges. In older TTY equipment, mechanical transmit "clocking" began when a key was struck, which could be at a completely random time, not yoked to any particular clock.

    -Phil
  • Heater.Heater. Posts: 21,230
    Phil,
    Asynchronous is called that because a received character can come at any time, irrespective of any assumed clock edges.
    That is an interesting way to look at it. Because for sure a synchronous packet can arrive at any time as well.

    The question then is about those assumed clock edges. Where do they come from? In the async case the first edge of the start bit provides the trigger and all bit timing follows that. For as long as it can.

    In the sync case all edges contribute to the syncing of the PLL which is then used to clock in the data. As you say, the clock is in there with the data. Like Manchester encoding. In that case a single edge is not enough to indicate the start of data. It needs some kind of sync sequence.
  • One common synchronous mode that uses start and stop signaling is I2C.

    I thought the start/stop bits were there so that I2C could support multiple masters, not for any synchronous reasons.
  • Swim -
    We're in heated (pun unavoidable) agreement; we both have said that synchronous communication requires no start or stop bits.

    Heated -

    You appear to be confused; allow me to help.

    Synchronous communications does not rely on PLLs. Read the article on Wiki. Then take a close look at protocols like HDLC, SDLC, STR, BiSync and SNA that were designed in the 60s and 70s and built upon synchronous communications links. These links rely upon fill symbols (SYN) that are transmitted when the link is otherwise idle. Because of their known bit pattern, the receiver can synchronize to the transmitter relatively quickly. Because the fill may be identical to the symbols that one wishes to transmit, a data link escape (DLE) symbol is also specified by the protocol; receipt of a DLE indicates that the following symbol(s) require special translation. So DLE followed by SYN translates into the SYN symbol and DLE followed by DLE translates into the DLE symbol, etc.
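
    In C, that DLE transparency rule sketches out like this (hypothetical emit() primitive; DLE is 0x10 in ASCII, and the protocols define the real control symbols):

        #include <stddef.h>
        #include <stdint.h>

        #define DLE 0x10

        extern void emit(uint8_t c);        /* assumed byte-output primitive */

        void send_transparent(const uint8_t *p, size_t n)
        {
            for (size_t i = 0; i < n; i++) {
                if (p[i] == DLE)
                    emit(DLE);              /* DLE DLE decodes to one data DLE */
                emit(p[i]);
            }
        }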

    That to which you've referred are clock-recovery schemes, employing bit stuffing (CAN) or transcoding techniques like 8b/10b. These techniques do not require any synchronization between the transmitter and receiver; instead, the transmit clock is reconstructed by the receiver from the receive data. They could not be more different from synchronous or asynchronous serial communications: rather than struggle to establish and maintain synchronization, they eliminate the need altogether. Clock-recovery schemes are the mainstay of most modern high-speed communication links including 1- and 10-gigabit Ethernet, AMD's HyperTransport, USB 3, PCI-Express, HDMI and many others.
  • In FFDS1 I use a counter (NEG mode?) to monitor the input pin. I oversample the bit period by only a factor of 2, but when I get a start bit I can tell if I'm in the front half or back half of the bit by comparing the accumulated counter value to 1/2 the period. Based on that info I may delay my input sampling by 1/2 a bit period. This means I can't get uber-precise input sampling, but I can guarantee I am at least 1/4 of the bit-period away from the edge. I get robust data at bit rates < 0.5 Mbps full duplex in one cog, 80 MHz clock.
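
    In rough C terms the trick looks like this (a sketch only; low_time() and wait_clocks() stand in for the counter in NEG mode and a timed wait):

        extern unsigned low_time(void);     /* clocks the pin has been low */
        extern void wait_clocks(unsigned n);

        void align_to_start_bit(unsigned bit_clocks)
        {
            /* 2x oversampling means the start bit is noticed somewhere in
             * its front or back half; the accumulated count tells which. */
            if (low_time() < bit_clocks / 2)
                wait_clocks(bit_clocks / 2);    /* shift sampling toward mid-bit */
            /* subsequent samples every bit_clocks now stay at least
             * 1/4 of a bit period away from the edges */
        }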

    Jonathan
  • Heater.Heater. Posts: 21,230
    ksltd,
    You appear to be confused; allow me to help.
    Please try not to be so condescending and rude. It does not help your argument.
    Synchronous communications does not rely on PLLs.
    Sort of true...
    Then take a close look at protocols like HDLC,...
    Yeah, yeah, did all that back in the 1980's.
    These links rely upon fill symbols (SYN) that are transmitted when the link is otherwise idle. Because of their known bit pattern, the receiver can synchronize to the transmitter relatively quickly.
    Hmm...does that sound like clock synchronization to you? The synchronization is done even before there is data. Sort of like a PLL maybe?

    As far as I recall SDLC and such did indeed use bit stuffing (wikipedia will put you straight on that). You have to do that if you want to recover the clock from the data.

    All those DLE-character and other escape schemes belong to the world of byte-by-byte async communications.

  • heater wrote:
    All those DLE-character and other escape schemes belong to the world of byte-by-byte async communications.
    Quite right. I use DLE escaping all the time with RS232 when I'm transmitting data in binary (non-ASCII) form.

    -Phil
  • Heater.Heater. Posts: 21,230
    edited 2015-08-27 23:21
    I'm starting to think this whole sync vs async boils down to something very simple.

    They are both "asynchronous" with respect to your receiver's idea of time. Data arrives when it arrives, ready or not, no matter what the receiver thinks about time.

    On the one hand you have incoming characters framed with a start bit and stop bit(s). You know when to start listening because you have a start bit. You can then clock in a bunch of bits at regular intervals from that time. It sometimes puzzles me why that does not fail. Surely if you start listening half way through a character in the middle of an endless stream of characters you could be confused into never latching on to an actual start bit? Luckily most protocols do not do that.

    On the other hand you have incoming random bits. Or more specifically edges. You have to sync some clock to those edges in order to extract bits. You have to look for special sequences of bits like SDLC flags, "01111110", in order to see that something special is happening.

    Escape sequences are of the byte-oriented async world. They are still with us everywhere today, from the escape sequences sent to my terminal to the HTML "entities" like "&lt;" used to escape "<" on this forum. Thank you Bob Bemer, inventor of the escape sequence in 1956.


    Edit: Hysterically funny, if it were not so sad. I'm talking about escape sequences and the frikken forum escapes them so that "& l t;" (without the spaces) looks the same as "<"

    This whole escape-sequence thing is turning all our information into gibberish. Aided and abetted by the crazy Unicode thing.
  • Phil Pilgrim (PhiPi)Phil Pilgrim (PhiPi) Posts: 23,514
    edited 2015-08-27 23:24
    In synchronous comms, the clock is not always encoded in the data. Sometimes it's sent over a separate channel. I2C and SPI are examples of this. The basic differentiation is that synchronous comms rely on a single bit-synchronous clock, whether it be sent from the transmitter or the receiver. In async comms, no clock is shared. Each device provides its own bit-synchronous clock, which does not necessarily match the frequency or phase of the other device's clock exactly.

    -Phil
  • Heater.Heater. Posts: 21,230
    Phil,

    Ah yes, if the transmitter supplies the clock on another channel like I2C and SPI that is what I call synchronous.

    Of course in all synchronous schemes the transmitter does supply the clock, perhaps on the same line as the data.

    I now think I can summarise this as:

    Async = Very sloppy timing requirements, happy if you can keep up with 8 bits at a time.

    Sync = Much tighter timing requirements, try to stay in sync for hundreds or thousands of bits at a time. We will guarantee to give you edges often enough to make that possible.