I did re-read the thread you mentioned, but I've never really understood the way jitter is described there. An asynchronous serial receiver ought to be substantially more tolerant of frequency mismatch and/or bit-width variation than 0.7%, especially since the basic design of all these drivers ensures that per-bit timing errors aren't cumulative and that bits are never transmitted early, only (and often) late.
If we assume the 16x sampling approach used by most UARTs, with a 3-clock sampling window at clocks 7-9, then a given bit's timing can be late by as much as 6/16ths of a bit time, or 37.5%. Even if we toss another 2/16 into the mix for start-bit detect latency and drift, we'd still have tolerance for 4/16ths of a bit, or 25%.
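As a sanity check on that arithmetic, here's a minimal sketch of the margin calculation (the 16x clock, the 7-9 sample window, and the 2/16 allowance are just the assumptions above, not a claim about any particular UART):

```python
# Late-bit margin for a conventional 16x-oversampling receiver.
# Assumptions: majority-vote samples at oversample clocks 7-9 of 0-15,
# with 2/16 of a bit reserved for start-bit detect latency and drift.
OVERSAMPLE = 16
LAST_SAMPLE_CLOCK = 9      # sampling window is clocks 7, 8, 9
DETECT_AND_DRIFT = 2       # sixteenths reserved for detect latency + drift

raw_margin = (OVERSAMPLE - 1) - LAST_SAMPLE_CLOCK   # sixteenths a bit can be late
                                                    # before the last sample slips out
usable_margin = raw_margin - DETECT_AND_DRIFT       # after the detect/drift allowance

print(f"raw tolerance:    {raw_margin}/16 = {raw_margin / OVERSAMPLE:.1%}")
print(f"usable tolerance: {usable_margin}/16 = {usable_margin / OVERSAMPLE:.1%}")
# -> 6/16 = 37.5% and 4/16 = 25.0%
```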
Do you know where the 0.7% number came from or why anyone would strive to do better than, say, 5%?
The first post in that thread laid out very clearly that the OP needed to interface the Prop to a scientific instrument that had the requirement for the 0.7% accuracy. He had no control over the design of that instrument. True, one has to wonder why that instrument was so intolerant, but there you have it. One of the many hangups that one has to resolve in the real world to get things done.
Attached (Serial_4X Timing Analysis.PDF) is a timing diagram and some analysis of the transmitter in the 4X driver. The bad news is that it shows that the best baud rate that can be sustained in the face of worst-case latency is 107K. That's about half of what I'd claimed previously, the result of realizing that 100% of the worst-case latency has to fit within 50% of a bit time, not 100% as I'd thought.
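For anyone reading along without the PDF, the 107K figure falls out of a one-liner. A sketch, assuming a worst-case latency back-calculated from the 107K number (the 374 ticks below is not taken from the analysis itself):

```python
# Max sustainable baud when 100% of the worst-case latency must fit
# inside 50% of a bit time.  The 374-tick latency is back-calculated
# from the 107K figure above, purely for illustration.
CLKFREQ = 80_000_000           # Prop clock, Hz
WORST_CASE_LATENCY = 374       # ticks (assumed, ~4.7us)

min_bit_ticks = 2 * WORST_CASE_LATENCY
max_baud = CLKFREQ / min_bit_ticks
print(f"max sustainable baud ~= {max_baud:,.0f}")   # ~107,000
```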
The good news is that the worst case latency can be reduced enough to push this over the 115200 baud threshold. The bad news is that decreasing the latency will increase the inter-symbol overhead of the transmitter and that will decrease throughput - and the decrease is substantial at high baud rates.
Right now, the inter-symbol overhead is 3x the worst-case latency, or 3x 333 ticks, which is some 12.5µs at 80MHz. That will increase to 5x ~280, or about 17.8µs. At 115200 baud, the current implementation essentially sends 2.5 stop bits and the revised implementation would be sending 3 stop bits; both of those are worst-case numbers.
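To show where those stop-bit counts come from, a quick sketch using the tick counts above (I've taken the "~280" as 285 so the numbers reproduce the 17.8µs figure):

```python
# Effective worst-case stop bits implied by the TX inter-symbol overhead.
CLKFREQ = 80_000_000
BAUD = 115_200
bit_ticks = CLKFREQ / BAUD                       # ~694 ticks per bit

for label, overhead_ticks in (("current (3 x 333)", 3 * 333),
                              ("revised (5 x ~280)", 5 * 285)):  # "~280" taken as 285
    overhead_us = overhead_ticks / CLKFREQ * 1e6
    stop_bits = 1 + overhead_ticks / bit_ticks   # 1 mandatory stop bit + overhead
    print(f"{label}: {overhead_us:4.1f}us -> ~{stop_bits:.1f} stop bits")
# -> ~2.4 and ~3.1 stop bits, i.e. the "2.5" and "3" quoted above
```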
All this begs the question of whether the thing should be optimized for higher throughput at, say, 57600 baud or lower throughput at 115200 baud. My own preference is to make the higher rates work as the reduced throughput at 115200 will still be greater than the increased throughput at 57600. If anyone cares, now's a good time to speak up as I won't be polishing these turds much longer.
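To put numbers behind that preference, a rough worst-case throughput comparison (assuming 8 data bits plus the overhead figures above):

```python
# Worst-case throughput: revised driver at 115200 vs current driver at 57600.
CLKFREQ = 80_000_000

def bytes_per_sec(baud, overhead_ticks):
    bit_ticks = CLKFREQ / baud
    # start + 8 data + 1 stop, plus the inter-symbol overhead in bit times
    frame_bits = 1 + 8 + 1 + overhead_ticks / bit_ticks
    return baud / frame_bits

print(f"115200, revised (5 x 285): {bytes_per_sec(115_200, 5 * 285):,.0f} bytes/s")
print(f" 57600, current (3 x 333): {bytes_per_sec(57_600, 3 * 333):,.0f} bytes/s")
# -> roughly 9,600 vs 5,400 bytes/s, so the high-rate option still wins
```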
Tracy, our posts crossed on the wire. Ok, so that's some perverse requirement; got it.
But getting back to the question on the table - do you agree that we don't care about the variation between best case and worst case latency (jitter) and that what we do care about is minimizing worst case latency?
I would agree. Extra TX stop bits are generally simply tolerated (USB bridges add them, usually above 500k~1MBd, depending on brand), but it is the receive side that needs to be able to RX reliably (with all phases of the other channels' activity) with one stop bit.
If it is using AutoBAUD, and doing so on a single bit, then certainly it will have a low tolerance to Prop TX edge jitter.
By comparison, HW UARTs universally have their edges locked to the baud clock to within nanoseconds, and even new transmissions start on this clock boundary. In crystal-based UARTs there will be ppm levels of jitter, and in the RC-clocked, sync'd, XTAL-less USB bridges the jitter will be very low on either side of a clock re-adjustment, which is usually USB-frame related.
Right at an adjustment there may be a 0.1%~0.2% correction once locked.
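If that instrument really is autobauding on a single bit, a back-of-the-envelope number makes the point concrete. A sketch, assuming 115200 baud purely as an example rate:

```python
# Edge-timing error budget for a single-bit autobaud measurement that
# must stay within a 0.7% baud-rate tolerance.  115200 is only an
# example rate, not one stated in the thread.
TOLERANCE = 0.007
BAUD = 115_200
bit_time_ns = 1e9 / BAUD                         # ~8681 ns

max_width_error_ns = TOLERANCE * bit_time_ns
ticks_at_80mhz = max_width_error_ns / 12.5       # one 80MHz tick = 12.5 ns
print(f"allowed bit-width error: ~{max_width_error_ns:.0f} ns "
      f"(~{ticks_at_80mhz:.0f} ticks at 80MHz)")
# -> ~61 ns, i.e. only about 5 clock ticks of TX jitter budget
```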
I agree too; the worst case lives and dies on the long-duration end of the latency scale. It matters most when you need to push the limits to a high baud rate, and it rarely matters at all at the low-baud end. It is important to divide the performance into areas of sure thing, getting fuzzy, really pushing it, and forget about it. We all prefer a sure thing.
You've allowed 1.5 bits for the longest path; the jitter will reduce that leeway by an amount that can be calculated. Is that what you've done? I haven't read your analysis in detail. I'm feeling fuzzy, not turdish.
Peter expressed an interest in the code for incorporation in Tachyon. I'd be interested to hear what he has to say about the tradeoffs.
FDS4port is less capable than yours (with low stop-bit counts at high baud rates) because it tests the stop bit and rejects framing errors. It has only 0.5 bit time, versus your 1.5, to execute the longest co-routine, minus the jitterbug delays contributed by the other enabled ports.
With regard to the stop bit, your PASM code waits for the ina level expected from the stop bit (high) and then returns to the top of the loop to await the next start bit. If, by some mistake or event, the receive line becomes shorted to ground, the program will lock up on both the receive and transmit sides. Unlikely, but I bring that up to show how I think about these problems. I deal with pretty rough-and-tumble field instrumentation where cables may thread through hostile environments.
Reception at a high sustained rate with exactly one stop bit is sometimes necessary. In that case I might opt for a dedicated half-duplex receiver. For example, I'm recalling a field installation with Memsic Iris wireless nodes, which streamed back asynchronous packets of ~100 bytes at 115k6 with exactly one stop bit. FDS4port could receive those fine with clkfreq=80MHz and two other ports enabled. However, to achieve a system power consumption of less than 1mA while still receiving those packets, it was necessary to use clkfreq=5MHz. That required a half-duplex object.
It won't lock in the case of break reception ... it will wait for the break to end. Other threads will continue to process just fine. Interesting to note that if the last data bit is a 1, then it will return well before the stop bit. It starts that check shortly after the sampling point of the last data bit.
I discarded the framing error check because it constrains maximum baud rate. The worst case path is from last bit received, through the buffering path and back to the start bit detect. It's critical to use the stop bit time to do the buffering or max baud rate suffers.
FYI, the half duplex receiver I'm cooking looks like it will do 1.7M - and it's super low power as it mostly sits in wait instructions ... a double bonus. At 57600, I'd think it will be waitcnt or waitpne or waitpeq 99% of the time even at 5MHz.
It is acceptable to specify 2 stop bits for some baud rates (and, as I explained above, even preferable in packed-duplex cases with longish packets).
All the UARTs I use here allow 2 stop bits, and also a parity-bit option should you need 3 stop bits.
Ideally, a SW module should give a max baud for each setting option.
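For what it's worth, a sketch of the sort of table such a module could publish. The latency and overhead tick counts are just the illustrative figures used in the sketches above, not measured values for any particular driver:

```python
# Illustrative "max baud per setting" table.  The tick counts are the
# example figures from the sketches above, not measured driver values.
CLKFREQ = 80_000_000

SETTINGS = {
    # label: (worst-case latency ticks, TX inter-symbol overhead ticks)
    "current, 1 stop bit": (374, 3 * 333),
    "revised, 1 stop bit": (285, 5 * 285),
}

print(f"{'setting':<22}{'max baud':>12}{'TX overhead':>14}")
for label, (latency, overhead) in SETTINGS.items():
    max_baud = CLKFREQ / (2 * latency)        # latency must fit in half a bit
    overhead_us = overhead / CLKFREQ * 1e6
    print(f"{label:<22}{max_baud:>12,.0f}{overhead_us:>12.1f}us")
```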