Anybody aware of high accuracy (0.7 % or less) serial full duplex driver?

Stephen Moraco · 2011-02-20 22:17

Hi all,

I've pulled and measured nearly all the serial objects I could find in OBEX, and all of them have extreme variance in bit widths within each character transmitted. I need a higher-accuracy full duplex driver and I wonder if anyone has attempted this. I'm currently using two cogs (tx, rx) to achieve this and I really want to drop this to one. I'm getting 0.4% variance in bit widths at the moment. This is somewhere near the practical limit, so I've no need to get better accuracy, but I do need to get into one cog. Eventually I may have to tackle this myself, but for now I'm deep in other work, so while I have a solution in place and am busy elsewhere I thought I'd submit this to see if anybody else has already tackled this problem.

Thanks for racking your brains on this subject ;-)

Regards,
Stephen, KZ0Q
--

Phil Pilgrim (PhiPi) · 2011-02-20 22:32

What kind of baud rates are we talking about here? And, just out of curiosity, why do you need such high accuracy? Most serial receivers are very forgiving of variances in bit width. With only eight bits, plus start and stop bits, you don't need to stay in synchronization very long before the next start bit resynchronizes everything.

Oh, wait -- just noticed your sig. Is this for radio transmission?

-Phil

Stephen Moraco · 2011-02-20 23:18

Phil,

We're talking to a very unforgiving commercial device (well known one) and it's very finicky... baud rates are low... 115_200?, 57_600, and 19_200.
And by finicky I mean it refuses to respond unless we're in spec.

As long as we're speaking constraints...
- bits from start-bit through duration of stop-bit must be precise widths...
- between-character times do not matter so much. I do have timeliness constraints within messages of course (from first byte of response to last).

Given the baud rates this should be a possible task... but my prototype time was exhausted...
so before I start into this at a later time I just wanted to make sure I'm not following in someone's larger, stronger, more capable,
footsteps??? ;-)

Oh, and good call <grin> there are radios in the project but in this case I'm not talking to one.
In fact the serial back-end of the radios are forgiving enough to work with the full duplex serial object.
(which means their onboard uP buffers incoming data before committing to RF...)

-Stephen

Phil Pilgrim (PhiPi) · 2011-02-21 01:25

I'm not sure where the variance you're seeing is coming from. At 115200 baud, a bit is 694.44 clocks at 80 MHz. Truncating this yields a 0.06% error. Adding a 100ppm crystal inaccuracy is still just another 0.01%. Have you tried FullDuplexSerial from the Propeller library? If so, what kind of error did you observe? What are you using to measure the bit periods?
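
For anyone who wants to check the arithmetic, here is a minimal Python sketch of the truncation error at the three baud rates mentioned in this thread. The 80 MHz clock comes from the post; the function name is just illustrative, and integer truncation is assumed to be how a driver would derive its bit period.

```python
# Clocks per bit and the error introduced by truncating to a whole clock count.
# CLKFREQ = 80 MHz is taken from the post; the function name is illustrative.
CLKFREQ = 80_000_000

def truncation_error(baud, clkfreq=CLKFREQ):
    ideal = clkfreq / baud          # exact (fractional) clocks per bit
    ticks = clkfreq // baud         # whole clocks, as an integer divide would give
    error_pct = (ideal - ticks) / ideal * 100
    return ticks, ideal, error_pct

for baud in (115_200, 57_600, 19_200):
    ticks, ideal, err = truncation_error(baud)
    print(f"{baud:>7} baud: {ideal:9.2f} ideal, {ticks} used, bits {err:.3f}% short")
```

A 100 ppm crystal tolerance is 0.01% by definition, so it simply adds to whatever this prints.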

-Phil

Stephen Moraco · 2011-02-21 01:38

Phil,

I'm currently a member of the large scale Logic Analyzer team at Agilent (for two more weeks until I retire). Anyway, surprise of surprises I'm using a logic analyzer sampling at 2.5 nS (way overkill).
I store only at transitions to get deep measurement depths... lots of edges stored in 2MB depth.

Yes, I measured the code for FullDuplexSerial... The bits can be nearly 2x width when tx and rx are both running... I see at least two error sources... the context switches occur without any check for whether there is enough time left in the current bit period to switch and return, and they don't seem to have predictable return intervals, which is exacerbated by the 7-22 clock period readlong/writelong instructions occurring while switched out. The net result is that bit periods are all over the map when both TX and RX are steadily busy.

This explanation make sense?

-Stephen, KZ0Q

Stephen Moraco · 2011-02-21 01:49

Phil,

Since there are four "lobes" of context-switched code, I didn't have time to sort out the maximum run length of each lobe. It may be that if we simply check before switching contexts that we have enough time to switch, return, and then end the current bit, most width variance disappears... The problem I see is that if we don't switch we are also likely extending the bit length in the other lobe we would have switched to.

In the end, this kind of duplex driver needs statistical switching (equal fixed length lobes) to properly treat both sides. I'm not sure that we can do that... but it's an interesting advanced assembly experiment, no?

-Stephen, KZ0Q

Leon · 2011-02-21 02:57

How do the timings compare with those for a conventional MCU hardware UART? Presumably, they have no significant variability

Rayman · 2011-02-21 04:16

Have you tried using 2 cogs, one for TX and one for RX?

Sounds like that might fix the problem you're seeing...

mynet43 · 2011-02-21 05:24

I just ran across this thread. I've had a very similar problem when talking to a Reach Technology serial touch screen display.

I've measured the screen serial output with a scope and it's right on. I did the same with the Propeller serial output and found the rates seem to vary a little. I'm not sure what's causing this, as I never looked into it.

I worked around it by lowering my baud rate from 115,200 to 57,600. I'm convinced there's a problem.

Please let me know if someone's working on a solution.

Jim

Kye · 2011-02-21 07:24

Have you tried mine yet?

I use the waitxxx instructions to transmit data. http://obex.parallax.com/objects/397/

It should be more precise than FDS but people keep saying it has problems that I've never experienced. Give it a shot if you would like.

mynet43 · 2011-02-21 08:44

Hi Kye,

Thanks for the info. I'll give it a try. This may solve the problem.

It seems like we should also fix FDS

It's so widely used that it should be working right.

Jim

kuroneko · 2011-02-21 16:21

Kye wrote: »

It should be more precise than FDS but people keep saying it has problems that I've never experienced.

Sorry, this driver is broken by design. I looked at the RX part yesterday to help one of those people saying it has problems (corrupted data). There is a timing condition (start bit falling edge vs start bit scanning) which means you sample incoming bits while they change, NG.

Ariba · 2011-02-21 17:35

It should be possible to use the video generator to transmit a byte, this would make a very precise bit timing with no jitters.
And you get more processing time for the receive task in the same cog.

Andy

localroger · 2011-02-21 17:37

None of the serial objects were written with this kind of timing accuracy in mind, because in most cases it just isn't necessary. If you are in the processing block for TX when the time interval for RX arrives or vice-versa, you will see errors of hundreds of clock cycles.

Unfortunately, you don't have control over the timing of RX edges, so solving this isn't straightforward in a single cog.

If it was that important I would devote a cog each to TX and RX, which would allow you to do pitch perfect timing for each. However, I would have to add that anything that needs that much accuracy on the other end of the wire is just plain badly designed. Serial is by its nature intended to absorb timing differences of over 5% without difficulty, and most people who design software UARTs take advantage of that.
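
A rough way to see where a figure of "over 5%" can come from is sketched below in Python. It assumes an idealized receiver that resynchronizes exactly on the start-bit edge and samples each bit at its nominal center; real receivers lose some of this budget to edge-detection uncertainty, so treat the number as an upper bound rather than a spec.

```python
# Idealized async-serial timing budget: the receiver resynchronizes on the
# start-bit edge and samples each bit at its nominal center.
def max_rate_mismatch_pct(data_bits=8, parity_bits=0):
    # The last sample that matters is the middle of the stop bit,
    # measured in bit times after the start edge.
    last_sample = 1 + data_bits + parity_bits + 0.5
    # That sample may drift by up to half a bit before it leaves the stop bit.
    return 0.5 / last_sample * 100

print(f"8N1: about {max_rate_mismatch_pct():.2f}% total mismatch")
```

The result is the combined mismatch between the two ends, so each side only gets part of it.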

You could probably even use the standard FDserial object this way by running it in two cogs and simply not using the TX in one and the RX in the other; without the other function getting in the way, timing for the function you're using should be pretty close.

mynet43 · 2011-02-21 20:08

@ localroger,

I'll show my ignorance here

I thought the serial I/O was designed to sync with the middle of the bit half cycle, based on the baud-rate. The timing should be 'good-enough' then to never sample outside of this half cycle for any reasonable message length.

My observation with a scope is that the timing varies quite a bit, enough to sometimes lose sync at 115K baud.

Somehow I think we can do better. I think Kye tried to do this, by using the waitxxx routines, except that, from the above post, he apparently has a minor bug in his code.

Jim

Tracy Allen · 2011-02-21 22:24

This is interesting, and disturbing too. I use the 4 channel UART driver by Tim Moore (pcfullDuplexSerial4FC). It offers 4 send/receive UARTs in one Cog with flow control. On one hand, Tim claims that it runs faster than FDS, but on the other hand, it does have more "lobes" (interesting term) to handle, and I am thinking, uh-oh, maybe it is really pushing the envelope in the bit-jitter area.

I have a LeCroy 9374L 1GHz digital 'scope that does statistical analysis of repetitive signals. I fed it the uart outputs to try to duplicate Stephen's observations.

I started off with a loop sending either the byte $FF or the byte $00 repeatedly. That is,
--$FF=start bit space only, expected width 8.68 µs at 115200 baud
--$00, start + 8 space bits, expected width 78.125 µs at 115200 baud.

For fullDuplexSerial v1.2 transmitting $ff, my 'scope finds, after at least 2000 samples:
--$ff: 8.74 µs mean, variation 8.40 to 9.60, sigma 0.298, 13.8% max bit width variation.
--$00: 77.983 µs mean, variation 77.883 to 78.202, sigma 0.114, 0.4% max byte width variation.

The mean length was off from the correct value by 0.6% for the single bit, but with lots of jitter around the mean. The whole byte is much better, the mean within 0.4% of the expected value, and 0.4% jitter bracketing the mean (bracketing=max-min, not standard deviation).
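
The bracketing percentages can be reproduced with a few lines of Python. This sketch assumes the spread is divided by the ideal width (8.68 µs per bit at 115200 baud) rather than by the measured mean, which appears to match the quoted FullDuplexSerial figures; the function name is made up for illustration.

```python
BIT_US = 1e6 / 115_200   # ideal bit time, about 8.68 µs

def bracket_pct(lo_us, hi_us, bits=1):
    """Max-min spread as a percentage of the ideal width of `bits` bits."""
    return (hi_us - lo_us) / (bits * BIT_US) * 100

# FullDuplexSerial v1.2 figures from the measurements above
print(f"$FF start bit:      {bracket_pct(8.40, 9.60):.1f}%")
print(f"$00 start + 8 bits: {bracket_pct(77.883, 78.202, bits=9):.2f}%")
```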

Stephen, is that in the same ballpark as what you were seeing? I think my report is worse. I also looked at the byte $55 (alternating high and low), and saw jitter on all of the bits, but as the numbers suggest, the total byte length was more consistent than the individual bits. I haven't tried it with the uart receiving at the same time. Consistency of reception is another issue.

Kye, I ran yours with the same test.
bit=$ff: 8.51 µs mean, variation 8.43 to 8.59, sigma 0.056, 1.8% max bit width variation.
byte=$00: 77.710 µs mean, variation 77.596 to 77.806, sigma 0.06, 0.27% max byte width variation.
Better than FDS in that test!

Then pcFullDuplexSerial4FC. Harder because of all the options and potential interactions. I ended up surprised by counter-intuitive results. I was expecting it to be worse when more uarts were added. The pasm code is all in one cog after all. Nonetheless, it turned out that the timing is best when all 4 uarts are enabled, independent of whether they are transmitting or not.

1 uart enabled, 115200 baud, no flow control.
bit=$ff: 8.884 µs mean, variation 6.186 to 11.201, sigma 1.2631, 57.8% max bit width variation.
byte=$00: 77.880 µs mean, variation 77.158 to 80.005, sigma 0.958, 3.6% max byte width variation.
Ouch! It makes me think I am doing something wrong, so I rechecked it a couple of times.

I tested also with combination of 2 and 3 uarts, but I'll skip to 4 uarts:

4 uarts enabled, 115200 baud, no flow control
bit: 8.202 µs mean, variation 8.186 to 8.212, sigma 0.002, 0.29% max bit width variation.
byte: 78.603 µs mean, variation 78.58 to 78.609, sigma 0.010, 0.03% max byte width variation.

All 4 uarts operating at 115200 showed very similar statistical results. Lowering baud rate on one or two channels from 115200 to 9600 did not affect the timing on the 115200 ports. Nor did leaving a port quiescent, so long as it was enabled. The start bit was 5.5% too short, but that was made up for in other bits (viewing $55) to arrive at the good total byte duration without either as much bitwise or overall jitter. Receiving bytes on one port and funnelling through to another port did not affect the timing, although I did not give it much of a workout for interference while receiving. Nor did I look at possible effects of enabling flow control. I don't know what to expect there. Anyway, my current thinking when using pcFullDuplexSerial4FC is always to enable all 4 ports.

Serial ports I am dealing with are a lot more forgiving than Stephen's. But I have heard reports from someone I work with of trouble at 115200 and now I'll have more of a basis to look into it.

AntoineDoinel · 2011-02-21 22:45

Stephen, since you're already using two cogs, have you tried the code on the following page?

http://insonix.ch/propeller/prop_obj.html#Serial

there are separate transmitter and receiver on the beginning of the page.

Being separate and not multiplexed with the receiver, the sending loop in the above code seems strictly deterministic to me.
Only had a quick look at it, but maybe Tracy can do measurements on that too?

Regards
Alessandro

P.S. just noticed the code that I'm suggesting was made by Ariba... hmmm if he didn't suggest you the same he sure knows better. Well if not for the serial tx/rx, that page is still full of goodies :-D

Ariba · 2011-02-21 23:27

I did not suggest my drivers because I think Stephen does something similar already:

I'm currently using two cogs (tx, rx) to achieve this and I really want to drop this to one

Andy

Edit: Thank you AntoineDoinel for the kind words about my site

Stephen Moraco · 2011-02-21 23:46

Leon wrote: »

How do the timings compare with those for a conventional MCU hardware UART? Presumably, they have no significant variability

Leon, this is a good question. Many others can chime in I'm sure... ;-)
My experience (and overly general answer) is this:

Each MCU one uses needs a crystal. Then, generally the MCU generates an internal clock from this external clock. Some MCUs then offer variations of the internal or sometimes even the external to its Serial module (for lack of a better name). So we have some multiple of / or division of the main clock arriving at the Serial Module. Yes, as you were saying the module generates fairly precise bit timings. However the error of this timing has to do with the clock frequency arriving at the Serial module and the desired baud rate that one wants to use. When you think of the typical baud rates being from 75, 300, 1200, 2400, 9600, 19200.. and so on (up to 3Mbit we can do with the FTDI chip) you can imagine that some of these rates, given a single fixed clock, have different error percentages. So when we pick a part to use and the crystal to use, we study the rates we need and the tolerances and the configurations of the clocking to the serial module we need to get a good match for our application with the goal being the lowest error rate at the baud rate we need.
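
To make the divisor trade-off concrete, here is a small Python sketch for a hypothetical hardware UART whose baud rate is f_uart / (16 x divisor) with an integer divisor. The 16x scheme and the two example clocks are assumptions for illustration, not the behavior of any particular MCU.

```python
# Hypothetical hardware UART: baud = f_uart / (16 * divisor), integer divisor.
def uart_error_pct(f_uart_hz, baud, oversample=16):
    divisor = max(1, round(f_uart_hz / (oversample * baud)))
    actual = f_uart_hz / (oversample * divisor)
    return divisor, (actual - baud) / baud * 100

for f in (16_000_000, 14_745_600):
    for baud in (9_600, 19_200, 57_600, 115_200):
        div, err = uart_error_pct(f, baud)
        print(f"{f / 1e6:8.4f} MHz, {baud:>6} baud: divisor {div:>3}, error {err:+.2f}%")
```

With a round-number clock the error grows as the divisor gets small, while a clock chosen as a multiple of the baud rates divides out exactly.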

Now, here's where the propeller is sweet!

Theoretically, we don't have this problem at all. Since we have a software UART we can create wonderfully accurate bit edges beyond what nearly any well behaved application would need. This is why you saw Phil wondering why I would be seeing errors since the propeller can do so well! I'm with Phil, I was surprised, too, especially when I had to peer at the traffic to understand why my pretty equipment wouldn't listen to my propeller speaking serial!

So, good question... we were surprised... but now that many of us are thinking of the problem and potential solutions I'll bet one of us will come up with a better answer than we have today!

What makes my application less mainstream on this forum is that I'm not using the propeller for experimentation... I'm building production code for it and am using the propeller in production hardware. I'm starting with what all of us start with... objects from OBEX and objects I've created and hints and suggestions from our forums. And then I tune each object to get the performance I need for my application.

I'm hoping that some of this tuning will apply to experimentation and then I'll contribute the code back to OBEX and/or share techniques with all of us here at the forums. This is also why I bring up these issues on the forums... because we all can learn from them, right?

Like I said, good question. Thanks for asking!

-Stephen, KZ0Q

Stephen Moraco · 2011-02-22 00:04

Tracy Allen wrote: »

This is interesting, and disturbing too. I use the ...

Stephen, is that in the same ballpark as what you were seeing? I think my report is worse.

Tracy, it's very cool that you ran these experiments and shared the results. Thank you!

I'm thinking the width issues are actually worse when we have Rx traffic at the same time as Tx... because now we are causing MAIN RAM accesses for TX and for RX and during anywhere in any bit relative to the other direction... (I'm guessing it's a major source of the jitter you are seeing).

I'm encouraged by your results with the pcFullDuplexSerial4FC driver after enabling all four uarts. However, I know specifically that my intolerant commercial device measures the width of the start bit which must be within tolerance before it will communicate with me. So now I'm worried by your report of the short-sheeted start bits. ;-( The lack of a free lunch here is troublesome...

-Stephen, KZ0Q

Ale · 2011-02-22 03:41

Is it not possible to run the propeller at a multiple of the bit rate you need and use one COG per serial line? After all, UARTs do exactly that... and a 5MHz crystal is a misfit.

kuroneko · 2011-02-22 04:30

@Stephen: If you want something done simply claim that it's officially impossible. Usually you'll have a solution within a week.

@All: How is this error value calculated? Say a bit lasts for 100 cycles, if I'm one cycle off is that 1%?

localroger · 2011-02-22 06:01

Stephen Moraco wrote: »

However, I know specifically that my intolerant commercial device measures the width of the start bit which must be within tolerance before it will communicate with me.

Before proceeding, I cannot emphasize this enough: This is not how serial is supposed to work. Whoever designed this device is an idiot. And I say that as someone who has written software UARTs for at least a dozen different microprocessors, many of which have functioned for years in hostile industrial environments.

That said, I can think of a way to approach a driver that would work for you. On the output side you need to prepare your bits ahead of time and use WAITCNT directly followed by OUTA to ship them. You should be able to get 12.5 nanosecond accuracy that way.
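
The "prepare the bits, then wait for an exact counter value" idea can be modelled in Python as below. This is only a sketch of the scheduling, not PASM: the counter targets stand in for WAITCNT values, 694 ticks per bit assumes 80 MHz at 115200 baud, and both function names are invented for the example.

```python
# Sketch of "prepare the bits, then wait for an exact counter value per edge".
# The counter values stand in for WAITCNT targets; nothing here is PASM.
def frame_levels(byte):
    """8N1 line levels in transmit order: start (0), data LSB first, stop (1)."""
    return [0] + [(byte >> i) & 1 for i in range(8)] + [1]

def edge_schedule(byte, start_cnt, bit_ticks):
    """(counter target, level) pairs; each level is driven exactly at its target."""
    return [(start_cnt + i * bit_ticks, level)
            for i, level in enumerate(frame_levels(byte))]

for cnt, level in edge_schedule(0x55, start_cnt=1_000, bit_ticks=694):
    print(cnt, level)
```

Because every target is computed from the start count rather than from the previous edge, any delay in preparing a bit does not accumulate into later bits, as long as the code reaches the wait before the target arrives.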

On the receive side, you need to interleave your sample taking and processing in such a way that it doesn't interfere with your output preparations. The rule of thumb (at least according to people who know how the protocol is supposed to work) is that you need to sample at 4x the baud rate so that you can target your timing to the middle of each bit. I'd suggest a pattern of OUT, 1/8, read, 1/4, read, 1/4, read, 1/4, read, 1/8, OUT. You need to make sure all receive logic plus output logic can be done in 1/8 baud interval; at 115 kbaud that gives you room for about 40 PASM instructions. That should be do-able, but you could always move RX to another cog if it's a problem; TX only should be simple.
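
A quick budget check for those slots, as a hedged Python sketch: it assumes an 80 MHz clock and 4 clocks per ordinary PASM instruction (hub accesses and wait instructions take longer, so the real budget is tighter).

```python
CLKFREQ = 80_000_000        # assumed system clock
CLOCKS_PER_INSTR = 4        # typical PASM instruction; hub and wait ops take longer

def slot_budget(baud, fraction):
    """Whole instructions that fit in `fraction` of one bit time."""
    clocks = CLKFREQ / baud * fraction
    return int(clocks // CLOCKS_PER_INSTR)

for name, frac in (("1/8 bit", 1 / 8), ("1/4 bit", 1 / 4)):
    print(f"{name}: about {slot_budget(115_200, frac)} instructions")
```

Under these assumptions it is the 1/4-bit slot that comes out near 40 instructions; the 1/8-bit slots at either end of the pattern are roughly half that.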

That said, I won't be doing this because I have absolutely no need for it. Nearly everything in my industry runs on RS232 and I've never encountered a device this picky, probably because if anyone built one nobody would use it because it wouldn't work with anything else.

pjv · 2011-02-22 09:53

@LocalRoger;

I would suggest that oversampling at 4X is a bad idea, or at least not as good as oversampling at 3X. Better yet is 5X. With an odd number you can get better statistical sampling in the middle of a received bit. With even-numbered oversampling you cannot hover at the bit center. All of this is of course not true if one does hardware edge detection such as WAITPE, but then that would negate any full duplex in one cog.

Cheers,

Peter (pjv)

localroger · 2011-02-22 10:10

pjv, at 3x oversampling you will generally hit the start bit on two samples. Without more information you don't know whether to use the first or second, and either could be anywhere within the bit.

At 4x you will generally hit on three samples and by taking the middle you will be at least 25% of the bit time away from an edge. I use this when timing is tight because it is the minimum practical sample rate, and in practice I've never had a problem with it since serial re-syncs every 10 bits anyway.

More samples are of course better, and more practical at lower baud rates.

pjv · 2011-02-22 10:58

@LocalRoger;

Not so.

With 3X oversampling you are guaranteed to hit the start bit in the first third of the bit. Then you sample the next bit (first data bit) 4/3 of a bit later, and that will land you always within the middle third of the data bit, so guaranteed to be at least one third from either edge. You then sample the remaining bits at 3/3 (full) bit intervals.

For any odd oversampling rate, my method guarantees a better "middle" sample than a corresponding even numbered sample. As you indicate, 4X assures sampling at least 25% into the bit, whereas 3X sampling assures you at least 33% into the bit.
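
The odd-versus-even argument can be checked with exact fractions. The Python sketch below models the method described above under two assumptions: the start edge is detected somewhere in the first 1/N of the bit, and the first data-bit sample can only be placed a whole number of sample ticks after that detection. The function name is illustrative.

```python
from fractions import Fraction

def worst_case_margin(n):
    """Best guaranteed distance (in bit times) from a data-bit edge at n-x oversampling.

    The start edge is seen up to 1/n of a bit late, and the first data bit is
    sampled n + k sample ticks after that detection, for the best choice of k.
    """
    best = Fraction(0)
    for k in range(n):
        lo = Fraction(k, n)            # earliest landing point inside the data bit
        hi = Fraction(k + 1, n)        # latest landing point (approached, not reached)
        best = max(best, min(lo, 1 - hi))
    return best

for n in range(3, 8):
    print(f"{n}x: at least {worst_case_margin(n)} of a bit from either edge")
```

In this model each odd rate guarantees a larger margin than the next even rate up, because an even rate cannot center its landing window on the middle of the bit.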

So....... I now think I have better explained my rationale, and suspect that you agree.

Cheers,

Peter (pjv)

Phil Pilgrim (PhiPi) · 2011-02-22 11:30

I agree with Peter. This has come up before, BTW: http://forums.parallax.com/showthread.php?89169&p=615171&viewfull=1#post615171

-Phil

Tracy Allen · 2011-02-22 11:38

A test of Ariba's serialSend.spin. Not surprisingly, the basics, very accurate timing:

Sending either the byte $FF or the byte $00 repeatedly. That is,
--$FF=start bit space only, expected width 8.68 µs at 115200 baud
--$00, start + 8 space bits, expected width 78.125 µs at 115200 baud.

For SendSerial, after 1000 samples:
--$ff: 8.737 µs mean, variation 8.728 to 8.740, sigma 0.003, 0.14% bit width variation.
--$00: 78.136 µs mean, variation 78.127 to 78.140, sigma 0.003, 0.012% byte width variation.

--mean start bit width is 0.65% above expected.
--mean byte width is 0.014% above expected.
--I also tested sending byte $55, alternating high/low bits. Bit0 to bit7 all test at 8.675 µs mean, sigma 0.003, and it adds up, 8.737 + 8 * 8.675 = 78.137, which agrees with the total byte width. The start bit is about one instruction (50ns) too long. The data bits are right on, 8.675µs compared to 8.680µs ideal. Note that 12.5ns divides 8.675µs evenly, but not 8.68, so that is the best you can do. (and I can't vouch totally for my 'scope timebase).

AntoineDoinel · 2011-02-22 17:09

Some more thoughts:

1. The problem is most likely unidirectional: the device will not communicate if incoming data is not exactly timed, but should it ever respond, I don't think the prop would have any problem receiving with the standard full duplex object.

2. When bit-banging, non divisibility of the bit time is less of an issue as you go higher with the clock. 20MIPS is usually more than adequate to talk at 115200 (or faster) with most well behaved serial devices.

3. Even with a lower clock, you could still do a sort of "dithering" on bit times to avoid accumulation of the error (I've tried this doing half-duplex 115200 on a PIC12F675 with the flaky internal 4MHz oscillator).
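
One way to do the kind of dithering mentioned in point 3 is a fractional accumulator, sketched here in Python. This is a generic technique and not necessarily what was done on the PIC; the 4 MHz timebase is an assumption chosen so that the bit time is clearly not a whole number of ticks.

```python
def dithered_bit_ticks(clkfreq, baud, nbits=10):
    """Whole-tick bit lengths whose running total tracks the ideal edge times."""
    ticks, acc = [], 0
    for _ in range(nbits):
        acc += clkfreq                    # accumulate in units of 1/baud ticks
        step, acc = divmod(acc, baud)     # emit whole ticks, carry the remainder
        ticks.append(step)
    return ticks

print(dithered_bit_ticks(4_000_000, 115_200))
```

Individual bits differ by one tick, but every edge stays within one tick of its ideal position, so the error never accumulates across the frame.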

I didn't pay enough attention to the original question, reading it again I see Stephen is trying to actually get rid of the second cog, I thought that it was ok to use two.

But then, in this specific case, the time multiplexing method used in FullDuplexSerial (jmpret to the other section to see if there's something to do) is always going to mess up timing, making my point 2 less relevant and point 3 completely inapplicable.

Given the requirements of single cog, full duplex, and precise timing, the most likely solution I see is using some form of strict time multiplexing, such as pjv's scheduler (or some other COG threading technique) coupled with Ale's suggestion of a divisible clock frequency (i.e. a 3.6864MHz crystal with 16x PLL).

Then you can implement TX and RX separately, and still have timebase as an exact divisor for bit time.
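
A quick divisibility check, as a Python sketch. It assumes the common 3.6864 MHz baud-rate crystal with a 16x PLL and compares it with the usual 5 MHz crystal at the same multiplier; whether a given board can actually run from that crystal is not something this checks.

```python
def clocks_per_bit(xtal_hz, pll, baud):
    """Return (whole clocks per bit, remainder); remainder 0 means an exact fit."""
    return divmod(xtal_hz * pll, baud)

for xtal in (5_000_000, 3_686_400):
    for baud in (115_200, 57_600, 19_200):
        q, r = clocks_per_bit(xtal, 16, baud)
        tail = "" if r == 0 else f" + {r}/{baud} of a clock"
        print(f"{xtal} Hz x16, {baud} baud: {q} clocks{tail}")
```

With an exact fit, a fixed-length time slice can be chosen that divides the bit time evenly, which is what strict time multiplexing needs.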

Regards
Alessandro

localroger · 2011-02-22 17:15

Point taken, and well explained, pjv.

On edit, I've been using 4x since roughly 1988 probably because I saw someone else use it. It works fine because, by design, the standard is forgiving of timing errors.

Kye · 2011-02-22 17:21

Maybe it's time for a half duplex driver? One that can receive or send with perfect bit timing and is fast.
