yet another variant, fullDuplexSerial4portplus
Tracy Allen
This takes off from Duane Degn's thread on the 128 and 512 byte buffer modifications of Tim Moore's pcFullDuplexSerial4fc.spin. Having multiple ports accessible from different cogs has been invaluable for my own work, so I'm a big fan, and want to make my own small contribution.
new features:
-- Independently sized buffers for all 4 rx ports and all 4 tx ports.
Sizes declared as constants at compile time.
For example:
TX_SIZE0 = 16     ' enter in the needed size in bytes for each rx and tx buffer
TX_SIZE1 = 4      ' these values are arbitrary, just to show that they can be different
TX_SIZE2 = 16     ' not necessarily binary or less than any particular limit.
TX_SIZE3 = 64
RX_SIZE0 = 100
RX_SIZE1 = 8
RX_SIZE2 = 200
RX_SIZE3 = 64
The aggregate number of bytes required is computed and added to the object size. Part of the buffer is within the object footprint used by pasm variables, but the object can still be restarted.
-- cts and rts flow control are operational with the correct polarity (bug fix).
-- my mods do not affect the size or speed of the pasm, or the course of the co-routines, so this should run at least as well as the original.
-- added a couple of spin methods, for information purposes, e.g. rxHowFull(port), and the Init method returns the aggregate size of the above buffers. I considered making the partition of buffer sizes be parameters in the addPort method, but I don't think that is something I would need or want.
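As a rough illustration of the buffer bookkeeping described above, here is a minimal Python model (not the Spin/PASM source). It uses the example sizes from the post; the layout order (all rx buffers, then all tx buffers) and the helper names `partition` and `how_full` are assumptions made for the sketch, not the object's actual internals.

```python
# Hypothetical model: pack independently sized buffers into one pool.
# Sizes are the example values from the post; the ordering is an assumption.
RX_SIZES = [100, 8, 200, 64]
TX_SIZES = [16, 4, 16, 64]


def partition(sizes, base=0):
    """Return (offsets, end): start offset of each buffer and the first free offset."""
    offsets = []
    for size in sizes:
        offsets.append(base)
        base += size
    return offsets, base


def how_full(head, tail, size):
    """Bytes waiting in a ring buffer of any size (not only powers of two)."""
    if size == 0:          # a zero-sized buffer belongs to an unused port/direction
        return 0
    return (head - tail) % size


rx_offsets, rx_end = partition(RX_SIZES)
tx_offsets, total = partition(TX_SIZES, rx_end)

print("rx offsets:", rx_offsets)
print("tx offsets:", tx_offsets)
print("aggregate bytes:", total)
```

Because the sizes need not be powers of two, the fill level is taken modulo the buffer size rather than with a bit mask.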
There is an accompanying demo program that can be set up on a pcb like the quickstart with just two jumper wires.
Wiring summary:
jumper from p11 to p8 ascii data
jumper from p10 to p6 flow control
The demo sets up port 3 in its own cog to transmit a fixed string message on pin p11. Port 2 in a different cog receives the message on pin p8. It has rts flow control enabled, pin p6, which goes back to the cts pin p10 for port 3. Debug port 0 shares the cog with port 2 and sends out the message received along with information about the status of the port 2 head and tail pointers. One can observe the action of flow control on a 'scope, or disable it to see the effect on the integrity of the messages, or experiment with parameters like the rts threshold or the baud rates and timing.
I am labeling this version 0.2. I am still looking into the issue of jitter and other hiccups.
edit: Version 0.3 -- Serial ports or tx or rx directions that are not defined or not enabled are no longer included in the scan. Jitter is considerably reduced and it is no longer necessary to define all 4 ports to attain the lowest jitter. Buffer size can be reduced to zero for unused ports or tx or rx directions. Code is added in receive to detect framing errors, and misframed bytes are not added to the receive buffer. Still testing.
Comments
A fix should be pretty straightforward.
-- Un-defined serial ports or tx/rx directions are no longer included in the scan. Jitter is considerably reduced and it is no longer necessary to define all 4 ports to attain the lowest jitter.
-- Buffer size can be reduced to zero for unused ports or tx/rx directions.
-- Code is added in receive co-routines to detect framing errors, and misframed bytes are not added to the receive buffer.
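The framing check can be pictured with a small Python model. This is only a sketch of the idea, assuming 8N1 framing on a non-inverted line (idle high) and one sample per bit; the function names are made up for the example and the real check lives in the PASM receive co-routines.

```python
def encode_frame(byte):
    """Build a 10-bit 8N1 frame: start bit (0), 8 data bits LSB first, stop bit (1)."""
    return [0] + [(byte >> i) & 1 for i in range(8)] + [1]


def decode_frame(bits):
    """Return the received byte, or None if the frame is malformed."""
    if len(bits) != 10 or bits[0] != 0:
        return None            # no valid start bit
    if bits[9] != 1:
        return None            # stop bit missing: framing error, byte is discarded
    value = 0
    for i, bit in enumerate(bits[1:9]):
        value |= bit << i      # data arrives least significant bit first
    return value


print(decode_frame(encode_frame(0x41)))   # a normal character
print(decode_frame(encode_frame(0x00)))   # a genuine NULL has a high stop bit
print(decode_frame([0] * 10))             # a line held low never shows a stop bit
```

The last two calls show the distinction that matters for binary data: a real NULL byte is accepted, while a line stuck low is rejected instead of filling the buffer with zeros.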
Still testing. If anyone has comments or bug reports, I would definitely like to hear them.
In the initialization, I patched the instructions for unused segments from a jmpret to a simple jmp, e.g. for port 3:
I'm still puzzled as to why Tim's original code for blocking out unused ports and tx/rx directions does not work. It seems to me that it should work, but tracing the co-routines can be like finding an exit from a Klein bottle. Tim had commented out the relevant patches in the OBEX release and also in the later 8/14 release, but why? Both of those releases came after he added the flow control, by forum request. Here is the OBEX code for port 3, and the if_z is commented out that would patch the co-routine vectors starting from port 3 tx and upward thru all 8 vectors.
I love this object, and the modifications you made definitely improve the experience.
thanks again,
Massimo
Tim's object is great but there are definitely a few strange things going on in it. I will look at your changes when I have more time.
start bit statistics
...with one single port defined
......pcFullDuplexSerial4fc duration 5.6-9.2 µs, sigma 0.53
......fullDuplexSerial4portPlus duration 8.0-8.41 µs, sigma 0.04
...with 4 ports defined
......pcFullDuplexSerial4fc duration 8.0-10.4 µs, sigma 0.46
......fullDuplexSerial4portPlus duration 6.4-9.6 µs, sigma 0.34
null byte plus start bit statistics
...with one single port defined:
......pcFullDuplexSerial4fc duration 76.8-80.0 µs, mean 79.7 µs, sigma 0.96
......fullDuplexSerial4portPlus duration 76.8-77.23 µs, mean 77.2 µs, sigma 0.08
...with 4 ports defined
......pcFullDuplexSerial4fc duration 78.0-80.4 µs, mean 79.6 µs, sigma 0.73
......fullDuplexSerial4portPlus duration 77.6-80.4 µs, mean 79.6 µs, sigma 0.63
This makes a lot more sense. The single port version is expected to have lower jitter than the four port version, and it was highly counter-intuitive to have to enable 4 ports in order to minimize the jitter.
@ Massimo, Grazie for looking at it. I wonder if the issue you brought up in this thread would now resolve without having to declare all four ports?
@ Duane, I don't think I would have jumped into it if you hadn't taken the plunge first!
I tested v0.3 and it works perfectly with the "offending" configuration you cited.
It wasn't necessary to create the dummy ports.
Just to be sure I tested the standard 4 ports version and I got garbage again.
Just remember to set a reasonable buffer size for all the ports when you post the object in the OBEX. Port 1 has "1" as its buffer size, and that could be misleading.
Thanks again,
Massimo
Good point about the buffers when OBEX'd. The best thing at that point might be to make them the same as the original, 16 bytes for each TX and 64 for each RX buffer. People who need larger, or smaller, buffers can figure it out from there. Now the size of an unused buffer can be reduced to zero, and the needed buffers can be made a multiple of the packet size expected, as large as needed within available memory.
For instance NMEA string parsing requires rx only.
Thanks again,
Massimo
I took out most of the numeric methods and put them in their own object, dataIO4port, which has uarts : "fullDuplexSerial4port" declared within it. So instead of a call
uarts.dec(1, myLong),
you do
dio : "dataIO4port"
dio.dec(1, myLong)
Because of the way the buffers are defined as DATa, you can declare and use the same methods and serial buffers from within different objects. The dataIO4port object also contains routines for numeric and string input borrowed from PST.
The reason for the bifurcation is that I usually use a different object for numerics, building a record and sending the whole thing out using the uarts.str(port, myString) method rather than individual numeric methods.
I finally had a need to try your updated (v0.3) driver. Your code checking for a stop bit really improved the way the driver works.
This has helped me a lot so far.
Thanks!
I added my end of message "flags" trick to your driver so now the serial driver works just the way I was hoping my earlier driver would have worked. The messages are nice and clean. I was getting a lot of garbage with my driver.
The end of message flags sort of behaves like a StrInCheck method so I don't need to dedicate a cog to watch for an incoming string.
Thanks again for your great additions to this very useful object.
I've been using 0v3 for some time, with 1 and 2 ports running. I also tested it starting and stopping the object for reduced consumption, and the object works reliably.
Massimo
I've now uploaded the object to http://obex.parallax.com/objects/856/
There is only one small change that brings it to version 1.01. The init method now returns a pointer to the internal data structure, specifically, the first long in the data structure is the size of the rx buffer. Various internal variables such as the head and tail pointers are at fixed offsets, so this provides an easy way to peek at those locations if need arises.
Just dropping it in as a replacement, solved a problem I had in exchanging System Exclusive data with a Roland D-50 synthesizer.
Probably it was just a matter of buffer size, but the improved jitter accuracy is also a big plus.
I did a comparison test between the Parallax provided FullDuplexSerialPlus, and FullDuplexSerial4portPlus_0v3. FDS+ worked fine at the higher baud, but the 4portFDS did not (see output below).
I made sure that the buffers on 4portFDS were very large. I also tested with the original 4portFDS, and there were similar (incorrect) results.
I am testing by sending it the attached test_input.txt file via the serial connection by the following command sequence: Where the C-* is control. This command sends the file test_input.txt through the serial port at 230400 baud with a 1 ms delay between characters. I've tried increasing the delay between characters as high as 100ms, and it still has the same result.
Correct Output (with FDS+):
Incorrect Output (w/ 4 port driver):
When I try the same code at 115200 baud it works perfectly. This is the case for both 4port and FDS+.
I also tried regression testing with the original four port driver (here) and it had the same issues.
Any solutions, or explanation of why it won't work correctly for me? I've never used ascii-xfr before, but I assume that since it works with FDS+ it should work with 4portFDS too.
Well, fullDuplexSerial4port has a lot longer dance routine to scan 4 ports. It can't be otherwise.
Each bit duration at 230400 baud is 4.34 µs. With clkfreq=80MHz, that is 347 clock cycles or about 86 pasm instructions. When fullDuplexSerial4port is simply sitting there idle, just looking for incoming or outgoing, it has a cycle of 4*9=36 pasm instructions. One of those instructions for each port is a RDLONG for checking the tx head pointer, so with waiting for the HUB access it comes out equivalent to more like 4*10=40 pasm instructions (4*39=156 clock cycles). Right there you have a potential jitter of nearly 1/2 bit time in detecting the start bit, even if the incoming bytes arrive at a leisurely pace. After it detects the onset of the start bit, it takes 16 more clock cycles to set up a 1/4 bit time delay of 86 cycles, and in execution it will hit that delay 156 clock cycles later, the first time back around after checking tx and rx for the other channels. In short, the synchronization to the bit times could be atrocious at that baud rate. And it will be even dicier when the other channels are active at the same time or when flow control is active.
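The arithmetic in that estimate can be checked with a few lines of Python. This is only a worked calculation: the 4 clocks per instruction is the usual Propeller 1 figure, and the 4*39 clock idle scan is the number quoted in the post, not something measured here.

```python
CLKFREQ = 80_000_000        # system clock, Hz
BAUD = 230_400
CLOCKS_PER_INSTR = 4        # typical PASM instruction time on the Propeller 1
IDLE_SCAN_CLOCKS = 4 * 39   # idle scan of 4 ports, figure quoted in the post

bit_us = 1e6 / BAUD                        # one bit time in microseconds
clocks_per_bit = CLKFREQ / BAUD            # system clocks per bit
instr_per_bit = clocks_per_bit / CLOCKS_PER_INSTR
quarter_bit_clocks = clocks_per_bit / 4    # the 1/4 bit delay after the start edge
scan_fraction = IDLE_SCAN_CLOCKS / clocks_per_bit

print(f"bit time          : {bit_us:.2f} us")
print(f"clocks per bit    : {clocks_per_bit:.0f}")
print(f"instructions/bit  : {instr_per_bit:.0f}")
print(f"1/4 bit in clocks : {quarter_bit_clocks:.0f}")
print(f"idle scan / bit   : {scan_fraction:.2f}")
```

The last ratio is the point of the argument: one idle pass around the four ports already costs a little under half a bit time at 230400 baud, before any port is actually busy.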
Another delay and uncertainty arises when a complete byte is received. At that point there is a long code segment that involves 3 HUB accesses (but the time between them hits the sweet spot). That would matter more when bytes are received head to tail and again when multiple ports are active at the same time.
The pattern you reported surprises me a little--That it works as well as it does. I'd expect more of the alphanumeric characters to be garbled, and I'd expect more framing errors. The final version on the OBEX discards all misframed bytes (those lacking a proper stop bit).
I know Tim Moore added those comments about top speed to the original pcFullDuplexSerial4FC, and I left those in, but they really do need to be revisited.
In my limited testing, I found that 115200 bps is about as fast as Spin can move data (I think the limit is a little higher than 115200). Although I know SRLM already knows this, I thought it worth pointing out in this thread for others interested in fast serial communication with the Prop.
Yep, I haven't forgotten about the limitation of spin. For everyone else, the Spin limitations shouldn't produce garbled data. Instead, if the spin loop in my last post is too slow it will just drop characters, rather than changing the actual bits. And if the rx buffer has enough room for the entire input, the speed of spin doesn't matter (since the PASM driver can put the data in the buffer without worrying about spin speed).
I've done more testing, and have come to the conclusion that most of the FullDuplexSerial* drivers cannot handle serial data with no spacing between characters. All of the serial drivers seemed to choke when they had to receive and transmit at the same time. Most of them just produced gibberish output.
My testing is mostly concerned with what I'll call "fully saturated" serial data. This is serial data with no delay between characters. I don't know what most of the embedded systems use, but the two computer programs that I am using (cutecom and ascii-xfr via picocom) both have a character delay option, specified in ms. A delay of 1ms seemed to work with most of the drivers at the higher baud rate (230400), but what is the use of that? It limits throughput to less than 1000 cps. At that point you might as well transmit at 9600 baud instead of 230400.
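A quick calculation backs up the throughput comparison. It assumes 8N1 framing (10 bits on the wire per character) and that the terminal program adds the full delay after every character; how cutecom or ascii-xfr actually schedule the delay is not verified here.

```python
BITS_PER_CHAR = 10   # assumption: 1 start + 8 data + 1 stop bit


def chars_per_second(baud, gap_s=0.0):
    """Characters per second for a given baud rate and inter-character gap in seconds."""
    char_time = BITS_PER_CHAR / baud
    return 1.0 / (char_time + gap_s)


print(f"230400 baud, saturated : {chars_per_second(230_400):.0f} cps")
print(f"230400 baud, 1 ms gap  : {chars_per_second(230_400, 0.001):.0f} cps")
print(f"9600 baud, saturated   : {chars_per_second(9_600):.0f} cps")
```

Under these assumptions, 230400 baud with a 1 ms gap and saturated 9600 baud both land just under 1000 characters per second.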
My conclusions: fully saturated duplex serial data at 230400 baud is not possible with the available drivers.
The best that I was able to come up with was a combination of the PASM driver here with the variable buffer size FDS in post 28 here. With this combination, the driver is able to receive saturated serial data at 115,200 baud and transmit at 115,200 baud, with only a little bit of skipping (most likely due to spin latency). It is also able to receive saturated data at 230400 baud into the buffer, then transmit it (while not receiving).
I don't have an oscilloscope to test and see what is actually getting sent, so I'll probably go find one and do that. Although that might take a while.
For my future projects, I will be using the FastFullDuplexSerialPlusBuffer.spin object (attached to this post) running at 115200 baud, with a 256 byte buffer. This seems to be the fastest that fully saturated duplex communications can be run at.
----
All testing was done on a Quickstart at 80MHz clock speed.
The only thing is that there is a typo in the docs: at the top it says that P5 is the CTS pin, but the init actually calls for P6.
Thanks again, Tracy!
Jonathan
I just tried to set up 2 instances of the object, but it hangs when I Init the second.
Any ideas on running 2 versions ?
You need to make a copy of the object, give it a different name, and change something in the new program to make it different from the original. Change the size of one or more of the data buffers, that is sufficient.
The OBEX version is here: http://obex.parallax.com/objects/856/
Be sure to check that the size of each buffer is appropriate for your project. The buffer sizes are declared in the CONstants section.
I had the different names...
Why is it necessary to make the "change" in the different instances of each program..ie: the buffer ???
Thanks for your work! Also, thanks Duane and others; it's appreciated!
I think I found a bug: when I change Demo1 to define DEBUG as port 3 instead of port 0, it seems to hang trying to get the input here:
PUB getReps
  fds.str(DEBUG,string(13,13,"Please enter number of repetitions: "))
  return dio.decIn(DEBUG)
====edit: this bug was found... ====
the problem for the above was in dataIO4port.spin, in PUB StrInMax, when the character is received,
uarts.rx(0) should be: uarts.rx(port).
Changing the subject,
I've been playing with converting this to cpp via spin2cpp, which was successful, and trying to have multiple ports (12, to be exact). In order to get the buffers not to overlap, I copied fullduplexserial4ports.spin to different file names, converted them to cpp, and then used the new names to define new objects within my demo1 class. I got fairly far along. But when I change it to try to receive via Rxcheck, I see data when I shouldn't. I've verified that the ports in question have input pins that are being held high, so I'm not sure why data is getting there. So I'm back-tracking to try to do something in spin first. That's when I found this issue with reassigning DEBUG from 0 to 3 causing the dio.decIn(DEBUG) to fail. It works fine when I put it back to 0.
I'm not sure if you've already done this or not.
Another thought is that with Spin it takes more than just changing an object's file name to have a separate copy of the DAT section included in RAM. I recall with earlier versions of the 4-port serial some change to the code had to be made in order to use more than four ports. I don't recall if this is still required with Tracy's code.
Duane is right about having to change something in the code, not just the name, in order to create separate instances. It can be a minor change, for example, change the last dummy DATa byte from
extra byte 0
to
extra byte 1
and that will be sufficient. Doing so duplicates all of the code as well as the data.
The second line has to do with RTS flow control not CTS flow control, and should be moved down 7 code lines so that it appears after the test for rts flow control.
This only affects port 0, and only when rts but not cts flow control is enabled on that port.
Wow, I didn't really expect to see a reply, nice that you guys replied; thanks!
Re-reading Duane's post, I'm editing what I had in here a few minutes ago. I set the buffers quite high, that's not the problem. I'm also controlling how much data gets sent to the ports, so I can be sure it doesn't overflow (at least for this initial testing). Thanks for the suggestion though. While I couldn't read a character at a time, I was able to read in a <cr> terminated string via the StrIn method. I found a limitation of it, the max string size is set to 32, that caused me a bit of confusion when I first encountered it.
Regarding having to change some code besides just renaming, I appreciate the reminder; I did see that in reading through the forum. I was just starting to experiment with the C++ version. My thought was that enough changed when it got converted from spin to C++ to solve needing to make another change to the code. But I'm keeping that in mind!
For some reason I'm having issues with trying to just receive and send a character when I use char := fds.rx(port). I'm sure it's something stupid that I'm doing wrong. If I don't figure it out myself in the next hour, I'll add another post asking for help.
Tracy, thanks for the other fix too. The plan currently is to do 38400 baud for all serial ports, none of them will have flow control. One of the attached devices will send binary, so I need to be able to transmit and receive NULLs reliably. Which means I have to solve the reception of NULLs (when an input is held low) rather than generate a framing error issue, if it still exists.
I saw an earlier post by SRLM indicating that the drivers weren't capable of handling serial data with no spacing between the characters, and also that they seemed to choke when receiving at the same time as transmitting. Does anyone know if this is true at 38400? It wasn't clear if that only happens at higher speeds. Has anyone been using this object to send and receive on multiple ports at the same time? It seems as if most folks are using this for testing one or two ports.
Lastly, I'm still trying to wrap my head around all of the memory usage. (just a comment, not necessarily related to this discussion: The application I'm working on does not need to share across objects for debugging, so DAT isn't really required, I don't think.) Since I need at least 9 ports, which means three cogs will be running this code, I'm wondering if it's really an advantage to partially use the HUB space taken by the footprint of this object, or if it would make more sense to just separate the receive and transmit buffers to a different, known HUB area for each instantiation and just have one footprint of this code that gets loaded into each of the cogs?
OK, one more question... how do you disable ports that are not going to be used? Is simply passing -1 for the RX and TX pins to "fds.AddPort" enough?
regards,
Fritz