Fast Full-Duplex Serial, 1 Cog - a.k.a. FFDS1
lonesock
Posts: 917
Hi, everybody.
I'm ready to upload this to the OBEX, but I would like a little more peer review before tempting a prop newbie to try my code if it's buggy. [8^)
The main FFDS1 features are basically the components of the name. [8^)
* fast : 460_800 baud on an 80 MHz prop
* full-duplex : FFDS1 can handle full speed block TX and RX at the same time
* no jitter : um, no jitter
* FFDS1 uses only a single cog.
Any and all feedback appreciated...bugs, features, documentation, etc.
thanks,
Jonathan
P.S. This version is a bit updated over the prelim version posted a while back on another thread. This will be the official place to get FFDS1 until such time as it goes into the OBEX.
Comments
Could you allow a constant to set the Tx/Rx bit counts?
Edit: And a constant to set the number of stop bits?
Even though 'standard' PC serial is 8 bits, there are cases where this would be used Prop-to-Prop, and there up to 32 bits is easily handled.
This can save 'which byte' message fragment handling.
The baud precision required is of course tighter than for 8 bits, but it is easily met with a crystal-clocked Prop.
A defined STOP bit count can help pace Send speeds, and give remote units a known time to (eg) flip RS485 direction in half-duplex designs.
Other extensions could be parity/checksum style check bits, and/or an address bit handler.
The address bit can be produced with forced parity on a PC, or by a host uC with a 9-bit mode, or it can be managed in the serial cog.
We did what I called a twisted ring daisy chain of many small micros running at high bauds.
Here the protocol rule was simple and based on the address-bit edge (ABE):
The N bytes following an ABE (=\_) were mine, and in those slots a reply was inserted, with AB held high. In all other cases, bytes are simply echoed.
The AB edge thus moves as the packet travels around the ring.
The result is that a long TX message drops N bytes at every node, in chain order, and the incoming data carries reply info from all nodes.
If you send more bytes than the nodes in the chain consume, the extra bytes arrive back unchanged, so you can check the total installed chain size.
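A rough C model of one node in such a ring may make the rule easier to follow. This is only a sketch of the behaviour as described above, not the original firmware: the struct and function names are invented, the ABE is taken to be a high-to-low transition of the address bit, the line is assumed to idle with AB high, and the reply payload is a placeholder counter.

```c
#include <stddef.h>
#include <stdint.h>

#define RING_MAX_SLOTS 8u   /* hypothetical upper bound on N */

/* One serial word: payload plus the address bit (AB). */
typedef struct {
    uint32_t data;
    uint8_t  ab;
} ring_word;

/* Per-node state (all names are illustrative). */
typedef struct {
    uint8_t  prev_ab;               /* AB of the previous incoming word */
    unsigned slots;                 /* N: words owned by this node */
    unsigned remaining;             /* owned slots still to be filled */
    uint32_t reply_base;            /* placeholder reply generator */
    uint32_t rx[RING_MAX_SLOTS];    /* words captured for this node */
} ring_node;

static void ring_node_init(ring_node *n, unsigned slots, uint32_t reply_base)
{
    n->prev_ab = 1;                 /* assumption: AB idles high */
    n->slots = slots > RING_MAX_SLOTS ? RING_MAX_SLOTS : slots;
    n->remaining = 0;
    n->reply_base = reply_base;
}

/* Process one incoming word and return the word to forward downstream. */
static ring_word ring_node_step(ring_node *n, ring_word in)
{
    ring_word out = in;             /* default: echo unchanged */
    int falling = (n->prev_ab == 1 && in.ab == 0);
    n->prev_ab = in.ab;

    if (n->remaining == 0 && falling)
        n->remaining = n->slots;    /* the next N words are ours */

    if (n->remaining > 0) {
        unsigned idx = n->slots - n->remaining;
        n->rx[idx] = in.data;               /* take our data */
        out.data = n->reply_base + idx;     /* insert the reply */
        out.ab = 1;                         /* hold AB high: the edge moves on */
        n->remaining--;
    }
    return out;
}

/* Run a whole packet through one node, in place. */
static void ring_pass(ring_node *n, ring_word *words, size_t count)
{
    for (size_t i = 0; i < count; i++)
        words[i] = ring_node_step(n, words[i]);
}
```

Passing a packet through two such nodes in turn leaves each node's replies in its own slots, and any words beyond what the nodes consume come back untouched, which is the chain-size check mentioned above.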
Duane J
I thought 465116 Baud was quite good for Full Duplex in 1 COG ?
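For what it's worth, the 465116 figure is what you get from a bit period of 172 system clocks at 80 MHz (80_000_000 / 172 ≈ 465116), a little under 1% faster than the nominal 460_800. A small C sketch of that arithmetic follows; the 172-clock bit period is inferred from the number itself, not taken from the FFDS1 source, and the function names are made up for illustration.

```c
#include <stdint.h>

/* Actual baud rate (rounded down) for a bit period of ticks_per_bit clocks. */
static uint32_t actual_baud(uint32_t clkfreq, uint32_t ticks_per_bit)
{
    return clkfreq / ticks_per_bit;
}

/* Absolute deviation from the requested rate, in parts per million. */
static uint32_t baud_error_ppm(uint32_t actual, uint32_t requested)
{
    uint32_t diff = actual > requested ? actual - requested
                                       : requested - actual;
    return (uint32_t)(((uint64_t)diff * 1000000u) / requested);
}
```

With `clkfreq = 80000000` and `ticks_per_bit = 172`, `actual_baud` gives 465116, and `baud_error_ppm` against 460800 comes out just under 10000 ppm, i.e. under 1%.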
(btw, I really like your ring protocol.)
In terms of features to add, I can't really go too crazy with this, not and still hit 460.8 kbps at 80MHz (which is one of my main goals). I was hoping to maybe add in software flow control, but even that might get too crazy. [8^)
Regarding speed, I could not think of a faster way to do jitter-free RX as well as TX in one cog. The cog samples at 1/2 bit period intervals, and RX uses a counter to see how much of the start bit has been captured once it arrives. If it has been on < 1/4 bit period, then I wait one more 1/2 bit period before sampling RX. This guarantees that I sample RX at least 1/4 bit period from either edge. I know other people have done high-speed drivers, and jitter-free drivers, and one-cog drivers (off the top of my head I recall the PBnJ driver, and Kye had one too).
thanks,
Jonathan
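The half-bit sampling rule described above can be checked with a small C model. This is not the PASM from FFDS1, just a sketch of the stated decision: `elapsed` stands for how long the start bit has been low when a half-bit sample first sees it, and the function names and the 172-clock bit period used in the example are assumptions.

```c
#include <stdint.h>

/*
 * elapsed:   clocks since the start bit's falling edge, measured at the
 *            half-bit sample that first saw the line low
 *            (0 <= elapsed < bit_ticks / 2).
 * Returns the offset of the RX sample point from the start of each bit.
 */
static uint32_t rx_sample_phase(uint32_t elapsed, uint32_t bit_ticks)
{
    uint32_t half = bit_ticks / 2;
    uint32_t quarter = bit_ticks / 4;

    if (elapsed < quarter)
        return elapsed + half;  /* too close to the edge: wait one more half bit */
    return elapsed;             /* already at least 1/4 bit past the edge */
}

/* Returns 1 if every possible arrival time leaves >= 1/4 bit of margin
 * to both the leading and the trailing edge of the bit. */
static int rx_margin_ok(uint32_t bit_ticks)
{
    uint32_t quarter = bit_ticks / 4;

    for (uint32_t e = 0; e < bit_ticks / 2; e++) {
        uint32_t phase = rx_sample_phase(e, bit_ticks);
        if (phase < quarter || bit_ticks - phase < quarter)
            return 0;
    }
    return 1;
}
```

For a 172-clock bit, a start bit seen 10 clocks after its edge is sampled at phase 96, one seen 50 clocks after its edge is sampled at phase 50, and `rx_margin_ok(172)` confirms that the sample point always lands in the middle half of the bit.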
I just didn't see the underline and read it as 460 to 800 baud.
I apologize for my stupidity.
Duane J
Jonathan
The problem seems to be somewhere in the PASM receive function, and it has to do with receiving too much data at once. Here is main.spin:
As you can see, it simply echoes the characters you send it. When I type in characters by hand, it works fine. But when I try to send a bunch of characters (such as the characters in the code block above), it produces bad output:
This output changes each time I run the program.
I've also tried running two instances of the FFDS1 object with one Tx'ing into the other's Rx, and it works fine (as expected: that's what the test program does). Transmit also works fine: I can transmit all sorts of things without error.
As a final test, I've tried hooking it up to a GPS outputting at 9600 baud, and I get similar results.
thanks,
Jonathan
Jonathan
Finally I found some time for testing your FFDS1. Thanks for sharing this great object! It'll be assimilated into my codebase ;o)
Some little things I'll change right away, some changes might come after further testing.
Right away changes:
1. reordering the variables -> the variables needed by the PASM-part should be in sequence and separated from variables only needed by the SPIN-part. This way I can use memory allocated during runtime (rudimentary memory management) instead of memory allocated by the compiler.
2. doing the setup of PASM in PASM and not via injection in the start function -> same reason, separation of cognew and start-function
3. moving the 'wait for end of transmission' from end of function to start of function -> I think in a lot of use cases you output a buffer and do other things after that. So why block the 'do other things' part for the whole transmission time? Blocking only makes sense if you want to overwrite the transfer buffer right away, and for that case you have the waittx function. (To be honest, the waittx function currently does not make sense, because tx already waits and it's not safe to use it for syncing across COGs.)
This is a point which might be of general interest. Of course I see that it is more beginner-friendly to keep it as it is, but I also think that doing it the other way around will increase the net transfer rate and overall program speed. Maybe it makes sense to have both versions?
Possible changes:
Maybe it's faster to generate the whole output-string before sending (hex,dec,bin) instead of calling the single character tx-function.
I found this kind of weakness in the FDS when experimenting with the Raspberry. Transmission worked without problems up to ... I don't remember, but some hundred kbit/sec. Above 115200, however, the net transfer speed did not increase, because the Spin part is simply too slow to deliver the bytes fast enough. That's why your driver is a great improvement.
I've converted it to C++, and done my testing there, but I believe that the functionality is the same. To be specific, I have not tested with the Spin version of the program. In any case, here is the C/C++ version:
The Problems
1) It doesn't seem to wait. I don't know why this is, but it definitely does not.
2) If CNT wraps around while the function is waiting, it exits prematurely (this is certainly true of the Spin version as well).
I wrote the following (in C/C++) to correct these issues:
I couldn't figure out what might cause problem #1, though.
Thanks.
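On problem 2, a common way to make a tick-based timeout immune to counter wraparound is to compare elapsed time rather than absolute counter values. The sketch below is not the code from the post above; it is a generic illustration with a made-up function name, where `now` is assumed to be a fresh read of the 32-bit system counter (CNT on the Propeller).

```c
#include <stdint.h>

/*
 * Returns 1 once at least timeout_ticks have elapsed since start.
 * Unsigned subtraction is modulo 2^32, so the difference stays correct
 * when the counter rolls over, as long as the real elapsed time is
 * shorter than 2^32 ticks.
 */
static int ticks_elapsed(uint32_t start, uint32_t now, uint32_t timeout_ticks)
{
    return (uint32_t)(now - start) >= timeout_ticks;
}
```

A wait loop would capture `start` once, then poll for data and call `ticks_elapsed(start, now, timeout)` with a new counter reading each time around.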
How do you build the test? Do you have a Makefile for this or at least a command line? Just so we are on the same page.
propeller-load -r ffds1
propeller-elf-gcc -g3 -DTEST ffds1.c ffds1m.S -o ffds1
propeller-load -D clkmode=xtal1+pll8x -D clkfreq=96mhz -e ffds1
The board is a QuickStart board, but I desoldered the 0-ohm resistor on the crystal, and attached my own 12MHz crystal to the solder pads. I had done this to play with some USB things, but haven't gotten around to it on this board yet. Anyway, the hardware works with spin programs, but I haven't figured out what I need to change to make C work yet. I suppose, for now, I'd better just switch the hardware back.
Jonathan's code is written tight, so messing with it is upon your own cog-nition. But take a look at how it is done in fullDuplexSerial4port. It is not a heavy load. In between received characters, you are testing for a new start bit. If one is not detected, then you jump to a routine that compares the difference between the head and tail pointers to the programmed % of buffer full. If above that percent, it sets the RTS to its stop state and then goes on back to looking for the start bit.
Some outside devices take some time to respond to RTS and may empty their xmt buffer before halting transmission, so the buffer has to be big enough to account for that. The RTS process will mess up if you are operating on the edge of the speed zone, where input characters are arriving head to tail with one stop bit. It would be fine at lower speeds or when incoming characters are paced.
A parameter in the init method in fullDuplexSerial4port selects the pin to use for RTS, or -1 if none. If none is selected, the initialization code changes the JMP to the RTS processing to a NOP.
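The buffer-threshold check described above can be modelled in a few lines of C. This is a sketch of the idea only, not the PASM from fullDuplexSerial4port; the function names are invented, and the threshold is expressed as a percentage of the buffer size as an assumption.

```c
#include <stdint.h>

/* Number of bytes waiting in a circular buffer of the given size.
 * head is where the driver writes, tail is where the reader consumes. */
static uint32_t rx_fill(uint32_t head, uint32_t tail, uint32_t size)
{
    return (head + size - tail) % size;
}

/* Returns 1 if RTS should be put in its stop state, i.e. the buffer is
 * filled above the given percentage. */
static int rts_should_stop(uint32_t head, uint32_t tail,
                           uint32_t size, uint32_t percent)
{
    return rx_fill(head, tail, size) * 100u > percent * size;
}
```

The headroom left above the threshold has to cover whatever the remote device still sends after RTS changes, which is why the percentage and the buffer size need to be chosen together.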
Thanks for the advice. The RTS and CTS acronyms confuse me: isn't RTS an indicator to the host (the Propeller in this case) to disable transmission? I'm getting this information from the RN-42 Bluetooth datasheet:
Basically, I want to make sure that I'm not losing bytes when sending by sending too much from the Propeller. Your second paragraph would be a solution for CTS? Or, are RTS/CTS like RX/TX: it's relative to where you are looking from?
So, the solution that I came up with is as follows:
In some preliminary tests with a button acting as the master on the RTS line, this seems to work (at 460800). Does anybody see any timing problems?
edit: some of the format of the PASM is a bit different: it's from a C++ GAS driver, which is why there are no local labels and the variables are capitalized. And that's why comments use // instead of '.
The RN-42 Bluetooth module should by all rights be a modem, i.e. a DCE device, but the description has it tagged as a DTE, at least for the handshaking lines...
FullDuplexSerial4port works from the standpoint of a DTE, so cts is an input that controls the flow from the tx pin. The cts pin state is combined with the state of the xmit buffer. In the following, the uart initialization patches if_z to either if_z_or_c or if_z_or_nc, depending on the desired polarity, or leaves it at if_z if cts control is not desired.
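As a plain C model of that patched condition (not the PASM listing itself): the sketch assumes Z is set when the transmit buffer is empty, C mirrors the state of the cts pin, and the two patched variants are if_z_or_c and if_z_or_nc. The enum and function names are made up for illustration.

```c
/* Which condition the transmit code has been patched to (illustrative). */
enum cts_mode {
    CTS_NONE,            /* if_z       : no cts control */
    CTS_SKIP_WHEN_HIGH,  /* if_z_or_c  : hold off while the pin is high */
    CTS_SKIP_WHEN_LOW    /* if_z_or_nc : hold off while the pin is low */
};

/* Returns 1 if the transmitter should skip sending on this pass. */
static int tx_skip(int buffer_empty, int cts_pin_high, enum cts_mode mode)
{
    int z = buffer_empty;   /* Z: head == tail, nothing queued */
    int c = cts_pin_high;   /* C: assumed to mirror the cts pin */

    switch (mode) {
    case CTS_SKIP_WHEN_HIGH: return z || c;
    case CTS_SKIP_WHEN_LOW:  return z || !c;
    default:                 return z;
    }
}
```

In every mode an empty buffer skips transmission; the cts pin only adds a second reason to hold off.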
I think it should set the rx pin to input when the start method is called.
Maybe you were referring to how all the DIRAs are OR'd together? That does present some interesting problems. My solution is to set the RX pin to input, then test DIRA to see if it really is input. If it is not (another cog has it set to output), then Start() fails and returns the appropriate error code.
How do you distinguish between a pin driven from an external source and driven by another cog? Besides, you can only read your own dira ...
I guess it's more of a problem with the Propeller hardware, then. It would have been handy to be able to read what the DIRA register was set at (after all the OR'ing) so that these cases could be accounted for.
Still, I think it would be a good idea to set the RX pin to input, even if that covers only 25% of the cogs.