Fast Full-Duplex Serial, 1 Cog - a.k.a. FFDS1
Hi, everybody.
I'm ready to upload this to the OBEX, but I would like a little more peer review before tempting a prop newbie to try my code if it's buggy. [8^)
The main FFDS1 features are basically the components of the name. [8^)
* fast : 460_800 baud on an 80 MHz prop
* full-duplex : FFDS1 can handle full speed block TX and RX at the same time
* no jitter : um, no jitter
* FFDS1 uses only a single cog.
Any and all feedback appreciated...bugs, features, documentation, etc.
thanks,
Jonathan
P.S. This version is a bit updated over the prelim version posted a while back on another thread. This will be the official place to get FFDS1 until such time as it goes into the OBEX
Comments
Allow a Constant Value set for Tx/Rx bit counts ?
Edit: Allow a Constant Value set for Number of Stop bits ?
Even tho the 'standard' PC serial is 8 bits, there are instances where this would be used Prop-Prop, and there, up to 32 bits is easily handled.
This can save 'which byte' message fragment handling.
The Baud precision of course is tighter than 8 bits, but easily met with a Xtal Prop.
A defined STOP bit count can help pace Send speeds, and give remote units a known time to (eg) flip RS485 direction in half-duplex designs.
Other extensions could be parity/checksum style check bits, and/or an address bit handler.
Address bit can be force parity on a PC, or a host uC with 9 bit mode, or it can be managed in the serial cog
We did what I called a twisted ring daisy chain of many small micros running at high bauds.
Here the protocol rule was simple Address-bit-edge based :
N bytes following an ABE =\_ were mine, and in those slots, a reply was inserted, with AB held high. All other cases, are simply echo.
The AB edge thus moves as the packet travels around the ring.
- result is a long TX message drops N bytes to every node, in chain-order, and the incoming data has reply info from all nodes.
If you send more than Node count, those bytes arrive back unchanged, so you can check total installed chain size.
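The ring rule described above can be modelled on a host in a few lines of C. This is only a sketch under stated assumptions: words are 9 bits with the address bit (AB) in bit 8, the line idles with AB high, and the names `ring_node`, `ring_node_step`, `ring_run` and the `MAX_SLOTS` limit are made up for the illustration rather than taken from the original firmware.

```c
#include <stddef.h>
#include <stdint.h>

#define AB_MASK   0x100u   /* bit 8 of each 9-bit word is the address bit */
#define MAX_SLOTS 8u       /* hypothetical upper bound on N for this sketch */

typedef struct {
    unsigned slots;          /* N: words this node owns after the AB edge */
    unsigned remaining;      /* owned slots still to be filled in this frame */
    int      prev_ab;        /* AB of the previously received word */
    uint8_t  rx[MAX_SLOTS];  /* payload captured from the owned slots */
    unsigned rx_count;
} ring_node;

void ring_node_init(ring_node *n, unsigned slots)
{
    n->slots = slots > MAX_SLOTS ? MAX_SLOTS : slots;
    n->remaining = 0;
    n->prev_ab = 1;          /* assumption: the line idles with AB high */
    n->rx_count = 0;
}

/* Process one received word and return the word to forward downstream. */
uint16_t ring_node_step(ring_node *n, uint16_t in, uint8_t reply)
{
    int ab = (in & AB_MASK) != 0;

    if (n->prev_ab && !ab) {         /* falling AB edge: my slots start here */
        n->remaining = n->slots;
        n->rx_count = 0;
    }
    n->prev_ab = ab;

    if (n->remaining > 0) {
        n->rx[n->rx_count++] = (uint8_t)in;   /* keep the byte meant for me */
        n->remaining--;
        return (uint16_t)(AB_MASK | reply);   /* insert reply, AB held high */
    }
    return in;                                /* every other case: echo */
}

/* Pass a whole frame through the chain in order, rewriting it in place. */
void ring_run(ring_node *nodes, size_t count, uint16_t *frame, size_t len,
              const uint8_t *replies)
{
    for (size_t i = 0; i < len; i++)
        for (size_t k = 0; k < count; k++)
            frame[i] = ring_node_step(&nodes[k], frame[i], replies[k]);
}
```

With three nodes of two slots each and an eight-word frame, the first six words come back as replies with AB set and the last two come back unchanged, which is the "check total installed chain size" behaviour.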
Duane J
I thought 465116 Baud was quite good for Full Duplex in 1 COG ?
(btw, I really like your ring protocol.)
In terms of features to add, I can't really go too crazy with this, not and still hit 460.8 kbps at 80MHz (which is one of my main goals). I was hoping to maybe add in software flow control, but even that might get too crazy. [8^)
Regarding speed, I could not think of a faster way to do jitter-free RX as well as TX in one cog. The cog samples at 1/2 bit period intervals, and RX uses a counter to see how much of the start bit has been captured once it arrives. If it has been on < 1/4 bit period, then I wait one more 1/2 bit period before sampling RX. This guarantees that I sample RX at least 1/4 bit period from either edge. I know other people have done high-speed drivers, and jitter-free drivers, and one-cog drivers (off the top of my head I recall the PBnJ driver, and Kye had one too).
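The sampling rule can be checked with a little arithmetic on a host. The sketch below is not the driver's PASM; `rx_sample_offset` and `start_age` are names invented here, and the 172-clock bit period is an assumption (80 MHz / 172 is about 465116 baud, the figure quoted earlier in the thread).

```c
#include <stdint.h>

/*
 * start_age: clocks the start bit has already been low when a half-bit
 *            poll first notices it (0 <= start_age < bit_ticks / 2).
 * Returns the offset from the start-bit edge at which RX is sampled.
 * Later bits are sampled one full bit period apart, so the same offset
 * applies inside every bit of the frame.
 */
uint32_t rx_sample_offset(uint32_t start_age, uint32_t bit_ticks)
{
    uint32_t half    = bit_ticks / 2;
    uint32_t quarter = bit_ticks / 4;

    if (start_age < quarter)
        return start_age + half;   /* too close to the edge: wait 1/2 bit */
    return start_age;              /* already at least 1/4 bit in */
}
```

For every possible `start_age` the result falls between 1/4 and 3/4 of a bit period, which is the "at least 1/4 bit period from either edge" guarantee.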
thanks,
Jonathan
I just didn't see the underline and read it as 460 to 800 baud.
I apologize for my stupidity.
Duane J
Jonathan
The problem seems to be somewhere in the PASM receive function, and it has to do with receiving too much data at once. Here is main.spin:
    CON
        _CLKMODE = XTAL1 + PLL16x
        _XINFREQ = 5_000_000

    '*******************************************************************************
    OBJ
        debug : "FFDS1_66.spin"

    CON
        debug_rxpin = 31
        debug_txpin = 30
        debug_baud = 115200

    PUB Main
        debug.Start(debug_rxpin, debug_txpin, debug_baud)
        waitcnt(CLKFREQ + CNT)
        repeat
            debug.Tx(debug.Rx)

As you can see, it simply echoes the characters you send it. When I type in characters by hand, it works fine. But when I try to send a bunch of characters (such as the characters in the code block above), it produces bad output:
    Terminal ready
    *** file: main.spin
    ascii-xfr -s -c 0 main.spin
    ASCII upload of "main.spin"
    *** exit status: 0
    CN _LME XA1 L1x _IFE =50_0 *************************************** BJ�(V!$��e�r�� eu_xi 1 O P X-��(V�10euxi 3 U�}`i,dbgtp,dgbu) � ctCKRH�(*��($��dbgT(eu.x
    Thanks for using picocom
This output changes each time I run the program.
I've also tried running two instances of the FFDS1 object with one Tx'ing into the other's Rx, and it works fine (as expected: that's what the test program does). Transmit also works fine: I can transmit all sorts of things without error.
As a final test, I've tried hooking it up to a GPS outputting at 9600 baud, and I get similar results.
thanks,
Jonathan
    {417} jmpret lockstep_ret, tx_jump
    {418} jmp #rx_cleanup
has been replaced by
    {417} tjz phsb, #rx_main wr
The original version used this sequence to restart the receiver, under the constraint that a hubop needs a lockstep jmpret following within 2 insns:
                  wrword  rx_ptr, update_head_ptr
                  jmpret  lockstep_ret, tx_jump
                  jmp     #rx_cleanup
    rx_cleanup    mov     phsb, #0
    rx_main       jmpret  lockstep_ret, tx_jump
IOW, simply removing the first jmpret wouldn't have been enough. While it did work for my test setup, hubop restrictions are usually there for a reason. Which brings us to tjz, which does the jump (shadow[phsb] is always zero) but also writes back (wr) to phsb, thereby clearing both the shadow and the counter register.
Jonathan
Finally I found some time for testing your FFDS1. Thanks for sharing this great object! It'll be assimilated into my codebase ;o)
Some little things I'll change right away, some changes might come after further testing.
Right away changes:
1. reordering the variables -> the variables needed by the PASM-part should be in sequence and separated from variables only needed by the SPIN-part. This way I can use memory allocated during runtime (rudimentary memory management) instead of memory allocated by the compiler.
2. doing the setup of PASM in PASM and not via injection in the start function -> same reason, separation of cognew and start-function
3. moving the 'wait for end of transmission' from end of function to start of function -> I think in a lot of use-cases, you output a buffer and do other things after that. So, why block the 'do other things'-part for the whole transmission time? This only makes sense if you directly want to overwrite the transfer-buffer. But for this case you have the waittx-function. (To be honest, the waittx function currently does not make sense because tx already waits and it's not safe to use it for syncing across COGs).
This is a point which might be of general interest. Of course I see that it is more beginners-friendly to keep it like it is, but I also think that it'll increase the net-transfer-rate and overall program-speed if doing it the other way around. Maybe it makes sense to have both versions?
Possible changes:
Maybe it's faster to generate the whole output-string before sending (hex,dec,bin) instead of calling the single character tx-function.
I found that kind of weakness in the FDS when experimenting with the raspberry. Transmission worked without problems up to ... I don't remember, but some hundred kbit/sec, but after 115200 the net transfer-speed did not increase because the SPIN-part simply is too slow to deliver the bytes fast enough. That's why your driver is a great improvement.
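To illustrate the "format first, then send the whole buffer" idea, here is a minimal sketch in C (the language of the port discussed later in the thread). `format_dec` is a hypothetical helper, not part of FFDS1; the point is only that the digits land in a buffer which can then be handed to the driver in a single block transfer instead of one call per character.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Write the decimal text of value into buf and return its length.
 * buf must hold at least 12 bytes: sign, 10 digits, terminating NUL.
 */
size_t format_dec(int32_t value, char *buf)
{
    char     tmp[10];                 /* digits, least significant first */
    size_t   n = 0, len = 0;
    /* unsigned magnitude so that INT32_MIN does not overflow */
    uint32_t mag = value < 0 ? 0u - (uint32_t)value : (uint32_t)value;

    do {
        tmp[n++] = (char)('0' + mag % 10u);
        mag /= 10u;
    } while (mag != 0u);

    if (value < 0)
        buf[len++] = '-';
    while (n > 0)
        buf[len++] = tmp[--n];        /* reverse into the output buffer */
    buf[len] = '\0';
    return len;
}
```

The returned length and the buffer would then go to whatever block-transmit call the driver offers, so the per-character call overhead is paid once per number rather than once per digit.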
    PUB RxTime(ms) : rxbyte | tout
    {{
      * Waits for a byte to be received or a timeout to occur.
      > ms : the number of milliseconds to wait for an incoming byte
      < returns -1 if no byte received, $00..$FF if byte
      e.g. if (c := RxTime( 10 )) < 0
    }}
      tout := clkfreq / 1000 * ms + cnt
      repeat
        rxbyte := RxCheck
      while (rxbyte < 0) and ((cnt - tout) < 0)
I've converted it to C++, and done my testing there, but I believe that the functionality is the same. To be specific, I have not tested with the Spin version of the program. In any case, here is the C/C++ version:
    int32_t Serial::GetCTime(int32_t ms)
    {
        int32_t tout = ((CLKFREQ / 1000) * ms) + CNT;
        int32_t rxbyte = 0;
        do {
            rxbyte = GetCCheck();
        } while ((rxbyte < 0) && ((CNT - tout) < 0));
        return rxbyte;
    }
The Problems
1) It doesn't seem to wait. I don't know why this is, but it definitely does not.
2) In the case that, during the period while the function is waiting, CNT wraps around it will exit prematurely (this is the same for Spin for sure).
I wrote the following (in C/C++) to correct these issues:
    int32_t Serial::GetCTime(int32_t ms)
    {
        int tout = (CLKFREQ / 1000) * ms;
        int rxbyte;
        int totaltime = 0;
        int previous_cnt = CNT;
        int current_cnt;
        do {
            rxbyte = GetCCheck();
            current_cnt = CNT;
            totaltime += current_cnt - previous_cnt;
            previous_cnt = current_cnt;
        } while (rxbyte < 0 && totaltime < tout);
        return rxbyte;
    }
I couldn't figure out what might cause problem #1, though.
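For comparison, a timeout test can be written so that it depends neither on whether the counter type is signed nor on the counter wrapping. This is a sketch with hypothetical helper names (`timeout_expired`, `deadline_pending`); it takes the counter values as plain arguments so it can be tried on a PC. If CNT is an unsigned type in the C headers (an assumption here), an expression like `(CNT - tout) < 0` can never be true, which would make the loop fall through at once.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Elapsed-time form: unsigned subtraction wraps modulo 2^32, so the
 * result is correct even if the counter rolled over between start and now.
 */
bool timeout_expired(uint32_t now, uint32_t start, uint32_t duration)
{
    return (uint32_t)(now - start) >= duration;
}

/*
 * Deadline form, as in the Spin original, with the signed cast made
 * explicit. It tolerates wrap-around for waits shorter than 2^31 ticks.
 * The cast relies on two's-complement conversion, which GCC provides.
 */
bool deadline_pending(uint32_t now, uint32_t deadline)
{
    return (int32_t)(now - deadline) < 0;
}
```

At 80 MHz, 2^31 ticks is roughly 26.8 seconds, so the deadline form only needs care for very long waits; the elapsed-time form covers the full 2^32-tick range.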
extern _COGMEM volatile unsigned int _CNT __asm__("CNT");
That's unlikely to work. Don't know right now what the compiler is going to make of it, most likely an unsigned expression (never < 0). In your second example you side-step this by assigning the unsigned CNT to a signed int.
Thanks.
How do you build the test? Do you have a Makefile for this or at least a command line? Just so we are on the same page.
propeller-load -r ffds1
thisobj.Cog = 1 + coginit(8, &fds_entry, &thisobj.Write_buf_ptr);
This will at least produce meaningful PASM code. However, the code entry address resolves as 0 (as far as the object dump tells me). Which means it's either resolved at load time or has to be prepared differently.
propeller-elf-gcc -g3 -DTEST ffds1.c ffds1m.S -o ffds1
propeller-load -D clkmode=xtal1+pll8x -D clkfreq=96mhz -e ffds1
The board is a QuickStart board, but I desoldered the 0-ohm resistor on the crystal, and attached my own 12MHz crystal to the solder pads. I had done this to play with some USB things, but haven't gotten around to it on this board yet. Anyway, the hardware works with spin programs, but I haven't figured out what I need to change to make C work yet. I suppose, for now, I'd better just switch the hardware back.
Jonathan's code is written tight, so messing with it is upon your own cog-nition. But take a look at how it is done in fullDuplexSerial4port. It is not a heavy load. In between received characters, you are testing for a new start bit. If one is not detected, then you jump to a routine that compares the difference between the head and tail pointers to the programmed % of buffer full. If above that percent, it sets the RTS to its stop state and then goes on back to looking for the start bit.
Some outside devices take some time to respond to RTS and may empty their xmt buffer before halting transmission, so the buffer has to be big enough to account for that. The RTS process will mess up if you are operating on the edge of the speed zone, where input characters are arriving head to tail with one stop bit. It would be fine at lower speeds or when incoming characters are paced.
A parameter in the init method in fullDuplexSerial4port selects the pin to use for RTS, or -1 if none. If none is selected, the initialization code changes the JMP to the RTS processing to a NOP.
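The buffer-fill test described above boils down to a small calculation. The following is a host-side sketch, not code from fullDuplexSerial4port: `ring_fill`, `rts_should_stop` and their parameters are names chosen for the illustration, and it assumes head and tail are indices in the range 0..size-1.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bytes currently waiting in a circular RX buffer (head, tail < size). */
uint32_t ring_fill(uint32_t head, uint32_t tail, uint32_t size)
{
    return (head + size - tail) % size;
}

/* True when the buffer is at or above the programmed percentage full,
 * i.e. when RTS should be driven to its "stop sending" state. */
bool rts_should_stop(uint32_t head, uint32_t tail, uint32_t size,
                     uint32_t percent)
{
    /* compare fill/size against percent/100 without dividing */
    return ring_fill(head, tail, size) * 100u >= size * percent;
}
```

With a 64-byte buffer and a 75% threshold, RTS would be asserted from 48 buffered bytes upward, leaving 16 bytes of headroom for a sender that drains its own transmit buffer before it stops.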
Thanks for the advice. The RTS and CTS acronyms confuse me: isn't RTS an indicator to the host (the Propeller in this case) to disable transmission? I'm getting this information from the RN-42 Bluetooth datasheet:
    15  UART_RTS  UART RTS, goes high to disable host transmitter  Low level output from RN-42  0 - 3.3
    16  UART_CTS  UART CTS, if set high, disables transmitter      Low level input to RN-42     0 - 3.3
Basically, I want to make sure that I'm not losing bytes when sending by sending too much from the Propeller. Your second paragraph would be a solution for CTS? Or, are RTS/CTS like RX/TX: it's relative to where you are looking from?
So, the solution that I came up with is as follows:
    Tx_main.tx_byte
            // set up for sending out a byte
            rdbyte  Tmp, Write_ptr
            add     Write_ptr, #1
            // force the stop bit
            or      Tmp, #$100
            jmpret  Tx_jump, #Lockstep

            //SRLM: Add RTS support
            mov     INA, INA
            and     INA, Maskrts wz
      if_nz jmp     #Lockstep

            // sign extend the 1 into all upper bits
            shl     Tmp, #(32-9)
            sar     Tmp, #(32-10)
            mov     PHSA, Tmp
            // 10 bits (start + 8 data + stop) makes 20 half-bits
            mov     Half_bits_out, #20
In some preliminary tests with a button acting as the master on the RTS line, this seems to work (at 460800). Does anybody see any timing problems?
edit: some of the format of the PASM is a bit different: it's from a C++ GAS driver, which is why there are no local labels and the variables are capitalized. And that's why comments use // instead of '.
The RN-42 bluetooth module should by all rights be a modem, DCE device, but the description has it tagged as a DTE, at least for the handshaking lines...
    15  UART_RTS  UART RTS, goes high to disable host transmitter  Low level output from RN-42  0 - 3.3
    16  UART_CTS  UART CTS, if set high, disables transmitter      Low level input to RN-42     0 - 3.3
FullDuplexSerial4port works from the standpoint of DTE, so cts is an input for control of flow from the tx pin. The cts pin state is combined with the state of the xmit buffer. In the following, the uart initialization patches if_z to either if_z_or_c or if_z_or_nc, depending on the desired polarity, or leaves it at if_z if cts control is not desired.
    transmit      jmpret  txcode,rxcode1        'run a chunk of receive code, then return
                                                'patched to a jmp if pin not used
    txcts0        test    ctsmask,ina     wc    'if flow-controlled dont send
                  rdlong  t1,tx_head_ptr        '{7-22} - head[0]
                  cmp     t1,tx_tail      wz    'tail[0]
    ctsi0   if_z  jmp     #transmit             'may be patched to if_z_or_c or if_z_or_nc
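The three condition-code variants can be restated as a truth function, which may make the polarity choice easier to see. This is a host-side model only; `cts_mode` and `tx_must_wait` are names invented here, with Z standing for "buffer empty" (the cmp result) and C for "CTS pin reads high" (the test result with a single-bit mask).

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum {
    CTS_IGNORE,           /* condition left at if_z          */
    CTS_HOLD_WHEN_HIGH,   /* condition patched to if_z_or_c  */
    CTS_HOLD_WHEN_LOW     /* condition patched to if_z_or_nc */
} cts_mode;

/* True when the transmit loop jumps back instead of sending a byte. */
bool tx_must_wait(uint32_t head, uint32_t tail, bool cts_pin_high,
                  cts_mode mode)
{
    bool empty = (head == tail);              /* Z flag after the cmp */

    switch (mode) {
    case CTS_HOLD_WHEN_HIGH: return empty || cts_pin_high;
    case CTS_HOLD_WHEN_LOW:  return empty || !cts_pin_high;
    default:                 return empty;
    }
}
```

Going by the datasheet lines quoted above, the RN-42's UART_RTS output "goes high to disable host transmitter", so wired to the Propeller's cts input it would correspond to the hold-when-high variant in this model.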
I think it should set the rx pin to input when the start method is called.
Maybe you were referring to how all the DIRAs are OR'd together? That does present some interesting problems. My solution is to set the RX pin to input, then test DIRA to see if it really is input. If it is not (another cog has it set to output), then Start() fails and returns the appropriate error code.
How do you distinguish between a pin driven from an external source and driven by another cog? Besides, you can only read your own dira ...
I guess it's more of a problem with the Propeller hardware, then. It would have been handy to be able to read what the DIRA register was set at (after all the OR'ing) so that these cases could be accounted for.
Still, I think it would be a good idea to set the RX pin to input, even if that covers only 25% of the cogs.
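The limitation being discussed can be shown with a tiny host-side model of the eight per-cog DIRA registers, whose outputs are OR'd together to give the real pin direction. The names `effective_dir` and `pin_is_input` are hypothetical; on the chip itself no cog can compute this, since each cog reads only its own DIRA.

```c
#include <stdbool.h>
#include <stdint.h>

#define COG_COUNT 8

/* A pin is an output if ANY cog has its DIRA bit set. */
uint32_t effective_dir(const uint32_t dira[COG_COUNT])
{
    uint32_t dir = 0;
    for (int cog = 0; cog < COG_COUNT; cog++)
        dir |= dira[cog];
    return dir;
}

/* True only when no cog at all is driving the pin. */
bool pin_is_input(const uint32_t dira[COG_COUNT], unsigned pin)
{
    return ((effective_dir(dira) >> pin) & 1u) == 0;
}
```

Clearing the RX bit in the serial cog's own DIRA therefore guarantees only that this cog is not driving the pin; if another cog still has the bit set, the pin stays an output regardless.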