Serial I/O Throughput -- A Comparison.
greybeard
Posts: 65
I have a project that uses serial communication to pass data back and forth between a Windows application and a Propeller-based board. Currently it uses the JDCogSerial module and runs well at 230400 baud. I recently wanted to increase the throughput and started using SimpleIDE to compare throughput performance. I have always held the opinion that C code would increase throughput.
I built a very simple C project that echoes a string, i.e. it receives characters, appends them to a string, and retransmits that string back to the host when a carriage return/line feed is received. I used both a C3 unit and a PropStick as the target systems, and the Parallax Serial Terminal that comes with the Propeller Tool because it handles baud rates in multiples of 115200. I first used the SimpleIDE built-in FDSERIAL driver. The results were disappointing: the echoed string had many unprintable characters at 230400 baud. Additional testing indicated the target can send strings reliably at much higher baud rates, suggesting that the problem with the echo is related to the receive side of the code on the target.
I next used Spin2Cpp to convert JDCOGSERIAL and JDCOGSERIALDEMO and used these instead of FDSERIAL. This code started dropping bytes and introducing spurious characters into the echo at 460800 baud. Additional testing indicated the target could repeatedly send a fixed string without dropping characters, again suggesting that the receive part of the code is the culprit.
Finally, I used the JDCOGSERIAL Spin code as the driver for a Spin program to serve as the echo engine. Surprisingly, it would echo strings without dropping or introducing spurious characters up to 921600 baud.
Why did I do this? It was my intent to convert the Spin code in the Propeller-based board to C to increase performance. These results suggest that performance may actually be degraded.
Comments
So far your tests say nothing about the relative performance of C and Spin. Or at least very little. You are using different drivers in each case.
I imagine all the drivers you mention use PASM for the actual bit-banging of the serial protocol. That PASM should be the same whether it is used from C or Spin. I also imagine the PASM part sets the limit on speed, therefore things should perform the same no matter if used from C or Spin.
I'm going to ignore the Spin2Cpp converted code as I have no idea what kind of code that actually produces and it's not something I would want to use seriously.
Now. JDCogSerial and FDSERIAL are very different. From this thread http://forums.parallax.com/showthread.php/131646-Difference-between-FullDuplexSerial-and-JDCogSerial I read that:
"JDCogSerial keeps its serial buffer in the cog's memory while FullDuplexSerial keeps its buffers in the hub memory...it claims speeds > 750KBps."
Clearly keeping buffers in COG creates a much faster serial driver than FDSERIAL.
I guess what you need is a hand-written C wrapper around JDCogSerial, in much the same way as the FDSerial one.
We are told that JDCogSerial can achieve much higher baud rates than FDSerial. The guess is that this is because it keeps its buffers in COG rather than HUB RAM.
That makes sense. At least when dealing with bursts of data that are small enough to fit in those COG buffers.
But, what happens when the incoming data is a continuous stream? That data in the COG buffers has to be written out from COG to HUB at some point for use by other COGs.
It seems to me that at that point using COG buffers no longer helps you; you are having to continuously write to HUB anyway. In fact I start to think COG buffers should be worse. Why? Because you have more work to do: first keep the received data in a COG buffer, then read it from that buffer and write it to HUB.
I would have guessed that for continuous data streams HUB buffers would sustain higher baud rates.
Has anybody tested and compared this sort of thing?
At 80 MHz, the maximum half-duplex async I could see a cog do is 10 Mbps, and I would not want to do that for critical systems (i.e. start bit, 8 data, stop bit) ... 10 Mbps async should be OK at 100 MHz.
5 Mbps half duplex would be fine at 80 MHz with unrolled, hand-tuned code.
5 Mbps full duplex may be (barely) possible at 80 MHz.
With good enough PASM code, 2.5 Mbps full duplex should work (one port per cog) @ 80 MHz, and maybe 3 Mbps @ 100 MHz.
Note, Beau has code for an effective 14 Mbps @ 80 MHz, however that relies on 34-bit packets (start bit, 32 data bits, stop bit).
FYI, I think the latest FTDI usb/ser chips could do 2.5Mbps
The claim seems to be that JDCogSerial can achieve much higher baud rates than FDSERIAL. Where I assume FDSERIAL is Chip's standard issue FullDuplexSerial. I have no idea what JDCogSerial is.
We are talking regular UART here not some weird 32 bit packet thing.
Now, I can believe that stashing bytes in a COG buffer is quicker than using a HUB based FIFO. For short bursts. But I find it hard to believe it works out for sustained data rates.
I have no idea about this JDCogSerial either.
Speculative reasons may be:
1. The FTDI buffer fills up and doesn't have room for transmission.
2. The text window (screen) is too slow and loses characters.
3. A problem in the receive routine of both.
4. The PASM code gives higher priority to incoming data and starves out the transmission.
5. A problem with Spin2Cpp (doubtful but still possible).
I'm going to let the pros handle this one. I have the data I need for now. I'll revisit the problem later when I have the time and patience for it.
Thanks guys. Have fun with this one.
- All tests performed with XTAL @ 80 MHz
- Max speed [baud]:
- Send
- CMM: 1,428,550
- LMM: 1,428,550
- Receive
- CMM: 740,720
- LMM: 740,720
- Max transmit speed [average bitrate of PropWare::UART::puts() w/ 8N1 config]:
- CMM: 339,227
- LMM: 673,230
- TODO: Determine maximum baudrate that receive_array() and receive() can read data in 8N1 configuration with minimum stop-bits between each word
Do take note of the fact that "max speed [baud]" is not the same as "max transmit speed". There's a delay between each character as it reads in the next byte and does its necessary prep work - so for high speeds like these you end up with many more than 1 stop bit in 8N1 configuration. So, taking into account those pauses, you can achieve a maximum effective baud of 673,230 using LMM or 339,000 baud with CMM. Also note, I haven't tested max receive speed - only max receive baud.
Lots more information and full docs can be found here.
The full duplex driver Parallax provides has a lot of jitter on the TX line. If you want zero jitter you have to use a waitcnt approach. This involves a single task in a cog running at 4x the baud rate, running the RX code 3/4 of the time and the TX code 1/4 of the time. http://obex.parallax.com/object/246
The application uses small packets in a somewhat periodic fashion. Since there is a lot of traffic, I wanted to increase transfer rates so the app would have more time to attend to other matters. I found I could transmit from the PROP to the HOST at a very high rate. I could transmit from the HOST to the PROP at a very high rate. But my main interest is to exchange data between HOST and PROP at high speed. When I set up tests for round-trip transfer, I found the reliable baud rates went down appreciably for both JDCogSerial and FDSERIAL.
When I set up a test for the Prop to echo packets (50 bytes or so) immediately as they are received, I found the round-trip throughput was reduced to about 292000 baud using the Windows Serial API and the FTDI DLL before I started losing bytes. Note that the app does not use any handshaking, and that could also be a major limiting factor.
I take this to be the upper limit for my application. Obtaining higher baud rates will require major design changes in software and hardware.
Again. Thanks for the feedback
RX is handled in a separate COG.
I have no numbers on a sustained data rate for TX alone,
nor for echo.
But 3 Mbit/s TX is at least impressive, even for burst writes.
All tests performed with XTAL @ 80 MHz
I'll be updating the receive routines soon
Is the 2,680,144 bps the average sustained TX speed? That indicates byte times of 13.266 bit times (= 4.266 stop bits, on average).
FYI
Transmit is fairly easy as it is unbuffered, and at the higher baud rates you can go over 4 Mbaud, though don't expect it to be a standard baud rate. It makes far more sense at high baud rates not to buffer, as reading and writing a buffer can take far longer than actually bit-banging the character. For sustained high-speed operation without additional stop bits between characters it would be necessary to run the transmit routine in its own cog, possibly in half-duplex fashion with the receive routine, and use a single-character buffer to minimize overhead. Of course I'm talking about running app code from hub and not from the same cog as the transmit.
For receive the maximum baud rate for buffered receive with only a single stop bit between characters (no delays) is close to 3Mbaud although 2M is a bit more of a standard baud rate.
The FT232R chip is limited to 3Mbaud so it wasn't possible to test this interface beyond that.
Small MCUs can add 2 stop bits reasonably easily, some can do 3 stop bits.
3.0625MBd is a 'standard value' for an EFM8BB1 for example, but it can also do 4.08333MBd and 2.45MBd, 2.04166MBd, 1.75MBd etc... (24.5M/2N)
In Prop terms, @ 80MHz 3.0625MBd is 26 SysClocks bit time. (0.47% Baud error)
That's an average over 58 characters. The extra stop bits are due to the read from hub RAM and other overhead for each word. That's where the minimum delay of ~1 us comes in.
Small MCUs are all pretty much byte based UARTS (with the exception of Infineon XMC1000 series)
Some do have small FIFOs, and if more than 2 stop bits are needed, they all can run a timer-paced Send.
Because it has configurable stop bit width and parity, and all start, stop, parity and data bits have to fit in a single register. A bump to a 24-bit max would probably make sense though.
That limits you to 29-30 bits, and the Prop has a natural node at 3 x 9 bits = 27 bits, with a couple of .flag bit options to use all 29-30b of data field.
If shifting out from a buffer of multiple 32-bit words in cog, you could figure on a 5-instruction loop at 4 clocks per instruction, so that is 20 clocks per bit, which makes for 160 clocks for the first byte, plus another 20 clocks (to keep consistency) for the stop bit; that is 180 clocks for each byte, 180/8 = 22.5 clocks per bit, and at 80 MHz that is 80,000,000/22.5 = 3,555,555.56 bps. So I would like to see a loop that can do over 4 Mbps while maintaining 8-N-1 and allowing a multi-word in-cog buffer.
Do you really care if it maintains only one stop bit? If not, you should be asking about max throughput in bits per second, not baud. If your receiver relies on no more than 1 stop bit, it's definitely a hindrance.
Sometimes you want 2 Stop bits for data-creep reasons, and most MCUs can manage 2.
Going above 2 however, is certainly not so easy.