Putty & Prop Plug -- what is the highest Baud rate feasible?
Hi all,
I'm messing about with a serial driver I'm writing and I'm trying to push the speed up to its maximum Baud rate. I'm using a Prop Plug (circa 2009 ish) and Putty for my terminal software.
Now, Putty seems to be very happy writing out data at 2,000,000 baud, but push it any further than that and the actual rate measured suddenly falls back to what appears to be 115200 Baud.
I think that the Prop Plug is not to blame (because all the bytes do seem to appear, just at a much lower rate than expected).
Can someone just quickly check to see how fast they can run Putty's serial Baud rate?
I'm using 8 data bits, no parity, 1 stop bit, no flow control.
If it turns out to be a bug / limitation in Putty then I'll have to find a faster terminal program, which is fine I guess.
Comments
The FT232 can run at 3 Mbaud, and I think Putty's limit is 921600.
I know I run PST on the Prop at 1_000_000 baud, but at 2_000_000 it gets funky.
Usually modern terminal programs will pass any baud rate you type in to the driver (e.g. I have a UART+terminal here that accepts and delivers 7868852 as valid), and the vendor drivers then decode what they consider legal.
Ideally a driver rounds correctly to the nearest valid baud step, but some jump to a default value if given something they do not understand.
Way back, FTDI allowed a kludge of baud aliasing, so maybe you have bumped into that?
http://ftdichip.com/Documents/AppNotes/AN232B-05_BaudRates.pdf
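To make the "nearest legal value" idea concrete, here is a rough model of the FT232-class divisor scheme as I read AN232B-05 (an assumption; check the PDF for which chips it covers): baud = 3 MHz / divisor, with the divisor stepping in 1/8 units, plus two special codes meaning 3 MBd and 2 MBd. Divisors below 2 are otherwise not allowed, so in this model nothing between 2 MBd and 3 MBd is reachable. The upper divisor limit is not modeled, and the function names are hypothetical.

```python
from fractions import Fraction

# Assumed FT232-class model: baud = 3 MHz / divisor, divisor in 1/8 steps.
# Special codes 0 and 1 mean 3 MBd and 2 MBd; otherwise divisor >= 2.0.
BASE = 3_000_000
MIN_EIGHTHS = 16  # divisor 2.0 -> 1.5 MBd


def candidate_rates(target):
    """A few reachable rates (as Fractions) around an integer `target` baud."""
    rates = [Fraction(3_000_000), Fraction(2_000_000)]  # special codes 0 and 1
    eighths = round(Fraction(BASE * 8, target))          # ideal divisor in 1/8 steps
    for e in {max(MIN_EIGHTHS, eighths + k) for k in (-1, 0, 1)}:
        rates.append(Fraction(BASE * 8, e))
    return rates


def nearest_rate(target):
    """Closest rate this model can produce, plus its relative error."""
    best = min(candidate_rates(target), key=lambda r: abs(r - target))
    return best, float((best - target) / target)


for wanted in (115_200, 1_000_000, 2_000_000, 2_500_000):
    rate, err = nearest_rate(wanted)
    print(f"{wanted:>9,} Bd -> {float(rate):>12,.1f} Bd ({err:+.3%})")
```

In this model 2,000,000 baud is exact while anything just above it can only land on 2 MBd or 3 MBd; a real driver might instead reject the value, which is one possible explanation for the fallback described in the first post.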
Now that is interesting. I'll have a look at this tomorrow. Thanks for the info
Not sure about the Prop Plug, but the limit of Real-Term (or the UART on my PC) is 2 MBd. I have a PIC16F15324 running at 4 MBd talking to another PIC without any issue, so I imagine the Prop can do at least the same speed.
Yeah, the Prop is perfectly happy at speeds way above 2 MHz, it's just that an ATMega1284P's hardware USART maxes out at 2.5 MHz (from a 20 MHz xtal), and the next lowest Baud (on the AVR) that Putty+FTDI driver can cope with is 1 MHz (from the internal RC osc @ 8 MHz).
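The numbers in this post follow from the AVR double-speed (U2X) formula, baud = f_osc / (8 * (UBRR + 1)): a 20 MHz crystal tops out at 2.5 MBd and an 8 MHz RC oscillator gives exactly 1 MBd. The sketch below lists the top few U2X rates for both clocks and checks each against the FT232-class divisor rule from AN232B-05 as assumed here (3 MBd, 2 MBd, or 24 MHz / rate being an integer of at least 16). It only tests for exact matches, not for what error a link would tolerate.

```python
from fractions import Fraction


def avr_u2x_rates(f_osc, count=5):
    """Top AVR USART rates in double-speed async mode: f_osc / (8 * (UBRR + 1))."""
    return [Fraction(f_osc, 8 * (ubrr + 1)) for ubrr in range(count)]


def ftdi_exact(rate):
    """Assumed FT232-class rule: 3 MBd, 2 MBd, or 3 MHz / divisor with the
    divisor >= 2 in 1/8 steps, i.e. 24 MHz / rate is an integer >= 16."""
    if rate in (3_000_000, 2_000_000):
        return True
    eighths = Fraction(24_000_000) / rate
    return eighths.denominator == 1 and eighths >= 16


for f_osc in (20_000_000, 8_000_000):
    for r in avr_u2x_rates(f_osc):
        print(f"{f_osc // 1_000_000} MHz: {float(r):>12,.0f} Bd  FTDI exact: {ftdi_exact(r)}")
```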
Hmm, that's got me thinking. I wonder what the theoretical maximum Baud for the P8X32A is (for both stable TX and RX). I'm sure it's at least 4MHz, without having to go above 80 MHz system clock.
Tried some naive PASM code just now. P8X32A is trivially able to transmit RS-232 style at 6.5 MHz (8 data bits, no parity, 1 stop bit) from a single Cog.
Receive will be a bit slower I suspect.
8.N.2 might be more practical, and a triggered RX might achieve good burst speeds given some protocol rules?
The XR21B1420 I'm testing supports baud rates of 480M/N, so it can get within 0.2% of 6.5 MBd.
@jac_goudsmit made an 8Mbps transmitter a few years back:
https://github.com/parallaxinc/propeller/tree/master/libraries/community/p1/All/Txx - Fast Serial Transmitter for PASM and Spin
I haven't tried it yet myself, but it certainly caught my eye when I saw '8 megabits per second'
Ha, thanks for that link to the new location, I couldn't even find it myself when I searched for it a while ago. I also recently made my own Github repo for this project at https://github.com/jacgoudsmit/TXX. I think the code is probably pretty much the same between the two locations at the time of this writing.
Yes, TXX is capable of baud rates up to 8 Mbps. However, the FTDI chip on the Prop Plug (and on various other Parallax products like the FLiP) can only handle baud rates up to 3 Mbps.
Furthermore, the maximum throughput of the FTDI is much lower than 3 mbps. I recently did some tests by using the core character transmitting code from my TXX module to transmit a continuous stream of "1234567890" at 3 mbps to see how much delay I had to insert between the characters in order to not overrun the receive buffer of the FTDI chip, and I got to a disappointingly low number of something like 77 kilobytes per second maximum throughput.
If you're not limited by 3rd party chips in your project, the throughput can be much faster. I think Beau Schwabe made a high-speed serial transmitter/receiver between Propellers a long time ago, and he could get the speed up to 16 megabits per second if I'm not mistaken. I think he basically had two Propellers that were clocked by the same crystal and that simplifies things a little bit because you don't have to worry about things such as synchronization and clock drift.
===Jac
CabbageComms (tm)
I need you guys to sanity check an idea I had at 3 o'clock this morning.
How about this as an idea for making a symmetrical 20 MHz RS-232 transceiver in one cog...
Since each Cog executes most normal instructions in 4 clock cycles, we can configure CounterA to sample the serial RX_PIN at the same rate: 20 MHz.
Theory of operation (for the RX part)...
The problem is I have no way to test this - I have no means to generate 20 MHz RS-232 traffic
Anyway, here's the TX method (very similar but uses counter mode %00100 (NCO single ended)...
Would this work? My addled brain seems to think it might, but I'm not certain and I can't prove it (that's why I haven't written any code properly).
What do you guys think?
(Runs out to buy an oscilloscope, sig-gen and high speed logic analyser )
EDIT: I forgot the start and stop bits on the TX side, but I think that can be added without affecting the Baud rate by appending them to each side of the byte being sent before enabling the counter.
I'm being dense... of course I can test it. I can just use PLL1X instead of PLL16X; if it works it'll simply run at 1/16th the speed and should scale back up to 20 MHz perfectly.
I'll write some code..............
EDIT: This is actually looking promising
EDIT #2: I've got TX working at 1.25 MHz using PLL1X, so running the same code with PLL16X should give 20 MHz.
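The TX code itself didn't survive in this copy of the thread. Based on the `ROR PHSA, #1` lines quoted later, one plausible reading is: preload a framed character into PHSA with FRQA = 0, put CTRA in NCO single-ended mode (%00100) so the pin follows PHSA[31], then rotate PHSA once per 4-clock instruction. The frame layout below (start bit in bit 31, data LSB first in bits 0..7, stop bit in bit 8) is an assumption, the helper names are hypothetical, and returning the line to idle afterwards is not modeled.

```python
MASK32 = 0xFFFF_FFFF


def ror32(x, n=1):
    """32-bit rotate right, like PASM ROR."""
    return ((x >> n) | (x << (32 - n))) & MASK32


def frame_word(byte):
    """Hypothetical PHSA preload for one 8.N.1 character."""
    word = byte & 0xFF   # bits 0..7: data, sent LSB first
    word |= 1 << 8       # bit 8: stop bit (high)
    return word          # bit 31 stays 0: this is the start bit


def simulate_tx(byte):
    """Pin levels (PHSA[31]), one entry per 4-clock instruction slot."""
    phsa = frame_word(byte)
    line = [phsa >> 31]  # start bit appears as soon as the counter drives the pin
    for _ in range(9):   # 8 data bits + stop bit: one ROR PHSA, #1 each
        phsa = ror32(phsa)
        line.append(phsa >> 31)
    return line


def bit_rate(clkfreq, clocks_per_instruction=4):
    """One bit per instruction, so the bit rate is the instruction rate."""
    return clkfreq / clocks_per_instruction
```

With a 5 MHz crystal (presumably what was used), PLL1X gives `bit_rate(5_000_000)` = 1.25 MBd and PLL16X gives `bit_rate(80_000_000)` = 20 MBd, which is the scaling argument made above.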
You can test RX with simple square waves, at least to some level.
RX also needs to consider the next-byte arrival time. Most UARTs only give you up to 2 stop bits, but you can force the parity bit to mark (1) to sneak in a 3rd stop bit.
The vast majority of PC UARTs go up to 12 MBd, even HS-USB ones (a niche Exar part claims 15 MBd).
20MHz would be useful for P1 to P1 custom links, in which case more than 8 bits may be useful.
Well, I've hit a wall with this. The TX side works well (because it's entirely deterministic within the timing domain of the sender).
However the RX side is causing me great trouble. It does detect the start bit reliably, but the subsequent 8 samples are basically jumbled garbage.
I think part of the problem is that WAITPEQ takes 6 or more cycles before it resumes normal execution. It is not clear how many clock cycles have elapsed since the falling edge was detected. What a pain.
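A quick budget shows why this bites at CLK/4. After the start-bit edge, the centre of data bit 0 sits 1.5 bit times later. At 4 clocks per bit, the 6-clock minimum WAITPEQ wake-up quoted above already uses those 1.5 bit times, leaving zero slack. This sketch ignores any input-synchroniser delay or wake-up jitter, so real slack is smaller still; the function name is hypothetical.

```python
def waitpeq_budget(clkfreq, baud, wake_clocks=6):
    """How much of an async frame a WAITPEQ wake-up consumes.

    Returns (clocks per bit, bit times spent waking, bit times of slack left
    before the centre of data bit 0, which is 1.5 bit times after the edge).
    """
    clocks_per_bit = clkfreq / baud
    spent = wake_clocks / clocks_per_bit
    return clocks_per_bit, spent, 1.5 - spent


print(waitpeq_budget(80_000_000, 20_000_000))       # CLK/4: no slack at all
print(waitpeq_budget(80_000_000, 80_000_000 / 14))  # CLK/14: about one bit of slack
```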
I could make this work if I simply abandon the RS-232 signalling standard and use perhaps 2 start bits, but then it would lose all the benefits of being compliant with normal terminal software. sigh
Fast TX could still be useful for SPI connections, e.g. HC595 strings? This might work for 1, 2, 3 or 4 HC595s in series?
The tricky part may be stopping the SPI clock at the right time; maybe a correctly timed STCP signal is OK?
On a HC595 string, there is a bit of tolerance on start clock phasing, as only the last N*8 bits are latched.
i.e. if the above becomes:
ROR PHSA, #1          ' clocks in the middle of this bit
ROR PHSA, #1
ROR PHSA, #1          ' the last bit (MSB of the original byte)
OR OUTA, stcp_mask    ' latch in the previous N*8 bits (stcp_mask: hypothetical STCP pin mask)
I'm not sure whether the PHSA delay differs from the pin delay, but there are 2 sysclks until the next CLK edge, and you could sneak in a 3rd sysclk of tolerance by placing the SPI 595 CLK edges at 75% of the bit time.
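The "only the last N*8 bits are latched" point is easy to check with a plain model of a daisy-chained 74HC595 string: extra clocks before the real data simply fall off the far end. This is a logic-level sketch (no timing), and the function name is hypothetical.

```python
def hc595_chain(bits, chips=1):
    """Latched outputs of `chips` daisy-chained 74HC595s after shifting in `bits`.

    Index 0 is QA of the first chip; the last index is QH of the last chip.
    Each bit models one SHCP rising edge; the return value is what an STCP
    pulse would copy into the output latches.
    """
    width = 8 * chips
    shift = [0] * width
    for b in bits:
        # New bit enters QA; everything moves one place toward the far end,
        # and the oldest bit drops out of the last chip's QH'.
        shift = [b] + shift[:-1]
    return shift


data = [1, 1, 0, 0, 0, 0, 0, 0]          # shifted in this order
print(hc595_chain(data))                 # clean start
print(hc595_chain([0, 1, 1] + data))     # 3 stray clocks first: same latched result
```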
Another part of the problem is that the CTRs are actually adders, and the mode is an ADD-enable, which is not what is actually wanted: a single centre-of-bit sample.
That means FRQA is added on every sysclk, so 4 sysclks per bit will add it 4 times, which amounts to a shift-left by 2.
Notice too that any small skew in the ADD-enable will bleed in from adjacent bits, giving 75% from the wanted bit and 25% from the unwanted adjacent bit.
I think for async it is not practical, but if the P1 generates the CLK, perhaps it might work for an SPI receive use case. Those are not as common as P1/P2 -> HC595.
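The adder behaviour can be modeled directly. Assuming a counter mode that adds FRQ on every clock while the pin is high (a POS-detector-style mode), a 4-clock window aligned to the bit adds 4*FRQ (the shift-left-2), and a window skewed by one clock picks up 3 clocks from the wanted bit and 1 from its neighbour (the 75%/25% bleed). This is an illustrative sketch with a hypothetical function name, not cycle-accurate P1 behaviour.

```python
def accumulate_windows(bits, clocks_per_bit=4, skew=0, frq=1):
    """Model a counter that adds `frq` on every clock while the pin is high.

    `bits` is the serial stream, one level per bit, `clocks_per_bit` clocks
    each. Each result is what PHS gains over one window of `clocks_per_bit`
    clocks starting `skew` clocks after that bit's edge. The last bit is
    dropped so a skewed window never runs past the end of the stream.
    """
    pin = [b for b in bits for _ in range(clocks_per_bit)]  # one level per clock
    totals = []
    for i in range(len(bits) - 1):
        start = i * clocks_per_bit + skew
        totals.append(sum(pin[start:start + clocks_per_bit]) * frq)
    return totals


print(accumulate_windows([1, 0, 1, 0]))          # aligned: each '1' adds 4*FRQ
print(accumulate_windows([1, 0, 1, 0], skew=1))  # 1-clock skew: 75%/25% mixing
```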
Addit: I think a general Async Serial RX can be made with a variant of jac_goudsmit code, that uses 2 lines for pin test and shift.
That would support SysCLK/N baud speeds for any N >= 14 (6+4+4), working on a bit-time basis, but the inter-byte time may dictate the top practical rate, as you need to unload the byte and prepare for the next start bit within the stop-bit time. Verifying that the stop bit = 1 should ideally be done too, but that could be skipped.
Maybe 5333333.N.8.2 or 5333333.M.8.2 are candidate practical values? (PL2303GC and XR21B1420 can support this value.)
4.8 MBd is supported by PL2303GC, XR21B1420, EFM8UB3 & FT232H, and an 80 MHz P1 using a fractional baud can do 80M/16.7 = 4790419, for ~0.199% baud error.
The XR21B1420 can exactly follow all SysCLK/N (N>=14) possible baud values, with 0% average baud error. The bit-skews look stable within any byte, so a P1 could be coded to match that pattern.
I like the details of the code
Because those 10 bit waits do not need to be identical, you can selectively add 1 more sysclk to any of the 10 to give fractional baud capability (though it is compile-time locked).
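Using the 80 MHz / 16.7 example above: 16.7 clocks per bit over a 10-bit frame is 167 clocks, i.e. seven 17-clock bits and three 16-clock bits. The sketch spreads the extra clocks Bresenham-style so bit edges never drift more than one clock from ideal; that spreading rule is an assumption here, not something from the thread, and the helper names are hypothetical.

```python
def spread_bit_clocks(clocks_per_frame, bits_per_frame=10):
    """Split a frame's clock budget into per-bit waits that differ by at most 1."""
    base, extra = divmod(clocks_per_frame, bits_per_frame)
    waits = []
    acc = 0
    for _ in range(bits_per_frame):
        # Hand out the `extra` clocks evenly across the frame.
        acc += extra
        if acc >= bits_per_frame:
            acc -= bits_per_frame
            waits.append(base + 1)
        else:
            waits.append(base)
    return waits


def average_baud(sysclk, waits):
    """Average bit rate over one frame of per-bit waits."""
    return sysclk * len(waits) / sum(waits)


waits = spread_bit_clocks(167)
print(waits, round(average_baud(80_000_000, waits)))
```

Every wait here is at least 14 clocks, so it also stays inside the N >= 14 limit mentioned earlier.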
Which FTDI variant was that ?
UARTs are improving all the time: the CP2102N can sustain 3.428571MBd.N.8.1 one way, and the newest ones like PL2303GC or XR21B1420 offer even better FS-USB performance.
Many have fractional baud clocks, which is nice for P1/P2 as it gives you more freedom in choosing sysclk.
The XR21B1420 can sustain over 10Mbd averaged, (one way) with handshake line connected.
Virtual Baud CLOCKS - for values below 12M (4M on CP2102N) possible baud is VirtualBaudCLK/N
Part VirtualBaudCLK
CP2102N 24MHz
PL2303GC 96MHz
XR21B1420 480MHz
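Given the VirtualBaudCLK/N rule in the table, picking a rate is just finding the nearest N. The clocks and range limits below are the values quoted in this thread, not checked against datasheets, and the function name is hypothetical.

```python
# Virtual baud clocks and range limits as quoted in the thread (not datasheet-verified).
VIRTUAL_BAUD_CLK = {"CP2102N": 24_000_000, "PL2303GC": 96_000_000, "XR21B1420": 480_000_000}
MAX_BAUD = {"CP2102N": 4_000_000, "PL2303GC": 12_000_000, "XR21B1420": 12_000_000}


def nearest_rate(part, target):
    """Divisor N, actual rate and relative error for VirtualBaudCLK / N nearest `target`."""
    clk = VIRTUAL_BAUD_CLK[part]
    n = max(1, round(clk / target))
    while clk / n > MAX_BAUD[part]:  # stay within the quoted range
        n += 1
    actual = clk / n
    return n, actual, (actual - target) / target


for part, target in (("XR21B1420", 6_500_000), ("CP2102N", 3_428_571), ("PL2303GC", 5_333_333)):
    n, actual, err = nearest_rate(part, target)
    print(f"{part}: N={n}, {actual:,.0f} Bd, error {err:+.3%}")
```

Because 480 MHz is exactly 6 * 80 MHz, any 80 MHz / N rate maps to divisor 6N on the XR21B1420, which fits the 0% average error observation above.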
Here is an example of fractional baud clock support in XR21B1420 - if you look carefully, not all bits are exactly the same duration, but they do appear consistent within the 10b frame.
Another way to drive 595s effectively is to share all of the clocks and mode lines, give each 595 its own data line, and load all the 595s at once rather than daisy-chaining them together. This goes for any of the shift-in/shift-out registers, e.g. HC165.
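A sketch of that parallel-load scheme: with a shared SHCP/STCP and one data line per chip, each shift clock carries one bit for every chip, so any number of chips loads in 8 clocks instead of 8 per chip. The pin packing assumes the data lines sit on adjacent port pins; both helper names are hypothetical.

```python
def parallel_595_frames(chip_bytes):
    """Per-clock data-line levels when every 595 has its own data line.

    All chips share SHCP and STCP. Frame k holds one bit per chip (MSB first),
    so all chips are loaded in 8 shift clocks.
    """
    return [[(b >> bit) & 1 for b in chip_bytes] for bit in range(7, -1, -1)]


def pack_frame(frame, first_pin=0):
    """Hypothetical helper: turn one frame into an output-port mask, assuming
    the data lines are on adjacent pins starting at `first_pin`."""
    return sum(level << (first_pin + i) for i, level in enumerate(frame))


for frame in parallel_595_frames([0xA5, 0x0F]):
    print(frame, bin(pack_frame(frame, first_pin=4)))
```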
Well... not compile time locked; you could wait for 10 different values that you calculate at runtime somehow when you initialize the cog...
I measured with the FLiP, which apparently has an FT231XS. I think all the FTDI chips have the same maximum baud rate of 3 Mbps, but I suppose it's possible that different variants perform better when it comes to actual bandwidth.
Sustained throughput? In other words, uninterrupted traffic of 300,000 or 1,000,000 characters per second, using isochronous transfers? This would be interesting for a project I'm working on (S/PDIF decoder for Propeller 1). I'll have to look into that...
===Jac
jac_goudsmit said:
Yes, I guess you could self-modify the code if you were motivated enough.
The main use I see is to set the waits for a given crystal value, so you get a useful baud from a standard xtal. Xtals are less likely to change at runtime.
Here are my test notes - the FT232H sustains no-gaps tx here. This is a simplex test.
FT2232H transmit c8 -> CP2102N: OK at 1MBd, 2MBd, 2.181818MBd, 2.4MBd, 2.666666MBd, 3MBd, 3.428571MBd. Drops chars at 4MBd 8,N,1 and 4M,8,N,2, but looks OK at 4M 8,M,2
FT2232H transmit c8 -> FT232R: OK at 1MBd, 2MBd, 3MBd; no support for 4MBd
FT2232H transmit c8 -> FT231X: OK at 1MBd, 2MBd, 3MBd; no support for 4MBd
Your "disappointingly low number of something like 77 kilobytes per second maximum throughput" must have been due to some other issue? Was that duplex?
Certainly one-way traffic of 300 kB/s is easy, and close to 1 MB/s is possible over FS-USB. That's part of what impresses me about these parts.
pasted from my test notes
FT2232H transmit c8 -> EXAR XR21B1420 OK at 1MBd, 2MBd, 4MBd 100,000 chars continual, 8,N,1, 4.8MBd 8,M,2 OK, 6MBd 8,M,2 OK, 6MBd 8,N,1 OK, 8MBd 8,N,1 OK
and a bit more is possible with HW handshake
FT232H -> XR21B1420_12MBd_RX_3000000_RTS.png shows handshakes.
12MBd HWHS: 9764140 sustained speed, handshake needed.
8MBd: 8.00014 sustained, no handshakes seen. Cannot test other speeds, as the FT232H does not do rates between 8 and 12 MBd.
XR21B1420 (Driver 2.6.0.0 Dec 2019) -> FT232H 12MBd: sustained average 10.447200 MBd
XR21B1420 loopback tests, 8.N.1 (one indication of duplex capability, no handshake):
5333333 bd -> 2.666696*2 = 5.333392M duplex
6000000 bd -> 3.000000*2 = 6.000000M duplex
6500000 bd -> 3.039*2 = 6.078M duplex
8000000 bd -> 3.039*2 = 6.078M duplex
12000000 bd -> 3.039*2 = 6.078M duplex
Sanity check of the data bits transported over FS-USB: 2*8*(3.040*2/10) = 9.728 Mbit/s, which fits inside the 12 Mbit/s link minus overhead.
That indicates a tad over 600k bytes/second both ways in loopback duplex is around the ceiling, with more in simplex: ~1044 kB/s to PC or ~976 kB/s from PC.
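The throughput figures in this thread all come from the same frame arithmetic: one start bit plus data, parity and stop bits per character. The sketch reproduces the 300 kB/s ceiling for 3 MBd 8.N.1 (against which the 77 kB/s measured earlier is roughly a quarter), the 1 MByte/s of 12M.8.M.2, and the 9.728 Mbit/s loopback sanity check above. Function names are hypothetical.

```python
def char_rate(baud, data_bits=8, parity_bits=0, stop_bits=1):
    """Characters per second for async frames with one start bit."""
    return baud / (1 + data_bits + parity_bits + stop_bits)


def duplex_data_bits(effective_baud_each_way, data_bits=8):
    """Payload bits per second summed over both directions of a loopback."""
    return 2 * data_bits * char_rate(effective_baud_each_way, data_bits)


print(char_rate(3_000_000))                               # 300000.0  (3 MBd 8.N.1)
print(char_rate(12_000_000, parity_bits=1, stop_bits=2))  # 1000000.0 (12M.8.M.2)
print(duplex_data_bits(3.040e6 * 2))                      # 9728000.0
```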
You were quite close, you just missed a gated-sample detail. See this code for 20 MHz RX over SPI; for async you still need the CLK pin for the RX sample pulse, and you use the LOGIC A & B counter mode.
https://forums.parallax.com/discussion/comment/1466234/#Comment_1466234
For a practical PC-Async system, a 96MHz sysclk, and /8 on the sample pulses would give a 12Mbd link.
@jmg,
Of course, you're right about the adding frqx into phsx once per clock. I had mistakenly thought I could use the pll_div field to slow that down to once every 4th clock to match the instruction speed.
Oh well, it was fun getting the tx working though. Thanks for the help everyone
Well I'm not beaten yet. More ideas...
I have come to peace with the fact that the P1 cannot react fast enough to a waitpeq to capture the first data bit transferred when the Baud rate is equal to (CLK / 4). Can't be done.
Yet another (literally) half-baked idea for an RX mechanism...
(this is not working code, but rather a conceptual description of my new crackpot idea)
Consider that PHSA and PHSB will run in parallel until a byte arrives on the serial RX pin.
PHSB will be affected by the passing of these modulated bits, but PHSA will not.
PHSA could act like a "control" sample in a scientific test.
PHSB would be the actual "test subject" in that test.
This gives us a difference that perhaps we can analyse to figure out the actual bit pattern that came in without us having to actively clock the bits in manually.
Can this be made to work do you think?
All opinions invited.
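One way to poke at this idea is a simple model. Assume PHSA adds FRQA on every clock (free-running "control") and PHSB adds FRQB only while the RX pin is high. With constant FRQ values, PHSA - PHSB works out to the number of clocks the pin spent low, so in this model 0x0F and 0xF0 leave identical differences even though their frames differ. The modes and names here are assumptions for illustration.

```python
def run_counters(pin_levels, frqa=1, frqb=1):
    """Two counters clocked together for len(pin_levels) system clocks.

    Assumed setup: PHSA adds FRQA on every clock; PHSB adds FRQB only while
    the RX pin is high.
    """
    phsa = phsb = 0
    for level in pin_levels:
        phsa += frqa
        if level:
            phsb += frqb
    return phsa, phsb


def frame_levels(byte, clocks_per_bit=4):
    """8.N.1 frame for `byte`, LSB first, expanded to one level per clock."""
    bits = [0] + [(byte >> i) & 1 for i in range(8)] + [1]
    return [b for b in bits for _ in range(clocks_per_bit)]


for byte in (0x0F, 0xF0):
    a, b = run_counters(frame_levels(byte))
    print(hex(byte), "PHSA - PHSB =", a - b)
```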
I think the problem here is that the huge time taken to extract a single byte of info would swamp any gains from the faster RX.
The code I linked to above can (self) RX at 20 MHz, so why not do a variant of that? (Though it does need a spare pin.)
In a practical async system, 20 MHz is not of great use (no PC UARTs support it), and the inter-byte delays are critical.
I think 12 MBd (possible with a 96 MHz sysclk and a /8 rate) or 10 MBd (80 MHz sysclk and /8) are possible fixed P1 upper targets for async, with maybe 8 MBd (9.6 MBd at 96 MHz) tops on a WAIT-based async block.
Even here, there are going to need to be design/system agreements between the P1 side and host side, to make use of this burst ability.
Some packet buffer size needs to be agreed and allocated, as random length timeout is hard to manage. HW handshake will also be needed to pause the HW until the SW is poised at the right place.
Stop bits also need to be agreed, as a burst RX needs time to decrement and write between bytes.
The good news is there are USB-UARTS that can manage these high speeds, so it is worthwhile checking into this.
Here is what 10MBd (fractional baud) looks like, on the one brand of UART that manages it
Addit: here is 12M.8.M.2, which is the longest STOP duration standard UARTs can provide. This delivers exactly 1 MByte/s over the link, and I think a 96 MHz P1 can manage this, COG-local at least.
An important gain of high burst speeds is that the link overhead is much reduced, e.g. user code can report 4 bytes and return in 4 µs.
I finally relented and turned to external hardware for the RX.
74HC595 to the rescue. A combination of fast clocking TX into the 595, followed up with a short pulse for the end-of-byte latch (shared with the receiving Prop so it knows when to read the 595).
Works beautifully as long as you remember to wire up the 8-bit parallel lines the right way around on the remote Prop's input port.
1.5 megabytes per second peak performance (80 MHz clk) in one direction, quite nice. Also the two Props don't need a common clock source as long as they are running at nominally the same frequency.
Perhaps the Props could differ by maybe 20% in clock speed and that difference would be absorbed quite well by the 595 because it is a clocked intermediary. A large difference in the two clocks would only introduce latency in two-way comms, but probably not actual read/write errors. Speculation is fun.
If you are willing to sacrifice 8 pins for parallel transfers then using two 74LVC573's back to back would work nicely, or even a 74xx543 or 74xx646 (not sure what series are available) which are back to back 8bit latches in a 24 pin package.
If you really want to add external chips, you can add something like an SN74ACT2228, and with the 20 MHz clocked speeds mentioned above, that gives 32-byte buffers at 2.5 MBytes/second in both directions.
Or, you could use a compact FPGA like ICE40UL1K-SWG16ITR, to make a modern 2021 version of a hardware bridge
Amusingly, I have absolutely no application in mind for high speed serial / parallel transfer. I was just pushing my brain to find out where the extremes of the Propeller's counters were.
It is fun to play with this stuff and to try to optimise it. I'm pretty pleased with the result even though it's not symmetrical or even close to it.