Can anybody beat 1.5MB/S UART Data Flow?

Steel · 2007-12-28 23:12

So I just built an ASM routine that sends UART data to the FTDI chip at 1.5 Mbps.

To verify that it works, I created a VB.Net application and opened a serial port at 1.5 Mbps.

...Lo and behold, the PC reads the signal, and the Propeller reads the signal from the PC.

I guess I am totally excited about it and have nobody else to share it with.

On the Propeller, the ASM routine reads 1 byte from shared memory (rdbyte) and sends that to the PC.

Can anybody optimize their code to go faster? I have a 5 MHz crystal on PLL16X. I guess if I get a 12 MHz crystal and put it on PLL8X I could probably get 1.6 or 1.7 Mbps.

I guess I am just sharing a little success story :D

Shaun

Paul Baker · 2007-12-28 23:17

If I remember correctly, the FTDI chips are capable of speeds up to 3 Mbps. The Propeller is capable of achieving this throughput (look at Beau's high speed prop2prop serial communications), but I don't think anyone has attempted to push a full 3 Mbps through the link thus far.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Paul Baker
Propeller Applications Engineer

Parallax, Inc.

Mike Green · 2007-12-28 23:25

Shaun,
You may have problems with a 12MHz crystal and the PLL. The datasheet says that the PLL is designed for use with a 4 to 8 MHz clock. You would be better off using a 6 MHz crystal and the PLL16X setting. Keep in mind that the Propeller appears to work fine at a 96 MHz system clock, but isn't guaranteed to work reliably at that speed over the full temperature and supply voltage range.

Not to minimize what you've been able to do, but even with the 80 MHz system clock, you can probably do better than 1.5 Mbps in assembly language. At 50 ns per instruction, you've got about 13 instructions per bit. You could probably use the video shift register to get at least 10 times that speed.
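
The instruction budget mentioned above can be worked out in a few lines. This is only a rough sketch: it assumes the 80 MHz system clock and the 4 clocks (50 ns) per ordinary instruction quoted in the post, and the function names are made up for illustration.

```python
# Rough sketch: how many ordinary PASM instructions fit into one bit cell.
# Assumes 80 MHz system clock and 4 clocks per instruction (50 ns), as quoted
# in the post. Hub instructions take longer and are not modelled here.

SYSTEM_CLOCK_HZ = 80_000_000   # 5 MHz crystal x PLL16X
CLOCKS_PER_INSTRUCTION = 4     # 4 clocks at 80 MHz = 50 ns


def clocks_per_bit(baud, clock_hz=SYSTEM_CLOCK_HZ):
    """System clocks available during one bit cell at the given baud rate."""
    return clock_hz / baud


def instructions_per_bit(baud, clock_hz=SYSTEM_CLOCK_HZ):
    """Ordinary (non-hub) instructions that fit into one bit cell."""
    return clocks_per_bit(baud, clock_hz) / CLOCKS_PER_INSTRUCTION


for baud in (1_500_000, 2_000_000, 3_000_000):
    print(f"{baud / 1e6:.1f} Mbps: "
          f"{clocks_per_bit(baud):.1f} clocks/bit, "
          f"{instructions_per_bit(baud):.1f} instructions/bit")
```

The budget shrinks quickly: moving from 1.5 Mbps to 3 Mbps halves the number of instructions available per bit.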

deSilva · 2007-12-29 00:39

The bottleneck is usually the receiving...
Transmitting works great with the PHSA register

  SHL  PHSA,#1
  DJNZ nmbOfbits, #$-1

That gives a 10 MHz bit rate, though not as sustained throughput.

It has been observed many times that the Propeller runs more stably at 12 MHz/PLL8X than at 6 MHz/PLL16X. Saphiera had some explanations for that...

Hanno · 2007-12-29 00:48

Great job Shaun.
Since you asked if someone has beaten 1.5 Mbps, I'll chime in...

ViewPort has been running at 2 Mbps, "full duplex", since day 1, back in May. Check it out at http://mydancebot.com

2 Mbps allows ViewPort to stream video captured by an ADC at ~5 frames/second, update the logic analyzer/oscilloscope at screen refresh rates, or continuously track the Propeller's memory.

"Full Duplex" allows ViewPort to change variables running inside the Propeller.

The FTDI chip does NOT allow arbitrary data rates. It supports 3, 2, 1.5 Mbps and slower baud rates. Therefore, 3 Mbps is the only faster data rate. We've done it, but its slightly faster speed is not worth the much more difficult programming required when dealing with full duplex and different data sources.
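
Why only 3, 2 and 1.5 Mbps at the top end? The sketch below enumerates rates under one reading of FTDI's baud-rate rule. Treat the rule itself as an assumption to check against FTDI's documentation: the rate is taken to be 3 MHz divided by n + k/8 for n >= 2 and k = 0..7, with 3 Mbaud and 2 Mbaud as two special cases. The function name is hypothetical.

```python
# Sketch of the FTDI baud-rate rule as I understand it (an ASSUMPTION, check
# FTDI's documentation): rate = 3 MHz / (n + k/8) for n >= 2, k = 0..7,
# plus two special cases that give exactly 3 Mbaud and 2 Mbaud.

from fractions import Fraction

BASE_HZ = 3_000_000


def ftdi_rates_at_or_above(min_baud):
    """List the achievable rates >= min_baud, fastest first."""
    rates = {Fraction(BASE_HZ), Fraction(2_000_000)}   # the two special cases
    n = 2
    while True:
        added = False
        for k in range(8):                  # sub-integer steps of 1/8
            rate = Fraction(BASE_HZ) / (n + Fraction(k, 8))
            if rate >= min_baud:
                rates.add(rate)
                added = True
        if not added:                       # every larger n is slower still
            break
        n += 1
    return sorted(rates, reverse=True)


for rate in ftdi_rates_at_or_above(1_400_000):
    print(f"{float(rate):,.0f} baud")
```

Under that assumption nothing exists between 2 and 3 Mbaud, or between 1.5 and 2 Mbaud, which matches the statement that 3 Mbps is the only step up from 2 Mbps.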
Hanno

mirror · 2007-12-29 01:52

I've written a Full Duplex Serial variant to support the FTDI chip up to 3M baud. Theoretically it should work, but it's *very* pre-release and has only been tested to 921600 (the highest standard baud rate available in HyperTerminal).
Transmit is easy. Catching random received characters here and there is quite hard, but bursts of incoming characters are relatively easy once the receiver is sync'ed. I've thought about using a synchronising byte to start each packet to allow the receiver to lock-on to an incoming burst of characters. I've also thought about having the PC set for 2 stop bits, but then use only 1 in the receiver and use the other one as a synchronising bit.

Higher baud rates are great (and relatively easy) for protocols that do a large amount of transmit (eg ViewPort). Using two cogs would make the receiver easy, but at the expense of a cog. I'm sorry, but I just can't afford to give away my cogs that freely.

I'm not really prepared to release the code just yet.

deSilva · 2007-12-29 03:01

Synchronizing to a naive async burst is not possible. A receiver can only synchronize to an idle condition longer than one character, which generally means 10 bit cells. What you have to do is send blocks separated by such gaps; the penalty is <1%, depending of course on the "block" size.
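
To put a number on that penalty, here is a small sketch. It assumes 10 bits per character (1 start, 8 data, 1 stop) and one 10-bit idle gap per block; the function name is illustrative only.

```python
# Sketch: throughput lost to one idle gap per block of characters.
# Assumes 10 bits per character (1 start + 8 data + 1 stop) and a 10-bit gap.

BITS_PER_CHAR = 10


def gap_overhead(block_chars, gap_bits=10):
    """Fraction of line time spent idle when each block is followed by a gap."""
    payload_bits = block_chars * BITS_PER_CHAR
    return gap_bits / (payload_bits + gap_bits)


for block in (10, 50, 100, 256):
    print(f"{block:4d} chars/block: {gap_overhead(block) * 100:.2f}% overhead")
```

With these assumptions the overhead is 1/(N + 1) for N characters per block, so the penalty falls below 1% once a block holds 100 characters or more.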

mirror · 2007-12-29 03:20

deSilva,
At higher baud rates the biggest problem is catching the incoming start bit. You don't know when it (the bit) started, so it's hard to then sample the other bits in the middle of the bit cells. The best solution is to start with a synchronising byte like $1 or $3 (assuming LSB's sent first) which results in an extra wide start bit. The first character is discarded and the receiver can be tuned to receive the remaining characters without problem as long as you're able to specify a maximum inter character gap (say 10 bits) during a burst of incoming characters. This assumes that the link is an auto-direction half-duplex link, but as it turns out this is how very many communications links are used (ie a master/slave relationship). As mentioned in my previous post, the transmitter can be set up to send 2 stop bits which also aids with synchronisation.

The subtleties and limitations of trying to get high speed Tx and Rx working in the same cog are not fully realised until you've done a fair bit of the work.

deSilva · 2007-12-29 03:42

No, Mirror, you are barking up the wrong tree.

See my previous posting. It is *not* possible to synchronize into a burst. Only as soon as you see a 10 bit idle gap, everything is o.k.

However, there are clever codings that allow synchronizing into a longer burst, but that is just *not* naive.

Another possibility is to identify correct parity situations (or utilize other content-related data, as in cryptography) under a shifting mask. This works with quite good probability after a few dozen bytes...

mirror · 2007-12-29 05:19

deSilva said...
No, Mirror, you are barking up the wrong tree. See my previous posting. It is *not* possible to synchronize into a burst. Only as soon as you see a 10 bit idle gap, everything is o.k.

In my case, once I see more than a few bits of idle time I go back to checking for transmit. So in fact long inter-character gaps become a problem (that is, they become the cause of non-synchronisation). By using a synchronising character you can wake the receiver up (that's all it does). Once the receiver is "awake" it can then receive the very next character (even if there is less than 10 bits of idle time).

In order to make this work you need to send the characters as a burst from the transmitting end, as any gaps actually serve to break synchronisation.

The big problem with the "10 bit idle gap" theory is that it implies that you're waiting in a tight loop for an incoming bit - which is NOT the case if you're trying to handle a duplex link. The code that I have for my inner loop is as follows:

CheckForTx
                    rdlong  txfirst,txhead
                    cmp     txlast,txfirst   wz     'any chars to Tx?
    if_ne           call    #Transmit
    
CheckForRx
                    test    rxmask,ina   wc         'check for start bit
    if_c            jmp     #CheckForTx
 
RxStartBit          'Start receiving incoming character

This loop takes 23 to 38 clock cycles to execute. For a 2 Mbps connection there are only 40 clock cycles per bit (5 MHz x PLL16), and to do half bit position sampling you need to count only 20 clock cycles.

As far as I can tell, the ONLY way to do reliable reception at high baud rates is to have a synchronisation character, which causes the transmitter to be abandoned while there are incoming characters.
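
The timing squeeze can be made concrete with a short calculation. This is a sketch using the loop figures quoted above (23 to 38 clocks) and an 80 MHz clock; it assumes the Rx pin is tested once per loop pass, so a start-bit edge can go unnoticed for up to one full pass. The names are hypothetical.

```python
# Sketch: compare the polling loop's duration with the bit cell.
# Loop figures (23..38 clocks) are taken from the post; the model assumes the
# Rx pin is tested once per loop pass.

SYSTEM_CLOCK_HZ = 80_000_000   # 5 MHz crystal x PLL16
LOOP_MIN_CLOCKS = 23
LOOP_MAX_CLOCKS = 38


def start_bit_budget(baud):
    """Return the bit-cell timing and the worst-case detection delay."""
    bit = SYSTEM_CLOCK_HZ / baud
    return {
        "clocks_per_bit": bit,
        "half_bit": bit / 2,
        # Worst case: the edge arrives just after the pin was tested.
        "worst_case_latency_fraction": LOOP_MAX_CLOCKS / bit,
    }


for baud in (2_000_000, 3_000_000):
    b = start_bit_budget(baud)
    print(f"{baud / 1e6:.0f} Mbps: {b['clocks_per_bit']:.1f} clocks/bit, "
          f"half bit {b['half_bit']:.1f}, worst-case delay "
          f"{b['worst_case_latency_fraction'] * 100:.0f}% of a bit cell")
```

Under these assumptions the worst-case delay is 95% of a bit cell at 2 Mbps and more than a whole bit cell at 3 Mbps, which is why the position of the start-bit edge cannot be trusted for mid-bit sampling.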

Forget about the model presented by FullDuplexSerial - it is simply too slow to handle anything much above the 230Kbps that's been claimed for it. (Which I'm sure is quite possible with FDSerial).

The best that I believe you can achieve at high baud rates is an auto-direction sensing half-duplex link - if using only 1 cog. Full duplex would require 2 cogs.

Ale · 2007-12-29 07:46

Actually I can beat it, but not for serial. I have a board with 2 Propellers (part of my LMM experiment); they communicate through an 8-bit bus, so I achieve 5 MB/s, yes, megabytes. Of course sync and so on are very problematic if the code is not finely tuned. Ah, I forgot: it is bidirectional, just not at the same time. It is an extension of Beau's idea, because mine somehow did not work :-(

deSilva · 2007-12-29 11:34

Mirror,
I am most likely missing some of the basics of your architecture....
You have two physical lines, dedicated to transmitting and receiving, right?
You use a specific COG for watching the input line? If not, why not?
When you can confidently identify a "synchronization byte" - for whatever use - you are synchronized. A 10 bit idle IS a kind of synchronization byte....

You can only wait for a start bit when the line has been idle before.
You are doing something which is called - I think - isochronous transmission (synchronous without a separate clock signal). Note that there IS a clock signal interleaved in the signal stream: it's the stopbit/startbit interlude that is used for it.

There is no need to do "half bit sampling" after you have identified the rising slope (error: 12 ns) of the start bit, as your bit cell is 500 ns @ 2 Mbps. You have to resync at each startbit again, of course...

Edit:

I read your last posting again, and it now seems to me that you have one physical line only? But that would be called "half-duplex", wouldn't it? This needs a handshake ("roger and over") with time-out detection. Or some collision detection would be fine as well. Lots of theory has been available since the early days of Ethernet.

mirror · 2007-12-30 02:03

deSilva: I am most likely missing some of the basics of your architecture....
You have two physical lines, dedicated to transmitting and receiving, right?
Yes that is correct.

deSilva: You use a specific COG for watching the input line? If not, why not?
COGs are a precious resource - you only get 8 of them. I need a high speed serial connection to a host system, but that host system is only occasionally attached. I wish I had about 12 to 16 COGs. So, 1 COG is doing both Tx and Rx on two separate physical lines.

deSilva: When you can confidently identify a "synchronization byte" - for whatever use - you are synchronized. A 10 bit idle IS a kind of synchronization byte....
A 10 bit idle IS a kind of synchronization, BUT that is not the challenge. The challenge is seeing the start bit when it arrives and being confident that you'll be sampling somewhere near the middle of the bit cell for all the following bits. At 2 Mbps you have 40 clocks, and I've already shown that my inner loop can take up to 38 clocks. At 3 Mbps you only have 27 clocks, so the problem is a LOT worse.

deSilva: You can only wait for a start bit when the line has been idle before.
That is true, but that doesn't mean you need to use the first character received - particularly if your confidence is low that the first character has been correctly decoded. This is the essence of how I arrived at the requirement for a synchronising character. A key feature of this first character is that it probably will not be usable, but once you're confident that you've seen a synchronising character, the transmitter can be disabled, and the reception of further characters can be prioritised until a sufficiently large inter-character gap allows transmission of characters again.

deSilva: You are doing something which is called - I think - isochronous transmission (synchronous without a separate clock signal). Note that there IS a clock signal interleaved in the signal stream: it's the stopbit/startbit interlude that is used for it.

No I don't think so.

deSilva: There is no need to do "half bit sampling" after you have identified the rising slope (error: 12 ns) of the start bit;
This is your primary misunderstanding of what I'm doing. The rising slope HAS NOT BEEN accurately identified. This is the MAIN problem of using 1 COG to do both Tx and Rx.

deSilva: I read your last posting again, and it now seems to me that you have one physical line only? But that would be called "half-duplex" wouldn't it? This needs a handshake ("roger and over") with time-out detection.
You're much closer with these guesses. BUT, there are two physical lines. The connection is an auto-direction sensing half-duplex link (which I've said before in an earlier post). SO, there's no handshake, but there is a sort of "roger and over" procedure going on - except that it happens automatically on the basis of characters received.

So in summary, in my system a host (as master) occasionally connects to my system (as slave) and then requests a relatively large amount of data (could be tens of megabytes). I want a high speed serial connection, but I don't need true full-duplex (it's amazing how often you DON'T need true full-duplex). Also, I only have 1 COG to devote to this connection. I could turn it into a true half-duplex link by letting the higher order protocol determine the direction of the comms, but what happens when you get an incomplete packet or some other error? For that reason it is better to make the "direction" auto-sensing.

If you would like a challenge - have a look at the code for ViewPort. Try to understand it and then use it for something other than ViewPort. I had a look and started again from scratch on my own comms driver. High-speed serial comms on 1 COG is very challenging.

deSilva · 2007-12-30 19:17

Mirror,
Your case looks like a good example of the limits of the Propeller. What you need can be done by any $1 device called a (buffered) UART... The Propeller replaces such fancy specific silicon with general-purpose COGs, so you need a COG for it. Period.

When you have no spare COG - I said this in another thread some months ago - you have lost! All the nice features of other microcontrollers (capture/compare units, pin interrupts,..) are missing. Software polling is an acceptable alternative, but not on your time scale...

Let me repeat what I have understood of what you do.
(1) A COG sends data to a master, as fast as it can.
(2) It does this independent of whether the data is received correctly.
(3) There is interleaved code checking for activity on the RCV line; when activity is detected, the transmission is interrupted for the priority task of receiving the message.
(4) End of message is defined by an idle condition on the RCV line for a certain (short) time.

(5) Due to timing constraints you cannot guarantee that the first bit cell ("start bit") can always be caught, so you might misjudge the first "SPACE-bit" of the sent character as the start bit, losing sync.
(6) So you defined a format for the first byte ("SYNC") of the master's message to allow you to identify the stop bit of the SYNC, being in sync afterwards...
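
The six points above can be sketched as a tiny simulation. This is not mirror's driver (which was not released); it is a simplified model that works at whole-character granularity rather than bit level, and every name in it (`run_link`, `SYNC`, `MAX_GAP_SLOTS`) is hypothetical.

```python
# Simplified model of an auto-direction half-duplex link in one "cog".
# Works per character slot, not per bit. All names here are hypothetical.

from collections import deque

SYNC = 0x01          # the thread suggests $1 or $3; its value is not checked
MAX_GAP_SLOTS = 1    # assumed: one idle character slot (10 bits) ends a burst


def run_link(rx_slots, tx_bytes):
    """rx_slots: one entry per character time, a byte value or None if idle.
    Returns (received, sent) as lists of byte values."""
    tx_queue = deque(tx_bytes)
    received, sent = [], []
    receiving = False
    idle_slots = 0
    for slot in rx_slots:
        if receiving:
            if slot is None:
                idle_slots += 1
                if idle_slots >= MAX_GAP_SLOTS:
                    receiving = False        # (4) gap ends the message
            else:
                idle_slots = 0
                received.append(slot)        # (3) receiving has priority
        elif slot is not None:
            # (5)/(6) the first character only wakes the receiver, then is
            # discarded because it may not have been sampled correctly
            receiving = True
            idle_slots = 0
        elif tx_queue:
            sent.append(tx_queue.popleft())  # (1)/(2) send when Rx is quiet
    return received, sent


rx = [None, None, SYNC, 0x41, 0x42, None, None, None]
received, sent = run_link(rx, b"diag")
print("received:", [hex(b) for b in received])
print("sent:    ", bytes(sent))
```

In this run the SYNC character is dropped, the two following characters are received, and transmission of the remaining queued bytes resumes only after the idle slot that ends the burst.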

Questions:
(a) So your timing constraints allow for (6), i.e. you are fast enough to catch enough of the SYNC byte?

(b) Can you abandon the ongoing transmission instantaneously? It would become tricky if you cannot..

(c) After you have identified the SYNC character's stop bit, and transmission is abandoned, you can look for the falling edge of the startbit, simplifying the receiving ... As I said above: the stop/startbit combinations can be considered as an interleaved clock signal and you should take some advantage of them...

mirror · 2007-12-30 21:58

deSilva,
It seems you now understand what I'm doing. But not just *any* $1 buffered UART can do 3 Mbps. In actual fact those parts are probably harder to come by than expected. Also, you'd need to have it as an SPI part to make it worth connecting to the Propeller. There are limits to the Propeller; the things you run out of are:
1) Pins - which means external devices have to be SPI at worst. I2C is better - but slower.
2) Cogs - which means you need to put more than one function in a cog.
3) Memory Space
I haven't found processing power to be a problem, except when trying to use Spin to solve a problem. My practical experience of a Spin to PASM speedup was about 30 times. This was for an algorithm with 100-odd lines of Spin in 6 different subroutines. It translated quite nicely to PASM (using about 2/3 of the cog).

I've continued in this discussion because those watching as interested spectators can get some more understanding of the limitations and capabilities of the Propeller.

As to your questions: most of them are answered by the system architecture. The Propeller is a slave device, and it talks to a master (PC running Windows). Most of the time the Propeller is idle on this COG, except that occasionally it sends out diagnostic messages. So most of the time there is no conflict and everything works nicely, BUT I can't have it as a "protocol controlled" half-duplex link, as then my diagnostic messages would be blocked.
At the moment the comms driver does not abandon Tx mid byte, but it is something that could be considered.

I guess the biggest thing to emphasize is that high speed comms (on 1 COG) is complex, but if you can get both sides of the link to "play nice with each other", then it is quite within the capabilities of the Propeller.

BTW, I wrote most of my driver before the 8MBps Propeller comms thread started. Interesting to note that it uses 2 COGs (1 for Tx and 1 for Rx). However, it could probably be rewritten (into 1 COG) to allow protocol-based direction control. Then both sides start in Rx mode, until one of the COGs is ready to send some data - at which point it goes to Tx.

The challenge is not the speed as much as getting those COGs working to their optimum performance.

Mike G · 2007-12-31 19:50

I'd like to peek at your code, Steel/mirror, if you don't mind. I started working on a 1 Mbps serial comm object today when I stumbled on this post. I'm new to the Propeller (Spin/Assembly) but I do have some programming experience. For my project, I decided to split the Tx/Rx lines between two cogs. But I would love to see how it was done with a single cog.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Mike

deSilva · 2007-12-31 19:57

@Mike: It was done just as we described it here... Note that Mirror does not have the opportunity to use a real co-operative higher protocol. The master has precedence, independent of the slave's activity.

If you can, stay with two COGs: that's the "Propeller way". Doing it in one COG needs detailed consideration on all protocol levels; there is no silver bullet, and Mirror's approach is tricky.

Mike G · 2007-12-31 20:26

Thanks deSilva.

Can anyone point me to where Beau's high speed prop2prop serial communications is located? This was mentioned earlier by Paul Baker.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Mike

OzStamp · 2007-12-31 22:51

Hi Mike G

http://forums.parallax.com/forums/default.aspx?f=25&m=233212&g=234220#m234220

Ron Melbourne Australia
