Single Cog 1 Mbaud full-duplex serial
Delus
Posts: 79
I just finished a 1 Mbaud UART object for a project I'm working on right now. I realize this isn't the fastest anyone has managed on the Prop, but my quick search before starting didn't turn up any single-cog 1 Mbaud+ objects that can receive and transmit simultaneously. The receiver thread uses 2x oversampling, with a counter deciding which sampling window to use on each start bit, and it takes 9.5 bit periods to complete a read, which accommodates slight clock mismatches when receiving continuous data.
My application reads each byte in real time, so there aren't any receive or transmit buffers implemented, but there are some spare instructions left in the tx code should someone wish to implement buffer logic (all hub reads and writes occur in the tx thread to maintain hub sync). The Rx and Tx hub registers are also simple enough that many Rx and Tx buffers could be implemented in a second cog.
I hope someone finds this useful, or if it has already been done better I'd love a link (though I'll kick myself for missing it).
Cheers,
David
Edit: I should probably note that the Spin tx and rx methods are only there for checking for signs of life and may not be able to keep up with a full-speed connection.
Code has been posted on the Object Exchange: 1Mbaud FullDuplexSerial
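The timing figures in the post can be checked with a little arithmetic. The sketch below is a host-side Python model, not part of the object. It assumes an 80 MHz system clock (implied by byteTime = 800 for a 10-bit frame, but not stated outright) and 4 clocks per non-hub instruction. All variable names are made up for illustration.

```python
# Timing budget for alternating 5-instruction TX and RX blocks.
# Assumptions: 80 MHz system clock, 4 clocks per non-hub PASM instruction.
CLOCK_HZ = 80_000_000
BAUD = 1_000_000
CLOCKS_PER_INSTRUCTION = 4
INSTRUCTIONS_PER_BLOCK = 5

clocks_per_bit = CLOCK_HZ // BAUD                                    # clocks in one bit period
clocks_per_block = CLOCKS_PER_INSTRUCTION * INSTRUCTIONS_PER_BLOCK   # one TX or one RX block
clocks_per_round = 2 * clocks_per_block                              # one TX block plus one RX block

# The RX code runs once per round, so this is the oversampling factor.
rx_samples_per_bit = clocks_per_bit // clocks_per_round

# A sample can land at most half a sampling period away from the bit center.
sample_period_ns = clocks_per_round * 1_000_000_000 // CLOCK_HZ
worst_case_offset_ns = sample_period_ns // 2

# One 10-bit frame (start + 8 data + stop), compared with the byteTime constant.
byte_time_clocks = 10 * clocks_per_bit

# A read that takes 9.5 bit periods leaves half a bit of slack per frame.
read_clocks = 19 * clocks_per_bit // 2
slack_clocks = byte_time_clocks - read_clocks

print(clocks_per_bit, clocks_per_round, rx_samples_per_bit)
print(sample_period_ns, worst_case_offset_ns)
print(byte_time_clocks, read_clocks, slack_clocks)
```

Under these assumptions each TX or RX block gets exactly a quarter of a bit period, which is why every section in the listing is padded to five instruction slots.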
con
  RX_RDY = $40_00_00_00
  TX_RTS = $80_00_00_00

var
  long  RxReg
  long  TxReg

pub start(rxpin, txpin)
  pinRx := (1 << rxpin)
  pinTx := (1 << txpin)
  pRxReg := @RxReg
  pTxReg := @TxReg
  rxCtrPin := rxpin
  tx_ror := (31 - txpin) + 1
  if (tx_ror == 32)
    tx_ror := 0
  TxReg := 0
  cognew(@asm_start, 0)
  'wait for cog startup
  repeat while ((TxReg & TX_RTS) == 0)

pub tx(value)
  repeat while ((TxReg & TX_RTS) == 0)
  TxReg := value & $ff

pub rx | value
  repeat while ((RxReg & RX_RDY) == 0)
  value := RxReg & $ff
  RxReg := 0
  return value

pub RxRegAddr
  return @RxReg

pub TxRegAddr
  return @TxReg

dat
asm_start
              org
              'flag cog started
              wrlong    tx_flagRTS, pTxReg
              'configure tx
              mov       dira, pinTx
              mov       outa, high
              'tx starts on bit 7 (currently set high by outa = high) this is where new data is read
              mov       txcode, #tx_bit_7
              mov       rxcode, #rx_wait_1
              'configure rx
              andn      dira, pinRx
              mov       frqa, #1
              mov       ctra, rxCtrMode
              movs      ctra, rxCtrPin
              mov       phsa, #0
              mov       frqa, #1
              mov       rx_dataReg, #0
              'Hold Tx line high for one byte time before starting
              mov       tmp, cnt
              add       tmp, byteTime
              waitcnt   tmp, byteTime
              'jump to execution code
              jmp       txcode

{-------------------------------------------------------------------------------------
********************************* RX Section ***************************************
-------------------------------------------------------------------------------------}
'note if rxcode is not updated this becomes a loop
rx_wait_1
              and       pinRx, ina nr, wz               'inst 1/5
              mov       phsa, #0                        'inst 2/5
        if_nz mov       rxcode, #rx_wait_0              'inst 3/5
              nop                                       'inst 4/5
              jmp       txcode                          'inst 5/5

'note if rxcode is not updated this becomes a loop
rx_wait_0
              mov       tmp, phsa                       'inst 1/5
              mov       phsa, #0                        'inst 2/5
              'test how long it's been since the last falling edge
              ' and decide whether to use this sampling interval or
              ' the next (2x over-sampling)
              ' (the author notes in the comments below that #22 should be #16)
              cmp       tmp, #22 wc, wz                 'inst 3/5
        if_ae mov       rxcode, #rx_read_bit_start      'inst 4/5
              jmp       txcode                          'inst 5/5

rx_read_bit_start
              mov       rx_bitcount, #8                 'inst 1/5
              mov       rx_bit, #1                      'inst 2/5
              mov       rx_data, #0                     'inst 3/5
              nop                                       'inst 4/5
              jmpret    rxcode, txcode                  'inst 5/5

'read rx bit
rx_read_bit
              and       pinRx, ina nr, wz               'inst 1/5
              muxnz     rx_data, rx_bit                 'inst 2/5
              shl       rx_bit, #1                      'inst 3/5
              sub       rx_bitcount, #1                 'inst 4/5
              jmpret    rxcode, txcode                  'inst 5/5

rx_mid_bit
              cmp       rx_bitcount, #0 wc, wz          'inst 1/5
        if_e  mov       rxcode, #rx_check_frame         'inst 2/5
        if_ne mov       rxcode, #rx_read_bit            'inst 3/5
              or        rx_data, rx_flagRdy             'inst 4/5
              jmp       txcode                          'inst 5/5

rx_check_frame
              and       pinRx, ina nr, wz               'inst 1/5
              'reset low time counter
              mov       phsa, #0                        'inst 2/5
              'successful read, write data to reg for write to hub (handled by tx code)
        if_nz mov       rx_dataReg, rx_data             'inst 3/5
              mov       rxcode, #rx_wait_0              'inst 4/5
              jmp       txcode                          'inst 5/5

{-------------------------------------------------------------------------------------
********************************* TX Section ***************************************
-------------------------------------------------------------------------------------}
tx_bit_start
              nop                                       'inst 1/5 (hub sync)
              nop                                       'inst 2/5
              mov       outa, tx_dataReg                'inst 3/5
              nop                                       'inst 4/5
              jmpret    txcode, rxcode                  'inst 5/5

tx_idle_start
              and       rx_dataReg, rx_flagRDY nr, wz   'inst 1/5
        if_z  mov       txcode, #tx_bit_0_norm          'inst 2/5
        if_nz mov       txcode, #tx_bit_0_rx2hub        'inst 3/5 (hub sync)
              nop                                       'inst 4/5
              jmp       rxcode                          'inst 5/5

tx_bit_0_norm
              mov       txcode, #tx_idle_0              'inst 1/5 (hub sync)
              nop                                       'inst 2/5
              ror       outa, #1                        'inst 3/5
              nop                                       'inst 4/5
              jmp       rxcode                          'inst 5/5

tx_bit_0_rx2hub
              wrlong    rx_dataReg, pRxReg              'inst 1-2/5 (hub sync)
              ror       outa, #1                        'inst 3/5
              mov       rx_dataReg, #0                  'inst 4/5
              jmpret    txcode, rxcode                  'inst 5/5

tx_idle_0
              nop                                       'inst 1/5
              nop                                       'inst 2/5
              nop                                       'inst 3/5 (hub sync)
              nop                                       'inst 4/5
              jmpret    txcode, rxcode                  'inst 5/5

tx_bit_1
              nop                                       'inst 1/5 (hub sync)
              nop                                       'inst 2/5
              ror       outa, #1                        'inst 3/5
              nop                                       'inst 4/5
              jmpret    txcode, rxcode                  'inst 5/5

tx_idle_1
              nop                                       'inst 1/5
              nop                                       'inst 2/5
              nop                                       'inst 3/5 (hub sync)
              nop                                       'inst 4/5
              jmpret    txcode, rxcode                  'inst 5/5

tx_bit_2
              nop                                       'inst 1/5 (hub sync)
              nop                                       'inst 2/5
              ror       outa, #1                        'inst 3/5
              nop                                       'inst 4/5
              jmpret    txcode, rxcode                  'inst 5/5

tx_idle_2
              nop                                       'inst 1/5
              nop                                       'inst 2/5
              nop                                       'inst 3/5 (hub sync)
              nop                                       'inst 4/5
              jmpret    txcode, rxcode                  'inst 5/5

tx_bit_3
              nop                                       'inst 1/5 (hub sync)
              nop                                       'inst 2/5
              ror       outa, #1                        'inst 3/5
              nop                                       'inst 4/5
              jmpret    txcode, rxcode                  'inst 5/5

tx_idle_3
              nop                                       'inst 1/5
              nop                                       'inst 2/5
              nop                                       'inst 3/5 (hub sync)
              nop                                       'inst 4/5
              jmpret    txcode, rxcode                  'inst 5/5

tx_bit_4
              nop                                       'inst 1/5 (hub sync)
              nop                                       'inst 2/5
              ror       outa, #1                        'inst 3/5
              nop                                       'inst 4/5
              jmpret    txcode, rxcode                  'inst 5/5

tx_idle_4
              and       rx_dataReg, rx_flagRDY nr, wz   'inst 1/5
        if_z  mov       txcode, #tx_bit_5_norm          'inst 2/5
        if_nz mov       txcode, #tx_bit_5_rx2hub        'inst 3/5 (hub sync)
              nop                                       'inst 4/5
              jmp       rxcode                          'inst 5/5

tx_bit_5_norm
              mov       txcode, #tx_idle_5              'inst 1/5 (hub sync)
              nop                                       'inst 2/5
              ror       outa, #1                        'inst 3/5
              nop                                       'inst 4/5
              jmp       rxcode                          'inst 5/5

tx_bit_5_rx2hub
              wrlong    rx_dataReg, pRxReg              'inst 1-2/5 (hub sync)
              ror       outa, #1                        'inst 3/5
              mov       rx_dataReg, #0                  'inst 4/5
              jmpret    txcode, rxcode                  'inst 5/5

tx_idle_5
              nop                                       'inst 1/5
              nop                                       'inst 2/5
              nop                                       'inst 3/5 (hub sync)
              nop                                       'inst 4/5
              jmpret    txcode, rxcode                  'inst 5/5

tx_bit_6
              nop                                       'inst 1/5 (hub sync)
              nop                                       'inst 2/5
              ror       outa, #1                        'inst 3/5
              nop                                       'inst 4/5
              jmpret    txcode, rxcode                  'inst 5/5

tx_idle_6
              nop                                       'inst 1/5
              nop                                       'inst 2/5
              nop                                       'inst 3/5 (hub sync)
              nop                                       'inst 4/5
              jmpret    txcode, rxcode                  'inst 5/5

tx_bit_7
              rdlong    tx_dataReg, pTxReg              'inst 1-2/5 (hub sync)
              ror       outa, #1                        'inst 3/5
              nop                                       'inst 4/5
              jmpret    txcode, rxcode                  'inst 5/5

tx_idle_7
              and       tx_dataReg, tx_flagRTS nr, wz   'inst 1/5
        if_nz mov       txcode, #tx_bit_stop_idle       'inst 2/5
        if_z  mov       txcode, #tx_bit_stop_tx         'inst 3/5 (hub sync)
              nop                                       'inst 4/5
              jmp       rxcode                          'inst 5/5

tx_bit_stop_idle
              mov       tx_dataReg, high                'inst 1/5 (hub sync)
              nop                                       'inst 2/5
              mov       outa, high                      'inst 3/5
              nop                                       'inst 4/5
              jmpret    txcode, rxcode                  'inst 5/5

tx_idle_stop_idle
              mov       txcode, #tx_bit_start           'inst 1/5
              nop                                       'inst 2/5
              nop                                       'inst 3/5 (hub sync)
              nop                                       'inst 4/5
              jmp       rxcode                          'inst 5/5

tx_bit_stop_tx
              'set stop bit of tx data
              shl       tx_dataReg, #1                  'inst 1/5 (hub sync)
              'shift data to proper position in outs
              ror       tx_dataReg, tx_ror              'inst 2/5
              mov       outa, high                      'inst 3/5
              nop                                       'inst 4/5
              jmpret    txcode, rxcode                  'inst 5/5

tx_idle_stop_tx
              mov       txcode, #tx_bit_start           'inst 1/5
              nop                                       'inst 2/5
              'flag data read and ready for next data in hub
              wrlong    tx_flagRTS, pTxReg              'inst 3-4/5 (hub sync)
              jmp       rxcode                          'inst 5/5

pinRx         long      0
pRxReg        long      0
rx_flagRDY    long      RX_RDY
rxCtrPin      long      0
rxCtrMode     long      %0_10101_000_00000000_000000_000_000000   'Inc on pina == 0

pinTx         long      0
pTxReg        long      0
tx_flagRTS    long      TX_RTS
tx_ror        long      0

high          long      $ff_ff_ff_ff
byteTime      long      800

rxcode        res       1
rx_data       res       1
rx_dataReg    res       1
rx_bitcount   res       1
rx_bit        res       1

txcode        res       1
tx_dataReg    res       1

tmp           res       1

              fit
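The way the transmit side walks a byte past the output pin with tx_ror and repeated `ror outa, #1` can be modelled on a PC. The Python sketch below is only my reading of the listing, not code from the object. The helper names (ror32, tx_ror_for, pin_levels) are hypothetical, and it models one frame in isolation: start bit, eight data bits LSB first, stop bit.

```python
MASK32 = 0xFFFF_FFFF

def ror32(value, count):
    """Rotate a 32-bit value right by count bits, like the PASM ror instruction."""
    count %= 32
    return ((value >> count) | (value << (32 - count))) & MASK32

def tx_ror_for(txpin):
    """Mirror of the Spin start() method: (31 - txpin) + 1, with 32 wrapped to 0."""
    tx_ror = (31 - txpin) + 1
    return 0 if tx_ror == 32 else tx_ror

def pin_levels(byte, txpin):
    """Return the ten levels seen on txpin for one frame of this model."""
    pin_mask = 1 << txpin
    data = (byte & 0xFF) << 1                # shl tx_dataReg, #1: bit 0 becomes the start bit (0)
    data = ror32(data, tx_ror_for(txpin))    # ror tx_dataReg, tx_ror: start bit lands on txpin
    outa = data                              # tx_bit_start: mov outa, tx_dataReg
    levels = [1 if outa & pin_mask else 0]
    for _ in range(8):                       # tx_bit_0 .. tx_bit_7: ror outa, #1
        outa = ror32(outa, 1)
        levels.append(1 if outa & pin_mask else 0)
    levels.append(1)                         # stop bit: mov outa, high
    return levels

print(pin_levels(0x55, 30))
```

Rotating right by 32 - txpin is the same as rotating left by txpin, so the start bit ends up on the TX pin and each later `ror outa, #1` brings the next data bit down onto it. Only the TX pin is an output in dira, so the other bits rotating through outa are harmless.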
Comments
While a substantially different implementation, did you see the one by Lonesock (Jonathan Drummer)?
fast-full-duplex-serial-1-cog-a-k-a-ffds1
Like yours, it makes use of the cog counters, one to time the tx bits and the other to time the rx sampling. The speed claim is "only" 460800 baud, but it does also claim buffer management and the usual choice of baud rates etc.
Edit: Forgot to remove some debug lines (now removed)
I find that the jmpret methods normally used for standard full-duplex introduce substantial jitter even at speeds as low as 115k. How do your timing and jitter look with full-duplex?
As for jitter, I was very careful to use either 5-instruction blocks between every context switch or 4-instruction blocks with one aligned hub op. This gives me jitter-free transmit, but it prevents the receive function from aligning properly with the incoming bit stream, meaning samples are taken up to +/- 250 ns from the center of each bit. There is still no jitter between samples, but this definitely reduces its immunity to jitter or other noise sources coming from whatever it's connected to.
I went the lazy route and wrote Full-Duplex Parallel for the FTDI FT245. It takes a lot more pins.
Good job Delus!
Haha, yes, that would have been MUCH easier (and I could have used pre-existing code), but I need two 1 Mbaud ports and don't have 4 cogs to spare. The code was actually much easier to write than I expected. I started by calculating hub timings and figuring out how many tx sections it would take to transmit a single byte (start, 8 bits, stop). After that I made the framework with half-bit tx dummy sections (with hub timings commented) and started replacing nops with actual code. The rx section took a little bit of mental gymnastics, but as long as I kept each section to 5 instructions including the context switch and didn't use any hub ops there, I knew I'd be fine. (It also helps that each task can be debugged independently by substituting dummy sections for the other task.)
ke4pjw - Thanks. I haven't had any issues receiving data yet, but I did realize yesterday that one of my timing calculations was off, which reduces my error margins a bit (rx_wait_0 should compare phsa (tmp) to 16 instead of 22). I have noticed that if I start the PC connection in the middle of a continuous data stream I get much more garbled data (UART frame misalignment) than I expected, but this seems to be an FTDI/PC issue, as it goes away after the first break in data transmission allows the FTDI to get a lock on frame alignment.
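The mid-stream garbling is easy to reproduce on paper with any start-bit-triggered receiver. The Python sketch below is a deliberately naive 8N1 decoder, not a model of the FTDI chip or of the PASM receiver. The names frame and decode and the test bytes are made up for illustration.

```python
def frame(byte):
    """One 8N1 frame as a list of bits: start (0), 8 data bits LSB first, stop (1)."""
    return [0] + [(byte >> k) & 1 for k in range(8)] + [1]

def decode(bits):
    """Naive decoder: treat the first 0 seen as a start bit and accept the byte if the stop bit is 1."""
    out, i = [], 0
    while i + 10 <= len(bits):
        if bits[i] == 1:                 # idle or stop level, keep hunting for a start bit
            i += 1
            continue
        if bits[i + 9] == 1:             # plausible stop bit, so accept the frame
            data = bits[i + 1:i + 9]
            out.append(sum(bit << k for k, bit in enumerate(data)))
            i += 10
        else:                            # framing error, slide forward one bit and retry
            i += 1
    return out

head = [0x41, 0x42, 0x43]                # sent back to back with no idle time
tail = [0x44, 0x45]                      # sent after a break in transmission
idle_gap = [1] * 10                      # line idles high for one frame time

stream = [b for byte in head for b in frame(byte)]
stream += idle_gap
stream += [b for byte in tail for b in frame(byte)]

aligned = decode(stream)                 # receiver was listening from the start
late = decode(stream[3:])                # receiver joined three bit times in

print([hex(v) for v in aligned])
print([hex(v) for v in late])
```

In this model the late receiver mistakes a data 0 for a start bit and keeps producing wrong bytes for as long as the data is continuous. The idle gap forces it to wait for a real start bit, and the bytes after the gap decode correctly.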
That looks like it might be a slightly more sane approach. My biggest problem is that I don't have control over when the ARM transmits ADC data or when there will be a break to send it a command, and the PC app expects the same data flow from the Prop as it does when connected directly to the ARM. Admittedly, my high-speed data requirements are semi self-inflicted, as each ADC reading gets its own 32-bit timestamp and 16-bit command ID, which transmits 4x the data of the original 16-bit signal, and I do have the power to change this on all 3 systems. This + the 10-bit encoding of UART + 4 ADC channels + 2.44 ksps gives me a minimum baud rate of 780.8 kbaud for just the ADC data.
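For anyone checking the 780.8 kbaud figure, the arithmetic works out as follows. This is a small Python sketch with made-up variable names, and it assumes the 2.44 ksps rate is per channel, which is the reading that reproduces the quoted number.

```python
ADC_BITS = 16               # original signal
TIMESTAMP_BITS = 32         # per-reading timestamp
COMMAND_ID_BITS = 16        # per-reading command ID
CHANNELS = 4
SAMPLE_RATE_SPS = 2440      # 2.44 ksps, assumed to be per channel
UART_BITS_PER_BYTE = 10     # start + 8 data + stop

payload_bits = ADC_BITS + TIMESTAMP_BITS + COMMAND_ID_BITS      # bits per reading before framing
payload_bytes = payload_bits // 8
expansion = payload_bits // ADC_BITS                            # the "4x the data" figure
line_bits_per_reading = payload_bytes * UART_BITS_PER_BYTE      # bits on the wire per reading
min_baud = line_bits_per_reading * CHANNELS * SAMPLE_RATE_SPS   # minimum sustained line rate

print(payload_bits, expansion, line_bits_per_reading, min_baud)
```

That leaves a little over 20% of a 1 Mbaud link for commands and everything else.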