My multi-threading scheduler would not be of benefit in this case. The 0.7% spec at 115200 represents 0.625 uSec, and the scheduler is designed to only time to 1 uSec. On top of that, competing demands for processor cycles between the Tx and Rx routines would cause some additional jitter.
But it should not be too difficult to write a dedicated interleaved assembler routine to do this FDX job in a single cog, well within the specs posted. I'd be happy to "have a go" if anyone cares...
This is an interesting programming problem, so I attacked it today and came up with a precision one-cog full-duplex serial driver. I call it PBnJ_serial (Precision, Basic, no Jitter). And by basic, I mean just that: no handshaking, no inverted signals, no open-drain driving, no half-duplex-share-one-pin capability; just straight-up serial I/O as Emile Baudot intended.
The driver operates with ping-pong coroutines. Each routine gets 1/6 of a bit time to do its thing before having to return. Transmit and receive are interleaved (hence the "ping-pong"), so each is called every 1/3 bit time. Each slice is responsible for doing a waitcnt before returning to the coroutine dispatch, thus keeping everything in sync.
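For anyone who wants to picture the ping-pong arrangement described above, here is a rough sketch in Python rather than PASM. It only models the order in which the slices run; the names `slice_worker` and `dispatch` are made up for illustration, and the real driver paces each slice with a waitcnt instead of a loop counter.

```python
from fractions import Fraction

def slice_worker(name):
    """Hypothetical stand-in for the transmit or receive routine."""
    count = 0
    while True:
        # Do one slice of work, then hand control back to the dispatcher.
        yield f"{name}{count}"
        count += 1

def dispatch(num_slices):
    """Alternate between the two routines, one 1/6-bit slice at a time."""
    tx, rx = slice_worker("tx"), slice_worker("rx")
    log = []
    for i in range(num_slices):
        routine = tx if i % 2 == 0 else rx
        # The slice index stands in for the waitcnt that paces the real driver.
        log.append((Fraction(i, 6), next(routine)))
    return log

schedule = dispatch(6)
for when, what in schedule:
    print(f"t = {when} bit: {what}")
```

With six slices per bit and strict alternation, each routine is resumed every second slice, which is the 1/3 bit time mentioned above.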
Now for the bad news: it doesn't work (yet) at 115,200 baud. (It tops out at about 107,000.) At 115,200, with an 80 MHz clock, a slice is 115 clocks (28 instructions) long. Each slice must potentially accommodate three hub accesses. The difficulty comes from the fact that clock sync is done near the beginning of receive slices (sync, sample the pin, do stuff, return) and near the end of transmit slices (do stuff, sync, write the pin, return). So a slice may not always receive a full time allocation. I think I can improve this by eliminating some subroutine calls and maybe loosening the timing precision on receive, where it's not really needed.
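The slice budget quoted above can be checked with a few lines of Python. This is only the arithmetic, assuming the usual 4 clocks per ordinary Propeller instruction and six slices per bit; `slice_budget` is a name invented for this sketch.

```python
CLKFREQ = 80_000_000          # 80 MHz system clock, as in the post
CLOCKS_PER_INSTRUCTION = 4    # ordinary (non-hub) instruction

def slice_budget(baud, clkfreq=CLKFREQ):
    """Return (whole clocks, whole instructions) available in one 1/6-bit slice."""
    clocks = clkfreq // (6 * baud)
    return clocks, clocks // CLOCKS_PER_INSTRUCTION

for baud in (57_600, 100_000, 115_200):
    clocks, instructions = slice_budget(baud)
    print(f"{baud} baud: {clocks} clocks, {instructions} instructions per slice")
```

Hub accesses cost more than 4 clocks each and have to wait for the cog's hub window, so the usable budget is smaller than the instruction count suggests.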
In the meantime, give it a try. And please let me know if you encounter any bugs. (I've not yet done a lot of testing...)
Phil, way to go! The test setup is still in place, so here are the numbers for PBnJ_serial at 100 kbaud:
Sending either the byte $FF or the byte $00 repeatedly. That is,
--$FF: start bit space only, expected width 10.0 µs at 100000 baud
--$00: start + 8 space bits, expected width 90 µs at 100000 baud.
For PBnJ_serial, after 1000 samples:
--$ff: 9.975 µs mean, variation 9.957 to 9.981, sigma 0.009
--$00: 89.774 µs mean, variation 89.757 to 89.781, sigma 0.009.
--$55: all bits came in at 9.975 µs, same standard deviation as the start bit.
It is interesting that the timing comes out with a residue of 1/2 instruction cycle, even though 50ns divides 10µs exactly. That is about as good as you could do, I think, without dithering the bit durations to add 200ns to the total duration. The individual bits are uniform and solid.
At 57600 baud (bit 17.361 µs, 9 bits of $00 156.25 µs):
--$ff: 17.325 µs mean, variation 17.317 to 17.341, sigma 0.008
--$00: 155.922 µs mean, variation 155.918 to 155.936, sigma 0.004.
--$55: uniform bit lengths of 17.325 µs.
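One way to account for the measured bit widths above: if each bit is built from six equal delays of a whole number of 12.5 ns clocks, the bit can only land on a multiple of 75 ns. The sketch below assumes the delay is clkfreq / (6 × baud) rounded to the nearest clock. That rounding rule is a guess that happens to fit the scope readings; it is not taken from the driver source, and `quantized_bit` is a made-up name.

```python
def quantized_bit(baud, clkfreq=80_000_000):
    """Return (clocks per sixth of a bit, resulting bit width in microseconds)."""
    sixthbit = round(clkfreq / (6 * baud))   # assumed rounding rule
    bit_us = 6 * sixthbit / clkfreq * 1e6
    return sixthbit, bit_us

for baud in (100_000, 57_600):
    sixthbit, bit_us = quantized_bit(baud)
    ideal_us = 1e6 / baud
    print(f"{baud} baud: sixthbit = {sixthbit} clocks, "
          f"bit = {bit_us:.3f} us (ideal {ideal_us:.3f} us)")
```

Under that assumption, 100000 baud gives 133 clocks per slice and a 9.975 µs bit, and 57600 baud gives 231 clocks and 17.325 µs, which are the means reported above.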
You're right--It doesn't work at 108kbaud! I haven't tried it with simultaneous reception. What would be the best challenge? The enqueue routine looks kind of long, with two or even three hub accesses.
Thanks for testing! Three hub accesses are necessary, I'm afraid: read the pointer you don't own, write the pointer you do own, and read or write the data. 'No way around that. But that's not to say they all have to be done in the same time slot -- at least not for xmt, where the dequeue pointer could be written after the start bit is started. I'll see if I can tighten things up a bit...
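The three accesses described above are the usual pattern for a single-producer, single-consumer ring buffer. Here is a minimal Python sketch of that pattern, not the driver's actual code; the class and field names are invented, and in the real object the pointers and data live in hub RAM.

```python
class RingBuffer:
    """Single-producer / single-consumer queue; each side writes only its own pointer."""

    def __init__(self, size):
        self.size = size
        self.data = [0] * size
        self.head = 0   # enqueue pointer, written only by the producer
        self.tail = 0   # dequeue pointer, written only by the consumer

    def enqueue(self, value):
        tail = self.tail                     # access 1: read the pointer you don't own
        nxt = (self.head + 1) % self.size
        if nxt == tail:
            return False                     # buffer full, nothing written
        self.data[self.head] = value         # access 2: write the data
        self.head = nxt                      # access 3: write the pointer you own
        return True

    def dequeue(self):
        head = self.head                     # access 1: read the pointer you don't own
        if head == self.tail:
            return None                      # buffer empty
        value = self.data[self.tail]         # access 2: read the data
        self.tail = (self.tail + 1) % self.size   # access 3: write the pointer you own
        return value

buf = RingBuffer(4)
print([buf.enqueue(n) for n in (1, 2, 3, 4)])
print([buf.dequeue() for _ in range(4)])
```

Because the three accesses are independent steps, nothing forces them into the same time slot, which is the opening mentioned above for spreading them across slices.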
I've melded the enqueue and dequeue subroutines into the main code and broken each into two sections. The object seems to work fine at 115,200 baud now. But please test before committing to it.
Thanks to Stephen for inspiring a fun, challenging project, and thanks to Tracy for the thorough timing analysis on the first version!
BTW, the transmit side sends 1 1/3 stop bits. I thought it might help receivers that are a tad slow to maintain sync through an entire stream of characters without skipping characters or ending up with framing errors.
That's Peanut Butter 'n Jelly, isn't it! Precision, Basic no Jitter indeed. Cheers to PhiPi and the Prop. I'll run it through the timing tests when I have a chance.
Sending either the byte $FF or the byte $00 repeatedly, expected:
--$FF: start bit space only, expected width 8.68 µs at 115200 baud
--$00, start + 8 space bits, expected width 78.125 µs at 115200 baud.
For PBnJ_serial, 115200 baud after 1000 samples:
--$ff: 8.700 µs mean, variation 8.697 to 8.701, sigma 0.000 <zero standard deviation!
--$00: 78.298 µs mean, variation 78.282 to 78.302, sigma 0.005.
--$55: all bits came in at 8.7 µs, sigma (standard deviation) 0.000 or 0.001.
That kind of deviation is hard to measure for even a 1GHz 'scope.
It worked for me up to 175 kbaud (176 actually, 177 not).
Sending either the byte $FF or the byte $00 repeatedly, expected at 175k
--$FF: start bit space only, expected width 5.714 µs at 175000 baud
--$00: start + 8 space bits, expected width 51.429 µs at 175000 baud.
For PBnJ_serial, after 1000 samples:
--$ff: 5.700 µs mean, variation 5.699 to 5.702, sigma 0.000 <!
--$00: 51.299 µs mean, variation 51.285 to 51.302, sigma 0.003.
--$55: all bits came in at 5.7 µs, same standard deviation as the start bit.
Thanks for the testing, Tracy! I'm a little surprised by the 175 Kbaud figure. I'd be hesitant to say that it would run that fast if the receiver was also getting pounded. I did, however, test at 115.2 Kbaud with a constant stream of data coming in from another instance of the object in a different cog, and it seemed to hold up okay.
'Been thinking about your dithering comment. It might actually be pretty easy to pull off. All I would need are three different values for sixthbit. The closest one would be used interior to the transmit and receive routines. The two corrective ones would appear in the x_full section, since that alone determines the width of each transmitted bit. It would get four delays per bit at the normal rate and two at the corrective rates. The receiver might experience some 12.5ns sampling jitter, as a consequence, but that hardly matters.
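To make the dithering idea above concrete, here is one possible way to pick the three delay values, sketched in Python. It is only an illustration of the arithmetic: four slices use the closest sixthbit and the other two absorb whatever is left of the ideal bit length. The function name and the choice to split the remainder as evenly as possible are assumptions, not the scheme the driver uses.

```python
def dither_delays(baud, clkfreq=80_000_000):
    """Return (nominal, corrective_a, corrective_b) slice delays in clocks."""
    target = round(clkfreq / baud)     # ideal whole clocks per bit
    nominal = round(target / 6)        # closest sixthbit, used four times per bit
    remainder = target - 4 * nominal   # clocks left for the two corrective slices
    corrective_a = remainder // 2
    corrective_b = remainder - corrective_a
    return nominal, corrective_a, corrective_b

for baud in (100_000, 115_200):
    nominal, a, b = dither_delays(baud)
    total = 4 * nominal + a + b
    print(f"{baud} baud: 4 x {nominal} + {a} + {b} = {total} clocks per bit")
```

At 100000 baud this gives 4 × 133 + 134 + 134 = 800 clocks, exactly 10 µs, at the cost of individual slices differing by one 12.5 ns clock.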
Phil,
This driver and those testing numbers are outstanding! I'll play with it this weekend and let you know what my overly demanding commercial device thinks... I think: "You rock!"
I knew this was a fun problem. Thanks so much for attempting this.
And Tracy, from one Agilent user to one LeCroy user... great work on the testing and the scope driving!
I see that there are a couple of sequences with 7 (or 8--I am not sure how to count them) instructions including two hub accesses, so the only way that it could reach 175kbaud would be for it to hit the sweet spot. At 175 kbaud, I think you have time for only 9+ instructions (4 clocks each) at 80MHz clkfreq. Is that why you were surprised? It looks like you placed the hub accesses so that if it hits the sweet spot on one it will hit the sweet spot on the second too. On the scope, when it fails at 177kbaud, it simply stops as if it has missed the waitcnt window. At 115200 there is time for 13+ instructions, and worst case not hitting the window it would come out at 11+ or 12+. Maybe there are baud rates below 175k where the numbers don't line up? That is a caveat. It would have to work out on the receive side too.
Stephen, this has been enlightening, and I learned a few new things about the LeCroy 'scope in the process. I do have an Agilent (actually HP) multimeter sitting next to it!
Good catch, Jim! There was a similar typo in the DAT section. (As long as IMAX and OMAX are the same, it doesn't matter, of course.) Attached is a corrected object.
I never thought that reading a thread in a coding forum could be as fascinating as watching a film like 2001: A Space Odyssey, Harry Potter, or whatever you like personally.
Coooool!
The thread is marked solved. Still, I would like to read a short comment from Stephen on whether this code can "pick" back :-) at this super-picky device.
Thanks, Stephen. I hope it works out for you. If not, I still have one more trick up my sleeve, so please let me know if you need it.
-Phil
Phil, I think we've reached the limit of the commercial unit's measuring ability... your code is currently working flawlessly! My commercial device is now being interacted with by a single Cog thanks to your new code.
The commercial unit makes two measurements on the bit stream: (1) start bit accuracy, (2) accuracy of bits within a byte.
My Single Cog TxOnly was seeing Start-bit 0.8% and inter-bit of 0.4%.
Your Single Cog Tx w/Rx shows Start-bit 0.6% and inter-bit of 0.4%.
Since your code measures so well with Tracy, I'm betting that we are seeing the limit of the commercial gear to detect the bit width jitter any more accurately (and/or the clock accuracy of the Prop-based hardware I'm using...).
Tracy, if you wish to test my theory that the gutted FDUX now TxOnly driver was OK but Phil's is better, I've attached my single cog versions of the Tx and Rx routines...
Is anyone getting strange character returns using this? (2CogSio)
If I tx something I'm getting consistent but wrong values in a terminal window.
My use of these routines was sending messages with full length and checksum with both ends verifying the checksums. So I was not seeing any extra or wrong valued characters being sent/received. Have you verified the baud rate you are wanting to run at?
But, I was never running at the baud rate you are trying... Have you looked with a scope or Logic Analyzer to see what shape each character is? Maybe one or more bits are bad widths?
The terminal program is probably out of sync. When identical characters are sent back-to-back that way, any zero bit becomes a start bit candidate, and if a one bit happens to occur in the stop bit position, it just stays locked out of sync. The way to break out of this is to toggle the DTR checkbox in the terminal screen to reset the Prop while the terminal program is active. This produces a good long mark condition such that the first zero bit encountered will actually be a start bit.
After taking a closer look at what you presume to be sending and what you're receiving, I would almost bet that you've got your polarity set wrong. Try inverting (or uninverting) it, and see what happens.
Would anyone with a logic analyzer (or scope) be willing to test this serial driver for me? It's supposed to be decently jitter free even at high speeds (I'm doing 460_800 at 80MHz), at least in my own mind. [8^) You can also adjust the bit period on the fly, either to change baud rates or to adjust for a changing RCFAST clock. I am using this in a project with an RTC, and getting timing from that, but running the Prop without a crystal (~12MHz), hence the need for decent speed and adjustable timing.
This is pretty much a plain vanilla serial driver, no other special features. The name is just for Fast Full-Duplex Serial, in 1 cog. I'm also trying to use some decent documentation, so hopefully it is pretty clear what is going on under the hood. The only real thing left to shoe-horn in is being able to send a 0 via the .Tx( 0 ) function...right now the sending code treats everything like a null-terminated string.
thanks,
Jonathan
EDIT: Updated the driver...better comments, a block send command, can tx(0), clamps the max freq (min half-period). Here is the max baud table:
Comments
Cheers,
Peter (pjv)
-Phil
-Phil
-Phil
And, yeah, my lunch-inspired P B & J <> fancy!
-Phil
-Stephen, KZ0Q
-Phil
I took a real quick look at your code.
Question: In your 'PUB tx' routine, you have a reference to IMAX. Since this is the output routine, should it be OMAX instead?
Obviously it won't matter as long as the two buffers are the same size.
Cool program!
Jim
-Phil
best regards
Stefan
I think this is fun news Phil!
- Stephen, KZ0Q
-Phil
Phil, midrange... 57k6
-Stephen, KZ0Q
If I tx something I'm getting consistent but wrong values in a terminal window.
I get "2" on the return.
If I change the tx value, I get something different than what was entered.
Enter "3" get "e" and so on.
Any ideas?
My use of these routines was sending messages with full length and checksum with both ends verifying the checksums. So I was not seeing any extra or wrong valued characters being sent/received. Have you verified the baud rate you are wanting to run at?
But, I was never running at the baud rate you are trying... Have you looked with a scope or Logic Analyzer to see what shape each character is? Maybe one or more bits are bad widths?
-Stephen
I do not have a scope available for this, only a return window for review.
With the FDS module the receive window displays what is sent; with this one the output is consistent but not the same value.
-Phil
I reset everything numerous times but still get the exact same output.
-Phil
Swapping back to the FDS object gives the correct output. But I need tx speed.
Also swapping to the PBnJ gives the same results.
a = O
S = V
b = '
and so on.
Could it be those objects are shifting one bit over?
Bean
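For what it's worth, the pairs listed above (a becomes O, S becomes V, b becomes ') are what a very simplified receiver model produces when the line polarity is inverted, so the data looks shifted even though each bit is the right width. The Python sketch below is only a model: it assumes 8N1 framing, a receiver that syncs on the first falling edge and never checks the stop bit, and one sample per bit. The function names are invented.

```python
def uart_frame(byte):
    """Normal-polarity 8N1 frame: start bit 0, eight data bits LSB first, stop bit 1."""
    return [0] + [(byte >> i) & 1 for i in range(8)] + [1]

def receive_inverted(byte, idle_bits=12):
    """Decode one character sent with inverted polarity by a normal-polarity receiver."""
    normal_line = [1, 1] + uart_frame(byte) + [1] * idle_bits
    line = [1 - level for level in normal_line]   # what the receiver actually sees
    # The receiver treats the first high-to-low transition as a start bit.
    start = next(i for i in range(1, len(line)) if line[i - 1] == 1 and line[i] == 0)
    data = line[start + 1:start + 9]              # eight data bits, LSB first
    return sum(bit << n for n, bit in enumerate(data))

for ch in "aSb":
    print(ch, "->", chr(receive_inverted(ord(ch))))
```

In this model the inverted start bit looks like idle, so the receiver locks onto the first zero after it, which is why the result resembles a bit shift.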
Well, you certainly won't get 460.8 Kbaud with my PBnJ driver! Why not post all of your code and a schematic of your serial interface?
-Phil
It's in there, but I borrowed a scope and everything looks clean. But the signals are inverted, ha. Good call.
What can I do to reverse those, since minicom seems to lack a way of setting DTR + or -?
EDIT:
Well I guess the real issue is that I can't get the signal to invert regardless of mode setting.
Neither pb_j or SST respond to mode changes, FDS does.
Edit part deux:
All this would be from where both say-- there is no mode functionality.
Thanks all. Changing line 158 to from took care of my issue at least on the inversion part.
This is pretty much a plain vanilla serial driver, no other special features. The name is just for Fast Full-Duplex Serial, in 1 cog. I'm also trying to use some decent documentation, so hopefully it is pretty clear what is going on under the hood. The only real thing left to shoe-horn in is being able to send a 0 via the .Tx( 0 ) function...right now the sending code treats everything like a null-terminated string.
thanks,
Jonathan
EDIT: Updated the driver...better comments, a block send command, can tx(0), clamps the max freq (min half-period). Here is the max baud table:
EDIT2: removed the attachment...a newer version is in my next post.
Jonathan
EDIT: removed the attachment...next post is better.