I cannot read fast enough. Serial at 1Mbps. — Parallax Forums

peterz Posts: 59
edited 2007-02-19 20:28 in Propeller 1
I have been patching the SerialFullDuplex object provided by Parallax in order to get half-duplex, high-speed serial (1 Mbps).
Basically, the original code in SerialFullDuplex interleaves reads and writes in one routine, while I keep separate code for each. That is, when I am writing I am 100% writing, and the same applies to reading.

I have checked that writing works well: the device on the other side of the line reads the data I send without problems, although I am unable to send contiguous bytes back to back. Between bytes there is always a 50 us delay, but the receiver seems to have no problem with this. Note that, to speed things up, I have added a new function to the SerialFullDuplex object that writes a sequence of bytes instead of a single byte. However, this code is written in Spin, so I guess the TX ASM code is able to send one byte before I write the second byte to the buffer. I write sequences of about 8 bytes. Between sequences there may be a wait on the order of seconds.
Please note that the TX code runs continuously in a separate COG, in a tight loop looking for a byte to send, so it could easily send the first byte before I am able to put the second byte in the buffer.

Although I am able to write a sequence of bytes, I am unable to read a sequence of bytes properly. It seems that I can correctly read only the first byte of the sequence.
The remote sender (hardware outside my control) sends a sequence of about 8 bytes (my receive buffer holds 16 bytes) in very quick succession. The remote sender does not have the delay between bytes that I have when I am writing. It is able to send bytes separated only by the start bit! I mean, the sender does not have delays between bytes. The gap between two bytes is exactly 1 us, which is the basic pulse width at 1 Mbps.

So I suspect that the delay between reading one byte and the next is larger than 1 us.
The delay at the end of a byte read should be, I think, the code that puts the data in my receive buffer. This code, which is copied from the SerialFullDuplex object, is as follows:

'Note: rxdata has been loaded with 8 data bits in reverse order on top bits

                        shr     rxdata,#32-8            'justify and trim received byte
                        and     rxdata,#$FF
                        'save received byte and inc head
                        rdlong  t2_rx,par               't2 = current index
                        add     t2_rx,rxbuff            't2 = &rxbuff[head]
                        wrbyte  rxdata,t2_rx            '(BYTE)rxbuf[head]=(BYTE)rxdata
                        sub     t2_rx,rxbuff            't2 = index head
                        add     t2_rx,#1                'increments index
                        and     t2_rx,#$0F
                        wrlong  t2_rx,par               'head
                        jmp     #ReadNextByte

Would the rdlong+wrbyte+wrlong combination take a significant amount of time? Note that if it takes longer than 1 us, the next read will be wrong! Note: I am working at 80 MHz.

I have another question about the Start bit. The line is held at 5 V by a pull-up resistor. I wait for the Start bit, which is a zero. When I detect the very first LOW pulse I count 1 1/2 pulses and then read the first data bit. What I want is to read in the middle of the next pulse. But should I wait the full 1 1/2 pulses or only 1/2 pulse? To read the remaining data bits (8 bits plus one Stop bit) I wait a full pulse between bits.

Any ideas about how to speed things up so that I can read the full sequence of bytes?









Comments

  • David B Posts: 591
    edited 2007-01-28 17:22
    About your last point - you're not really reading a pulse; you're making many individual level readings.

    Ideally, the receive code will sample the line level fast enough to take 4 or more samples during a single start pulse, the more the better. (Some UART chips sample at 16 times the bit rate.) Because of the fast repeated sampling, the read code detects that a start condition has arrived very close to the actual beginning of the start pulse.

    So because the code activates the read sequence close to the actual start transition, you'd want to wait a full 1 1/2 bit periods for the first bit, then one period after that, so each data bit sample would be timed as close as possible to the center of each data bit.

    David
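
The sample timing described in this reply can be put into numbers. The following is only a sketch, assuming the 80 MHz clock and 1 Mbps rate given in the original post; the name `sample_offsets` is made up for illustration, and the offsets are measured from the moment the start edge is detected.

```python
# Sample instants for the data bits of one frame, in system clocks
# after the detected start edge.
# Assumptions: 80 MHz system clock and 1 Mbps line rate (from the thread).
CLOCK_HZ = 80_000_000
BAUD = 1_000_000
BIT_CLOCKS = CLOCK_HZ // BAUD  # clocks per bit period


def sample_offsets(data_bits=8, bit_clocks=BIT_CLOCKS):
    """Return the clock offsets at which each data bit is sampled.

    The first data bit is sampled 1.5 bit periods after the start edge
    (skip the start bit, then land in the middle of the first data bit).
    Every later bit is sampled one full bit period after the previous one.
    """
    first = bit_clocks + bit_clocks // 2
    return [first + i * bit_clocks for i in range(data_bits)]


offsets = sample_offsets()
print(BIT_CLOCKS, offsets)
```

Waiting only half a bit period after the edge would place the first sample in the middle of the start bit itself, so every data bit would be read one position early.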
  • Tracy Allen Posts: 6,656
    edited 2007-01-28 21:40
    There is probably also a Stop bit. That makes 10 bits total: the start bit is 1 us low, then 8 data bits, then the stop bit is an obligatory 1 us high.

    Does the receive data also arrive in packets of 8? If so, the reception of the packet could be sped up considerably by buffering it directly in the COG, and subsequently transferring it to the HUB at a more leisurely pace.

    The kicker is the time it takes to wait for the HUB. If you add up the time taken for the code you listed above, starting with the first rdlong, it comes out (quick calculation) to 55 clock cycles at 20 megahertz, or 2.75 microseconds. There are 13 clock cycles wasted doing nothing between the wrbyte and the last wrlong. For best sync to the HUB, there need to be 2 regular instructions between HUB instructions, but at that point there are 3 intervening instructions, so it has to sit there for 13 to resync. The first rdlong is also problematic, because the wasted clock cycles there can be anywhere from 0 to 15, and in the nature of asynchronous data reception, there is nothing you can do about that uncertainty. So the whole routine as listed takes from 70 to 85 clock cycles. That is 3.5 to 4.25 microseconds, way more than the single available 1 microsecond stop bit.

    A small observation. The code does not need the first
    and rxdata,#$FF
    The shr instruction already shifts zeros into the MSBs.

    It should be easy to buffer the packet in the COG and then to write an optimal routine to transfer it to the HUB. There is time for 20 regular 4-cycle instructions in one microsecond.


    EDIT: got the times wrong by a factor of 4, see my next post.

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    Tracy Allen
    www.emesystems.com

    Post Edited (Tracy Allen) : 1/29/2007 5:10:58 PM GMT
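
The factor of 4 mentioned in the EDIT note is the difference between the clock rate and the instruction rate. As a small worked conversion, assuming the 80 MHz system clock from the original post and 4 clocks per ordinary instruction (the helper name `clocks_to_us` is made up here):

```python
# Convert the clock counts quoted in the post to microseconds, once with the
# clock rate and once with the instruction rate, to show the factor of 4.
# Assumptions: 80 MHz system clock; an ordinary instruction takes 4 clocks.
CLOCK_MHZ = 80
INSTR_MHZ = CLOCK_MHZ / 4  # 20 million ordinary instructions per second


def clocks_to_us(clocks, mhz=CLOCK_MHZ):
    """Time in microseconds for `clocks` ticks of a `mhz` MHz rate."""
    return clocks / mhz


for clocks in (55, 70, 85):
    by_clock = clocks_to_us(clocks)             # correct: divide by 80
    by_instr = clocks_to_us(clocks, INSTR_MHZ)  # the slip: divide by 20
    print(clocks, by_clock, by_instr)
```

Dividing by 20 reproduces the 2.75, 3.5 and 4.25 microsecond figures in the post; dividing by 80 gives roughly 0.69, 0.88 and 1.06 microseconds for the same clock counts.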
  • lnielsen Posts: 72
    edited 2007-01-28 22:58
    Are you trying to write an interface for the Bioloid AX-12? I don't have any hardware yet (proto boards on order) but I am also interested in 1Mbps half duplex support for the Bioloid. If you want, post your code and a demo and I can see if I can help by using Asterisk's Gear utility.

    Let me know if you are working with the Bioloid, I am interested in finding others working on this same combination.
  • rokicki Posts: 1,000
    edited 2007-01-29 08:19
    Also, if you are the only one generating serial (putting into the buffer), you only need read the index variable
    once, at the very start of the routine, and you can write it (only) for each byte received. This reduces the
    hub ops from 3 to 2.

    At 80 MHz, you really shouldn't have any more difficulty than this. You don't need to read the stop bit
    (unless you are paranoid). Even if you do insist on reading the stop bit, though, you should be able to get
    the check code and store code, and set up for the next start bit, to all finish in 40 cycles (one half
    microsecond), with no need for fancy buffering or anything.

    Give it another go removing the unneeded read of the index pointer and let us know what you see.
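
The idea in this reply (read the shared index once, then only write it) can be sketched outside PASM. This is an illustrative model, not the poster's code: the class `Receiver` and its counters are hypothetical, the 16-byte size and the `$0F` mask come from the original listing, and it assumes the receiving cog is the only writer of the head index.

```python
# Single-producer ring buffer: the producer keeps a private copy of the head
# index and only publishes it, instead of re-reading it for every byte.
BUF_SIZE = 16        # 16-byte receive buffer, as in the thread
MASK = BUF_SIZE - 1  # $0F in the original listing


class Receiver:
    def __init__(self):
        self.hub_buf = bytearray(BUF_SIZE)  # stands in for rxbuff in hub RAM
        self.hub_head = 0                   # stands in for the long at PAR
        self.hub_reads = 0
        self.hub_writes = 0
        # Read the shared index once, at start-up, into a private copy.
        self.head = self.hub_head
        self.hub_reads += 1

    def store(self, byte):
        """Store one received byte: two hub writes, no hub read."""
        self.hub_buf[self.head] = byte & 0xFF  # corresponds to wrbyte
        self.hub_writes += 1
        self.head = (self.head + 1) & MASK     # wrap within the buffer
        self.hub_head = self.head              # corresponds to wrlong
        self.hub_writes += 1


rx = Receiver()
for b in range(20):
    rx.store(b)
print(rx.head, rx.hub_reads, rx.hub_writes)
```

This only holds if nothing else ever modifies the head index, which is the condition stated in the reply.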
  • Tracy Allen Posts: 6,656
    edited 2007-01-29 17:08
    My bad, my execution time figures were off in my previous post. I multiplied by the 1/20 microsecond instruction time instead of the 1/80 microsecond clock time. So the total time for the routine is much closer to the 1 microsecond you need, but still just over.

    By making a copy of the t2_rx value (a mov which uses a free time slot between the first two hub accesses), you can then dispose of one instruction between the second two hub accesses and thus avoid 13 wasted clock cycles. That takes the total number of clock cycles down to between 36 and 51, which is around 1/2 microsecond. That should give plenty of time to get back to ReadNextByte well before the next start bit.

        shr     rxdata,#32-8            'justify and trim received byte, msbs are zeroed
        'save received byte and inc head
        rdlong  t2_rx,par               't2 = current index
        mov     t9_rx,t2_rx             'make a copy into another long
        add     t2_rx,rxbuff            't2 = &rxbuff[head]
        wrbyte  rxdata,t2_rx            '(BYTE)rxbuf[head]=(BYTE)rxdata
        add     t9_rx,#1                'increments index using copy
        and     t9_rx,#$0F
        wrlong  t9_rx,par               'head
        jmp     #ReadNextByte
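
The effect of instruction ordering on hub stalls can be explored with a toy model. This is a simplified sketch under stated assumptions, not a cycle-accurate simulator: it assumes a cog's hub window recurs every 16 clocks, a hub instruction costs 8 clocks once aligned with the window, and every other instruction (including jmp) costs 4 clocks. The names `run`, `ORIGINAL` and `REORDERED` are made up here. The absolute totals depend on what is counted and on the hub cost assumed, so they need not match the figures quoted in the thread.

```python
# Toy timing model for hub access on the Propeller 1.
HUB_PERIOD = 16  # assumed: this cog's hub window comes round every 16 clocks
HUB_COST = 8     # assumed: cost of a hub instruction once it is aligned
ALU_COST = 4     # assumed: cost of any non-hub instruction, including jmp


def run(program, phase):
    """Clocks used by `program` when entered `phase` clocks after a hub window."""
    t = phase
    for op in program:
        if op == "hub":
            t += (-t) % HUB_PERIOD  # stall until the next hub window
            t += HUB_COST
        else:
            t += ALU_COST
    return t - phase


# From the first rdlong to the jmp, in the order of the two listings.
ORIGINAL = ["hub", "alu", "hub", "alu", "alu", "alu", "hub", "alu"]
REORDERED = ["hub", "alu", "alu", "hub", "alu", "alu", "hub", "alu"]

for name, prog in (("original", ORIGINAL), ("reordered", REORDERED)):
    totals = [run(prog, phase) for phase in range(HUB_PERIOD)]
    print(name, min(totals), max(totals))
```

In this model, two ordinary instructions between hub instructions land exactly on the next window, while three overshoot it and force a wait for the following one, which is the effect the reordering is meant to remove.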
    
    

  • rokicki Posts: 1,000
    edited 2007-01-29 18:08
    Well, if he's actually *reading* the stop bit, and there's only one stop bit and one start bit, then he needs it to be *under* 1/2 uS.
    Otherwise he'll keep drifting further and further off.

    Removing the rdlong t2_rx,par inside the loop, and putting it at the very head of the routine, should accomplish this.
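
The half-microsecond limit follows from where the last sample falls within the frame. A rough sketch, assuming 80 clocks per bit (80 MHz, 1 Mbps), ideal mid-bit sampling and frames sent back to back; the function name `slack_clocks` is made up for illustration.

```python
# Time left between the receiver's last sample and the next start edge.
# Assumptions: 80 clocks per bit, samples taken exactly mid-bit, and the
# next frame's start bit following the stop bit with no idle time.
BIT_CLOCKS = 80


def slack_clocks(read_stop_bit, bit_clocks=BIT_CLOCKS):
    """Clocks available for storing the byte before the next start edge."""
    half = bit_clocks // 2
    if read_stop_bit:
        # Sampled in the middle of the stop bit: only its second half is left.
        return half
    # Stopped after the last data bit: half of that bit plus the whole stop bit.
    return half + bit_clocks


print(slack_clocks(True), slack_clocks(False))
```

Under these assumptions, skipping the stop bit triples the time available for the store-and-loop code.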
  • Tracy Allen Posts: 6,656
    edited 2007-01-29 20:13
    Agreed. Also agreed that it is probably not necessary to read the stop bit.

  • peterz Posts: 59
    edited 2007-01-30 15:08
    I have tried these optimizations, but I still cannot read correctly beyond the first byte. Thanks for your help.
  • Tracy Allen Posts: 6,656
    edited 2007-01-30 17:08
    Are you sure about the protocol sent by the other device, in terms of number of bits, parity, and so on? In your original post you had questions about the timing but did not mention a stop bit. You said, "It is able to send bytes separated only by the start bit! I mean, the sender does not have delays between bytes." In normal asynchronous 8-bit data there has to be a stop bit, but that is not to say that this device does not have its own peculiar protocol based on longer bit streams. Someone asked if this is for the AX-12 servos, which I don't know much about, except I see that they do send their data at 1 Mbps. Maybe with more info about the device and a look at your byte receive code we can help a little more.

  • rokicki Posts: 1,000
    edited 2007-01-30 18:22
    Yeah, in theory a stop bit is necessary otherwise you don't have a start bit "edge" to synchronize to.

    But if there's no stop bit, and it's like an async-and-pray then I'm not sure what to do. :)

    Why not post the entire code chunk? (And maybe spend a few minutes commenting it up first.)
    This would really give us a clue (and it would be pretty easy for us to write a "test" assembly
    routine that generated a pseudo-random 1MHz serial stream to test it with.)
  • peterz Posts: 59
    edited 2007-01-30 19:14
    Sorry, I meant 'separated only by a Stop bit'. And yes, there is a Stop bit. Please note that when I talk about reading in my first post I mention the Stop bit too.
    The protocol is very simple. After I write a command, the sender replies with a sequence of 8 to 10 bytes, without delays between them. Each full 'BYTE' has 1 Start bit that is LOW, 8 data bits, and 1 Stop bit that is HIGH. Data bits are sent LSB first, as usual.

    I have verified with a scope that the data on the line is actually the data I expected and that it is nicely sequenced with exact 1 us pulses. All expected bits are there: Start, Data and Stop. The signal is not inverted. It is my received data that is wrong.

    Please note that I am always able to read the first byte without problems. The sequence goes wrong after the second byte. It is interesting to note that the following bytes are always the same for the same sequence being sent. I mean, the sequence I read is wrong but it is always the same. If I change the pattern of the sequence being sent I receive a different sequence.

    For testing purposes I have used two different pieces of hardware that work at 1 Mbps with the same format.

    As I said above, my code is a simple variant of the Chip Gracey code for SerialFullDuplex. I will post it here after cleaning it up a bit.

    I am going to do the following for testing purposes, using the scope as a debugging tool: each time I receive a bit I set another I/O pin to the value I read. I should see on the scope something similar to the original signal, right? I want to do this test in order to see whether the problem is in the software, which is trashing the data, or whether I am actually reading wrong signals!
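
The frame format described in this post (one LOW start bit, 8 data bits LSB first, one HIGH stop bit, frames back to back) can be modelled at bit granularity. The sketch below is only an illustration of one way a receiver that is late for the next start edge could produce wrong but repeatable bytes after a correct first byte; it is not a diagnosis of the poster's code. The names `encode`, `decode` and `busy_bits` are made up, and lateness is modelled coarsely in whole bit times.

```python
# Bit-level model of back-to-back 8N1 frames and a receiver that may be
# "busy" for some whole bit times after each byte (hypothetical parameter).


def encode(data):
    """Line levels, one per bit time: start (0), 8 data bits LSB first, stop (1)."""
    levels = []
    for byte in data:
        levels.append(0)
        levels.extend((byte >> i) & 1 for i in range(8))
        levels.append(1)
    return levels


def decode(levels, busy_bits=0):
    """Decode frames; after each byte the receiver ignores `busy_bits` bit times."""
    out = []
    i = 0
    while i + 10 <= len(levels):
        if levels[i] == 1:  # line is high: keep waiting for a low level
            i += 1
            continue
        byte = 0
        for bit in range(8):  # the 8 levels after the presumed start bit
            byte |= levels[i + 1 + bit] << bit
        out.append(byte)
        i += 10 + busy_bits  # skip the frame, plus any time spent busy
    return out


data = [0x12, 0x34, 0x56]
line = encode(data)
on_time = decode(line)
late = decode(line, busy_bits=1)
print(on_time, late)
```

A receiver that is ready in time recovers every byte. The late one misses the real start edge, treats the next low data bit as a start bit, and returns the same wrong values on every run for the same input.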
  • rokicki Posts: 1,000
    edited 2007-01-30 21:05
    If you *really* want to get to the bottom of this, just write an assembly routine that generates the serial data out a particular pin in the
    exact format you describe. (This should be really easy and a small amount of code.) Then you can spawn that after your reader
    routine, and then you've got a self-contained test setup that doesn't require *any* extra hardware, and that we can run on our own
    demo boards, etc., to see what's going on.

    But in any case, this should be pretty straightforward to resolve.
  • pjv Posts: 1,903
    edited 2007-01-31 01:00
    Hi Rokicki;

    For tracking this problem down, your suggestion is an approach that must be used with caution. Often these problems are caused by clocks not matching each other properly, and if you use the same clock source for receiving as transmitting, then you are eliminating that as a possibility. However, what it may tell you is, if the symptoms go away when the same receive/transmit clock sources are used, then clock differences are almost surely the problem.

    Cheers,

    Peter (pjv)
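
How much clock mismatch an asynchronous receiver can tolerate is a short calculation. This is an idealized sketch: it assumes the receiver resynchronizes on every start edge, samples mid-bit, and fails once a sample has drifted by half a bit; edge-detection delay and signal rise time are ignored. The function names are made up for illustration.

```python
# Tolerable clock mismatch between sender and receiver for one frame.
# Assumption: a sample is still good while its drift stays under half a bit.


def drift_bits(bit_times, mismatch):
    """Drift, in bit periods, accumulated `bit_times` after the start edge."""
    return bit_times * mismatch


def max_mismatch(last_sample_bit_times, margin_bits=0.5):
    """Largest relative clock error that keeps the last sample inside its bit."""
    return margin_bits / last_sample_bit_times


# The stop bit is sampled 9.5 bit times after the start edge,
# the last data bit 8.5 bit times after it.
print(max_mismatch(9.5), max_mismatch(8.5))
print(drift_bits(9.5, 0.01))
```

Under these assumptions the limit is a little over 5 percent when the stop bit is sampled. Because drift restarts at each start edge, it does not accumulate across bytes as long as the receiver is ready for every edge.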
  • TransistorToaster Posts: 149
    edited 2007-02-19 20:03
    Peter,
    Exactly what is the application of the 1 Mbps transfer? You can work 20% faster with no start and stop bits at all. That's the case for wireless data transfers.

    Actually, the reason I found your post is that I am new to Propeller assembly and I am trying to understand rdbyte and wrbyte vs. mov. I could not find a good definition of register space vs. main memory space. On p. 394 of the Propeller manual, it says that rdbyte uses 9 bits for the address. I assume that it is an indexed register. Is the register space the chunk of RAM that is allocated on each cognew, with main memory being everything in the Propeller? What is the most efficient way to copy from INA pins to a buffer in RAM?
    Frank
  • Mike Green Posts: 23,101
    edited 2007-02-19 20:28
    Frank,
    The main memory space is functionally much like an I/O device. It is not accessible to ordinary instructions and is accessed only with RDxxxx and WRxxxx. There are pairs of these for accessing BYTEs, WORDs, and LONG words. The format of all these instructions is the same and the destination field has the cog memory address to be used while the source field has the main memory address. If the main memory address is 9 bits or less, an immediate operand can be used for this. The cog memory is completely separate from main memory (32K x 8-bits) and there is a separate cog memory of 512 32-bit words associated with each cog. The Spin interpreter is normally loaded into a cog's memory and executes there. The interpreter makes main memory appear to be the only memory in the Propeller and Spin code cannot access any cog memory at all. An assembly program can be loaded into a cog and that can "cooperate" with a Spin program to copy data into main memory.

    The most efficient way to copy from INA pins to a buffer in RAM is to use the shift instructions to build a 32-bit long word containing 4 bytes, then write this to the buffer using WRLONG. If this is written carefully, there will be minimal clock cycles spent waiting for access to main memory (access is shared among the 8 cogs - one time slot of 2 clock cycles for each). If there is a cycle of some number of bytes of data received, then a wait of some time, the data can be copied to main memory in bursts from a buffer in the cog's memory.
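
The packing step mentioned in this reply can be illustrated in a few lines. This sketch only shows the byte layout, under the assumption that a long is stored in main memory low byte first (little-endian) and that the first byte received should end up at the lowest address; `pack_long` is a made-up name, and how the bits are actually shifted in from the pins is not modelled.

```python
# Pack four received bytes into one 32-bit value so that a single long write
# replaces four byte writes.
# Assumption: longs are stored low byte first in main memory.
import struct


def pack_long(b0, b1, b2, b3):
    """Place b0 in the low byte so it ends up at the lowest address."""
    return (b0 & 0xFF) | ((b1 & 0xFF) << 8) | ((b2 & 0xFF) << 16) | ((b3 & 0xFF) << 24)


received = [0x10, 0x20, 0x30, 0x40]
value = pack_long(*received)
hub_bytes = struct.pack("<I", value)  # byte order as it would sit in memory
print(hex(value), list(hub_bytes))
```

With this layout, code that later reads the buffer one byte at a time sees the bytes in the order they arrived.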