I cannot read fast enough. Serial at 1Mbps.
I have been patching the SerialFullDuplex object provided by Parallax to get half-duplex high-speed serial (1 Mbps).
Basically, the original code in SerialFullDuplex interleaves reads and writes in one routine, while I keep separate code for each. That is, when I am writing I am 100% writing, and the same applies to reading.
I have checked that writing works well: the device on the other side of the line reads the data I send without problems, even though I am unable to send contiguous bytes. Between bytes there is always a 50 us delay, and the receiver seems to have no problem with this. Note that, to speed things up, I have added a new function to the SerialFullDuplex object that writes a sequence of bytes instead of one byte. However, this code is written in Spin, so I guess the TX ASM code is able to send one byte before I write the second byte to the buffer. I write sequences of about 8 bytes. Between sequences there may be a wait on the order of seconds.
Please note that the TX code is continuously running in a separate COG, in a tight loop looking for a byte to send, so it could easily send the first byte before I am able to put the second byte in the buffer.
Although I am able to write a sequence of bytes, I am unable to read a sequence of bytes properly. It seems that I can correctly read only the first byte of the sequence.
The remote sender (hardware that is out of my control) sends a sequence of bytes (about 8 bytes; with 16 bytes in my receive buffer) in very quick succession. The remote sender does not have the delay between bytes that I have when I am writing. It sends bytes with no idle time at all between them: the only thing separating the data of one byte from the start bit of the next is the 1 us stop bit, which is one bit period at 1 Mbps.
So I suspect I have a delay between reading bytes that is larger than 1 us.
The delay at the end of a byte read should be, I think, the code that puts the data in my receive buffer. This code, which is a copy of the code in the SerialFullDuplex object, is as follows:
'Note: rxdata has been loaded with 8 data bits in reverse order on top bits
                        shr     rxdata,#32-8            'justify and trim received byte
                        and     rxdata,#$FF
                        'save received byte and inc head
                        rdlong  t2_rx,par               't2 = current index
                        add     t2_rx,rxbuff            't2 = &rxbuff[head]
                        wrbyte  rxdata,t2_rx            '(BYTE)rxbuff[head]=(BYTE)rxdata
                        sub     t2_rx,rxbuff            't2 = index head
                        add     t2_rx,#1                'increments index
                        and     t2_rx,#$0F
                        wrlong  t2_rx,par               'head
                        jmp     #ReadNextByte
Would the rdlong+wrbyte+wrlong combination take a significant amount of time? Note that if it takes longer than 1 us, the next read will be wrong! Note: I am working at 80 MHz.
I have another question about the start bit. The line is held at 5 V with a pull-up resistor. I wait for the start bit, which is a zero. When I detect the first LOW level I wait 1 1/2 bit periods and then read the first data bit. What I want is to sample in the middle of the next bit. But should I wait the full 1 1/2 bit periods or only 1/2 bit period? To read the remaining bits (the rest of the 8 data bits plus one stop bit) I wait a full bit period between samples.
Any ideas on how to speed things up so that I can read the full sequence of bytes?
Comments
Ideally, the receive code will be sampling the bit level fast enough to take 4 or more samples during the time of a single start pulse, the more the better. (Some UART chips sample 16 times the bit rate.) Because of the fast repeated sampling, the read code detects that a start condition has arrived very close to the actual beginning of the start pulse.
So because the code activates the read sequence close to the actual start transition, you'd want to wait a full 1 1/2 bit periods for the first bit, then one period after that, so each data bit sample would be timed as close as possible to the center of each data bit.
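The 1 1/2-then-1 timing described above can be written out as plain arithmetic. This is only a sketch in Python, assuming the 80 MHz clock and 1 Mbps rate given in the question (80 clocks per bit); `sample_offsets` is a made-up helper name, not part of any Parallax object.

```python
CLOCK_HZ = 80_000_000   # system clock stated in the question
BAUD = 1_000_000        # line rate stated in the question


def sample_offsets(clock_hz=CLOCK_HZ, baud=BAUD, data_bits=8):
    """Clock offsets, measured from the start-bit falling edge, at which
    each data bit is sampled at its center (1.5 bit periods, then 1 each)."""
    bit = clock_hz // baud          # clocks per bit period
    first = bit + bit // 2          # skip the start bit, land mid-way in bit 0
    return [first + i * bit for i in range(data_bits)]


offsets = sample_offsets()
bit = CLOCK_HZ // BAUD
frame_end = 10 * bit                # start + 8 data + stop
stop_center = offsets[-1] + bit     # middle of the stop bit

print(offsets)
print("clocks left after last data bit:", frame_end - offsets[-1])
print("clocks left after sampling stop:", frame_end - stop_center)
```

Waiting only 1/2 bit period after the edge would put the first sample in the middle of the start bit itself, which is why the first wait is 1 1/2 periods. The last two figures show how much time remains for storing the byte before the next start edge when bytes arrive back to back.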
David
Does the receive data also arrive in packets of 8? If so, reception of the packet could be sped up considerably by buffering it directly in the COG and then transferring it to the HUB at a more leisurely pace.
The kicker is the time it takes to wait for the HUB. If you add up the time taken for the code you listed above, starting with the first rdlong, it comes out (quick calculation) to 55 clock cycles at 20 megahertz, or 2.75 microseconds. There are 13 clock cycles wasted doing nothing between the wrbyte and the last wrlong. For best sync to the HUB, there need to be 2 regular instructions between HUB instructions, but at that point there are 3 intervening instructions, so it has to sit there for 13 to resync. The first rdlong is also problematic, because the wasted clock cycles there can be anywhere from 0 to 15, and in the nature of asynchronous data reception, there is nothing you can do about that uncertainty. So the whole routine as listed takes from 70 to 85 clock cycles. That is 3.5 to 4.25 microseconds, way more than the single available 1 microsecond stop bit.
A small observation. The code does not need the first
and rxdata,#$FF
The shr instruction already shifts zeros into the MSBs.
It should be easy to buffer the packet in the COG and then to write an optimal routine to transfer it to the HUB. There is time for 20 regular 4-cycle instructions in one microsecond.
EDIT: got the times wrong by a factor of 4, see my next post.
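For anyone who wants to redo the hub-wait arithmetic, here is a rough model in Python. It is a sketch under stated assumptions, not a cycle-exact simulator: it assumes a hub instruction takes at least 8 clocks and can only complete on the cog's slot, which comes around every 16 clocks (so 8 to 23 clocks per hub instruction), and that every other instruction takes 4 clocks. Other hub-timing figures shift the totals by a few clocks.

```python
HUB_PERIOD = 16   # assumed: this cog's hub slot comes around every 16 clocks
HUB_MIN = 8       # assumed: a hub instruction takes at least 8 clocks
REG = 4           # ordinary instruction (including a taken jmp)


def hub_end(start, phase):
    """Clock at which a hub instruction issued at `start` completes, given
    that it can only complete when clock % 16 == phase."""
    earliest = start + HUB_MIN
    return earliest + (phase - earliest) % HUB_PERIOD


def elapsed(seq, phase):
    """Total clocks for a sequence of 'hub' / 'reg' instructions."""
    t = 0
    for kind in seq:
        t = hub_end(t, phase) if kind == "hub" else t + REG
    return t


# shr, and, rdlong, add, wrbyte, sub, add, and, wrlong, jmp
ORIGINAL = ["reg", "reg", "hub", "reg", "hub",
            "reg", "reg", "reg", "hub", "reg"]

totals = [elapsed(ORIGINAL, p) for p in range(HUB_PERIOD)]
print("best:", min(totals), "clocks, worst:", max(totals), "clocks")
print("worst case at 80 MHz:", max(totals) / 80, "us")
```

Under these assumptions the listed store routine comes out at 68 to 83 clocks depending on where the hub slot falls, which is roughly a full bit period at 80 MHz and so leaves no real margin before the next start bit.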
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Tracy Allen
www.emesystems.com
Post Edited (Tracy Allen) : 1/29/2007 5:10:58 PM GMT
Let me know if you are working with the Bioloid, I am interested in finding others working on this same combination.
once, at the very start of the routine, and you can write it (only) for each byte received. This reduces the
hub ops from 3 to 2.
At 80 MHz, you really shouldn't have any more difficulty than this. You don't need to read the stop bit
(unless you are paranoid). Even if you do insist on reading the stop bit, though, you should be able to get
the check code and store code, and set up for the next start bit, to all finish in 40 cycles (one half
microsecond), with no need for fancy buffering or anything.
Give it another go removing the unneeded read of the index pointer and let us know what you see.
By making a copy of the t2_rx value (a mov which uses a free time slot between the first two hub accesses), you can then dispose of one instruction between the second two hub accesses and thus avoid 13 wasted clock cycles. That takes the total number of clock cycles down to between 36 and 51, which is around 1/2 microsecond. That should give plenty of time to get back to ReadNextByte well before the next start bit.
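The effect of the reordering can be checked with a small Python model. This is a sketch only: it assumes hub instructions take 8 to 23 clocks (slot every 16 clocks) and everything else takes 4, and the two instruction orderings in the comments are my guess at what the suggestions in this thread would look like, not code posted by anyone.

```python
def hub_end(start, phase, period=16, minimum=8):
    """Completion clock of a hub instruction issued at `start`
    (assumed model: completes only when clock % period == phase)."""
    earliest = start + minimum
    return earliest + (phase - earliest) % period


def span(seq):
    """(best, worst) total clocks over all 16 possible hub phases."""
    totals = []
    for phase in range(16):
        t = 0
        for kind in seq:
            t = hub_end(t, phase) if kind == "hub" else t + 4
        totals.append(t)
    return min(totals), max(totals)


# Hypothetical ordering with the copy:
# shr, rdlong, mov, add, wrbyte, add, and, wrlong, jmp
WITH_COPY = ["reg", "hub", "reg", "reg", "hub", "reg", "reg", "hub", "reg"]

# Hypothetical ordering with the head index kept in a cog register (no rdlong):
# shr, mov, add, wrbyte, add, and, wrlong, jmp
HEAD_IN_COG = ["reg", "reg", "reg", "hub", "reg", "reg", "hub", "reg"]

print("with copy:  ", span(WITH_COPY))
print("head in cog:", span(HEAD_IN_COG))
```

With exactly two ordinary instructions between hub accesses, the second and third hub instructions never wait in this model. The totals are a little higher than the 36 to 51 quoted above because of the stricter timing assumption, but both variants stay well under the roughly 120 clocks available between the middle of the last data bit and the next start edge if the stop bit is not sampled.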
Otherwise he'll keep drifting further and further off.
Removing the rdlong t2_rx,par inside the loop, and putting it at the very head of the routine, should accomplish this.
But if there's no stop bit, and it's like an async-and-pray then I'm not sure what to do.
Why not post the entire code chunk? (And maybe spend a few minutes commenting it up first.)
This would really give us a clue (and it would be pretty easy for us to write a "test" assembly
routine that generated a pseudo-random 1MHz serial stream to test it with.)
The protocol is very simple. After I write a command, the sender replies with a sequence of 8 to 10 bytes, without delays between them. Each full 'BYTE' has 1 start bit that is LOW, 8 data bits, and 1 stop bit that is HIGH. Data bits are sent LSB first, as usual.
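To make the frame format concrete, here is a bit-level model in Python (one list element per bit period, so it says nothing about cycle timing). The function names and the sample bytes are made up for illustration.

```python
def encode(data):
    """Bit levels on the wire: start (0), 8 data bits LSB first, stop (1)."""
    bits = []
    for byte in data:
        bits.append(0)
        bits.extend((byte >> i) & 1 for i in range(8))
        bits.append(1)
    return bits


def decode(bits):
    """Recover bytes from an idle-high bit stream, one frame at a time."""
    out, i = [], 0
    while i < len(bits):
        if bits[i] == 1:            # idle level or stop bit, keep looking
            i += 1
            continue
        frame = bits[i + 1:i + 9]   # the 8 data bits after the start bit
        if len(frame) < 8:
            break                   # truncated frame at the end of the capture
        out.append(sum(b << n for n, b in enumerate(frame)))
        i += 10                     # skip start, data and stop
    return bytes(out)


reply = bytes([0x12, 0x00, 0xFF, 0xA5])   # made-up sample data
wire = [1, 1] + encode(reply)             # a little idle time, then the bytes
assert decode(wire) == reply
print(len(encode(reply)), "bit periods for", len(reply), "bytes")
```

Because each stop bit is immediately followed by the next start bit, the receiver gets exactly one bit period of HIGH between frames, and that is the window in which it has to finish storing the previous byte.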
I have checked with a scope that the data on the line is actually the data I expected and that it is nicely sequenced with exact 1 us pulses. All expected bits are there: start, data and stop. The signal is not inverted. It is my received data that is wrong.
Please note that I am always able to read the first byte without problems. The sequence goes wrong from the second byte on. It is interesting to note that the following bytes are always the same for the same sequence being sent. I mean, the sequence I read is wrong but it is always the same. If I change the pattern of the sequence being sent, I receive a different sequence.
For testing purposes I have used two different pieces of hardware that work at 1 Mbps with the same format.
As I said above, my code is a simple variant of Chip Gracey's code for SerialFullDuplex. I will post it here after cleaning it up a bit.
I am going to do the following for testing purposes, using the scope as a debugging tool: each time I receive a bit, I set another I/O pin to the value I read. I should see on the scope something similar to the original signal, right? I want to do this test in order to see whether the problem is in the software, which is trashing the data, or whether I am actually reading wrong signals!
exact format you describe. (This should be really easy and a small amount of code.) Then you can spawn that after your reader
routine, and then you've got a self-contained test setup that doesn't require *any* extra hardware, and that we can run on our own
demo boards, etc., to see what's going on.
But in any case, this should be pretty straightforward to resolve.
For tracking this problem down, your suggestion is an approach that must be used with caution. Often these problems are caused by clocks not matching each other properly, and if you use the same clock source for receiving as transmitting, then you are eliminating that as a possibility. However, what it may tell you is, if the symptoms go away when the same receive/transmit clock sources are used, then clock differences are almost surely the problem.
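How much clock mismatch a receiver can tolerate follows from simple arithmetic. The Python sketch below is an idealized estimate: it assumes the receiver resynchronizes on every start edge, samples at nominal bit centers, and ignores jitter in detecting the edge. The function names are made up.

```python
def last_sample_error(mismatch, data_bits=8, sample_stop=True):
    """Timing error, in bit periods, at the final sample of one frame when
    the receiver's bit clock differs from the sender's by `mismatch`
    (0.02 means 2 %)."""
    last = 1.5 + (data_bits - 1) + (1 if sample_stop else 0)
    return last * mismatch


def max_mismatch(data_bits=8, sample_stop=True):
    """Largest mismatch that still keeps the final sample inside its bit,
    i.e. keeps the accumulated error below half a bit period."""
    last = 1.5 + (data_bits - 1) + (1 if sample_stop else 0)
    return 0.5 / last


print("error at stop bit with 2 % mismatch:", last_sample_error(0.02), "bit")
print("tolerable mismatch, stop bit sampled:", round(max_mismatch() * 100, 2), "%")
print("tolerable mismatch, data bits only:",
      round(max_mismatch(sample_stop=False) * 100, 2), "%")
```

The error resets at each start edge, so it only accumulates across one frame; a receiver that failed to resynchronize per byte would instead drift further with every byte of the packet.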
Cheers,
Peter (pjv)
Exactly what is the application of the 1Mbps transfer? You can work 20% faster with no start and stop bits at all. That's the case for wireless data transfers.
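The 20% figure can be checked with a few lines of Python, assuming the 1 Mbps rate and the 1 start + 8 data + 1 stop framing described earlier in the thread.

```python
BAUD = 1_000_000

framed_bits = 1 + 8 + 1      # start + data + stop
raw_bits = 8                 # data only

framed_bytes_per_s = BAUD / framed_bits
raw_bytes_per_s = BAUD / raw_bits

time_saved = 1 - raw_bits / framed_bits                     # per byte
throughput_gain = raw_bytes_per_s / framed_bytes_per_s - 1  # bytes per second

print(f"framed: {framed_bytes_per_s:.0f} bytes/s, raw: {raw_bytes_per_s:.0f} bytes/s")
print(f"time per byte saved: {time_saved:.0%}, throughput gain: {throughput_gain:.0%}")
```

So dropping the framing bits saves 20% of the time per byte, which is a 25% gain in bytes per second. The catch is that the receiver then needs some other way to find byte boundaries and stay in step with the sender.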
Actually, the reason I found your post is that I am new to Propeller assembly and I am trying to understand rdbyte and wrbyte vs. mov. I could not find a good definition of register space vs. main memory space. On p. 394 of the Propeller manual, it says that rdbyte uses 9 bits for the address. I assume that it is an indexed register. Is the register space the chunk of RAM that is allocated on each cognew, with main memory being everything in the Propeller? What is the most efficient way to copy from the INA pins to a buffer in RAM?
Frank
The main memory space is functionally much like an I/O device. It is not accessible to ordinary instructions and is accessed only with RDxxxx and WRxxxx. There are pairs of these for accessing BYTEs, WORDs, and LONG words. The format of all these instructions is the same and the destination field has the cog memory address to be used while the source field has the main memory address. If the main memory address is 9 bits or less, an immediate operand can be used for this. The cog memory is completely separate from main memory (32K x 8-bits) and there is a separate cog memory of 512 32-bit words associated with each cog. The Spin interpreter is normally loaded into a cog's memory and executes there. The interpreter makes main memory appear to be the only memory in the Propeller and Spin code cannot access any cog memory at all. An assembly program can be loaded into a cog and that can "cooperate" with a Spin program to copy data into main memory.
The most efficient way to copy from INA pins to a buffer in RAM is to use the shift instructions to build a 32-bit long word containing 4 bytes, then write this to the buffer using WRLONG. If this is written carefully, there will be minimal clock cycles spent waiting for access to main memory (access is shared among the 8 cogs - one time slot of 2 clock cycles for each). If there is a cycle of some number of bytes of data received, then a wait of some time, the data can be copied to main memory in bursts from a buffer in the cog's memory.
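As an illustration of the packing step, here is the same idea in Python. It is a sketch that relies on the Propeller's main memory being little-endian, so the first byte received goes in the lowest 8 bits of the long; `pack_long` is a made-up name.

```python
def pack_long(four_bytes):
    """Combine 4 received bytes into one 32-bit value, first byte lowest,
    so that a little-endian long write stores them in arrival order."""
    value = 0
    for i, b in enumerate(four_bytes):
        value |= (b & 0xFF) << (8 * i)
    return value


received = [0x11, 0x22, 0x33, 0x44]
long_value = pack_long(received)

assert long_value == 0x44332211
assert long_value.to_bytes(4, "little") == bytes(received)

# A 16-byte buffer then needs 4 long writes instead of 16 byte writes.
print(hex(long_value), "-> hub writes for 16 bytes:", 16 // 4)
```

In assembly the shifting happens as the bits or bytes come in, so the only extra cost is one WRLONG per four bytes instead of one WRBYTE per byte.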