How can I detect Wiznet RX buffer overrun when receiving UDP packets?
ags
Posts: 386
Can anyone point me towards an example of how to detect when a peer has sent UDP packets so quickly that they have overrun the Wiz RX buffer? I'd like to accomplish this without inspecting the payload of the UDP packet. (I'm using a WIZ812 module.)
I have never seen a partial-length packet, so I presume that if all of a UDP packet cannot be stored, the entire packet is discarded. Can anyone confirm that?
Thanks.
Comments
I see four possible options (in theory):
1) The Wiznet module detects that it did receive a UDP packet but was unable to store it in RX buffer and some flag is set to indicate that. I've read the entire datasheet many times and don't see anything like this so I'm ruling it out (but would be happy to be proven wrong).
2) The UDP packet is processed and as much as can fit in the remaining RX buffer is stored. That would be likely to result in some packets occasionally being truncated (containing only a portion of the original payload). I have not observed this in operation and think it is also not what happens.
3) The UDP packet simply overwrites the Wiznet RX buffer, effectively wrapping around and overwriting the oldest previously-received data. This seems plausible to me but unlikely. One way I can think of to prove or disprove this would be to blast a huge number of unique UDP packets at the Wiznet (many more than the RX buffer can hold), with a delay before moving any of them to the Propeller hub RAM. Inspecting the first packet read would show whether it is the first UDP packet sent or a later one that overwrote the original. This will take some time for me to implement given my current system, but I can do so if necessary. If someone has already done a similar experiment or found some proof one way or the other, I'd be happy not to replicate his/her work.
4) Even if a UDP packet is received intact (i.e. in some temporary internal working buffer used before moving to the application-accessible RX buffer), if there is insufficient memory in the Wiznet RX buffer, the UDP packet is simply tossed out with no remaining artifacts indicating that. I suspect this to be the case, but wanted to clarify my question and see if my presumption is correct.
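The experiment described under option 3 could be driven from a PC with something like the sketch below. This is only an illustration, not tested against a Wiznet: the 4-byte sequence number layout, the function names, and the example address in the comment are all my own assumptions. Note also that a first sequence number other than 0 could simply mean the first packet was lost on the wire, so the result is a hint rather than proof.

```python
import socket
import struct
import time

def make_packet(seq, size):
    """Payload of `size` bytes; the first 4 bytes are a big-endian sequence number."""
    if size < 4:
        raise ValueError("size must be at least 4 bytes to hold the sequence number")
    return struct.pack(">I", seq) + bytes(size - 4)

def read_seq(payload):
    """Recover the sequence number written by make_packet()."""
    return struct.unpack(">I", payload[:4])[0]

def send_burst(addr, count, size, gap_s=0.0):
    """Send `count` numbered packets to addr = (host, port), optionally spaced by gap_s."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for seq in range(count):
            sock.sendto(make_packet(seq, size), addr)
            if gap_s:
                time.sleep(gap_s)

def interpret(received, expected_size):
    """received: list of (seq, length) tuples in the order read from the Wiznet."""
    if any(length != expected_size for _, length in received):
        return "truncated packets seen (option 2)"
    if received and received[0][0] != 0:
        return "oldest packet missing, possibly overwritten (option 3)"
    return "oldest packets kept, later ones dropped whole (option 1 or 4)"

# Hypothetical usage (address and port are placeholders):
# send_burst(("192.168.1.50", 5000), count=200, size=130)
```

On the Propeller side, the only thing needed is to hold off reading until the burst has finished, then log the sequence number and length of each packet in the order they come out of the RX buffer.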
If (as I suspect) #4 is the case, that has significant implications for my project. I've been able to develop a custom (single-purpose) driver for the Wiznet (W5100 using indirect addressing mode) that can receive and move data to the Propeller hub RAM at sustained speeds of 12Mbps. The transmitter sends data at an average speed of 2.6 Mbps; however, it does so in bursts, which may exceed the 4KB Wiznet RX buffer (I use two sockets so this is the maximum available RX buffer size that I can configure). Unless I can find a way to throttle the TX burst size, I'll always lose data.
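To put a rough number on that, here is a back-of-the-envelope sketch. It assumes the burst arrives at a constant fill rate, the driver drains continuously at its sustained rate from the first byte, and it ignores per-packet overhead in the RX buffer. Under those assumptions the buffer fills at (fill - drain), so the largest burst that fits is buffer / (1 - drain/fill). The function name is mine.

```python
def max_burst_bytes(buffer_bytes, fill_mbps, drain_mbps):
    """Largest burst (bytes) that fits, assuming continuous fill and drain rates."""
    if drain_mbps >= fill_mbps:
        # The receiver keeps up, so the buffer never accumulates data.
        return float("inf")
    return buffer_bytes / (1.0 - drain_mbps / fill_mbps)

# 4 KB buffer, burst at 100 Mbps line rate, driver draining at 12 Mbps
limit_4k = max_burst_bytes(4096, 100, 12)
# Same rates with all 8 KB of RX memory on one socket
limit_8k = max_burst_bytes(8192, 100, 12)
print(round(limit_4k), round(limit_8k))
```

With these figures the 4 KB case tops out at roughly 4.6 KB per burst, so draining at 12 Mbps buys only a little headroom over the raw buffer size when the burst arrives at full line rate.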
Thanks for the reply.
The receive code is always looking for data. As soon as the receive logic sees data in the Rx buffer it will transfer the data to RAM.
Send 248 16-byte messages as fast as my PC will transmit UDP packets. The time delta between packet timestamps is 31.4 µs. Very few packets are received, somewhere around 4. Usually the last packet received is number 64; I'm not sure how relevant that is.
Send 10 16-byte messages at 31 µs intervals. Around 2 packets are received.
Send 248 16-byte messages with a 1 ms delay between each message. All packets are received without error.
Send 248 1110-byte messages at max PC transmit speed. Very few packets are received, usually in the 20s.
Send 248 11110-byte messages with a 1 ms delay and all 248 packets are received.
The Rx logic is pretty lean. For my setup, if processing logic is added, the time delay between packets must increase or packets will drop. Indeed, a PC can fire packets too fast for the receive logic to handle. I don't think it's the buffer overflowing, though. I think it's the time it takes for the Rx logic to process a packet and get ready for the next packet. I can't say this is the case for your setup, as I have no idea how your code works.
I tried using all 4 sockets, but that did not help; there were still dropped packets at full speed. However, the 1 ms delay always works. I'd have to modify my test code to figure out the delay threshold.
It sounds like you want lossless UDP. You'd have to implement some kind of handshake or maybe try TCP.
How many packets are sent in a burst and how large are the packets?
I may have to try using TCP, but I am concerned that the overhead (and latency) will be a problem, as synchronization is an important factor for me. I may also try implementing some form of flow control on top of UDP. I think ultimately the solution would be for me to change my design (which means new boards) to use the W5200 or W5300, which have a large enough RX buffer to handle the bursts. In other words, my driver can sustain 5x the average data rate, but not the instantaneous data rate possible using a 100Mbps network.
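One simple form of flow control on top of UDP would be for the sender to split each burst into windows that are guaranteed to fit in the RX buffer, and wait for an acknowledgement before sending the next window. The sketch below only does the planning step. It is conservative in that it assumes nothing is drained while a window is arriving. The 8 bytes of per-packet overhead reflect my understanding that the W5100 stores a small header (peer IP, port, length) ahead of each UDP payload in the RX buffer; treat that figure and the function name as assumptions.

```python
def plan_windows(packet_sizes, rx_buffer_bytes, per_packet_overhead=8):
    """Group packet sizes into consecutive windows that each fit in the RX buffer."""
    windows, current, used = [], [], 0
    for size in packet_sizes:
        cost = size + per_packet_overhead
        if cost > rx_buffer_bytes:
            raise ValueError("packet cannot fit in the RX buffer at all")
        if used + cost > rx_buffer_bytes:
            # Close the current window; the sender would wait for an ack here.
            windows.append(current)
            current, used = [], 0
        current.append(size)
        used += cost
    if current:
        windows.append(current)
    return windows

# Example: a 50-packet burst of 130-byte payloads against a 4 KB RX buffer
windows = plan_windows([130] * 50, 4096)
print([len(w) for w in windows])
```

The cost is one round trip per window, which is where the latency and synchronization concern comes back in.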
To answer your question: I'm sending 50 packets of 130 bytes each. That's more than the 4kB buffer I have available. To verify my theory before I design a new board around a Wiz chip with larger memory, I'll attempt to rip out all dependencies on the other socket I'm currently using so I can dedicate all 8kB of RX memory to this one socket. That should reduce, if not totally stop, dropped packets (since a full burst is 6500 bytes). Unfortunately, that task will be difficult to accomplish.
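The 4kB versus 8kB theory can be sanity-checked with a small per-packet simulation before touching the hardware. This is a toy model with several assumptions of mine: a packet that does not fit is dropped whole (option 4), each payload costs an extra 8 bytes of header in the RX buffer, the receiver frees space smoothly at its sustained rate rather than per RECV command, and the 16 µs packet spacing is a guess at back-to-back frames on a 100Mbps link.

```python
def simulate_burst(n_packets, payload, buffer_bytes, interval_us, drain_mbps, header=8):
    """Return how many packets of a burst are dropped under the toy model."""
    drain_per_us = drain_mbps / 8.0  # Mbps equals bits per microsecond; /8 gives bytes per microsecond
    used = 0.0
    dropped = 0
    for _ in range(n_packets):
        # Space freed by the receiver since the previous packet arrived.
        used = max(0.0, used - drain_per_us * interval_us)
        need = payload + header
        if used + need > buffer_bytes:
            dropped += 1  # assumed behaviour: whole packet silently discarded
        else:
            used += need
    return dropped

# 50 packets of 130 bytes, hypothetical 16 us spacing, 12 Mbps drain
for size in (4096, 8192):
    print(size, simulate_burst(50, 130, size, interval_us=16, drain_mbps=12))
```

In this model the 4kB buffer still loses part of the burst while the 8kB buffer holds all 50 packets, which is consistent with the theory, but only the real test will confirm it.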
A question for you: How did you get a timestamp in your packets to determine the data burst rate? Or, are you counting Propeller clocks in between polling for the next packet to be ready?
Thanks.
I've been testing with 1024-byte chunks of data. If I add dummy processing in the Rx loop, I have to increase the packet-to-packet interval or I'll lose packets. With a bare-bones receiver, I need ~2.9 ms between 1024-byte packets in order to move the data and set the buffer pointers. My SPI driver takes 7072 ticks to move the data, but the higher-level processing needed to get ready for the next packet takes much longer, by a factor of 100. I'm positive my bottleneck is the Spin code that handles resetting the Rx pointers, which I should move to PASM.
Using 50 130-byte messages, my setup requires 1.5 ms between packets to move the data to RAM without loss using one socket.
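For comparison with the 2.6 Mbps average stream mentioned earlier in the thread, those minimum intervals can be converted to an effective sustained rate. This is just arithmetic on the figures quoted above; the function name is mine.

```python
def sustained_mbps(payload_bytes, min_interval_ms):
    """Effective payload rate in Mbps when one packet is accepted per interval."""
    bits = payload_bytes * 8
    microseconds = min_interval_ms * 1000.0
    return bits / microseconds  # bits per microsecond is the same as Mbps

print(sustained_mbps(1024, 2.9))  # 1024-byte packets every 2.9 ms
print(sustained_mbps(130, 1.5))   # 130-byte packets every 1.5 ms
```

The 1024-byte case works out to a little under 3 Mbps and the 130-byte case to well under 1 Mbps, which would fit the idea that the fixed per-packet setup time, not the byte transfer, dominates.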
With my setup the W5200 is only slightly faster. The bottleneck remains the setup for the next packet.
The Wireshark packet sniffer provides the timestamps; a custom application running on a PC and a custom Prop app do the rest. The PC app lets me adjust the packet size and packet frequency, and it IDs the packets sent to the Prop. The Prop app counts the good packets received. Frequency resolution is 417 picoseconds on my box.
FYI (maybe it will help) my driver is entirely in PASM (including managing the Wiz pointers). I need 1731 ticks in total to read, process, and offload received data payload as well as moving the Wiz buffer pointers for the next RX operation (including sending the RECV command). The Wiz module doesn't care about the data being read, only what the pointer locations are. As I said, I am not the expert here, but no matter how fast you read the data off the Wiz module, all that matters is setting the buffer pointers to allow more data to be pushed into the RX buffer. If that's true, couldn't your problem still be related to an overflow issue?
Are you polling the socket interrupt register (Sn_IR) to determine when data has been received or are you reading the socket received size register (Sn_RX_RSR)? I decided to poll the interrupt register because of speed considerations (I can determine the interrupt status reading one byte but need to read both bytes of the RSR to determine if there is unread data). My implementation *mostly* works, so I don't think this is the problem, but I've never been able to prove that if I read one packet, set the RX buffer pointers properly, and send the RECV command, that the socket interrupt register will still be set (when I poll next) if there was more than one packet received and I only processed one (which would be evident if the RX read buffer pointer was still less than the RX buffer write pointer).
EDIT: I just reread the Wiz datasheet and it clearly states that the socket data received interrupt register bit will remain set if the RECV command is sent and not all existing data has been read. Oh well, still looking for clues...
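For anyone following along, the behaviour described in the EDIT can be illustrated with a toy model. FakeSocket below is entirely hypothetical and only mimics the one property in question: after a RECV command, the received flag is set again if unread data remains. The bit and command values (0x04 for the Sn_IR RECV flag, 0x40 for the Sn_CR RECV command) are from my reading of the W5100 datasheet, so please check them against your copy.

```python
RECV_BIT = 0x04  # Sn_IR RECV flag (assumed value, check the datasheet)
CMD_RECV = 0x40  # Sn_CR RECV command (assumed value, check the datasheet)

class FakeSocket:
    """Toy stand-in for one Wiznet socket; NOT real driver code."""
    def __init__(self, packets):
        self.queue = list(packets)
        self.ir = RECV_BIT if self.queue else 0

    def read_ir(self):
        return self.ir

    def clear_ir(self, bits):
        self.ir &= ~bits

    def read_packet(self):
        return self.queue.pop(0)

    def command(self, cmd):
        # Modelled behaviour: flag is re-asserted if data is still unread.
        if cmd == CMD_RECV and self.queue:
            self.ir |= RECV_BIT

def drain(sock):
    """Poll the one-byte interrupt flag and read one packet per pass."""
    out = []
    while sock.read_ir() & RECV_BIT:
        sock.clear_ir(RECV_BIT)
        out.append(sock.read_packet())
        sock.command(CMD_RECV)
    return out

print(len(drain(FakeSocket([b"a", b"b", b"c"]))))
```

If the real chip behaves like this model, polling only the interrupt register and processing one packet per pass should not strand packets that arrived together.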
Thanks again for the replies.
Not sure I understand. What did you do to be able to move data off the Wiz module so quickly? I can't do that even using Indirect mode.
Also, how are you able to predict the overflow? That's what I was looking for. There is an RX write pointer available, and I was thinking that if I see it is within some threshold (like a maximum expected packet size) of the RX read pointer, there is risk of an overflow. This would be an expensive operation to put into the critical loop though.
I poll the Rx buffer for bytes. If there's 1k of data and I'm expecting 130-byte packets, there is a very good chance of an overflow. It was pretty late last night when I was testing and I have not had a chance to get back to it. I would assume your setup would have the same type of situation, where the bytes to read are > [expected packet size] x threshold.
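The threshold idea from the two posts above could look something like this. It is a sketch only: it flags risk when the free space left in the RX buffer is smaller than one maximum-size packet, it assumes the read and write pointers are free-running 16-bit values that wrap at 65536, and it assumes 8 bytes of per-packet header in the buffer. A positive result means an overrun is possible, not that one has happened.

```python
def unread_from_pointers(rd_ptr, wr_ptr):
    """Unread bytes, assuming free-running 16-bit read/write pointers."""
    return (wr_ptr - rd_ptr) & 0xFFFF

def overflow_risk(rx_buffer_bytes, unread_bytes, max_payload, header=8):
    """True if the next maximum-size packet might not fit in the free space."""
    free = rx_buffer_bytes - unread_bytes
    return free < max_payload + header

# Pointers that have wrapped past 0xFFFF still give the right byte count
unread = unread_from_pointers(0xFFF0, 0x0010)
print(unread, overflow_risk(4096, 4000, 130), overflow_risk(4096, 1024, 130))
```

Using the received-size count directly avoids the second pointer read, at the cost of the two-byte register read mentioned earlier in the thread.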
The other alternative is to create a buffer that can handle max bytes.
What about TCP? Lossless and ordered... A little slower at burst rates but you would have the data.
What do you do with 6.5k worth of 130 byte messages? Are the messages processed?
That's my other option. It will take a lot of work to try it (I'll need an almost 100% PASM driver for TCP) and I'm pretty sure there will be latency/synchronization problems. But if I run out of other options I will try that. I also looked into using a different Wiz chip; the W5200 would work for me, but it requires a more complex startup procedure (to configure Indirect Addressing mode) and is not available in an evaluation module that supports Indirect mode - so I'd need to do a total redesign of my board. The W5300 will work, and does not require the complex startup procedure to set Indirect mode - but it would require one additional address pin, which means a board redesign (but not total redesign).
I do some minimal processing, then immediately offload to hub RAM where other cogs do the more complex data processing.