How can I detect Wiznet RX buffer overrun when receiving UDP packets? — Parallax Forums


ags Posts: 386
edited 2013-02-27 17:52 in Accessories
Can anyone point me towards an example of how to detect when a peer has transmitted UDP packets quickly enough to overrun the Wiz RX buffer? I'd like to accomplish this without inspecting the payload of the UDP packets. (I'm using a WIZ812 module.)

I have never seen a partial-length packet, so I presume that if a UDP packet cannot be stored in its entirety, the whole packet is discarded. Can anyone confirm that?

Thanks.

Comments

  • Mike G Posts: 2,702
    edited 2013-02-23 18:25
    ags, UDP is not guaranteed. If the target does not receive a UDP packet, for any reason, the packet is lost forever. This is the nature of UDP.
  • ags Posts: 386
    edited 2013-02-24 07:15
    @MikeG: Yes, I understand that is a limitation of the UDP protocol. However, on a LAN, with only switches and using wired media, those losses should be minimal. I'm trying to detect a very specific condition. In some cases, a burst of UDP packets may be transmitted so fast that it exceeds the ability of the Wiznet module to process the data and move it from its RX buffer to the Propeller's hub RAM. My question is whether there is any way to detect this condition.

    I see four possible options (in theory):

    1) The Wiznet module detects that it did receive a UDP packet but was unable to store it in the RX buffer, and some flag is set to indicate that. I've read the entire datasheet many times and don't see anything like this, so I'm ruling it out (but would be happy to be proven wrong).

    2) The UDP packet is processed, and as much as fits in the remaining RX buffer is stored. That would likely result in some packets occasionally being truncated (containing only a portion of the original payload). I have not observed this in operation and think it is also not what happens.

    3) The UDP packet simply overwrites the Wiznet RX buffer, effectively wrapping around and overwriting the oldest previously-received data. This seems plausible but unlikely. One way to prove or disprove it would be to blast a huge number of unique UDP packets at the Wiznet (many more than the RX buffer can hold), with a delay before clearing any to the Propeller hub RAM. Inspecting the first packet would show whether it is the first UDP packet sent or a later one that overwrote the original. This will take some time to implement given my current system, but I can do so if necessary. If someone has already done a similar experiment, or found proof one way or the other, I'd be happy not to replicate his/her work.

    4) Even if a UDP packet is received intact (i.e. in some temporary internal working buffer used before moving to the application-accessible RX buffer), if there is insufficient memory in the Wiznet RX buffer, the UDP packet is simply tossed out with no remaining artifacts indicating that. I suspect this to be the case, but wanted to clarify my question and see if my presumption is correct.

    If (as I suspect) #4 is the case, that has significant implications for my project. I've developed a custom (single-purpose) driver for the Wiznet (W5100 using indirect addressing mode) that can receive and move data to the Propeller hub RAM at sustained speeds of 12 Mbps. The transmitter sends data at an average speed of 2.6 Mbps; however, it does so in bursts, which may exceed the 4KB Wiznet RX buffer (I use two sockets, so this is the maximum RX buffer size I can configure). Unless I can find a way to throttle the TX burst size, I'll always lose data.
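    If it helps, the option-3 experiment could be driven from the PC with something like this (a Python sketch; the host, port, function names, and payload layout are placeholders, not from my actual rig):

```python
import socket
import struct

def make_packet(seq, payload_size=130):
    """Build a UDP payload whose first 4 bytes are a big-endian sequence
    number, padded with zeros to the requested size."""
    return struct.pack(">I", seq).ljust(payload_size, b"\x00")

def blast_packets(host, port, count=200, payload_size=130):
    """Send `count` numbered packets back-to-back with no pacing delay,
    so the burst far exceeds the Wiznet RX buffer."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        for seq in range(count):
            sock.sendto(make_packet(seq, payload_size), (host, port))
    finally:
        sock.close()
```

    Reading the first stored packet's sequence number afterwards would distinguish the scenarios: 0 means old data survived intact, anything higher means it was overwritten.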

    Thanks for the reply.
  • Mike G Posts: 2,702
    edited 2013-02-24 10:29
    4) Even if a UDP packet is received intact (i.e. in some temporary internal working buffer used before moving to the application-accessible RX buffer), if there is insufficient memory in the Wiznet RX buffer, the UDP packet is simply tossed out with no remaining artifacts indicating that. I suspect this to be the case, but wanted to clarify my question and see if my presumption is correct.
    I just happen to be running performance tests and can test #4. My UDP SPI library is much slower, but I believe the test is relevant. Socket 0 is set to UDP with a 2k buffer.

    The receive code is always looking for data. As soon as the receive logic sees data in the Rx buffer it will transfer the data to RAM.

    Send 248 16-byte messages as fast as my PC will transmit UDP packets. The time delta between packet timestamps is 31.4 µs. Very few packets are received, somewhere around 4. Usually the last packet received is number 64; not sure how relevant that is.

    Send 10 16-byte messages at 31 µs spacing. Around 2 packets received.

    Send 248 16-byte messages with a 1 ms delay between each message. All packets received without error.

    Send 248 1110-byte messages at max PC transmit speed. Very few packets are received, usually in the 20s.

    Send 248 1110-byte messages with a 1 ms delay, and all 248 packets are received.
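    For scale: in UDP mode the W5100 stores each received datagram with an 8-byte header (peer IP, peer port, and data length) ahead of the payload, so a buffer holds fewer messages than buffer-size divided by payload suggests. A quick Python sketch of that capacity math (the header size is from the W5100 datasheet; the numbers are arithmetic, not measured results):

```python
def udp_messages_per_buffer(buffer_bytes, payload_bytes, header_bytes=8):
    """Whole datagrams that fit a socket RX buffer, counting the 8-byte
    per-datagram UDP header (peer IP, port, length) the W5100 stores."""
    return buffer_bytes // (payload_bytes + header_bytes)

udp_messages_per_buffer(2048, 16)   # 85 16-byte messages fit a 2 KB buffer
udp_messages_per_buffer(4096, 130)  # 29 130-byte messages fit a 4 KB buffer
```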

    The Rx logic is pretty lean. For my setup, if processing logic is added, the time delay between packets must increase or packets will drop. Indeed, a PC can fire packets too fast for the receive logic to handle. I don't think it's the buffer overflowing, though; I think it's the time it takes for the Rx logic to process a packet and get ready for the next one. I can't say this is the case for your setup, as I have no idea how your code works.

    I tried using all 4 sockets, but that did not help; there were still dropped packets at full speed. However, the 1 ms delay always works. I'd have to modify my test code to figure out the delay threshold.

    It sounds like you want lossless UDP. You'd have to implement some kind of handshake, or maybe try TCP.

    How many packets are sent in a burst, and how large are the packets?
  • ags Posts: 386
    edited 2013-02-24 15:01
    Mike, while I don't have as much specific data as you do, the symptoms I see in my implementation seem aligned with your findings. I'm not sure why you say this:
    Indeed, a PC can fire packets too fast for the receive logic to handle. I don't think it's the buffer overflowing, though; I think it's the time it takes for the Rx logic to process a packet and get ready for the next one.
    It seems to me the reason the RX logic processing time matters at all is that, if it can't keep up with the incoming data rate, the RX buffer will eventually overflow and packets will be dropped (if my belief in scenario #4 in post #3 is correct).

    I may have to try using TCP, but I am concerned that the overhead (and latency) will be a problem, as synchronization is an important factor for me. I may also try implementing some form of flow control on top of UDP. I think ultimately the solution would be to change my design (which means new boards) to use the W5200 or W5300, either of which has a large enough RX buffer to handle the bursts. In other words, my driver can sustain 5x the average data rate, but not the instantaneous data rate possible on a 100Mbps network.

    You asked a question: I'm sending 50 packets of 130 bytes each. That's more than the 4kB buffer I have available. To verify my theory before I design a new board around a Wiz chip with larger memory, I'll attempt to rip out all dependencies on the other socket I'm currently using so I can dedicate all 8kB of RX memory to this one socket; that should reduce, if not totally stop, dropped packets (since a full burst is 6500 bytes). Unfortunately, that task will be difficult to accomplish.
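    Counting the 8-byte per-datagram header the W5100 prepends in UDP mode (per the datasheet), the burst math works out like this (a sketch; the function name is just for illustration):

```python
def burst_fits_buffer(packets, payload_bytes, buffer_bytes, header_bytes=8):
    """Check whether a full burst fits in the socket RX buffer, counting
    the 8-byte per-datagram header the W5100 stores in UDP mode."""
    return packets * (payload_bytes + header_bytes) <= buffer_bytes

burst_fits_buffer(50, 130, 4096)  # False: 50 x 138 = 6900 bytes overruns 4 KB
burst_fits_buffer(50, 130, 8192)  # True: the full burst fits in 8 KB
```

    So the single-socket 8kB configuration should hold an entire burst with room to spare, which is what makes it a useful test of the overflow theory.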

    A question for you: How did you get a timestamp in your packets to determine the data burst rate? Or, are you counting Propeller clocks in between polling for the next packet to be ready?

    Thanks.
  • Mike G Posts: 2,702
    edited 2013-02-24 17:08
    It seems to me the reason the RX logic processing time matters at all is that, if it can't keep up with the incoming data rate, the RX buffer will eventually overflow and packets will be dropped (if my belief in scenario #4 in post #3 is correct).
    That's not the case on my end. I get lost packets sending 10 16-byte messages, which is not even close to overflowing the buffer. As a matter of fact, the size of the packet does not seem to matter as much as the gap between packets.

    I've been testing with 1024-byte chunks of data. If I add dummy processing in the Rx loop, I have to increase the packet-to-packet interval or I'll get lost packets. With a bare-bones receiver, I need ~2.9 ms between 1024-byte packets in order to move the data and set the buffer pointers. My SPI driver takes 7072 ticks to move the data, but higher-level processing takes much longer, by a factor of 100, to get ready for the next packet. I'm positive my bottleneck is the Spin code that handles resetting the Rx pointers, which I should move to PASM.

    Using 50 130-byte messages, my setup requires 1.5 ms between packets to move the data to RAM without loss using one socket.
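    The required gap translates directly into a sustainable rate; a quick sketch of that conversion (an illustrative helper, not part of my test app):

```python
def min_gap_throughput(payload_bytes, gap_s):
    """Sustainable throughput in bits/s when the receiver needs `gap_s`
    seconds between packets of `payload_bytes` bytes."""
    return payload_bytes * 8 / gap_s

min_gap_throughput(1024, 0.0029)  # ~2.8 Mbit/s with a 2.9 ms gap
min_gap_throughput(130, 0.0015)   # ~0.7 Mbit/s with a 1.5 ms gap
```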
    I may have to try using TCP, but I am concerned that the overhead (and latency) will be a problem as synchronization is an important factor for me. I may also try implementing some form of flow control on top of UDP. I think ultimately the solution would be for me to change my design (which means new boards) to use the W5200 or W5300 which has a large enough RX buffer to handle the bursts. In other words, my driver can sustain 5x the average data rate, but not the instantaneous data rate possible using a 100Mbps network.
    With my setup the W5200 is only slightly faster. The bottleneck remains the setup for the next packet.
    A question for you: How did you get a timestamp in your packets to determine the data burst rate? Or, are you counting Propeller clocks in between polling for the next packet to be ready?
    WireShark packet sniffer provides the timestamps; a custom application running on a PC and a custom Prop app do the rest. The PC app lets me adjust the packet size and packet frequency, plus it IDs the packets sent to the Prop. The Prop app counts the good packets received. Frequency resolution is 417 picoseconds on my box.
  • ags Posts: 386
    edited 2013-02-24 17:45
    Thanks for another helpful reply Mike. It sounds like you have some serious instrumentation to measure/diagnose your implementation. I have WireShark, and a custom app that is sending the packets. I'll have to think about how I might use them.

    FYI (maybe it will help): my driver is entirely in PASM (including managing the Wiz pointers). I need 1731 ticks in total to read, process, and offload the received payload, as well as to move the Wiz buffer pointers for the next RX operation (including sending the RECV command). The Wiz module doesn't care about the data being read, only where the pointers are. As I said, I am not the expert here, but no matter how fast you read the data off the Wiz module, all that matters is setting the buffer pointers to allow more data to be pushed into the RX buffer. If that's true, couldn't your problem still be related to an overflow issue?

    Are you polling the socket interrupt register (Sn_IR) to determine when data has been received, or are you reading the socket received size register (Sn_RX_RSR)? I decided to poll the interrupt register for speed (I can determine the interrupt status by reading one byte, but I need to read both bytes of the RSR to determine whether there is unread data). My implementation *mostly* works, so I don't think this is the problem, but I've never been able to prove that if I read one packet, set the RX buffer pointers properly, and send the RECV command, the socket interrupt register will still be set (when I poll next) if more than one packet was received and I only processed one (which would be evident if the RX read pointer was still less than the RX write pointer).

    EDIT: I just reread the Wiz datasheet, and it clearly states that the socket data-received interrupt bit will remain set if the RECV command is sent and not all existing data has been read. Oh well, still looking for clues...
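    For reference, the RSR read I was avoiding looks roughly like this (a Python sketch; the addresses are socket 0's Sn_RX_RSR per the W5100 memory map, and `read_reg` is a stand-in for whatever bus access the driver provides). The loop re-reads until two 16-bit values agree, since the chip can update the register between the two one-byte accesses:

```python
# Socket 0 register addresses on the W5100 (socket base 0x0400,
# Sn_RX_RSR at offset 0x0026). read_reg(addr) -> int is a placeholder
# for the driver's SPI or indirect-mode byte read.
S0_RX_RSR0 = 0x0426  # received data size, high byte
S0_RX_RSR1 = 0x0427  # received data size, low byte

def read_rx_size(read_reg):
    """Read Sn_RX_RSR until two consecutive 16-bit reads agree, so a
    value torn between the two byte accesses is never returned."""
    size = (read_reg(S0_RX_RSR0) << 8) | read_reg(S0_RX_RSR1)
    while True:
        again = (read_reg(S0_RX_RSR0) << 8) | read_reg(S0_RX_RSR1)
        if again == size:
            return size
        size = again
```

    That stable-read loop is exactly the extra cost that made me choose the one-byte Sn_IR poll instead.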

    Thanks again for the replies.
  • Mike G Posts: 2,702
    edited 2013-02-24 21:12
    I have to admit that my test rig masked what was happening. Turns out I can receive data at full speed. I should be able to predict an overflow fairly reliably. I have more tests to run.
  • ags Posts: 386
    edited 2013-02-24 22:14
    That's good news for you. I spent most of the day running tests that continue to point towards a buffer overflow problem (although not conclusively). I coded up a separate cog that flashes an LED when the Wiznet driver cog detects an out-of-sequence packet and sets a flag in hub RAM. I can pretty reliably correlate the overall size of the data burst, the out-of-sequence LED flashing, and the incorrect behavior that is the final outcome.
  • ags Posts: 386
    edited 2013-02-25 11:20
    Mike G wrote: »
    I have to admit that my test rig masked what was happening. Turns out I can receive data at full speed. I should be able to predict an overflow fairly reliably. I have more tests to run.

    Not sure I understand. What did you do to be able to move data off the Wiz module so quickly? I can't do that even using indirect mode.

    Also, how are you able to predict the overflow? That's what I was looking for. There is an RX write pointer available, and I was thinking that if it is within some threshold (like a maximum expected packet size) of the RX read pointer, there is risk of an overflow. This would be an expensive operation to put into the critical loop, though.
  • Mike G Posts: 2,702
    edited 2013-02-25 16:04
    Not sure I understand. What did you do to be able to move data off the Wiz module so quickly? I can't do that even using indirect mode.
    I can't stop an overflow, only mitigate it as much as possible. We're at the mercy of network speed and available MIPS.
    Also, how are you able to predict the overflow? That's what I was looking for. There is an RX write pointer available, and I was thinking that if it is within some threshold (like a maximum expected packet size) of the RX read pointer, there is risk of an overflow. This would be an expensive operation to put into the critical loop, though.
    I poll the Rx buffer for bytes. If there's 1k of data and I'm expecting 130-byte packets, there is a very good chance of an overflow. It was pretty late last night when I was testing, and I have not had a chance to get back to it. I would assume your setup would have the same type of situation, where the bytes to read are > [expected packet size] x threshold.
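    That check could be sketched like so (Python; the function name is illustrative and `threshold=4` is an arbitrary value for the example, not something I've tuned):

```python
def overflow_risk(pending_bytes, packet_bytes, buffer_bytes, threshold=4):
    """Heuristic overrun warning: flag when the unread backlog exceeds
    `threshold` expected packets, or when the free space left in the
    buffer is smaller than one more packet."""
    backlog = pending_bytes > threshold * packet_bytes
    nearly_full = buffer_bytes - pending_bytes < packet_bytes
    return backlog or nearly_full

overflow_risk(1024, 130, 4096)  # True: ~8 packets queued, reader is behind
overflow_risk(260, 130, 4096)   # False: only ~2 packets pending
```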

    The other alternative is to create a buffer that can handle max bytes.

    What about TCP? Lossless and ordered... A little slower at burst rates but you would have the data.

    What do you do with 6.5k worth of 130-byte messages? Are the messages processed?
  • ags Posts: 386
    edited 2013-02-27 17:52
    Mike G wrote: »
    The other alternative is to create a buffer that can handle max bytes.
    I can test that by dedicating all 8k of Wiz memory to this socket, but that would just be to validate the theory. I can't support the overall design with just one socket available (I need two).
    What about TCP? Lossless and ordered... A little slower at burst rates but you would have the data.
    That's my other option. It will take a lot of work to try it (I'll need an almost 100% PASM driver for TCP), and I'm pretty sure there will be latency/synchronization problems. But if I run out of other options I will try it. I also looked into using a different Wiz chip: the W5200 would work for me, but it requires a more complex startup procedure (to configure indirect addressing mode) and is not available in an evaluation module that supports indirect mode, so I'd need to do a total redesign of my board. The W5300 would also work and does not require the complex startup procedure to set indirect mode, but it needs one additional address pin, which means a board redesign (though not a total one).
    What do you do with 6.5k worth of 130-byte messages? Are the messages processed?
    I do some minimal processing, then immediately offload to hub RAM where other cogs do the more complex data processing.