I recently ran into UDP packet loss in a server application at work. While troubleshooting it I went through a lot of material, and I summarized what I learned in this article for others' reference.
Before we start, let's walk through how a Linux system receives network messages.
- first, the network message reaches the NIC (Network Interface Controller) through the physical network cable
- the NIC and its driver place the message into the ring buffer using DMA (Direct Memory Access), which does not require CPU involvement
- the kernel reads the message from the ring buffer, processes it through the IP and TCP/UDP layers, and finally puts it into the application's socket buffer
- the application reads the message from the socket buffer and processes it
While UDP messages are being received, any of these steps may drop messages, either actively or passively, so packet loss can happen in the NIC and driver as well as in the system and the application.
The sending path is not analyzed here because it is similar to the receiving path, only in the opposite direction. In addition, messages are less likely to be lost when sending than when receiving; loss on the send side only occurs when the application sends messages faster than the kernel and NIC can process them.
This article assumes that the machine has only one interface, named eth0. If there are several interfaces, or the interface is not named eth0, adapt the commands to your actual setup.
NOTE: In this article, RX (receive) refers to receiving messages and TX (transmit) refers to sending messages.
Confirm that UDP packet loss is occurring
To see whether the NIC is dropping packets, run ethtool -S eth0 and check the fields whose names contain bad or drop. These should normally be 0; if you see them growing, the NIC is dropping packets.
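Watching for growth by eye works, but comparing two snapshots makes it explicit. The following Python sketch assumes the usual "name: value" layout of ethtool -S output (counter names vary by driver); it picks out the counters whose names contain drop or bad and reports which ones increased. The function names are hypothetical.

```python
def drop_counters(ethtool_stats: str) -> dict:
    """Extract counters whose names contain 'drop' or 'bad' from `ethtool -S` output."""
    counters = {}
    for line in ethtool_stats.splitlines():
        name, sep, value = line.strip().partition(":")
        value = value.strip()
        # Skip the "NIC statistics:" header and anything that is not "name: integer".
        if not sep or not value.isdigit():
            continue
        lowered = name.lower()
        if "drop" in lowered or "bad" in lowered:
            counters[name.strip()] = int(value)
    return counters


def growing_counters(before: dict, after: dict) -> dict:
    """Return how much each drop/bad counter increased between two snapshots."""
    return {
        name: after[name] - before.get(name, 0)
        for name in after
        if after[name] > before.get(name, 0)
    }
```

To take real snapshots, capture subprocess.run(["ethtool", "-S", "eth0"], capture_output=True, text=True).stdout twice, a few seconds apart, and pass both parsed results to growing_counters.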
Another command for checking NIC packet loss is ifconfig, which outputs RX (received packets) and TX (transmitted packets) statistics.
In addition, Linux provides packet loss statistics for each network protocol, which can be viewed with the netstat -s command; add --udp to show only the UDP-related statistics.
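netstat -s reads these per-protocol counters from /proc/net/snmp, where the Udp: section appears as a header line followed by a value line. The Python sketch below parses that section. To my understanding, packet receive errors corresponds to InErrors, packets to unknown port received to NoPorts, and receive buffer errors to RcvbufErrors; the exact set of columns depends on the kernel version, so the parser does not assume a fixed list.

```python
def parse_udp_snmp(snmp_text: str) -> dict:
    """Parse the two "Udp:" lines (header, values) of /proc/net/snmp into a dict."""
    rows = [
        line.split()
        for line in snmp_text.splitlines()
        if line.split()[:1] == ["Udp:"]  # exact match, so "UdpLite:" lines are ignored
    ]
    if len(rows) < 2:
        raise ValueError("no Udp: header/value lines found")
    header, values = rows[0], rows[1]
    # Drop the leading "Udp:" token from both lines and pair names with values.
    return dict(zip(header[1:], (int(v) for v in values[1:])))
```

On a live system, call it as parse_udp_snmp(open("/proc/net/snmp").read()).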
In this output, the following fields show whether and how UDP packets are being lost:
- packet receive errors: if this is non-zero and keeps growing, the system is losing UDP packets
- packets to unknown port received: the destination port of the received UDP message has no listener, which is usually because the service has not been started; this does not normally indicate a serious problem
- receive buffer errors: the number of packets dropped because the UDP receive buffer was too small
NOTE: A non-zero drop count is not necessarily a problem. For UDP, a small amount of packet loss is likely to be expected behavior, for example a loss rate (packets lost / packets received) of 1 in 10,000 or less.
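The loss-rate rule of thumb can be checked with a small calculation over two counter snapshots taken some time apart. This Python sketch assumes the snapshots are dicts keyed by the /proc/net/snmp column names (InErrors for packets lost, InDatagrams for packets received); the 1-in-10,000 threshold is the one from the note, not a universal limit.

```python
ACCEPTABLE_LOSS_RATE = 1 / 10_000  # the "1 in 10,000 or less" rule of thumb


def udp_loss_rate(before: dict, after: dict) -> float:
    """Loss rate between two snapshots: packets lost / packets received."""
    lost = after["InErrors"] - before["InErrors"]
    received = after["InDatagrams"] - before["InDatagrams"]
    if received <= 0:
        # Nothing was delivered in the interval: any loss at all is unbounded.
        return 0.0 if lost <= 0 else float("inf")
    return lost / received


def loss_is_acceptable(rate: float) -> bool:
    return rate <= ACCEPTABLE_LOSS_RATE
```

For example, 1 lost packet against 20,000 received gives 0.00005, which is within the threshold.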
NIC or driver packet loss
As mentioned before, if ethtool -S eth0 reports rx_***_errors, the NIC most likely has a problem that is causing the system to drop packets, and you need to contact the server or NIC vendor to resolve it.
netstat -i also reports packet reception and drop counts for each NIC; normally the error and drop columns should be 0.
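netstat -i takes its per-interface counters from /proc/net/dev, so the same check can be scripted. A minimal Python sketch, assuming the standard /proc/net/dev layout in which the first four receive columns after the interface name are bytes, packets, errs, and drop:

```python
def rx_errors_and_drops(proc_net_dev: str, iface: str) -> tuple:
    """Return (rx_errs, rx_drop) for one interface from /proc/net/dev text."""
    for line in proc_net_dev.splitlines():
        # Interface lines look like "  eth0: <receive columns> <transmit columns>";
        # the two header lines contain no colon and are skipped.
        name, sep, rest = line.partition(":")
        if sep and name.strip() == iface:
            # Receive columns: bytes packets errs drop fifo frame compressed multicast
            fields = rest.split()
            return int(fields[2]), int(fields[3])
    raise KeyError(f"interface {iface!r} not found")
```

On a live system, call it as rx_errors_and_drops(open("/proc/net/dev").read(), "eth0").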
If there is nothing wrong with the hardware or driver, the NIC usually drops packets because its ring buffer is too small. You can use the ethtool command to view and set the NIC's ring buffer.
ethtool -g eth0 shows the ring buffer settings of a particular NIC. The Pre-set maximums section gives the largest ring buffer the NIC supports, and the RX ring can be resized up to that maximum with a command such as ethtool -G eth0 rx 8192.
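The ethtool -g output can be turned into a suggested resize command. This Python sketch assumes the common output layout (a Pre-set maximums section followed by a Current hardware settings section, each with RX:/TX: lines); picking the maximum as the new size is an illustrative choice, and rx_resize_command is a hypothetical helper.

```python
def parse_ring_params(ethtool_g: str) -> dict:
    """Parse `ethtool -g` output into {"max": {...}, "current": {...}}."""
    sections = {}
    current = None
    for line in ethtool_g.splitlines():
        text = line.strip()
        if text.startswith("Pre-set maximums"):
            current = sections.setdefault("max", {})
        elif text.startswith("Current hardware settings"):
            current = sections.setdefault("current", {})
        elif current is not None:
            key, sep, value = text.partition(":")
            value = value.strip()
            if sep and value.isdigit():  # skips non-numeric values such as "n/a"
                current[key.strip()] = int(value)
    return sections


def rx_resize_command(iface: str, params: dict):
    """Suggest an `ethtool -G` command when RX is below the pre-set maximum."""
    max_rx = params.get("max", {}).get("RX")
    cur_rx = params.get("current", {}).get("RX")
    if max_rx is None or cur_rx is None or cur_rx >= max_rx:
        return None
    return f"ethtool -G {iface} rx {max_rx}"
```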
Linux system packet loss
There are many causes of packet loss on Linux systems. The common ones are UDP message errors, the firewall, insufficient UDP buffer size, and excessive system load. Let's look at each of them.
UDP packet error
If a UDP message is corrupted in transit, its checksum or length will be wrong. Linux verifies both when it receives a UDP message and discards the message as soon as an error is detected.
If you want UDP messages to be delivered to the application even when their checksum is wrong, you can disable UDP checksum checking through a socket option.
Firewall
If the system firewall is dropping packets, the typical symptom is that no UDP messages are received at all, although the firewall may also drop only some of them.
If you see a very high packet loss rate, check the firewall rules first to make sure the firewall is not actively dropping UDP messages.
UDP buffer size is not enough
After receiving a message, Linux stores it in a buffer. Because the buffer size is limited, if a UDP message is too large (exceeding the buffer size or the MTU), or if messages arrive too fast, Linux may drop packets because the buffer is full.
At the system level, Linux sets the maximum value that the receive buffer can be configured to. These limits are exposed in the following files and are generally initial values that Linux computes at boot time based on the amount of memory.
- /proc/sys/net/core/rmem_max: the maximum receive buffer size that may be set
- /proc/sys/net/core/rmem_default: the default receive buffer size
- /proc/sys/net/core/wmem_max: the maximum send buffer size that may be set
- /proc/sys/net/core/wmem_default: the default send buffer size
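These files each contain a single integer (in bytes), so they are easy to check from a script. A minimal Python sketch that reads all four; the root parameter is only there so the function can be pointed at a test directory instead of /proc.

```python
from pathlib import Path

CORE_BUFFER_FILES = ("rmem_max", "rmem_default", "wmem_max", "wmem_default")


def read_core_buffers(root: str = "/proc/sys/net/core") -> dict:
    """Read the system-wide socket buffer limits and defaults, in bytes."""
    base = Path(root)
    return {name: int((base / name).read_text().strip()) for name in CORE_BUFFER_FILES}
```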
However, these initial values are not designed for heavy UDP traffic and need to be increased if the application receives and sends a lot of UDP messages. You can use the sysctl command to make the change take effect immediately.
You can also set the corresponding parameters in /etc/sysctl.conf so that they remain in effect after the next reboot.
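The sysctl keys map directly onto the files listed above (net.core.rmem_max is /proc/sys/net/core/rmem_max), so both steps, applying a value now and persisting it, can be sketched in Python as below. Writing under /proc/sys requires root, just like sysctl -w. The helper names are hypothetical, and the 26214400-byte (25MB) value in the example is an illustrative assumption, not a recommendation from this article.

```python
from pathlib import Path


def sysctl_path(key: str, root: str = "/proc/sys") -> Path:
    """Map a sysctl key such as net.core.rmem_max to its file under /proc/sys."""
    return Path(root, *key.split("."))


def set_sysctl(key: str, value: int, root: str = "/proc/sys") -> None:
    """Apply a value immediately, like `sysctl -w key=value` (needs root; lost on reboot)."""
    sysctl_path(key, root).write_text(f"{value}\n")


def sysctl_conf_lines(settings: dict) -> str:
    """Render "key = value" lines to append to /etc/sysctl.conf for persistence."""
    return "".join(f"{key} = {value}\n" for key, value in settings.items())
```

For example, sysctl_conf_lines({"net.core.rmem_max": 26214400, "net.core.rmem_default": 26214400}) produces the two lines to append to /etc/sysctl.conf.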
If a message is too large, split the data on the sender side so that each message fits within the MTU.
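For IPv4, the largest UDP payload that fits in one packet is the MTU minus the 20-byte IPv4 header (without options) and the 8-byte UDP header, that is 1472 bytes for the common 1500-byte MTU. The Python sketch below splits data accordingly; it assumes IPv4 without IP options (an IPv6 header is 40 bytes instead).

```python
IPV4_HEADER_BYTES = 20  # IPv4 header without options
UDP_HEADER_BYTES = 8


def max_udp_payload(mtu: int = 1500) -> int:
    """Largest UDP payload that fits in one IPv4 packet of the given MTU."""
    return mtu - IPV4_HEADER_BYTES - UDP_HEADER_BYTES


def split_for_mtu(data: bytes, mtu: int = 1500) -> list:
    """Split data into chunks that each fit in a single, unfragmented datagram."""
    size = max_udp_payload(mtu)
    return [data[i:i + size] for i in range(0, len(data), size)]
```

The receiver has to reassemble the chunks itself. Because UDP guarantees neither delivery nor ordering, a real protocol would also need a sequence number in each chunk.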
Another configurable parameter is netdev_max_backlog, which is the number of packets the Linux kernel can queue after reading them from the NIC driver. The default is 1000, and it can be increased, for example to 2000.
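To see whether netdev_max_backlog is actually the bottleneck, look at /proc/net/softnet_stat: it has one line per CPU with hexadecimal counters, and the second column counts packets dropped because the backlog queue was full. A minimal Python sketch that extracts that column (the function name is hypothetical):

```python
def softnet_backlog_drops(softnet_stat: str) -> list:
    """Per-CPU drop counts from /proc/net/softnet_stat (second column, hexadecimal)."""
    return [
        int(line.split()[1], 16)
        for line in softnet_stat.splitlines()
        if line.strip()
    ]
```

If these numbers keep growing between readings of /proc/net/softnet_stat, raising net.core.netdev_max_backlog is worth trying.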
System load is too high
High CPU, memory, or IO load on the system may cause network packet loss. For example, if the CPU load is too high, the system does not have time to compute message checksums, copy memory, and perform similar operations, which may lead to drops at the NIC or the socket buffer; if the memory load is too high, the application becomes too slow to process messages in time; if the IO load is too high, the CPU is busy waiting on IO and does not have time to process the UDP messages in the buffer.
A Linux system is an interconnected whole, and a problem in any one component may affect the normal operation of the others. Excessive system load is caused either by a faulty application or by insufficient system resources. In the former case, find, debug, and fix the application promptly; in the latter case, detect the shortage and expand capacity promptly.
Application packet loss
The system-level UDP buffer size discussed above, as adjusted through sysctl, is only the maximum the system allows; each application needs to set its own socket buffer size when it creates a socket.
Linux puts received messages into the socket's buffer, and the application continuously reads messages from it. So two application-related factors affect packet loss: the socket buffer size and the speed at which the application reads the packets.
For the first factor, the application can set the socket receive buffer size when it initializes the socket, for example to 20MB.
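As an illustration, the following Python sketch requests a 20MB receive buffer with setsockopt(SO_RCVBUF). On Linux the kernel silently caps the request at net.core.rmem_max and reports back twice the granted size (socket(7) says the extra space covers bookkeeping overhead), so reading the value back with getsockopt shows whether the system limit also needs to be raised. The loopback address and port 0 are placeholders.

```python
import socket

RCVBUF_BYTES = 20 * 1024 * 1024  # 20MB


def open_udp_socket(host: str = "127.0.0.1", port: int = 0,
                    rcvbuf: int = RCVBUF_BYTES) -> socket.socket:
    """Create a UDP socket with an enlarged receive buffer, then bind it."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # Request the buffer size; the kernel caps it at net.core.rmem_max without an error.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, rcvbuf)
    sock.bind((host, port))
    return sock


def effective_rcvbuf(sock: socket.socket) -> int:
    """Buffer size as reported by the kernel (on Linux, twice the capped request)."""
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
```

If effective_rcvbuf(sock) // 2 comes back smaller than 20MB on Linux, the request was capped and rmem_max needs to be increased as described earlier.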
If you do not write and maintain the program yourself, modifying the application code is undesirable or even impossible. Many applications provide a configuration parameter to adjust this value, so check the corresponding official documentation; if no such parameter exists, you can only raise an issue with the application's developers.
Obviously, increasing the application's receive buffer reduces the chance of packet loss, but it also makes the application use more memory, so do it with caution.
The second factor is the speed at which the application reads messages from the buffer. The application should process messages asynchronously so that reading is not held up by processing.
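One way to keep reading fast is to separate receiving from processing: a dedicated thread does nothing but drain the socket into an in-memory queue, while a worker thread does the actual processing. The Python sketch below shows this pattern with threading and queue. It is one possible design, not the only one (an event loop such as asyncio is another option), and handle is a hypothetical callback standing in for your processing logic.

```python
import queue
import socket
import threading


def start_receiver(sock: socket.socket, packets: queue.Queue,
                   stop: threading.Event, bufsize: int = 65535) -> threading.Thread:
    """Drain the socket as fast as possible; hand datagrams off without processing."""
    sock.settimeout(0.2)  # wake up periodically so that `stop` is noticed

    def loop():
        while not stop.is_set():
            try:
                data, addr = sock.recvfrom(bufsize)
            except socket.timeout:
                continue
            packets.put((data, addr))

    thread = threading.Thread(target=loop, daemon=True)
    thread.start()
    return thread


def start_worker(packets: queue.Queue, handle) -> threading.Thread:
    """Run the (possibly slow) processing logic off the receive path."""
    def loop():
        while True:
            item = packets.get()
            if item is None:  # sentinel: shut down
                break
            handle(*item)

    thread = threading.Thread(target=loop, daemon=True)
    thread.start()
    return thread
```

The queue here is unbounded, so a processing backlog turns into application memory growth rather than kernel drops; a bounded queue.Queue(maxsize=...) would make that trade-off explicit.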
Where packets are dropped
If you want to know exactly which function on your Linux system is dropping packets, you can use the dropwatch utility, which listens for packet drops and prints the address of the function where each drop occurred.
With this information and the corresponding kernel source code, you can find out at which step the kernel drops the packet and the general reason for the drop.
In addition, you can use the Linux perf utility to listen for the kfree_skb event (kfree_skb is called when a network packet is dropped).
There are many articles on the web about how to use perf and interpret its output.
Summary
- UDP is a connectionless and unreliable protocol, suitable for scenarios where the occasional loss of a message does not affect the program state, such as video, audio, games, and monitoring. If you need reliability, you can implement retries and de-duplication at the application layer.
- If you find that the server is losing packets, first use monitoring to check whether the system load is too high, and then try to reduce the load to see whether the packet loss disappears.
- If the system load is too high, there is no effective remedy for UDP packet loss other than reducing the load. If the high CPU, memory, or IO usage is caused by a misbehaving application, locate it and fix it promptly; if resources are insufficient, monitoring should detect this in time so that capacity can be expanded quickly.
- If the system receives or sends a lot of UDP messages, you can reduce the probability of packet loss by adjusting the socket buffer size of both the system and the program.
- Applications should process UDP messages asynchronously and should not perform too much processing logic between receiving two messages.
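For the retries and de-duplication mentioned in the first point, the receiver needs a way to recognize repeated messages. The Python sketch below assumes a hypothetical wire format in which every datagram starts with a 4-byte big-endian sequence number; it drops datagrams whose sequence number was seen recently and keeps only a bounded window of numbers in memory. The sender-side retry logic is not shown.

```python
import collections
import struct

# Hypothetical wire format: 4-byte big-endian sequence number, then the payload.
HEADER = struct.Struct("!I")


def encode(seq: int, payload: bytes) -> bytes:
    return HEADER.pack(seq) + payload


class Deduplicator:
    """Drops datagrams whose sequence number was already seen (e.g. retries)."""

    def __init__(self, window: int = 4096):
        self.window = window              # how many recent sequence numbers to remember
        self.seen = set()
        self.order = collections.deque()  # insertion order, for evicting old numbers

    def accept(self, datagram: bytes):
        """Return the payload for a new message, or None for a duplicate."""
        (seq,) = HEADER.unpack_from(datagram)
        if seq in self.seen:
            return None
        self.seen.add(seq)
        self.order.append(seq)
        if len(self.order) > self.window:
            self.seen.discard(self.order.popleft())
        return datagram[HEADER.size:]
```

A duplicate that arrives after its sequence number has left the window is accepted again, so the window should cover the longest retry interval you expect.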