The TCP protocol contains 11 different states, and the TCP connection transitions states based on the messages sent or received. The state machine shown below illustrates all possible transitions, including not only the state transition process under normal conditions, but also the state transition under abnormal conditions.
Either party to a TCP connection can enter the TIME_WAIT state when it closes the connection. Closing the connection tells the other party that this side has no more data to send, although it can still receive data from the other party. A common closing sequence is as follows.
- When the client has no more data to send, it sends a FIN message to the server and enters the FIN_WAIT_1 state.
- When the server receives the FIN message from the client, it enters the CLOSE_WAIT state and sends an ACK message to the client; the client enters the FIN_WAIT_2 state when it receives that ACK message.
- When the server has no more data to send, it sends a FIN message to the client.
- When the client receives the FIN message, it enters the TIME_WAIT state and sends an ACK message to the server, which enters the CLOSED state on receiving it.
- The client enters the CLOSED state after waiting for twice the maximum segment lifetime (MSL).
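The closing sequence above can be traced with a small table-driven sketch. The event names (send_fin, recv_ack, and so on) and the trace function are hypothetical labels chosen for illustration; the state names are those of the standard TCP state machine, including LAST_ACK, which the passive side occupies between sending its FIN and receiving the final ACK.

```python
# Transition table for the normal close described above.
# Keys are (current_state, event); event names are illustrative labels only.
TRANSITIONS = {
    ("ESTABLISHED", "send_fin"): "FIN_WAIT_1",   # active closer sends FIN
    ("FIN_WAIT_1", "recv_ack"): "FIN_WAIT_2",    # peer acknowledged our FIN
    ("FIN_WAIT_2", "recv_fin"): "TIME_WAIT",     # peer's FIN arrives, we ACK it
    ("TIME_WAIT", "timeout_2msl"): "CLOSED",     # 2 MSL have elapsed
    ("ESTABLISHED", "recv_fin"): "CLOSE_WAIT",   # passive closer receives FIN, ACKs it
    ("CLOSE_WAIT", "send_fin"): "LAST_ACK",      # passive closer sends its own FIN
    ("LAST_ACK", "recv_ack"): "CLOSED",          # final ACK received
}


def trace(events, state="ESTABLISHED"):
    """Return the list of states visited while applying events in order."""
    visited = [state]
    for event in events:
        state = TRANSITIONS[(state, event)]
        visited.append(state)
    return visited


# The client closes actively, the server closes passively.
client = trace(["send_fin", "recv_ack", "recv_fin", "timeout_2msl"])
server = trace(["recv_fin", "send_fin", "recv_ack"])
print(" -> ".join(client))
print(" -> ".join(server))
```

Only the side that sends the first FIN passes through TIME_WAIT in this trace; the other side never does.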
From the above, we can see that TIME_WAIT appears only on the side that actively closes the connection, while the passively closing side goes directly to the CLOSED state once it receives the final ACK. The client that enters TIME_WAIT must wait 2 MSL before it can actually close the connection. TCP requires the TIME_WAIT state for the same two reasons that the client must wait 2 MSL before entering the CLOSED state:
- Prevent delayed data segments from being received by a later TCP connection that uses the same source address, source port, destination address, and destination port.
- Ensure that the remote end closes the TCP connection properly, that is, wait until the ACK message for the FIN has been received by the party passively closing the connection.
Both of the above reasons are relatively simple, so let’s expand on some of the possible problems behind them.
Blocking Delayed Data Segments
Each TCP data segment carries a unique sequence number. The sequence number underpins the reliability and in-order delivery of TCP. Setting aside wraparound when the sequence number overflows, sequence number uniqueness is an important convention in TCP, and violating it leads to confusing behavior and results. To ensure that the segments of a new TCP connection do not duplicate segments of an earlier connection that are still in transit, TCP must stay silent for at least the maximum time a segment can survive in the network, the MSL, before assigning new sequence numbers. RFC 793 puts it this way:
To be sure that a TCP does not create a segment that carries a sequence number which may be duplicated by an old segment remaining in the network, the TCP must keep quiet for a maximum segment lifetime (MSL) before assigning any sequence numbers upon starting up or recovering from a crash in which memory of sequence numbers in use was lost.
In the TCP connection shown above, the SEQ = 301 message sent by the server is delayed in the network and does not arrive until after the TCP connection has been closed. If a new TCP connection reusing the same port numbers has been established by then, the client may accept this expired message as valid data, which is a serious problem. We should therefore be very careful when adjusting the TIME_WAIT policy and must be clear about what we are doing.
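A minimal sketch of why the stale segment is dangerous: a receiver judges a segment largely by whether its sequence number falls inside the current receive window. The function below is a simplified version of that check (it ignores 32-bit wraparound and segment length), and the receive state of the new connection is a made-up example; only SEQ = 301 comes from the text.

```python
def in_receive_window(seq, rcv_nxt, rcv_wnd):
    """Simplified acceptability test: is seq inside [rcv_nxt, rcv_nxt + rcv_wnd)?

    Ignores 32-bit sequence wraparound and segment length for brevity.
    """
    return rcv_nxt <= seq < rcv_nxt + rcv_wnd


STALE_SEQ = 301  # delayed segment from the old connection (value from the text)

# Hypothetical receive state of a NEW connection on the same four-tuple.
new_rcv_nxt = 250
new_rcv_wnd = 1000

accepted = in_receive_window(STALE_SEQ, new_rcv_nxt, new_rcv_wnd)
print(accepted)  # True: the stale segment looks like valid new data
```

Because both connections share the same addresses and ports, nothing in this check distinguishes the old segment from new data, which is why the old connection's segments are given time to expire first.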
RFC 793 states that a TCP connection must wait for 2 MSL in TIME_WAIT, but it does not explain where the factor of two comes from.
RFC 793 sets the MSL to 120 seconds, or two minutes. This is not a rigorously derived value but an engineering choice, and there is no problem in changing the OS setting if experience with the service calls for it. In fact, Linux has since early versions set the TIME_WAIT wait time, TCP_TIMEWAIT_LEN, to 60 seconds in order to reuse TCP connection resources more quickly.
On Linux, a client can by default use port numbers 32,768 to 61,000 when connecting to remote servers, a total of 28,232 ports, so applications have nearly 30,000 port numbers to choose from.
However, if the host has created more than 28,232 TCP connections to a specific port on the target host within the last minute, creating another TCP connection fails with an error. In other words, unless we adjust the host's configuration, the maximum rate at which new TCP connections can be created is about 470 per second (28,232 ports divided by 60 seconds).
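The arithmetic behind these figures can be checked with a short sketch. The constants are the values quoted in the text; the upper port bound is treated as exclusive here so that the count matches the 28,232 stated above, and the function name is hypothetical.

```python
# Figures quoted in the text; actual defaults vary by system and kernel version.
PORT_LOW = 32_768
PORT_HIGH = 61_000        # treated as an exclusive bound so the count matches the text
TIME_WAIT_SECONDS = 60    # Linux TIME_WAIT duration mentioned above


def max_new_connections_per_second(low, high, time_wait_seconds):
    """Upper bound on the sustained rate of new connections to one remote address and port."""
    usable_ports = high - low
    return usable_ports / time_wait_seconds


ports = PORT_HIGH - PORT_LOW
rate = max_new_connections_per_second(PORT_LOW, PORT_HIGH, TIME_WAIT_SECONDS)
print(ports)           # 28232
print(round(rate, 1))  # 470.5
```

Each local port stays unavailable for the full TIME_WAIT period after use, so the sustainable rate is the port count divided by that period.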
Guaranteeing Connection Closure
From the definition of the TIME_WAIT state in RFC 793, we can find another important role for this state: waiting long enough to be sure that the remote TCP has received the ACK for the FIN it sent.
TIME-WAIT - represents waiting for enough time to pass to be sure the remote TCP received the acknowledgment of its connection termination request.
If the client does not wait long enough and re-establishes a TCP connection with the server before the server has received the ACK message, the following problem arises. Because it has not received the ACK message, the server still considers the current connection legitimate. When the client sends a SYN message to request a new handshake, it receives an RST message from the server, and the connection establishment process is terminated.
By default, if the client waits long enough, one of two things happens:
- The server receives the ACK message normally and closes the current TCP connection.
- The server does not receive the ACK message, so it resends the FIN to close the connection and waits for a new ACK message.
As long as the client waits for 2 MSL, the connection between the client and the server is closed normally, and the probability that a newly created TCP connection will be affected is negligible, ensuring the reliability of data transmission.
Summary
There are some scenarios where a 60-second wait before a connection is destroyed is unacceptable, for example highly concurrent stress tests. When we test the throughput and latency of a remote service with concurrent requests, a large number of TCP connections in the TIME_WAIT state can build up locally. On macOS, active connections can be listed with a tool such as netstat.
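As a rough illustration, connection states can be tallied from netstat-style output. The sample text below is made up for this sketch, and the assumption that the state is the last whitespace-separated column holds for typical netstat -an TCP lines but may differ between platforms and options.

```python
from collections import Counter

# Hypothetical sample shaped like `netstat -an` TCP output (addresses are made up).
SAMPLE = """\
tcp4       0      0  127.0.0.1.52001        127.0.0.1.8080         TIME_WAIT
tcp4       0      0  127.0.0.1.52002        127.0.0.1.8080         TIME_WAIT
tcp4       0      0  127.0.0.1.52003        127.0.0.1.8080         ESTABLISHED
tcp4       0      0  *.8080                 *.*                    LISTEN
"""


def count_states(netstat_text):
    """Count TCP connection states, assuming the state is the last column."""
    counts = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        # Skip headers and non-TCP lines; protocol names start with "tcp".
        if fields and fields[0].startswith("tcp"):
            counts[fields[-1]] += 1
    return counts


counts = count_states(SAMPLE)
print(counts["TIME_WAIT"])
```

To run it against live data, the sample could be replaced with the output of subprocess.run(["netstat", "-an"], capture_output=True, text=True).stdout.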
When we stress test a server with thousands of concurrent connections from one host, those connections quickly consume the TCP connection resources on the host, and almost all of them end up in the TIME_WAIT state waiting to be destroyed. If we really do have to deal with TIME_WAIT on a single machine, it can be handled in several ways.
- Use the SO_LINGER option and set the linger time l_linger to 0. When we then close the TCP connection, the kernel discards all data in the buffer and sends an RST message to the server, terminating the current connection immediately.
- Use the net.ipv4.tcp_tw_reuse option to allow the kernel to reuse TCP connections that are in the TIME_WAIT state by relying on the TCP timestamp option.
- Widen the available port range in the net.ipv4.ip_local_port_range option to increase the maximum number of TCP connections that can coexist.
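The first option can be sketched with Python's socket module. This is a minimal example, assuming a POSIX system where struct linger is two C ints (l_onoff, l_linger); the helper name enable_abortive_close is hypothetical.

```python
import socket
import struct


def enable_abortive_close(sock):
    """Set SO_LINGER with l_onoff=1 and l_linger=0 on sock."""
    # Assumes struct linger is two C ints, as on Linux and macOS.
    linger = struct.pack("ii", 1, 0)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, linger)


sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
enable_abortive_close(sock)

# Read the option back to confirm what the kernel stored.
raw = sock.getsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.calcsize("ii"))
l_onoff, l_linger = struct.unpack("ii", raw)
print(l_onoff != 0, l_linger)
sock.close()
```

Closing a connected socket configured this way aborts the connection instead of going through the normal FIN exchange, so any unsent data is lost; that trade-off usually makes it suitable only for load-generating test clients.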
Note that another common TCP configuration item, net.ipv4.tcp_tw_recycle, was removed in Linux 4.12, so we can no longer use this configuration to solve the problems caused by the TIME_WAIT design.
The TIME_WAIT state plays a very important role in TCP, as it is an indispensable part of the protocol's reliability design. If the problem can be solved by adding machines, that is preferable to changing the TCP configuration. We need to understand the design rationale behind TIME_WAIT and avoid modifying the default configuration as much as possible; the Linux manual likewise urges caution when modifying these configurations. Here, let's revisit the reasons for the TIME_WAIT state in the TCP protocol. If the client does not wait long enough before re-establishing a connection to the remote end using the same port number, the following problems arise.
- Because the network transit time of a data segment is uncertain, the new connection may receive a data segment left over from the previous TCP connection.
- Because the ACK sent by the client may not have been received by the server, the server may still be in the LAST_ACK state, so it replies with an RST message that terminates the establishment of the new connection.
The TIME_WAIT state is the result of TCP's struggle with uncertain network latency, and uncertainty is the biggest impediment to the TCP protocol on the road to reliability. To conclude, here are some more open-ended related questions for the interested reader to ponder.
- How does the net.ipv4.tcp_tw_reuse configuration use timestamps to keep reused TCP connections relatively safe?
- Why did Linux remove the net.ipv4.tcp_tw_recycle configuration from the protocol stack?