HTTP (Hypertext Transfer Protocol) has become the most commonly used application layer protocol on the Internet. However, it is only a protocol for transmitting hypertext and provides no security guarantees: because it sends packets over the Internet in plaintext, eavesdropping and man-in-the-middle attacks are possible. Transmitting passwords over HTTP is much like running around naked on the Internet.
Netscape designed the HTTPS protocol in 1994 to secure data transfers using Secure Sockets Layer (SSL). SSL has since been superseded by Transport Layer Security (TLS), so we now use TLS instead of the deprecated SSL protocol, although the term "SSL certificate" is still in common use.
HTTPS is an extension of the HTTP protocol that lets us transmit data securely over the Internet. However, the first time a client issues an HTTPS request, it has to wait 4.5 round-trip times (RTT) before it gets a response from the server. This article walks through how the request is initiated and answered, and analyzes why the HTTPS protocol needs 4.5-RTT to obtain a response from the service provider. Three protocols are involved:
- TCP protocol - the two communicating parties establish a TCP connection via a three-way handshake (three messages).
- TLS protocol - the two communicating parties establish a TLS connection through a handshake of four message flights.
- HTTP protocol - the client sends a request to the server and the server sends back a response.
The analysis here is based on a specific version of the protocol implementation and common scenarios. As the network technology evolves, we are able to reduce the number of network communications needed, and some common optimizations will be mentioned in the corresponding sections of this article.
TCP
As an application layer protocol, HTTP relies on an underlying transport layer protocol for basic data transfer, and it generally uses TCP for this. To prevent stale connections left over from earlier attempts from being established by mistake, the two parties establish a TCP connection via a three-way handshake. We briefly review the whole connection establishment process here.
- the client sends the server a data segment with SYN set and the initial sequence number SEQ = 100, from which the client will start sending data segments.
- when the server receives the data segment, it sends the client a data segment with SYN and ACK set:
  - it confirms the initial sequence number of the client’s data segments by returning ACK = 101.
  - it tells the client the initial sequence number from which the server will start sending data segments by sending SEQ = 300.
- the client sends a data segment with ACK set to the server, confirming the server’s initial sequence number with ACK = 301.
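The exchange above can be sketched as a tiny simulation. This is only an illustration of the sequence-number arithmetic, not a real TCP implementation: the `Segment` class and the `three_way_handshake` function are hypothetical names introduced here, and the initial sequence numbers 100 and 300 are the example values used above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Segment:
    """A heavily simplified TCP segment: flags plus SEQ/ACK numbers only."""
    flags: frozenset
    seq: Optional[int] = None
    ack: Optional[int] = None


def three_way_handshake(client_isn: int, server_isn: int) -> List[Segment]:
    """Return the three segments exchanged during connection setup."""
    # 1. Client -> server: SYN carrying the client's initial sequence number.
    syn = Segment(flags=frozenset({"SYN"}), seq=client_isn)
    # 2. Server -> client: SYN + ACK. The SYN consumes one sequence number,
    #    so the server acknowledges client_isn + 1 and announces its own ISN.
    syn_ack = Segment(
        flags=frozenset({"SYN", "ACK"}), seq=server_isn, ack=syn.seq + 1
    )
    # 3. Client -> server: ACK acknowledging server_isn + 1.
    ack = Segment(
        flags=frozenset({"ACK"}), seq=client_isn + 1, ack=syn_ack.seq + 1
    )
    return [syn, syn_ack, ack]


segments = three_way_handshake(client_isn=100, server_isn=300)
for segment in segments:
    print(sorted(segment.flags), "SEQ =", segment.seq, "ACK =", segment.ack)
```

With the example values, the second segment carries ACK = 101 and SEQ = 300, and the third carries ACK = 301, matching the steps listed above.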
Through the three-way handshake, the two sides of a TCP connection agree on the initial sequence numbers, window size, and maximum segment size (MSS). The initial sequence numbers let both sides keep data segments in order without overlap, the window size is used for flow control, and the MSS avoids packet fragmentation by the IP protocol.
The original version of the TCP protocol did establish connections with three communications, and the three-way handshake is still unavoidable in most current scenarios. However, TCP Fast Open (TFO), proposed in 2014, can in some scenarios establish a TCP connection and start exchanging data with a single communication.
TCP Fast Open uses a TFO cookie stored on the client to establish a connection with the server quickly. The client caches the cookie it received on an earlier connection; when it connects to the same server again, it sends the stored cookie along with its SYN. After verifying the cookie, the server sends SYN and ACK to the client and starts transmitting data, which reduces the number of communications.
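The cookie flow can be illustrated with a toy model. This is a sketch under loud assumptions: it does not open any sockets, `ToyTfoServer` and its methods are hypothetical names, and deriving the cookie as a truncated HMAC of the client IP is an illustrative choice rather than what any particular operating system does.

```python
import hashlib
import hmac
import os
from typing import Dict, Optional


class ToyTfoServer:
    """Toy model of server-side TFO cookie handling (no real networking)."""

    def __init__(self) -> None:
        self._secret = os.urandom(16)  # server-side secret, never sent out

    def issue_cookie(self, client_ip: str) -> bytes:
        # Assumption: the cookie is an HMAC of the client IP, cut to 8 bytes.
        mac = hmac.new(self._secret, client_ip.encode(), hashlib.sha256)
        return mac.digest()[:8]

    def accept_syn(self, client_ip: str, cookie: Optional[bytes]) -> str:
        # A valid cookie lets the server act on data carried in the SYN;
        # otherwise it falls back to the regular three-way handshake.
        if cookie is not None and hmac.compare_digest(
            cookie, self.issue_cookie(client_ip)
        ):
            return "fast-open"
        return "regular-handshake"


server = ToyTfoServer()
client_ip = "192.0.2.10"  # documentation-range address, used as an example
cookie_cache: Dict[str, bytes] = {}  # client-side cache keyed by server name

# First connection: no cookie yet, so the regular handshake is used and the
# client stores the cookie that the server hands out.
first = server.accept_syn(client_ip, cookie_cache.get("example-server"))
cookie_cache["example-server"] = server.issue_cookie(client_ip)

# Later connection: the cached cookie is sent with the SYN.
second = server.accept_syn(client_ip, cookie_cache.get("example-server"))
print(first, "->", second)
```

The point of the model is only that the saving applies to repeat connections: the first connection still pays for the full handshake.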
TLS
The role of TLS is to build a secure transport channel on top of the reliable TCP protocol. TLS itself does not provide any reliability guarantee, so it still needs a reliable transport layer protocol underneath. After a reliable TCP connection is established between the two parties, they exchange keys via the TLS handshake. Here we describe the connection establishment process of TLS 1.2:
- the client sends a Client Hello message to the server with the protocol version, encryption algorithms, compression algorithms and **random number** generated by the client.
- after receiving the protocol version, encryption algorithms, etc. supported by the client, the server:
  - sends a Server Hello message to the client carrying the selected protocol version, encryption method, session ID, and server-generated random number.
  - sends the Certificate message to the client, i.e., the server-side certificate chain, which contains information such as the domain names covered by the certificate, its issuer and its validity period.
  - sends a Server Key Exchange message to the client, passing public key and signature information.
  - sends an optional CertificateRequest message to the client to request the client’s certificate for verification.
  - sends the Server Hello Done message to the client to notify it that the server has sent all relevant information.
- after receiving the protocol version, encryption method, session ID and certificate from the server, the client verifies the server’s certificate and then:
  - sends a Client Key Exchange message to the server containing a random string encrypted with the server’s public key, the Pre Master Secret.
  - sends a Change Cipher Spec message to the server, informing the server that subsequent data segments will be encrypted.
  - sends a Finished message to the server containing the encrypted handshake information.
- upon receipt of the Change Cipher Spec and Finished messages, the server:
  - sends a Change Cipher Spec message to the client, informing the client that subsequent data segments will be encrypted.
  - verifies the client’s Finished message and sends its own Finished message to the client, completing the TLS handshake.
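To see where the 2-RTT figure comes from, the steps above can be grouped into flights, where a flight is the batch of messages one side sends before waiting for the other. The sketch below is only bookkeeping, not a TLS implementation; `FLIGHTS` and `handshake_rtts` are names introduced here for illustration.

```python
from typing import List, Tuple

# Each entry is (sender, messages sent together before waiting for the peer).
FLIGHTS: List[Tuple[str, List[str]]] = [
    ("client", ["ClientHello"]),
    ("server", [
        "ServerHello",
        "Certificate",
        "ServerKeyExchange",
        "CertificateRequest (optional)",
        "ServerHelloDone",
    ]),
    ("client", ["ClientKeyExchange", "ChangeCipherSpec", "Finished"]),
    ("server", ["ChangeCipherSpec", "Finished"]),
]


def handshake_rtts(flights: List[Tuple[str, List[str]]]) -> float:
    """Each flight is one one-way trip; two one-way trips make one RTT."""
    return len(flights) / 2


for number, (sender, messages) in enumerate(FLIGHTS, start=1):
    print(f"flight {number} ({sender}): {', '.join(messages)}")
print("round trips:", handshake_rtts(FLIGHTS))
```

Four alternating flights make two round trips, no matter how many individual messages each flight contains.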
The key to the TLS handshake is that both sides combine the random values they each generated with the secret protected by the server’s public key to derive a negotiated key. Both sides can then use this symmetric key to encrypt messages, which prevents eavesdropping and attacks by a man in the middle and ensures the security of the communication.
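As a rough illustration of how both sides arrive at the same key, the sketch below derives the TLS 1.2 master secret from the Pre Master Secret and the two random numbers, following the pseudorandom function defined in RFC 5246 with HMAC-SHA256. The function names `p_sha256` and `master_secret` are chosen here, and the random inputs are placeholders generated locally rather than values taken from a real handshake.

```python
import hashlib
import hmac
import os


def p_sha256(secret: bytes, seed: bytes, length: int) -> bytes:
    """P_hash expansion from RFC 5246, section 5, using HMAC-SHA256."""
    output = b""
    a = seed  # A(0) = seed
    while len(output) < length:
        a = hmac.new(secret, a, hashlib.sha256).digest()  # A(i)
        output += hmac.new(secret, a + seed, hashlib.sha256).digest()
    return output[:length]


def master_secret(
    pre_master: bytes, client_random: bytes, server_random: bytes
) -> bytes:
    # master_secret = PRF(pre_master_secret, "master secret",
    #                     ClientHello.random + ServerHello.random)[0..47]
    seed = b"master secret" + client_random + server_random
    return p_sha256(pre_master, seed, 48)


# Placeholder inputs standing in for values exchanged in the handshake.
client_random = os.urandom(32)
server_random = os.urandom(32)
pre_master = os.urandom(48)

# Both sides know the same three inputs, so they derive the same key.
client_key = master_secret(pre_master, client_random, server_random)
server_key = master_secret(pre_master, client_random, server_random)
print(client_key == server_key, len(client_key))
```

An eavesdropper sees both random numbers in plaintext but not the Pre Master Secret, so it cannot repeat this computation.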
In TLS 1.2, we need 2-RTT to establish a TLS connection. TLS 1.3 optimizes the protocol to cut the handshake from two round trips to one, allowing the client to transmit application layer data to the server after 1-RTT.
In addition to reducing the network overhead of the regular handshake, TLS 1.3 also introduces a 0-RTT connection establishment process. 60% of network connections are established when users visit a website for the first time or after a period of time, and the remaining 40% can be handled by TLS 1.3’s 0-RTT policy. However, this policy works on a principle similar to that of TFO: it reuses sessions and cached data, so it carries a certain security risk and should be adopted with the specific business scenario in mind.
HTTP
Transferring data over established TCP and TLS channels is relatively straightforward, and the HTTP protocol can be used directly to transfer data over reliable, secure channels established at lower layers. The client writes data to the server via the TCP socket interface, and the server receives the data, processes it, and returns it via the same route. Because the entire process requires the client to send a request and the server to return a response, the time taken is 1-RTT.
The data exchange of the HTTP protocol consumes only 1-RTT, and when the client and server handle only one HTTP request, we can no longer optimize from the HTTP protocol itself. However, as the number of requests grows, HTTP/2 can reduce the additional overhead of TCP and TLS handshakes by reusing the established TCP connections.
Summary
When a client accesses a server via an HTTPS request, the whole process requires 7 handshakes and takes 9 one-way network delays, that is, 4.5-RTT. RTT is bounded by the physical distance between the client and the server: if the RTT is about 40ms, the first request will take ~180ms; however, if we want to access a server in the US and the RTT is about 200ms, then the HTTPS request will take ~900ms, which is a relatively high latency. Let’s summarize the reasons why the HTTPS protocol needs 9 one-way delays to complete communication:
- the TCP protocol requires three handshakes to establish a TCP connection to ensure the reliability of the communication (1.5-RTT).
- the TLS protocol uses four handshakes on top of TCP to establish a TLS connection to ensure the security of the communication (2-RTT).
- the HTTP protocol sends requests and receives responses over TCP and TLS in one round trip (1-RTT).
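The arithmetic behind these figures can be checked with a few lines. The per-phase costs are the ones listed above; the names `PHASE_RTT` and `first_response_ms` are introduced here only for this illustration.

```python
from typing import Dict

# Round trips spent in each phase, as counted in this article.
PHASE_RTT: Dict[str, float] = {
    "TCP handshake": 1.5,
    "TLS 1.2 handshake": 2.0,
    "HTTP request/response": 1.0,
}


def first_response_ms(rtt_ms: float, phases: Dict[str, float]) -> float:
    """Time until the first HTTPS response arrives, in milliseconds."""
    return sum(phases.values()) * rtt_ms


total_rtt = sum(PHASE_RTT.values())
print("total round trips:", total_rtt)             # 4.5
print("one-way delays:", total_rtt * 2)            # 9.0
print("RTT 40ms:", first_response_ms(40, PHASE_RTT))    # 180.0
print("RTT 200ms:", first_response_ms(200, PHASE_RTT))  # 900.0
```

Because the phases are passed in as a dictionary, the same function can be reused to estimate the effect of an optimization by lowering the cost of one phase.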
It should be noted that the round-trip latency calculated in this article is based on a specific scenario and specific protocol versions, and the protocols themselves are constantly being updated and evolving.
HTTP/3 is one such example. It uses the UDP-based QUIC protocol for the handshake, combining the TCP and TLS handshakes and reducing 7 handshakes to 3. This directly establishes a reliable and secure transport channel and cuts the original ~900ms to ~500ms. We will cover the HTTP/3 protocol in a later article. Finally, here are some open questions that interested readers can ponder:
- What are the similarities and differences between the QUIC and TCP protocols as transport layer protocols?
- Why is it possible to establish a client-server connection via 0-RTT?