HTTP stands for Hypertext Transfer Protocol. There is a wide variety of information about HTTP on the Internet, but most of it simply lists the provisions of the protocol, and little of it explains why the protocol was designed this way. Today I will try to analyze the main features of HTTP from a problem-solving perspective, hoping to help you understand the protocol quickly.
HTTP is a protocol for transferring data over the network. We don’t want data to be lost or corrupted during transmission. That’s why HTTP chose TCP as the underlying network protocol, because TCP is a reliable transport layer protocol.
After establishing a TCP connection, the two communicating parties immediately find a new problem: what data does the server want to send to the client? So the client has to send what it wants to the server after the connection is established, which is called a “request”. This establishes the fundamental design of the HTTP protocol, which is a client-driven request-response protocol.
The client starts by sending a “request” to the server. But the server may not receive it in the same pieces the client sent. Wait, isn’t TCP a reliable transport protocol? How can the received data be different? This brings us to the issue of data segmentation. For example, if the client sends “abcdef”, the underlying TCP protocol may deliver it as “abc” and then “def”, or in even more pieces. Whatever the number of pieces, their order is fixed and exactly the same as the order sent by the client. Since the server may receive the data in several pieces, it needs to “save” what it has received and wait until all the client’s data has arrived to see the full picture of the client’s “request”.
And at what point is everything received? This is a fundamental problem in TCP communication. There are two schools of thought to solve this problem: length streams and separator streams.
In a length stream, the sender transmits the length of the data before the data itself. The server reads the length first, and then “saves” the data that follows until that many bytes have arrived. Won’t the server encounter segmentation problems when reading the length? In practice, rarely, because TCP usually only splits longer data. The “abcdef” segmentation above is an extreme example that hardly ever happens. So, as long as the length field is short, the server will almost always receive it in one piece. Even if it is split, the size of the length field itself is fixed by the protocol. For example, if two bytes are used to represent the length, the data length can range from 0 to 65535. The server can keep reading until it has those two bytes, and then receive what follows according to the data length.
The biggest advantage of length streams is that they are simple to implement and memory efficient, since the server does not have to allocate much memory beforehand. But the disadvantage is also prominent: the range of lengths is not flexible. If we specify a two-byte length field, we cannot transfer more than 64 KB of data. If we instead specify an eight-byte length field, space is wasted when transferring shorter data.
In addition, length streams are less extensible. If we want to transfer other information besides the length, such as data types and version numbers, we need to fix the size of each of these fields in advance. Once the sizes are set, it is difficult to extend them later. The most typical length stream protocol is the IP packet. For those interested, see how the IP header specifies the length of the data.
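To make the length stream idea concrete, here is a minimal sketch of a receiver that uses a two-byte length field. The class name `LengthPrefixedDecoder` and the helper `frame` are my own illustration, not part of any real protocol. Note that it buffers until enough bytes have arrived, so it still works if TCP happens to split the length field itself.

```python
import struct


class LengthPrefixedDecoder:
    """Reassembles messages framed as a 2-byte big-endian length plus payload."""

    def __init__(self):
        self._buffer = bytearray()

    def feed(self, chunk: bytes):
        """Add one received piece and return every message completed by it."""
        self._buffer.extend(chunk)
        messages = []
        while len(self._buffer) >= 2:
            (length,) = struct.unpack_from("!H", self._buffer, 0)
            if len(self._buffer) < 2 + length:
                break  # the payload has not fully arrived yet
            messages.append(bytes(self._buffer[2:2 + length]))
            del self._buffer[:2 + length]
        return messages


def frame(payload: bytes) -> bytes:
    """Sender side: prefix the payload with its length (at most 65535 bytes)."""
    return struct.pack("!H", len(payload)) + payload
```

Feeding the bytes of `frame(b"abcdef")` in three arbitrary pieces yields nothing for the first two calls and the whole message on the third.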
In view of the shortcomings of length streams, people came up with separator streams. Simply put, a special separator marks the end of the data. The classic example is the C string, whose end is marked by \0. A server-side program using this approach has to keep receiving data from the client until it sees the separator, which indicates that the complete “request” has been received.
Since there is no need to specify the length of the data in advance, the separator approach solves the inflexible length range of length streams at a stroke: a separator-based protocol can receive data of arbitrary length. However, the separator approach has a price. Because the length is not known in advance, the server must either allocate a relatively large amount of memory or allocate memory dynamically several times, which consumes more resources. A malicious user could fill up the server’s memory by constructing very long data.
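The separator approach can be sketched in the same style. The class below is hypothetical; the `max_length` limit is my own addition to show one way a server might defend against the very long data mentioned above.

```python
class DelimitedDecoder:
    """Splits a byte stream into messages that end with a separator."""

    def __init__(self, delimiter: bytes = b"\r\n", max_length: int = 8192):
        self._delimiter = delimiter
        self._max_length = max_length  # assumed cap on one unfinished message
        self._buffer = bytearray()

    def feed(self, chunk: bytes):
        """Add one received piece and return every message completed by it."""
        self._buffer.extend(chunk)
        messages = []
        while True:
            index = self._buffer.find(self._delimiter)
            if index < 0:
                break
            messages.append(bytes(self._buffer[:index]))
            del self._buffer[:index + len(self._delimiter)]
        if len(self._buffer) > self._max_length:
            # Refuse to keep buffering data that never ends.
            raise ValueError("message too long without a separator")
        return messages
```

Unlike the length stream receiver, this one cannot know how much memory it will need until the separator shows up, which is exactly the cost described above.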
But the HTTP protocol still adopted this approach, and the separator it uses is \r\n. Here \r stands for carriage return, which tells the printer to return the print head to the leftmost position, and \n means line feed, which tells the printer to move the paper up one line, ready to print the next character. Early computers did not have today’s LCD screens and used teleprinters to “display” content, so they needed to transmit the two characters \r\n. Now that teleprinters are obsolete, \n alone would be enough in theory, and some servers, such as Nginx, also accept a bare \n.
So, the simplest HTTP request is a single line such as GET /mypage.html followed by \r\n.
Here GET is an anthropomorphic way of saying “take something from the server”. This is also where HTTP’s semantic design begins (by semantic I mean that ordinary people can understand it). It is followed by a space, then the path to the file, and finally the separator \r\n. Because it ends with \r\n, the data above is also called the request line.
The client sends the above data as soon as it establishes a connection with the server. The server starts to parse the data after receiving \r\n, that is, it extracts /mypage.html, then finds the corresponding file and sends the file content to the client.
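As a rough sketch of that parsing step, the function below takes one complete request line and returns the path. The function name is my own, and a real server would of course also map the path to a file safely.

```python
def parse_simple_request(line: bytes) -> str:
    """Extract the path from an HTTP/0.9 style request line."""
    if not line.endswith(b"\r\n"):
        raise ValueError("request line is not complete yet")
    # Everything before the first space is the method, the rest is the path.
    method, _, path = line[:-2].decode("ascii").partition(" ")
    if method != "GET" or not path.startswith("/"):
        raise ValueError("not an HTTP/0.9 style request")
    return path
```

Calling `parse_simple_request(b"GET /mypage.html\r\n")` returns `/mypage.html`.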
At this point, the client receives the contents of the file sent by the server, also called the “response”. However, the client immediately faces the same problem as the server: how can it be sure that the complete content of mypage.html has been received? Should the server send the separator \r\n at the end? No! Because the content of mypage.html may itself contain \r\n, and if the client still used \r\n as the end marker, it might lose data.
For this reason Tim Berners-Lee (the father of the HTTP protocol) adopted a simpler approach: closing the connection. That is, the server actively closes the TCP connection after the transfer is complete, so that the client knows for certain that all the content has been transferred.
This is the original HTTP protocol, released around 1990. The HTTP protocol of this era is now called HTTP/0.9, mainly to distinguish it from the standardized 1.x that followed. And so the era of the World Wide Web began.
HTTP/0.9 was widely used after its release. However, its functionality was too simple, so many browsers extended it. The most important extensions are as follows.
- Add version information
- Add extension header information
- Add return status information
The version information is added so that the client and server can recognize each other and enable the extensions. With it, the request line becomes, for example, GET /mypage.html HTTP/1.0 followed by \r\n.
The extension header information is added to convey more information. For example, different browsers use it to identify themselves in the request. To make it easy to add further extensions later, the HTTP protocol continues to use the concept of “lines” and separators.
First, consistent with the request line, each piece of extension information occupies one line, with the name and the value split by a colon and the line ending with \r\n, e.g. a line of the form Name: value.
Second, this information can span multiple lines. How can the server determine how many lines there are? This again relies on the separator \r\n: the HTTP protocol uses a blank line to indicate that the extension information is over. So a complete request consists of the request line, any number of extension lines, and a final blank line.
The server first receives a line and extracts the file path, and then extracts the extension information line by line, splitting on \r\n. When a blank line arrives, all the extension information has been received.
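Here is a small sketch of how a server might parse such a request once the blank line has arrived. The function name and the `ExampleBrowser/1.0` value in the usage note are made up for illustration; real parsers handle many more edge cases.

```python
def parse_request_head(data: bytes):
    """Parse a request line plus header lines that end with a blank line."""
    head, separator, _ = data.partition(b"\r\n\r\n")
    if not separator:
        raise ValueError("blank line not received yet")
    lines = head.decode("iso-8859-1").split("\r\n")
    method, path, version = lines[0].split(" ")
    headers = {}
    for line in lines[1:]:
        # Each header line is "Name: value"; names are case-insensitive.
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return method, path, version, headers
```

For the input `b"GET /mypage.html HTTP/1.0\r\nUser-Agent: ExampleBrowser/1.0\r\n\r\n"` it returns the method, the path, the version, and a dictionary with one `user-agent` entry.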
These extensions, also called headers, are used to implement various features of the HTTP protocol.
HTTP/0.9 transfers the contents of a file directly after receiving a request. However, some scenarios require additional information to be returned, for example that the file does not exist, so people added return status information. In addition, the extended HTTP protocol allows the server to return multiple headers before sending data.
In a typical extended response, the server first sends a line such as HTTP/1.0 200 OK\r\n. Here 200 is the status code, indicating success, and the OK that follows is the semantic part of the line, meant for humans. This line is called the status line. It is followed by the extension information, in exactly the same form as in the request: one item per line, ending with a blank line. The last part is the content of the file.
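The response layout can be sketched as follows: a status line that starts with the protocol version, the header lines, a blank line, and then the content. The function name is hypothetical and the version is fixed to HTTP/1.0 just for this example.

```python
def build_response(status: int, reason: str, headers: dict, body: bytes) -> bytes:
    """Assemble status line, header lines, blank line and body into one response."""
    lines = ["HTTP/1.0 %d %s" % (status, reason)]
    lines += ["%s: %s" % (name, value) for name, value in headers.items()]
    # Every line ends with \r\n, and an extra \r\n forms the blank line.
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("iso-8859-1") + body
```

A missing file would use the same layout with a different status code, for example `build_response(404, "Not Found", {}, b"")`.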
Because of the headers, the scalability of the HTTP protocol took off straight away. People kept adding all kinds of features to the HTTP protocol.
HTTP/0.9 can only transfer plain text files. With headers, we can transfer more descriptive information, such as the type, length, and update time of the file. This descriptive information about the transmitted data is called an Entity Header, and the data itself is called the Entity.
Common Entity Headers are:
- Content-Type: content type
- Content-Length: content length
- Content-Encoding: data encoding
Content-Type indicates the data type; for example, the type of a GIF image is image/gif. The type values were eventually standardized as Multipurpose Internet Mail Extensions (MIME) types.
Content-Length indicates the length of the data. But as we said before, the HTTP/0.9 server does not need to return the file length, just close the TCP connection when the transfer is finished. Why is it necessary to define the length information again?
There are two issues here. The first is to support uploading content in the request, and the second is a connection optimization issue.
HTTP/0.9 has only the GET request. Obviously downloading alone was not enough, so methods such as HEAD and POST were introduced one after another, with POST used to submit data to the server. Once data is submitted, separators are no longer enough, because the submitted data itself may contain the separator characters. So the length of the data needs to be specified in advance, and this is done with the Content-Length header.
Another issue is connection optimization. In fact, the history of the HTTP protocol is largely the history of transmission performance optimization.
HTTP/0.9 creates a TCP connection with each request, and the connection is closed when the reading is finished. It would not be a problem if only one file was downloaded at a time. However, later HTML pages support embedded images and other content, and a page may have multiple images. So when a browser opens an HTML page, it needs to make multiple HTTP requests, and each request has to repeatedly establish and close TCP connections. This not only wastes server resources, but also slows down the loading speed of the page.
So, people find ways to reuse the underlying TCP connection. This simply means that the server does not actively close the connection after the content has been sent. But not closing it leads to the problem mentioned earlier, where the client does not know when the response content has been transmitted. So the length of the data needs to be specified in advance. Since the HTTP protocol already has a header mechanism, adding Content-Length is the most natural way to do this.
There is also a compatibility issue here. If the client does not support reusing TCP connections, it will wait forever when the server does not close the connection. So connection reuse should not be enabled by default; the client should decide whether to use it. This leads to the Connection: Keep-Alive header. If the client specifies Keep-Alive in the request, the server will not actively close the TCP connection.
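To see why Content-Length makes connection reuse possible, here is a sketch of a client splitting two back-to-back responses that arrived on the same connection. The function is my own simplification: it assumes all the bytes are already in memory and that every response carries Content-Length.

```python
def split_responses(stream: bytes):
    """Return the bodies of consecutive responses received on one connection."""
    bodies = []
    while stream:
        head, separator, rest = stream.partition(b"\r\n\r\n")
        if not separator:
            raise ValueError("incomplete response head")
        length = 0
        for line in head.split(b"\r\n")[1:]:
            name, _, value = line.partition(b":")
            if name.strip().lower() == b"content-length":
                length = int(value.strip())
        if len(rest) < length:
            raise ValueError("incomplete body")
        # The body is exactly `length` bytes, even if it contains \r\n itself.
        bodies.append(rest[:length])
        stream = rest[length:]
    return bodies
```

Because the body is cut by length rather than by separator, a body such as `a\r\nb` comes through intact and the next response starts right after it.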
In addition to reusing TCP connections, another area of HTTP/0.9 worth optimizing is data compression. In those days, the network was very slow, so if the data could be compressed before transmission, the transmission time could be significantly reduced. The server cannot compress at will, because some clients may not support it. So the Accept-Encoding header was introduced first, with possible values such as compress or gzip, through which the client lists the algorithms it supports. The server receives the request and then compresses the content. Since the browser may support multiple compression algorithms, the server needs to choose one that it also supports to compress the data, and it needs to state which algorithm it used when returning the content. This is where the Content-Encoding header comes in.
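A minimal sketch of this negotiation on the server side might look like the following. I assume here that the server only knows gzip, and I ignore quality values such as `q=0`, which a real implementation would have to honor.

```python
import gzip


def encode_body(body: bytes, accept_encoding: str):
    """Compress the body only if the client listed gzip in Accept-Encoding."""
    offered = [token.split(";")[0].strip().lower()
               for token in accept_encoding.split(",")]
    if "gzip" in offered:
        # Tell the client which algorithm was actually used.
        return gzip.compress(body), {"Content-Encoding": "gzip"}
    return body, {}
```

If the client sends no Accept-Encoding at all, the body goes out untouched and no Content-Encoding header is added.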
Whether it is Connection or Accept-Encoding, the HTTP protocol negotiates the use of extensions by adding a new header in order to be as compatible as possible with different clients. This negotiation is led by the client and the server needs to cooperate with the client’s request.
Still, because the network was slow and costly, the HTTP protocol needed to further optimize transfer efficiency. A typical scenario is that the client has already downloaded a file. When the client requests it again, should the server return it or not? If it does not, the client may miss the latest content; if it does, and the file on the server has not changed, the client spends a long time loading a file it has already downloaded. How can this be optimized?
The following Entity Headers were introduced:
- Last-Modified
- Expires
If the file is not changed frequently, the server can send the last-modified time to the browser via Last-Modified. If the browser supports it, it can take this time with it the next time the resource is requested, by adding an If-Modified-Since header carrying that time to the request.
The server receives it, compares it with the current modification time of the file, and directly returns 304 Not Modified, without the file content, if there has been no modification.
This is called a conditional request and can significantly reduce unnecessary network transfers.
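The server-side decision can be sketched like this, using the If-Modified-Since request header and the date helpers from Python's standard library. The function name and the dictionary-based headers are assumptions for the example.

```python
from datetime import datetime, timezone
from email.utils import formatdate, parsedate_to_datetime


def respond(file_mtime: float, request_headers: dict):
    """Return (status, headers) for a file last modified at file_mtime."""
    last_modified = formatdate(file_mtime, usegmt=True)
    since = request_headers.get("If-Modified-Since")
    if since is not None:
        # HTTP dates have one-second resolution, so drop the fraction.
        modified = datetime.fromtimestamp(int(file_mtime), tz=timezone.utc)
        if modified <= parsedate_to_datetime(since):
            return 304, {"Last-Modified": last_modified}
    return 200, {"Last-Modified": last_modified}
```

The first request gets 200 together with a Last-Modified value; sending that value back as If-Modified-Since yields 304 until the file changes.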
Even so, the client still initiates one HTTP request to get a 304 response, which also incurs network transfer and server-side overhead. For further optimization, HTTP has introduced the Expires header, which means a future expiration time. Before this time the browser can safely use a locally cached copy without downloading it from the server. This eliminates the need to even initiate conditional requests.
However, the Expires feature has a side effect: once a file has been sent out, clients that cached it will not see any modification until it expires.
Between roughly 1991 and 1995, browser vendors implemented the above features one after another. However, different browsers and server software supported different features, causing various compatibility problems. So in 1996, the IETF published RFC 1945. RFC 1945 is only an informational summary of the common practice of the time, not a formal standard. Nevertheless, it is referred to as HTTP/1.0.
Less than a year later, in 1997, the IETF published RFC 2068, the famous HTTP/1.1 protocol specification.
HTTP/1.1 is a compendium and extension of HTTP/1.0. The core changes are.
- TCP connection reuse is enabled by default, so clients no longer need to send Connection: Keep-Alive
- Added the so-called pipeline feature to further optimize transmission efficiency
- Support for chunked transfer encoding
- Extended cache control
- Content negotiation, including language, encoding, type, etc.
- Multiple HTTP sites on the same IP
The so-called pipeline feature was a further optimization of the HTTP protocol’s transport efficiency, but ultimately failed.
The HTTP protocol is a request-response protocol. The client sends a request and then waits for the server to return the content. Although optimizations such as TCP connection reuse, content compression, and conditional requests have been available since the HTTP/1.0 era, the client still has to wait for the server’s response before initiating a new request. In other words, the client cannot issue multiple requests in parallel on a single connection. For this reason, HTTP/1.1 pipelining allows a client to send multiple HTTP requests back to back without waiting, and then wait for the server to return the results. The server must return the responses in the order of the requests.
Although the server can process multiple requests concurrently, the responses must still come back in order, so this concurrency provides limited optimization, and the pipeline feature does not reduce actual network transfers. Little software implemented the pipeline feature, so this optimization ended in failure.
The chunked encoding is a very successful optimization that addresses the case of dynamically generated response content on the server side.
HTTP/1.0 can only specify the content length using Content-Length, and sends the header before the body, which requires that the length of the content be determined before it is transmitted. For static files, this is certainly not a problem. But if you want to load an HTML that is dynamically rendered by PHP, there is a problem. Because HTML is generated dynamically by the program, there is no way to determine the length of the content in advance. If we use the original approach, we have to generate the content and save it to a temporary file before sending it to the client. Obviously this performance is too poor.
To solve this problem, HTTP/1.1 introduced the chunked encoding. This simply means going back to the previous length stream and sending the data to the client segment by segment, with the length information preceding each segment.
The Transfer-Encoding header is set to chunked, and the data that follows is again transferred in lines: one line for the length, written in hexadecimal, then one line of data. At the end, a length of zero is sent, followed by a blank line. This way, the server does not need to determine the length of the response in advance, and PHP can send the content while rendering it. This feature was used to implement message pushing in the days before WebSocket was widespread. You can search for Comet or HTTP long polling for more information.
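Here is a simplified sketch of chunked encoding and decoding. The chunk length is written in hexadecimal. The function names are mine, and the decoder assumes the whole body is already in memory and ignores trailers.

```python
def encode_chunked(pieces) -> bytes:
    """Encode an iterable of byte strings as a chunked body."""
    out = bytearray()
    for piece in pieces:
        if piece:  # an empty chunk would be mistaken for the end marker
            out += b"%x\r\n" % len(piece) + piece + b"\r\n"
    out += b"0\r\n\r\n"  # zero length, then a blank line
    return bytes(out)


def decode_chunked(data: bytes) -> bytes:
    """Decode a complete chunked body back into the original bytes."""
    body = bytearray()
    position = 0
    while True:
        line_end = data.index(b"\r\n", position)
        size = int(data[position:line_end].split(b";")[0], 16)
        position = line_end + 2
        if size == 0:
            return bytes(body)
        body += data[position:position + size]
        position += size + 2  # skip the chunk data and its trailing \r\n
```

The encoder accepts any iterable, including a generator, so each piece can be sent as soon as it is rendered without knowing the total length.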
HTTP/1.1 defines caching in a more granular way, introducing Cache-Control extension information. This is a complex section that affects the behavior of CDN nodes in addition to the browser’s caching behavior. Some CDN vendors also extend the semantics of the standard cache directive. For the sake of space, I won’t expand on this here.
However, HTTP/1.1 also extends conditional requests, which is worth describing.
The operating system automatically records the modification time of a file and it is very easy to read that time, but Last-Modified does not cover all cases. Sometimes a program regenerates some files at regular intervals, so their modification time changes periodically even though the content may not. Using Last-Modified alone may therefore still cause unnecessary network transfers. So the HTTP protocol introduces a new header, ETag.
An ETag is a value calculated from the content of the file, so a new ETag is generated only when the content is modified. Each time the client requests the file, it brings back the previous ETag by adding an If-None-Match header to the request.
The server compares the ETag after receiving it, and returns the new file content only when it has changed.
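As an illustration, one way to derive such a value is to hash the content. The protocol does not prescribe how the value is computed, so the SHA-256 prefix below is purely my own choice, and the comparison only handles a single value in If-None-Match.

```python
import hashlib


def make_etag(content: bytes) -> str:
    """Derive a quoted tag from the content (hash choice is an assumption)."""
    return '"%s"' % hashlib.sha256(content).hexdigest()[:16]


def respond_with_etag(content: bytes, request_headers: dict):
    """Return (status, headers, body), skipping the body when the tag matches."""
    etag = make_etag(content)
    if request_headers.get("If-None-Match") == etag:
        return 304, {"ETag": etag}, b""
    return 200, {"ETag": etag}, content
```

A file that is regenerated with identical content keeps the same tag, so the client still gets 304 even though the modification time has changed.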
At that time, the network was very unstable and disconnections were common. Imagine downloading a file to 99% and then being disconnected. To reduce unnecessary data transfer, people soon added “resumable download” to the HTTP protocol. Resuming is how it looks from the client’s point of view. From the protocol’s point of view, the function to be added is transferring data in a specified range. For example, if the original file is 100 bytes, the client can ask to download only the last 10 bytes by sending the header Range: bytes=90-99. Byte positions are counted from zero and both ends are inclusive.
If the server supports it, it returns a 206 Partial Content response with the header Content-Range: bytes 90-99/100. Here 90-99 indicates the range being returned, and the 100 that follows is the length of the entire file.
Besides resuming downloads, this feature can be used for parallel download acceleration. The client can start multiple threads and establish multiple TCP connections, have each thread download one part, and finally merge the parts together. It’s that simple.
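The range arithmetic is easy to get wrong, because HTTP byte positions start at zero and both ends are inclusive: the last 10 bytes of a 100-byte file are bytes 90 to 99. Below is a sketch that handles a single range only; the function names are hypothetical.

```python
def parse_range(header: str, file_size: int):
    """Turn a Range header value like 'bytes=90-99' into (start, end)."""
    unit, _, spec = header.partition("=")
    if unit.strip() != "bytes" or "," in spec:
        raise ValueError("only a single bytes range is handled here")
    first, _, last = spec.strip().partition("-")
    if first == "":
        # Suffix form "-N" means the last N bytes of the file.
        start = max(file_size - int(last), 0)
        end = file_size - 1
    else:
        start = int(first)
        end = min(int(last), file_size - 1) if last else file_size - 1
    if start > end:
        raise ValueError("range not satisfiable")
    return start, end


def content_range(start: int, end: int, file_size: int) -> str:
    """Build the Content-Range value for the partial response."""
    return "bytes %d-%d/%d" % (start, end, file_size)
```

A resuming client that already has the first 90 bytes would send the open-ended form `bytes=90-`, which this sketch resolves to the same range.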
In addition, HTTP/1.1 requires the client to send a Host header when requesting. This holds the domain name of the website corresponding to the current request. When the server receives the request, it determines what to return based on the domain name in the Host and the path in the request line. This makes it possible to build websites with different domains on the same IP, which is also known as web hosting. This greatly reduces the cost of building websites and plays a crucial role in the development of the Web ecosystem.
In addition to extending the original functionality of HTTP/1.0, HTTP/1.1 also introduced a connection upgrade feature. This feature did not end up being used much, but one heavyweight protocol, WebSocket, relies on it, so it has to be mentioned.
Connection upgrade means switching the TCP connection currently used for the HTTP session to another protocol. Take WebSocket as an example.
The Connection header is set to Upgrade to indicate that the client wants to switch protocols, and Upgrade: websocket means that it wants to switch to the WebSocket protocol. Before the switch, this is still a normal HTTP request, and the server can perform the usual HTTP processing on it, such as authentication. If the server accepts the request, it returns a 101 Switching Protocols response.
From this point on, neither party can send HTTP data over that TCP connection, because the protocol has switched to WebSocket.
From 1999 until the release of HTTP/2 in 2015, the HTTP protocol did not change significantly for more than 15 years. During the same period the Internet boomed, moving from Web 1.0 to Web 2.0, from the PC Internet to the mobile Internet, and from plaintext HTTP to encrypted HTTPS, with the HTTP protocol playing a central role throughout. This also shows how extensible the HTTP protocol is.
But HTTP/1.1 is after all a protocol designed in the nineties, and with the rise of mobile Internet after 2010, the industry wants to optimize the problems of HTTP further. So what other issues can be optimized? There are several main areas.
- the protocol uses text format, transmission and parsing efficiency are relatively low
- the Header part of the information can not be compressed, but the reality is that the Header volume is not small (such as cookies)
- the inability to concurrently request resources on a single TCP connection (pipeline failed)
- the server cannot actively send content to the client
Text format is actually one of the features of HTTP. When debugging, we can connect to the server directly with telnet and read the server’s response with the naked eye. But a human-friendly design is not necessarily machine-friendly. The HTTP protocol uses \r\n as a separator and does not limit the number of headers, which inevitably leads to dynamic memory allocation when parsing. Numbers, dates, and other information also have to be converted into the corresponding binary form, all of which adds parsing cost.
HTTP/1.x supports compressing data content and uses headers to describe the compression algorithm. So it is not possible to compress headers with the same algorithm. You have to find another way.
HTTP/1.1’s pipeline has failed to fully reuse TCP connections, and HTTP was designed from the ground up to be a request-and-response style, with no way for the server to actively push content to the client.
To solve these problems, Google has introduced the SPDY protocol. This protocol has two features.
- compatible with HTTP semantics
- uses binary format to transfer data
SPDY introduces frames as the smallest unit of transmission.
In the layout later standardized by HTTP/2, the first three bytes of each frame indicate the length of the data, followed by one byte for the type and one byte of flags. Then comes the four-byte stream ID, and finally the actual data. This shows that the HTTP protocol has shifted from separator streams to length streams.
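The layout just described (three bytes of length, one byte of type, one byte of flags, four bytes of stream ID) matches the 9-byte frame header that HTTP/2 standardized. Here is a sketch of packing and unpacking it; the function names are my own, and the type value 0 in the usage note is the DATA frame type in HTTP/2.

```python
import struct


def pack_frame(frame_type: int, flags: int, stream_id: int, payload: bytes) -> bytes:
    """Build a frame: 3-byte length, type, flags, 4-byte stream ID, payload."""
    if len(payload) >= 1 << 24:
        raise ValueError("payload does not fit in a 3-byte length")
    header = len(payload).to_bytes(3, "big")
    # The top bit of the stream ID field is reserved, so mask it off.
    header += struct.pack("!BBI", frame_type, flags, stream_id & 0x7FFFFFFF)
    return header + payload


def unpack_frame(data: bytes):
    """Split one frame into (type, flags, stream_id, payload)."""
    length = int.from_bytes(data[0:3], "big")
    frame_type, flags, stream_id = struct.unpack("!BBI", data[3:9])
    return frame_type, flags, stream_id & 0x7FFFFFFF, data[9:9 + length]
```

For example, `pack_frame(0, 1, 5, b"hello")` produces 14 bytes: a 9-byte header followed by the 5-byte payload.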
On the same TCP connection, frames can be interleaved and are no longer bound to the request-response pattern. This means that the server can also send unsolicited messages to the client. The header and data parts of the same request can be sent separately, instead of the header having to be immediately followed by the body. Because frames are interleaved, the data belonging to the same session needs to be correlated, so SPDY adds a stream ID to each frame. In other words, SPDY virtualizes multiple streams on a TCP connection, each of which behaves like a TCP connection of its own. Different HTTP requests and responses can be transmitted concurrently on their own streams without affecting each other. This solves problems 1, 3 and 4 above at once.
The second problem is trickier. HTTP/1.x headers are key-value pairs, all of them strings, and these pairs rarely change. For example, whenever you visit my blog, every request has to send Host: www.sobyte.net, no matter how many requests there are. For constants like this, we can keep a mapping table at each end and assign a number to each key and value. Subsequent requests then only need to carry the numbers, which achieves compression. Looking at Host alone, this may not seem like much of an improvement. But think about your own cookies, which contain login session information: the waste of sending them again with every request is considerable. So the savings from compressing header information are considerable too.
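The mapping table idea can be shown with a toy model. This is not the real algorithm (HTTP/2 standardizes header compression as HPACK in RFC 7541, which is considerably more involved); it only illustrates how both ends can stay in sync and replace repeated pairs with numbers. The class name is made up.

```python
class HeaderTable:
    """Toy header table; the sender and the receiver each keep one instance."""

    def __init__(self):
        self._index_of = {}  # (name, value) -> number, used when encoding
        self._entries = []   # number -> (name, value), used when decoding

    def encode(self, headers):
        """Replace pairs sent before with their number, learn the new ones."""
        encoded = []
        for pair in headers:
            if pair in self._index_of:
                encoded.append(self._index_of[pair])
            else:
                self._index_of[pair] = len(self._entries)
                self._entries.append(pair)
                encoded.append(pair)
        return encoded

    def decode(self, encoded):
        """Turn numbers back into pairs, learning new pairs in the same order."""
        headers = []
        for item in encoded:
            if isinstance(item, int):
                headers.append(self._entries[item])
            else:
                self._entries.append(item)
                headers.append(item)
        return headers
```

The first request carries the full strings; from the second request on, the same Host and Cookie pairs travel as two small integers.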
Because Google controls Chrome, the browser with the largest market share, as well as content services like Google Search and YouTube, it was easy for it to deploy SPDY. The protocol was taken to the IETF in 2012 as the basis for the next-generation HTTP protocol, which was standardized and published in 2015 as RFC 7540, that is, HTTP/2.
As society has developed, privacy protection has become an important concern. To protect user information, the industry has been promoting HTTP + TLS, also known as HTTPS, which uses port 443. As mentioned earlier, HTTP/2 uses binary encoding and is not compatible with HTTP/1.x on the wire. However, clients will not upgrade overnight. How can we support both HTTP versions on one port? This is where the ALPN extension to the TLS protocol comes in. Simply put, when a client initiates a TLS session, it uses the ALPN extension to list the application layer protocols it supports, such as http/1.1 and h2, and the server picks one that it also supports and returns it to the client. This allows both parties to agree on which protocol to use over that TLS session.
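The selection step itself is simple and can be modeled in a few lines. This is only a model of the decision under the assumption that the server's preference order wins, not real TLS code; in Python's standard library the actual mechanism is exposed through `ssl.SSLContext.set_alpn_protocols` and `SSLSocket.selected_alpn_protocol`.

```python
def select_protocol(client_offers, server_preferences):
    """Pick the first protocol the server prefers that the client also offers."""
    for protocol in server_preferences:
        if protocol in client_offers:
            return protocol
    return None  # no overlap, so there is nothing to agree on
```

An old client that only offers `http/1.1` still gets served on the same port, while a newer one that also offers `h2` is moved to HTTP/2.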
Theoretically HTTP/2 could be negotiated through the HTTP/1.1 upgrade mechanism, which would also solve the problem of sharing TLS sessions between the two versions. However, such an upgrade would introduce additional latency, so it is not supported by major browsers.
After HTTP/2 was released, the entire industry actively migrated to the new protocol. But in practice, HTTP/2 did not prove to be as good as expected. Why? Because for a given domain name, browsers open only one connection by default, and all requests are sent and received over that single TCP connection. Although different requests use different streams, there is only one underlying connection. If the network is jittery, then no matter which request’s data needs to be retransmitted, the data of all other requests must wait. This is known as the Head of Line blocking problem. Here HTTP/2 brings no improvement and can even be worse than HTTP/1.x. In the HTTP/1.x era, browsers knew that HTTP could not run requests concurrently on one connection, so they created multiple TCP connections for the same domain. Different requests could be spread across different connections, so network jitter hurt a little less than with a single connection.
Another problem with HTTP/2 is that its functionality is too complex. For example, it supported actively pushing resources (such as CSS files) from the server to the browser so that the client would not have to wait for a network round trip while loading. However, this feature was so complex and its benefit so limited that even Chrome itself eventually dropped support for it. This part of the functionality has been replaced by the HTTP 103 Early Hints status code, described in RFC 8297.
When one plan fails, try another. Google’s engineers kept fighting the “Head of Line blocking” problem. This time they went after the root of the problem, the TCP protocol. Because TCP is a reliable transport protocol, data must be delivered in order and acknowledged as it is sent. As long as the underlying connection is TCP, the Head of Line blocking problem cannot be solved. For this reason, they designed the QUIC protocol on top of UDP.
Simply put, QUIC is a message-oriented transport protocol (TCP is a stream-oriented one). QUIC also has the concept of streams, and each session can have multiple streams. Data from different streams is sent and received over UDP without interfering with each other. As with TCP, data needs to be acknowledged by the other party after it is sent. QUIC was then combined with HTTP/2-style frames to form the HTTP/3 protocol, which is RFC 9114.
Are there any problems with QUIC? There are, but they mostly do not stem from its design.
The first problem is that carriers may restrict UDP traffic, and many firewalls may block QUIC traffic. This is a result of the fact that UDP communications were not widely used before. As HTTP/3 technology becomes more widespread, these issues will improve over time.
The second problem is the delay in starting HTTP/3. HTTP/3 uses UDP communication, which is not compatible with HTTP/1.x and HTTP/2, so the browser cannot determine whether the server supports HTTP/3.
The prevailing practice is for websites to support both HTTP/2 and HTTP/3. The browser first accesses the server over a TCP connection, and the server returns a special Alt-Svc header in the first response, for example Alt-Svc: h3=":4430"; ma=3600.
This means that the HTTP/3 service is provided on UDP port 4430 and that this information is valid for 3600 seconds. The browser can then use QUIC to connect to port 4430.
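For illustration, here is a sketch of how a client might read an Alt-Svc value such as `h3=":4430"; ma=3600`. The function name is hypothetical, the special value `clear` is not handled, and the 24-hour default for a missing `ma` parameter follows my reading of RFC 7838.

```python
def parse_alt_svc(value: str):
    """Parse an Alt-Svc value into (protocol, host, port, max_age) tuples."""
    services = []
    for entry in value.split(","):
        parts = [part.strip() for part in entry.split(";")]
        protocol, _, authority = parts[0].partition("=")
        # The authority is quoted; an empty host means "same host as now".
        host, _, port = authority.strip('"').rpartition(":")
        max_age = 86400  # assumed default of 24 hours when "ma" is absent
        for parameter in parts[1:]:
            name, _, raw = parameter.partition("=")
            if name == "ma":
                max_age = int(raw)
        services.append((protocol, host, int(port), max_age))
    return services
```

For the example value it yields a single entry: protocol `h3`, an empty host, port 4430 and a lifetime of 3600 seconds.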
The discerning reader will see a problem here: before setting up an HTTP/3 session, you have to start with HTTP/2 first. This makes little sense 🔬 and costs additional time. For this reason, people started to think of another way, which is DNS SVCB/HTTPS records.
DNS SVCB/HTTPS simply means that a special DNS record is used to expose the preceding Alt-Svc information. Before accessing the website, the browser first queries the DNS for HTTP/3 support and the corresponding UDP port, and then directly initiates an HTTP/3 session. This does not rely on TCP connections at all.
By the way, HTTP/3 can run on any UDP port, unlike HTTPS, which runs on port 443 by default. If the carrier blocks port 443, there is no way to serve the public over HTTPS. Once HTTP/3 becomes popular, everyone will be able to host websites on their own broadband connection.
Well, I think I have basically covered the development of the HTTP protocol. It is a pity that, for reasons of space, I could not discuss the technical details of HTTP/2 and HTTP/3, but I will add them later when I have time. I hope this article helps you understand the HTTP protocol better.