In reviewing the code my colleagues have written to send external HTTP requests, I have rarely seen the request URL constructed in a standard (that is, correct and safe) way. What counts as standard practice? I probably can’t define it exactly, but I do have a few simple criteria of my own.
- Protocol: Does the request still work when the address has no http:// prefix?
- Path: Does it join the path correctly whether or not the base ends with /?
- Query: Are the query parameters encoded correctly?
Preliminary Knowledge
What is a URL? The Go net/url documentation gives its general form as [scheme:][//[userinfo@]host][/]path[?query][#fragment].
Handling protocols correctly
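Go’s url package does not recognize a host in an address that has no protocol (scheme): a string such as example.com/v1/posts parses with an empty Scheme and Host, and the whole string ends up in Path. Since net/http parses request URLs with the same package, a request to such an address fails with an error.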
I don’t know whether this counts as a bug in the url package, so I won’t speculate here, but I always make a habit of normalizing the address up front.
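A minimal sketch of that kind of normalization. The helper name withDefaultScheme is hypothetical, and using http:// as the default scheme is an assumption; pick https:// if that suits your services better.

```go
package main

import (
	"fmt"
	"strings"
)

// withDefaultScheme is a hypothetical helper: if addr contains no "://",
// it prepends "http://" (the default scheme here is an assumption).
// Anything that already carries a scheme, in any case, is returned as-is.
func withDefaultScheme(addr string) string {
	if !strings.Contains(addr, "://") {
		return "http://" + addr
	}
	return addr
}

func main() {
	fmt.Println(withDefaultScheme("example.com/v1/posts"))         // http://example.com/v1/posts
	fmt.Println(withDefaultScheme("https://example.com/v1/posts")) // https://example.com/v1/posts
	fmt.Println(withDefaultScheme("HTTP://example.com"))           // HTTP://example.com
}
```

The check is deliberately simple: an address that contains :// anywhere, for example inside a query value, is left untouched.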
Note that I check for :// rather than for http:// or https://. There are several reasons for this.
- Simplicity. There is no need to check for both http:// and https://.
- No need to worry about case. The protocol (scheme) part of a URL is case-insensitive, so HTTP://example.com and http://example.com are equivalent. If you insist on checking the prefix, you also have to compare it case-insensitively, which is too cumbersome to write, and even that covers only the http scheme.
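To make the "too cumbersome" point concrete, here is a sketch of an explicit prefix check that respects case-insensitivity. The helper name is hypothetical, and it handles only http://; https:// would need a second check.

```go
package main

import (
	"fmt"
	"strings"
)

// hasHTTPPrefix is a hypothetical helper: it reports whether s starts with
// "http://", ignoring case, since URL schemes are case-insensitive.
// It covers the http scheme only.
func hasHTTPPrefix(s string) bool {
	const prefix = "http://"
	return len(s) >= len(prefix) && strings.EqualFold(s[:len(prefix)], prefix)
}

func main() {
	fmt.Println(hasHTTPPrefix("HTTP://example.com"))  // true
	fmt.Println(hasHTTPPrefix("http://example.com"))  // true
	fmt.Println(hasHTTPPrefix("https://example.com")) // false
	fmt.Println(hasHTTPPrefix("example.com"))         // false
}
```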
Proper handling of paths
In http://example.com/path/to/file.txt, the /path/to/file.txt part is called the path. Understanding and constructing paths correctly is a common source of errors.
Naming issues
The first and foremost problem is naming. Many people assume that API requests are always sent to paths under the root /. Take http://example.com/v1/posts: the /v1/posts part is fixed in the code, and the preceding http://example.com comes from the configuration file, so they name this setting host (or even host_port). At first glance I assumed it could hold only the example.com part, since it is called host, hostname or host_port.
Now suppose that someday I want to test a proxied API and the prefix changes, so the endpoint becomes http://example.com/proxy/v1/posts. The configuration file must then say http://example.com/proxy for this setting. Is that still a host?
Why am I fussing over this, you ask? I didn’t want to; I never thought something like this could be a point of dispute for us. I assumed everyone followed the spec.
Where does this scenario come up, you ask? All over the place, for example in Grafana’s requests to data sources in Server mode (as opposed to Direct mode).
So what’s a good name for it? I’ve seen endpoint, prefix, url, address, and so on.
To have or not to have the trailing /
Because such code takes the configured value and then appends the API path to it (yes, with +), the results are:
- If the configuration is http://example.com, you get http://example.com/v1/posts;
- If the configuration is http://example.com/, you get http://example.com//v1/posts.
See? The code simply can’t cope with whether the base ends with / or not, and at one point they even told me verbally not to include the trailing /.
The root cause is that the URL is concatenated by hand.
Not every server automatically collapses // into /, so errors are inevitable.
So what should you do? Use the path package. (Its name clashes with path, a common variable name, which is a little annoying.)
There is a sibling package, path/filepath. The main difference between the two is that path works with forward-slash-separated paths, while filepath works with operating system paths: / separates path elements on Linux and \ separates them on Windows. The package documentation states this clearly.
path
Package path implements utility routines for manipulating slash-separated paths.
The path package should only be used for paths separated by forward slashes, such as the paths in URLs. This package does not deal with Windows paths with drive letters or backslashes; to manipulate operating system paths, use the path/filepath package.
filepath
Package filepath implements utility routines for manipulating filename paths in a way compatible with the target operating system-defined file paths.
The filepath package uses either forward slashes or backslashes, depending on the operating system. To process paths such as URLs that always use forward slashes regardless of the operating system, see the path package.
How do you use the path package? You need just one function: path.Join.
func Join(elem ...string) string
Join joins any number of path elements into a single path, separating them with slashes. Empty elements are ignored. The result is Cleaned. However, if the argument list is empty or all its elements are empty, Join returns an empty string.
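A sketch of joining a configured base with an API path this way. The helper name buildURL is hypothetical; it parses the base with url.Parse, joins the paths with path.Join, and lets url.URL.String() assemble the result, so a trailing / on the base no longer matters.

```go
package main

import (
	"fmt"
	"net/url"
	"path"
)

// buildURL is a hypothetical helper: it parses the configured base
// (an "endpoint" setting) and joins apiPath onto its path with path.Join.
func buildURL(base, apiPath string) (string, error) {
	u, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	u.Path = path.Join(u.Path, apiPath)
	return u.String(), nil
}

func main() {
	for _, base := range []string{
		"http://example.com",
		"http://example.com/",
		"http://example.com/proxy",
		"http://example.com/proxy/",
	} {
		s, err := buildURL(base, "/v1/posts")
		if err != nil {
			panic(err)
		}
		fmt.Println(s)
	}
	// http://example.com/v1/posts
	// http://example.com/v1/posts
	// http://example.com/proxy/v1/posts
	// http://example.com/proxy/v1/posts
}
```

Keep in mind that path.Join cleans its result, so a trailing slash on the last element is dropped; if an API requires one, add it back explicitly.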
If you are using Go 1.19 or later, the net/url package already provides this capability; see the change "net/url: add JoinPath", which added the url.JoinPath function and the URL.JoinPath method.
Path encoding problem
Used together with url.URL, path.Join also takes care of encoding: assign the joined, unencoded path to the URL’s Path field, and String() escapes it.
Many servers, especially modern backends, should handle unencoded characters correctly these days, and APIs rarely use anything other than letters and digits in their paths, so the encoding problem is not particularly serious.
Do you think I got /v1/posts/new-posts wrong because I didn’t encode it myself? Sorry, no mistake. path.Join joins the raw, unencoded path segments, and the URL’s String() method produces the final, encoded URL for transmission.
Handling query strings correctly
A query is simply the part of the URL that comes after the question mark. For example: http://example.com/v1/posts?page_no=1&a=b
, page_no=1&a=b
is called a query (query or query_string).
I’m sure everyone has seen code that builds the query by hand, or has written such code themselves (I’m no exception).
All that hard-coding is hard to read. And I doubt you’ve seen many URLs with raw spaces in them, have you? URLs with raw Chinese characters aren’t really standard either. It’s frustrating.
Below is what I consider a more standard and safer way to write it.
Finally
I don’t know whether you have made similar mistakes, but I certainly have, many times, which is why I wrote up this summary today.
Nor do I know whether other languages have similar problems; at least C, C++, Lua, PHP, JavaScript and the other languages I have written before all do.