Recently, while reading the Go 1.18 release notes, I found that the Title function in the strings and bytes standard libraries has been deprecated. Why is that?
Introduction
Let's take the strings standard library as an example. strings.Title does the following: it maps every Unicode letter that begins a word to its Unicode title case.
The example is as follows.
strings.Title converts the first letter of each word to upper case (strictly speaking, to its Unicode title case).
Problems
Everything may seem fine, but at this stage there are actually two obvious flaws:
- Does not handle Unicode punctuation correctly.
- Does not take into account the capitalization rules of specific human languages.
Let’s get into the details.
Unicode punctuation
For the first problem, the example is as follows.
Converting variable a yields “Go.Go․go” instead of the expected “Go.Go․Go”: the ASCII period is treated as a word separator, but the Unicode punctuation mark “․” is not.
Language specific rules
For the second problem, the code is as follows.
In Dutch, the word “ijsland” should be capitalized as “IJsland”, but strings.Title converts it to “Ijsland”.
Solution
This problem was reported as early as 2013 in the issue “strings: Title function incorrectly handles word breaks”, and Rob Pike, one of the fathers of the Go language, marked it as Unplanned.
Because of the Go 1 compatibility promise, this is “impossible” to fix: any fix would change the function's output, which is a breaking change.
However, another approach is possible: the “deprecation” this article is about.
The function is marked “Deprecated” in its doc comment.
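In Go, a deprecation is expressed as a paragraph in the doc comment that starts with “Deprecated: ”. A minimal sketch of that convention, using a hypothetical wrapper named legacyTitle; the name and message are illustrative, not the standard library's exact text.

```go
package main

import (
	"fmt"
	"strings"
)

// legacyTitle is a hypothetical wrapper used only to illustrate the convention.
//
// Deprecated: legacyTitle does not handle Unicode punctuation or
// language-specific capitalization rules. Use golang.org/x/text/cases instead.
func legacyTitle(s string) string {
	return strings.Title(s)
}

func main() {
	// A deprecated function still compiles and runs; the comment only
	// informs documentation tools, editors and linters.
	fmt.Println(legacyTitle("hello gopher")) // Hello Gopher
}
```

Because the marker is only a comment, existing callers keep compiling with unchanged behavior, which is why deprecation fits within the compatibility promise where a behavioral fix would not.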
The corresponding Go documentation then collapses the function, explicitly shows that it is deprecated, and recommends using the golang.org/x/text/cases library for this functionality instead.
An example using the new x/text/cases package is as follows.
The example outputs conversions for several languages. The key point is that every caser is constructed from a language tag, as in cases.Lower(language.Und). For title casing, the library is used by calling:
- cases.Title(<language>).Bytes(<bytes>)
- cases.Title(<language>).String(<string>)
The language is specified explicitly in code, so that the symbols and capitalization rules of different human languages are respected instead of applying one rule to everything.
But this new approach evidently brings its own complexity. As the old saying goes, “less is more”, so it is worth weighing the extra cost when using it.
Summary
Although Title is only a small function, it raises a lot of problems. Essentially, its design did not fully account for the complexity of human languages.
In practice, the strings.Title and bytes.Title functions are often misunderstood as general-purpose “capitalize the first letter” helpers, which is not what they were designed for.
Although this misunderstanding usually still produces acceptable results despite the defects, it remains a real problem for some special scenarios and for broader language support.