While reading the Go 1.18 release notes recently, I noticed that the Title function in the strings and bytes standard library packages has been deprecated. Why is this?

Introduction

Take the strings package as an example. The strings.Title function maps every Unicode letter that begins a word to its Unicode title case.

The example is as follows.

package main

import (
 "fmt"
 "strings"
)

func main() {
 fmt.Println(strings.Title("her royal highness"))
 fmt.Println(strings.Title("eddy cjy"))
 fmt.Println(strings.Title("хлеб"))
}

Output:

Her Royal Highness
Eddy Cjy
Хлеб

The first letter of each word is converted to its title case.

Problems

Everything seems fine, but the function has two obvious flaws:

  • It does not handle Unicode punctuation correctly.
  • It does not take into account the capitalization rules of specific human languages.

Let’s get into the details.

Unicode punctuation

For the first problem, the example is as follows.

package main

import (
 "fmt"
 "strings"
)

func main() {
 a := strings.Title("go.go\u2024go")
 b := "Go.Go\u2024Go"
 if a != b {
  fmt.Printf("%s != %s\n", a, b)
 }
}

Output:

Go.Go․go != Go.Go․Go

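strings.Title returns “Go.Go․go” for variable a, but the expected result is “Go.Go․Go”: the character U+2024 (ONE DOT LEADER) is Unicode punctuation and should act as a word boundary, just like the ASCII period does.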
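To make the expected behavior concrete, here is a minimal sketch that treats every Unicode space or punctuation rune as a word boundary. The helpers isWordBreak and titlePunctAware are hypothetical names invented for this illustration and are not part of the standard library. As far as I understand the standard library's implementation, strings.Title treats a non-ASCII rune as a separator only when it is a space, which would explain why U+2024 is skipped.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// isWordBreak is a hypothetical helper (not a standard library function).
// It reports whether r is a Unicode space or a Unicode punctuation rune.
func isWordBreak(r rune) bool {
	return unicode.IsSpace(r) || unicode.IsPunct(r)
}

// titlePunctAware is a hypothetical illustration: it maps the rune that
// follows each word break (or starts the string) to its title case.
func titlePunctAware(s string) string {
	prev := ' ' // treat the start of the string as following a break
	return strings.Map(func(r rune) rune {
		atWordStart := isWordBreak(prev)
		prev = r
		if atWordStart {
			return unicode.ToTitle(r)
		}
		return r
	}, s)
}

func main() {
	fmt.Println(unicode.IsSpace('\u2024')) // false
	fmt.Println(unicode.IsPunct('\u2024')) // true

	got := titlePunctAware("go.go\u2024go")
	fmt.Println(got == "Go.Go\u2024Go") // true
}
```

This sketch is deliberately naive. Because the apostrophe is also punctuation, it would turn “don't” into “Don'T”, which shows that correct word breaking needs more than a simple rune classification.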

Language specific rules

For the second problem, the code is as follows.

func main() {
 fmt.Println(strings.Title("ijsland"))
}

Output:

Ijsland

In Dutch, the word “ijsland” should be capitalized as “IJsland”, but strings.Title converts it to “Ijsland”.
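The following sketch shows why a language-unaware function cannot get this right: title-casing works one rune at a time, so only the “i” is touched. The functions titleFirst and titleFirstDutch are hypothetical names made up for this illustration. The Dutch variant hard-codes a single rule and is in no way a complete treatment of Dutch capitalization.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
	"unicode/utf8"
)

// titleFirst is a hypothetical, language-unaware helper: it maps only the
// first rune of word to its title case.
func titleFirst(word string) string {
	r, size := utf8.DecodeRuneInString(word)
	if r == utf8.RuneError {
		return word // empty or invalid input is returned unchanged
	}
	return string(unicode.ToTitle(r)) + word[size:]
}

// titleFirstDutch is a hypothetical special case for illustration only:
// it capitalizes a leading "ij" as the unit "IJ" and otherwise falls back
// to titleFirst.
func titleFirstDutch(word string) string {
	if strings.HasPrefix(word, "ij") {
		return "IJ" + word[len("ij"):]
	}
	return titleFirst(word)
}

func main() {
	fmt.Println(titleFirst("ijsland"))      // Ijsland
	fmt.Println(titleFirstDutch("ijsland")) // IJsland
}
```

The rule cannot live inside a general-purpose function, because applying it to text in other languages would be wrong. The caller has to say which language the text is in.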

Solution

The problem was reported in 2013 in the issue “strings: Title function incorrectly handles word breaks”, and Rob Pike, one of the creators of the Go language, marked it as unplanned.

Because of the Go 1 compatibility promise, this is “impossible” to fix: any fix would change the function's output, which makes it a breaking change.

However, there is another approach, the one this article is about: deprecation. The change is shown below.

// Title returns a copy of the string s with all Unicode letters that begin words
// mapped to their Unicode title case.
//
// BUG(rsc): The rule Title uses for word boundaries does not handle Unicode punctuation properly.
//
// Deprecated: Use golang.org/x/text/cases instead.
func Title(s string) string {

The function is marked as deprecated with a “Deprecated:” comment.

The Go documentation collapses the entry for the function and explicitly labels it as deprecated, and it recommends using the golang.org/x/text/cases library directly to implement this functionality.
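The same convention is available to any Go code: a doc comment paragraph that begins with “Deprecated:” tells documentation tools and linters that the identifier should no longer be used, while the function itself keeps compiling and running. Here is a minimal sketch, in which Shout and Yell are hypothetical functions invented for this example.

```go
package main

import (
	"fmt"
	"strings"
)

// Shout returns s converted to upper case.
func Shout(s string) string {
	return strings.ToUpper(s)
}

// Yell returns s converted to upper case.
//
// Deprecated: Use Shout instead.
func Yell(s string) string {
	// The old function keeps its behavior, so existing callers do not break.
	return Shout(s)
}

func main() {
	fmt.Println(Yell("go")) // GO
}
```

This is what makes deprecation compatible with the Go 1 promise: nothing is removed and no output changes, callers are only nudged toward the replacement.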

An example using the new x/text/cases package is as follows.

package main

import (
 "fmt"

 "golang.org/x/text/cases"
 "golang.org/x/text/language"
)

func main() {
 src := []string{
  "hello world!",
  "i with dot",
  "'n ijsberg",
  "here comes O'Brian",
 }
 for _, c := range []cases.Caser{
  cases.Lower(language.Und),
  cases.Upper(language.Turkish),
  cases.Title(language.Dutch),
  cases.Title(language.Und, cases.NoLower),
 } {
  fmt.Println()
  for _, s := range src {
   fmt.Println(c.String(s))
  }
 }
}

Output:

hello world!
i with dot
'n ijsberg
here comes o'brian

HELLO WORLD!
İ WİTH DOT
'N İJSBERG
HERE COMES O'BRİAN

Hello World!
I With Dot
'n IJsberg
Here Comes O'brian

Hello World!
I With Dot
'N Ijsberg
Here Comes O'Brian

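The example prints the conversions for several languages. The part to focus on is the list of cases.Caser values, such as cases.Lower(language.Und) and cases.Title(language.Dutch). As a replacement for strings.Title and bytes.Title, the library is used by calling: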

  • cases.Title(<language>).Bytes(<bytes>)

  • cases.Title(<language>).String(<string>)

The language is specified explicitly in code, so that the punctuation and capitalization rules of each human language are respected instead of applying a one-size-fits-all rule.

However, this replacement clearly introduces more complexity of its own. As the old saying goes, “less is more”, so it is worth weighing the added cost before adopting it.

Summary

Although Title is only a small function, it raises a surprising number of problems. In essence, its design reflects a limited understanding of how word boundaries and capitalization work across languages.

In practice, strings.Title and bytes.Title are often mistaken for functions that simply capitalize the first letter, which is not what they were designed to do.

For that common use the results are usually good enough despite the defects, but the function remains a real problem in certain special cases and for language-specific support.