How do you concatenate strings efficiently in Go? Since strings are immutable, naively stitching many strings together allocates a new string for every intermediate result. We can use strings.Builder or bytes.Buffer to solve this concatenation performance problem. Beyond performance, it is important to understand how bytes.Buffer handles the conversion between []byte and string. Below are some mistakes I made in an actual project, for your reference.
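As a quick point of comparison before diving into the bytes.Buffer pitfalls, here is a minimal sketch of the strings.Builder approach mentioned above. The repeat helper name is my own, not from the original project:

```go
package main

import (
	"fmt"
	"strings"
)

// repeat concatenates str n times using strings.Builder,
// which grows a single internal buffer instead of allocating
// a new string on every concatenation.
func repeat(n int, str string) string {
	var sb strings.Builder
	sb.Grow(n * len(str)) // optional: pre-size to avoid regrowth
	for i := 0; i < n; i++ {
		sb.WriteString(str)
	}
	return sb.String()
}

func main() {
	fmt.Println(repeat(5, "1")) // 11111
}
```

Unlike the shared bytes.Buffer used below, a Builder declared locally per call cannot suffer from the reuse problem this article describes.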
Buffer reuse problem
Suppose we parse data and collect the result with bytes.Buffer. The following is a basic example.
package main

import (
	"bytes"
	"fmt"
)

var buf bytes.Buffer

func parseMultipleValue(n int, str string) []byte {
	buf.Reset()
	for i := 0; i < n; i++ {
		buf.WriteString(str)
	}
	return buf.Bytes()
}

func main() {
	s1 := parseMultipleValue(5, "1")
	fmt.Println("s1:", string(s1))
	s2 := parseMultipleValue(3, "2")
	fmt.Println("s1:", string(s1))
	fmt.Println("s2:", string(s2))
}
Running the example produces the following output.
s1: 11111
s1: 22211
s2: 222
Notice that if you read s1 a second time, you will find that the later s2 data has overwritten part of the s1 data. The reason is that s1 is a slice backed by the buffer's internal memory (5 bytes). When parseMultipleValue is executed a second time, buf.Reset() merely moves the write offset back to position 0, and the new content is written at the front of the same underlying memory. The first 3 characters of s1's content are thus replaced by the new s2 string.
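The aliasing can be seen in isolation: the slice returned by Bytes() shares its backing array with the buffer, so a later Reset followed by a write mutates the old slice. A minimal sketch (the demo helper is my own name for illustration):

```go
package main

import (
	"bytes"
	"fmt"
)

// demo shows that the slice returned by Bytes() aliases the
// buffer's internal storage: after Reset and a new write, the
// previously returned slice sees the new bytes.
func demo() string {
	var buf bytes.Buffer
	buf.WriteString("hello")
	b := buf.Bytes() // b points into the buffer's memory

	buf.Reset()           // offset back to 0; backing array is kept
	buf.WriteString("XY") // overwrites the first two bytes

	return string(b) // "XYllo", not "hello"
}

func main() {
	fmt.Println(demo())
}
```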
Two solutions
How can we do this without affecting the contents of s1? The problem can be solved by using bytes.Buffer's built-in String() method, which copies the buffer's contents into a new string.
var buf bytes.Buffer

func parseMultipleValue(n int, str string) string {
	buf.Reset()
	for i := 0; i < n; i++ {
		buf.WriteString(str)
	}
	return buf.String()
}
If you don't want to use String(), you can also copy the bytes into a fresh slice yourself, and use unsafe.Pointer to avoid the extra allocation of the []byte-to-string conversion.
var buf bytes.Buffer

// b2s converts a byte slice to a string without copying.
func b2s(b []byte) string {
	return *(*string)(unsafe.Pointer(&b))
}

func parseMultipleValue(n int, str string) string {
	buf.Reset()
	for i := 0; i < n; i++ {
		buf.WriteString(str)
	}
	s := make([]byte, len(buf.Bytes()))
	copy(s, buf.Bytes())
	return b2s(s)
}
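As an aside, on Go 1.20 and later the same zero-copy conversion can be written with the standard unsafe.String and unsafe.SliceData functions instead of the raw pointer cast. A sketch (keeping the article's b2s helper name):

```go
package main

import (
	"fmt"
	"unsafe"
)

// b2s converts a byte slice to a string without copying,
// using the Go 1.20+ unsafe.String API. It is only safe if
// the caller never mutates b afterwards.
func b2s(b []byte) string {
	if len(b) == 0 {
		return ""
	}
	return unsafe.String(unsafe.SliceData(b), len(b))
}

func main() {
	fmt.Println(b2s([]byte("hello"))) // hello
}
```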
Both of the above solutions fix the problem, and the benchmark below shows no meaningful performance difference between them, so you can choose either one.
BenchmarkA
BenchmarkA-8 34922 33986 ns/op 106496 B/op 1 allocs/op
BenchmarkB
BenchmarkB-8 35760 33714 ns/op 106496 B/op 1 allocs/op
Here is the complete benchmark code.
package main

import (
	"bytes"
	"math/rand"
	"testing"
	"unsafe"
)

const letterBytes = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

func randomString(n int) string {
	b := make([]byte, n)
	for i := range b {
		b[i] = letterBytes[rand.Intn(len(letterBytes))]
	}
	return string(b)
}

var buf bytes.Buffer

func b2s(b []byte) string {
	return *(*string)(unsafe.Pointer(&b))
}

func parseMultipleValue(n int, str string) string {
	buf.Reset()
	for i := 0; i < n; i++ {
		buf.WriteString(str)
	}
	s := make([]byte, len(buf.Bytes()))
	copy(s, buf.Bytes())
	return b2s(s)
}

func parseMultipleValue2(n int, str string) string {
	buf.Reset()
	for i := 0; i < n; i++ {
		buf.WriteString(str)
	}
	return buf.String()
}

func benchmark(b *testing.B, f func(int, string) string) {
	str := randomString(10)
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		f(10000, str)
	}
}

func BenchmarkA(b *testing.B) { benchmark(b, parseMultipleValue) }
func BenchmarkB(b *testing.B) { benchmark(b, parseMultipleValue2) }
Since I was in a hurry to analyze the contents of a very large file (200 MB), I did not write complete tests, and so I did not catch this error at first. It was my own momentary negligence that caused this mistake; after adding complete tests, I was able to optimize the performance step by step.