Studying the performance of a program is important for every engineer; I would even say it is a necessary skill.
Go’s performance analysis tool, pprof, supports the following kinds of profiles:

- heap: a sampling of all live objects currently in memory (almost a must for a language with a GC), used to analyze memory problems.
- profile: as opposed to `heap`, a sampling of CPU usage, used to find the program’s time-consuming bottlenecks.
- goroutine: stack traces of all current goroutines.
- allocs: a sampling of all past memory allocations.
- threadcreate: stack traces that led to the creation of new OS threads.
- block: stack traces that led to blocking on synchronization primitives.
- mutex: stack traces of holders of contended mutexes.
- trace: a trace of the current program’s execution.
With good tooling the debugging experience is also great, and the whole process takes only a few simple commands. However, since space is limited and CPU analysis has been covered many times before, today we will only talk about how to analyze memory, i.e. the heap profile.
The heap profile is generally used to track down memory leaks, or to optimize memory usage.
Memory leaks and memory optimization
Memory leaks are often difficult to find and analyze; you usually discover them by monitoring the Go program itself or by spotting OOM records in Linux’s dmesg log.
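If you just want to check whether the kernel’s OOM killer has been triggered, grepping the kernel log is a quick first pass (a typical invocation on a Linux box):

```shell
# look for OOM-killer entries in the kernel ring buffer
dmesg | grep -i "out of memory"
```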
When you do find an OOM log, you should consider adding the monitoring you have been missing, because the problem will recur (though it is often hard to reproduce on your own machine or in a pre-release environment). If you are not sure what threshold to monitor, look at the existing monitoring data and simply pick a percentage: for example, if your program takes up 10% of the machine’s memory right after initialization, then alert as soon as its usage reaches something like 50% of the machine’s memory, and you will be able to analyze it in time.
However, none of this needs to be much trouble, because you can add pprof support to your Go program with a few simple lines of code; it does not affect the program’s operation, and it exposes an HTTP endpoint that you can analyze from your browser.
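For a program that does not already run an HTTP server, it can be as small as this (the port and address are just examples):

```go
package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers the /debug/pprof/* handlers on the default mux
)

func main() {
	// Serve the pprof endpoints on a separate port; the rest of the
	// program is unaffected.
	go func() {
		log.Println(http.ListenAndServe("localhost:6060", nil))
	}()

	// ... the rest of your program ...
	select {}
}
```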
Then, using the pprof tool that ships with Go, you can start analyzing; for example, for memory leaks:
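Assuming the program exposes pprof at localhost:6060 as in the snippet above, you point the tool at the heap endpoint:

```shell
go tool pprof http://localhost:6060/debug/pprof/heap
```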
This takes you into pprof’s REPL, where a few simple commands are enough to locate the problem. However, for a better analysis experience, there are two things to note:
- If your build flags include `-trimpath` and `-ldflags "-s -w"`, it is better to remove them; otherwise they will make it harder to locate the problem.
- Run this command on the machine where the binary was compiled, so the analysis can drill all the way down to individual lines of code.
The real example below is about analyzing and optimizing memory usage; since I have not run into an OOM yet, I will use a small case I did encounter instead, as the two problems are analyzed in exactly the same way.
How to use pprof
The first step is to look at `top10`:
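Inside the REPL this is a single command; only the column layout is shown here, since the actual rows depend entirely on your program:

```
(pprof) top10
Showing nodes accounting for ..., ...% of ... total
      flat  flat%   sum%        cum   cum%
       ...    ...    ...         ...    ...  <function>
       ...
```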
A quick hint: `flat` is the memory allocated directly by that function itself (and, in the in-use views, not yet freed), while `cum` (cumulative) also includes allocations made by the functions it calls. `top` sorts by `flat` by default, and you can switch the sort order with an argument: `top10 -cum`.
If you don’t see anything unusual here, there is something else to check, because the sampling statistics of the Go heap are divided into four views:
- alloc_objects: allocated objects
- alloc_space: allocated space
- inuse_objects: objects in use
- inuse_space: space in use (the default)

You can switch between them with a command like `sample_index=inuse_objects`:
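For example, to look at object counts instead of bytes, switch the index and re-run `top10` (the index names are exactly the four listed above):

```
(pprof) sample_index=inuse_objects
(pprof) top10
```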
In my case, I know why the first item, `io.ReadAll`, shows up in my program, but the second item, `compress/flate.NewWriter`, looks suspicious, and I don’t know where it is being called from. So, having identified the suspicious item, the second step is to confirm the call chain with `tree`.
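Running it is just another REPL command; the output (omitted here, since it depends on your program) shows each function together with its callers and callees, which is enough to trace where `compress/flate.NewWriter` is coming from:

```
(pprof) tree
```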
Now we can basically confirm that there was a small problem (though not an outright memory leak) in my `GetMetrics` implementation, where it handles serialization and compression for the Prometheus client. Alternatively, you can add a step three here: use the `list` command plus a keyword to drill the analysis down to individual lines of code.
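For example, with the function from my case (`list` takes a regular expression matching the function name):

```
(pprof) list GetMetrics
```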
After locating the problem, the last step is to fix it, and my solution was to use `sync.Pool`. Previously I called `gzip.NewWriter` directly every time the metrics text was fetched from Prometheus, which made `gzip` allocate and initialize memory over and over, so I changed the code to use `sync.Pool` instead.
The original code:
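Roughly, and only as a sketch (the function name and error handling are my own illustration, not the exact original), it looked like this:

```go
package metrics

import (
	"bytes"
	"compress/gzip"
)

// compressMetrics gzips the metrics text fetched from the Prometheus client.
// A brand-new gzip.Writer is allocated and initialized on every single call.
func compressMetrics(metrics []byte) ([]byte, error) {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf) // fresh allocation each time
	if _, err := zw.Write(metrics); err != nil {
		return nil, err
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}
```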
The modified code:
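Sketched under the same assumptions, it reuses the writer through a `sync.Pool`:

```go
package metrics

import (
	"bytes"
	"compress/gzip"
	"io"
	"sync"
)

// gzipPool keeps gzip.Writers alive between calls so they can be reused
// instead of being allocated and initialized from scratch every time.
var gzipPool = sync.Pool{
	New: func() interface{} {
		return gzip.NewWriter(io.Discard)
	},
}

func compressMetrics(metrics []byte) ([]byte, error) {
	zw := gzipPool.Get().(*gzip.Writer)
	defer gzipPool.Put(zw)

	var buf bytes.Buffer
	zw.Reset(&buf) // point the pooled writer at a fresh buffer
	if _, err := zw.Write(metrics); err != nil {
		return nil, err
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}
```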
We can write a benchmark to verify the change.
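Something along these lines works, again written against the hypothetical `compressMetrics` sketched above:

```go
package metrics

import (
	"bytes"
	"testing"
)

func BenchmarkCompressMetrics(b *testing.B) {
	// A repetitive payload standing in for the real Prometheus metrics text.
	payload := bytes.Repeat([]byte("some_metric{label=\"value\"} 1\n"), 100)

	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		if _, err := compressMetrics(payload); err != nil {
			b.Fatal(err)
		}
	}
}
```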
You can see that the number of memory allocations (`allocs`) has dropped from 129 to 108.
Well, that’s it for now.
For most people it is easier to analyze a problem by clicking around a web page, and Go’s pprof now offers a one-stop web UI where you can view the call graph as well as the flame graph directly in the browser. For more details, see the official documentation.
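If you prefer that route, a single command (the ports here are just examples) serves the graphs in your browser instead of the REPL:

```shell
go tool pprof -http=:8080 http://localhost:6060/debug/pprof/heap
```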