Variable memory allocation and recycling
Go programs allocate memory for variables in two places: the global heap and the function call stack. Go has a garbage collector, and the compiler decides whether a variable is allocated on the heap or on the stack, so developers rarely need to think about where a variable lives. However, writing high-quality code requires understanding the implementation behind the language. Stack allocation and heap allocation work in completely different ways, and the cost of allocating and reclaiming a variable differs greatly between the two.
Difference between heap and stack
Heap
Memory that is allocated dynamically while the program runs lives in the heap, which is managed by the memory allocator. The size of this area changes as the program runs: when we request memory and the allocator finds that the heap does not have enough free space, it asks the operating system kernel to grow the heap toward higher addresses; when we release memory and the allocator finds that too much free memory is left, it asks the OS to shrink the heap back toward lower addresses. It follows that memory allocated from the heap must be returned to the heap once it is no longer needed. Otherwise the allocator keeps asking the operating system to grow the heap, the process uses more and more memory, and it may eventually run out. This is called a memory leak.
Traditional C/C++ code has to allocate and release memory manually. In Go, a garbage collector reclaims heap memory, so the programmer only requests memory and does not have to release it. This greatly reduces the mental burden on the programmer, which improves productivity and, more importantly, prevents many bugs.
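As a rough illustration of requesting heap memory and leaving its release to the garbage collector, the sketch below counts heap objects with runtime.ReadMemStats. The names sink and heapObjectsAllocated are made up for this example, and the package-level variable is only there to make sure the buffers are placed on the heap.

```go
package main

import (
	"fmt"
	"runtime"
)

// sink is package-level (hypothetical name), so every buffer stored in it
// stays reachable after the function returns and must live on the heap.
var sink [][]byte

// heapObjectsAllocated reports how many heap objects are allocated while
// n buffers of 1 KiB each are created and kept reachable through sink.
func heapObjectsAllocated(n int) uint64 {
	var before, after runtime.MemStats
	runtime.ReadMemStats(&before)
	sink = make([][]byte, 0, n)
	for i := 0; i < n; i++ {
		sink = append(sink, make([]byte, 1024))
	}
	runtime.ReadMemStats(&after)
	// Mallocs is a cumulative counter, so the difference is the number of
	// heap objects allocated between the two snapshots.
	return after.Mallocs - before.Mallocs
}

func main() {
	fmt.Println("heap objects allocated:", heapObjectsAllocated(1000))

	// There is no free() call: dropping the last reference is enough.
	sink = nil
	runtime.GC() // force a collection here instead of waiting for the next cycle
}
```

The program never releases the buffers explicitly; once sink is set to nil they become unreachable and the collector is free to reclaim them.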
Stack
The function call stack, or simply the stack, plays a very important role while a program runs, both during the execution of a function and during function calls. It is mainly used to:
- store the local variables of a function.
- pass parameters to the called function.
- return the return value of a function.
- hold the return address of a function, which is the address of the instruction that the caller should continue with after the called function returns.
Each function needs a piece of stack memory to store these values during execution; this piece of memory is called the function’s stack frame. When a function call occurs, the caller has not finished executing and the data in its stack frame is still in use, so the called function cannot overwrite the caller’s stack frame. Instead, the called function’s stack frame is “pushed” onto the stack and “popped” off again when the called function finishes. The stack therefore grows as the call depth increases and shrinks as functions return: the deeper the calls are nested, the more stack space is consumed. This growth and shrinkage is handled automatically by code that the compiler inserts. The memory used by local variables on the stack is allocated when the function is called and released when it returns, so programmers never have to release it themselves, whether or not the language has garbage collection. This is quite different from memory allocated on the heap.
The process is the basic unit of resource allocation in the operating system. At startup, the operating system gives each process a fixed-size stack; on Linux, the default stack size of a process can be viewed with ulimit -s. Memory allocated on the stack is reclaimed automatically when the function exits, simply by adjusting the stack pointer register. Heap memory, in contrast, is requested from the operating system while the process runs, and the amount of heap memory available to a process depends on how much memory the operating system currently has available.
So how does the compiler decide whether to allocate variables on the heap or the stack in Go?
Variable memory allocation escape analysis
As mentioned above, it is up to the compiler to decide whether to allocate variables on the heap or the stack in Go, and the way the compiler decides where to allocate memory is called escape analysis.
When a local variable is declared within a function in Go, the compiler allocates it on the stack if it can determine that the variable is not referenced outside the function, that is, it does not escape; otherwise the variable is allocated on the heap. Escape analysis is performed by the compiler at compile time.
Check whether the variable is allocated on the stack or the heap
There are two ways to determine whether a variable allocates memory on the heap or on the stack:
- inspect the generated assembly: code that allocates a variable on the heap calls the newobject function of the runtime package.
- pass options that make the compiler print its optimization decisions at compile time; the output lists the variables that escape.
The variables in the following code examples are checked for escape using both of the methods above.
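A minimal sketch of the kind of function both checks apply to, assuming a hypothetical point type and hypothetical function names. It also measures the effect at run time with testing.AllocsPerRun; the //go:noinline directives are an assumption of this sketch, added so that inlining does not hide the escape.

```go
package main

import (
	"fmt"
	"testing"
)

// point is a hypothetical type used only for this illustration.
type point struct{ x, y int }

// byValue returns a copy of p, so p can stay in byValue's stack frame.
//
//go:noinline
func byValue() point {
	p := point{x: 1, y: 2}
	return p
}

// byPointer returns the address of p. The caller can still use p after
// byPointer has returned, so the compiler has to move p to the heap.
//
//go:noinline
func byPointer() *point {
	p := point{x: 1, y: 2}
	return &p
}

// Package-level sinks keep the results alive so the calls are not optimized away.
var (
	valueSink   point
	pointerSink *point
)

// allocsPerCall reports the average number of heap allocations per call of f.
func allocsPerCall(f func()) float64 {
	return testing.AllocsPerRun(1000, f)
}

func main() {
	fmt.Println("by value:  ", allocsPerCall(func() { valueSink = byValue() }))
	fmt.Println("by pointer:", allocsPerCall(func() { pointerSink = byPointer() }))
}
```

Building this file with go build -gcflags="-m -l" should report p in byPointer as moved to the heap, and the assembly printed by go tool compile -S should contain a call to runtime.newobject for it; the exact output varies between Go versions.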
1. Verify through assembly whether a variable escapes
The above is just the compiled assembly code of the example function. You can see that the runtime.newobject function is called in line 8 of the program.
2. Check by compilation options
You can use go tool compile --help to see the meaning of the options.
The official Go FAQ entry stack_or_heap also describes how to know whether a variable is allocated on the heap or on the stack, although that description is fairly brief.
Some cases of intra-function variables allocated on the heap
1. Pointer-type variables: pointer escape
The code example is the same as the one in the previous section.
2. Insufficient stack space
As the Go compiler source shows, explicitly declared variables larger than 10 MB are allocated on the heap, and implicitly allocated variables larger than 64 KB are allocated on the heap by default.
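The two limits can be probed with a sketch like the following. The function names are hypothetical, the sizes assume a 64-bit platform where an int takes 8 bytes, and the comments describe what the compiler is expected to do rather than guaranteed output.

```go
package main

import "fmt"

// smallBuffer allocates about 8 KB with a constant size and never lets the
// slice leave the function, so it is a candidate for stack allocation.
func smallBuffer() int {
	buf := make([]int, 1024)
	buf[0] = 1
	return len(buf)
}

// largeImplicit asks make for about 8 MB, far above the 64 KB limit for
// implicit variables, so the backing array is expected to go to the heap.
func largeImplicit() int {
	buf := make([]int, 1<<20)
	buf[0] = 1
	return len(buf)
}

// largeDeclared declares a 16 MB array explicitly, above the 10 MB limit
// for declared variables, so it is expected to go to the heap as well.
func largeDeclared() int {
	var buf [2 << 20]int
	buf[0] = 1
	return len(buf)
}

func main() {
	fmt.Println(smallBuffer(), largeImplicit(), largeDeclared())
}
```

Compiling with go build -gcflags="-m -l" is one way to see which of these buffers the compiler reports as escaping or moved to the heap on a given Go version.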
3. Dynamic types: interface{} dynamic type escape
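A small sketch of this case, using a hypothetical describe function: the argument is converted to interface{}, its concrete type is only known at run time, and the value is then handed on to the fmt package.

```go
package main

import "fmt"

// describe accepts a value of any type. The argument is stored in an
// interface{} value, so its concrete type is only known at run time.
func describe(v interface{}) string {
	return fmt.Sprintf("%T=%v", v, v)
}

func main() {
	n := 42
	// n is boxed into an interface value and passed on to fmt.Sprintf.
	// With -gcflags=-m the compiler typically reports that n escapes to heap.
	fmt.Println(describe(n)) // prints "int=42"
}
```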
4. Objects referenced by closures
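A typical instance is a counter closure, sketched here with hypothetical names. The local variable count is still used by the returned function after counter has returned, so it cannot live in counter's stack frame.

```go
package main

import "fmt"

// counter returns a closure that keeps reading and writing count after
// counter itself has returned, so count has to be allocated on the heap.
func counter() func() int {
	count := 0
	return func() int {
		count++
		return count
	}
}

func main() {
	next := counter()
	fmt.Println(next(), next(), next()) // prints "1 2 3"
}
```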
Performance difference between returning a value and returning a pointer from a function
The previous sections introduced how memory is allocated for variables in Go. As we saw, when a variable is defined in a function and returned by value, it is allocated on the stack, and the whole object is copied when the function returns.
Returning a value involves a copy, but returning a pointer forces the variable onto the heap, where allocation and reclamation are more expensive. Which one is faster also depends on the object being returned and on the platform, so you need to benchmark on each platform to get an accurate answer.
return_value_or_pointer.go
benchmark_test.go
In my local tests, returning a value was faster for a structure containing 200000 ints, and returning a pointer was faster for structures with fewer than 200000. If your code has strict performance requirements, benchmark it on the real target platform before drawing a conclusion.
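A comparison of this kind could be set up roughly as follows. This is a sketch, not the code behind the numbers above: the payload type, the function names and the element count are assumptions, and testing.Benchmark is used so that the file runs as an ordinary program rather than through go test.

```go
package main

import (
	"fmt"
	"testing"
)

// size is a hypothetical element count; vary it to look for the crossover
// point between the two approaches on your own platform.
const size = 1024

// payload is a hypothetical structure holding size ints.
type payload struct {
	data [size]int
}

// newValue returns the structure by value, which copies it to the caller.
//
//go:noinline
func newValue() payload {
	var p payload
	p.data[0] = 1
	return p
}

// newPointer returns a pointer, which makes p escape to the heap.
//
//go:noinline
func newPointer() *payload {
	var p payload
	p.data[0] = 1
	return &p
}

// Package-level sinks keep the results alive so the calls are not optimized away.
var (
	valueSink   payload
	pointerSink *payload
)

func benchValue(b *testing.B) {
	for i := 0; i < b.N; i++ {
		valueSink = newValue()
	}
}

func benchPointer(b *testing.B) {
	for i := 0; i < b.N; i++ {
		pointerSink = newPointer()
	}
}

func main() {
	fmt.Println("value:  ", testing.Benchmark(benchValue))
	fmt.Println("pointer:", testing.Benchmark(benchPointer))
}
```

The //go:noinline directives keep the compiler from inlining the constructors, which would otherwise blur the difference being measured. The results depend on the Go version, the structure size and the hardware, so they should be read as relative figures for one machine.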
Some other usage guidelines
- Stateful objects must be returned by pointer, for example the built-in sync.WaitGroup and sync.Pool. In Go, some structures contain an explicit noCopy field as a reminder that they must not be copied by value.
- Objects with a short life cycle can be returned by value; if an object lives longer or is larger, return a pointer.
- Large objects are best returned by pointer; the size threshold has to be determined by benchmarking on the specific platform.
- Refer to how large open source projects such as Kubernetes and Docker handle this.
Summary
This article has analyzed several issues around variables in Go functions: the differences between allocating memory on the heap and on the stack, and the cases in which a variable has to be allocated on the heap.