This article explains the execution rules of defer and introduces the kinds of defer. It then shows how deferred function calls are carried out, focusing mainly on heap allocation.
Introduction
defer execution rules
Multiple defers execute in Last In, First Out (LIFO) order
In the example for this rule, a for loop iterates over the string Naveen and defers one call per character. The deferred calls behave as if they were pushed onto a stack: the last call pushed is the first one popped and executed.
The characters therefore come out in reverse order: neevaN.
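The listing this passage refers to is not reproduced in the text. A minimal sketch of such a program, assuming one deferred write per character (the helper name reversed is made up for illustration), could look like this:

```go
package main

import (
	"fmt"
	"strings"
)

// reversed defers one write per character of s. The deferred writes run
// when the inner function returns, last registered first.
func reversed(s string) string {
	var sb strings.Builder
	func() {
		for _, r := range s {
			defer sb.WriteRune(r) // r is evaluated now; the call runs later
		}
	}()
	return sb.String()
}

func main() {
	fmt.Println(reversed("Naveen")) // neevaN
}
```

The inner function literal exists only so that the deferred calls have finished by the time the result is read.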
The arguments of a deferred call are evaluated when the defer statement executes
In this example, the value of the variable i is fixed when the defer statement executes, not when the deferred call runs, so the output is 0.
defer can modify the return value of a function with named results
As the Go specification states:
For instance, if the deferred function is a function literal and the surrounding function has named result parameters that are in scope within the literal, the deferred function may access and modify the result parameters before they are returned.
An example is as follows.
Note, however, that only named result parameters can be modified this way; the result of a function with an anonymous (unnamed) return value cannot.
An anonymous return value only comes into existence when the return statement executes, so a deferred function has no name through which to reach it. Only named results are accessible inside the defer statement.
Types of defer
Go made two optimizations to defer
in versions 1.13 and 1.14, which significantly reduced the performance overhead of defer
in most scenarios.
Allocation on the heap
Prior to Go 1.13, all defers were allocated on the heap. With this mechanism the compiler does two things:
- It inserts a call to runtime.deferproc at the location of the defer statement. When executed, this call saves the deferred call as a runtime._defer structure at the head of the goroutine’s _defer chain.
- It inserts a call to runtime.deferreturn just before the function returns. When executed, this call takes runtime._defer records from the head of the goroutine’s _defer chain and executes them one after another.
Allocation on the stack
Go 1.13 introduced deferprocStack, which allocates the _defer record on the stack. Compared with heap allocation, a stack-allocated _defer is released when the function returns, which eliminates the memory allocation overhead; only the _defer chain has to be maintained properly. According to the Go 1.13 release notes, this improves the performance of most uses of defer by about 30%.
Except for the difference in allocation location, there is no fundamental difference between allocating on the stack and allocating on the heap.
It is worth noting that not every defer can be allocated on the stack in Go 1.13. A defer inside a loop, whether an explicit for loop or an implicit loop formed with goto, can only be heap-allocated, even if the loop body runs just once.
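The two loop forms can be illustrated as follows. This is a sketch with made-up function names; it shows only the shape of the code that, according to the text, falls back to heap allocation in Go 1.13, and does not measure where the records are allocated.

```go
package main

import "fmt"

// explicitLoop defers inside a for loop. The body runs only once, but the
// defer statement is still lexically inside a loop.
func explicitLoop(log *[]int) {
	for i := 0; i < 1; i++ {
		defer func(v int) { *log = append(*log, v) }(i)
	}
}

// gotoLoop builds an implicit loop from a label and a backward goto.
func gotoLoop(log *[]int) {
	i := 0
again:
	if i < 1 {
		defer func(v int) { *log = append(*log, v) }(i)
		i++
		goto again
	}
}

func main() {
	var a, b []int
	explicitLoop(&a)
	gotoLoop(&b)
	fmt.Println(a, b) // [0] [0]
}
```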
Open coding
Go 1.14 added open coding, a mechanism that inserts the deferred calls directly into the function before it returns, eliminating the need to call deferproc or deferprocStack at runtime. This optimization reduces the overhead of a defer call from about 35ns in Go 1.13 to about 6ns.
However, certain conditions must be met for open coding to apply:
- compiler optimizations are not disabled, i.e. -gcflags "-N" is not set;
- the function contains no more than 8 defer statements, and the number of return statements multiplied by the number of defer statements does not exceed 15;
- no defer statement in the function is inside a loop.
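The three conditions can be expressed as a small predicate. The type funcInfo and the function canOpenCode are hypothetical and exist only for illustration; the compiler's real check is part of its SSA construction and is not reproduced here.

```go
package main

import "fmt"

// funcInfo is a hypothetical summary of a function body; the compiler
// does not expose such a type.
type funcInfo struct {
	optimizationsDisabled bool // true when built with -gcflags "-N"
	defers                int  // number of defer statements
	returns               int  // number of return statements
	deferInLoop           bool // true if any defer is inside a loop
}

// canOpenCode reports whether the three listed conditions hold.
func canOpenCode(f funcInfo) bool {
	if f.optimizationsDisabled || f.deferInLoop {
		return false
	}
	return f.defers <= 8 && f.defers*f.returns <= 15
}

func main() {
	fmt.Println(canOpenCode(funcInfo{defers: 3, returns: 5})) // true: 3*5 = 15
	fmt.Println(canOpenCode(funcInfo{defers: 4, returns: 4})) // false: 4*4 = 16
	fmt.Println(canOpenCode(funcInfo{defers: 9, returns: 1})) // false: more than 8 defers
}
```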
defer structure
The fields to note in this structure are siz, heap, fn, link, and openDefer, which are covered in the analysis below.
Analysis
The analysis starts with heap allocation and explains why the execution rules of defer are as described at the beginning. It then covers stack allocation and open coding.
The analysis starts with a function call as the entry point.
Allocation on the heap
Calling a function with a named return value
Let’s start with the named return value example mentioned above and follow heap allocation through the function calls. Note that on Go 1.15 this example is not heap-allocated by default; to force the defer onto the heap you have to modify the Go source code and rebuild the toolchain.
File location: src/cmd/compile/internal/gc/ssa.go
Then print the assembly output of the example.
First, the main function: there is little to say about it, since it simply calls f.
The following paragraphs walk through the call to f piece by piece.
Because a heap-allocated defer is registered by calling runtime.deferproc, this part of the assembly shows the instructions that prepare the call to runtime.deferproc, which are easy to follow.
runtime.deferproc takes two arguments, siz and fn.
Arguments are laid out on the stack from right to left in parameter-list order, so the first argument, the constant 8, is written at the top of the stack, and the second argument, the address of the function f.func1·f, is written at 8(SP).
You may wonder why the second argument starts at 8(SP) rather than 4(SP), given that the constant 8 is an int32 and occupies only 4 bytes. The reason is memory alignment: the pointer-sized second argument has to be aligned to 8 bytes.
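Struct fields follow the same alignment rules, so the effect can be observed from ordinary Go code. The type frame below is an analogy made up for this illustration and is not the runtime's actual argument frame.

```go
package main

import (
	"fmt"
	"unsafe"
)

// frame mimics the argument area described above: a 4-byte size followed
// by a pointer-sized function address. It is an illustration only.
type frame struct {
	siz int32
	fn  uintptr
}

// fnOffset returns the byte offset of fn inside frame.
func fnOffset() uintptr {
	var fr frame
	return unsafe.Offsetof(fr.fn)
}

// pointerSize returns the size of a pointer-sized word on this platform.
func pointerSize() uintptr { return unsafe.Sizeof(uintptr(0)) }

func main() {
	// On a 64-bit platform such as amd64 this prints "8 8": fn starts at
	// offset 8, not 4, because of the padding inserted after siz.
	fmt.Println(fnOffset(), pointerSize())
}
```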
Besides the two arguments, note that the address of 40(SP) is written to 16(SP). Just before the call, the stack therefore holds the constant 8 at 0(SP), the address of f.func1·f at 8(SP), and the address of 40(SP) at 16(SP).
Let’s look at runtime.deferproc:
File location: src/runtime/panic.go
When deferproc is called, its argument siz receives the value at the top of the stack, 8, which is the size of the deferred call’s arguments, and its argument fn receives the address stored at 8(SP).
The two statements of interest in deferproc, taken together, copy the address value saved at 16(SP) into the 8-byte block of memory immediately after the _defer record, where it serves as the argument of the deferred call. In other words, the argument slot right after the _defer record holds the address value that was saved at 16(SP).
Note that the value at argp is copied at this point, which is why the argument is already fixed when the defer statement executes rather than when the deferred call runs. In this case, however, the value copied is an address.
When allocated on the heap, _defer records are stored in the current goroutine as a linked list, so if three defers are registered one after another, the last one registered is at the head of the list.
The general idea of newdefer is to take a _defer from P’s local cache pool first. If that pool is empty, it moves _defer records from sched’s global cache pool into P’s local pool, filling it to half its capacity. If no cached record is available after that, it allocates a new _defer together with its argument space directly from the heap. The allocation itself works much like the general memory allocator, so it is not analyzed again here; interested readers can read the source.
Let’s go back to the assembly of the f function.
This part is simple: it writes the constant 6 to 40(SP) as the return value and then calls runtime.deferreturn to run the deferred calls.
Let’s look at runtime.deferreturn:
File location: src/runtime/panic.go
First, note that the argument arg0 is located at the top of the caller’s stack, so the following assignment copies the argument of the deferred call to the top of the caller’s stack.

    *(*uintptr)(unsafe.Pointer(&arg0)) = *(*uintptr)(deferArgs(d))

The value read through *(*uintptr)(deferArgs(d)) is the address that the caller saved at 16(SP). After the assignment, that address sits at the top of the caller’s stack frame.
Next, look at runtime.jmpdefer to see how the jump is made.
File location: src/runtime/asm_amd64.s
This assembly is interesting. Because jmpdefer is called by runtime.deferreturn, it runs on top of the stack frame of runtime.deferreturn, which in turn sits on top of the stack frame of f.
The arguments passed to jmpdefer are the address of the function fn at 0(FP) and the SP of f’s stack frame at 8(FP).
The next instruction therefore points SP at the slot that holds the return address of the runtime.deferreturn call.
After that, -8(SP) refers to the base pointer saved in the stack frame of runtime.deferreturn.
The step that deserves the most attention is why subtracting 5 from the value that SP points to yields the address of the call to runtime.deferreturn.
Return to the assembly of the call to f.
Because runtime.deferreturn has to return to address 0x45defd after the call, the return address in the stack frame of runtime.deferreturn is 0x45defd.
In jmpdefer, the value at (SP) is that return address. The CALL runtime.deferreturn instruction is 5 bytes long, so subtracting 5 from 0x45defd gives 0x45def8, which is the address of the CALL runtime.deferreturn instruction itself.
When execution finally jumps to f.func1, the value at (SP) is the address of the CALL runtime.deferreturn instruction. When f.func1 returns, control therefore goes back to that instruction and deferreturn is called again; this repeats until the _defer chain is empty.
Here is a brief look at the call to f.func1.
The function is simple: it loads the data that the address at 8(SP) points to, does the arithmetic, writes the result back, and returns.
This completes the walk through a deferred call under heap allocation. It also answers the earlier question of why a defer can modify a named return value: the argument saved when the defer is registered is a pointer to the return value, so the return value is modified when the deferred function finally runs.
Calling a function with an anonymous return value
What happens when the function has an anonymous return value? Compiling such a function and printing its assembly shows the following behavior.
The function first writes the constant 100 to 24(SP), then writes the address of 24(SP) to 16(SP), and then writes the return value to 48(SP) with a MOVQ instruction. The value is copied rather than a pointer to it, so the return value is not modified.
Summary
Compare the stack frames of the two functions after runtime.deferreturn is called.
The function with a named return value stores the address of the return value itself at 16(SP), whereas the function with an anonymous return value stores the address of 24(SP), a local variable, at 16(SP).
The above sequence of analysis also answers a few questions in passing.
- How does defer pass arguments?
  The analysis above shows that when deferproc executes, the argument values are first copied to the memory immediately after the _defer record. A pointer argument is copied as a pointer, and a value argument is copied as a value.
  Then, when deferreturn executes, it copies the argument values onto the stack and calls jmpdefer to run the deferred function.

      func deferreturn(arg0 uintptr) {
          ...
          switch d.siz {
          case 0:
              // Do nothing.
          case sys.PtrSize:
              // Copy out the argument saved by defer.
              // arg0 sits at the top of the caller's stack (caller SP), so this
              // copies the argument to the top of the caller's stack.
              *(*uintptr)(unsafe.Pointer(&arg0)) = *(*uintptr)(deferArgs(d))
          default:
              // If the argument size is not sys.PtrSize, copy the data with memmove.
              memmove(unsafe.Pointer(&arg0), deferArgs(d), uintptr(d.siz))
          }
          ...
      }

- How are multiple defer statements executed?
  When deferproc is called to register a defer, the new element is inserted at the head of the list. At execution time, elements are taken from the head of the list one after another.
- What is the order of execution of defer, return, and the return value?
  To answer this question, take the assembly output of the example above and examine it.

      "".f STEXT size=126 args=0x8 locals=0x20
          ...
          0x004e 00078 (main.go:11) MOVQ $6, "".result+40(SP)    ;; write the constant 6 to 40(SP) as the return value
          0x0057 00087 (main.go:11) XCHGL AX, AX
          0x0058 00088 (main.go:11) CALL runtime.deferreturn(SB) ;; call runtime.deferreturn
          0x005d 00093 (main.go:11) MOVQ 24(SP), BP
          0x0062 00098 (main.go:11) ADDQ $32, SP
          0x0066 00102 (main.go:11) RET

  From this assembly, we know that f:
  - first sets the return value to the constant 6;
  - then calls runtime.deferreturn to execute the defer chain;
  - finally executes the RET instruction to jump back to the caller.
Stack allocation
As mentioned at the beginning, on-stack allocation of defer was added in Go 1.13. The difference from heap allocation is that the _defer record is created on the stack and registered via deferprocStack.
Go goes through an SSA stage at compile time. For a stack-allocated defer, the compiler initializes the _defer record directly in the function’s stack frame and passes it as an argument to deferprocStack. The rest of the execution process is the same as for heap allocation.
Let’s take a brief look at the deferprocStack function.
File location: src/runtime/panic.go
Its main job is to fill in the fields of the _defer structure and return.
Open coding
In Go 1.14 defer was optimized further by emitting the deferred calls inline at the end of the function, so that they incur almost no extra overhead. During SSA construction, buildssa checks whether the conditions are met and, if so, generates open-coded defers. Because the SSA construction code is hard to follow, only the basic principle is described here, without a code analysis.
We can compile the heap-allocation example and print its assembly.
In the assembly output, the call to the deferred function is inserted directly at the end of the function.
This example is easy to optimize, but what if a defer sits in a conditional statement whose outcome cannot be known until runtime?
Open coding also uses defer bits to decide at runtime whether a conditional defer should be executed. The defer bits are an 8-bit value, which is why at most 8 defers, including those inside conditionals, can be used with this optimization. A bit is set to 1 when its defer statement is reached at runtime; if the bit is set, the call is made at function exit, otherwise it is not.
An example is explained in the following design document.
https://go.googlesource.com/proposal/+/refs/heads/master/design/34481-opencoded-defers.md
When a deferred call is created, a specific bit of the defer bits first records that the defer statement, conditional or not, was reached.
Before the function returns, the exit code checks the defer bits in reverse order: it ANDs the defer bits with the mask for each position, and if the result is nonzero, the corresponding deferred function is executed.
Summary
This article explained the execution rules of defer and introduced the kinds of defer. Its main purpose was to show how deferred function calls work under heap allocation, using the function calls themselves to answer questions such as how defer passes arguments, how multiple defer statements are executed, and in what order defer, return, and the return value are processed. We hope this analysis gives you a deeper understanding of defer.