Go 1.17 changed Go's long-standing stack-based calling convention. Before looking at the change, it helps to be clear about what a calling convention is.
A calling convention is, in a nutshell, the agreement that governs how parameters are passed between functions. The caller knows which parameters to pass to the called function, in what form and in what order, and the called function relies on the same agreement to find the passed parameters in the expected place.
The argument-passing diagram for older versions of Go appears in many, many places, so here's one I drew earlier.
You can see that the arguments and return values are all on the stack, laid out in order from low address to high address.
Stack-based passing is indeed simpler to design and implement, but it causes several moves between registers and memory on every function call. For example, before the call the arguments are stored at offsets from SP (register -> memory); before returning, the callee stores the return values in the result slots addressed via FP (register -> memory); after the return, the caller loads the return values back (memory -> register).
Registers are internal components of the CPU, while main memory is generally external to it, and there is an order of magnitude performance difference between the two. That is why it has been said that Go's function calls perform poorly and need to be optimised (although such an optimisation probably does not matter much for overall system performance either).
Go 1.17 introduced a calling convention based on register passing, which is currently only enabled on 64-bit x86 (amd64) platforms, and we can take a brief look at it via disassembly. To keep things simple, we again use only int parameters (floating-point values use a separate set of registers).
Passing more arguments than usual makes the behaviour easier to see: here there are 12 arguments and 11 return values.
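A minimal sketch of a program with this shape is shown below. It is not the original listing: the function body, the argument values, and the binary name `demo` are my own assumptions, chosen only so that there are 12 int parameters and 11 int results to look for in the disassembly. The `//go:noinline` directive is there so the compiler keeps a real call to `main.add`.

```go
package main

import "fmt"

// add is a hypothetical stand-in for main.add: 12 int parameters and
// 11 int results, so both lists exceed the nine integer registers.
// The arithmetic itself is arbitrary.
//
//go:noinline
func add(a, b, c, d, e, f, g, h, i, j, k, l int) (int, int, int, int, int, int, int, int, int, int, int) {
	return a + b, b + c, c + d, d + e, e + f, f + g, g + h, h + i, i + j, j + k, k + l
}

func main() {
	// One way to inspect the generated code (Go 1.17+, amd64):
	//   go build -o demo .
	//   go tool objdump -s 'main\.add' demo
	fmt.Println(add(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12))
}
```

Running `go tool objdump -s 'main\.main' demo` as well shows the caller's side, where the arguments are loaded before the `CALL` instruction.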
Looking directly at the results of the disassembly, we start with the call to main.add.
As you can see, only nine general-purpose registers are used for integer arguments, in this order: AX, BX, CX, DI, SI, R8, R9, R10, R11. Any arguments beyond the ninth are passed on the stack.
Then there is the return value part of main.add.
The return values use exactly the same sequence of registers as the arguments, and again, when there are more than nine return values, the excess is returned on the stack.
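The rule described above can be written down as a small model. This is only a sketch under a simplifying assumption: every value is a plain word-sized integer such as int, so each value takes exactly one register. The names `intRegs` and `assign` are hypothetical helpers of mine, not part of any Go API, and the real ABI has more rules for structs, floats, and other types.

```go
package main

import "fmt"

// intRegs is the integer register sequence reported in this article
// for amd64, in assignment order.
var intRegs = []string{"AX", "BX", "CX", "DI", "SI", "R8", "R9", "R10", "R11"}

// assign returns where each of n word-sized integer values is passed:
// the first len(intRegs) values get a register, the rest go on the stack.
// The same sequence applies to arguments and to results.
func assign(n int) []string {
	locs := make([]string, n)
	for i := range locs {
		if i < len(intRegs) {
			locs[i] = intRegs[i]
		} else {
			locs[i] = "stack"
		}
	}
	return locs
}

func main() {
	fmt.Println("args:   ", assign(12)) // 12 arguments
	fmt.Println("results:", assign(11)) // 11 return values
}
```

With 12 arguments and 11 results, the model places the last three arguments and the last two results on the stack, which matches what the disassembly shows for the int-only case.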
In a traditional calling convention, a distinction is usually made between caller-saved registers and callee-saved registers, but in Go all registers are caller-saved: the caller is responsible for saving any values it still needs, and the callee gives no guarantee that it will leave them intact.
This is also evident here from the fact that the return values directly overwrite the registers used by the incoming arguments.
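To make the consequence concrete, here is a toy model, not real compiler output. It treats the register file as a map, which is purely an illustrative assumption, and the names `regs` and `callee` are hypothetical. The callee reads its arguments from AX and BX and then writes its results into the same two registers, so the caller has to save anything it still wants before the call.

```go
package main

import "fmt"

// regs is a toy register file keyed by register name.
type regs map[string]int

// callee takes two arguments in AX and BX and returns two results
// in AX and BX, overwriting the argument values in the process.
func callee(r regs) {
	a, b := r["AX"], r["BX"]
	r["AX"] = a + b
	r["BX"] = a - b
}

func main() {
	r := regs{"AX": 7, "BX": 3}

	// Caller-saved: keep a copy of the value we still need after the call.
	saved := r["AX"]

	callee(r)

	// AX and BX now hold results, not the original arguments.
	fmt.Println(r["AX"], r["BX"], saved)
}
```

In compiled code the "save" is typically a spill to the caller's stack frame before the call, which is the cost the caller pays for a convention with no callee-saved registers.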
Since arguments no longer need to be passed through the stack, the goroutine stack itself may use less memory in some scenarios where function calls are deeply nested. But since I don't have a production environment at hand, I can't verify this for now.
Reference https://xargin.com/go1-17-new-calling-convention/