The Go language has a built-in runtime (that is, runtime) that abandons the traditional way of allocating memory in favor of autonomous management. This allows for autonomous implementation of better memory usage patterns, such as memory pooling, pre-allocation, etc. This way, you don’t need to make a system call for every memory allocation.
The Golang runtime memory allocation algorithm is derived from Google’s TCMalloc algorithm for C. The core idea is to reduce the granularity of locks by dividing memory into multiple levels of management. It manages the available heap memory in a two-level allocation: each thread maintains a separate memory pool of its own and allocates memory from that pool first, and only applies to the global memory pool when the pool is insufficient to avoid frequent competition between different threads for the global memory pool.
Basic Concepts
When Go starts a program, it first requests a block of memory from the operating system (note that this is still just a virtual address space and does not actually allocate memory), cuts it into small chunks and then manages it itself.
The requested memory block is allocated three areas, 512MB, 16GB, and 512GB in size on the X64.
The arena
is what we call the heap area. Go dynamically allocates memory in this area, which divides the memory into 8KB
sized pages, some of which are combined and called mspan
.
The bitmap area
identifies which addresses in the arena
area hold objects, and uses the 4bit
flag bit to indicate whether the object contains pointer, GC
tag information. One byte
size of memory in bitmap
corresponds to 4 pointer size (8B pointer size) of memory in the arena
area, so the size of bitmap
area is 512GB/(4*8B)=16GB
.
From the above figure, you can actually see that the high address part of the bitmap points to the low address part of the arena area, which means that the address of the bitmap grows from the high address to the low address.
The spans area
holds the pointers to the mspan
(that is, the basic unit of memory management combined with some pages of the arena
partition, which will be discussed later), and each pointer corresponds to a page, so the size of the spans
area is 512GB/8KB*8B=512MB
. Dividing by 8KB is to calculate the number of pages in the arena
area, and multiplying by 8 is to calculate the size of all the pointers in the spans
area. When creating an mspan
, the corresponding spans
area is filled by page, and when recycling the object
, the mspan
it belongs to can be easily found based on its address.
Memory management unit
mspan
: The basic memory management unit in Go, a large block of memory consisting of a contiguous 8KB
page. Note that a page here is not the same thing as an OS page, which is typically several times the size of an OS page. In a nutshell: mspan
is a Doubly linked list containing the starting address, the mspan
specification, the number of pages, etc.
Each mspan
is divided into several objects
according to the size of its own property Size Class
, and each object
can store one object. A bitmap is used to mark the objects
that are not yet used. The property Size Class
determines the size of the object
, and mspan
will only be assigned to objects that are close to the size of the object
, but, of course, smaller than the size of the object
. There is another concept: Span Class
, which has a similar meaning to Size Class
.
|
|
This is because there are actually two mspan
s per Size Class
, which means there are two Span Class
s. One of them is allocated to objects that contain pointers, and the other one is allocated to objects that do not contain pointers. This will bring benefits to the garbage collection mechanism, which will be discussed in a later article.
As shown below, mspan
consists of a set of consecutive pages divided into objects
of a certain size.
Go1.9.2 mspan
of Size Class
has 67 kinds, each mspan
partitioned object size is a multiple of 8*2n, this is written dead in the code.
|
|
According to the Size Class
of mspan
, you can get the size of the object
it divides. For example, if Size Class
is equal to 3, object
size is 32B. An object of size 32B can store objects in the size range of 17B to 32B. For tiny objects (less than 16B), the allocator will merge them and allocate several objects into the same object
.
The maximum number in the array is 32768, that is, 32KB, more than this size is a large object, it will be treated specially, this will be introduced later. By the way, a type Size Class
of 0 indicates a large object, which is actually allocated directly from heap memory, while small objects are allocated through mspan
.
For mspan, its Size Class
determines the number of pages it can be allocated, which is also written dead in the code.
|
|
For example, when we want to request a mspan
of object
size 32B
, the corresponding index in class_to_size is 3, and the index 3 corresponds to the number of pages in the class_to_allocnpages
array is 1.
mspan
structure definition.
|
|
Let’s put mspan
into a larger perspective.
The above figure shows that there are two S
s pointing to the same mspan
, because the P
to which these two S
s point belongs to the same mspan
. So, the S
pointing to it can be quickly found by the address on the arena
, and the mspan
can be found by the S
, recalling what we said earlier about each pointer to the mspan
area corresponding to a page.
Assuming that the Size Class
of the first mspan
on the far left is equal to 10, according to the class_to_size
array, the object
size of this msapn
partition is 144B, and the number of objects that can be allocated is 8KB/144B=56.89
, rounded up to 56, so there will be some memory wasted. Go’s source code has the size of all mspan
wasted memory of Size Class
; then according to the class_to_allocnpages
array, we get this mspan
consists of only 1 page
; suppose this mspan
is allocated to pointerless objects, then spanClass
is equal to 20.
startAddr
points directly to a location in the arena
region, indicating the starting address of this mspan
, allocBits
points to a bitmap, each representing whether a block has been allocated or not, and allocCount
indicates the total number of objects allocated.
Thus, the parameters of each field of the first mspan
from the left are as follows.
Memory Management Components
Memory allocation is done by the memory allocator. The allocator consists of 3 components: mcache
, mcentral
, mheap
.
mcache
mcache
: Each worker thread is bound to an mcache, which locally caches available mspan
resources so that they can be allocated directly to Goroutines, as there is no competition from multiple Goroutines, so no lock resources are consumed.
Structure definition of mcache
.
mcache
uses Span Classes
as an index to manage multiple mspan
for allocation, and it contains all sizes of mspan
. It is twice as large as _NumSizeClasses
, which is 67*2=134
. Why there is a twofold relationship, as we mentioned earlier: to speed up memory reclamation afterwards, half of the objects allocated in mspan
in the array do not contain pointers, and the other half contain pointers.
The mspan
of a pointerless object is garbage collected without further scanning for references to other active objects. More on this in a later garbage collection article, but this time we’ll stop here.
mcache
is initialized without any mspan
resources and is dynamically requested from mcentral
during use and cached afterwards. When the object is less than or equal to 32KB in size, mcache
is allocated using the corresponding size of mspan
.
mcentral
mcentral
: Provides a sliced mspan
resource for all mcache
. Each central
holds a global mspan
list of a specific size, both allocated and unallocated. Each mcentral
corresponds to one type of mspan
, and the type of mspan
causes it to divide object
s of different sizes. When there is no suitable (i.e., specific size) mspan
in the worker thread’s mcache
, it is fetched from mcentral
.
mcentral
is shared by all worker threads, and there is competition from multiple Goroutines, so it consumes lock resources. Structure definition.
|
|
empty
means that all the mspan
in this linked list are either allocated object
or mspan
that have already been taken by cache
, and this mspan
is exclusive to that worker thread. And nonempty
indicates a list of mspan
s that have free objects. Each central
structure is maintained in mheap
.
To briefly explain the flow of mcache
fetching and returning mspan
from mcentral
.
- Get Locks; finds an available
mspan
from thenonempty
linked list; removes it from thenonempty
linked list; adds the retrievedmspan
to theempty
linked list; returns themspan
to the worker thread; unlocks it. - Return lock; remove
mspan
from theempty
linked list; addmspan
to thenonempty
linked list; unlock.
mheap
mheap
: Represents all the heap space held by the Go program, which uses a global object _mheap
of mheap
to manage heap memory.
When mcentral
does not have a free mspan
, it requests it from mheap
. And when mheap
runs out of resources, it requests new memory from the OS. mheap
is mainly used for memory allocation for large objects and for managing uncut mspan
, which is used to cut mcentral
into smaller objects.
Also we see that mheap
contains mcentral
of all sizes, so when an mcache
requests mspan
from mcentral
, it only needs to use a lock in the separate mcentral
and does not affect requests for mspan
of other sizes.
mheap
structure definition.
|
|
Above we see that the bitmap and arena_start point to the same address. This is because the address of the bitmap grows from high to low, so they point to the same memory location.
Allocation process
Whether a variable is allocated on the stack or on the heap is determined by the result of the escape analysis. Usually, the compiler prefers to allocate variables on the stack because it has less overhead, and the most extreme is “zero garbage”, where all variables are allocated on the stack so that there is no memory fragmentation, garbage collection, or anything like that.
Go’s memory allocator allocates objects in three categories based on their size: small objects (less than or equal to 16B), general objects (greater than 16B, less than or equal to 32KB), and large objects (greater than 32KB).
The general allocation process.
- 32KB objects are allocated directly from the mheap.
- <=16B objects allocated using the tiny allocator of mcache.
- objects of [16B,32KB], first calculate the specification size of the object, and then allocate it using the mspan of the corresponding specification size in the mcache.
- If mcache does not have mspan of the corresponding specification size, apply to mcentral
- If mcentral does not have an mspan of the corresponding specification size, apply to mheap
- If there is no mspan of the appropriate size in mheap either, apply to the operating system
Summary
- Go requests a large chunk of memory from the operating system at program startup and manages it on its own afterwards.
- The basic unit of Go memory management is the mspan, which consists of several pages, each mspan can allocate a specific size of object.
- mcache, mcentral, and mheap are the three major components of Go memory management in a hierarchy. mcache manages the mspan that threads cache locally; mcentral manages the global mspan for use by all threads; and mheap manages all of Go’s dynamically allocated memory.
- Very small objects are allocated in an object to save resources and use tiny allocator to allocate memory; generally small objects are allocated memory by mspan; and large objects are allocated memory directly by mheap.