Since Go 1.12, people have been running into false positives in memory monitoring. The reason is that, starting with 1.12, Go changed the memory reclamation policy it passes to the madvise system call from MADV_DONTNEED to MADV_FREE. According to the available documentation, RSS, the most commonly used memory metric, does not drop for memory that the process has released but the OS has not yet reclaimed. Naturally, some suggest replacing RSS with a more appropriate metric, such as PSS or even USS. This raises some tricky questions: PSS and USS are less common than RSS, and the documentation says little about how they actually reflect memory consumption. Are they really more appropriate than RSS?
To answer that, we first need to pin down what these metrics mean. Searching for the question mostly turns up the same explanation, copied over and over:
VSS, USS, PSS, and RSS are four indicators for measuring memory usage:
- VSS: Virtual Set Size, the virtual memory footprint, including shared libraries.
- RSS: Resident Set Size, the physical memory actually in use, including shared libraries.
- PSS: Proportional Set Size, the physical memory actually in use, with shared libraries split proportionally among the processes that share them.
- USS: Unique Set Size, the physical memory occupied by the process alone, excluding shared libraries.
Generally we have VSS >= RSS >= PSS >= USS.
From these descriptions, the overall impression is that USS is better than PSS, PSS is better than RSS, and VSS is basically unusable: VSS reflects the virtual address space the process has requested and not yet returned, RSS includes shared libraries in full, PSS divides the size of each shared library among the processes sharing it, and USS leaves shared libraries out entirely.
By this definition, RSS, PSS, and USS differ only in how they treat shared libraries. But statically linked programs such as Go binaries rarely use shared libraries, so it is reasonable to suspect that in most cases RSS == PSS == USS.
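To make the definitions concrete, here is a minimal sketch of the arithmetic with made-up numbers. The `region` type and the `account` function are hypothetical names used only for illustration, and the kernel does this accounting per page rather than per region.

```go
package main

import "fmt"

// region describes a chunk of resident memory (hypothetical type, for illustration).
type region struct {
	residentKB uint64 // resident size of the region in kB
	sharers    uint64 // number of processes mapping it (1 means private)
}

// account returns RSS, PSS and USS in kB for a set of regions,
// following the textbook definitions quoted above.
func account(regions []region) (rss, pss, uss uint64) {
	for _, r := range regions {
		if r.sharers == 0 {
			continue // not mapped by anyone; skip to avoid dividing by zero
		}
		rss += r.residentKB             // RSS counts every resident page in full
		pss += r.residentKB / r.sharers // PSS splits shared pages among the sharers
		if r.sharers == 1 {
			uss += r.residentKB // USS counts private pages only
		}
	}
	return rss, pss, uss
}

func main() {
	// Made-up numbers: 8000 kB of private heap plus a 2000 kB library shared by 4 processes.
	rss, pss, uss := account([]region{{8000, 1}, {2000, 4}})
	fmt.Println(rss, pss, uss) // 10000 8500 8000

	// A statically linked binary with no shared mappings: all three agree.
	rss, pss, uss = account([]region{{8000, 1}})
	fmt.Println(rss, pss, uss) // 8000 8000 8000
}
```

The second call is the case the paragraph above suspects applies to most Go programs: with nothing shared, the three metrics collapse into one number.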
MADV_DONTNEED vs MADV_FREE
For something as closely tied to the kernel as memory consumption, a good kernel naturally records the information somewhere for inspection. On Linux, for example, RSS is exposed in /proc/[pid]/status, and a running application that wants to query its own consumption can read /proc/self/status directly, as cat does here with its own status:
$ cat /proc/self/status
Name: cat
...
Pid: 3509083
...
VmPeak: 11676 kB
VmSize: 11676 kB
VmLck: 0 kB
VmPin: 0 kB
VmHWM: 596 kB
VmRSS: 596 kB
RssAnon: 68 kB
RssFile: 528 kB
RssShmem: 0 kB
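The VmRSS line in this output can also be picked out programmatically. The following is a minimal sketch, assuming the "Key: value kB" layout shown above; `parseVmRSS` is a hypothetical helper name, and it takes the file contents as a string so it can be tried without access to /proc.

```go
package main

import (
	"bufio"
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// parseVmRSS returns the VmRSS value in kB from status-formatted text.
// Hypothetical helper, for illustration only.
func parseVmRSS(status string) (uint64, error) {
	sc := bufio.NewScanner(strings.NewReader(status))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		// Expect a line of the form "VmRSS: 596 kB".
		if len(fields) >= 2 && fields[0] == "VmRSS:" {
			return strconv.ParseUint(fields[1], 10, 64)
		}
	}
	if err := sc.Err(); err != nil {
		return 0, err
	}
	return 0, errors.New("VmRSS line not found")
}

func main() {
	// Excerpt taken from the cat output above.
	sample := "Name: cat\nVmHWM: 596 kB\nVmRSS: 596 kB\nRssAnon: 68 kB\n"
	kb, err := parseVmRSS(sample)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(kb) // 596
}
```

On a real system the argument would be the contents of /proc/self/status.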
The meaning of each field is described in the man page (man proc); for example, VmRSS is the value of RSS, VmSize is the value of VSS, and so on. The contents of /proc/[pid]/status are formatted for human readers, so if you want to read the numbers programmatically, the more compact /proc/[pid]/stat file is easier to work with. Let's take RSS as an example.
var pageSize = syscall.Getpagesize()

// rss returns the resident set size of the current process, in MiB.
func rss() int {
	data, err := ioutil.ReadFile("/proc/self/stat")
	if err != nil {
		log.Fatal(err)
	}
	// Field 24 of /proc/[pid]/stat (index 23) is the RSS in pages.
	fs := strings.Fields(string(data))
	if len(fs) < 24 {
		log.Fatalf("unexpected /proc/self/stat format: %d fields", len(fs))
	}
	rss, err := strconv.ParseInt(fs[23], 10, 64)
	if err != nil {
		log.Fatal(err)
	}
	return int(uintptr(rss) * uintptr(pageSize) / (1 << 20)) // MiB
}
On Linux, memory obtained from mmap with PROT_READ and PROT_WRITE is not backed by physical pages right away: the first access to each page triggers a page fault, and only then does the OS actually allocate the page to the process. The difference between calling madvise with MADV_DONTNEED and with MADV_FREE can be measured directly with the rss() function above. For example:
package main

import (
	"flag"
	"fmt"
	"io/ioutil"
	"log"
	"os"
	"runtime"
	"strconv"
	"strings"
	"syscall"
)

/*
#include <sys/mman.h> // for C.MADV_FREE
*/
import "C"

func main() {
	useDontneed := flag.Bool("dontneed", false, "use MADV_DONTNEED instead of MADV_FREE")
	flag.Usage = func() {
		fmt.Fprintf(os.Stderr, "usage: %s [flags] anon-MiB\n", os.Args[0])
		flag.PrintDefaults()
		os.Exit(2)
	}
	flag.Parse()
	if flag.NArg() != 1 {
		flag.Usage()
	}
	anonMB, err := strconv.Atoi(flag.Arg(0))
	if err != nil {
		flag.Usage()
	}

	// anonymous mapping
	m, err := syscall.Mmap(-1, 0, anonMB<<20, syscall.PROT_READ|syscall.PROT_WRITE, syscall.MAP_PRIVATE|syscall.MAP_ANON)
	if err != nil {
		log.Fatal(err)
	}
	printStats("After anon mmap:", m)

	// page fault by accessing it
	for i := 0; i < len(m); i += pageSize {
		m[i] = 42
	}
	printStats("After anon fault:", m)

	// use different strategy
	if *useDontneed {
		err = syscall.Madvise(m, syscall.MADV_DONTNEED)
		if err != nil {
			log.Fatal(err)
		}
		printStats("After MADV_DONTNEED:", m)
	} else {
		err = syscall.Madvise(m, C.MADV_FREE)
		if err != nil {
			log.Fatal(err)
		}
		printStats("After MADV_FREE:", m)
	}
	runtime.KeepAlive(m)
}

func printStats(ident string, m []byte) {
	fmt.Print(ident, " ", rss(), " MiB RSS\n")
}
Requesting 10 MiB gives the following result:
$ go run main.go 10
After anon mmap: 2 MiB RSS
After anon fault: 13 MiB RSS
After MADV_FREE: 13 MiB RSS
$ go run main.go -dontneed 10
After anon mmap: 3 MiB RSS
After anon fault: 13 MiB RSS
After MADV_DONTNEED: 3 MiB RSS
The difference is clear: after madvise with MADV_FREE, RSS does not go down, whereas with MADV_DONTNEED the memory is returned in full.
So how do we get the PSS and USS values? More detailed memory mapping information is recorded in /proc/[pid]/smaps. It is a bit trickier to compute, because the file reports each mapping separately, but that does not stop us from automating the collection:
type mmapStat struct {
	Size           uint64
	RSS            uint64
	PSS            uint64
	PrivateClean   uint64
	PrivateDirty   uint64
	PrivateHugetlb uint64
}

// getMmaps parses /proc/self/smaps into one mmapStat per mapping (values in kB).
func getMmaps() ([]mmapStat, error) {
	var ret []mmapStat
	contents, err := ioutil.ReadFile("/proc/self/smaps")
	if err != nil {
		return nil, err
	}
	lines := strings.Split(string(contents), "\n")

	// getBlock parses the "Key: value kB" lines of a single mapping.
	getBlock := func(block []string) (mmapStat, error) {
		m := mmapStat{}
		for _, line := range block {
			if strings.Contains(line, "VmFlags") ||
				strings.Contains(line, "Name") {
				continue
			}
			field := strings.Split(line, ":")
			if len(field) < 2 {
				continue
			}
			// Drop the trailing "kB" unit and the padding around the number.
			v := strings.TrimSpace(strings.TrimSuffix(strings.TrimSpace(field[1]), "kB"))
			t, err := strconv.ParseUint(v, 10, 64)
			if err != nil {
				return m, err
			}
			switch field[0] {
			case "Size":
				m.Size = t
			case "Rss":
				m.RSS = t
			case "Pss":
				m.PSS = t
			case "Private_Clean":
				m.PrivateClean = t
			case "Private_Dirty":
				m.PrivateDirty = t
			case "Private_Hugetlb":
				m.PrivateHugetlb = t
			}
		}
		return m, nil
	}

	// flush parses the lines collected so far and starts a new, empty block.
	var block []string
	flush := func() error {
		if len(block) == 0 {
			return nil
		}
		g, err := getBlock(block)
		if err != nil {
			return err
		}
		ret = append(ret, g)
		block = block[:0]
		return nil
	}

	// A line whose first token does not end in ":" is a mapping header (or a
	// blank line) and therefore marks the end of the previous block.
	for _, line := range lines {
		fields := strings.Fields(line)
		if len(fields) == 0 || !strings.HasSuffix(fields[0], ":") {
			if err := flush(); err != nil {
				return ret, err
			}
		} else {
			block = append(block, line)
		}
	}
	// Flush the last mapping in case the file does not end with a newline.
	if err := flush(); err != nil {
		return ret, err
	}
	return ret, nil
}

type smapsStat struct {
	VSS uint64 // bytes
	RSS uint64 // bytes
	PSS uint64 // bytes
	USS uint64 // bytes
}

// getSmaps sums the per-mapping values and converts them from kB to bytes.
func getSmaps() (*smapsStat, error) {
	mmaps, err := getMmaps()
	if err != nil {
		return nil, err
	}
	smaps := &smapsStat{}
	for _, mmap := range mmaps {
		smaps.VSS += mmap.Size * 1024
		smaps.RSS += mmap.RSS * 1024
		smaps.PSS += mmap.PSS * 1024
		smaps.USS += mmap.PrivateDirty*1024 + mmap.PrivateClean*1024 + mmap.PrivateHugetlb*1024
	}
	return smaps, nil
}
It can then be used as follows:
stat, err := getSmaps()
if err != nil {
	panic(err)
}
fmt.Printf("VSS: %d MiB, RSS: %d MiB, PSS: %d MiB, USS: %d MiB\n",
	stat.VSS/(1<<20), stat.RSS/(1<<20), stat.PSS/(1<<20), stat.USS/(1<<20))
Applying it to the previous program gives the following results:
$ go run main.go 10 # MADV_FREE
After anon mmap: 2 MiB RSS
After anon fault: 13 MiB RSS
After MADV_FREE: 13 MiB RSS
VSS: 1048 MiB, RSS: 13 MiB, PSS: 12 MiB, USS: 12 MiB
$ go run main.go -dontneed 10
After anon mmap: 2 MiB RSS
After anon fault: 13 MiB RSS
After MADV_DONTNEED: 3 MiB RSS
VSS: 1049 MiB, RSS: 3 MiB, PSS: 2 MiB, USS: 2 MiB
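As the output shows, there is no difference: PSS and USS stay high after MADV_FREE just as RSS does. So what should we monitor? There are three options:
- Set GODEBUG=madvdontneed=1, for Go releases from 1.12 up to, but not including, 1.16.
- Call runtime.ReadMemStats periodically to read the runtime's own report, or use expvar or the standard pprof handlers. Each of these carries a significant performance penalty for the runtime, because the queries require stopping the world (STW).
- Upgrade to Go 1.16, which switches the default on Linux back to MADV_DONTNEED.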
Of course, there is a fourth way to do this: no monitoring.
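The second option above can be sketched as follows. The fields Sys and HeapReleased are part of runtime.MemStats, but the helper name `retainedBytes` and the choice of Sys minus HeapReleased as the figure to watch are my assumptions rather than something the runtime documentation prescribes. As far as I understand it, the runtime counts memory as released once it has handed it back with madvise, whichever flag was used, so this figure should not be inflated by pages the OS has not reclaimed yet.

```go
package main

import (
	"fmt"
	"runtime"
)

// retainedBytes reports how much memory the Go runtime itself considers held
// from the OS: everything it obtained (Sys) minus the heap memory it has
// already returned (HeapReleased). Hypothetical helper, for illustration only.
func retainedBytes() uint64 {
	var ms runtime.MemStats
	runtime.ReadMemStats(&ms) // stops the world briefly, so call it sparingly
	return ms.Sys - ms.HeapReleased
}

func main() {
	fmt.Printf("runtime-retained memory: %d MiB\n", retainedBytes()/(1<<20))
}
```

Because each call stops the world, a monitoring loop would normally sample this on a coarse interval, such as once every few seconds, rather than on every request.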
If you know Linux system calls well, you might also think of using the mincore system call to check which pages are resident. That is one way to do it, but it does not work for Go in general, because user code does not know which addresses the runtime has mapped, let alone which pages. Even if it did, the check would be very expensive. It is still possible to do the check, but only for memory that you requested via mmap yourself:
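/*
#include <errno.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/mman.h>
#include <stdint.h>

// inCore returns the number of resident pages in [base, base+length),
// or -1 on failure with errno set.
static int inCore(void *base, uint64_t length, uint64_t pages) {
	int count = 0;
	unsigned char *vec = malloc(pages);
	if (vec == NULL)
		return -1;
	if (mincore(base, length, vec) < 0) {
		int saved = errno; // keep the mincore error across free()
		free(vec);         // do not leak the vector on failure
		errno = saved;
		return -1;
	}
	for (uint64_t i = 0; i < pages; i++)
		if (vec[i] & 1) // only the least significant bit is defined
			count++;
	free(vec);
	return count;
}
*/
import "C"

// inCore returns how much of b is resident in memory, in MiB.
// It also requires "unsafe" in the import list.
func inCore(b []byte) int {
	// mincore needs one vector byte per page, so round the page count up.
	pages := (len(b) + pageSize - 1) / pageSize
	n, err := C.inCore(unsafe.Pointer(&b[0]), C.uint64_t(len(b)), C.uint64_t(pages))
	if n < 0 {
		log.Fatal(err)
	}
	return int(uintptr(n) * uintptr(pageSize) / (1 << 20)) // MiB
}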