How can I avoid Go command line execution creating "orphan" processes?

In Go programs, we usually use exec.Command to run an external command, and it works well enough for most purposes.

If we need to terminate the process, we can call cmd.Process.Kill(). But what happens when the command we run starts child processes of its own?

How orphan processes are created

A test program.


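package main

import (
    "fmt"
    "os/exec"
    "time"
)

// kill returns a function that kills only the process started by cmd.
func kill(cmd *exec.Cmd) func() {
    return func() {
        if cmd != nil && cmd.Process != nil {
            cmd.Process.Kill()
        }
    }
}

func main() {
    cmd := exec.Command("/bin/bash", "-c", "watch top >top.log")
    time.AfterFunc(1*time.Second, kill(cmd))
    err := cmd.Run()
    fmt.Printf("pid=%d err=%v\n", cmd.Process.Pid, err)
}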

Run.

go run main.go

pid=27326 err=signal: killed

View process information.


ps -j

USER    PID  PPID  PGID   SESS JOBC STAT   TT       TIME COMMAND
king  24324     1 24303      0    0 S    s012    0:00.01 watch top

We can see that the PPID of this “watch top” is 1, which means that this process has become an “orphan” process.

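Why does this happen? It is not what we expected, but the Go documentation explains it: Process.Kill only kills the process itself, not any other processes it may have started. Here the killed process is bash; the watch process it started keeps running and is adopted by PID 1.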

Killing all child processes through process groups

Linux has the concepts of session, process group and process, and Go also uses Linux's kill(2) to send signals. So is it possible to use kill to end all the child processes of the process we want to end?

The definition of kill(2) in Linux is as follows.


#include <signal.h>

int kill(pid_t pid, int sig);

The description in the manual page says the following.

If pid is positive, sig is sent to the process with that ID. If pid is less than -1, sig is sent to every process in the process group whose ID is -pid. So can we end all the child processes through the process group? Let's change the kill function in the Go program.
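The rules for the pid argument can be summarized in a small sketch. describeKillTarget is a hypothetical helper written for this article, not part of any library; it only describes the target and sends no signal. The cases follow the kill(2) manual page.

```go
package main

import "fmt"

// describeKillTarget reports which processes kill(2) would signal for a
// given pid argument. Hypothetical helper for illustration; it sends nothing.
func describeKillTarget(pid int) string {
	switch {
	case pid > 0:
		return fmt.Sprintf("process %d", pid)
	case pid == 0:
		return "every process in the caller's process group"
	case pid == -1:
		return "every process the caller has permission to signal"
	default: // pid < -1
		return fmt.Sprintf("every process in process group %d", -pid)
	}
}

func main() {
	for _, pid := range []int{27673, 0, -1, -27673} {
		fmt.Printf("kill(%d, sig) -> %s\n", pid, describeKillTarget(pid))
	}
}
```

Note the two special values: a negated PID of 0 or 1 would turn into kill(0) or kill(-1), so code that negates a PID should make sure the PID is greater than 1 first.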

func kill(cmd *exec.Cmd) func() {
    return func() {
        if cmd != nil {
            // cmd.Process.Kill()
            syscall.Kill(-cmd.Process.Pid, syscall.SIGKILL)
        }
    }
}

func main() {
    cmd := exec.Command("/bin/bash", "-c", "watch top >top.log")
    time.AfterFunc(1*time.Second, kill(cmd))
    err := cmd.Run()
    fmt.Printf("pid=%d err=%s\n", cmd.Process.Pid, err)
}

Run it again.

go run main.go

This time the program hangs. Let's look at the processes that are currently running.


ps -j

USER    PID  PPID  PGID   SESS JOBC STAT   TT       TIME COMMAND
king 27655 91597 27655      0    1 S+   s012    0:01.10 go run main.go
king 27672 27655 27655      0    1 S+   s012    0:00.03 ..../exe/main
king 27673 27672 27655      0    1 S+   s012    0:00.00 /bin/bash -c watch top >top.log
king 27674 27673 27655      0    1 S+   s012    0:00.01 watch top

You can see that go run spawned a subprocess 27672 (the command is a path in the temporary directory where go builds the binary; it is long, hence the ellipsis), 27672 spawned process 27673 (/bin/bash -c watch top >top.log), and 27673 spawned process 27674 (watch top). So why weren't all these subprocesses killed?

In fact, we made a basic mistake. From the output above, we can see that the process group ID of all these processes is 27655, but we passed the PID of cmd, 27673, which is not a process group ID. The kill call therefore signaled nothing, bash kept running, and cmd.Run() never returned.

In Linux, the first process in a process group is called the process group leader, and the ID of the process group is the PID of that process. Processes created by it inherit its process group and session. From the above, we can see that the PID and PGID of go run main.go are both 27655, so this process is the process group leader. We can’t kill this process group unless we want to “commit suicide”, hahaha.
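The mistake went unnoticed partly because the return value of syscall.Kill was ignored. The following is a minimal sketch, assuming a sleep binary is available on the PATH, that shows what the kernel reports when a negated PID does not name a process group. signalGroup and childPIDIsGroupID are hypothetical helpers written for this illustration. Signal 0 is used so that nothing is actually killed by the probe.

```go
package main

import (
	"errors"
	"fmt"
	"os/exec"
	"syscall"
)

// signalGroup sends sig to process group pgid and returns the kernel's error.
// Hypothetical helper; it refuses IDs that would become kill(0) or kill(-1).
func signalGroup(pgid int, sig syscall.Signal) error {
	if pgid <= 1 {
		return fmt.Errorf("refusing to signal process group %d", pgid)
	}
	return syscall.Kill(-pgid, sig)
}

// childPIDIsGroupID starts a child that inherits our process group and
// probes whether the child's PID names a process group.
func childPIDIsGroupID() (bool, error) {
	cmd := exec.Command("sleep", "5")
	if err := cmd.Start(); err != nil {
		return false, err
	}
	defer cmd.Wait()         // reap the child (runs last)
	defer cmd.Process.Kill() // stop the child (runs first)

	// Signal 0 performs the existence and permission checks only.
	err := signalGroup(cmd.Process.Pid, 0)
	if errors.Is(err, syscall.ESRCH) {
		return false, nil // no process group has this ID
	}
	return err == nil, err
}

func main() {
	isGroup, err := childPIDIsGroupID()
	fmt.Println("child PID is a process group ID:", isGroup, "err:", err)
}
```

Because the child stays in the parent's process group, the probe fails with ESRCH. Checking that error in the timer callback would have pointed at the problem right away.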

So we create a new process group for the process we want to run, and then we can kill that group. In Linux, the process group ID is set with setpgid, defined as follows.


#include <unistd.h>

int setpgid(pid_t pid, pid_t pgid);

If both pid and pgid are set to 0, i.e. setpgid(0,0), then the current process will be used as the process group leader and a new process group will be created.
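In Go, the Setpgid field of syscall.SysProcAttr makes the child call setpgid before it executes the command. The effect can be observed with a minimal sketch, assuming a sleep binary is available on the PATH. childPgid is a hypothetical helper written for this illustration.

```go
package main

import (
	"fmt"
	"os/exec"
	"syscall"
)

// childPgid starts "sleep 5", optionally in a new process group, and returns
// the child's PID and process group ID. Hypothetical helper for illustration.
func childPgid(newGroup bool) (pid, pgid int, err error) {
	cmd := exec.Command("sleep", "5")
	cmd.SysProcAttr = &syscall.SysProcAttr{Setpgid: newGroup}
	if err = cmd.Start(); err != nil {
		return 0, 0, err
	}
	defer cmd.Wait()         // reap the child (runs last)
	defer cmd.Process.Kill() // stop the child (runs first)

	pid = cmd.Process.Pid
	pgid, err = syscall.Getpgid(pid)
	return pid, pgid, err
}

func main() {
	fmt.Println("our process group:", syscall.Getpgrp())
	for _, newGroup := range []bool{false, true} {
		pid, pgid, err := childPgid(newGroup)
		fmt.Printf("Setpgid=%v pid=%d pgid=%d err=%v\n", newGroup, pid, pgid, err)
	}
}
```

Without Setpgid the child reports the parent's process group. With Setpgid its PGID equals its own PID, so the negated PID becomes a valid target for kill.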

We set SysProcAttr on the command to create a new process group. The modified code is as follows.


func kill(cmd *exec.Cmd) func() {
    return func() {
        if cmd != nil {
            // cmd.Process.Kill()
            syscall.Kill(-cmd.Process.Pid, syscall.SIGKILL)
        }
    }
}

func main() {
    cmd := exec.Command("/bin/bash", "-c", "watch top >top.log")
    cmd.SysProcAttr = &syscall.SysProcAttr{
        Setpgid: true,
    }

    time.AfterFunc(1*time.Second, kill(cmd))
    err := cmd.Run()
    fmt.Printf("pid=%d err=%s\n", cmd.Process.Pid, err)
}

Run it again.


go run main.go

pid=29397 err=signal: killed

View the processes again.


ps -j

USER    PID  PPID  PGID   SESS JOBC STAT   TT       TIME COMMAND

We find that the watch processes no longer exist, so let’s see if there are any orphan processes.

# Since my test environment is a mac, this script can only be executed on a mac
ps -j | head -1;ps -j | awk '{if ($3 ==1 && $1 !="root"){print $0}}' | head

USER    PID  PPID  PGID   SESS JOBC STAT   TT       TIME COMMAND

There are no more orphan processes and the problem has been completely solved at this point.
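The pieces can be combined into one helper. The following is a minimal sketch, not a definitive implementation: runWithTimeout is a hypothetical name, and /bin/sh with a sleep binary is assumed to exist. Unlike the test program, it calls Start before arming the timer, so cmd.Process is always set when the callback runs.

```go
package main

import (
	"fmt"
	"os/exec"
	"syscall"
	"time"
)

// runWithTimeout runs a shell script in its own process group and kills the
// whole group if it is still running after timeout. Hypothetical helper.
func runWithTimeout(script string, timeout time.Duration) error {
	cmd := exec.Command("/bin/sh", "-c", script)
	cmd.SysProcAttr = &syscall.SysProcAttr{Setpgid: true}

	// Start first so cmd.Process is set before the timer can fire.
	if err := cmd.Start(); err != nil {
		return err
	}
	timer := time.AfterFunc(timeout, func() {
		// The child leads its own group, so its PID is also the group ID.
		_ = syscall.Kill(-cmd.Process.Pid, syscall.SIGKILL)
	})
	defer timer.Stop()

	return cmd.Wait()
}

func main() {
	// The shell starts one sleep in the background and one in the foreground.
	err := runWithTimeout("sleep 30 & sleep 30", time.Second)
	fmt.Println("err:", err)
}
```

One design note: the timer is stopped only after Wait returns, so there is a small window in which the callback can still fire for a command that has just finished. For a long-running service, it may be worth guarding the callback with a flag that is set once Wait has returned.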

The child process watches for the parent's exit (Linux only)

If the program we call is also one we wrote ourselves, we can handle this with Linux's prctl, which is defined as follows.


#include <sys/prctl.h>

int prctl(int option, unsigned long arg2, unsigned long arg3,
          unsigned long arg4, unsigned long arg5);

This call has an important option, PR_SET_PDEATHSIG, which sets the signal the calling process receives when its parent dies.

Let’s construct a problematic program again.

There are two files, main.go and child.go. The main program runs the binary built from child.go.

The main.go file.


package main

import (
        "os/exec"
)

func main() {
        cmd := exec.Command("./child")
        cmd.Run()
}

child.go file.

package main

import (
    "fmt"
    "time"
)

func main() {
    for {
        time.Sleep(200 * time.Millisecond)
        fmt.Println(time.Now())
    }
}

Compile the two files separately in a Linux environment.


# Compile main.go to generate the main binary
go build -o main main.go

# Compile child.go to generate the child binary
go build -o child child.go

Run the main binary.

./main &

View the processes.


ps -ef

UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 06:05 pts/0    00:00:00 /bin/bash
root     11514     1  0 12:12 pts/0    00:00:00 ./main
root     11520 11514  0 12:12 pts/0    00:00:00 ./child

We can see the main and child processes, and child is a child of main. Now we kill the main process and check the process status.


kill -9 11514

ps -ef

UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 06:05 pts/0    00:00:00 /bin/bash
root     11520     1  0 12:12 pts/0    00:00:00 ./child

We can see that the PPID of the child process has been changed to 1, which means that this process has become an orphan process.

We can use PR_SET_PDEATHSIG to make the child exit when the parent exits. There are roughly two ways to do this: calling prctl through CGO, or using syscall.

Using CGO

Modify child.go as follows.

package main

import (
    "fmt"
    "time"
)

// #include <stdio.h>
// #include <stdlib.h>
// #include <sys/prctl.h>
// #include <signal.h>
//
// // killTest asks the kernel to send SIGKILL to this process when its parent dies.
// static void killTest() {
//     prctl(PR_SET_PDEATHSIG, SIGKILL);
// }
import "C"

func main() {
    C.killTest()

    for {
        time.Sleep(200 * time.Millisecond)
        fmt.Println(time.Now())
    }
}

For a simple demonstration, the C function killTest is written directly in the Go file using CGO. It calls prctl, and the Go program calls killTest at startup. Let’s recompile, run it, and look at the processes again.


go build -o child child.go
./main & 
ps -ef 

UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 06:05 pts/0    00:00:00 /bin/bash
root     11663     1  0 12:28 pts/0    00:00:00 ./main
root     11669 11663  0 12:28 pts/0    00:00:00 ./child

Kill main again, and look at the process.


kill -9 11663
ps -ef

UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 06:05 pts/0    00:00:00 /bin/bash

You can see that the child process has also exited, which means that the prctl called by CGO is in effect.

Using syscall.RawSyscall

Instead of CGO, we can also use the syscall.RawSyscall function that Go provides. In Go’s documentation, you can look up the constants defined in the syscall package (check the Linux version; for a local godoc, you need to specify GOOS=linux) and find the constants we need and their values.

// Other content is omitted
const(
    ....
    PR_SET_PDEATHSIG                 = 0x1
    ....
)

const(     
    .....
    SYS_PRCTL                  = 157
    .....
)

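Here the value of PR_SET_PDEATHSIG is 1 and the value of SYS_PRCTL is 157. The syscall number differs between architectures; 157 is the linux/amd64 value, and the code below uses the named constant rather than the number. Modify child.go as follows.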


package main

import (
    "fmt"
    "os"
    "syscall"
    "time"
)

func main() {
    _, _, errno := syscall.RawSyscall(uintptr(syscall.SYS_PRCTL), uintptr(syscall.PR_SET_PDEATHSIG), uintptr(syscall.SIGKILL), 0)
    if errno != 0 {
        os.Exit(int(errno))
    }

    for {
        time.Sleep(200 * time.Millisecond)
        fmt.Println(time.Now())
    }
}

Compile again and execute.

go build -o child child.go
./main & 
ps -ef

UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 06:05 pts/0    00:00:00 /bin/bash
root     12208     1  0 12:46 pts/0    00:00:00 ./main
root     12214 12208  0 12:46 pts/0    00:00:00 ./child

End the main process.


kill -9 12208
ps -ef

UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 06:05 pts/0    00:00:00 /bin/bash

The child process has exited, and the final result has been achieved.
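Both variants above change the child. As a further sketch under stated assumptions, the parent can request the same behavior through the Pdeathsig field of syscall.SysProcAttr, which exists on Linux (this code does not compile on macOS). commandWithPdeathsig is a hypothetical helper, and ./child is the binary built earlier. The syscall package documentation notes that the signal is sent when the creating thread dies, which may happen before the process exits, so the sketch pins the goroutine to its OS thread.

```go
package main

import (
	"fmt"
	"os/exec"
	"runtime"
	"syscall"
)

// commandWithPdeathsig builds a command whose process receives sig when the
// thread that started it dies. Hypothetical helper; Linux only.
func commandWithPdeathsig(sig syscall.Signal, name string, args ...string) *exec.Cmd {
	cmd := exec.Command(name, args...)
	cmd.SysProcAttr = &syscall.SysProcAttr{Pdeathsig: sig}
	return cmd
}

func main() {
	// The signal is tied to the creating thread, not the whole process,
	// so keep this goroutine on one OS thread while the child runs.
	runtime.LockOSThread()
	defer runtime.UnlockOSThread()

	cmd := commandWithPdeathsig(syscall.SIGKILL, "./child")
	if err := cmd.Run(); err != nil {
		fmt.Println("child exited:", err)
	}
}
```

This is useful when the child program cannot be modified, but it only covers the direct child. Processes that the child starts are not affected unless they set their own parent-death signal.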

Summary

When a Go program runs another program that starts processes of its own, killing only the direct child can leave those processes orphaned and still running. Likewise, if our own program exits abnormally or is killed, the processes it started become orphans. There are two ways to approach this problem.

Create a new process group for the program to be executed, and call syscall.Kill with a negative PID to kill every process in that group (the more complete solution).
If the program to be called is also written by us, use PR_SET_PDEATHSIG so that it is signaled when the parent exits. This requires calling Linux’s prctl, either through CGO or through syscall.RawSyscall.

Either way, these are only ideas to build on. They deserve special attention when we write server-side programs, so that orphan processes do not consume server resources.
