When using Linux, if there is a problem with network or disk I/O, we may find that a process gets stuck: even kill -9 cannot kill it, and many common debugging tools such as strace and pstack stop working.

[Figure: a stuck process]

At this point, we use ps to view the list of processes and see that the status of the stuck process is shown as D.

[Figure: ps showing the stuck process in the D state]

The D state is described in man ps as Uninterruptible Sleep.

Linux processes have two sleep states:

  1. Interruptible Sleep, shown by the ps command as S. A process in this sleep state can be woken up by sending it a signal.
  2. Uninterruptible Sleep, shown by the ps command as D. A process in this sleep state cannot immediately process any signal sent to it, which is why it cannot be killed with kill (see the ps sketch below).
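
For reference, a quick way to spot processes in uninterruptible sleep, together with the kernel function they are waiting in, is something like the following (the exact fields are a matter of taste):

# list processes whose state contains D, plus their kernel wait channel
$ ps -eo pid,stat,wchan:32,comm | awk '$2 ~ /D/'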

There is an answer on Stack Overflow that explains this:

kill -9 just sends a SIGKILL signal to the process. When a process is in a special state (handling a signal, or blocked in an uninterruptible system call), it cannot act on any signal, including SIGKILL, so it cannot be killed immediately; this is what is commonly referred to as the D state (uninterruptible sleep). Common debugging tools (e.g. strace, pstack), which rely on being able to stop the target process, cannot be used in this state either.
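
In other words, kill -9 only queues the signal; whether the process dies depends on whether it can be woken up to act on it. A quick check might look like this, where $pid is a placeholder for the stuck process:

$ kill -9 "$pid"                            # SIGKILL is queued for the process
$ ps -o pid,stat,wchan:32,comm -p "$pid"    # state stays D while it cannot be woken up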

As you can see, a process in the D state is usually in the middle of a kernel-mode system call, so how do you know which system call it is and what it is waiting for? Fortunately, Linux provides procfs (the /proc directory), which lets you see the current kernel call stack of any process. Let’s simulate this with a process accessing JuiceFS (since the JuiceFS client is based on FUSE, a user-space file system, it is easy to simulate I/O failures).

First mount JuiceFS in the foreground (the ./juicefs mount command with the -f argument), then stop the client process with Ctrl+Z. If you then use ls /jfs to access the mount point, you will see that ls gets stuck.
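
Roughly, the steps look like this (the volume name and mount point are illustrative; myjfs stands for whatever volume argument your setup uses):

# terminal 1: mount JuiceFS in the foreground, then press Ctrl+Z to stop the client
$ ./juicefs mount -f myjfs /jfs

# terminal 2: any access to the mount point now hangs
$ ls /jfs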

With the following command you can see that ls is stuck in the vfs_fstatat call, which sends a getattr request to the FUSE device and waits for a response. Since we have stopped the JuiceFS client process, the request never gets a reply and ls stays stuck.

$ cat /proc/`pgrep ls`/stack
[<ffffffff813277c7>] request_wait_answer+0x197/0x280
[<ffffffff81327d07>] __fuse_request_send+0x67/0x90
[<ffffffff81327d57>] fuse_request_send+0x27/0x30
[<ffffffff8132b0ac>] fuse_simple_request+0xcc/0x1a0
[<ffffffff8132c0f0>] fuse_do_getattr+0x120/0x330
[<ffffffff8132df28>] fuse_update_attributes+0x68/0x70
[<ffffffff8132e33d>] fuse_getattr+0x3d/0x50
[<ffffffff81220c6f>] vfs_getattr_nosec+0x2f/0x40
[<ffffffff81220ee6>] vfs_getattr+0x26/0x30
[<ffffffff81220fc8>] vfs_fstatat+0x78/0xc0
[<ffffffff8122150e>] SYSC_newstat+0x2e/0x60
[<ffffffff8122169e>] SyS_newstat+0xe/0x10
[<ffffffff8186281b>] entry_SYSCALL_64_fastpath+0x22/0xcb
[<ffffffffffffffff>] 0xffffffffffffffff

Pressing Ctrl+C at this point does not make it exit either.

root@localhost:~# ls /jfs
^C
^C^C^C^C^C

But attaching strace to it wakes it up; it processes the previously queued interrupt signals and then exits.

root@localhost:~# strace -p `pgrep ls`
strace: Process 26469 attached
--- SIGINT {si_signo=SIGINT, si_code=SI_KERNEL} ---
rt_sigreturn({mask=[]})                 = -1 EINTR (Interrupted system call)
--- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=13290, si_uid=0} ---
rt_sigreturn({mask=[]})                 = -1 EINTR (Interrupted system call)
...
tgkill(26469, 26469, SIGINT)            = 0
--- SIGINT {si_signo=SIGINT, si_code=SI_TKILL, si_pid=26469, si_uid=0} ---
+++ killed by SIGINT +++

It can also be killed with kill -9 at this point.

root@localhost:~# ls /jfs
^C
^C^C^C^C^C
^C^CKilled

Because the simple system call vfs_fstatat() does not block fatal signals such as SIGKILL, SIGQUIT and SIGABRT, they can be handled in the usual way.

Let’s simulate a more complex I/O error: configure JuiceFS with a storage type that cannot be written to, mount it, and try to write to it with cp, which also gets stuck.
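
The exact setup depends on your environment; a rough sketch of the idea (the volume name, mount point and file names are illustrative, and the object storage is assumed to have been configured so that writes fail, e.g. with read-only credentials):

# mount a volume whose backend storage rejects writes, in the foreground
$ ./juicefs mount -f myjfs /jfs

# in another terminal, try to copy a file in; cp hangs
$ cp parity /jfs/aaa

Its kernel call stack now looks like this: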

root@localhost:~# cat /proc/`pgrep cp`/stack
[<ffffffff813277c7>] request_wait_answer+0x197/0x280
[<ffffffff81327d07>] __fuse_request_send+0x67/0x90
[<ffffffff81327d57>] fuse_request_send+0x27/0x30
[<ffffffff81331b3f>] fuse_flush+0x17f/0x200
[<ffffffff81218fd2>] filp_close+0x32/0x80
[<ffffffff8123ac53>] __close_fd+0xa3/0xd0
[<ffffffff81219043>] SyS_close+0x23/0x50
[<ffffffff8186281b>] entry_SYSCALL_64_fastpath+0x22/0xcb
[<ffffffffffffffff>] 0xffffffffffffffff

Why is it stuck in __close_fd()? This is because writing to JuiceFS is asynchronous: when cp calls write(), the data is first cached in the JuiceFS client process and written to the backend storage asynchronously. When cp finishes writing the data, it calls close to make sure the data is persisted, which corresponds to the FUSE flush operation. On a flush, the client has to ensure that all written data is persisted to the backend storage; because the backend storage cannot be written to, the client is still retrying, so the flush has not yet replied to cp, and cp is stuck too.
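
One way to see for yourself that it is the final close() (the FUSE flush), and not the write() calls, that blocks is to trace a fresh copy, e.g. (a sketch; the file names follow the example above):

# -T prints the time spent in each syscall: the write()s return
# quickly, while the last close() never comes back
$ strace -f -T -e trace=openat,write,close cp parity /jfs/aaa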

At this point it is still possible to interrupt cp with Ctrl+C or kill, because JuiceFS implements interrupt handling for its various file system operations: it can abort the current operation (e.g. flush) and return EINTR, so that applications accessing JuiceFS can be interrupted in the face of various network failures.

But if I now stop the JuiceFS client process so that it can no longer handle any FUSE request (including interrupt requests), then cp can no longer be killed, not even with kill -9, and when I check the process status with ps it is already in the D state.
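
A hedged sketch of these steps (the process names are illustrative):

# stop the JuiceFS client so it can no longer serve any FUSE request
$ kill -STOP "$(pgrep -f 'juicefs mount')"

# even SIGKILL now has no visible effect on the stuck cp
$ kill -9 "$(pgrep -x cp)"
$ ps aux | grep '[c]p '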

root      1592  0.1  0.0  20612  1116 pts/3    D+   12:45   0:00 cp parity /jfs/aaa

But at this point it is still possible to use cat /proc/1592/stack to see its kernel call stack:

root@localhost:~# cat /proc/1592/stack
[<ffffffff8132775d>] request_wait_answer+0x12d/0x280
[<ffffffff81327d07>] __fuse_request_send+0x67/0x90
[<ffffffff81327d57>] fuse_request_send+0x27/0x30
[<ffffffff81331b3f>] fuse_flush+0x17f/0x200
[<ffffffff81218fd2>] filp_close+0x32/0x80
[<ffffffff8123ac53>] __close_fd+0xa3/0xd0
[<ffffffff81219043>] SyS_close+0x23/0x50
[<ffffffff8186281b>] entry_SYSCALL_64_fastpath+0x22/0xcb
[<ffffffffffffffff>] 0xffffffffffffffff

The kernel call stack shows that it is stuck on a flush call from FUSE; the call can be interrupted immediately by resuming the JuiceFS client process, which lets cp exit.
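
Resuming the stopped client is enough for the pending interrupt to be handled, e.g.:

# let the JuiceFS client run again so it can process the queued
# interrupt request and reply to the flush
$ kill -CONT "$(pgrep -f 'juicefs mount')"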

Operations like close, which matter for data safety, are not restartable, so they cannot be interrupted at will by SIGKILL, for example; they have to wait until the FUSE implementation responds to the interrupt.

Therefore, as long as the JuiceFS client process stays healthy and can respond to interrupts, you don’t have to worry about applications accessing JuiceFS getting stuck. Alternatively, killing the JuiceFS client process tears down the current mount point and interrupts all applications that are accessing it.
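
As a last resort, that can look like this (a sketch; after the client dies, further accesses to the mount point typically fail with an error such as "Transport endpoint is not connected"):

# kill the JuiceFS client; requests pending on this mount point are
# aborted and the blocked applications get errors instead of hanging
$ kill "$(pgrep -f 'juicefs mount')"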