Although we often think of Redis as a purely in-memory key-value storage system, we also use its persistence features, and RDB and AOF are two of the persistence tools Redis provides us, with RDB being a snapshot of Redis data.
In this article, we want to analyze why Redis needs to use subprocesses when persisting snapshots of data, rather than exporting the in-memory data structures directly to disk for storage.
Overview
Before analyzing today’s problem, we first need to understand what Redis’ persistent storage mechanism, RDB, is. RDB takes snapshots of the current data set in the Redis service every once in a while, and in addition to the Redis configuration file, which can be set for the snapshot interval, the Redis client also provides two commands to generate RDB storage files, SAVE
and BGSAVE
, and we can guess the difference between these two commands by their names.
The SAVE
command blocks the current thread when executed, and since Redis is single-threaded, the SAVE
command blocks all other requests from the client, which is unacceptable for many This is often unacceptable for Redis services that need to provide strong availability guarantees.
When we use the BGSAVE
command, Redis will immediately fork
a child process, which will perform the process of saving the data in memory to disk in RDB format
, while the Redis service can still handle requests from the client during the BGSAVE
process.
rdbSaveBackground
is used to handle the process of saving data to disk in the function that saves data to disk in the background:
|
|
The Redis server will call the redisFork
function when BGSAVE
is triggered to create a child process and call rdbSave
to persist the data in the child process, we have omitted some of the contents of the function here, but the overall structure is still very clear, interested readers can click on the above link to understand the entire function implementation.
The purpose of using fork
must ultimately be to improve the availability of the Redis service without blocking the main process, but here we can actually find two problems:
- why is the child process after
fork
able to access the data in the parent process’ memory? - Does the
fork
function introduce additional performance overhead, and how can we avoid it?
Since Redis has chosen to use fork
to solve the snapshot persistence problem, these two questions have already been answered. First, the child process after fork
can access the data in the parent’s memory, and the additional performance overhead of fork
must be acceptable compared to blocking the main thread, which is the only way Redis will eventually choose this solution.
Design
In order to analyze the two issues raised in the previous section, we need to understand here the following, which are the prerequisites for the Redis server to use the fork
function and which ultimately motivate it to choose this implementation.
- the parent and child processes spawned by
fork
will share resources, including memory space. - the
fork
function does not incur a significant performance overhead, especially for making large copies of memory, and it defers the work of copying memory until it is really needed through write-time copying.
Subprocesses
In the field of computer programming, especially in Unix and Unix-like systems, fork
is an operation used by a process to create a copy of itself. It is often a system call implemented by the operating system kernel and is the main method used by the operating system to create new processes in *nix systems.
Once the program calls the fork
method, we can use the return value of fork
to determine the parent and child processes and thus perform different actions.
- When the
fork
function returns 0, it means that the current process is a child process. - When the
fork
function returns non-zero, it means that the current process is the parent process and the return value is thepid
of the child process.
In the manual of fork
, we find that the parent and child processes after calling fork
will run in different memory spaces, and when fork
happens both memory spaces have exactly the same content, and the memory writing and When fork
occurs, the memory spaces of both processes have exactly the same contents, and writes and modifications to memory and file mapping are independent, and the two processes do not affect each other.
The child process and the parent process run in separate memory spaces. At the time of fork() both memory spaces have the same content. Memory writes, file mappings (mmap(2)), and unmappings (munmap(2)) performed by one of the processes do not affect other.
In addition, the child process is an almost exact duplicate of the parent process, but the two processes differ to a lesser extent in the following ways.
- The child process uses a separate and unique process ID.
- The parent process ID of the child process is identical to the parent process ID.
- The child process does not inherit the memory locks of the parent process.
- The child process resets the process resource utilization and CPU timer.
- …
The key point is that the memory of the parent and child processes is identical at the time of fork
, and writes and modifications after fork
will not affect each other, which in fact solves the problem of the snapshot scenario perfectly – only the data in memory at a certain point in time is needed, and the parent process can continue to make changes to its own memory without blocking or affecting the generated snapshot.
Copy-on-write
Since the parent and child processes have exactly the same memory space and neither writes to memory, does this mean that the child process needs to make a full copy of the parent’s memory when forking
? Assuming that the child process needs to make a copy of the parent’s memory is basically catastrophic for the Redis service, especially in the following two scenarios.
- a large amount of data is stored in memory, and copying the memory space during
fork
consumes a lot of time and resources, which can cause the program to be unavailable for a while. - Redis takes up 10G of memory, while the resource limit of a physical or virtual machine is only 16G, at which point we cannot persist the data in Redis, which means that Redis cannot utilize more than 50% of the maximum memory resources on the machine.
If you can’t solve the two problems above, using fork
to generate a memory image doesn’t really get off the ground and is not a method that can really be used in a project.
Suppose we need to execute a command at the command line, we need to create a new process via
fork
and then execute it viaexec
. The large amount of memory space copied byfork
may be of no use at all to the child process, but it introduces a huge additional overhead.
Copy-on-Write was introduced to solve this problem, and as we described at the beginning of this section, the main purpose of Copy-on-Write is to delay copying until the write operation actually occurs, which avoids a lot of pointless copy operations. On some early *nix systems, the system call fork
did immediately make a copy of the parent process’ memory space, but on most systems today, fork
does not immediately trigger this process.
At the time of the fork
function call, the parent and child processes are allocated by the Kernel to different virtual memory spaces, so it appears to the two processes that they are accessing different memory: * When actually accessing the virtual memory space, the Kernel maps the virtual memory to physical memory, so the parent and child processes share the physical memory space.
- When actually accessing the virtual memory space, the Kernel maps the virtual memory to physical memory, so the parent and child processes share the physical memory space.
- The shared memory is only ** copied on a page-by-page basis** when the parent or child process makes changes to the shared memory, and the parent process keeps the original physical space while the child process uses the new physical space after the copy.
For most Redis services or databases, write requests are often much smaller than read requests, so using fork
with the copy-on-write mechanism can bring very good performance and make the implementation of BGSAVE
very easy.
Summary
The way Redis implements the background snapshot is very clever, through the fork
and copy-on-write feature provided by the operating system, this feature is easily implemented, from here we can see that the author’s knowledge of the operating system is still very solid, most people in the face of similar scenarios, the method may be to manually implement a similar copy-on-write
feature, but this not only increases the workload, but also increases the possibility of program problems.
Let’s briefly summarize why Redis implements snapshots by means of subprocesses when using RDB:
- the child process created by
fork
can get exactly the same memory space as the parent process, and the memory changes made by the parent process are not visible to the child process, so they do not affect each other. - the creation of a child process by
fork
does not immediately trigger a large number of memory copies, and memory is copied on a page-by-page basis when it is modified, which avoids the performance problems caused by a large number of memory copies.
One of these two reasons, one to support child process access to the parent process and the other to reduce additional overhead, are the reasons why Redis uses child processes for snapshot persistence. To conclude, let’s look at some more open-ended related issues, and the interested reader can ponder the following questions.
- What other services use this feature when Nginx’s main process
forks
a set of subprocesses at runtime that can handle requests separately? - Write-time copy is actually a relatively common mechanism, where else would it be used outside of Redis?