A namespace is a Linux kernel feature that wraps a set of system resources in an abstraction, so that the processes inside it see those resources as the only ones available on the system. Namespaces isolate a container's processes and resources from the host system and from other containers.
There are several types of namespace, one for each kind of system resource they operate on, such as the cgroup namespace and the mount namespace. We will take the pid namespace as an example and use runC as the container runtime to demonstrate how namespaces behave when we perform operations on a container.
As we described in the previous article, most container systems use runC as the underlying runtime implementation, and if you are using docker on a Linux distribution, you don’t even need to install anything extra to use the runc command.
Preparation
filesystem bundle
runC can only execute containers from a filesystem bundle (a filesystem bundle is, as the name implies, a folder that satisfies a specific structure), but we can use docker to prepare a usable bundle.
At this point, the entire bundle directory structure is as follows.
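Per the OCI runtime specification, a bundle holds a config.json file plus the root filesystem directory that config.json names in root.path. The sketch below checks a directory for that layout. The function name check_bundle is made up for illustration, and falling back to a directory called rootfs when root.path is absent is an assumption based on convention, not something runC guarantees.

```python
import json
from pathlib import Path


def check_bundle(bundle_dir):
    """Return a list of problems found in a bundle directory (empty if none)."""
    bundle = Path(bundle_dir)
    config_path = bundle / "config.json"
    if not config_path.is_file():
        return ["missing config.json"]

    config = json.loads(config_path.read_text())
    # Assumption: fall back to the conventional "rootfs" name if root.path is absent.
    root_path = config.get("root", {}).get("path", "rootfs")

    problems = []
    # A relative root.path is resolved against the bundle directory.
    if not (bundle / root_path).is_dir():
        problems.append(f"missing root filesystem directory: {root_path}")
    return problems
```

Calling check_bundle("/mycontainer") on a prepared bundle should return an empty list; this only checks the layout and does not validate the contents of config.json.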
System monitoring tools
To complete the demo, we need some third-party system monitoring tools as an aid.
- Monitor process startup to get the PID of the processes running in the container. One such tool is forkstat on Ubuntu, which can monitor system calls like fork(), exec() and exit() in real time. It is installed as follows.
$ apt install forkstat
- View namespace information. One such tool is cinf, a command line tool that can list all namespaces on the system or show detailed information about a single namespace. It is installed as follows.
Running containers with runc
First we need to run forkstat in a terminal window.
Then open a new terminal window, switch to the /mycontainer directory, and use runC to run the container.
Once the command runs, we are dropped directly into the newly created container, where we run the ps command.
The forkstat window will have the following output.
As the two printouts show, the sh and ps processes reported by ps and by forkstat are actually the same processes. Because the processes in the container are in a separate pid namespace, they have their own PIDs inside the container, and they see themselves as the only processes on the system, so their PIDs start at 1.
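The two views of one process can also be read directly from the kernel: on Linux 4.1 and later, /proc/<pid>/status contains an NSpid field that lists the PID of the process in each nested pid namespace, outermost first. The sketch below parses that field. The function names are illustrative, and the sample values in the usage note simply reuse the PIDs from this article.

```python
from pathlib import Path


def parse_nspid(status_text):
    """Extract the NSpid field from the text of /proc/<pid>/status.

    Returns the PID in each nested pid namespace, outermost first,
    or an empty list if the field is absent (kernels older than 4.1).
    """
    for line in status_text.splitlines():
        if line.startswith("NSpid:"):
            return [int(value) for value in line.split()[1:]]
    return []


def pids_across_namespaces(pid):
    """Read NSpid for a live process, as seen from the caller's procfs."""
    return parse_nspid(Path(f"/proc/{pid}/status").read_text())
```

For the container's sh process, pids_across_namespaces(33052) run on the host would be expected to return the host PID followed by the PID inside the container.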
Find the namespace the process belongs to
To find the pid namespace used by the container, we need to adjust the output format of the ps command.
PIDNS is the pid namespace. The command above shows that the sh process with PID 33052 belongs to pid namespace 4026532395. Since we already have the host PID of the process in the container, we can also get all the namespaces of the process through the /proc file system of the host.
The printout shows the namespace to which a process belongs.
- Each namespace is a soft link, and the name of the soft link indicates the type of namespace, e.g. cgroup for the cgroup namespace, pid for the pid namespace.
- Each soft link points to the real namespace object to which the process belongs, which is represented by an inode number, and each inode number is unique in the host system.
- If two processes have soft links of the same namespace type pointing to the same inode, they belong to the same namespace.
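The same links can be read programmatically. The following is a minimal sketch that reads every entry under /proc/<pid>/ns and extracts the inode number from link targets of the form pid:[4026532395]. The function names are made up for illustration, and reading the links of another user's process generally requires root privileges.

```python
import os
import re

# Link targets look like "pid:[4026532395]": a type, then an inode in brackets.
NS_LINK = re.compile(r"^(\w+):\[(\d+)\]$")


def parse_ns_link(target):
    """Split a namespace link target into (type, inode)."""
    match = NS_LINK.match(target)
    if match is None:
        raise ValueError(f"not a namespace link: {target!r}")
    return match.group(1), int(match.group(2))


def read_namespaces(pid="self"):
    """Map each entry name in /proc/<pid>/ns to the inode it points at."""
    ns_dir = f"/proc/{pid}/ns"
    namespaces = {}
    for name in sorted(os.listdir(ns_dir)):
        _, inode = parse_ns_link(os.readlink(os.path.join(ns_dir, name)))
        namespaces[name] = inode
    return namespaces
```

Two processes share a namespace of a given type exactly when read_namespaces returns the same inode for that entry in both.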
Every process belongs to one namespace of each type, and the Linux system creates a default namespace of each type at boot time.
We can also look up the namespaces that sh belongs to from within the container, which requires using PID 1, its PID inside the container.
Watching processes in namespace
We will now look at all the processes in the pid namespace from the namespace’s point of view. The Linux system does not provide a command for this, so you will need the cinf tool installed above.
Currently there is only one process in this namespace, and it is also the init process of the container we created. When a new container is created, new namespaces are created and the container’s init process is added to them.
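The namespace-centric view that cinf gives can be approximated by walking /proc and grouping processes by the target of their ns/pid link. This is only a sketch of the idea, not a description of how cinf is implemented; the function names are illustrative, and it has to run as root on the host to see every process.

```python
import os
from collections import defaultdict


def group_by_namespace(links):
    """Group PIDs by namespace link target.

    links maps a PID to a link target such as "pid:[4026532395]".
    """
    groups = defaultdict(list)
    for pid, target in links.items():
        groups[target].append(pid)
    return {target: sorted(pids) for target, pids in groups.items()}


def scan_pid_namespaces(proc="/proc"):
    """Collect the pid namespace link of every process visible under proc."""
    links = {}
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue  # skip non-process entries such as "meminfo"
        try:
            links[int(entry)] = os.readlink(os.path.join(proc, entry, "ns", "pid"))
        except OSError:
            # The process exited, or we lack permission to inspect it.
            continue
    return links
```

On the host, group_by_namespace(scan_pid_namespaces()) yields one entry per pid namespace, and looking up pid:[4026532395] in it gives the processes of this container.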
For the pid namespace, processes running in the container can only see other processes in the same pid namespace, pid:[4026532395]. Inside the container, the sh process is considered the first process running on the system, with a PID of 1, but on the host it is just a normal process with a PID of 33052. The same process has different PIDs in different namespaces, which is the role of the pid namespace. In a way, a container means a new set of namespaces.
Create a new process in a container
Open a new terminal window and run a new process in the already running container.
In the forkstat window, we can see the PID of the newly created process.
There is also a more direct way to see the processes running in the container from the host: the ps subcommand provided by runC.
Next, use cinf again to find out which namespaces the newly created process belongs to.
The result shows that no new namespace is created: the namespaces of process 32608 are exactly the same as those of sh, the init process of the mybox container. That is, creating a new process in the container simply adds that process to the namespaces of the container’s init process.
Here is a list of all the processes in the 4026532395 namespace.
If we run ps -ef inside the container, we can see the same processes, but their PIDs will be different because of the pid namespace.
Now we know that docker/runc exec actually runs a new process in the namespaces of the already created container.
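The check that an exec'd process shares every namespace with the container's init process can be expressed as a comparison of two type-to-inode mappings, such as those read from /proc/<pid>/ns on the host. A minimal sketch follows; the function name is made up, and apart from the pid namespace inode taken from this article, the inode values in the usage note are invented for illustration.

```python
def differing_namespaces(first, second):
    """Return the namespace types whose inodes differ between two processes.

    Each argument maps a namespace type to an inode number.
    A type present on only one side counts as differing.
    """
    types = set(first) | set(second)
    return sorted(ns for ns in types if first.get(ns) != second.get(ns))
```

For example, comparing {"pid": 4026532395, "mnt": 10} with {"pid": 4026532395, "mnt": 10} returns an empty list, which is the result expected for the init process and a process started with exec; a host process compared with the container's init would list the types the container does not share.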
Summary
When you run a container, new namespaces are created and the init process is added to those namespaces; when you run a new process in a container, the new process is added to the namespaces that were created along with the container.
In fact, creating new namespaces when creating a container is only the default behavior, and it can be changed: we can specify that the new container uses existing namespaces.
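In the OCI runtime configuration that runC reads (config.json), each entry in linux.namespaces has a type and an optional path; when path is given, the runtime joins the namespace at that path instead of creating a new one. The sketch below edits a loaded configuration accordingly. The function name is illustrative, and the /proc path in the usage note is only an example built from the PID used in this article.

```python
import copy


def join_existing_namespace(config, ns_type, ns_path):
    """Return a copy of an OCI runtime config that joins an existing namespace.

    config is the parsed content of config.json; ns_path must be an
    absolute path to a namespace file such as /proc/<pid>/ns/pid.
    """
    updated = copy.deepcopy(config)
    namespaces = updated.setdefault("linux", {}).setdefault("namespaces", [])
    for entry in namespaces:
        if entry.get("type") == ns_type:
            entry["path"] = ns_path  # join instead of creating a new namespace
            break
    else:
        # No entry of this type yet: add one that points at the existing namespace.
        namespaces.append({"type": ns_type, "path": ns_path})
    return updated
```

For example, join_existing_namespace(config, "pid", "/proc/33052/ns/pid") produces a configuration whose container would share the pid namespace of the mybox container, after which the result can be written back to config.json with json.dump.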