This article is a detailed explanation of Docker custom images, how to build your own Docker images, and the Dockerfile instructions.
I. Using Dockerfile to customize images
1.1. Customizing an image with a Dockerfile
Customizing an image really means customizing the configuration and files that each layer adds. If we could write a script describing, for each layer, what to modify, install, build, and run, and then use that script to build and customize the image, the problems of repeatability, build transparency, and image size would all be solved. That script is the Dockerfile.
A Dockerfile
is a text file that contains a set of instructions, each of which builds a layer, so the content of each instruction describes how that layer should be built.
Let’s take the nginx
image as an example, this time we use Dockerfile
to customize it.
In a blank directory, create a text file and name it Dockerfile
.
The contents are as follows.
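```dockerfile
FROM nginx
RUN echo '<h1>Hello, Docker!</h1>' > /usr/share/nginx/html/index.html
```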
The Dockerfile
is very simple, just two lines in total. It involves two directives, FROM
and RUN
.
1.2. FROM: specify the base image
A custom image must be based on some existing image and customized on top of it, just as we previously ran a container from the nginx image and then modified it. The base image must be specified, and FROM is the directive that specifies it; FROM is therefore required in a Dockerfile and must be its first directive.
There are many high-quality official images on the Docker Store
, including service images that can be used directly, such as nginx
, redis
, mongo
, mysql
, httpd
, php
, tomcat
, etc. There are also images for developing, building, and running applications in various languages, such as node
, openjdk
, python
, ruby
, golang
and so on. Among them we can find an image that most closely matches our ultimate goal and use it as the base image for customization.
If you cannot find an image that matches your service, the official images also include more basic operating system images, such as ubuntu
, debian
, centos
, fedora
, alpine
, etc. The software libraries of these operating systems provide us with a broader scope for expansion.
In addition to choosing existing images as the base image, Docker
also has a special image called scratch
. This image is a virtual concept and does not actually exist; it represents a blank image.
If you use scratch as the base image, it means you are not basing the image on any other image; the instructions that follow will begin to exist as the first layer of the image.
It is not uncommon to copy executables directly into images without any system base, e.g. swarm
, coreos/etcd
. For statically compiled programs on Linux, there is no need to have runtime support from the operating system, and all the libraries needed are already in the executable, so directly FROM scratch
makes the image much smaller. Many applications developed in Go use this way to create images, which is one of the reasons why some people consider Go
to be a particularly suitable language for container microservices architectures.
1.3. RUN: execute commands
The RUN
command is used to execute command line commands. Due to the power of the command line, the RUN
command is one of the most common commands used when customizing images. It comes in two formats.
- shell format: RUN <command>, which is just like a command typed at the command line. The RUN command in the Dockerfile we just wrote is in this format:
  RUN echo '<h1>Hello, Docker!</h1>' > /usr/share/nginx/html/index.html
- exec format: RUN ["executable", "argument1", "argument2"], which is more like the format of a function call.
Since RUN can execute commands just like a shell script, can we write one RUN for each command, the way we would in a shell script? For example, like this:
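A sketch of such a Dockerfile, splitting a Redis build into one RUN per step (the base image and Redis version are only illustrative):

```dockerfile
FROM debian:jessie

RUN apt-get update
RUN apt-get install -y gcc libc6-dev make wget
RUN wget -O redis.tar.gz "http://download.redis.io/releases/redis-3.2.5.tar.gz"
RUN mkdir -p /usr/src/redis
RUN tar -xzf redis.tar.gz -C /usr/src/redis --strip-components=1
RUN make -C /usr/src/redis
RUN make install -C /usr/src/redis
```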
As I said before, every command in Dockerfile
creates a layer, and RUN
is no exception. The behavior of each RUN
is the same as the process we just used to create the image manually: create a new layer, execute the commands on it, and after that, commit
the changes on that layer to form a new image.
Written this way, the Dockerfile creates seven layers of image. That is completely pointless, and a lot of things that are not needed at runtime end up baked into the image, such as the build environment and updated packages. The result is a very bloated, many-layered image that not only takes longer to build and deploy, but is also error-prone. This is a common mistake made by people who are new to Docker.
Union FS implementations also have a maximum number of layers; AUFS, for example, used to allow at most 42 layers and now allows at most 127.
The correct way to write the above Dockerfile
would be as follows:
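A sketch under the same assumptions as above, with everything chained into a single RUN and cleanup at the end (the # comment also illustrates the Dockerfile comment syntax mentioned below):

```dockerfile
FROM debian:jessie

# build and install Redis in a single layer, then clean up
RUN buildDeps='gcc libc6-dev make wget' \
    && apt-get update \
    && apt-get install -y $buildDeps \
    && wget -O redis.tar.gz "http://download.redis.io/releases/redis-3.2.5.tar.gz" \
    && mkdir -p /usr/src/redis \
    && tar -xzf redis.tar.gz -C /usr/src/redis --strip-components=1 \
    && make -C /usr/src/redis \
    && make install -C /usr/src/redis \
    && rm -rf /var/lib/apt/lists/* \
    && rm redis.tar.gz \
    && rm -r /usr/src/redis \
    && apt-get purge -y --auto-remove $buildDeps
```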
First, all the previous commands have a single purpose: to compile and install the Redis executable. There is no need for many layers; this is really just one layer's worth of work. So instead of a separate RUN for each command, a single RUN is used, with && concatenating all the required commands, simplifying the previous seven layers down to one. When writing a Dockerfile, always remind yourself that you are not writing a shell script but defining how each layer should be built.
Also, notice the line breaks used for formatting. The Dockerfile supports shell-style line continuation with \ at the end of a line, and comments starting with # at the beginning of a line. Good formatting habits such as line breaks, indentation, and comments make maintenance and troubleshooting much easier.
You can also see a cleanup step added at the end of this group of commands: it removes the software that was only needed to compile the build, cleans up all downloaded and unpacked files, and clears the apt cache. This is a very important step. As we said before, images are multi-layer storage, and things added in one layer are not really deleted in the next layer; they stay with the image. So when building an image, make sure each layer only adds what really needs to be added, and that anything extraneous is cleaned up.
One of the reasons many Docker newcomers end up with bloated images is that they forget to clean up irrelevant files at the end of each layer's build.
1.4. Building the image
Let’s go back to the Dockerfile
of the custom Nginx
image we made earlier. Now that we understand the contents of the Dockerfile
, let’s build the image.
Execute the following command in the directory where the Dockerfile
file is located.
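The command and an abridged, illustrative transcript of its output (the intermediate container 9cdc27646c7b and the committed layer 44aa4490ce2c are the IDs discussed below; yours will differ):

```bash
$ docker build -t nginx:v3 .
Sending build context to Docker daemon 2.048 kB
Step 1 : FROM nginx
Step 2 : RUN echo '<h1>Hello, Docker!</h1>' > /usr/share/nginx/html/index.html
 ---> Running in 9cdc27646c7b
 ---> 44aa4490ce2c
Removing intermediate container 9cdc27646c7b
Successfully built 44aa4490ce2c
```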
From the output of the command, we can clearly see how the image was built. In Step 2
, as we said before, the RUN
command starts a container 9cdc27646c7b
, executes the requested command, and finally commits the layer 44aa4490ce2c
, and then deletes the used container 9cdc27646c7b
.
Here we used the docker build command to build the image. The format is:
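```bash
docker build [options] <context path | URL | ->
```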
Here we specify the name of the final image -t nginx:v3
, and after a successful build, we can run this image as we did nginx:v2
before, and the result will be the same as nginx:v2
.
1.5. The image build context
If you pay attention, you will notice that the docker build command ends with a .. The . means the current directory, and the Dockerfile is in the current directory, so many beginners think this path specifies where the Dockerfile is located, which is not accurate. If you look at the command format above, you will see that it actually specifies the context path. So what is the context?
First we need to understand how docker build
works. Docker
is divided at runtime into the Docker
engine (also known as the server daemon) and the client tools. The Docker
engine provides a set of REST APIs, called the Docker Remote API
, and client tools like the docker
command interact with the Docker
engine through this set of API
s to perform various functions. So, although it seems that we are executing various docker
functions locally, in reality, everything is done on the server side (the Docker
engine) using remote calls. This C/S
design also makes it easy to manipulate the Docker
engine on the remote server.
When we build an image, not all customizations are done with the RUN
command, but often some local files are copied into the image, for example, with the COPY
command, the ADD
command, and so on. The docker build
command builds the image, not locally, but on the server side, i.e. in the Docker
engine. So in this client/server architecture, how can the server get the local files?
This introduces the concept of context. When building, the user specifies the path to the build image context, and the docker build
command learns this path, packages everything under it, and uploads it to the Docker
engine. Once the Docker
engine receives the context package, it expands it and gets all the files it needs to build the image.
If you write this in the Dockerfile:
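```dockerfile
COPY ./package.json /app/
```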
This is not a copy of package.json
in the directory where the docker build
command was executed, nor is it a copy of package.json
in the directory where Dockerfile
is located, but a copy of package.json
in the context directory.
Therefore, the source paths in commands like COPY are relative paths. This is also why beginners ask why COPY ../package.json /app or COPY /opt/xxxx /app does not work: those paths are outside the context, so the Docker engine cannot get files from those locations. If you really need those files, you should copy them into the context directory first.
Now you can understand the . in the command docker build -t nginx:v3 .: it actually specifies the context directory. The docker build command packages the contents of that directory and sends them to the Docker engine to help build the image.
If we look at the docker build
output, we have actually seen this process of sending a context.
Understanding the build context is important for image building to avoid making mistakes you shouldn’t make. For example, some beginners find that COPY /opt/xxxx /app
doesn’t work, so they simply put Dockerfile
in the root of their hard drive to build it, only to find that docker build
executes and sends a few dozen GB
of stuff, which is extremely slow and prone to build failure. That’s because this approach is asking docker build
to pack the entire hard drive, which is clearly a misuse.
In general, you should put Dockerfile
in an empty directory, or in the root of the project. If there are no required files in that directory, then you should make a copy of the required files. If there are things in the directory that you really don’t want to pass to the Docker engine at build time, then you can write a .dockerignore
with the same syntax as .gitignore
, which is used to weed out files that don’t need to be passed to the Docker engine as context.
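For instance, a minimal .dockerignore (the entries are only illustrative) might be:

```
.git
node_modules
*.log
```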
So why would anyone mistakenly think that .
is to specify the directory where the Dockerfile
is located? This is because by default, if you don’t specify Dockerfile
additionally, a file named Dockerfile
in the context directory will be used as the Dockerfile.
This is only the default behavior; the Dockerfile does not actually have to be named Dockerfile, nor does it have to be located in the context directory. For example, you can use the -f ../Dockerfile.php parameter to designate some other file as the Dockerfile.
Of course, it is customary to use the default filename Dockerfile
and to place it in the image build context directory.
1.6. Other uses of docker build
1.6.1. Building directly from the Git repo
docker build
also supports building from a URL
, for example, you can build directly from the Git repo
.
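For example (the repository URL is a placeholder; the #:8.14 fragment selects the default branch and the 8.14/ directory):

```bash
$ docker build https://github.com/username/repo.git#:8.14
```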
This command specifies the Git repo required for the build, the default master branch, and 8.14/ as the build directory; Docker will then git clone the project itself, switch to the specified branch, enter the specified directory, and start the build.
1.6.2. Build with the given tarball
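For example (the URL is a placeholder):

```bash
$ docker build http://server/context.tar.gz
```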
If the URL given is not a Git repo
but a tar
archive, then the Docker
engine will download the archive, unpack it automatically, and use it as a context to start the build.
1.6.3. Reading a Dockerfile from standard input for a build
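```bash
$ docker build - < Dockerfile
```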
or
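```bash
$ cat Dockerfile | docker build -
```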
If the standard input is passed in as a text file, it is treated as a Dockerfile
and the build begins. This form has no context since it reads the contents of the Dockerfile
directly from the standard input, so it is not possible to do things like COPY
the local file into the image like other methods can.
1.6.4. Reading a context archive from standard input for a build
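```bash
$ docker build - < context.tar.gz
```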
If the file on standard input is a gzip, bzip2, or xz archive, it will be treated as the context archive: Docker expands it, uses it as the context, and starts the build.
II. Dockerfile directives
We have already introduced FROM and RUN, and mentioned COPY and ADD. The Dockerfile is in fact very powerful, providing more than a dozen directives. Let's continue with the other directives.
2.1. COPY
Format.
- COPY <source path>... <target path>
- COPY ["<source path1>",... "<target path>"]
Like the RUN
command, there are two formats, one similar to a command line and one similar to a function call.
The COPY command copies files or directories from <source path> in the build context directory to <target path> inside the image, in the new layer. For example:
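```dockerfile
COPY package.json /usr/src/app/
```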
There can be multiple <source path>s, and they may even be wildcards, whose matching rules follow Go's filepath.Match rule, e.g.:
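```dockerfile
COPY hom* /mydir/
COPY hom?.txt /mydir/
```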
<target path> can be either an absolute path inside the container or a path relative to the working directory (which can be set with the WORKDIR command). The target path does not need to be created beforehand; if the directory does not exist, it will be created before the files are copied.
It is also worth noting that the COPY command preserves all metadata of the source files, such as read, write, and execute permissions and file modification times. This is useful for image customization, especially when the build-related files are managed with Git.
2.2. ADD
The format and nature of the ADD command are basically the same as COPY's, but ADD adds some features on top of COPY.
For example, <source path> can be a URL, in which case the Docker engine will try to download the linked file to <target path>. The downloaded file's permissions are automatically set to 600; if those are not the permissions you want, an additional RUN layer is needed to adjust them. So it usually makes more sense to just use the RUN command with the wget or curl tool to download, handle permissions, unpack, and then clean up the useless files. This feature is therefore not really practical, and its use is not recommended.
If <source path> is a tar archive compressed in gzip, bzip2 or xz format, the ADD command will automatically decompress it to <target path>.
This is useful in some cases, such as in the official image ubuntu
.
However, in some cases, when we really want to copy an archive into the image without unpacking it, we cannot use the ADD command.
The official Dockerfile best practices document recommends using COPY whenever possible, because the semantics of COPY are clear: it just copies files, whereas ADD bundles more complex behavior that is not always obvious. The most suitable situation for using ADD is the one mentioned above, where automatic decompression is needed.
Also note that the ADD
command will invalidate the image build cache, which may make image builds slower.
Therefore, when choosing between the COPY
and ADD
directives, you can follow the principle of using the COPY
directive for all file copying, and using ADD
only when automatic decompression is required.
2.3. CMD
The format of the CMD command is similar to RUN's, again with a shell and an exec form, plus a parameter-list variant:

- shell format: CMD <command>
- exec format: CMD ["executable", "parameter1", "parameter2"...]
- parameter list format: CMD ["parameter1", "parameter2"...]. This form is used after the ENTRYPOINT directive has been specified, to supply its parameters.
As we said before when introducing containers, Docker is not a virtual machine, containers are processes. Since it is a process, when you start the container, you need to specify the program and parameters to run. The CMD
command is used to specify the default container main process start command.
For example, the default CMD
for the ubuntu
image is /bin/bash
. If we run docker run -it ubuntu
, we will go directly to bash
. We can also specify another command to run at runtime, such as docker run -it ubuntu cat /etc/os-release
. This replaces the default /bin/bash
command with the cat /etc/os-release
command, which outputs the system version information.
In terms of command format, the exec
format is recommended. This format will be parsed as a JSON
array, so be sure to use double quotes "
instead of single quotes.
If you use the shell format, the actual command is wrapped as an argument to sh -c. For example:
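```dockerfile
CMD echo $HOME
```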
In actual execution, this is changed to:
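```dockerfile
CMD [ "sh", "-c", "echo $HOME" ]
```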
This is why we can use environment variables, because they are parsed by the shell.
Speaking of CMD
, we have to mention the issue of foreground and background execution of applications in containers. This is a common confusion for beginners.
Docker is not a virtual machine: applications in a container should run in the foreground, rather than being started as background services with upstart/systemd the way they are on a virtual or physical machine. There is no concept of a background service inside a container.
Some beginners write the CMD as:
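```dockerfile
CMD service nginx start
```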
and then find that the container exits immediately after starting. Even running the systemctl command inside the container turns out not to work at all. This is because they have not grasped the idea of foreground and background, do not distinguish containers from virtual machines, and are still trying to understand containers from a traditional virtual-machine perspective.
For a container, its startup program is the container's application process; the container exists for its main process, and when the main process exits, the container loses its reason to exist and exits as well. Other auxiliary processes are not something it needs to care about.
With the service nginx start command, you are asking upstart to start the nginx service as a background daemon. But as noted above, CMD service nginx start is interpreted as CMD ["sh", "-c", "service nginx start"], so the main process is actually sh. When the service nginx start command finishes, sh finishes too, and since sh is the main process, its exit naturally causes the container to exit.
The correct approach is to execute the nginx executable directly and require it to run in the foreground. For example:
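```dockerfile
CMD ["nginx", "-g", "daemon off;"]
```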
2.4. ENTRYPOINT
The format of ENTRYPOINT
is the same as that of the RUN
command, which is divided into exec
format and shell
format.
The purpose of ENTRYPOINT is the same as CMD's: to specify the container's startup program and its parameters. Like CMD, ENTRYPOINT can also be replaced at runtime, although it is slightly more cumbersome and must be specified via the --entrypoint parameter of docker run.
When ENTRYPOINT is specified, the meaning of CMD changes: instead of running its contents as the command directly, the contents of CMD are passed as arguments to the ENTRYPOINT command. In other words, what actually gets executed becomes:
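```
<ENTRYPOINT> "<CMD>"
```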
So why do we need ENTRYPOINT
after we have CMD
? Is there any benefit to this <ENTRYPOINT> "<CMD>"
? Let’s look at a few scenarios.
2.4.1. Scenario 1: Making the image behave like a command
Suppose we need an image that tells us our current public IP. We can start by implementing it with CMD:
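A sketch of such a Dockerfile (assuming an ubuntu base; the ip.cn lookup matches the curl command discussed below):

```dockerfile
FROM ubuntu:16.04
RUN apt-get update \
    && apt-get install -y curl \
    && rm -rf /var/lib/apt/lists/*
CMD [ "curl", "-s", "http://ip.cn" ]
```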
After building the image with docker build -t myip ., whenever we need to query the current public IP we only have to run:
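```bash
$ docker run myip
```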
So the image can be used like a command. But commands usually take parameters; what if we want to add one? As you can see from the CMD above, the actual command being run is curl, so if we want to display the HTTP headers we need to add the -i flag. Can we simply append -i to docker run myip?
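Trying it produces an error along these lines (output abridged):

```bash
$ docker run myip -i
docker: Error response from daemon: ... exec: "-i": executable file not found in $PATH ...
```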
The error tells us the executable file is not found: executable file not found. As we said before, whatever follows the image name is the command, and it replaces the default value of CMD at run time. So here -i replaces the original CMD rather than being appended to the original curl -s http://ip.cn, and since -i is not a command at all, it naturally cannot be found.
So if we want to add the -i
parameter, we have to retype the command in its entirety.
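```bash
$ docker run myip curl -s http://ip.cn -i
```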
This is obviously not a good solution, whereas using ENTRYPOINT solves the problem. Let's now re-implement the image with ENTRYPOINT:
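The same sketch as before, with CMD swapped for ENTRYPOINT:

```dockerfile
FROM ubuntu:16.04
RUN apt-get update \
    && apt-get install -y curl \
    && rm -rf /var/lib/apt/lists/*
ENTRYPOINT [ "curl", "-s", "http://ip.cn" ]
```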
This time let’s try it again directly with docker run myip -i
.
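An abridged, illustrative response:

```bash
$ docker run myip -i
HTTP/1.1 200 OK
Content-Type: text/html; charset=UTF-8
...
```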
As you can see, it worked this time. This is because when ENTRYPOINT
exists, the contents of the CMD
will be passed as an argument to ENTRYPOINT
, and here -i
is the new CMD
, so it will be passed as an argument to curl
, thus achieving the desired effect.
2.4.2. Scenario 2: Preparations before running the application
Starting the container is to start the main process, but there are times when some preparatory work is needed before starting the main process.
For example, a database such as mysql may need some configuration and initialization work to be completed before the final mysql server process can actually run.
In addition, you may want to avoid starting the service as the root user for better security, yet still need to perform some preparatory work as root before the service starts, and only then switch to the service user to launch it. Commands other than the service itself can still be run as root, which is convenient for debugging and so on.
These preparations are unrelated to the container's CMD: whatever the CMD is, some pre-processing is needed beforehand. In such cases you can write a script and make it the ENTRYPOINT; the script takes the received parameters (i.e. <CMD>) as a command and executes it at the end. This is how the official redis image does it, for example:
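An abridged excerpt of its Dockerfile (the exact contents vary between image versions):

```dockerfile
FROM alpine:3.4
...
RUN addgroup -S redis && adduser -S -G redis redis
...
ENTRYPOINT ["docker-entrypoint.sh"]

EXPOSE 6379
CMD [ "redis-server" ]
```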
You can see that a redis user is created for the Redis service, and at the end ENTRYPOINT is set to the docker-entrypoint.sh script.
The script checks the contents of CMD: if it is redis-server, it switches to the redis user to start the server; otherwise it keeps running as root. For example:
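A sketch of that script, based on the alpine variant which drops privileges with su-exec (details differ between versions):

```bash
#!/bin/sh
set -e

# if the command is redis-server and we are running as root,
# fix ownership and re-exec this script as the redis user
if [ "$1" = 'redis-server' -a "$(id -u)" = '0' ]; then
	chown -R redis .
	exec su-exec redis "$0" "$@"
fi

# otherwise run the given command as the current (root) user
exec "$@"
```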
2.5. ENV
There are two formats:

- ENV <key> <value>
- ENV <key1>=<value1> <key2>=<value2>...
This directive is simple: it sets environment variables, which can then be used both by later directives such as RUN and by applications at runtime. For example:
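```dockerfile
ENV VERSION=1.0 DEBUG=on \
    NAME="Happy Feet"
```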
This example demonstrates how to break lines and enclose values containing spaces in double quotes, which is consistent with the behavior under Shell
.
Once an environment variable is defined, it can then be used in subsequent commands. For example, in the official node
image Dockerfile
, there is code like this.
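An abridged excerpt (the version number and the checksum handling will differ in current versions of that image):

```dockerfile
ENV NODE_VERSION 7.2.0

RUN curl -SLO "https://nodejs.org/dist/v$NODE_VERSION/node-v$NODE_VERSION-linux-x64.tar.xz" \
  && curl -SLO "https://nodejs.org/dist/v$NODE_VERSION/SHASUMS256.txt.asc" \
  && gpg --batch --decrypt --output SHASUMS256.txt SHASUMS256.txt.asc \
  && grep " node-v$NODE_VERSION-linux-x64.tar.xz\$" SHASUMS256.txt | sha256sum -c - \
  && tar -xJf "node-v$NODE_VERSION-linux-x64.tar.xz" -C /usr/local --strip-components=1 \
  && rm "node-v$NODE_VERSION-linux-x64.tar.xz" SHASUMS256.txt.asc SHASUMS256.txt \
  && ln -s /usr/local/bin/node /usr/local/bin/nodejs
```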
The environment variable NODE_VERSION
is defined here first, and $NODE_VERSION
is used several times in the subsequent RUN
layer to customize the operation. As you can see, when you upgrade your image build in the future, you only need to update 7.2.0
, making Dockerfile
build maintenance much easier.
The following directives can support environment variable expansion: ADD
, COPY
, ENV
, EXPOSE
, LABEL
, USER
, WORKDIR
, VOLUME
, STOPSIGNAL
, ONBUILD
.
As this list of directives suggests, environment variables can be used in many powerful places. Through environment variables, a single Dockerfile can produce a whole family of images, simply by building with different environment variable values.
2.6. VOLUME
The format is.
- VOLUME ["<path1>", "<path2>"...]
- VOLUME <path>
As we said before, the container storage layer should be kept free of write operations at runtime. For database applications that need to save dynamic data, the database files should be stored in a volume (we will introduce the concept of Docker volumes further in later sections). To prevent users from forgetting to mount the directory holding dynamic files as a volume at runtime, we can specify in the Dockerfile that certain directories be mounted as anonymous volumes in advance, so that even if the user does not specify a mount at runtime, the application can still run normally without writing large amounts of data into the container storage layer.
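```dockerfile
VOLUME /data
```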
The /data
directory here is automatically mounted as an anonymous volume at runtime, and any information written to /data
is not recorded into the container storage layer, thus ensuring statelessness of the container storage layer. Of course, this mount setting can be overridden at runtime. For example.
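```bash
# xxxx stands for whatever image declares the /data volume
$ docker run -d -v mydata:/data xxxx
```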
In this line, the named volume mydata
is mounted to the /data
location, replacing the anonymous volume mount configuration defined in the Dockerfile
.
2.7. EXPOSE
The format is EXPOSE <port 1> [<port 2>...]
.
The EXPOSE directive declares the port on which the container provides its service at runtime. It is only a declaration; the application will not actually start listening on that port just because of it. Writing such a declaration in the Dockerfile has two benefits: it helps users of the image understand which port the containerized service listens on, making port mapping easier to configure; and when random port mapping is used at runtime, i.e. docker run -P, the EXPOSEd ports are automatically mapped to random host ports.
There was also a special use for it in earlier versions of Docker. Back then, all containers ran on the default bridge network, so every container could reach every other one directly, which raised some security concerns. Hence the Docker engine parameter --icc=false: when specified, containers could no longer access each other by default unless they used the --links parameter, and only the ports declared by EXPOSE in the image would be accessible. The use of --icc=false has been largely superseded by the introduction of docker network, and interconnection and isolation between containers can now be achieved easily with custom networks.
It is important to distinguish EXPOSE from the runtime option -p <host port>:<container port>. -p maps a host port to a container port, in other words it exposes the container's port to the outside world, whereas EXPOSE merely declares which port the container intends to use and does not automatically set up any mapping on the host.
2.8. WORKDIR
The format is WORKDIR <working directory path>
.
You can use the WORKDIR command to specify the working directory (i.e. the current directory); the current directory for each subsequent layer is then changed to the specified directory. If the directory does not exist, WORKDIR will create it for you.
As mentioned before, some beginners make the mistake of writing a Dockerfile as if it were a shell script, and this misunderstanding can lead to errors like the following:
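```dockerfile
RUN cd /app
RUN echo "hello" > world.txt
```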
If you build an image from this Dockerfile and run it, you will find that you cannot locate the /app/world.txt file, or that its content is not hello. The reason is simple: in a shell, two consecutive lines run in the same process execution environment, so memory state modified by the first command directly affects the second; in a Dockerfile, these two RUN commands have completely different execution environments and run in two entirely separate containers. This is a mistake caused by not understanding the layered storage underlying Dockerfile builds.
As said before, each RUN starts a container, executes the command, and then commits the file changes of that storage layer. Executing RUN cd /app in the first layer merely changes the working directory of the current process, a change in memory that produces no file change at all. By the time the second layer is built, a brand-new container is started that has nothing to do with the first-layer container, so it cannot possibly inherit the in-memory changes from the previous layer's build.
So if you need to change the working directory for subsequent layers, you should use the WORKDIR command.
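A corrected sketch of the example above using WORKDIR (WORKDIR creates /app if it does not exist, and the echo then runs there):

```dockerfile
WORKDIR /app
RUN echo "hello" > world.txt
```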