Registry
Docker Distribution
Docker Distribution was the first tool to implement the packaging, storage and distribution of images, and it is what runs behind a Docker registry. (Distribution has since been donated to the CNCF.) The specification that originated in Docker Distribution went on to become the OCI distribution-spec. Docker Distribution can be assumed to implement most of the OCI distribution specification, and the two are largely compatible. OCI’s guiding philosophy is to let industry practice come first and then distill that practice into technical specifications, so although the OCI distribution-spec has not yet been officially released (the current version is v1.0.0-rc1), image repositories based on Docker Distribution are already a commonly adopted solution, and the Docker Registry HTTP API V2 has become the de facto standard.
Harbor
Harbor also uses Docker Distribution (docker registry) as its back-end image storage service. In versions prior to Harbor 2.0, most image-related functions were handled by Docker Distribution, and the metadata for images and OCI artifacts was extracted from the docker registry by Harbor’s components. Since Harbor 2.0, the metadata for images and OCI artifacts is maintained by Harbor itself and is written to Harbor’s database when the artifacts are pushed. Thanks to this, Harbor is no longer just a service for storing and managing images, but a cloud-native repository that can store and manage a wide range of OCI-compliant artifacts such as Helm Charts, CNAB, OPA Bundles, etc.
docker registry to harbor
Well, with all that background out of the way, let’s get back to the problem this article is trying to solve: how do you migrate images from docker registry to harbor?
Suppose there are two machines on an intranet. One runs docker registry under the domain name registry.k8s.li; the other runs harbor under the domain name harbor.k8s.li. The docker registry holds 5,000 images, while harbor has just been deployed and holds none yet. Assuming that disk and network are not the bottleneck, how can the images in the docker registry be migrated to harbor efficiently?
Get a list of all images in the registry
Before migrating, we first need a list of the images in the docker registry, so that we can verify afterwards that none were lost. In the registry storage directory, every image tag is pointed to by a current/link file, so walking the current/link files under the storage directory yields every tag, and from that the list of all images in the registry. Note that this only finds images that have a tag; untagged images are not listed.
The list of images can be obtained directly from the registry storage directory.
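As a minimal sketch of that traversal (the original command is not reproduced here), the Python below walks a registry storage directory and returns one name:tag entry per tagged image. It assumes the usual Docker Distribution layout, in which tags live at docker/registry/v2/repositories/<name>/_manifests/tags/<tag>/current/link; the function name list_tagged_images is made up for illustration.

```python
import os


def list_tagged_images(storage_root):
    """Return sorted "name:tag" strings for every tagged image.

    storage_root is assumed to be the directory that contains
    docker/registry/v2 (adjust to your registry's configuration).
    """
    repos = os.path.join(storage_root, "docker", "registry", "v2", "repositories")
    images = []
    for dirpath, dirnames, _ in os.walk(repos):
        base = os.path.basename(dirpath)
        if base in ("_layers", "_uploads"):
            dirnames[:] = []  # no tags below these, skip them
            continue
        if base != "_manifests":
            continue
        # The repository name is the path between repositories/ and _manifests.
        name = os.path.relpath(os.path.dirname(dirpath), repos).replace(os.sep, "/")
        tags_dir = os.path.join(dirpath, "tags")
        if os.path.isdir(tags_dir):
            for tag in os.listdir(tags_dir):
                # Only a tag with a current/link file points at a manifest.
                if os.path.isfile(os.path.join(tags_dir, tag, "current", "link")):
                    images.append(f"{name}:{tag}")
        dirnames[:] = []  # nothing else to discover below _manifests
    return sorted(images)
```

Writing the result to a file, for example with `print("\n".join(list_tagged_images("/var/lib/registry")))` (the path is only an example), gives a list that can be compared against harbor after the migration.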
Create projects on harbor
A freshly deployed harbor has only the default library project, so the projects that exist in the docker registry have to be created on harbor manually. A project in the docker registry corresponds to a first-level directory under repositories in the registry storage directory.
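A quick way to see which projects are needed is to list those first-level directories. The sketch below does that; list_projects is a made-up name, and it treats a first-level directory that directly contains _manifests as a repository without a project (for example registry.k8s.li/alpine), for which a target project has to be chosen by hand.

```python
import os


def list_projects(repositories_dir):
    """Split first-level entries into (projects, project-less repositories)."""
    projects, bare_repos = [], []
    for entry in sorted(os.listdir(repositories_dir)):
        path = os.path.join(repositories_dir, entry)
        if not os.path.isdir(path):
            continue
        # A directory holding _manifests is itself a repository, not a project.
        if os.path.isdir(os.path.join(path, "_manifests")):
            bare_repos.append(entry)
        else:
            projects.append(entry)
    return projects, bare_repos
```

The returned project names can then be created through the harbor web UI or its API.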
Once we have the list of images and the corresponding projects created on harbor, we are ready for the actual migration. Depending on the scenario, one of the following options can be used.
Option 1: docker retag
Option 1 is probably the first approach most people think of, and it is also the simplest and most brute-force one: use docker on one machine to pull every image from the docker registry, retag it with docker tag, and then docker push it to harbor.
This approach is wasteful, because the docker pull -> docker tag -> docker push process decompresses the image’s layers. For simply copying images from one registry to another, docker does a lot of unnecessary work along the way. We won’t go into the details here.
So, for the sake of efficiency, we will not use docker retag. On to option 2.
Option 2: skopeo
You can use skopeo copy to copy an image’s raw blobs directly from one registry to another, with no layer decompression involved. In both performance and elapsed time it is much better than using docker 😂.
- Use skopeo copy
- Use skopeo sync
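The two invocations can be sketched as follows. The helpers below only build the skopeo argument lists, so the commands can be reviewed before anything is run. The function names are invented for this example, repositories are assumed to be named project/name, and the --src-tls-verify=false and --dest-tls-verify=false flags assume that both intranet registries are served over plain HTTP or with self-signed certificates.

```python
import subprocess

SRC = "registry.k8s.li"
DEST = "harbor.k8s.li"


def skopeo_copy_cmd(image, src=SRC, dest=DEST):
    """argv for copying one image, e.g. image="library/alpine:latest"."""
    return [
        "skopeo", "copy",
        "--src-tls-verify=false", "--dest-tls-verify=false",
        f"docker://{src}/{image}",
        f"docker://{dest}/{image}",
    ]


def skopeo_sync_cmd(repo, src=SRC, dest=DEST):
    """argv for syncing every tag of one repository, e.g. repo="library/alpine".

    To my understanding, skopeo sync appends the last path component of
    the source repository to the destination, so the destination given
    here is the project rather than the full repository name.
    """
    project = repo.rsplit("/", 1)[0]
    return [
        "skopeo", "sync", "--src", "docker", "--dest", "docker",
        "--src-tls-verify=false", "--dest-tls-verify=false",
        f"{src}/{repo}",
        f"{dest}/{project}",
    ]


def run(cmd):
    """Run one command and raise if skopeo exits with an error."""
    subprocess.run(cmd, check=True)
```

Looping over the image list gathered earlier and calling `run(skopeo_copy_cmd(image))` for each entry migrates the images one by one.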
But is there a better way? Both docker and skopeo essentially download and upload images through the registry’s HTTP API, so the process still involves a large number of HTTP requests. Can we avoid them?
Option 3: Migrate the storage directory
As mentioned at the beginning of the article, harbor’s back-end image storage also uses docker registry. Any registry built on Docker Distribution V2 has exactly the same back-end storage directory structure. So why not simply pack up the registry storage directory and extract it into harbor’s registry storage directory? That way every image is guaranteed to be migrated, with none left behind.
For harbor 1.x, migrate the docker registry storage directly into harbor’s registry storage, delete harbor’s redis data (because harbor’s redis caches the images’ metadata), restart harbor, and you’re done. After the restart, harbor calls the back-end registry to extract the images’ metadata and stores it in redis. This completes the migration.
Back up the registry storage directory on the docker registry machine.
After the backup is complete, scp docker.tar to the harbor machine and restore the registry storage directory there.
After this migration, you may find that images cannot be pushed to harbor. The registry storage directory in the docker registry container belongs to root, whereas the registry storage directory in the harbor registry container belongs to 10000:10000, so the permissions do not match. You therefore need to change the owner and group of harbor’s registry storage directory once the migration is complete.
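The backup, restore and ownership fix can be sketched in Python as below. This is an illustration only: the function names are invented, the directories passed in depend on where each side keeps its data, and the ownership change to 10000:10000 has to run as root.

```python
import os
import tarfile


def pack_storage(storage_dir, archive):
    """Archive the docker/ tree under storage_dir into a tar file."""
    with tarfile.open(archive, "w") as tar:
        # Store paths relative to storage_dir so they start with docker/.
        tar.add(os.path.join(storage_dir, "docker"), arcname="docker")


def unpack_storage(archive, harbor_registry_dir):
    """Extract the archive into harbor's registry storage directory."""
    with tarfile.open(archive) as tar:
        tar.extractall(harbor_registry_dir)


def chown_tree(path, uid=10000, gid=10000):
    """Recursively hand path over to uid:gid (harbor's registry user by default)."""
    os.chown(path, uid, gid)
    for dirpath, dirnames, filenames in os.walk(path):
        for name in dirnames + filenames:
            os.chown(os.path.join(dirpath, name), uid, gid, follow_symlinks=False)
```

On the docker registry machine one would call pack_storage, copy the resulting docker.tar across, and then call unpack_storage followed by chown_tree on the harbor machine before restarting harbor.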
Option 4: skopeo dir
harbor 2.x has strengthened its metadata management for artifacts: metadata is written to harbor’s own database when an artifact is pushed or synced to harbor. From harbor’s point of view, if the database holds no manifest information for an artifact or layer, that artifact or layer does not exist, and harbor returns a 404 error. Extracting the docker registry storage directory into harbor’s registry storage directory, as in option 3, therefore does not work: even though the images are physically present in the harbor registry container’s storage, harbor considers them absent because its database has no record of them. So it seems the only choice is to push the images to harbor one by one with skopeo, as in option 2.
But in some scenarios there is no running docker registry HTTP service as in option 2, only an archive of the docker registry storage directory. How do you then migrate the images from the docker registry storage directory to harbor 2.0?
The image formats supported by skopeo are as follows.
Image name | Example |
---|---|
containers-storage: | containers-storage: |
dir: | dir:/PATH |
docker:// | docker://k8s.gcr.io/kube-apiserver:v1.17.5 |
docker-daemon: | docker-daemon:alpine:latest |
docker-archive: | docker-archive:alpine.tar (docker save) |
oci: | oci:alpine:latest |
For example, docker:// is an image in a registry; docker-daemon: is an image pulled locally by docker; docker-archive: is an image saved with docker save; and dir: is an image saved as a folder. The same image can exist in all of these forms, just as water can be gas, liquid, or solid. They all represent the same image, only in different forms.
Since the images are already stored in the registry storage directory, reading them directly from the filesystem in dir format should in theory be faster than option 2. Although skopeo supports images in dir format, it does not currently support using the registry storage directory directly, so we still need a way to convert each image in the docker registry storage directory into skopeo’s dir format.
skopeo dir
So let’s take a look at what skopeo dir looks like.
To test the feasibility of the approach, first use skopeo to pull an image from Docker Hub and save it in dir format.
Then look at the directory structure of the resulting alpine folder with the tree command.
From the file names and sizes, and from inspecting the files, we can tell that the manifest file corresponds to the image’s manifest; the file of type ASCII text is the image config, which holds the image’s metadata; and the remaining file of type gzip compressed data is an image layer compressed with gzip. The contents of the manifest file confirm this:
- The config field of the image corresponds exactly to e50c909a8df2, whose media type is image.v1+json, a text file.
- The layers field of the image corresponds exactly to 4c0d98bf9879, whose media type is .tar.gzip, a gzip-compressed file.
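The same check can be automated. The sketch below assumes the dir layout produced by skopeo (a manifest.json file plus one file per blob, named by the hex part of its sha256 digest), reads the manifest, and verifies that every referenced blob is present and hashes to its own name. verify_skopeo_dir is a made-up helper, and it only handles a single-image manifest with config and layers fields.

```python
import hashlib
import json
import os


def sha256_of(path):
    """Hex sha256 of a file, read in chunks to keep memory use flat."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_skopeo_dir(image_dir):
    """Return a list of problems; an empty list means the dir is consistent."""
    with open(os.path.join(image_dir, "manifest.json")) as f:
        manifest = json.load(f)
    digests = [manifest["config"]["digest"]]
    digests += [layer["digest"] for layer in manifest["layers"]]
    problems = []
    for digest in digests:
        hex_part = digest.split(":", 1)[1]  # "sha256:<hex>" -> "<hex>"
        blob = os.path.join(image_dir, hex_part)
        if not os.path.isfile(blob):
            problems.append(f"missing blob {hex_part}")
        elif sha256_of(blob) != hex_part:
            problems.append(f"digest mismatch for {hex_part}")
    return problems
```

Because blobs are content-addressed, the file name of each blob doubles as its checksum, which is what makes this kind of check possible.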
Retrieve the image from the registry storage directory
Now comes the best part of this article: how to get an image out of the registry storage directory and into the dir format supported by skopeo.
- The first thing to do is to get the image’s manifest file, from which all of the image’s blob files can be found. Take the library/alpine:latest image in the registry storage directory as an example.
- Get the sha256 value of the manifest for the alpine image’s latest tag from the repositories/library/alpine/_manifests/tags/latest/current/link file.
- Use the sha256 value from the current/link file to find the corresponding file in the blobs directory. Here the manifest is blobs/sha256/39/39eda93d15866957feaee28f8fc5adb545276a64147445c64992ef69804dbf01/data.
- Using a regular expression, extract all sha256 values from the manifest file. These values correspond to the image config file and the image layer files in the blobs directory.
- Based on the manifest file, locate all of the image’s layer files and its config file in the blobs directory and assemble them into dir format. Here the files are copied out of the registry storage directory with cp.
The resulting directory is identical to the dir folder produced by skopeo copy above, except for an insignificant version file.
- To optimize this, replace the cp operation in the previous step with hard links, which greatly reduces disk I/O. Note that hard links cannot cross partitions, so the output directory must be on the same partition as the registry storage directory.
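Put together, the steps above might look like the following Python sketch. It is not the original script: the function names are invented, the regular expression stands in for whatever pattern was originally used, and it assumes a single-image manifest (not a multi-arch manifest list) whose blobs all live under blobs/sha256/<first two hex characters>/<hex>/data. Blobs are hard-linked with os.link, so the output directory has to be on the same filesystem as the registry storage.

```python
import os
import re

SHA256_RE = re.compile(r"sha256:([0-9a-f]{64})")


def blob_path(v2_root, hex_digest):
    """Path of a blob's data file inside the registry storage directory."""
    return os.path.join(v2_root, "blobs", "sha256", hex_digest[:2], hex_digest, "data")


def manifest_digest(v2_root, repo, tag):
    """Read the tag's current/link file, which holds "sha256:<hex>"."""
    link = os.path.join(v2_root, "repositories", repo, "_manifests",
                        "tags", tag, "current", "link")
    with open(link) as f:
        return SHA256_RE.search(f.read()).group(1)


def export_to_dir(v2_root, repo, tag, out_root):
    """Assemble <out_root>/<repo>:<tag> in skopeo dir format using hard links."""
    image_dir = os.path.join(out_root, f"{repo}:{tag}")
    os.makedirs(image_dir, exist_ok=True)

    manifest = blob_path(v2_root, manifest_digest(v2_root, repo, tag))
    # The manifest blob becomes manifest.json in the dir layout.
    os.link(manifest, os.path.join(image_dir, "manifest.json"))

    with open(manifest) as f:
        digests = SHA256_RE.findall(f.read())
    # Config and layer blobs keep their hex digest as the file name.
    for hex_digest in dict.fromkeys(digests):  # de-duplicate, keep order
        os.link(blob_path(v2_root, hex_digest), os.path.join(image_dir, hex_digest))
    return image_dir
```

Here v2_root is the docker/registry/v2 directory. Calling `export_to_dir(v2_root, "library/alpine", "latest", out_root)` leaves the image in out_root/library/alpine:latest; os.link raises FileExistsError if the target already exists, so a rerun needs a clean output directory.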
Then use skopeo copy or skopeo sync to push the retrieved image to harbor
- Use skopeo copy
- Using skopeo sync
Note that skopeo sync works at the project level, and the name and tag of each image are taken from the name of its directory.
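For directories laid out as <project>/<name>:<tag>, the push commands could be generated like this. As before, this is only a sketch: copy_cmd and sync_cmd are invented helpers that build argument lists without running them, the harbor host comes from the example setup, and --dest-tls-verify=false assumes harbor is not served with a trusted certificate.

```python
import os

HARBOR = "harbor.k8s.li"


def copy_cmd(project_dir, image_dirname, harbor=HARBOR):
    """skopeo copy for one image directory named "<name>:<tag>"."""
    project = os.path.basename(os.path.normpath(project_dir))
    return [
        "skopeo", "copy", "--dest-tls-verify=false",
        f"dir:{os.path.join(project_dir, image_dirname)}",
        f"docker://{harbor}/{project}/{image_dirname}",
    ]


def sync_cmd(project_dir, harbor=HARBOR):
    """skopeo sync for every image directory under one project directory."""
    project = os.path.basename(os.path.normpath(project_dir))
    return [
        "skopeo", "sync", "--src", "dir", "--dest", "docker",
        "--dest-tls-verify=false",
        project_dir,
        f"{harbor}/{project}",
    ]
```

The copy variant pushes a single image, while the sync variant pushes everything found under the project directory in one invocation.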
Shell Script
In fact, with some hacking on skopeo’s source code it should be possible to support registry storage directories seamlessly, which is currently under study 😃.
Comparison
Option | Method | Scope of application | Notes |
---|---|---|---|
1 | docker retag | Synchronizing images between two registries | |
2 | skopeo | Synchronizing images between two registries | |
3 | Extract storage directory | Moving a registry storage directory to another registry | Applicable to harbor 1.x |
4 | skopeo dir | Moving a registry storage directory to another registry | Applicable to harbor 2.x |
To compare and summarize the options above:
- Option 1: low start-up cost, suitable when the number of images is relatively small and you do not want to install skopeo; the disadvantage is poor performance.
- Option 2: suitable for copying images between two registries, such as copying some public images from Docker Hub to the company’s intranet image repository.
- Option 3: suitable for migrating between image repositories, with the best performance of all the options; note, however, that it cannot be used if the destination repository is harbor 2.x.
- Option 4: a compromise version of option 3 adapted to harbor 2.0; because the images still have to be pushed to harbor, its performance is worse than option 3.