This article describes the proper way to vendor third-party libraries in a Python library. The audience is very narrow: most Python developers neither know about nor need this technique. Still, in the spirit of sharing, I’ll summarize it here, starting with one principle: as a software author, you should respect the work of the other library authors whose code you use.
WHAT - What is vendoring?
Vendoring means embedding a third-party library’s code directly into your own software, a common practice in languages such as C and Go. Unlike declaring the library in a dependency file, vendoring ships the library’s code inside your software, either unchanged or modified. You therefore need to pay attention to license restrictions, especially if the upstream library uses a GPL-family license, whose copyleft terms can extend to the software that vendors it.
WHY - When should I vendor in Python?
As I said at the beginning, the scope is very narrow. There are three scenarios:
- The software must be self-contained, with zero dependencies. In the Python world, the library that vendors most heavily is pip, which we use every day: pip._vendor contains 25 dependencies. pip is the standard Python installer, so it cannot have dependencies that would need to be installed before pip itself, because those dependencies could only be installed through pip, which is circular. Basic build tools such as setuptools are in the same situation.
- The software depends on a specific version of an upstream library. This includes upstream libraries that make frequent breaking changes and therefore have an unstable API. If you simply pin third-party-lib==1.0.0 as a dependency, it will conflict with any other software that depends on the same library but needs a different version, and the installer cannot resolve the two. Vendoring removes this overly strict constraint.
- The software needs to modify the upstream library, and because of how the upstream project is maintained, those changes cannot be merged and released through a PR or other means. As long as the open-source license allows it, you can embed the source code through vendoring and modify it yourself.
In fact, scenarios 2 and 3 do not require vendoring. You could instead fork the library into your own git repository and depend on it through a git dependency, or publish the fork as a new PyPI package. Vendoring is simply one of the easiest options.
- There is one more constraint: in Python, only pure-Python libraries can be vendored.
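One quick way to check the pure-Python constraint is to look at the wheel a library publishes. Below is a minimal sketch, assuming you already have a wheel filename (for example from pip download). Under the wheel naming convention, pure-Python wheels are normally tagged with ABI none and platform any, while wheels with compiled extensions carry specific ABI and platform tags. The function name is hypothetical, and this is a heuristic rather than a guarantee; a library that only publishes an sdist still has to be inspected by hand.

```python
def is_pure_python_wheel(filename: str) -> bool:
    """Heuristic: pure-Python wheels are tagged with ABI "none" and platform "any"."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename!r}")
    # {name}-{version}[-{build tag}]-{python tag}-{abi tag}-{platform tag}.whl
    parts = filename[: -len(".whl")].split("-")
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected wheel filename: {filename!r}")
    abi_tag, platform_tag = parts[-2], parts[-1]
    return abi_tag == "none" and platform_tag == "any"
```

For example, requests-2.31.0-py3-none-any.whl passes the check, while a wheel tagged cp311-cp311-manylinux... does not.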
HOW - How should I vendor?
Vendoring is not simply copy and paste. In my opinion, it requires attention to two points:
- Vendoring must comply with the open-source license, and the license files must be placed in the vendor directory as well.
- When you change the vendored source code, record the changes as patch files so that, when the time is right, they can be contributed back upstream.
So vendoring is not copy and paste but a compromise with the status quo within an open-source framework, and our ultimate goal is to eliminate the vendored copies.
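As a minimal sketch of the first point, the hypothetical helper below lists vendored libraries that have no license file. It assumes a layout where a vendored package directory contains its own LICENSE (or LICENCE/COPYING) file, and a single-module library such as six.py has a sibling file like six.LICENSE; adjust the checks if your vendor directory is organized differently.

```python
from pathlib import Path

LICENSE_PREFIXES = ("LICENSE", "LICENCE", "COPYING")


def missing_licenses(vendor_dir: Path) -> list[str]:
    """Return names of vendored packages/modules with no license file (assumed layout)."""
    missing = []
    for entry in sorted(vendor_dir.iterdir()):
        if entry.name.startswith(("_", ".")):
            continue  # skip __init__.py, __pycache__ and hidden files
        if entry.is_dir():
            # A package directory should contain e.g. LICENSE or LICENSE.txt.
            files = [p.name.upper() for p in entry.iterdir() if p.is_file()]
            if not any(f.startswith(LICENSE_PREFIXES) for f in files):
                missing.append(entry.name)
        elif entry.suffix == ".py":
            # A single-module library is assumed to have a sibling like six.LICENSE.
            siblings = vendor_dir.glob(f"{entry.stem}.*")
            tags = [s.suffix.upper().lstrip(".") for s in siblings]
            if not any(t.startswith(LICENSE_PREFIXES) for t in tags):
                missing.append(entry.stem)
    return missing
```

Running missing_licenses(Path("mypackage/vendor")) in CI would flag a vendored library that was added without its license.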
In Python, besides putting the vendored libraries in a directory inside the code base (e.g. mypackage/vendor), you need to modify all import statements to point to that directory. For example, change import requests to from mypackage.vendor import requests. PDM also has such a directory, and I manage it with the same tool pip uses: vendoring. It is very poorly documented (because hardly anyone needs it). It provides the following functions:
- Read a requirements.txt and download the dependencies into the specified directory
- Download the LICENSE files of all libraries into this directory
- Read patch files from a specified path and apply them to the source code
- Rewrite all import statements to point to the vendor directory
- Update the vendored versions
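To show what the import rewriting amounts to, here is a deliberately simplified, regex-based sketch. It is not vendoring’s implementation: it only handles import pkg [as alias] and from pkg[.sub] import ... written on a single line, and leaves dotted (import pkg.sub) and comma-separated imports untouched. The function name rewrite_imports and its parameters are hypothetical.

```python
import re

# "import pkg" or "import pkg as alias" alone on a line (dotted/comma imports are skipped).
_IMPORT_RE = re.compile(r"^([ \t]*)import (\w+)([ \t]+as[ \t]+\w+)?(?=[ \t]*$)", re.MULTILINE)
# "from pkg import ..." or "from pkg.sub import ..." (relative imports never match \w+).
_FROM_RE = re.compile(r"^([ \t]*)from (\w+)((?:\.\w+)*) import ", re.MULTILINE)


def rewrite_imports(source: str, vendored: set[str], namespace: str) -> str:
    """Point imports of vendored top-level packages at `namespace`."""

    def fix_import(m: re.Match) -> str:
        indent, name, alias = m.group(1), m.group(2), m.group(3) or ""
        if name not in vendored:
            return m.group(0)  # not vendored: leave the line unchanged
        return f"{indent}from {namespace} import {name}{alias}"

    def fix_from(m: re.Match) -> str:
        indent, name, sub = m.group(1), m.group(2), m.group(3)
        if name not in vendored:
            return m.group(0)
        return f"{indent}from {namespace}.{name}{sub} import "

    source = _IMPORT_RE.sub(fix_import, source)
    return _FROM_RE.sub(fix_from, source)
```

For example, rewrite_imports("import requests\n", {"requests"}, "mypackage.vendor") turns the line into from mypackage.vendor import requests, while import os is left alone.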
The procedure roughly follows the list above. First, create a mypackage/vendor directory, then create a vendors.txt inside it and list the dependencies (in requirements.txt format).
Then add the vendoring configuration, a [tool.vendoring] table, to pyproject.toml in the project root.
Finally, run vendoring sync, and the vendor directory will be populated automatically.
A patch file is simply the output of git diff, from which the modified vendor directory can be recreated from the upstream source code. To generate a patch:
1. After configuring, run vendoring sync once and commit the files to the local repository (commit only, do not push).
2. Modify the source code.
3. Run git diff --patch <file_path> > <patches_dir>/<file_name>.patch to save the patch file to patches_dir.
4. Review the patch file and revert any rewritten import statements to their original form, e.g. change from mypackage.vendor import requests back to import requests. The reason is that patches are applied before the imports are rewritten, so the patch file must contain the original, un-rewritten import statements. Be careful not to change any whitespace while editing: patch files are whitespace-sensitive.
5. Run git add . && git commit --amend to commit the changes.
6. Run vendoring sync again to verify. If everything works, there should be no changes, which means the vendoring process is reproducible.
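Step 4 is easy to get wrong by hand. Below is a minimal sketch of a hypothetical helper that reverts rewritten imports in a patch. It only edits context, added and removed lines; leaves the diff, index, file and hunk headers alone; never adds or removes lines, so the hunk line counts stay valid; and preserves indentation, trailing whitespace and line endings. It only recognizes from mypackage.vendor import pkg [as alias] and from mypackage.vendor.pkg... import ..., so anything else still needs a manual review.

```python
import re

# Lines of `git diff` output that are metadata rather than file content.
_HEADER_PREFIXES = ("diff ", "index ", "--- ", "+++ ", "@@")


def unrewrite_patch(patch: str, namespace: str = "mypackage.vendor") -> str:
    """Turn rewritten vendor imports in a patch back into the upstream form."""
    ns = re.escape(namespace)
    # "<marker>from NS import pkg [as alias]" -> "<marker>import pkg [as alias]"
    plain = re.compile(rf"^([ +-][ \t]*)from {ns} import (\w+(?:[ \t]+as[ \t]+\w+)?)(?=[ \t]*$)")
    # "<marker>from NS.pkg.sub import x" -> "<marker>from pkg.sub import x"
    dotted = re.compile(rf"^([ +-][ \t]*)from {ns}\.")
    out = []
    for line in patch.splitlines(keepends=True):
        if line.startswith(_HEADER_PREFIXES):
            out.append(line)  # keep file and hunk headers untouched
            continue
        body = line.rstrip("\r\n")
        ending = line[len(body):]  # preserve the original line ending
        body = plain.sub(r"\1import \2", body)
        body = dotted.sub(r"\1from ", body)
        out.append(body + ending)
    return "".join(out)
```

You could run it over the saved patch file after step 3 and still review the result, since a removed line whose content itself starts with "-- " would be mistaken for a header by this sketch.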