Preface
I’m writing this article because a few questions keep coming up in the Python community:
- I installed pip, so why does running it report that the executable is not found?
- Why does import module throw a ModuleNotFoundError exception?
- Why can I run my code with PyCharm but not from cmd?
To solve these kinds of problems, you need to know how Python finds packages. I hope reading this article will help.
How Python Finds Packages
Nowadays most people have more than one Python on their computer, plus several virtual environments, so it is easy to install a package without noticing which path it went to. Let’s first tackle how packages are found. The answer is simple, but many people don’t know how it works. If the path to your Python interpreter is $path_prefix/bin/python, then when you start an interactive session or run a script with this interpreter, it searches the following locations by default:
- $path_prefix/lib/pythonX.Y (the standard library path)
- $path_prefix/lib/pythonX.Y/site-packages (the third-party library path; X.Y is the major and minor version number of Python, e.g. 3.7 or 2.6)
- the current working directory (the result of the pwd command) in an interactive session, or the script’s own directory when running a script
Here $path_prefix is /usr if you’re using the default Python on Linux, or /usr/local if you compiled Python yourself with the default options. Because the site-packages path contains the version number, if you upgrade Python from 3.6 to 3.7 you won’t be able to use any of the previously installed third-party libraries. Of course, you can copy the entire folder over, and you’ll be fine in most cases.
A few useful attributes
- sys.executable: the path to the Python interpreter currently in use
- sys.path: the list of search paths currently used to find packages
- sys.prefix: the $path_prefix currently in use
Example:
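A minimal sketch that prints these three values. The helper name describe_interpreter is an assumption made for illustration, and the output depends entirely on which interpreter runs it.

```python
import sys


def describe_interpreter():
    """Collect the interpreter path, its prefix, and the search path list."""
    return {
        "executable": sys.executable,
        "prefix": sys.prefix,
        "path": list(sys.path),
    }


if __name__ == "__main__":
    info = describe_interpreter()
    print("executable:", info["executable"])
    print("prefix:    ", info["prefix"])
    for entry in info["path"]:
        print("path entry:", entry)
```

Running it with two different interpreters is a quick way to see that each one has its own prefix and its own site-packages entry.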
Adding search paths using environment variables
If your package’s path is not in the search path list above, you can add it to the PYTHONPATH environment variable, separating multiple paths with : (Windows uses ;).
Note that you should avoid adding another Python’s package paths to PYTHONPATH, such as PYTHONPATH=/home/frostming/.local/lib/python2.7/site-packages, because the paths in PYTHONPATH take precedence over the default search path, and packages installed for Python 2.7 will cause compatibility problems if you run Python 3. In fact, it’s best not to have any site-packages path in PYTHONPATH at all.
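To see the effect of PYTHONPATH, the sketch below starts a fresh interpreter with the variable set and reads back its sys.path. The function name child_sys_path and the temporary directory are assumptions for illustration only.

```python
import json
import os
import subprocess
import sys
import tempfile


def child_sys_path(extra_dirs):
    """Start a fresh interpreter with PYTHONPATH set and return its sys.path."""
    env = dict(os.environ)
    # os.pathsep is ":" on Unix and ";" on Windows.
    env["PYTHONPATH"] = os.pathsep.join(extra_dirs)
    result = subprocess.run(
        [sys.executable, "-c", "import sys, json; print(json.dumps(sys.path))"],
        env=env, capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as extra:
        for position, entry in enumerate(child_sys_path([extra])):
            marker = "<- from PYTHONPATH" if entry == extra else ""
            print(position, entry, marker)
```

In the printed list the PYTHONPATH entry should appear ahead of the interpreter's own site-packages directory, which is why a site-packages path from another Python version placed there can shadow the right packages.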
By the way, PATH is the search path used to find executable programs. If you run the command my_cmd in the terminal, the system scans the directories in PATH in order to see whether my_cmd exists in any of them. So if it says that the program is not found or the command is not recognized, check whether the directory containing it has been added to PATH.
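The PATH scan can be sketched in a few lines. This is a simplified Unix-style model (it ignores Windows details such as PATHEXT), and the name find_on_path is hypothetical; the standard library's shutil.which does the real job.

```python
import os
import shutil


def find_on_path(command, path=None):
    """Return the full path of the first match for `command` in PATH, or None."""
    if path is None:
        path = os.environ.get("PATH", "")
    for directory in path.split(os.pathsep):
        if not directory:
            continue
        candidate = os.path.join(directory, command)
        # The first directory that holds an executable file with this name wins.
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


if __name__ == "__main__":
    print("manual scan: ", find_on_path("pip"))
    print("shutil.which:", shutil.which("pip"))
```

If both lines print None, the directory that contains pip is simply not in PATH.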
How Python installs packages
Nowadays you basically use pip to install Python packages; even if you use pipenv or poetry, pip is still doing the work underneath. If you don’t have pip installed, please refer to pip’s installation documentation; if you have it installed and still can’t use the pip command, please refer to the previous section.
There are two ways to run pip:
- pip ...
- python -m pip ...
The two ways are equivalent, except that in the first the Python interpreter used is the one written in the shebang of the pip file. In general, if your pip path is $path_prefix/bin/pip, then the Python path is $path_prefix/bin/python. If you’re using a Unix system, the first line of cat $(which pip) contains the path to the Python interpreter. The second way explicitly specifies which Python to use. This rule holds true for all Python executables. The flow is shown in the following figure.
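A small sketch of the same check in Python, assuming a Unix-like system where pip is a text script with a shebang line. The helper name read_shebang is an assumption for illustration.

```python
import shutil


def read_shebang(script_path):
    """Return the interpreter named on a script's first line, or None."""
    with open(script_path, "rb") as f:
        first_line = f.readline().decode("utf-8", errors="replace").strip()
    if first_line.startswith("#!"):
        return first_line[2:].strip()
    return None


if __name__ == "__main__":
    pip_path = shutil.which("pip")
    if pip_path is None:
        print("pip is not on PATH")
    else:
        print(pip_path, "->", read_shebang(pip_path))
```

If the interpreter printed here is not the one you expected, python -m pip with the intended interpreter avoids the ambiguity.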
Then, without any customization, pip installs the package under $path_prefix/lib/pythonX.Y/site-packages ($path_prefix is the one from the previous paragraph), and installs any executables under $path_prefix/bin. If you need to run one directly from the command line as my_cmd, remember to add that directory to PATH.
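The interpreter can report these two default locations itself through the standard sysconfig module. The function name default_install_locations is an assumption, and some Linux distributions patch the defaults, so treat the result as a hint rather than a guarantee of where pip will write.

```python
import sysconfig


def default_install_locations():
    """Return where this interpreter's default scheme puts packages and scripts."""
    return {
        "site-packages": sysconfig.get_path("purelib"),
        "executables": sysconfig.get_path("scripts"),
    }


if __name__ == "__main__":
    for kind, location in default_install_locations().items():
        print(kind + ":", location)
```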
Options for changing the installation location in pip
- --prefix PATH: replace $path_prefix with the given value
- --root ROOT_PATH: add ROOT_PATH before $path_prefix; for example, with --root /home/frostming, $path_prefix changes from /usr to /home/frostming/usr
- --target TARGET: install directly into TARGET
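The three rules can be written down as a toy path calculation. This only models the description above with POSIX-style paths; it is not pip's actual implementation, and the function and parameter names are hypothetical.

```python
import posixpath


def site_packages_dir(version, prefix="/usr", new_prefix=None, root=None, target=None):
    """Model the three rules above; this is not how pip computes paths internally."""
    if target is not None:
        # --target: install directly into the given directory
        return target
    if new_prefix is not None:
        # --prefix: replace $path_prefix
        prefix = new_prefix
    if root is not None:
        # --root: put ROOT_PATH in front of $path_prefix
        prefix = posixpath.join(root, prefix.lstrip("/"))
    return posixpath.join(prefix, "lib", "python" + version, "site-packages")


if __name__ == "__main__":
    print(site_packages_dir("3.7"))
    print(site_packages_dir("3.7", root="/home/frostming"))
    print(site_packages_dir("3.7", target="/home/frostming/vendor"))
```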
Virtual Environments
Virtual environments are designed to isolate the packages that different projects depend on, installing them under different paths to prevent dependency conflicts. Once you understand how Python installs packages, it’s easy to understand how virtual environments (the virtualenv and venv modules) work. In fact, running virtualenv myenv copies a new Python interpreter to myenv/bin and creates the myenv/lib and myenv/lib/pythonX.Y/site-packages directories (the venv module does not copy the interpreter, but the result is basically the same). Executing source myenv/bin/activate then puts myenv/bin at the front of PATH, giving the copied Python interpreter the highest search priority. This way, when you install packages later, $path_prefix will be myenv, thus isolating the installation path.
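A short sketch of both halves of this mechanism: detecting from inside Python whether the running interpreter belongs to a virtual environment, and modelling what activation does to PATH. Both function names are assumptions, and activated_path is only a model of the Unix activate script, not the script itself.

```python
import os
import sys


def in_virtual_environment():
    """True when sys.prefix differs from the base interpreter's prefix."""
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def activated_path(env_dir, path):
    """Model what activation does to PATH: put the environment's bin first."""
    return os.pathsep.join([os.path.join(env_dir, "bin"), path])


if __name__ == "__main__":
    print("prefix:", sys.prefix)
    print("inside a virtual environment:", in_virtual_environment())
    print(activated_path("myenv", os.environ.get("PATH", "")))
```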
The effect of script running style on search paths
As you know from the above, the most direct reason Python does or does not find a package is sys.path, and the underlying reason is the path of sys.executable. Once a program is written we have to run it, and different ways of running it can affect sys.path and cause different behavior, so let’s discuss this issue.
Suppose you have the following package structure.
The contents of main.py.
The contents of the b.py file are simple.
Now execute the following commands from the directory that contains main.py.
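The experiment can be reproduced with the sketch below. The layout (main.py next to a my_package directory holding __init__.py and b.py) and the file contents are assumptions inferred from the surrounding text, and the two commands are taken to be python main.py and python my_package/b.py.

```python
import os
import subprocess
import sys
import tempfile

# Each script prints the first entry of its own sys.path.
SHOW = "import os, sys\nprint(os.path.realpath(sys.path[0]))\n"


def build_layout(root):
    """Create main.py and my_package/{__init__,b}.py under root."""
    pkg = os.path.join(root, "my_package")
    os.makedirs(pkg)
    files = {
        os.path.join(root, "main.py"): SHOW + "from my_package import b\n",
        os.path.join(pkg, "__init__.py"): "",
        os.path.join(pkg, "b.py"): SHOW,
    }
    for name, text in files.items():
        with open(name, "w") as f:
            f.write(text)


def first_path_entry(root, script):
    """Run `python <script>` from root and return the first line it prints."""
    result = subprocess.run([sys.executable, script], cwd=root,
                            capture_output=True, text=True, check=True)
    return result.stdout.splitlines()[0]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = os.path.realpath(tmp)
        build_layout(root)
        for script in ("main.py", os.path.join("my_package", "b.py")):
            print(script, "->", first_path_entry(root, script))
```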
Running python xxx.py is called running directly: the __name__ value in the file is set to __main__, and this is what the Run File / Run Script action in an IDE does. You can see that the first value of sys.path is the directory where the script file is located, so it varies with the script path. Remember that we always run our tests from the /home/frostming/test_path directory.
Okay, so suppose b.py needs to import a.py, which contains the single line print("I'm a"). What should we write in b.py?
- Easy: import a. Now run the tests above again. The first one fails. If you’ve read the previous section, this error is to be expected: when running main.py, sys.path does not contain the directory /home/frostming/test_path/my_package where a.py lives, so of course a can’t be found.
- Change it to from my_package import a. I won’t run the test this time, because by the same analysis we can predict that the first run will work while the second will report that my_package cannot be found. Note that since b is in the my_package package, you can also use a relative import and write from . import a, which has the same effect as from my_package import a.
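Both predictions can be checked with a small script. It rebuilds an assumed layout (main.py importing my_package.b, plus a.py and b.py inside my_package) in a temporary directory; the helper name run_ok and the file contents other than a.py are assumptions for illustration.

```python
import os
import subprocess
import sys
import tempfile


def run_ok(import_line, command):
    """Put `import_line` in b.py, run `python <command>`, report success."""
    with tempfile.TemporaryDirectory() as root:
        pkg = os.path.join(root, "my_package")
        os.makedirs(pkg)
        sources = {
            os.path.join(root, "main.py"): "from my_package import b\n",
            os.path.join(pkg, "__init__.py"): "",
            os.path.join(pkg, "a.py"): "print(\"I'm a\")\n",
            os.path.join(pkg, "b.py"): import_line + "\n",
        }
        for path, text in sources.items():
            with open(path, "w") as f:
                f.write(text)
        result = subprocess.run([sys.executable] + command, cwd=root,
                                capture_output=True, text=True)
        return result.returncode == 0


if __name__ == "__main__":
    direct_b = [os.path.join("my_package", "b.py")]
    for line in ("import a", "from my_package import a"):
        print(line, "| main.py:", run_ok(line, ["main.py"]),
              "| b.py:", run_ok(line, direct_b))
```

Each import style works for exactly one of the two direct runs, which is the conflict the rest of this section resolves.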
So is there a way to make both runs work without an error? There is. We need to realize that a project has a limited number of entry points, and there won’t actually be executable code both at the top level and in a subdirectory. We should put the main runtime logic in main.py (not necessarily that name; in a Django project it is manage.py), and if we do need to run a script in a subdirectory, we should use python -m <module_name>, with the import statement for a in b.py written as from my_package import a. Let’s see how it works.
You can see that the contents of sys.path are the same for both runs, and its first value is the directory from which the command is run. This is called running as a module, and the argument after python -m is a module name (separated by .) instead of a path name. Because of this uniformity, all imports in your project can be written the same way, regardless of which script they are in. This is why the official Django documentation recommends import names of the form myapp.models.users for all imports.
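A sketch of the module-style runs, assuming the two commands are python -m main and python -m my_package.b, both started from the project root. The layout and file contents are the same assumptions as before, and the function name is hypothetical.

```python
import os
import subprocess
import sys
import tempfile

# Each script prints the first entry of its own sys.path.
SHOW = "import os, sys\nprint(os.path.realpath(sys.path[0]))\n"


def module_run_first_entries():
    """Run both modules with -m; return (root, first sys.path entry of each run)."""
    with tempfile.TemporaryDirectory() as tmp:
        root = os.path.realpath(tmp)
        pkg = os.path.join(root, "my_package")
        os.makedirs(pkg)
        sources = {
            os.path.join(root, "main.py"): SHOW + "from my_package import b\n",
            os.path.join(pkg, "__init__.py"): "",
            os.path.join(pkg, "a.py"): "print(\"I'm a\")\n",
            os.path.join(pkg, "b.py"): SHOW + "from my_package import a\n",
        }
        for path, text in sources.items():
            with open(path, "w") as f:
                f.write(text)
        entries = []
        for module in ("main", "my_package.b"):
            result = subprocess.run([sys.executable, "-m", module], cwd=root,
                                    capture_output=True, text=True, check=True)
            entries.append(result.stdout.splitlines()[0])
        return root, entries


if __name__ == "__main__":
    root, entries = module_run_first_entries()
    print("project root:", root)
    print("first sys.path entries:", entries)
```

Both runs succeed with the same absolute import, because in both cases the first search path is the project root rather than the script's own directory.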
In addition, when running as a module, every parent package in the module name is imported first, which means you can use relative imports in the module (you cannot when running directly). The value of __name__ in the module passed in is set to __main__, so you can still use if __name__ == "__main__":. If the module passed to python -m <module_name> is a package, then the __main__.py script in the package directory (if it exists) is executed, and the __name__ value of that script is __main__.
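A minimal sketch of the package case. The package name demo_pkg, the helper module, and the function name are all hypothetical; the script creates them in a temporary directory and runs python -m demo_pkg there.

```python
import os
import subprocess
import sys
import tempfile


def run_package_as_module():
    """Create demo_pkg with a __main__.py and run `python -m demo_pkg`."""
    with tempfile.TemporaryDirectory() as root:
        pkg = os.path.join(root, "demo_pkg")
        os.makedirs(pkg)
        sources = {
            "__init__.py": "",
            "helper.py": "GREETING = 'hello from helper'\n",
            # The relative import works because demo_pkg is imported as a package first.
            "__main__.py": "from . import helper\nprint(__name__)\nprint(helper.GREETING)\n",
        }
        for name, text in sources.items():
            with open(os.path.join(pkg, name), "w") as f:
                f.write(text)
        result = subprocess.run([sys.executable, "-m", "demo_pkg"], cwd=root,
                                capture_output=True, text=True, check=True)
        return result.stdout.splitlines()


if __name__ == "__main__":
    for line in run_package_as_module():
        print(line)
```

Running the same __main__.py directly with python demo_pkg/__main__.py would fail on the relative import, since no parent package is known in that mode.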
Summary
As you can see, the most important thing in package path search is the $path_prefix path prefix, which is derived from the path of the Python interpreter in use. So to find where packages live, you just need to know the path to the interpreter, and if you want a different package path, you just need to select the Python interpreter you want with the correct PATH setting.
Now back to the three questions at the beginning: can you solve them now?