For file, directory, and path operations in Python, we have generally used the os.path module. pathlib is its replacement: it wraps the functionality of os.path and represents paths as objects, which makes the API more general, easier to use, and more in line with everyday programming habits.
The pathlib module provides a number of classes that represent file system paths with semantics appropriate for different operating systems. The path classes are divided into pure paths, which provide purely computational operations without I/O, and concrete paths, which inherit from pure paths and add I/O operations.
Let’s first look at the organization of the pathlib module, whose core is composed of 6 classes, the base class of which is the PurePath class, from which the other 5 classes are derived.
The arrows connect two classes that have an inheritance relationship. Taking PurePosixPath and PurePath as an example, PurePosixPath inherits from PurePath; that is, the former is a subclass of the latter.
- PurePath class: treats a path as an ordinary string. It can join several strings into a path in the format of the current operating system, and it can determine whether two paths are equal. As the name suggests, "pure" means that PurePath is concerned only with operations on the path itself, regardless of whether the path is valid in the real file system or whether the file or directory exists.
- PurePosixPath and PureWindowsPath are subclasses of PurePath. The former manipulates paths in the style of UNIX-like operating systems (including Mac OS X), and the latter manipulates Windows-style paths. The two styles differ in, among other things, their path separators.
- The Path class differs from the above 3 classes in that it manipulates paths together with the files and directories behind them and interacts with the real file system, for example to determine whether the path actually exists.
- PosixPath and WindowsPath are subclasses of Path and are used to manipulate Unix (Mac OS X) style paths and Windows style paths respectively.
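The hierarchy described above can be checked with issubclass. This is a minimal sketch that only inspects the classes and creates no paths:

```python
from pathlib import (
    Path, PosixPath, PurePath, PurePosixPath, PureWindowsPath, WindowsPath,
)

# Every other class derives, directly or indirectly, from PurePath.
derived = [PurePosixPath, PureWindowsPath, Path, PosixPath, WindowsPath]
all_derive_from_purepath = all(issubclass(cls, PurePath) for cls in derived)

# Each concrete class inherits from Path and from the matching pure class.
posix_is_both = issubclass(PosixPath, Path) and issubclass(PosixPath, PurePosixPath)
windows_is_both = issubclass(WindowsPath, Path) and issubclass(WindowsPath, PureWindowsPath)

print(all_derive_from_purepath, posix_is_both, windows_is_both)  # True True True
```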
The three pure path classes PurePath, PurePosixPath and PureWindowsPath are mostly used in special cases, such as:
- You need to manipulate a Windows path on a Unix machine, or a Unix path on a Windows machine. We can't instantiate a concrete Windows path on Unix, but we can instantiate a pure Windows path and manipulate it as if we were on Windows.
- You want to make sure that your code only manipulates the path and does not interact with the OS for real.
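The first case can be sketched as follows. The code picks whichever path style does not belong to the current operating system, so it behaves the same way on Windows and on Unix; the file names are hypothetical. It assumes that instantiating the foreign concrete class raises NotImplementedError (or a subclass of it), which is how CPython behaves:

```python
import os
from pathlib import PosixPath, PurePosixPath, PureWindowsPath, WindowsPath

# Pick the path style that does NOT belong to the current OS.
if os.name == "nt":
    foreign_pure, foreign_concrete = PurePosixPath, PosixPath
else:
    foreign_pure, foreign_concrete = PureWindowsPath, WindowsPath

# A pure path of the foreign style can always be created and manipulated.
pure = foreign_pure("reports", "2024", "summary.txt")
print(pure.name, pure.suffix)  # summary.txt .txt

# The concrete class of the foreign style cannot be instantiated here.
try:
    foreign_concrete("reports", "2024", "summary.txt")
    concrete_created = True
except NotImplementedError:
    concrete_created = False
print(concrete_created)  # False
```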
The path format differs considerably between UNIX-like operating systems and Windows. The main differences are the root path and the path separator: on UNIX the root path is a slash (/), while on Windows a root starts with a drive letter (such as C:); UNIX paths use the forward slash (/) as the separator, while Windows uses the backslash (\).
1. PurePath Class
The PurePath class (as well as the PurePosixPath class and the PureWindowsPath class) provides a number of constructors, instance methods, and instance properties for us to use.
The PurePath class is automatically adapted to the operating system when it is instantiated. If you are on a UNIX or Mac OS X system, the constructor method actually returns a PurePosixPath object; conversely, if you are using PurePath to create an instance on a Windows system, the constructor method returns a PureWindowsPath object.
For example, the following statement is executed on a Windows system.
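A minimal sketch of such a statement, using a hypothetical file name. The comment describes what should be printed on each platform:

```python
import os
from pathlib import PurePath, PurePosixPath, PureWindowsPath

path = PurePath("my_file.txt")  # hypothetical file name

# Prints PureWindowsPath on Windows and PurePosixPath on UNIX-like systems.
print(type(path).__name__)

# The class that PurePath is expected to pick on the current system.
expected_type = PureWindowsPath if os.name == "nt" else PurePosixPath
print(type(path) is expected_type)  # True
```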
PurePath also supports passing in multiple path strings when creating objects, and they will be stitched together into a single path. Example.
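A short sketch with hypothetical segment names. PureWindowsPath is included so that the Windows result can be reproduced on any system:

```python
from pathlib import PurePath, PureWindowsPath

# Each argument becomes one segment of the resulting path.
path = PurePath("projects", "demo", "main.py")
print(path)  # projects\demo\main.py on Windows, projects/demo/main.py elsewhere

# PureWindowsPath gives the Windows-style result regardless of the platform.
windows_path = PureWindowsPath("projects", "demo", "main.py")
print(windows_path)  # projects\demo\main.py
```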
As you can see, the output is a path in Windows format because the runtime environment is a Windows system.
If you want to create UNIX-style paths in Windows, you need to specify the use of the PurePosixPath class, and vice versa. Example.
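A minimal sketch that builds one path of each style explicitly; both lines give the same result on any operating system. The directory names are only illustrative:

```python
from pathlib import PurePosixPath, PureWindowsPath

posix_path = PurePosixPath("/usr", "local", "bin")
windows_path = PureWindowsPath("C:/", "Users", "demo")

print(posix_path)    # /usr/local/bin
print(windows_path)  # C:\Users\demo
```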
Emphasis: pure path operations are really string operations. They are not associated with the local file system and perform no disk I/O. Paths constructed by PurePath are essentially strings and can be converted to strings with str().
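For instance, a small sketch of the conversion. os.fspath() is shown alongside str() because it returns the same string for path objects:

```python
import os
from pathlib import PurePosixPath

path = PurePosixPath("/etc", "hosts")

as_text = str(path)
print(as_text, type(as_text).__name__)  # /etc/hosts str

# os.fspath() returns the same string for a path object.
print(os.fspath(path) == as_text)  # True
```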
In addition, if you call the PurePath constructor without any string arguments, it is equivalent to passing a dot (.), the current path, as the argument.
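A quick check of this behaviour:

```python
from pathlib import PurePath

empty = PurePath()
dot = PurePath(".")

print(empty)         # .
print(empty == dot)  # True
```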
If multiple parameters passed into the PurePath constructor contain multiple root paths, only the last root path and subsequent subpaths will take effect. Example.
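A sketch with illustrative directory names. The last case shows a Windows-specific detail: a rooted segment without a drive letter replaces the earlier path but keeps the earlier drive:

```python
from pathlib import PurePosixPath, PureWindowsPath

# "/usr" is the last root, so "/etc" is discarded.
posix_path = PurePosixPath("/etc", "/usr", "lib64")
print(posix_path)  # /usr/lib64

# "D:/data" is the last root, so "C:/Windows" is discarded.
windows_path = PureWindowsPath("C:/Windows", "D:/data", "logs")
print(windows_path)  # D:\data\logs

# A rooted segment without a drive keeps the previous drive.
same_drive = PureWindowsPath("C:/Windows", "/Program Files")
print(same_drive)  # C:\Program Files
```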
As an extra reminder, when writing path strings in Python, pay close attention to forward slashes versus backslashes, to whether a backslash is being treated as an escape character, and to whether you are using a raw string (r'...'). It is easy to get this wrong!
If an argument passed to the PurePath constructor contains redundant slashes or single dots (.), they are ignored outright, but double dots (..) are not ignored.
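A minimal sketch of this normalization, using PurePosixPath so that the output is the same on every platform:

```python
from pathlib import PurePosixPath

doubled_slash = PurePosixPath("foo//bar")
single_dot = PurePosixPath("foo/./bar")
trailing_slash = PurePosixPath("foo/bar/")
double_dot = PurePosixPath("foo/../bar")

print(doubled_slash)   # foo/bar
print(single_dot)      # foo/bar
print(trailing_slash)  # foo/bar
print(double_dot)      # foo/../bar  (".." is kept as written)
```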
PurePath instances support comparison operators. Paths of the same style can be tested for equality and ordered (in effect, by comparing their strings; Windows-style paths are compared case-insensitively). Paths of different styles can only be tested for equality (and are never equal); they cannot be ordered, and attempting it raises a TypeError.
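The comparison rules can be sketched as follows:

```python
from pathlib import PurePosixPath, PureWindowsPath

# Same style: equality and ordering both work.
posix_case = PurePosixPath("foo") == PurePosixPath("FOO")        # False
windows_case = PureWindowsPath("foo") == PureWindowsPath("FOO")  # True
ordered = PurePosixPath("a") < PurePosixPath("b")                # True

# Different styles: equality is allowed but is always False.
cross_equal = PureWindowsPath("foo") == PurePosixPath("foo")     # False

# Different styles: ordering raises TypeError.
try:
    PureWindowsPath("foo") < PurePosixPath("foo")
    cross_ordering_allowed = True
except TypeError:
    cross_ordering_allowed = False

print(posix_case, windows_case, ordered, cross_equal, cross_ordering_allowed)
```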
The following is a list of methods and properties commonly used by PurePath instances.
Instance properties and methods | Description |
---|---|
PurePath.parts | Returns a tuple of the components that make up the path. |
PurePath.drive | Returns the drive letter in the path, if any. |
PurePath.root | Returns the root of the path, if any. |
PurePath.anchor | Returns the drive letter and root of the path combined. |
PurePath.parents | Returns a sequence of all the ancestor paths of the current path. |
PurePath.parent | Returns the parent of the current path, equivalent to parents[0]. |
PurePath.name | Returns the file name, that is, the final component of the path. |
PurePath.suffixes | Returns a list of all suffixes of the file name. |
PurePath.suffix | Returns the suffix of the file name, that is, the last element of the suffixes list. |
PurePath.stem | Returns the file name without its last suffix. |
PurePath.as_posix() | Returns the path as a string with forward slashes (UNIX style). |
PurePath.as_uri() | Returns the path as a file URI. Only absolute paths can be converted, otherwise a ValueError is raised. |
PurePath.is_absolute() | Determines whether the current path is an absolute path. |
PurePath.joinpath(*other) | Joins multiple paths together; works like the slash (/) operator. |
PurePath.match(pattern) | Determines whether the current path matches the given glob-style pattern. |
PurePath.relative_to(*other) | Returns the current path with the base path removed. A ValueError is raised if the current path does not start with the base path. |
PurePath.with_name(name) | Replaces the file name in the current path with the new file name. If the current path has no file name, a ValueError is raised. |
PurePath.with_suffix(suffix) | Replaces the suffix of the file name with a new suffix. If the current path has no suffix, the new suffix is appended. |
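
A short sketch exercising most of these properties and methods on one hypothetical Windows-style path. PureWindowsPath is used so that the drive-related properties are non-empty and the results do not depend on the platform:

```python
from pathlib import PureWindowsPath

path = PureWindowsPath("C:/Users/demo/archive.tar.gz")  # hypothetical path

print(path.parts)         # ('C:\\', 'Users', 'demo', 'archive.tar.gz')
print(repr(path.drive))   # 'C:'
print(repr(path.root))    # '\\'
print(repr(path.anchor))  # 'C:\\'
print(path.parent)        # C:\Users\demo
print(path.name)          # archive.tar.gz
print(path.suffixes)      # ['.tar', '.gz']
print(path.suffix)        # .gz
print(path.stem)          # archive.tar
print(path.as_posix())    # C:/Users/demo/archive.tar.gz
print(path.is_absolute()) # True

# Methods that derive a new path; the original object is never modified.
relative = path.relative_to("C:/Users")  # demo\archive.tar.gz
renamed = path.with_name("notes.txt")    # C:\Users\demo\notes.txt
rezipped = path.with_suffix(".zip")      # C:\Users\demo\archive.tar.zip
ancestors = [str(p) for p in path.parents]
print(ancestors)  # ['C:\\Users\\demo', 'C:\\Users', 'C:\\']
```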
2. The Path class
More often than not, we use the Path class directly instead of PurePath.
Path is a subclass of PurePath. In addition to the constructors, properties, and methods provided by PurePath, it offers methods to check whether a path exists and whether it refers to a file or a folder, and for files it also supports operations such as reading and writing.
Path has 2 subclasses, PosixPath and WindowsPath, the role of these two subclasses is obvious and will not be repeated.
Basic usage
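A minimal sketch of everyday usage. It only reads the current working directory and creates nothing on disk; the file name is hypothetical:

```python
from pathlib import Path

here = Path()     # equivalent to Path("."), the current directory
cwd = Path.cwd()  # absolute path of the current working directory

print(type(here).__name__)  # WindowsPath on Windows, PosixPath elsewhere
print(cwd.is_absolute())    # True
print(here.exists())        # True

# Build a path below the working directory and ask the file system about it.
candidate = cwd / "some_file.txt"  # hypothetical name; it may or may not exist
print(candidate.name, candidate.exists())
```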
Directory operations
Traversing the directory
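One way to traverse a directory is iterdir(), which yields the direct children of a directory. The sketch below works inside a temporary directory so that it leaves nothing behind; the entry names are hypothetical:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "docs").mkdir()
    (root / "a.txt").touch()
    (root / "b.txt").touch()

    # iterdir() yields children in arbitrary order, so sort for stable output.
    entries = sorted(child.name for child in root.iterdir())
    subdirs = [child.name for child in root.iterdir() if child.is_dir()]

print(entries)  # ['a.txt', 'b.txt', 'docs']
print(subdirs)  # ['docs']
```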
Create file
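A sketch of creating a nested directory and an empty file with mkdir() and touch(), again inside a temporary directory and with hypothetical names:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    nested = Path(tmp) / "logs" / "2024"

    # parents=True creates missing intermediate directories;
    # exist_ok=True avoids an error if the directory already exists.
    nested.mkdir(parents=True, exist_ok=True)

    log_file = nested / "app.log"
    log_file.touch()  # creates an empty file

    created = log_file.is_file()
    size = log_file.stat().st_size

print(created, size)  # True 0
```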
File Operations
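Two common file operations are renaming and deleting. This is a minimal sketch using rename() and unlink() in a temporary directory; the file names are hypothetical:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    draft = root / "draft.txt"
    draft.write_text("hello", encoding="utf-8")

    # rename() moves the file to the new path.
    final = root / "final.txt"
    draft.rename(final)
    renamed = final.exists() and not draft.exists()

    # unlink() deletes the file.
    final.unlink()
    removed = not final.exists()

print(renamed, removed)  # True True
```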
File information
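File metadata is available through stat(), which returns the same kind of result as os.stat(). A small sketch on a temporary file with a hypothetical name:

```python
import datetime
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.bin"
    sample.write_bytes(b"12345")

    info = sample.stat()
    size = info.st_size  # size in bytes
    modified = datetime.datetime.fromtimestamp(info.st_mtime)  # last modification
    file_name, extension = sample.name, sample.suffix

print(size)                  # 5
print(file_name, extension)  # sample.bin .bin
print(modified.year >= 2000) # True on any machine with a sane clock
```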
File read/write
open(mode='r', buffering=-1, encoding=None, errors=None, newline=None)
Used in a similar way to Python's built-in open function, returning a file object.
read_bytes() : reads the file in 'rb' mode and returns the data as bytes.
write_bytes(data) : writes data to the file in 'wb' mode.
read_text(encoding=None, errors=None) : reads the file in 'r' mode and returns its contents as a string.
write_text(data, encoding=None, errors=None) : writes a string to the file in 'w' mode.
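These methods can be sketched as follows, working in a temporary directory with hypothetical file names:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)

    # Text: write_text() overwrites, open("a") appends, read_text() reads all.
    notes = root / "notes.txt"
    notes.write_text("first line\n", encoding="utf-8")
    with notes.open("a", encoding="utf-8") as handle:
        handle.write("second line\n")
    text = notes.read_text(encoding="utf-8")

    # Bytes: the data is written and read back unchanged.
    blob = root / "blob.bin"
    blob.write_bytes(b"\x00\x01\x02")
    data = blob.read_bytes()

print(repr(text))  # 'first line\nsecond line\n'
print(data)        # b'\x00\x01\x02'
```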
Judgment operation
All of the following methods return a boolean:
- is_dir() : whether the path is a directory
- is_file() : whether the path is a regular file
- is_symlink() : whether the path is a symbolic link
- is_socket() : whether the path is a socket file
- is_block_device() : whether the path is a block device
- is_char_device() : whether the path is a character device
- is_absolute() : whether the path is an absolute path
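A short sketch of a few of these checks on a temporary directory and file (hypothetical names). Note that asking about a path that does not exist returns False rather than raising an error:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    report = root / "report.txt"
    report.touch()
    missing = root / "missing.txt"  # never created

    checks = {
        "root.is_dir": root.is_dir(),
        "report.is_file": report.is_file(),
        "report.is_dir": report.is_dir(),
        "report.is_symlink": report.is_symlink(),
        "missing.is_file": missing.is_file(),
        "root.is_absolute": root.is_absolute(),
    }

for name, value in checks.items():
    print(name, value)
```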
Path splicing and decomposition
In pathlib, paths are joined with the / operator, in three main ways:
- Path object / Path object.
- Path object / String.
- String / Path object.
Paths are decomposed mainly through the parts property.
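A minimal sketch of the three forms and of parts. PurePosixPath is used so that the output is the same on every platform; Path objects support the operator in the same way:

```python
from pathlib import PurePosixPath

base = PurePosixPath("/usr")

path_path = base / PurePosixPath("local")  # Path object / Path object
path_str = base / "local" / "bin"          # Path object / string
str_path = "home" / PurePosixPath("demo")  # string / Path object

print(path_path)       # /usr/local
print(path_str)        # /usr/local/bin
print(str_path)        # home/demo
print(path_str.parts)  # ('/', 'usr', 'local', 'bin')
```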
Wildcards
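- glob(pattern) : finds the entries in the directory that match the given wildcard pattern
- rglob(pattern) : like glob, but searches the directory and all of its subdirectories recursively
Return value: a generator (lazy iterator) of the matching paths, not a list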
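The difference between the two can be sketched on a small temporary tree (hypothetical file names). The results are sorted because the order in which matches are produced is not guaranteed:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "sub").mkdir()
    (root / "a.py").touch()
    (root / "sub" / "b.py").touch()
    (root / "sub" / "c.txt").touch()

    # glob() only looks at the directory itself for this pattern.
    top_level = sorted(p.relative_to(root).as_posix() for p in root.glob("*.py"))

    # rglob() also descends into subdirectories.
    recursive = sorted(p.relative_to(root).as_posix() for p in root.rglob("*.py"))

print(top_level)  # ['a.py']
print(recursive)  # ['a.py', 'sub/b.py']
```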
Pattern matching
Use the match method to test a path against a glob-style pattern (not a regular expression); it returns True if the path matches.
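A few illustrative cases. A relative pattern is matched from the right-hand end of the path, while an absolute pattern must match the whole path:

```python
from pathlib import PurePosixPath, PureWindowsPath

relative_hit = PurePosixPath("a/b.py").match("*.py")         # True
tail_hit = PurePosixPath("/a/b/c.py").match("b/*.py")        # True
tail_miss = PurePosixPath("/a/b/c.py").match("a/*.py")       # False
absolute_hit = PurePosixPath("/a.py").match("/*.py")         # True
absolute_miss = PurePosixPath("a/b.py").match("/*.py")       # False

# Windows-style paths are matched case-insensitively.
windows_hit = PureWindowsPath("b.py").match("*.PY")          # True

print(relative_hit, tail_hit, tail_miss, absolute_hit, absolute_miss, windows_hit)
```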