In day-to-day work, besides processing text files with Python, you will sometimes need to handle compressed files as well.
The compressed file formats you are most likely to encounter are:
- rar: a compression format widely used on Windows; the best-known GUI tool is WinRAR
- tar: the archiving tool on Linux systems; it only packs files together and does not compress them
- gz: i.e. gzip; it usually compresses a single file. Combined with tar, files can be packed first and then compressed.
- tgz: i.e. tar.gz; the file you get by packing with tar first and then compressing with gzip.
- zip: unlike gzip, although it uses a similar algorithm, it can pack and compress multiple files; because each file is compressed separately, the compression ratio is lower than that of tar.gz
- 7z: the format supported by the 7-Zip compression software, with a higher compression ratio.
Of course, besides Python you can also use compression and decompression software, or handle these files with command-line tools.
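To make the gz item in the list above concrete (one file in, one compressed file out), here is a minimal sketch using the standard-library gzip module. The function names gzip_file and gunzip_file and the file names are made up for this illustration.

```python
import gzip
import shutil
import tempfile
from pathlib import Path


def gzip_file(src, dst):
    """Compress the single file src into the gzip file dst."""
    with open(src, "rb") as f_in, gzip.open(dst, "wb") as f_out:
        shutil.copyfileobj(f_in, f_out)


def gunzip_file(src, dst):
    """Decompress the gzip file src into the plain file dst."""
    with gzip.open(src, "rb") as f_in, open(dst, "wb") as f_out:
        shutil.copyfileobj(f_in, f_out)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        original = Path(tmp) / "notes.txt"
        original.write_text("hello gzip\n" * 100, encoding="utf-8")
        gzip_file(original, Path(tmp) / "notes.txt.gz")
        gunzip_file(Path(tmp) / "notes.txt.gz", Path(tmp) / "notes_copy.txt")
        # The round trip should give back exactly the original bytes
        print((Path(tmp) / "notes_copy.txt").read_bytes() == original.read_bytes())
```

Note that the .gz file holds exactly one stream of data, which is why several files are normally packed with tar first.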
zip file
zipfile is the Python module for compressing and decompressing zip files. It has two very important classes: ZipFile and ZipInfo. ZipFile is the main class, used to create and read zip files, while ZipInfo stores information about each individual file in the zip archive.
Sample code.
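A minimal sketch of both classes, assuming an archive built from in-memory strings; the helper names make_zip, describe_zip and unzip, and the file names, are invented for this example.

```python
import tempfile
import zipfile
from pathlib import Path


def make_zip(zip_path, files):
    """Write a {archive_name: text} mapping into a new zip file."""
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, text in files.items():
            zf.writestr(name, text)


def describe_zip(zip_path):
    """Return (name, original size, compressed size) for every member."""
    with zipfile.ZipFile(zip_path) as zf:
        # infolist() returns one ZipInfo object per archive member
        return [(i.filename, i.file_size, i.compress_size) for i in zf.infolist()]


def unzip(zip_path, dest_dir):
    """Extract all members of the zip file into dest_dir."""
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest_dir)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        archive = Path(tmp) / "demo.zip"
        make_zip(archive, {"a.txt": "alpha\n" * 50, "docs/b.txt": "beta\n" * 50})
        for name, size, packed in describe_zip(archive):
            print(name, size, packed)
        unzip(archive, Path(tmp) / "out")
        print(sorted(p.name for p in (Path(tmp) / "out").rglob("*.txt")))
```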
tar.gz file
The tarfile module can read and write tar archives, including archives compressed with gzip, bz2 and lzma. To use tarfile you need to understand its mode argument.
mode must be a string of the form 'filemode[:compression]', and its default value is 'r'. The following is a complete list of mode combinations:
Mode | Action |
---|---|
'r' or 'r:*' | Open for reading with transparent compression (recommended). |
'r:' | Open for reading without compression. |
'r:gz' | Open for reading with gzip compression. |
'r:bz2' | Open for reading with bzip2 compression. |
'r:xz' | Open for reading with lzma compression. |
'x' or 'x:' | Create a tarfile without compression. Raise a FileExistsError if the file already exists. |
'x:gz' | Create a tarfile with gzip compression. Raise a FileExistsError if the file already exists. |
'x:bz2' | Create a tarfile with bzip2 compression. Raise a FileExistsError if the file already exists. |
'x:xz' | Create a tarfile with lzma compression. Raise a FileExistsError if the file already exists. |
'a' or 'a:' | Open for appending without compression. The file is created if it does not exist. |
'w' or 'w:' | Open for uncompressed writing. |
'w:gz' | Open for gzip-compressed writing. |
'w:bz2' | Open for bzip2-compressed writing. |
'w:xz' | Open for lzma-compressed writing. |
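As a small illustration of two of these modes, the sketch below creates an archive with 'x:gz' (which refuses to overwrite an existing file) and reads it back with 'r:*' (which detects the compression on its own). The helper names create_exclusive and read_member and the file names are hypothetical.

```python
import io
import tarfile
import tempfile
from pathlib import Path


def create_exclusive(archive, name, data):
    """Create a gzip-compressed tar with mode 'x:gz'; fails if archive exists."""
    with tarfile.open(archive, "x:gz") as tar:
        info = tarfile.TarInfo(name=name)
        info.size = len(data)  # the size must be set before adding the data
        tar.addfile(info, io.BytesIO(data))


def read_member(archive, name):
    """Read one member; 'r:*' lets tarfile work out the compression itself."""
    with tarfile.open(archive, "r:*") as tar:
        return tar.extractfile(name).read()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        archive = Path(tmp) / "demo.tar.gz"
        create_exclusive(archive, "hello.txt", b"hello tar\n")
        print(read_member(archive, "hello.txt"))
        try:
            create_exclusive(archive, "hello.txt", b"again\n")
        except FileExistsError:
            print("archive already exists, not overwritten")
```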
For special purposes there is a second mode format: 'filemode|[compression]'. In this case tarfile.open() returns a TarFile object that processes its data as a stream of blocks, so no random seeking is performed on the file.
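Mode | Action |
---|---|
'r\|*' | Open a stream of tar blocks for reading with transparent compression. |
'r\|' | Open a stream of uncompressed tar blocks for reading. |
'r\|gz' | Open a gzip-compressed stream for reading. |
'r\|bz2' | Open a bzip2-compressed stream for reading. |
'r\|xz' | Open an lzma-compressed stream for reading. |
'w\|' | Open an uncompressed stream for writing. |
'w\|gz' | Open a gzip-compressed stream for writing. |
'w\|bz2' | Open a bzip2-compressed stream for writing. |
'w\|xz' | Open an lzma-compressed stream for writing. |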
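The stream modes can be sketched as follows, using an in-memory buffer as a stand-in for a pipe or socket. The function names pack_to_stream and read_from_stream are invented for the example; the point is that the archive is written and read strictly front to back.

```python
import io
import tarfile


def pack_to_stream(files):
    """Write {name: bytes} into a gzip-compressed tar stream and return its bytes."""
    buf = io.BytesIO()
    # In stream mode tarfile only writes sequentially to the target object
    with tarfile.open(fileobj=buf, mode="w|gz") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def read_from_stream(raw):
    """Read all regular files from a tar stream sequentially, without seeking."""
    result = {}
    with tarfile.open(fileobj=io.BytesIO(raw), mode="r|*") as tar:
        for member in tar:  # members have to be consumed in archive order
            if member.isfile():
                result[member.name] = tar.extractfile(member).read()
    return result


if __name__ == "__main__":
    raw = pack_to_stream({"a.txt": b"alpha", "b.txt": b"beta"})
    print(read_from_stream(raw))
```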
Code example.
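A minimal sketch of the most common case: packing a directory into a .tar.gz file, listing it and unpacking it again. The helper names and paths are hypothetical; the filter argument is passed only where the running Python version provides it.

```python
import tarfile
import tempfile
from pathlib import Path


def make_targz(source_dir, archive):
    """Pack source_dir (recursively) into a gzip-compressed tar archive."""
    source_dir = Path(source_dir)
    with tarfile.open(archive, "w:gz") as tar:
        # arcname stores only the directory name, not its absolute path
        tar.add(str(source_dir), arcname=source_dir.name)


def list_targz(archive):
    """Return the names of all members in the archive."""
    with tarfile.open(archive, "r:gz") as tar:
        return tar.getnames()


def extract_targz(archive, dest_dir):
    """Unpack the archive into dest_dir."""
    with tarfile.open(archive, "r:gz") as tar:
        # The "data" filter rejects unsafe members; older Pythons do not have it
        kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
        tar.extractall(dest_dir, **kwargs)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        project = Path(tmp) / "project"
        (project / "sub").mkdir(parents=True)
        (project / "a.txt").write_text("alpha", encoding="utf-8")
        (project / "sub" / "b.txt").write_text("beta", encoding="utf-8")
        archive = Path(tmp) / "project.tar.gz"
        make_targz(project, archive)
        print(list_targz(archive))
        extract_targz(archive, Path(tmp) / "out")
```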
rar file
We can use rarfile to decompress .rar files, but rarfile does not support creating rar files. rarfile depends on the unrar component; however, after installing it with pip install unrar, the following error is reported:
Couldn’t find path to unrar library…
This is because unrar for Python in turn relies on the official UnRAR library from RARLab.
Installation on Windows
- Go to RARLab, download the official library installer, https://www.rarlab.com/rar/UnRARDLL.exe, and install it.
- It is best to install to the default path, usually C:\Program Files (x86)\UnrarDLL\.
- Add an environment variable: create a new system variable named UNRAR_LIB_PATH. On a 64-bit system set it to C:\Program Files (x86)\UnrarDLL\x64\UnRAR64.dll; on a 32-bit system set it to C:\Program Files (x86)\UnrarDLL\UnRAR.dll.
- After saving the environment variable, run pip install unrar; the code should then run without the error.
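If you prefer not to touch the system settings, the variable from the steps above can also be set for the current process only. The sketch below assumes the default install paths quoted above, and assumes the DLL is loaded into the Python process, in which case it is the bitness of the interpreter (not of the operating system) that has to match. The helper names are made up, and the call presumably has to happen before the unrar package is imported.

```python
import os
import struct

# Default install locations quoted in the installation steps
DLL_64 = r"C:\Program Files (x86)\UnrarDLL\x64\UnRAR64.dll"
DLL_32 = r"C:\Program Files (x86)\UnrarDLL\UnRAR.dll"


def default_unrar_dll():
    """Pick the DLL that matches the bitness of the running interpreter."""
    bits = struct.calcsize("P") * 8  # pointer size in bits: 64 or 32
    return DLL_64 if bits == 64 else DLL_32


def ensure_unrar_lib_path():
    """Set UNRAR_LIB_PATH for this process unless it is already set; return it."""
    return os.environ.setdefault("UNRAR_LIB_PATH", default_unrar_dll())


if __name__ == "__main__":
    print(ensure_unrar_lib_path())
```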
Installation on Linux
- Download the rar source package: https://www.rarlab.com/rar/rarlinux-6.0.0.tar.gz
- Unpack it, enter the extracted directory, then compile and install to generate the .so file
- Configure the environment variable, and once that is done, run pip install unrar
Code example.
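A minimal sketch, assuming the unrar package from PyPI exposes a zipfile-like RarFile class in unrar.rarfile with namelist() and extractall(), and that the UnRAR library has been installed and configured as described above. The function name extract_rar and the file name sample.rar are placeholders.

```python
import os


def extract_rar(rar_path, dest_dir):
    """Extract every member of rar_path into dest_dir and return the member names."""
    # Imported here because the import itself fails when the UnRAR library is missing
    from unrar import rarfile

    os.makedirs(dest_dir, exist_ok=True)
    rar = rarfile.RarFile(rar_path)
    names = rar.namelist()
    rar.extractall(dest_dir)
    return names


if __name__ == "__main__":
    # "sample.rar" is a placeholder; point it at a real archive
    if os.path.exists("sample.rar"):
        print(extract_rar("sample.rar", "sample_out"))
```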
7z file
To compress and decompress .7z files, you need the py7zr package. Code example.
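A minimal sketch, assuming py7zr has been installed with pip install py7zr. The helper names compress_7z and extract_7z, and the top-level folder name "data" used inside the archive, are choices made for this example.

```python
import os
import tempfile


def compress_7z(source_dir, archive):
    """Pack source_dir (recursively) into a .7z archive."""
    import py7zr  # third-party; imported lazily so the module loads without it

    with py7zr.SevenZipFile(archive, "w") as z:
        # Store the directory contents under the folder name "data"
        z.writeall(source_dir, arcname="data")


def extract_7z(archive, dest_dir):
    """Unpack a .7z archive into dest_dir and return the member names."""
    import py7zr

    with py7zr.SevenZipFile(archive, mode="r") as z:
        names = z.getnames()
        z.extractall(path=dest_dir)
    return names


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "src")
        os.makedirs(src)
        with open(os.path.join(src, "a.txt"), "w", encoding="utf-8") as f:
            f.write("alpha")
        archive = os.path.join(tmp, "demo.7z")
        compress_7z(src, archive)
        print(extract_7z(archive, os.path.join(tmp, "out")))
```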