Background
Deep learning environment configuration is often a cumbersome task, especially on servers shared by multiple users. Although conda integrates tools like virtualenv to isolate dependency environments, this approach still provides no way to allocate compute resources uniformly. With container technology we can instead create a container for each user and allocate compute resources to that container. There are many container-based deep learning platform products on the market, such as AiMax, which integrates a lot of features, but if you just need to call the GPU inside a container, you can refer to the following steps.
Calling the GPU using the Docker Client
Dependency installation
The docker run --gpus command relies on the NVIDIA Linux driver and the NVIDIA Container Toolkit; if you want to see the full installation documentation, click here.
Installing the NVIDIA driver on a Linux server is very simple: if you have a GUI installed you can install it directly from Ubuntu's "Additional Drivers" application, or you can download the driver from the NVIDIA website.
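On a headless server, a common alternative is to install the driver from Ubuntu's own repositories. A minimal sketch, assuming an Ubuntu system where the ubuntu-drivers tool is available:

```bash
# List the drivers Ubuntu recommends for the detected GPU
$ ubuntu-drivers devices

# Install the recommended driver (a reboot is usually needed afterwards)
$ sudo ubuntu-drivers autoinstall

# Verify that the driver is loaded
$ nvidia-smi
```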
The next step is to install the NVIDIA Container Toolkit. The server needs to meet a few prerequisites (you can verify them with the commands shown right after this list):
- GNU/Linux x86_64 with kernel version > 3.10
- Docker >= 19.03 (note: not Docker Desktop; if you want to use the toolkit on a desktop machine, install Docker Engine instead of Docker Desktop, because the Desktop edition runs on top of a virtual machine)
- NVIDIA GPU with architecture >= Kepler (RTX 20-series cards use the Turing architecture, RTX 30-series cards use the Ampere architecture)
- NVIDIA Linux driver >= 418.81.07
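A quick way to check these prerequisites on the server might look like the following (a sketch; the output format differs slightly between distributions and driver versions):

```bash
# Kernel version (must be newer than 3.10)
$ uname -r

# Docker Engine version (must be 19.03 or newer)
$ docker version --format '{{.Server.Version}}'

# NVIDIA driver version and GPU model (driver must be 418.81.07 or newer)
$ nvidia-smi --query-gpu=driver_version,name --format=csv
```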
Then you can proceed with the official installation of the NVIDIA Container Toolkit on Ubuntu or Debian. If you want to install it on CentOS or another Linux distribution, please refer to the official installation documentation.
Install Docker
```bash
$ curl https://get.docker.com | sh \
  && sudo systemctl --now enable docker
```
After the installation completes, please also go through the official post-installation steps (see here). If you run into problems during installation, refer to the official installation documentation.
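One of the most common post-installation steps, for reference, is allowing your user to run docker without sudo; a small sketch, assuming you are comfortable adding the current user to the docker group:

```bash
# Allow the current user to run docker without sudo (log out and back in for the group change to apply)
$ sudo usermod -aG docker $USER

# Verify that Docker works for this user
$ docker run --rm hello-world
```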
Set up Package Repository and GPG Key
```bash
$ distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
  && curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
  && curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | \
     sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
     sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
```
Please note: If you want to install NVIDIA Container Toolkit versions prior to 1.6.0, you should use the nvidia-docker repository instead of the libnvidia-container repositories above.
If you encounter problems please refer directly to the Installation Manual.
Installing nvidia-docker2 should automatically pull in libnvidia-container-tools, libnvidia-container1, and the other required dependencies; if it does not, you can install them manually.
Install nvidia-docker2
After completing the previous steps, update the package list and install nvidia-docker2:
```bash
$ sudo apt update \
  && sudo apt install -y nvidia-docker2
```
Restart the Docker daemon:

```bash
$ sudo systemctl restart docker
```
Next you can test if the installation is correct by running a CUDA container.
```bash
$ docker run --rm --gpus all nvidia/cuda:11.0.3-base-ubuntu20.04 nvidia-smi
```
The output displayed in the shell should look similar to the following.
```
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.51.06    Driver Version: 450.51.06    CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            On   | 00000000:00:1E.0 Off |                    0 |
| N/A   34C    P8     9W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
```
--gpus Usage
Note that if you install nvidia-docker2, it already registers the NVIDIA runtime with Docker at installation time. If you install nvidia-docker instead, please follow the official documentation to register the runtime with Docker.
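For reference, manual registration usually amounts to adding a runtimes entry to /etc/docker/daemon.json and restarting Docker; a sketch of what that file typically looks like (verify it against the official documentation for your toolkit version):

```bash
$ cat /etc/docker/daemon.json
{
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
$ sudo systemctl restart docker
```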
If you have any questions, please refer to the documentation referenced in this section.
GPUs can be assigned on the Docker CLI either with the --gpus option when starting a container, or with the NVIDIA_VISIBLE_DEVICES environment variable. This variable controls which GPUs are accessible within the container.
| Possible values | Description |
| --- | --- |
| 0,1,2 or GPU-fef8089b | Comma-separated GPU index(es) or GPU UUID(s) |
| all | All GPUs are accessible by the container; this is the default value |
| none | No GPU is accessible, but the functions provided by the driver can still be used |
| void or empty or unset | nvidia-container-runtime behaves the same as runc (i.e. neither GPUs nor capabilities are exposed) |
When specifying GPUs with the --gpus option, the device parameter should be used. Its value is wrapped in single quotes, with the device list additionally wrapped in double quotes; for example, to expose GPUs 2 and 3 to the container: --gpus '"device=2,3"'. When using the NVIDIA_VISIBLE_DEVICES variable, you may also need to pass --runtime nvidia, unless nvidia is already configured as the default runtime.
- Set up a container with CUDA support enabled:

```bash
$ docker run --rm --gpus all nvidia/cuda nvidia-smi
```
- Specify nvidia as the runtime and set the variable NVIDIA_VISIBLE_DEVICES:

```bash
$ docker run --rm --runtime=nvidia \
    -e NVIDIA_VISIBLE_DEVICES=all nvidia/cuda nvidia-smi
```
- Allocate 2 GPUs to the launched container:

```bash
$ docker run --rm --gpus 2 nvidia/cuda nvidia-smi
```
- Specify the GPUs with indexes 1 and 2 for the container:

```bash
$ docker run --gpus '"device=1,2"' \
    nvidia/cuda nvidia-smi --query-gpu=uuid --format=csv
```

```
uuid
GPU-ad2367dd-a40e-6b86-6fc3-c44a2cc92c7e
GPU-16a23983-e73e-0945-2095-cdeb50696982
```
- You can also use NVIDIA_VISIBLE_DEVICES:

```bash
$ docker run --rm --runtime=nvidia \
    -e NVIDIA_VISIBLE_DEVICES=1,2 \
    nvidia/cuda nvidia-smi --query-gpu=uuid --format=csv
```

```
uuid
GPU-ad2367dd-a40e-6b86-6fc3-c44a2cc92c7e
GPU-16a23983-e73e-0945-2095-cdeb50696982
```
- Use nvidia-smi to query the GPU UUID and then assign it to the container:

```bash
$ nvidia-smi -i 3 --query-gpu=uuid --format=csv
```

```
uuid
GPU-18a3e86f-4c0e-cd9f-59c3-55488c4b0c24
```

```bash
$ docker run --gpus device=GPU-18a3e86f-4c0e-cd9f-59c3-55488c4b0c24 \
    nvidia/cuda nvidia-smi
```
For settings that control how the driver's capabilities are used within the container, and for other options, see here.
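As an example of such a setting, the NVIDIA_DRIVER_CAPABILITIES environment variable controls which driver libraries (compute, utility, video, graphics, ...) are mounted into the container; a hedged sketch that exposes only the CUDA compute libraries and utilities such as nvidia-smi:

```bash
# Expose all GPUs, but only the compute and utility driver capabilities
$ docker run --rm --runtime=nvidia \
    -e NVIDIA_VISIBLE_DEVICES=all \
    -e NVIDIA_DRIVER_CAPABILITIES=compute,utility \
    nvidia/cuda nvidia-smi
```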
Use the Docker Go SDK to assign GPUs to containers
NVIDIA/go-nvml provides Go language bindings for the NVIDIA Management Library (NVML) API. It is currently supported only on Linux; see the repository for details.
The following demo code obtains various pieces of information about the GPU. For other functions, please refer to the official documentation of NVML and go-nvml.
```go
package main

import (
    "fmt"
    "log"

    "github.com/NVIDIA/go-nvml/pkg/nvml"
)

func main() {
    ret := nvml.Init()
    if ret != nvml.SUCCESS {
        log.Fatalf("Unable to initialize NVML: %v", nvml.ErrorString(ret))
    }
    defer func() {
        ret := nvml.Shutdown()
        if ret != nvml.SUCCESS {
            log.Fatalf("Unable to shutdown NVML: %v", nvml.ErrorString(ret))
        }
    }()

    count, ret := nvml.DeviceGetCount()
    if ret != nvml.SUCCESS {
        log.Fatalf("Unable to get device count: %v", nvml.ErrorString(ret))
    }

    for i := 0; i < count; i++ {
        device, ret := nvml.DeviceGetHandleByIndex(i)
        if ret != nvml.SUCCESS {
            log.Fatalf("Unable to get device at index %d: %v", i, nvml.ErrorString(ret))
        }

        // Get the UUID of the device
        uuid, ret := device.GetUUID()
        if ret != nvml.SUCCESS {
            log.Fatalf("Unable to get uuid of device at index %d: %v", i, nvml.ErrorString(ret))
        }
        fmt.Printf("GPU UUID: %v\n", uuid)

        name, ret := device.GetName()
        if ret != nvml.SUCCESS {
            log.Fatalf("Unable to get name of device at index %d: %v", i, nvml.ErrorString(ret))
        }
        fmt.Printf("GPU Name: %+v\n", name)

        memoryInfo, _ := device.GetMemoryInfo()
        fmt.Printf("Memory Info: %+v\n", memoryInfo)

        powerUsage, _ := device.GetPowerUsage()
        fmt.Printf("Power Usage: %+v\n", powerUsage)

        powerState, _ := device.GetPowerState()
        fmt.Printf("Power State: %+v\n", powerState)

        managementDefaultLimit, _ := device.GetPowerManagementDefaultLimit()
        fmt.Printf("Power Management Default Limit: %+v\n", managementDefaultLimit)

        version, _ := device.GetInforomImageVersion()
        fmt.Printf("Inforom Image Version: %+v\n", version)

        driverVersion, _ := nvml.SystemGetDriverVersion()
        fmt.Printf("Driver Version: %+v\n", driverVersion)

        cudaDriverVersion, _ := nvml.SystemGetCudaDriverVersion()
        fmt.Printf("CUDA Driver Version: %+v\n", cudaDriverVersion)

        // List the compute processes currently running on this device
        computeRunningProcesses, _ := device.GetComputeRunningProcesses()
        for _, proc := range computeRunningProcesses {
            fmt.Printf("Proc: %+v\n", proc)
        }
    }
    fmt.Println()
}
```
Using the Docker Go SDK to assign GPUs to containers
The first thing you need to use is the ContainerCreate API.
```go
// ContainerCreate creates a new container based on the given configuration.
// It can be associated with a name, but it's not mandatory.
func (cli *Client) ContainerCreate(
    ctx context.Context,
    config *container.Config,
    hostConfig *container.HostConfig,
    networkingConfig *network.NetworkingConfig,
    platform *specs.Platform,
    containerName string) (container.ContainerCreateCreatedBody, error)
```
The API requires a number of structs to specify the configuration. One of them is the Resources field of the container.HostConfig struct, which has type container.Resources; inside it there is a slice of container.DeviceRequest structs, which is what the GPU device driver consumes.
```go
container.HostConfig{
    Resources: container.Resources{
        DeviceRequests: []container.DeviceRequest{
            {
                Driver:       "nvidia",
                Count:        0,
                DeviceIDs:    []string{"0"},
                Capabilities: [][]string{{"gpu"}},
                Options:      nil,
            },
        },
    },
}
```
The following is the definition of the container.DeviceRequest
structure.
```go
// DeviceRequest represents a request for devices from a device driver.
// Used by GPU device drivers.
type DeviceRequest struct {
    Driver       string            // Name of the device driver ("nvidia" here)
    Count        int               // Number of devices requested (-1 = All)
    DeviceIDs    []string          // List of device IDs as recognizable by the device driver, either an index or a UUID
    Capabilities [][]string        // An OR list of AND lists of device capabilities (e.g. "gpu")
    Options      map[string]string // Options to pass onto the device driver
}
```
Note: if you specify the Count field, you cannot specify GPUs by DeviceIDs; the two are mutually exclusive.
Next we try to start a pytorch container using the Docker Go SDK.
First we write a test.py
file and let it run inside the container to check if CUDA is available.
```python
# test.py
import torch

print("cuda.is_available:", torch.cuda.is_available())
```
Here is the experimental code that starts a container named torch_test_1 and runs the command python3 /workspace/test.py, then gets the output from stdout and stderr.
```go
package main

import (
    "context"
    "fmt"
    "os"

    "github.com/docker/docker/api/types"
    "github.com/docker/docker/api/types/container"
    "github.com/docker/docker/client"
    "github.com/docker/docker/pkg/stdcopy"
)

var (
    defaultHost = "unix:///var/run/docker.sock"
)

func main() {
    ctx := context.Background()
    cli, err := client.NewClientWithOpts(client.WithHost(defaultHost), client.WithAPIVersionNegotiation())
    if err != nil {
        panic(err)
    }

    // Create the container and attach GPU 0 to it via a DeviceRequest
    resp, err := cli.ContainerCreate(ctx,
        &container.Config{
            Image:     "pytorch/pytorch",
            Cmd:       []string{},
            OpenStdin: true,
            Volumes:   map[string]struct{}{},
            Tty:       true,
        }, &container.HostConfig{
            Binds: []string{`/home/joseph/workspace:/workspace`},
            Resources: container.Resources{DeviceRequests: []container.DeviceRequest{{
                Driver:       "nvidia",
                Count:        0,
                DeviceIDs:    []string{"0"}, // Either the GPU index or the GPU UUID can be entered here
                Capabilities: [][]string{{"gpu"}},
                Options:      nil,
            }}},
        }, nil, nil, "torch_test_1")
    if err != nil {
        panic(err)
    }

    if err := cli.ContainerStart(ctx, resp.ID, types.ContainerStartOptions{}); err != nil {
        panic(err)
    }
    fmt.Println(resp.ID)

    // Run test.py inside the container and capture its stdout/stderr
    execConf := types.ExecConfig{
        User:         "",
        Privileged:   false,
        Tty:          false,
        AttachStdin:  false,
        AttachStderr: true,
        AttachStdout: true,
        Detach:       true,
        DetachKeys:   "ctrl-p,q",
        Env:          nil,
        WorkingDir:   "/",
        Cmd:          []string{"python3", "/workspace/test.py"},
    }
    execCreate, err := cli.ContainerExecCreate(ctx, resp.ID, execConf)
    if err != nil {
        panic(err)
    }

    response, err := cli.ContainerExecAttach(ctx, execCreate.ID, types.ExecStartCheck{})
    if err != nil {
        panic(err)
    }
    defer response.Close()

    // Read the output; stdcopy demultiplexes the combined stdout/stderr stream
    _, _ = stdcopy.StdCopy(os.Stdout, os.Stderr, response.Reader)
}
```
As you can see, the program outputs the Container ID of the created container and the output of the executed command.
```bash
$ go build main.go
$ sudo ./main
264535c7086391eab1d74ea48094f149ecda6d25709ac0c6c55c7693c349967b
cuda.is_available: True
```
Next, use docker ps
to check the container status.
```bash
$ docker ps
CONTAINER ID   IMAGE             COMMAND   CREATED         STATUS         PORTS     NAMES
264535c70863   pytorch/pytorch   "bash"    2 minutes ago   Up 2 minutes             torch_test_1
```
Extended Reading: NVIDIA Multi-Instance GPUs
The Multi-Instance GPU (MIG) feature allows GPUs based on the NVIDIA Ampere architecture, such as the NVIDIA A100, to be securely partitioned into up to seven separate GPU instances for CUDA applications, providing separate GPU resources for multiple users to achieve optimal GPU utilization. This feature is particularly useful for workloads that do not fully saturate the compute capacity of the GPU, so users may want to run different workloads in parallel to maximize utilization.
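As a rough sketch of how MIG ties into the container workflow above (the exact commands, profile names such as 3g.20gb, and device identifier formats depend on the driver version and GPU model, so please consult the MIG user guide before running them):

```bash
# Enable MIG mode on GPU 0 (the GPU must be idle; a reset or reboot may be required)
$ sudo nvidia-smi -i 0 -mig 1

# List the GPU instance profiles the card supports
$ sudo nvidia-smi mig -lgip

# Create a GPU instance from a profile (3g.20gb here) together with its compute instance
$ sudo nvidia-smi mig -cgi 3g.20gb -C

# List the resulting MIG devices and their UUIDs
$ nvidia-smi -L

# Pass one MIG device to a container just like a full GPU, using the UUID reported above
$ docker run --rm --gpus '"device=<MIG device UUID>"' nvidia/cuda nvidia-smi
```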