
Read this article at https://www.pugetsystems.com/guides/1568
Dr Donald Kinghorn (Scientific Computing Advisor)

Workstation Setup for Docker with the New NVIDIA Container Toolkit (nvidia-docker2 is deprecated)

Written on September 13, 2019 by Dr Donald Kinghorn

Introduction

It's time for a "Docker with NVIDIA GPU support" update. This post will guide you through a useful Workstation setup (including user namespaces and performance tuning) with the new versions of Docker and the NVIDIA GPU container toolkit.

If you read many of my blog posts you will know that I use Docker with NVIDIA GPU support a lot. I have been writing about Docker for a few years now and a search for docker on the Puget Systems HPC blog brings up over 100 posts! The most recent was a "careful" post, How To Install Docker and NVIDIA-Docker on Ubuntu 19.04, that went through a setup with attention to version "gotchas". That was using the Docker 18.09 release with nvidia-docker2 2.0.3. Things have changed!

As of Docker release 19.03 NVIDIA GPUs are natively supported as devices in the Docker runtime. And, nvidia-docker2 is deprecated!

I'll go through the docker + NVIDIA GPU setup in a series of steps. The end result will be an up-to-date configuration suitable for a personal Workstation. This will include performance tuning for GPU acceleration and a "single user friendly" file ownership config utilizing user namespaces.

I'm doing this guide with a default Ubuntu 18.04 install, and including notes for 19.04.

Why Update?

  • Well, the main reason is that nvidia-docker2 is deprecated and will not be maintained going forward.
  • The move to the built-in GPU support in Docker and the nvidia-container-toolkit should eliminate the constant version-mismatch battle between fresh releases of docker-ce and the lag before nvidia-docker2 was updated. (I was constantly running into that because I have docker on several "production" and testing systems.)
  • The "practical" usage side of the change is mostly small.

Instead of,

docker run --runtime=nvidia ...

or

nvidia-docker run ...

It will now look like

docker run --gpus all ...

There are several arguments (like "all") to "--gpus"; see the NVIDIA docker Wiki and the Docker documentation entry for "run --gpus".

  • I suspect it will be a fairly slow transition to docker-ce 19.03+ and nvidia-container-toolkit. NVIDIA's own NGC container repository has its documentation using the old "run" syntax. In general there is lots of documentation and example code that will need to be updated. However, I haven't run into any difficulties myself since most of the old "docker run" arguments are the same. Just try substituting "docker run --gpus all" for "docker run --runtime=nvidia".
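To make that substitution concrete, here are a few of the forms "--gpus" accepts in docker-ce 19.03+. The commands are echoed rather than executed so this sketch is safe to run on a machine without docker or a GPU, and nvidia/cuda is just an example image:

```shell
# A few of the values "--gpus" accepts in docker-ce 19.03+.
# Echoed rather than executed so this runs without docker or a GPU;
# the nvidia/cuda image is just an example.
echo "docker run --gpus all --rm nvidia/cuda nvidia-smi"               # expose every GPU
echo "docker run --gpus 2 --rm nvidia/cuda nvidia-smi"                 # any two GPUs
echo "docker run --gpus '\"device=0,1\"' --rm nvidia/cuda nvidia-smi"  # only GPUs 0 and 1
```

Note the extra quoting on the "device=..." form; the inner double quotes have to reach docker intact.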

Step 1) Preparation and Clean Up

Docker:

If you have an old docker install you should remove that as suggested in the docker install documentation.

sudo apt-get remove docker docker-engine docker.io containerd runc

If you have a docker-ce install, version 18.xx or older, that you have disabled updates on, you should re-enable the repo i.e. remove a "hold" or uncomment the repo in /etc/apt/sources.list. The updated docker will upgrade to 19.03+ with normal updates.

NVIDIA GPU Setup:

You don't have to do much here if you already have an NVIDIA GPU and an updated driver. However, you should be sure to have a recent NVIDIA driver installed.

If you are using a new OS install then an easy way to install/update the NVIDIA display driver is to use the "graphics drivers" ppa.

sudo add-apt-repository ppa:graphics-drivers/ppa

sudo apt-get update

sudo apt-get install build-essential dkms

sudo apt-get install nvidia-driver-435 

The version installed above is "435". You can see the most recent driver available by visiting the ppa site or by hitting [Tab] twice to expand the package name with apt-get, i.e.

sudo apt-get install nvidia-driver- [Tab][Tab]   

Remove nvidia-docker2

If you have been using docker-ce 18.0x along with nvidia-docker2 then you will need to remove nvidia-docker2 in order to upgrade to docker-ce 19.x.

sudo apt-get purge nvidia-docker2

That should be all you need to do. We will setup the nvidia-container-toolkit in a later section.

Step 2) Install docker-ce

The Docker community edition is simple to install and keep up-to-date on Ubuntu by adding the official repo.

We'll be able to follow the install described in the official documentation, https://docs.docker.com/install/linux/docker-ce/ubuntu/

Install some required packages, (they may already be installed)

sudo apt-get install \
    apt-transport-https \
    ca-certificates \
    curl \
    gnupg-agent \
    software-properties-common

Add and check the docker key,

curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -

sudo apt-key fingerprint 0EBFCD88

The fingerprint should match "9DC8 5822 9FC7 DD38 854A E2D8 8D81 803C 0EBF CD88"

Now add the repository

sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"

Note for Ubuntu 19.04: You're in luck! docker-ce 19.03+ is fully supported on Ubuntu 19.04. Since the NVIDIA GPU support is "in" docker-ce now there is no need to force the repo to "Bionic" to get compatibility with the NVIDIA docker setup. (However, you will have to force "ubuntu18.04" for the nvidia-container-toolkit install since NVIDIA doesn't officially support 19.04. We'll take care of that later.)

Install docker-ce.

sudo apt-get update

sudo apt-get install docker-ce docker-ce-cli containerd.io

Note: "containerd.io" is independent of docker but included in that repository.

Check docker,

sudo docker run --rm hello-world

That should pull and run the hello-world container from Docker Hub. At this point only "root" can run docker. We will add user configuration in step 4).

Step 3) Install NVIDIA Container Toolkit

Go to https://github.com/NVIDIA/nvidia-docker and check what the latest supported Ubuntu version is:

When I wrote this, the supported Ubuntu versions were 14.04, 16.04, and 18.04 (all long term support versions). If you are using Ubuntu 18.04 or one of the other LTS releases then this will be simple. If you are using Ubuntu 19.04 then you will need to make one small change.

Configure repo and install:

distribution=$(. /etc/os-release;echo $ID$VERSION_ID)

curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -

curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list


sudo apt-get update

sudo apt-get install nvidia-container-toolkit


sudo systemctl restart docker
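The "distribution" variable above is nothing magic: it is just the ID and VERSION_ID fields from /etc/os-release glued together (e.g. "ubuntu18.04" on Bionic). A quick sanity check before trusting the repo URL:

```shell
# Sanity check: print what $distribution expands to on this machine.
# It is ID + VERSION_ID from /etc/os-release, e.g. "ubuntu18.04".
if [ -r /etc/os-release ]; then
    distribution=$(. /etc/os-release; echo $ID$VERSION_ID)
    echo "nvidia-docker repo path component: $distribution"
fi
```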

---

Note for Ubuntu 19.04:

You will need to force the repo to Ubuntu 18.04. The simplest thing will be to set "distribution" to ubuntu18.04 where it is used in "curl",

curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -

curl -s -L https://nvidia.github.io/nvidia-docker/ubuntu18.04/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list


sudo apt-get update

sudo apt-get install nvidia-container-toolkit


sudo systemctl restart docker

I've used this on several Ubuntu 19.04 installs and have not had any problems.

---

Test nvidia-container-toolkit:

For a quick test of your install try the following (still running as root for now),

sudo docker run --gpus all --rm nvidia/cuda nvidia-smi

After the container downloads you should see the nvidia-smi output from the latest cuda release.

Step 4) User Group, UserNamespace Configuration, and GPU Performance Tuning

This section is where we turn docker into a more pleasant experience to use on a personal Workstation.

The most important aspect of this section is setting up a configuration where "you" will "own" any files you create while using a container. You will be able to mount working directories from your home directory into a container and retain full ownership of any files after the container is shut down.

Add your user name to the docker group:

The first thing to do is add your user account to the docker group so that you can run docker without sudo.

sudo usermod -aG docker your-user-name

Note: You will need to log out and back in for this to take effect.

Add your user and group id's as "subordinate id's":

Now we do the configuration to give you process and file ownership from inside a container.

See, How-To Setup NVIDIA Docker and NGC Registry on your Workstation - Part 3 Setup User-Namespaces, for a detailed discussion of how and why you might want to do this.

Add a subordinate user and group id. I used my user-id 1000 and user-name "kinghorn". Use your own user-name and id. You can use the `id` command to check your user and group ids.
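A quick way to confirm your numeric ids before running the usermod commands below (on a default Ubuntu install the first user is usually 1000/1000):

```shell
# Print the numeric user and group id to plug into the usermod commands.
echo "uid: $(id -u)  gid: $(id -g)"
```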

sudo usermod -v 1000-1000 kinghorn

and a subordinate group

sudo usermod -w 1000-1000 kinghorn

Gotcha #1 "subuid ordering"

The commands above append the subuid (subgid) entries to /etc/subuid and /etc/subgid. This "used" to be fine but now the order of entries matters. By default there will be other entries in those files, including a "high ID range" default for your user. My setup looked like the following after running the above commands,

lxd:100000:65536
root:100000:65536
kinghorn:165536:65536
kinghorn:1000:1

You can ignore the entries for lxd and root. You can also leave entries like "kinghorn:165536:65536" in place, but you will need to move the entry like "kinghorn:1000:1" above the higher numbered subuids. You will need to be root to edit the file, so first do sudo -s to get a root shell and then use your editor of choice (nano is fine) to change the ordering, i.e. it should look more like,

lxd:100000:65536
root:100000:65536
kinghorn:1000:1
kinghorn:165536:65536

Edit both /etc/subuid and /etc/subgid to reflect this reordering.
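If you would rather not eyeball the files, here is a small sketch that flags the bad ordering. My user name "kinghorn" is the default; pass your own file and user name as arguments:

```shell
# Hedged sketch: warn if the narrow "user:uid:1" entry sits below a wide
# 65536-id range for the same user. Defaults to /etc/subuid and my user
# name "kinghorn" -- pass your own file and user name as arguments.
check_subid_order() {
    file="${1:-/etc/subuid}"
    user="${2:-kinghorn}"
    # line numbers of the single-id entry and the first wide range
    narrow=$(grep -n "^${user}:[0-9]*:1$" "$file" | head -1 | cut -d: -f1)
    wide=$(grep -n "^${user}:[0-9]*:65536$" "$file" | head -1 | cut -d: -f1)
    if [ -n "$narrow" ] && [ -n "$wide" ] && [ "$narrow" -gt "$wide" ]; then
        echo "WARNING: move the ${user}:<uid>:1 line above the wide range in $file"
        return 1
    fi
    echo "ordering in $file looks OK for $user"
}
```

Run it as check_subid_order /etc/subgid your-user-name to check the group file as well.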

Setup the docker configuration file for your User Namespace, and GPU compute performance optimizations:

You cannot access the docker configuration directory unless you are "root". Before you edit the file /etc/docker/daemon.json, do sudo -s to give yourself a root shell and then cd /etc/docker.

If there is an existing docker "daemon.json" config file, create a backup and then create a new file with the following JSON. (When I did this for testing there was no existing daemon.json file in the directory.)

{
    "userns-remap": "kinghorn",
    "default-shm-size": "1G",
    "default-ulimits": {
        "memlock": { "name": "memlock", "soft": -1, "hard": -1 },
        "stack":   { "name": "stack", "soft": 67108864, "hard": 67108864 }
    }
}

Now do,

chmod 600 daemon.json

to secure the config file.

Note: Be sure to change that "kinghorn" to your own user name!

Docker will read this file on startup.
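A malformed daemon.json will keep docker from starting, so it is worth syntax-checking the file before the restart. A sketch using a scratch file (again, "kinghorn" is my user name; substitute your own):

```shell
# Sketch: write the config to a scratch file and syntax-check it with
# python3 before installing it as /etc/docker/daemon.json.
cat > /tmp/daemon.json <<'EOF'
{
    "userns-remap": "kinghorn",
    "default-shm-size": "1G",
    "default-ulimits": {
        "memlock": { "name": "memlock", "soft": -1, "hard": -1 },
        "stack":   { "name": "stack", "soft": 67108864, "hard": 67108864 }
    }
}
EOF
python3 -m json.tool /tmp/daemon.json > /dev/null && echo "daemon.json is valid JSON"
```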

This config sets up your "user" in the docker User Namespace. That means that even though you see a root prompt in a docker container, any files you create will belong to you in your system namespace. Please read the post I mentioned above to understand the full meaning of this.

The last part of the config JSON changes the (pitiful) default performance parameters so that you can run large GPU accelerated compute jobs.


Restart docker:

sudo systemctl restart docker.service

Done!

Examples

If you want to use some of the (many) nice containers that NVIDIA has put together on NGC then go to their site and browse what's available. It's pretty impressive!

NGC home page

Gotcha #2 NGC needs updated documentation to reflect the nvidia-docker2 deprecation

If you look at the TensorFlow page on NGC you will see the old start-up syntax. That won't work with the new nvidia-container-toolkit! Hopefully this will get fixed soon (that may be the case by the time you read this post?).

On that page the documentation lists, (old syntax)

nvidia-docker run -it --rm -v local_dir:container_dir nvcr.io/nvidia/tensorflow:<xx.xx>-py<x>

To use your new docker + nvidia-container-toolkit install you would need to do something like, (new syntax)

docker run --gpus all -it --rm -v local_dir:container_dir nvcr.io/nvidia/tensorflow:<xx.xx>-py<x>

I would start up their TensorFlow container with something like,

docker run --gpus all --rm -it -v $HOME/projects:/projects nvcr.io/nvidia/tensorflow:19.08-py3

Note: The -v $HOME/projects:/projects part of that line is binding a directory named "projects" in my home directory to the directory "/projects" in the container. Any files created there will be owned by me and remain there after exiting from the container instance.

Conclusions

You now have an up-to-date personal Workstation docker configuration with GPU acceleration.

I hope you find this as useful as I have!

Happy computing! --dbk @dbkinghorn



Tags: Docker, NVIDIA, GPU, NVIDIA-docker
