Docker and NVIDIA-docker on your workstation: Setup User Namespaces

In this third post about "Docker and NVIDIA-Docker on your Workstation" I will go through the configuration of Linux kernel User-Namespaces for use with Docker. This is the last important system configuration detail needed to fulfill the setup suggested in the first post describing my motivation for this project.

User-Namespaces will allow a much more secure and convenient configuration of the machine as a "Single-User-Docker-Workstation".

It is assumed that you have already installed the host OS, Docker-Engine, and NVIDIA-Docker as described in the "installation post".

Note: In the short time since the “installation” post both Docker-Engine and NVIDIA-Docker have been updated. Docker changed its version naming scheme. The latest release (as of this writing) is 17.03.0 and they are now on a monthly release cycle. The instructions in the installation post will automatically pull the new version with the normal Ubuntu updates.

NVIDIA-Docker is now at version 1.0.1. You can find the newest releases on GitHub. You will need to manually update using the latest .deb file.
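
As a rough sketch of the manual update (the file name and URL below are taken from the v1.0.1 release page and are just an example; check the releases page for the current one);

# file name/URL are assumptions based on the v1.0.1 release -- use the latest from GitHub
wget https://github.com/NVIDIA/nvidia-docker/releases/download/v1.0.1/nvidia-docker_1.0.1-1_amd64.deb
sudo dpkg -i nvidia-docker_1.0.1-1_amd64.deb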


What are Kernel "Namespaces" and what is the User-Namespace?

In general [Linux Namespaces](http://man7.org/linux/man-pages/man7/namespaces.7.html) provide an abstraction for system resources that can give a running process the appearance of using an isolated instance of the resource. There are namespaces for process IDs, network, interprocess communication, user IDs and others. This namespace mechanism, along with "control groups" (cgroups), is a major component of containerization.
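
You can get a feel for this by looking at the namespace membership of your own shell. Every process has a set of namespace links under /proc, and on a reasonably recent kernel the listing should look something like this (possibly with a few more entries, depending on the kernel version);

ls /proc/$$/ns
ipc  mnt  net  pid  user  uts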

The User-Namespace is the latest namespace to be utilized by Docker. It was added with version 1.10. The main idea of a user-namespace is that a process’s UID (user ID) and GID (group ID) can be different inside and outside of a container’s namespace. The significant consequence of this is that a container can have its root process mapped to a non-privileged user ID on the host. The configuration mechanism for this mapping is given by the subordinate user and group ID files /etc/subuid and /etc/subgid.
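
The kernel exposes the mapping that is in force for a process in /proc/&lt;pid&gt;/uid_map (and gid_map). Each line reads "ID-inside-namespace  ID-outside-namespace  length". As an illustrative sketch (the numbers are just an example), a shell inside a container whose root has been remapped to an unprivileged host UID range starting at 100000 would show something like;

/ # cat /proc/self/uid_map
         0     100000      65536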

Currently Docker allows only a single user and group ID to be remapped for containers. This will require starting the Docker service using the --userns-remap flag.

The restriction to a single user re-mapping is not limiting in our case since we are configuring a "Single User Workstation".


Why use User-Namespaces with Docker?

A few experiments should clarify why we want to configure Docker with user-namespaces.

Experiment 1

I’m going to start the tiny Alpine Linux in a container running /bin/sh. I will bind the host /opt directory to /opt in the container. I have put a file owned by root on the host at /opt/root-file-on-host.
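
If you want to follow along, that test file can be created on the host with something like;

sudo touch /opt/root-file-on-host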

Note: The command prompt on my host system for this testing is kinghorn@i7:~$

kinghorn@i7:~$ docker run --rm -i -t -v /opt:/opt alpine /bin/sh
 
/ # cd /opt
 
/opt # ls -l
 
-rw-r--r--    1 root     root             0 Mar  9 03:38 root-file-on-host
 
/opt # rm root-file-on-host
 
/opt # ls
 

I was able to delete that file owned by root! Root inside the container could act like root on the host.

…now run the ping command in the container;

/opt # ping ubuntu.com
 
PING ubuntu.com (91.189.94.40): 56 data bytes
64 bytes from 91.189.94.40: seq=0 ttl=45 time=151.856 ms
...

…look for that ping process on my host system

kinghorn@i7:~$ ps xua | grep ping
 
root     13425  0.0  0.0   1524     4 pts/2    S+   19:50   0:00 ping ubuntu.com

ping in the container is running as root on the host!


Experiment 2

Now I’ll stop Docker and restart the daemon manually using the --userns-remap flag. I’m remapping to the user name kinghorn (that won’t be my user ID on the host, and we’ll see why later).

kinghorn@i7:~$ sudo systemctl stop docker.service
kinghorn@i7:~$ sudo docker daemon --userns-remap=kinghorn

We can repeat experiment 1, but now we are running Docker with user namespace remapping.

kinghorn@i7:~$ docker run --rm -i -t -v /opt:/opt alpine /bin/sh
Unable to find image 'alpine:latest' locally
latest: Pulling from library/alpine
627beaf3eaaf: Pull complete
Digest: sha256:58e1a1bb75db1b5a24a462dd5e2915277ea06438c3f105138f97eb53149673c4
Status: Downloaded newer image for alpine:latest

Note that this time Docker pulled a new copy of Alpine from Docker Hub. I’ll explain that shortly.

/ # cd /opt
 
/opt # ls -l
 
-rw-r--r--    1 nobody   nobody           0 Mar  9 04:06 root-file-on-host
 
/opt # rm root-file-on-host
 
rm: remove 'root-file-on-host'? y
rm: can't remove 'root-file-on-host': Permission denied

Good! Now the files on the host file-system show up as owned by "nobody" in the container and I cannot delete the file.

…let’s look at the ping process as before.

/opt # ping ubuntu.com
 
PING ubuntu.com (91.189.94.40): 56 data bytes
64 bytes from 91.189.94.40: seq=0 ttl=45 time=158.330 ms
...
kinghorn@i7:~$ ps xua | grep ping
165536   20297  0.0  0.0   1524     4 pts/4    S+   20:15   0:00 ping ubuntu.com

The ping process in the container is now owned by user ID 165536 instead of root.
That’s much better, but who is 165536?


Let’s take a look at what’s going on in experiment 2.

The first thing to notice is that when I started the Alpine container it pulled a new copy instead of using the copy I had from earlier. If we look in /var/lib/docker there is now a new directory.

drwx------ 11 165536 165536 4096 Mar  8 20:04 165536.165536

The directory is named "165536.165536" and is owned by user 165536. It is a new directory of Docker files for this mystery user 165536. The rest of the experiment showed that this new user could not act as root on the host system and that the container processes are owned by this user instead of root. That’s almost what we want. Where did this user 165536 come from? Let’s look at the /etc/subuid file. This is the subordinate user ID file used by User-Namespaces.

kinghorn@i7:~$ cat /etc/subuid
lxd:100000:65536
root:100000:65536
kinghorn:165536:65536

There are three entries, one for lxd, one for root, and one for kinghorn. (The entries for lxd and root are junk that the Ubuntu server install put in there. I have no interest in Canonical’s LXD. We’ll delete that rubbish later.) Each line in this file has a user name, a starting UID, and a count of UIDs that can be used (65536 of them). The subordinate UID for kinghorn is the user ID that was seen in the experiment above. That’s because I started the Docker daemon with --userns-remap=kinghorn.
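
To make the arithmetic explicit, the kinghorn:165536:65536 entry maps container UIDs onto host UIDs like this;

container UID 0      ->  host UID 165536   (container "root")
container UID 1      ->  host UID 165537
...
container UID 65535  ->  host UID 231071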


How to configure Docker User-Namespaces

There are two things we need to do to configure User-Namespaces for Docker;

  • Set up the systemd unit files for docker.service to start with the --userns-remap flag.

  • Edit /etc/subuid and /etc/subgid with the actual user ID we want Docker to use for our container programs: our own real user ID.

Modifying Docker’s systemd configuration

Most people don’t like messing with systemd and that is completely understandable. However, if you want to set things up right that’s what you need to do. Docker uses systemd. On Ubuntu you may stumble across /etc/default/docker; don’t get your hopes up, that’s junk. Docker doesn’t use it now.

Systemd default service configurations can be overridden by "drop-in" files in directories with names like /etc/systemd/system/"service-name".service.d. Let’s create that for the docker service.

sudo mkdir /etc/systemd/system/docker.service.d

Now create the file /etc/systemd/system/docker.service.d/userns-remap.conf. This file should contain the following;

[Service]
ExecStart=
ExecStart=/usr/bin/docker daemon -H fd:// --userns-remap=kinghorn

Notes: The empty ExecStart= is needed to clear out the systemd default before the new start command is set on the next line.

I used "kinghorn" as the remap user name. That’s my login account name and I’m using that because I want to "own" what happens in the containers I start up. If you use "default" Docker will create a user named "dockremap" and add an entry in /etc/subuid. You can use any user you want that has an account on your host system and that has an entry in subuid and subgid. At this time you can only use one user for this with Docker in the future expect this to become more general.

Setup /etc/subuid and /etc/subgid

This is the step that will achieve the desired configuration! I want the process that starts in a container to be owned by my user ID from the viewpoint of my host system. To make this happen I will use my host user ID as the first subordinate ID.

Edit the /etc/subuid and /etc/subgid files to contain the following (same content for both files, and use "your" host user ID);

kinghorn:1000:1
kinghorn:100001:65535

Notes: My (kinghorn) user/group ID is 1000. I’m setting the first subordinate ID to that and then configuring the remaining 65535 (65536 – 1) IDs to start with the arbitrary ID 100001. That first ID will be used by the container startup process for the program or environment I’m trying to run. I can bind a subdirectory in my home directory to the container so that any files there can be used in the container. Also, any files I create in the container will be owned by my user on the host.
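
If you are not sure what your own user and group IDs are, you can check them on the host with;

id -u
id -g

For my account both of those return 1000.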

Restart the system to be sure the new configuration starts correctly on boot

sudo shutdown -r now

Experiment 3

Check that everything is working as expected.

kinghorn@i7:~$ docker run --rm -it alpine /bin/sh
Unable to find image 'alpine:latest' locally
latest: Pulling from library/alpine
627beaf3eaaf: Pull complete
Digest: sha256:58e1a1bb75db1b5a24a462dd5e2915277ea06438c3f105138f97eb53149673c4
Status: Downloaded newer image for alpine:latest

Starting Alpine again pulled a new image and there is now a new directory, 1000.1000, in /var/lib/docker owned by "kinghorn".

kinghorn@i7:~$ sudo ls -l /var/lib/docker/
 
drwx------ 11 kinghorn kinghorn 4096 Mar  9 16:09 1000.1000

…let’s start ping in the container and see who owns it on the host.

/ # ping ubuntu.com
PING ubuntu.com (91.189.94.40): 56 data bytes
64 bytes from 91.189.94.40: seq=0 ttl=45 time=150.701 ms
...
kinghorn@i7:~$ ps xua | grep ping
kinghorn  3351  0.0  0.0   1524     4 pts/2    S+   16:11   0:00 ping ubuntu.com

The ping command that I started in the container is now being run by "kinghorn". That is exactly what I want!


Two Important Notes

You may have noticed the shell prompt in the Alpine container, / #. That indicates that the shell is running as root in the container. This is normal for Docker containers. The main process in a container starts in the container’s PID-namespace as PID 1 owned by the container root. The big difference with the configuration that was set up above is that now root in the container is just my user account from the perspective of the host. (You need to fully understand that last statement!) Running a program in a container with this configuration is effectively the same as if I had started it from the host. Exactly what I wanted.
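
A quick way to convince yourself of that is to create a file from inside a container on a bind-mounted host directory and then look at who owns it from the host;

# the directory name here is just an example -- use any path you like
mkdir -p ~/userns-test
docker run --rm -v $HOME/userns-test:/test alpine touch /test/made-in-container
ls -l ~/userns-test

With the configuration above the new file shows up owned by your user account, not by root.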

You might be thinking "that is a pretty radical configuration change from the default Docker behavior" and you would be correct! If you do this and you want to start a container without using User-Namespaces you can override the configuration when you start a container by using the command-line flag --userns=host. That will start up a container in the "normal" way i.e. owned by root.
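
For example, something like this should give you a "normal" un-remapped container, with root in the container being real root on the host again;

docker run --rm -it --userns=host alpine /bin/sh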


Test: Compiling a CUDA GPU program on a system "without CUDA installed" by using NVIDIA-Docker and Docker

I have created a directory projects/docker-test and put a copy of the CUDA sample program sources in it. I’ll bind this directory to a container that has a CUDA setup in it and compile a program. I can run it in the container, then exit the container, and the program will still be there, owned and runnable by me.

(This is one of my home systems with a GTX980 … I’ll clean up the compiling terminal output a little to make this example more readable.)

kinghorn@i7:~$ nvidia-docker run --rm -it -v /home/kinghorn/projects/docker-test:/projects nvidia/cuda
Using default tag: latest
latest: Pulling from nvidia/cuda
d54efb8db41d: Pull complete
f8b845f45a87: Pull complete
e8db7bf7c39f: Pull complete
9654c40e9079: Pull complete
6d9ef359eaaa: Pull complete
cdfa70f89c10: Pull complete
3208f69d3a8f: Pull complete
eac0f0483475: Pull complete
4580f9c5bac3: Pull complete
6ee6617c19de: Pull complete
Digest: sha256:2b7443eb37da8c403756fb7d183e0611f97f648ed8c3e346fdf9484433ca32b8
Status: Downloaded newer image for nvidia/cuda:latest
 
**THE COMMAND ABOVE PULLED THE nvidia/cuda:latest IMAGE FROM DOCKER HUB AUTOMATICALLY
**THE NEXT TIME I RUN IT IT WILL START INSTANTLY
 
**HERE YOU CAN SEE THE "projects" DIRECTORY I MOUNTED INTO THE CONTAINER
root@0d74ca66f4c9:/# ls
bin  boot  dev  etc  home  lib  lib64  media  mnt  opt  proc  projects  root  run  sbin  srv  sys  tmp  usr  var
 
** NOW cd TO THE SOURCE DIRECTORY AND RUN "make" TO COMPILE matrixMulCUBLAS
root@0d74ca66f4c9:/# cd projects/samples/0_Simple/matrixMulCUBLAS/
 
root@0d74ca66f4c9:/projects/samples/0_Simple/matrixMulCUBLAS# make
 
/usr/local/cuda-8.0/bin/nvcc -ccbin g++ -I../../common/inc  -m64    -gencode
+++...
-o matrixMulCUBLAS matrixMulCUBLAS.o  -lcublas
mkdir -p ../../bin/x86_64/linux/release
cp matrixMulCUBLAS ../../bin/x86_64/linux/release
+++...
 
** NOW RUNNING THE PROGRAM IN THE CONTAINER
root@0d74ca66f4c9:/projects/samples/0_Simple/matrixMulCUBLAS# ./matrixMulCUBLAS
 
[Matrix Multiply CUBLAS] - Starting...
GPU Device 0: "GeForce GTX 980" with compute capability 5.2
 
MatrixA(640,480), MatrixB(480,320), MatrixC(640,320)
Computing result using CUBLAS...done.
Performance= 2508.57 GFlop/s, Time= 0.078 msec, Size= 196608000 Ops
Computing result using host CPU...done.
Comparing CUBLAS Matrix Multiply with CPU results: PASS
 
NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.
 
** NOW EXIT FROM THE CONTAINER
root@0d74ca66f4c9:/projects/samples/0_Simple/matrixMulCUBLAS# exit

I am now out of the container. The program I just compiled in the container is in my directory and I own it.

kinghorn@i7:~$ cd projects/docker-test/samples/0_Simple/matrixMulCUBLAS/
 
kinghorn@i7:~/projects/docker-test/samples/0_Simple/matrixMulCUBLAS$ ls -l
total 636
-rw-r--r-- 1 kinghorn kinghorn   9134 Feb 26 18:10 Makefile
-rwxr-xr-x 1 kinghorn kinghorn 582360 Mar  9 17:39 matrixMulCUBLAS
-rw-r--r-- 1 kinghorn kinghorn  13019 Feb 26 18:10 matrixMulCUBLAS.cpp
-rw-r--r-- 1 kinghorn kinghorn  26464 Mar  9 17:39 matrixMulCUBLAS.o
-rw-r--r-- 1 kinghorn kinghorn   2546 Feb 26 18:10 NsightEclipse.xml
-rw-r--r-- 1 kinghorn kinghorn    403 Feb 26 18:10 readme.txt
 
kinghorn@i7:~/projects/docker-test/samples/0_Simple/matrixMulCUBLAS$ ./matrixMulCUBLAS
 
[Matrix Multiply CUBLAS] - Starting...
GPU Device 0: "GeForce GTX 980" with compute capability 5.2
 
MatrixA(640,480), MatrixB(480,320), MatrixC(640,320)
Computing result using CUBLAS...done.
Performance= 2511.51 GFlop/s, Time= 0.078 msec, Size= 196608000 Ops
Computing result using host CPU...done.
Comparing CUBLAS Matrix Multiply with CPU results: PASS
 
NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.

Note that the performance in and out of the container is essentially identical. Perfect!

In the next post in this Docker series I’ll write about using GUI programs running in Docker containers and give more usage information.

Happy computing –dbk