Read this article at https://www.pugetsystems.com/guides/1095
Dr Donald Kinghorn (Scientific Computing Advisor)

How-To Setup NVIDIA Docker and NGC Registry on your Workstation - Part 1 Introduction and Base System Setup

Written on January 26, 2018 by Dr Donald Kinghorn

Item number 4 in my recent post "My 2018 New Years Sys Admin and Dev Resolutions" reads:

"4) Adopt a Docker based workflow on Linux (and try that on Windows)"

I've been using Docker occasionally on my local workstation for well over a year now. My use has been sporadic, but I am doing more machine learning and development work now, and docker can be a big help for that. Getting assorted machine learning frameworks and development environments working together can be difficult and time consuming, and it will occasionally fail because of incompatible dependencies. There is a trade-off to using docker: you trade the complexity of the traditional build, install, configure, and optimize cycle for the learning curve of docker. In the long run I believe it is worth the effort to learn to use docker, even on your desktop workstation or laptop.

Where docker has been very useful for me is in the system testing I do here at Puget Systems. A good example is the testing I recently did on the new NVIDIA Titan V, "NVIDIA Titan V vs Titan Xp Preliminary Machine Learning and Simulation Tests". For that work I did an Ubuntu 16.04 install with NVIDIA GPU support, installed and configured Docker, and installed the new version 2 of NVIDIA-Docker. With that foundation I was able to use the NVIDIA NGC (NVIDIA GPU Cloud) docker registry to pull well set up and tuned GPU-accelerated versions of several machine learning frameworks and do the testing. That docker + NVIDIA-docker + NGC setup probably saved me several weeks of build, install, and configuration work. I was impressed enough with the quality of the docker container builds in NGC that I added a strong secondary recommendation for them in that Titan V testing post.

"Secondary Recommendation: The docker images available in the NVIDIA NGC repository are very good. This is another example of NVIDIA's excellent support of the ecosystem around GPU accelerated computing. Highly recommended! I will be writing about how to setup and utilize NVIDIA docker and this repository soon."

As promised to you in the quote above, and to myself in my New Year's resolutions, I'm getting started on that here.

This first post in the series is introductory. It presents the motivation, some references, and the first step: establishing a base platform. Later posts will get to the details suggested by the main title of this post.


Introduction

Even though I said "and try that on Windows", this series of posts will use Linux. The reason is that I want to fulfill the second promise: a new series of posts on NVIDIA docker v2 showing how to get started with the NGC docker registry. That stuff does not work on Windows, even though docker itself does. (There is now both a native Windows docker supporting Windows containers and a supported Linux docker running on top of Hyper-V.)

So, Linux it is. In particular (at this point in time) I recommend Ubuntu 16.04 LTS.

I should note that I have done a full docker + NVIDIA-docker + NGC repository setup on the alpha-1 release of the upcoming Ubuntu 18.04 LTS. Perhaps surprisingly, I didn't have any trouble with that, but it is completely unsupported at this time. When Ubuntu 18.04 LTS is officially released in April, I will do a quick refresh on the setup. (Using docker for the difficult software setups will make updating very easy. That's another good argument for using docker.)

If you are not familiar with docker, or you just need some arguments for why you might want to use it, please see the post I did in early 2017, "Docker and NVIDIA-docker on your workstation: Motivation". That post is followed by a whole series of posts on setup and configuration of docker and NVIDIA-docker. Those posts are good and "mostly" valid. However, things have changed in the docker world, and NVIDIA-docker is now at version 2. This current series of posts will deprecate most of those older posts, and I will add a header to them stating that. They will, however, still be good general references, and I will use them as such.


Base Ubuntu 16.04 system install and setup

If you have an existing install that you want to use for the docker setup we'll be doing, you can skip the rest of this section. You might want to look through it anyway and note it as a reference. Besides that, I'd be honored if you read it!

I timed this method when I tested it for this post:

  • The server install took under 6 minutes from power-on to reboot, including time for manual disk partitioning.
  • The script for the desktop setup and other configuration took 12 minutes, including the download and install time for all 1400 packages for the MATE desktop.

For a detailed set of install instructions, please see my post "The Best Way To Install Ubuntu 16.04 with NVIDIA Drivers and CUDA". That post is especially useful for understanding why I use a "server" install as the base for a "desktop" setup. It will also help you understand some of the potential problems that can come up during an install.

Install the "server" base image

Install Ubuntu from the server base. This is my recommended method for doing installs. A simple server install will usually work on nearly any hardware configuration, even on pre-release testing hardware (or old hardware!). It's easy to add any desktop configuration you want on top of a server install. This gives you more control over the install and helps you avoid some of the difficulties you could otherwise encounter. For example, in our case we need to get the proprietary NVIDIA graphics driver installed, and that can sometimes be difficult.

You can install from a live desktop image. That will often work OK, but when it doesn't, it can be very frustrating. I like to keep things simple and start from a small server install.

Starting from a simple server install, I had an Ubuntu 18.04 pre-alpha 1 running TensorFlow test jobs on new hardware, with a full docker and NVIDIA-docker v2 configuration, in about an hour ... and none of that is supported yet! ("Don't try this at home!")

Server install steps

  • I'm assuming a "bare-metal" install. If you are installing over an existing setup or onto an extra boot disk, please make backups.
  • Get Ubuntu Server 16.04.3 LTS and create your install media (e.g., a USB drive). I recommend sticking with LTS releases. If you are experimenting and have some system administration experience, you might want to try a pre-release of 18.04 LTS, but don't count on it being trouble free or stable. I did it and, surprisingly, everything worked ... but pretend I didn't say that...
  • Install with UEFI. I don't recommend legacy MBR installs; some new hardware won't even work unless you use UEFI. Turn off "Secure Boot" in the BIOS, otherwise you will have trouble when you go to install the NVIDIA driver.
  • Install with the HWE kernel. The HWE ("Hardware Enablement") kernel is a stable, security-patched 4.13 kernel. You would want it for newer hardware, and I recommend it over the default 4.4 kernel in any case. (Note: at install time it will be a 4.10 kernel, but it will update to 4.13.)
  • Do the install. If you need any guidance or want to see my detailed recommendations on things like partitioning, see "The Best Way To Install Ubuntu 16.04 with NVIDIA Drivers and CUDA".
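Once the server install boots, the requirements in the list above can be checked from a shell. This is a minimal sketch under a few assumptions: boot_mode and kernel_at_least are hypothetical helper names, the 4.13 threshold comes from the HWE kernel bullet, and mokutil (which reports the Secure Boot state) may not be installed, in which case the script only prints a reminder.

```shell
#!/bin/bash
# Quick checks after the server install boots (a sketch, not an official tool).
# Assumption: mokutil may not be installed; if it is missing we only print a hint.

# Print "UEFI" if the running system was booted via UEFI. The kernel only
# creates /sys/firmware/efi on UEFI boots.
boot_mode() {
  if [ -d /sys/firmware/efi ]; then
    echo "UEFI"
  else
    echo "legacy/BIOS"
  fi
}

# Succeed if kernel release string $3 (e.g. "4.13.0-32-generic") is at
# least version $1.$2. Only the major and minor numbers are compared.
kernel_at_least() {
  local major minor
  major=$(echo "$3" | cut -d. -f1)
  minor=$(echo "$3" | cut -d. -f2)
  [ "$major" -gt "$1" ] || { [ "$major" -eq "$1" ] && [ "$minor" -ge "$2" ]; }
}

echo "Boot mode: $(boot_mode)"

if kernel_at_least 4 13 "$(uname -r)"; then
  echo "Kernel $(uname -r): 4.13 or newer"
else
  echo "Kernel $(uname -r): older than 4.13 (expected right after install; apt-get dist-upgrade should update it)"
fi

# Secure Boot should be off before the NVIDIA driver is installed.
if command -v mokutil >/dev/null 2>&1; then
  mokutil --sb-state || true
else
  echo "mokutil not found; check the Secure Boot setting in the BIOS setup"
fi
```

Right after the install you should expect the "older than 4.13" message, since the install starts on a 4.10 kernel.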

After the simple server install finishes, you can boot into the OS. But ... you could run into a problem. Unfortunately, Ubuntu Server starts up with a console framebuffer instead of a simple VGA console, and that can be a problem with an NVIDIA GPU in your system when the proprietary driver is not installed.

You might need "nomodeset" until you get GPU drivers installed!

If on first boot after install you get a corrupted display, then booting with the nomodeset parameter will probably take care of it. Here's how to add "nomodeset" at boot time:

When you get to the GRUB selection screen, press e. That will let you edit the kernel boot parameters (for this boot only). Find the line that starts with "linux", go to the end of that line, and add nomodeset. Then press F10 to boot. That should get you to a login screen. We will have everything installed, including the NVIDIA driver, before the next boot, so this should be the only time you have to do this; there should be no reason to add nomodeset permanently to the boot parameters.
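To make the GRUB edit concrete, the sketch below applies the same change to a sample linux line as plain text, and then checks /proc/cmdline, where the kernel exposes the parameters it was actually booted with. The add_nomodeset helper and the sample line (kernel version and root device) are made up for illustration; your line will differ.

```shell
#!/bin/bash
# Illustration of the one-time GRUB edit (a sketch).
# The sample "linux" line is made up; your kernel version and root device differ.

# Append nomodeset to a GRUB "linux" line unless it is already there.
add_nomodeset() {
  case " $1 " in
    *" nomodeset "*) echo "$1" ;;
    *) echo "$1 nomodeset" ;;
  esac
}

sample='linux /boot/vmlinuz-4.10.0-28-generic root=/dev/nvme0n1p2 ro quiet'
echo "before: $sample"
echo "after:  $(add_nomodeset "$sample")"

# Once booted, /proc/cmdline shows the parameters the kernel actually received,
# so you can confirm whether this boot used nomodeset.
if grep -qw nomodeset /proc/cmdline 2>/dev/null; then
  echo "this boot: nomodeset is active"
else
  echo "this boot: nomodeset is not active"
fi
```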



Install your Desktop Environment and the NVIDIA Drivers before next boot

I'll present the rest of the OS setup as a shell script. You can edit and run this script or just run the commands it contains by hand.

Before I give you the script, I'll make a few comments:

  • Ubuntu Server includes a really nice utility called tasksel. It's part of what makes starting from a server install an attractive first step toward a desktop install. It makes a large number of configurations a one-command job, and it can easily be included in a script. Here's an image of some of the options:
    [Image: tasksel options]

  • I have the NVIDIA display driver installing from the well-maintained ppa:graphics-drivers/ppa. At this writing there is a new beta driver in there, nvidia-390; I recommend staying with nvidia-387 for now. The 387 driver is also the one that is installed from the latest CUDA 9.1 repo. Since we are going to be using docker for things like the CUDA dev environment, you won't necessarily need to install CUDA. If you do want a "local" CUDA install, you can follow the instructions near the end of the post "The Best Way To Install Ubuntu 16.04 with NVIDIA Drivers and CUDA" after you execute the script below.

  • The EXTRAS variable is a place to add any extras you want to be sure get installed. I recommend that you include at least build-essential and dkms, so that everything needed for building dynamic kernel modules is installed.

  • At the end of the script there are a couple of sed lines that put the interface configured during the install under NetworkManager control (server doesn't use NetworkManager by default). If you want that, set the variable ENABLE_NETWORKMANAGER to 1. [ Note: it looks like network configuration changed in 18.04 ... I see something called /etc/netplan ... For now, if you are using 18.04, set ENABLE_NETWORKMANAGER=0. ]

    #!/bin/bash
    #
    # Do a Desktop GUI and NVIDIA driver setup on top of Ubuntu server
    #
    ##############
    # VARIABLES: #
    ##############
    # DESKTOP -- Ubuntu desktop to install
    # ubuntu-desktop
    # kubuntu-desktop
    # ubuntu-gnome-desktop
    # ubuntu-mate-desktop
    # xubuntu-desktop
    # ... run tasksel without an argument to see other options
    DESKTOP='ubuntu-mate-desktop'
    
    # NVDRIVER -- NVIDIA driver version from ppa:graphics-drivers/ppa
    NVDRIVER='nvidia-387'
    
    # EXTRAS -- Extra packages (your taste may vary)
    EXTRAS="build-essential dkms synaptic emacs ssh gdebi"
    
    # Set to 1 to enable NetworkManager for install interface
    # !! This will not work for Ubuntu 18.04 !! #
    ENABLE_NETWORKMANAGER=1
    
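    ###########
    # DO IT:  #
    ###########
    
    # Package installs need root; stop early if not run with sudo
    if [ "$(id -u)" -ne 0 ]
    then
      echo "Please run this script as root (e.g. with sudo)" >&2
      exit 1
    fi
    
    # Do system Updates
    apt-get update
    apt-get -y dist-upgrade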
    
    # Desktop environment install
    tasksel install "$DESKTOP"
    
    # Extra programs
    apt-get install -y $EXTRAS
    
    # NVIDIA driver
    add-apt-repository -y ppa:graphics-drivers/ppa
    apt-get update
    apt-get install -y $NVDRIVER
    
    # Move manual network config that happened from server install to NetworkManager control
    if [ "$ENABLE_NETWORKMANAGER" -eq 1 ]
    then
      # This sed line comments out the primary nic interface
      sed -i '/The primary network interface/,/^$/ s/^/#/' /etc/network/interfaces
      # This line enables NetworkManager for everything
      sed -i 's/managed=false/managed=true/' /etc/NetworkManager/NetworkManager.conf
    fi
    
    #end of script -- return 0
    exit 0
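
The script edits system files in place, so a dry run can be reassuring. The sketch below is my own addition, not part of the script: the helper names (preview_interfaces, preview_nm_conf, pkg_installed) are hypothetical, and the package list assumes you set NVDRIVER to nvidia-387, the version recommended above, and kept the EXTRAS shown. The first half previews the two NetworkManager sed edits without touching any files; the second half is meant for after the reboot and checks that the packages are installed and the nvidia kernel module is loaded. Both halves only read files, so running the whole thing at any time is harmless.

```shell
#!/bin/bash
# Helpers to go with the desktop setup script (a sketch, not part of the
# original script). Package names mirror the script's NVDRIVER and EXTRAS.

# --- Before running: preview the NetworkManager edits without changing files.
# Same sed expressions as the script, but reading stdin and writing stdout
# instead of editing in place with -i.
preview_interfaces() {
  sed '/The primary network interface/,/^$/ s/^/#/'
}
preview_nm_conf() {
  sed 's/managed=false/managed=true/'
}

# Show what would change (diff exits 1 when there are differences).
if [ -r /etc/network/interfaces ]; then
  preview_interfaces < /etc/network/interfaces | diff /etc/network/interfaces - || true
fi
if [ -r /etc/NetworkManager/NetworkManager.conf ]; then
  preview_nm_conf < /etc/NetworkManager/NetworkManager.conf \
    | diff /etc/NetworkManager/NetworkManager.conf - || true
fi

# --- After the reboot: confirm the packages and the driver are in place.
NVDRIVER='nvidia-387'
EXTRAS="build-essential dkms synaptic emacs ssh gdebi"

# Succeed if a Debian package is installed.
pkg_installed() {
  dpkg-query -W -f='${Status}' "$1" 2>/dev/null | grep -q "install ok installed"
}

for pkg in $NVDRIVER $EXTRAS; do
  if pkg_installed "$pkg"; then
    echo "installed: $pkg"
  else
    echo "missing:   $pkg"
  fi
done

# The nvidia kernel module should be loaded once the new driver is booted.
if grep -q '^nvidia ' /proc/modules 2>/dev/null; then
  echo "nvidia kernel module: loaded"
else
  echo "nvidia kernel module: not loaded"
fi
```

Note that the sed range runs from the "primary network interface" comment to the next blank line, or to the end of the file if there is none, so every line in that stanza gets commented out.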
    

I'm going to stop here. What? What about all that good docker and NVIDIA stuff?! I want to keep things simple and present manageable chunks. In the next posts I'll cover docker and user namespaces, NVIDIA docker version 2, and getting started with NVIDIA NGC and its docker registry. After those posts I'll present ideas for workflow and do examples with some of the machine learning frameworks and applications in the NGC registry. I'll also write about using other applications with a docker workflow. That "Adopt a Docker based workflow" New Year's resolution was serious. I really want to try it, and I'll keep you posted on how it goes!

Happy computing! --dbk

Tags: Docker, NVIDIA, Linux, NGC