Dr Donald Kinghorn (HPC and Scientific Computing)

Install Ubuntu 16.04 or 14.04 and CUDA 8 and 7.5 for NVIDIA Pascal GPU

Written on August 29, 2016 by Dr Donald Kinghorn

In this post I will walk you through setting up a CUDA dev environment on Ubuntu 16.04 (or 14.04). We will install both CUDA 8.0 and 7.5 and go through all of the tricks you need to get a working setup. There are enough tricks that hopefully this page will have a long life, because some of them seem to be timeless, unfortunately! This install method works with the latest NVIDIA Pascal GPUs.

This is more up-to-date and better tested than what I described a few months back about getting CUDA working on Ubuntu 16.04. The procedure works with the NVIDIA Pascal cards on both Ubuntu 16.04 and 14.04.

Note: The following CUDA setup is reasonable for a development environment. For a production environment you might want to rethink a few of the things I've done!

The focus in this post is on the software install, and I will try to do it in a way that is mostly hardware independent. The system I'm using as I write this is,

Peak Tower Single
MB: ASUS X99-E WS
CPU: Intel Core-i7 6950X 8-core @ 3.2GHz (3.5GHz All-Core-Turbo)
Memory: 64 GB DDR4 2133MHz Reg ECC
PCIe: (4) X16-X16 v3
GPU: (4) NVIDIA Titan X Pascal

Install Ubuntu 16.04 server

Start with a fresh server install. It's quick and will usually install without trouble even on bleeding-edge hardware.

You may want/need to add some boot time kernel options while doing the install. The systemd config in Ubuntu 16.04 is doing ugly things with network interfaces on some multi-nic boards and the motherboard I'm using doesn't like power management on the PCIe bus.

When the grub screen comes up to boot into the install you can hit "e" to edit the boot line. Look for the line that starts with "linux" and ends with "quiet ---". You can remove the "quiet ---" part (so you can see all the system messages during boot) and then add any kernel options that you want. For this motherboard I add,

net.ifnames=0 biosdevname=0 pcie_aspm=off
That leaves the network interface names as eth0, eth1, etc. and turns off PCIe Active State Power Management (ASPM), which leaves the PCIe bus in the "performance" power state.

The above boot options are not necessarily needed. You need to know the quirks of your hardware and how it interacts with your install.

Do the base install following the prompts with choices you want.

Reboot

If you added any kernel options during the install boot you will want to interrupt the boot process by hitting "e" and edit the Linux line again. Note: it will look different this time since this is your actual install and not the installer boot. We can get the rest of the install complete before we need to do another reboot.

Updates, desktop environment, extra packages and grub update

Pretty much everything from here on will need to be done as root so add sudo to the beginning of the commands or just sudo -s to get a root shell.

Do updates.

apt-get update
apt-get dist-upgrade

Install your desktop environment

The easiest way to take care of adding a desktop GUI is to run the "tasksel" command. If you run it without any arguments it will give you a very nice menu with many options for different desktop setups and other good stuff. It's a handy tool.

You can run tasksel as follows to add the default Ubuntu desktop without looking at the menu.

tasksel install ubuntu-desktop

Add extra programs

You now have your base desktop install so you might want to add a few extras at this point. I usually add the following.

apt-get install build-essential emacs dkms synaptic ssh

Grub update

If you are using any kernel options on startup you should probably take care of that now before the next reboot. Use your editor of choice and do something like the following.

Edit the file /etc/default/grub.
The server install will have an empty option line like this,

GRUB_CMDLINE_LINUX_DEFAULT=""

For the motherboard I'm using I would change this to,

GRUB_CMDLINE_LINUX_DEFAULT="net.ifnames=0 biosdevname=0 pcie_aspm=off"

Note that a "normal" desktop install would also have "quiet splash" in this line to hide all of the boot messages during startup ... sometimes I like to see them!

After editing that file update grub with (surprise)

update-grub
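
After the next reboot it is worth confirming that the options actually made it onto the running kernel's command line. Here is a minimal sketch, assuming the three options used above. The helper name has_kernel_opt is made up for illustration; by default it just reads /proc/cmdline.

```shell
#!/bin/sh
# has_kernel_opt OPTION [CMDLINE]
# Hypothetical helper: succeeds if OPTION appears as a whole word on the
# kernel command line. CMDLINE defaults to the contents of /proc/cmdline.
has_kernel_opt() {
    opt=$1
    cmdline=${2:-$(cat /proc/cmdline 2>/dev/null)}
    # Unquoted on purpose: split the command line into separate words.
    for word in $cmdline; do
        [ "$word" = "$opt" ] && return 0
    done
    return 1
}

# The option list is an assumption; use whatever you put in /etc/default/grub.
for opt in net.ifnames=0 biosdevname=0 pcie_aspm=off; do
    if has_kernel_opt "$opt"; then
        echo "$opt: active"
    else
        echo "$opt: not on the running kernel command line"
    fi
done
```

If an option shows up as missing, re-check /etc/default/grub and run update-grub again before rebooting.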

Install the NVIDIA display driver

I've been using the well maintained "graphics-drivers" ppa for adding the NVIDIA display drivers. These have been up-to-date and well packaged. Using this will give you a convenient update path for new drivers. So far I haven't had any trouble with new drivers rebuilding against kernel source using dkms.

add-apt-repository ppa:graphics-drivers/ppa
apt-get update
apt-get install nvidia-367

CUDA install, setup and fixes

Dependencies

We are doing a manual CUDA toolkit install since we want both version 7.5 and 8.0rc (and since the packaged .deb and .rpm files are basically broken right now!)

I did a "dry-run" CUDA install from the deb files to pull a list of system packages that would get installed as dependencies (outside of the CUDA repo). You may or may not need these, but it is what the old (working) deb install from the CUDA repo would have pulled in. I put them in a file called cuda-deps... sorry for the long scroll line but I didn't want any line breaks in there in case you want to copy that to a file.

cat cuda-deps 
ca-certificates-java default-jre default-jre-headless fonts-dejavu-extra freeglut3 freeglut3-dev java-common libatk-wrapper-java libatk-wrapper-java-jni  libdrm-dev libgl1-mesa-dev libglu1-mesa-dev libgnomevfs2-0 libgnomevfs2-common libice-dev libpthread-stubs0-dev libsctp1 libsm-dev libx11-dev libx11-doc libx11-xcb-dev libxau-dev libxcb-dri2-0-dev libxcb-dri3-dev libxcb-glx0-dev libxcb-present-dev libxcb-randr0-dev libxcb-render0-dev libxcb-shape0-dev libxcb-sync-dev libxcb-xfixes0-dev libxcb1-dev libxdamage-dev libxdmcp-dev libxext-dev libxfixes-dev libxi-dev libxmu-dev libxmu-headers libxshmfence-dev libxt-dev libxxf86vm-dev lksctp-tools mesa-common-dev  x11proto-core-dev x11proto-damage-dev  x11proto-dri2-dev x11proto-fixes-dev x11proto-gl-dev x11proto-input-dev x11proto-kb-dev x11proto-xext-dev x11proto-xf86vidmode-dev xorg-sgml-doctools xtrans-dev libgles2-mesa-dev

If you put those package names in a file called cuda-deps you can do the following to install them easily,

cat cuda-deps | xargs sudo apt-get -y install

CUDA toolkit installs

Download the ".run" install files from NVIDIA (you will need to be registered as a developer to get the 8.0rc version)

You want to run these install scripts and NOT install the bundled display drivers!

You can run the scripts and answer the prompts or you can do,

./cuda_7.5.18_linux.run --help
to see the script options. Then, if you trust me, you can do the following,

chmod 755 cuda_*
./cuda_7.5.18_linux.run --silent --toolkit --samples --samplespath=/usr/local/cuda-7.5/samples --override
./cuda_8.0.27_linux.run --silent --toolkit --samples --samplespath=/usr/local/cuda-8.0/samples --override

That will give you both CUDA toolkit versions with the sample code directories where they belong.

There will be a symbolic link at /usr/local/cuda that points to /usr/local/cuda-8.0. You can change this link to the 7.5 version like this,

sudo rm /usr/local/cuda
sudo ln -s /usr/local/cuda-7.5 /usr/local/cuda

I like doing the version switching this way because then I can set the system up to expect the toolkit at /usr/local/cuda regardless of which version is actually linked there.
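
If you switch versions often, the two commands can be wrapped in a small function. This is only a sketch: the name use_cuda and its optional second argument (an install prefix, so it can be tried against a scratch directory) are assumptions for illustration. It uses ln -sfn so the existing link is replaced in one step; the -n matters because the old link points at a directory, and without it ln would create the new link inside that directory instead.

```shell
#!/bin/sh
# use_cuda VERSION [PREFIX]
# Hypothetical helper: repoint PREFIX/cuda at PREFIX/cuda-VERSION.
# PREFIX defaults to /usr/local, which needs a root shell.
use_cuda() {
    version=$1
    prefix=${2:-/usr/local}
    target="$prefix/cuda-$version"
    if [ ! -d "$target" ]; then
        echo "no toolkit found at $target" >&2
        return 1
    fi
    # -s symbolic, -f replace an existing link,
    # -n do not follow the existing link into the directory it points to
    ln -sfn "$target" "$prefix/cuda"
}
```

From a root shell, "use_cuda 7.5" would then switch to the 7.5 toolkit and "use_cuda 8.0" would switch back.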

System CUDA environment

I like to have base development system tools like CUDA on the default bin and lib path so I create the following files,

/etc/profile.d/cuda.sh

export PATH=$PATH:/usr/local/cuda/bin
export CUDADIR=/usr/local/cuda
export GLPATH=/usr/lib

and for libs,

/etc/ld.so.conf.d/cuda.conf

/usr/local/cuda/lib64

Run "ldconfig" after adding that last file.
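
If you would rather script that step, here is a rough sketch. The function name write_cuda_env and its optional root-directory argument are hypothetical (the argument is there so you can try it against a scratch directory first); the file contents are the ones shown above.

```shell
#!/bin/sh
# write_cuda_env [ROOT]
# Hypothetical helper: writes the two CUDA environment files under ROOT.
# With no argument ROOT is empty, so the files land in the real /etc
# (which needs a root shell).
write_cuda_env() {
    root=${1:-}
    mkdir -p "$root/etc/profile.d" "$root/etc/ld.so.conf.d" || return 1
    # Quoted 'EOF' keeps $PATH literal so it is expanded at login time.
    cat > "$root/etc/profile.d/cuda.sh" <<'EOF'
export PATH=$PATH:/usr/local/cuda/bin
export CUDADIR=/usr/local/cuda
export GLPATH=/usr/lib
EOF
    echo /usr/local/cuda/lib64 > "$root/etc/ld.so.conf.d/cuda.conf"
}
```

After calling it with no argument from a root shell you still need to run ldconfig, and the PATH change only shows up in new login shells.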

Fix "broken" stuff

The default gcc compiler version for Ubuntu 16.04 is now 5.4. The CUDA toolkits were not tested against that version, so a check in the CUDA headers will error out when you try to build any code. The easiest thing to do is just comment out the error line in the appropriate header file. You should be aware of what compiler version you are using when you build code and realize that NVIDIA may not have tested everything against it! This is a "development" setup, not a "production" setup!

The file you want to edit is "host_config.h" in the toolkit "include" directory. All you really need to do (as a hack) is to add // at the beginning of the error line to comment it out.

Here's a couple of little sed lines to do that for you.

sed -i '/unsupported GNU version/ s/^/\/\//' /usr/local/cuda-7.5/include/host_config.h
sed -i '/unsupported GNU version/ s/^/\/\//' /usr/local/cuda-8.0/include/host_config.h
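
If you want to be able to undo that hack later, a slightly more careful variant keeps an untouched copy of the header first. This is only a sketch: the function name disable_gcc_check and the .orig suffix are assumptions for illustration, not anything the toolkit provides.

```shell
#!/bin/sh
# disable_gcc_check FILE
# Hypothetical helper: comments out the "unsupported GNU version" error
# line in FILE, keeping the original as FILE.orig.
disable_gcc_check() {
    file=$1
    # Save a pristine copy the first time through; never overwrite it later.
    [ -f "$file.orig" ] || cp -p "$file" "$file.orig" || return 1
    # Always regenerate from the pristine copy, so running this twice
    # does not stack up extra // prefixes.
    sed '/unsupported GNU version/ s|^|//|' "$file.orig" > "$file"
}

# Example use from a root shell (paths as in the sed lines above):
# for v in 7.5 8.0; do
#     disable_gcc_check /usr/local/cuda-$v/include/host_config.h
# done
```

To restore the stock header, copy host_config.h.orig back over host_config.h.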

The other thing that is broken is that many of the sample source files have hard wired display driver versions. If you want to build the samples for testing then you will want to fix this.

The following "find" and "sed" lines will fix this for you.

find /usr/local/cuda-7.5/samples -type f -exec sed -i 's/nvidia-3../nvidia-367/g' {} +
find /usr/local/cuda-8.0/samples -type f -exec sed -i 's/nvidia-3../nvidia-367/g' {} +

REBOOT

That's it! You have been doing all of this from the base server install command console, now reboot to your desktop environment and your CUDA setup should be ready to go!
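
Once you are back up, a quick sanity check is to see whether the driver and toolkit commands resolve from a fresh shell. A minimal sketch, assuming nvidia-smi (from the driver package) and nvcc (from the toolkit) are the tools you care about; report_tool is a made-up helper name.

```shell
#!/bin/sh
# report_tool NAME
# Hypothetical helper: prints where NAME resolves on PATH, or fails.
report_tool() {
    if path=$(command -v "$1" 2>/dev/null); then
        echo "$1: $path"
    else
        echo "$1: not found on PATH"
        return 1
    fi
}

# Driver: nvidia-smi should list the GPUs and the driver version.
if report_tool nvidia-smi; then
    nvidia-smi || echo "nvidia-smi failed; is the driver loaded?"
fi

# Toolkit: nvcc reports whichever version /usr/local/cuda points to.
if report_tool nvcc; then
    nvcc --version || echo "nvcc failed to run"
fi
```

If nvcc is not found, check /etc/profile.d/cuda.sh and the /usr/local/cuda link.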

Happy computing! --dbk

Tags: NVIDIA, CUDA, Pascal GPU, Linux
MichaelSB

Great post! Do you recommend going with Ubuntu 16.04, or staying with 14.04 (using Pascal GPUs for deep learning)?

Posted on 2016-09-22 03:35:12
David Selinger

Since the (awesome) author hasn't replied: My response. 14.04 unless you're a 16.04 expert, or a (theano, keras, tensorflow, opencv) expert and want to increase your stackoverflow reputation.

I've used both and the reason I'm reading this page right now is because I'm uninstalling all of my 16.04 boxes and going back to 14.04. There are a whole series of unique issues you can run into, and when it comes to finding support on the broader web (stackoverflow, etc.), 14.04 is massively better supported.
The release of CUDA 8.0 will level the playing field a little bit in that the level of support will be pretty even in another 6 months, but I found at least 5-10 issues a month where I was having to break new ground on 16.04 (docker configuration, specific theano issues, compiler with OpenCV + CUDA) that I am personally giving up.

14.04 FTW.

Posted on 2016-09-30 00:33:15
Andrew Wilkie

Awesome advice. Thank you!

Posted on 2016-10-21 23:58:20
David Selinger

Haha, and to add one more thing I learned today: opencv is having problems supporting cuda8.0. The 3.1 release (the latest) does not support it at all, and master currently has other compile issues, so we are all a little stuck...

Posted on 2016-10-22 01:10:25
David Selinger

Thanks--this was super helpful. Crisp, clean and easy to follow.

And it worked.
:)

FYI my config is dual titan x's on an asus custom build.

Posted on 2016-09-30 00:34:44
Chris Anderson

You can overwrite a symbolic link using ln -sf. That way you don't have to remove it first.

Posted on 2016-10-21 12:56:51
David Selinger

Thought I'd add one more comment WRT using OpenCV on this machine (which I'm sure some people will want to do):
OpenCV3.1.0 does *NOT* work with CUDA8.0 out of the box (https://github.com/opencv/open... ), so if you want to make it work, scroll down to the bottom of the github issue link and then follow those instructions.

Then, when you're running the cmake command for opencv (for Python), ensure to set -D CUDA_GENERATION="" -D CUDA_ARCH_PTX=6.1 (Assuming you're using 6.1 architecture). If you set CUDA_GENERATION to anything else, you'll end up with an empty "build/lib/pythonx/" directory (i.e., no OpenCV).

Posted on 2016-10-24 23:00:31
Donald Kinghorn

Hi Guys, thanks for the kind words :-) I wasn't getting notices about comments so I didn't see the questions until now.

David's advice on using 14.04 is good ... and thanks for adding those comments, I'm sure everyone (including me) appreciates that feedback!

14.04 is pretty stable and well understood so life should be easier using it. 16.04 did make a lot of changes and systemd is troublesome. I'm getting most of the quirks worked out now and getting used to it and will take a shot at doing a good base configuration with it and see how it goes. I'll write all of that up and I plan to start using screencasts to see if I can get more things posted. --Don

Posted on 2016-11-04 17:16:33
Donald Kinghorn

A note on NetworkManager:

One thing about doing an install from "server" is that you usually configure a network interface during the install and that will be an "unmanaged" config. After you get the desktop installed NetworkManager will be there and it will automatically grab any interfaces that were not already configured. That might give you some grief!

If you want to leave the network configured during install as it is then you might want to change "auto" to "allow-hotplug" in /etc/network/interfaces i.e.

# The primary network interface
allow-hotplug eth0
iface eth0 inet dhcp

That will make systemd happy and keep it from waiting for the interface to start up if you are not plugged into it.

You can also just give the interface to NetworkManager. For that you can comment out the interfaces lines and give "ifupdown" to NetworkManager i.e.

in /etc/network/interfaces

# The primary network interface
#allow-hotplug eth0
#iface eth0 inet dhcp

and in /etc/NetworkManager/NetworkManager.conf

[ifupdown]
managed=true

Take care --Don

Posted on 2016-11-04 17:34:59
SC

After the rebooting step, I get stuck in a loop on the login page for Ubuntu. Any idea where the issue might be? It doesn't throw any errors at any point in the process and I've been following this step by step on Ubuntu 14.04 with a Titan X Pascal GPU.

Posted on 2016-11-07 21:45:24