https://www.pugetsystems.com

Read this article at https://www.pugetsystems.com/guides/1405
Dr Donald Kinghorn (Scientific Computing Advisor )

How To Install CUDA 10.1 on Ubuntu 19.04

Written on April 5, 2019 by Dr Donald Kinghorn

Introduction

Ubuntu 19.04 has entered beta as I write this and will be released in a few weeks. I decided to install it and give it a try. My initial impression is very positive. Subjectively, it feels like it has been optimized for performance. It is the first Linux distribution release using the new 5.0 kernel. Everything is up-to-date. There is a lot to like.

Even though this is an xx.04 release, it is not an LTS (long term support) release. It is a short-term release that will be supported for 9 months. The next LTS release will be 20.04, two years after the current LTS, Ubuntu 18.04. For a stable "production" install I still strongly recommend using Ubuntu 18.04.

I consider Ubuntu 19.04 an experimental release and that is exactly what I am doing with it, experimenting. I wanted to see if I could get some currently unsupported packages running. So far I have installed CUDA 10.1, docker 18.09.4 and NVIDIA-docker 2.0.3 and run TensorFlow 2 alpha with GPU support. They are all working fine. In this post I'll just go over how to get CUDA 10.1 running on Ubuntu 19.04. Fortunately it was straightforward to get it working.

dbk

"Teaser" info output from this Ubuntu 19.04 install

kinghorn@u19:~$ lsb_release -a

Distributor ID:	Ubuntu
Description:	Ubuntu Disco Dingo (development branch)
Release:	19.04
Codename:	disco
kinghorn@u19:~$ uname -a
Linux u19 5.0.0-7-generic #8-Ubuntu SMP Mon Mar 4 16:27:25 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
kinghorn@u19:~$ gcc --version
gcc (Ubuntu 8.3.0-3ubuntu1) 8.3.0
kinghorn@u19:~$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Fri_Feb__8_19:08:17_PST_2019
Cuda compilation tools, release 10.1, V10.1.105
kinghorn@u19:~$ docker run --runtime=nvidia -u $(id -u):$(id -g) --rm -it tensorflow/tensorflow:2.0.0a0-gpu-py3 bash

________                               _______________                
___  __/__________________________________  ____/__  /________      __
__  /  _  _ \_  __ \_  ___/  __ \_  ___/_  /_   __  /_  __ \_ | /| / /
_  /   /  __/  / / /(__  )/ /_/ /  /   _  __/   _  / / /_/ /_ |/ |/ / 
/_/    \___//_/ /_//____/ \____//_/    /_/      /_/  \____/____/|__/


You are running this container as user with ID 1000 and group 1000,
which should map to the ID and group for your user on the Docker host. Great!

tf-docker / > python -c "import tensorflow as tf; print(tf.__version__)"
2.0.0-alpha0

Steps to install CUDA 10.1 on Ubuntu 19.04


Step 1) Get Ubuntu 19.04 installed!

The first thing I tried for installing Ubuntu 19.04 was the "Desktop" ISO installer. That failed! It hung during the install and I couldn't get it to work (I didn't try very hard since I have an easier method). In fairness, this was the "nightly" ISO build from March 26th, 2019, a few days before the beta release. By the time you read this the "beta" will be out (or the full release if you are reading this after mid April), and hopefully it will install from the "Desktop/Live" ISO without trouble.

I used my fallback "standard" method for installing Ubuntu. I use the server installer and the wonderful Ubuntu tool `tasksel` to install a desktop. I installed my favorite MATE desktop. You can read how to do this in the following post,

The Best Way To Install Ubuntu 18.04 with NVIDIA Drivers and any Desktop Flavor. That almost always works and those instructions for 18.04 are still valid for 19.04. But if you follow the guide linked above, please see the next step about the display driver.

Step 2) Get the NVIDIA driver installed

You will need to have the NVIDIA display driver 418.39 or greater installed to work with CUDA 10.1. Otherwise you will get the dreaded "Status: CUDA driver version is insufficient for CUDA runtime version". I recommend using the most recent driver. The simplest way to install the driver is from the "graphics-drivers ppa".

sudo add-apt-repository ppa:graphics-drivers/ppa

Install dependencies for the system to build the kernel modules,

sudo apt-get install dkms build-essential

Then install the driver. (418 was the most recent at the time of this writing. If you type the install command below and hit Tab after nvidia-driver- you should see a list of all the driver versions available in the ppa.)

sudo apt-get update
sudo apt-get install nvidia-driver-418
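If tab completion is not set up, the available driver packages can also be listed from the apt cache. The following is a minimal sketch, not part of the original procedure: `latest_driver_pkg` is a hypothetical helper name, and it assumes the package names follow the `nvidia-driver-NNN` pattern used above.

```shell
# latest_driver_pkg: read package names on stdin (one per line) and print
# the nvidia-driver-NNN package with the highest version number.
# (Hypothetical helper; the name is not from the original post.)
latest_driver_pkg() {
    grep -E '^nvidia-driver-[0-9]+$' \
        | sort -t- -k3,3n \
        | tail -n1
}
```

After the `apt-get update` above, it could be used as `apt-cache pkgnames nvidia-driver- | latest_driver_pkg` to see which version the ppa currently offers.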

After the driver install go ahead and reboot.

sudo shutdown -r now
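After the reboot it is worth confirming that the driver is loaded and new enough before going further. Here is a rough sketch of such a check. The function names are made up for illustration, the 418.39 minimum is taken from the driver version bundled with the CUDA 10.1 runfile, and the comparison relies on GNU `sort -V`.

```shell
# Minimum driver assumed for CUDA 10.1 (the version bundled in the runfile).
MIN_DRIVER="418.39"

# version_ge A B: succeeds when version A is greater than or equal to B.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# check_driver: report whether the loaded NVIDIA driver meets MIN_DRIVER.
check_driver() {
    if ! command -v nvidia-smi >/dev/null 2>&1; then
        echo "nvidia-smi not found; the driver does not appear to be installed"
        return 1
    fi
    local installed
    installed="$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n1)"
    if [ -z "$installed" ]; then
        echo "nvidia-smi did not report a driver version; is the module loaded?"
        return 1
    fi
    if version_ge "$installed" "$MIN_DRIVER"; then
        echo "driver $installed is new enough for CUDA 10.1"
    else
        echo "driver $installed is older than $MIN_DRIVER"
        return 1
    fi
}
```

Paste the functions into a terminal (or source them from a file) and run `check_driver`.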

Step 3) Install CUDA "dependencies"

There are a few dependencies that get installed when you run the full CUDA deb file but, since we are not going to use the deb file, you will want to install them separately. It's simple since we can get what's needed with just four package installs,

sudo apt-get install freeglut3 freeglut3-dev libxi-dev libxmu-dev

Those packages will get the needed GL, GLU, Xi, Xmu libs and several other libraries that will be installed as dependencies from those.
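To double check that the four packages really are installed, something like the following sketch could be used. `missing_packages` is a hypothetical helper name; it relies on `dpkg-query`, which is part of any Debian or Ubuntu system.

```shell
# missing_packages PKG...: print each named package that dpkg does not
# report as fully installed. Prints nothing when everything is present.
missing_packages() {
    local pkg
    for pkg in "$@"; do
        if ! dpkg-query -W -f='${Status}' "$pkg" 2>/dev/null \
                | grep -q 'install ok installed'; then
            echo "$pkg"
        fi
    done
}
```

For example, `missing_packages freeglut3 freeglut3-dev libxi-dev libxmu-dev` should print nothing once the install above has succeeded.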

Step 4) Get the CUDA "run" file installer (Use the Ubuntu 18.10 installer)

Go to the CUDA Zone and click the Download Now button. Then click the link buttons until you get the following,

CUDA download

Download that.

Step 5) Run the "runfile" to install the CUDA toolkit and samples

This is where we get the CUDA developer toolkit and samples onto the system. We will not install the included display driver since the latest driver was installed in step 2). You can use `sh` to run the shell script (".run" file),

sudo sh cuda_10.1.105_418.39_linux.run

This is a new installer and is much slower to start up than the older scripts (in case you have done this before).

You will be asked to accept the EULA, of course, after which you will be presented with a "selector". Un-check the "Driver" block and then select "Install" and hit "Enter".

┌──────────────────────────────────────────────────────────────────────────────┐
│ CUDA Installer                                                               │
│ - [ ] Driver                                                                 │
│      [ ] 418.39                                                              │
│ + [X] CUDA Toolkit 10.1                                                      │
│   [X] CUDA Samples 10.1                                                      │
│   [X] CUDA Demo Suite 10.1                                                   │
│   [X] CUDA Documentation 10.1                                                │
│   Install                                                                    │
│   Options                                                                    │
│                                                                              │
│                                                                              │
│                                                                              │
│                                                                              │
│                                                                              │
│                                                                              │
│                                                                              │
│                                                                              │
│                                                                              │
│                                                                              │
│                                                                              │
│                                                                              │
│                                                                              │
│ Up/Down: Move | Left/Right: Expand | 'Enter': Select | 'A': Advanced options │
└──────────────────────────────────────────────────────────────────────────────┘

This will do the "right thing". It will,

  • install the CUDA toolkit in /usr/local/cuda-10.1
  • create the symbolic link /usr/local/cuda pointing to that directory
  • install the samples in /usr/local/cuda/samples and in your home directory under NVIDIA_CUDA-10.1_Samples
  • add the appropriate library path
cat /etc/ld.so.conf.d/cuda-10-1.conf 
/usr/local/cuda-10.1/targets/x86_64-linux/lib

It does not set up your PATH for the toolkit. That's the next section.
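Before touching the environment, the install itself can be sanity checked. This is only a sketch under the assumption that the installer used the default locations listed above; `check_cuda_prefix` is a hypothetical helper name.

```shell
# check_cuda_prefix [PREFIX]: report whether PREFIX (default /usr/local/cuda)
# looks like a CUDA toolkit install, i.e. it exists and contains bin/nvcc.
check_cuda_prefix() {
    local prefix="${1:-/usr/local/cuda}"
    if [ ! -d "$prefix" ]; then
        echo "missing: $prefix"
        return 1
    fi
    # readlink -f resolves the symbolic link to the versioned directory
    echo "toolkit directory: $(readlink -f "$prefix")"
    if [ -x "$prefix/bin/nvcc" ]; then
        echo "nvcc found: $prefix/bin/nvcc"
    else
        echo "nvcc not found under $prefix/bin"
        return 1
    fi
}
```

Running `check_cuda_prefix` with no argument checks the /usr/local/cuda link; on the setup described here it should resolve to /usr/local/cuda-10.1.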

Step 6) Setup your environment variables

There are two good ways to set up your environment variables so you can use CUDA.

  • Setup system environment
  • Setup user environment

In the past I would typically do system-wide environment configuration. You can do this even for a single user workstation but you might prefer to create a small script that sets things up just for the terminal you are working in when you need it.

System-wide alternative

To configure the CUDA environment for all users (and applications) on your system create the file (use sudo and a text editor of your choice)

/etc/profile.d/cuda.sh

with the following content,

export PATH=$PATH:/usr/local/cuda/bin
export CUDADIR=/usr/local/cuda

Environment scripts that are in /etc/profile.d/ get read by /etc/profile when you log in (that is, by login shells). It's automatic.

The next time you login your shells will start with CUDA on your path and be ready to use. If you want to load that environment in a shell right now without logging out then just do,

source /etc/profile.d/cuda.sh
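One side effect of sourcing the file by hand is that /usr/local/cuda/bin gets appended to PATH again each time. If that bothers you, a slightly more defensive variant of cuda.sh is sketched below. This is an optional alternative, not what the post used; `cuda_path_append` is a hypothetical helper name, and the script sticks to plain POSIX shell syntax since /etc/profile may be read by shells other than bash.

```shell
# Alternative content for /etc/profile.d/cuda.sh (a sketch, not the
# original two-line version).

# cuda_path_append DIR: add DIR to PATH only if it exists and is not
# already listed.
cuda_path_append() {
    if [ -d "$1" ]; then
        case ":$PATH:" in
            *":$1:"*) ;;              # already on PATH, nothing to do
            *) PATH="$PATH:$1" ;;
        esac
    fi
}

cuda_path_append /usr/local/cuda/bin
export PATH
export CUDADIR=/usr/local/cuda
```

Sourcing this file repeatedly leaves PATH unchanged after the first time.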

Note on LIBRARY PATH:

The cuda-toolkit install did add a .conf file to /etc/ld.so.conf.d but what it added is not ideal and does not always seem to work right. If you are doing a system-wide environment configuration I suggest doing the following:

Move the installed conf file out of the way,

sudo mv /etc/ld.so.conf.d/cuda-10-1.conf /etc/ld.so.conf.d/cuda-10-1.conf-orig

Then create, (using sudo and your editor of choice), the file

/etc/ld.so.conf.d/cuda.conf

containing,

/usr/local/cuda/lib64

Then run

sudo ldconfig

This cuda.conf file in /etc/ld.so.conf.d/ will be pointing at the symbolic link to cuda-xx in /usr/local so it will still be correct even if you change the cuda version that the link is pointing to. (This is my "normal" way of setting up system-wide environments for CUDA.)
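To confirm that the loader cache actually picked up the CUDA libraries, the output of `ldconfig -p` can be filtered. The helper below is a sketch with a made-up name, and it assumes the cache lists the libraries under the same path that was written into cuda.conf.

```shell
# cuda_libs_from_cache [PREFIX]: read `ldconfig -p` output on stdin and
# print the names of the libraries that resolve under PREFIX
# (default /usr/local/cuda).
cuda_libs_from_cache() {
    local prefix="${1:-/usr/local/cuda}"
    grep -F "=> $prefix" | awk '{print $1}' | sort -u
}
```

Use it as `ldconfig -p | cuda_libs_from_cache`. If names like libcudart and libcurand show up, the samples should be able to find their shared libraries; an empty list suggests `sudo ldconfig` has not been run or the .conf file is not being read.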

User per terminal alternative

If you want to be able to activate your CUDA environment only when and where you need it then this is a way to do that. You might prefer this method over a system-wide environment since it will keep your PATH cleaner and allow you easy management of multiple CUDA versions. If you decide to use the ideas in this post to install another CUDA version, say 9.2, along with your 10.1 this will make it easier to switch back and forth.

For a localized user CUDA environment create the following simple script. You don't need to use sudo for this and you can keep the script anywhere in your home directory. You will just need to "source" it when you want a CUDA dev environment.

I'll create the file with the name `cuda10.1-env`. Add the following lines to this file,

export PATH=$PATH:/usr/local/cuda-10.1/bin
export CUDADIR=/usr/local/cuda-10.1
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-10.1/lib64

Note: I explicitly used the full named path to version 10.1 i.e `/usr/local/cuda-10.1` rather than to the symbolic link `/usr/local/cuda`. You can use the symbolic link path if you want. I just did this in case I want to install another version of CUDA and make another environment script pointing to the different version.

Now when you want your CUDA dev environment just do `source cuda10.1-env`. That will set those environment variables in your current shell. (you could copy that file to your working directory or else give the full path to it when you use the `source` command.)
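If you do end up with several CUDA versions side by side, the per-version scripts can be folded into one small shell function. The sketch below is one possible way to do that, not something from the original setup: `use_cuda` and the `CUDA_BASE` variable are hypothetical names, and it assumes every toolkit lives in a `cuda-VERSION` directory under /usr/local. It prepends rather than appends, so the most recently selected version wins.

```shell
# use_cuda VERSION: point the current shell at ${CUDA_BASE}/cuda-VERSION.
# CUDA_BASE defaults to /usr/local and only exists to make the install
# location adjustable.
use_cuda() {
    local root="${CUDA_BASE:-/usr/local}/cuda-$1"
    if [ ! -d "$root" ]; then
        echo "no CUDA install found at $root" >&2
        return 1
    fi
    export CUDADIR="$root"
    # Prepend so this version takes priority over any selected earlier
    export PATH="$root/bin:$PATH"
    export LD_LIBRARY_PATH="$root/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
}
```

With the function defined in your ~/.bashrc, `use_cuda 10.1` sets up the current terminal, and `nvcc --version` should then report release 10.1.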

Step 7) Test CUDA by building the "samples"

Let's make sure everything is working correctly. You can use the copy of the samples that the installer put in your home directory under `NVIDIA_CUDA-10.1_Samples` or copy the samples from `/usr/local/cuda/samples`.

cd  ~/NVIDIA_CUDA-10.1_Samples

source cuda10.1-env

make -j4

Running that make command will compile and link all of the source examples as specified in the Makefile. (The -j4 just means run 4 "jobs". make can build objects in parallel, so you can speed up the build by using more processes.)

After everything finishes building you can `cd` to `bin/x86_64/linux/release/` and see all of the sample executables. All of the samples seem to have built without error even though this is an unsupported Ubuntu version. I ran several of the programs and they were working as expected including the ones that were using OpenGL graphics.

Just because the samples built OK doesn't mean that there are not any problems with the install but it is a really good indication that you can proceed with confidence for your development work!
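A quick way to go one step beyond "it compiled" is to run a couple of the built samples and look for their pass marker. Samples such as deviceQuery and bandwidthTest print a final `Result = PASS` line on success. The helper below is a sketch with a hypothetical name that wraps that check.

```shell
# run_sample EXE: run a built CUDA sample and report whether its output
# contains the "Result = PASS" line.
run_sample() {
    local exe="$1" out
    out="$("$exe" 2>&1)"
    if printf '%s\n' "$out" | grep -q 'Result = PASS'; then
        echo "PASS: $exe"
    else
        echo "FAIL: $exe"
        return 1
    fi
}
```

From the samples directory this could be called as `run_sample bin/x86_64/linux/release/deviceQuery`. A failure here while the build succeeded usually points at the driver or the library path rather than the toolkit.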

Extras not discussed ... docker, nvidia-docker, TensorFlow

I have only talked about setting up CUDA 10.1 on Ubuntu 19.04. I also installed the latest docker and nvidia-docker. This was done using repo setups based on "bionic" i.e. Ubuntu 18.04. Those deb packages installed fine on 19.04. My basic procedure for installing and setting up docker is presented in a series of 5 posts from the beginning of 2018 (still relevant), How-To Setup NVIDIA Docker and NGC Registry on your Workstation - Part 5 Docker Performance and Resource Tuning. That post has links to the first 4.

After setting up docker and nvidia-docker I ran TensorFlow 2.0 alpha from Google's docker image on DockerHub. I could also have attempted to build TensorFlow 2 alpha against the CUDA 10.1 install here but I'm not that brave. It would be best to stick with docker or an Ubuntu 18.04 setup for that.

[I did try installing TensorFlow from the pip package but ended up with a segmentation fault on a system library. I don't recommend trying this.]

Recommendation

Like I said at the beginning of the post, this is an experimental setup. Ubuntu 19.04 looks like it will be a good Linux platform and it has all the latest packages which will be tempting for you (since you are reading this). My serious recommendation is do it if you want to experiment with a bleeding edge dev environment, otherwise stick with Ubuntu 18.04. Your stable "production" platform should be Ubuntu 18.04. 18.04 will be supported for several more years which means it will be an attractive default Linux platform for software builds. It should remain stable and well supported.

I will be doing more posts on setting up Machine Learning / AI / Data Science / HPC / etc. configurations. That includes setups for Windows 10 and Ubuntu 18.04. I probably won't do anything else about Ubuntu 19.04 unless I get talked into it, lol. It does look like a good release to me. Congratulations to Canonical and the Ubuntu team!

Happy computing! --dbk



Tags: Cuda, Ubuntu 19.04, Tensorflow 2, Docker
Ankkit Modi

https://launchpad.net/ubunt... have you tried installing it with this?

Posted on 2019-04-23 11:21:14
Donald Kinghorn

I have not ... I generally don't like to use the "normal" package management for stuff (special dev tools) like that because then it is controlled by the maintainer (and not me) It might be OK but it will probably install stuff all over your system. I like to have tools like cuda in /usr/local or /opt That way I can control multiple versions and clean things up easily.

Honestly, These days I prefer to use docker and just keep my local system install clean. Ubuntu 19.04 looks really good to me. I may go ahead and update my primary workstation from 18.04 and then make more use of docker. ... I'm about ready for my Spring cleaning/refresh anyway :-)

Posted on 2019-04-24 00:05:36
Donald Kinghorn

If I do this I may do another post with more details about the setup ... now that 19.04 is officially released. I will probably wait for a few weeks though to see if the docker and NVIDIA folks add official support ... they might?? ... or they may just wait until the next LTS release in 2020

Posted on 2019-04-24 00:08:24
Ankkit Modi

also how did you install docker and go on 19.04?

Posted on 2019-04-23 11:22:15
Donald Kinghorn

I set up the docker install by adding the 18.04 repo and then just used apt-get install docker-ce. I was a bit surprised that it worked without trouble. My old docker setup post that I linked to near the end of the post is still valid.

Posted on 2019-04-24 00:05:36
Ankkit Modi

Also (sorry for the spam) I got all done perfectly until step 6. When I do "gedit /etc/profile.d/cuda.sh" it says no such file or directory. but cat /etc..........10-1.conf gives the right output /usr..........linux/lib

Posted on 2019-04-23 11:36:05
Donald Kinghorn

you have to be "root" to do that. When I have system file edits/creation to do I usually just start a root shell with sudo -s and then work from that. I'm getting to where I like to just source a local env script like I gave as an alternative.

Posted on 2019-04-24 00:05:37
Robin

Love the guide - thanks!

I've been working really hard to get it to install on 19.04, CUDA is all good, but nvidia-docker2 won't install.
I've tried pinning the version for it and Nvidia-container, but get the same error as below.
Any tips?

The following packages have unmet dependencies:

nvidia-docker2 : Depends: docker-ce (= 5:18.09.5~3-0~ubuntu-bionic) but 5:18.09.5~3-0~ubuntu-cosmic is to be installed or

docker-ee (= 5:18.09.5~3-0~ubuntu-bionic) but it is not installable

E: Unable to correct problems, you have held broken packages.

Posted on 2019-04-25 09:31:27
Donald Kinghorn

AAhhh! I hate that error :-) Unfortunately I know it well, or at least variations of it. Nvidia-docker is tied to the version of docker-ce. They regularly get out of sync when the official docker gets updated. It seems to take the nvidia guys a couple of days to get their repo back in sync with them (I really like the nvidia guys working on this, they have done a great job!) I'm not sure where things diverge but I hit this every now and then, especially if I'm doing something out of the norm like installing on 19.04 :-)

You will need to be explicit with the version numbers you use for docker i.e. apt-get install docker-ce=5:18.09.5~3-0~ubuntu-bionic

The problem is I don't remember exactly what to do ... I usually have to fight with it a bit ...
look at the output from "madison" for some version hints that you have available

apt-cache madison nvidia-docker2 docker-ce

In my case I have as the most recent
nvidia-docker2 | 2.0.3+docker18.09.5-3 and docker-ce | 5:18.09.5~3-0~ubuntu-bionic

For my repo config I have the bionic repo in /etc/apt/sources.list
deb [arch=amd64] https://download.docker.com... bionic stable

and nvidia-docker.list as
kinghorn@i9:~$ cat /etc/apt/sources.list.d/nvidia-docker.list
deb https://nvidia.github.io/li... /
deb https://nvidia.github.io/nv... /
deb https://nvidia.github.io/nv... /

Check to see what you have and maybe do some clean up with remove or purge to get rid of what you have already installed and then do a specific version install of each
sudo apt-get install docker-ce=5:18.09.5~3-0~ubuntu-bionic
sudo apt-get install nvidia-docker2=2.0.3+docker18.09.5-3

I think that will work! But like I said I usually have to fight with it a bit when they are out of sync

Posted on 2019-04-25 15:43:55
Donald Kinghorn

you might have to install docker-ce-cli and containerd.io too with the most recent version of docker-ce they may come as dependencies when you do docker-ce but you may have to install them ... not sure... if you have to install them containerd.io should be OK but you might have to match version during the install for docker-ce-cli i.e. docker-ce-cli=5:18.09.5~3-0~ubuntu-bionic

Posted on 2019-04-25 15:50:19
Robin

Thanks so much for your help! not had a huge amount of time to fiddle, but I'm sure I'll get there.
Again, thanks so much for taking the time- I know you don't have to!

Posted on 2019-05-05 23:37:20
Donald Kinghorn

Hey Robin, check back in a couple of days. I'm putting up a post on the docker/nvidia-docker setup

I really enjoy what I'm doing so ... no problem :-)

Posted on 2019-05-06 21:09:27
Kevin Zeidler

I was hesitant to use the 18.10 installer on my fresh new 19.04 desktop, but sure enough, your instructions worked beautifully. Thank you for your post!

I did have a very minor hiccup after installation when a few of the samples failed to execute (e.g., 'error while loading shared libraries: libcurand.so.10: cannot open shared object file: No such file or directory'), but the fix is quite easy. May I suggest adding "export LD_LIBRARY_PATH=/usr/local/cuda/lib64 && sudo ldconfig" to the snippet excerpted below?

The next time you login your shells will start with CUDA on your path and be ready to use. If you want to load that environment in a shell right now without logging out then just do,

source /etc/profile.d/cuda.sh
Posted on 2019-05-01 02:06:25
Donald Kinghorn

Glad it's working for you! I'm thinking about updating my main workstation to 19.04 too. ( I might do that today! Thinking about writing up a guide on installing docker too ... )

I'm surprised you needed to export the LIB path at that point in the setup (I probably just used that local "env" script that I showed in the section after that)
I don't have a 19.04 running right now to check, but I thought the install had put a library file in /etc/ld.so.conf.d/ that had that library directory set. Since you have this setup could you look in the directory and see if there is a cuda .conf file in there that has /usr/local/cuda/lib64 in it ... I may have missed checking, or maybe ldconfig didn't get run ... not sure.

If the .conf file is missing then it can be added as cuda.conf with /usr/local/cuda/lib64 in it and then a quick sudo ldconfig. That way it would be set for all shells.

Let me know what you have in /etc/ld.so.conf.d thanks

Posted on 2019-05-01 19:16:14
Kevin Zeidler

Sure thing:


$ cat /etc/ld.so.conf.d/cuda-10-1.conf
/usr/local/cuda-10.1/targets/x86_64-linux/lib

I have a hunch since this is a *.conf.d folder that I could have just rebooted the machine instead (is there a startup daemon associated with this directory perchance?) but I can at least verify that a reboot isn't strictly required as long as LD_LIBRARY_PATH is set. (I ran ldconfig afterward, but that may or may not be required)

Posted on 2019-05-02 22:41:45
Donald Kinghorn

Thanks for getting back with that ... I thought they may have put something like that in there. Yes, the stuff in ld.so.conf.d should get read into the loader path on startup (or when ldconfig is run by root).

What I would normally do for a "global" setup is create /etc/ld.so.conf.d/cuda.conf with

/usr/local/cuda/lib64

in it. Note that I use the same thing you added to your LD_LIBRARY_PATH. That way it is pointing to the symbolic link so it doesn't matter what cuda version you have linked there. I think this is better.

I'll add this into the post. Thanks again for checking this! --Don

Posted on 2019-05-03 15:33:09
Donald Kinghorn

FYI I'm going to write up a post on setting up docker and nvidia-docker on Ubuntu 19.04 ( should be up on https://www.pugetsystems.co... in a few days )

I did update my personal workstation to 19.04 yesterday because, well, I just couldn't resist ... I know, I know, my bad! :-) I wanted to clean up my install so I'll try 19.04 for a while. So far so good, I like it! ...

I have everything working the way I want. Getting NVIDIA-Docker right was a bit tricky but not too bad. It's stable and working the way it should. I'll write it up!

Posted on 2019-05-03 15:46:15
Kevin Zeidler

Looking forward to it!

Posted on 2019-05-03 19:17:16
tinco

Just a warning, I ran into segfaults trying this. I'm not sure if I corrupted anything else, but when I switched back to a supported image (on Google Compute, with Tesla V100's) the segfaults went away.

Posted on 2019-05-05 19:16:07
Donald Kinghorn

seg faults are disturbing ( and often annoying to track down) ... Thanks for the heads-up!

I hope nothing broke with an update since I wrote this! Compiler update maybe? ... I'll update my test system and see if anything breaks. It's always risky going to an unsupported distro version. Especially since 18.04 has been really solid ...

I sometimes can't resist trying the "latest greatest" I went ahead and updated my main workstation to 19.04 (can revert back quickly if needed)

These days I mostly use docker. NVIDIA has some nice container images on NGC, https://ngc.nvidia.com/cata... they can be a big time saver. They have a bunch of CUDA version containers maintained on public DockerHub too https://hub.docker.com/r/nv... I'm going to have a post up today or tomorrow with details of the docker/ nvidia-docker setup on 19.04. Containers are nice since you can run local or cloud with the same image.

It looks like Docker will have official support for Ubuntu 19.04 soon (They have the stub up for "disco") I used the "bionic" repos for the setup. nvidia-docker likely wont be officially updated for Ubuntu until 20.04 but they do maintain the build against the current docker release numbers (of course). My setup is really just getting apt to install the packages I want.

It's a little strange using containers sometimes but it's such a huge admin time saver that it's usually worth it. I'll put a nice "single user workstation" setup in the post.

Posted on 2019-05-06 16:14:52
tinco

Thanks for the response. I'm not an Ubuntu expert, so I might have messed something up myself. My first approach was to simply install the cuda for 18.10 on 19.04. That seemed to work until I got a segfault, so then I googled and found your post, and applied all your steps, that also seemed to work but then segfaulted again anyway. I'm not 100% sure I undid everything I did in the first step, so maybe it was still my old work that was corrupting the work in your tutorial. Anyway, after that I gave up and just provisioned a new machine with Debian stretch.

I've used docker-nvidia for a machine learning project before, it worked great so I'm definitely on board with you on that.

BTW, will you be posting a blog post on Metashape performance? I saw you posted a benchmark tool for it which we're definitely going to try out. Right now I've been benchmarking with our live data which takes 20 hours to render which is a huge pain..

Thanks for the excellent blog, I just noticed this was the same blog that posted the benchmarks :D

Posted on 2019-05-06 16:34:15
Julio argumedo

Hello sir:

First of all, thank you for your kind responses and posts, it's really a pleasure to read you.

I know this is kind of off-topic. I recently installed Disco Dingo (better support for fractional scaling makes multi monitor less stressful) and after installing driver 418, I installed Anaconda and ran conda install tensorflow-gpu, which completed without any errors. After that, I ran nvidia-smi and numba -s to check if everything was installed and correctly recognized. Everything seems ok until I run tf.Session(), which freezes the whole computer, requiring a hard restart. Any ideas? I tried this in both python 3.6 and 3.7

Posted on 2019-05-06 20:51:22
Donald Kinghorn

Hi Julio, that sounds like a serious problem! I want to use Anaconda too but I haven't set it up and tried it yet ... I will do this right now ... OK, I just tested by running a CNN training on MNIST with tensorflow-gpu. It worked fine on 2 systems (I used an example jupyter notebook that I had in a post a couple of weeks ago about setting up TF on Win10 ... It's interesting how similar Linux and Win10 are becoming :-)

One thing that I can't confirm though is if having the CUDA install in your PATH can cause trouble. On my personal system with 19.04 I don't have CUDA installed (I run it from nvidia-docker). The other system I tested on was the one I used for testing the setup in the docker post I just did, so, it didn't have CUDA either!

... anaconda tensorflow-gpu installs its own copy of the cuda 10.0 toolkit in its env ... the CUDA install in this post is 10.1 and that would likely crash that tensorflow-gpu if it showed up on your loader library path when it started from the tf.Session()

Here's something to try;
if you did the CUDA install and set up the system-wide environment then try disabling that. You could move /etc/profile.d/cuda.sh to cuda.sh-norun and /etc/ld.so.conf.d/cuda.conf to cuda.conf-norun (move that cuda-10-1.conf too). Then, to be sure the library path is clear, restart your system.
Then try running tensorflow again from your conda environment. If that works then I suggest that you use the "per shell" CUDA environment setup I have in this post so that cuda is only on your paths when you need it.

That's the only thing I think of that would be related directly to the cuda setup in this post. If it's not that then there could be some other issue. ???

Posted on 2019-05-08 02:05:52
Bob Saget

I spent over a week trying to get cuda to work on Ubuntu last time I worked on a project. Your guide helped me get it working the first time with zero problems. You are a life saver!

Posted on 2019-05-07 00:17:22
Donald Kinghorn

I did the post on getting docker and nvidia-docker working
https://www.pugetsystems.co...

Posted on 2019-05-08 00:04:38
Soumen Pramanik

Hello, I am getting the below error while running cuda on ubuntu 19.04 by using RTX2080Ti
Installation failed. See log at /var/log/cuda-installer.log for details.

Posted on 2019-05-19 10:20:44
Donald Kinghorn

Are there any hints in that log file? You should check to be sure that the display driver install went OK. You should see your card when you run nvidia-smi. If you don't have the driver loaded then the cuda install will fail.

Also, the driver has been updated recently. If your driver install from the ppa went OK then your system should update to 430 with the normal Ubuntu updates. If your driver setup didn't go right then check that you got the ppa setup right and then just install the more recent driver i.e. sudo apt-get install nvidia-driver-430

Posted on 2019-05-20 15:40:27
Soumen Pramanik

ERROR: An NVIDIA kernel module 'nvidia-drm' appears to already be loaded in your kernel. This may be because it is in use (for example, by an X server, a CUDA program, or the NVIDIA Persistence Daemon), but this may also happen if your kernel was configured without support for module unloading. Please be sure to exit any programs that may be using the GPU(s) before attempting to upgrade your driver. If no GPU-based programs are running, you know that your kernel supports module unloading, and you still receive this message, then an error may have occured that has corrupted an NVIDIA kernel module's usage count, for which the simplest remedy is to reboot your computer.

Posted on 2019-05-22 08:18:20
Soumen Pramanik

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.56       Driver Version: 418.56       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 208...  Off  | 00000000:01:00.0  On |                  N/A |
| 35%   24C    P8     3W / 260W |    331MiB / 10986MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1315      G   /usr/lib/xorg/Xorg                            24MiB |
|    0      1351      G   /usr/bin/gnome-shell                          58MiB |
|    0      2203      G   /usr/lib/xorg/Xorg                           105MiB |
|    0      2354      G   /usr/bin/gnome-shell                         141MiB |
+-----------------------------------------------------------------------------+

Posted on 2019-05-22 08:19:01
Donald Kinghorn

That is interesting, ... it looks like you forgot to exclude the driver install when you did the cuda install part

│ CUDA Installer
│ - [ ] Driver
│      [ ] 418.39
│ + [X] CUDA Toolkit 10.1
│   [X] CUDA Samples 10.1
│   [X] CUDA Demo Suite 10.1
│   [X] CUDA Documentation 10.1
│   Install
│   Options

If you had [X] next to Driver and 418.39 it would fail.

I recommend that you go ahead and update your driver and reboot. Then try the CUDA install part again. You only want to install the CUDA Toolkit, Samples, Demo Suite and Documentation. The CUDA runtime is in the driver, so it would be good to move the driver to the 430 release:

sudo apt-get update
sudo apt-get install nvidia-driver-430
sudo shutdown -r
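
After the reboot, the CUDA part can also be run without the menu, which avoids leaving the Driver box checked by accident. The sketch below only prints the command so it can be reviewed first. The file name is taken from the installer used elsewhere in this thread, the function name cuda_install_cmd is made up, and it is an assumption that --silent --toolkit --samples covers everything you want (the Demo Suite and Documentation may not be included by these flags).

```shell
# Assumed file name; adjust to the .run file you actually downloaded.
RUNFILE="${RUNFILE:-cuda_10.1.168_418.67_linux.run}"

# Hypothetical helper: build the installer command WITHOUT --driver, so the
# display driver already installed from the ppa is left alone.
cuda_install_cmd() {
    echo "sudo sh $1 --silent --toolkit --samples"
}

# Print the command for review; run it by hand once it looks right.
cuda_install_cmd "$RUNFILE"
```

The point of the design is simply that --driver never appears on the command line, which is the non-interactive equivalent of clearing the [X] next to Driver in the menu.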

... I hope this gets everything working for you! --Don

Posted on 2019-05-22 16:10:54
Soumen Pramanik

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 430.14       Driver Version: 430.14       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 208...  Off  | 00000000:01:00.0  On |                  N/A |
| 35%   25C    P8     3W / 260W |    301MiB / 11016MiB |      1%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1647      G   /usr/lib/xorg/Xorg                            24MiB |
|    0      1688      G   /usr/bin/gnome-shell                          58MiB |
|    0      2040      G   /usr/lib/xorg/Xorg                           101MiB |
|    0      2191      G   /usr/bin/gnome-shell                         115MiB |
+-----------------------------------------------------------------------------+

I removed the 418.56 driver and installed 430 as you suggested, without any success. I got the same error after running the command below:

sudo sh Downloads/cuda_10.1.168_418.67_linux.run

┌──────────────────────────────────────────────────────────────────────────────┐
│ CUDA Installer                                                               │
│ - [X] Driver                                                                 │
│      [X] 418.67                                                              │
│ + [X] CUDA Toolkit 10.1                                                      │
│   [X] CUDA Samples 10.1                                                      │
│   [X] CUDA Demo Suite 10.1                                                   │
│   [X] CUDA Documentation 10.1                                                │
│   Options                                                                    │
│   Install                                                                    │
│                                                                              │
│ Up/Down: Move | Left/Right: Expand | 'Enter': Select | 'A': Advanced

Posted on 2019-05-22 17:17:17
Soumen Pramanik

Installation failed. See log at /var/log/cuda-installer.log for details.

[INFO]: ERROR: An NVIDIA kernel module 'nvidia-drm' appears to already be loaded in your kernel. This may be because it is in use (for example, by an X server, a CUDA program, or the NVIDIA Persistence Daemon), but this may also happen if your kernel was configured without support for module unloading. Please be sure to exit any programs that may be using the GPU(s) before attempting to upgrade your driver. If no GPU-based programs are running, you know that your kernel supports module unloading, and you still receive this message, then an error may have occured that has corrupted an NVIDIA kernel module's usage count, for which the simplest remedy is to reboot your computer.

Posted on 2019-05-22 17:19:00
Donald Kinghorn

I'm at a loss ... I don't know what is going on or why you are seeing that message.

I don't know what all you have tried, but maybe that message is not that important? If the files are there in /usr/local/cuda then I would try copying the "samples" directory to your home directory and doing a "make" to see if things build OK ...

Otherwise, I'm afraid I'm out of ideas.

Posted on 2019-05-23 21:49:37
Omar

Mr Kinghorn, thanks a lot. Ubuntu 19.04 was the only one that made things work (didn't try 18.10 though) on a recently purchased gaming laptop that I'm using to get into AI, and I was new to CUDA. The instructions in this post of yours worked perfectly for me. Thanks a lot for that.

Best,

Omar

Posted on 2019-05-08 23:07:52
Matthew Byrd

It looks like there is an error in the step that sets up the PATH. Instead of "sudo mv /etc/ld.conf.d/cuda-10-1.conf /etc/ld.conf.d.cuda-10-1.conf-orig" it should be "sudo mv /etc/ld.so.conf.d/cuda-10-1.conf /etc/ld.so.conf.d.cuda-10-1.conf-orig". At least that's what worked for me; it could be that the drivers have been updated since this post.

Posted on 2019-06-02 18:15:08
Donald Kinghorn

Yes, thank you! I had a . in there that should have been a /. I fixed it in the post. --Don

Posted on 2019-06-03 14:38:39
Ivan Tishaiev

Hi Donald,

Thank you very much for the guide.
Could you help me with the following problem (in step 5)?

sudo sh cuda_10.1.168_418.67_linux.run
cuda_10.1.168_418.67_linux.run: 2: cuda_10.1.168_418.67_linux.run: Syntax error: newline unexpected

Thank you.

Posted on 2019-06-04 12:29:37
Donald Kinghorn

Hi Ivan, it looks like there is an error in line 2 of that file. You may have gotten a corrupted download, but I'm not sure. I would try downloading the file again ...
... let me check something ... OK, I just downloaded a copy of that run file. I started it and it did start up OK. It's a shell archive file and the first few lines of what I got look like this:

#!/bin/sh
# This script was generated using Makeself 2.1.4

CRCsum="104861082"
MD5="00000000000000000000000000000000"
TMPROOT=${TMPDIR:=/tmp}

label="NVIDIA CUDA PACKAGE"
script="./cuda-installer"
scriptargs=""
targetdir="pkg"
filesizes="2526887892"
keep=n

You can use "head" or "less" to check the first few lines of the file you got. It is a very large file so it is possible that it was corrupted somewhere during your download.
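
That header check can be scripted roughly as follows. This is a minimal sketch, assuming a good copy starts with the same two lines shown above; the function name looks_like_runfile is made up for illustration. It only catches a file whose beginning is wrong, so it will not detect damage further into a download of this size.

```shell
# Hypothetical helper: succeeds if the first two lines of the file look like
# the Makeself header shown above ("#!/bin/sh" followed by a "Makeself" line).
looks_like_runfile() {
    header=$(head -n 2 "$1" 2>/dev/null) || return 1
    case "$header" in
        '#!/bin/sh'*Makeself*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example use (file name as used in this thread):
#   looks_like_runfile cuda_10.1.168_418.67_linux.run || echo "re-download it"
```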

You could try downloading it with wget

wget https://developer.nvidia.co...

Hope this helps --Don

Posted on 2019-06-05 16:52:16
Ivan Tishaiev

Hi Donald,
You are absolutely right, the file was corrupted.
I re-downloaded it and finished the install successfully.
Thank you very much for your help!

Posted on 2019-06-10 20:25:09