Dr Donald Kinghorn (HPC and Scientific Computing)

NVIDIA CUDA install on CentOS 6.6 [SOLVED]

Written on November 4, 2014 by Dr Donald Kinghorn

If you have done a fresh install of CentOS 6.6 or “updated” to it from a 6.5 install and you are setting up NVIDIA CUDA 6.5 you may be having trouble with a failed build of the nvidia-uvm kernel module. Read on for a fix …

I was setting up a machine for GPU compute using NVIDIA CUDA a few days ago and hit a snag. I did a CentOS 6.5 install, ran updates and noticed that it had updated to release 6.6 (which had just gotten to the repo mirrors). I checked the NVIDIA CUDA download pages and saw there wasn’t any specific CUDA release for RHEL / CentOS 6.6, so I decided to go ahead and try the setup using the cuda repo package anyway.

The CUDA yum repo makes it really easy to set up CUDA and get the NVIDIA drivers working. Just grab the repo rpm

wget http://developer.download.nvidia.com/compute/cuda/repos/rhel6/x86_64/cuda-repo-rhel6-6.5-14.x86_64.rpm

install that with yum and then do

yum install cuda

easy!

Everything looked fine during the install, so I compiled the code from the samples directory and fired up the nbody job as a test. This is what I saw,

FATAL: Module nvidia_uvm not found.
Error: only 0 Devices available, 1 requested.  Exiting.

Trying to load the module gives,

[root@tower cuda]# modprobe nvidia-uvm
FATAL: Module nvidia_uvm not found.

Then starts the long process of trying to track down a log file to see what happened … finally found this,

/var/lib/dkms/nvidia-uvm/340.29/build/make.log

[root@tower build]# cat make.log
DKMS make.log for nvidia-uvm-340.29 for kernel 2.6.32-504.el6.x86_64 (x86_64)
Mon Nov  3 16:50:28 PST 2014
Makefile:213: /var/lib/dkms/nvidia/340.29/build/nvidia-modules-common.mk: No such file or directory
make: *** No rule to make target `/var/lib/dkms/nvidia/340.29/build/nvidia-modules-common.mk'.  Stop.

OK, so it looks like the problem is in the “nvidia” directory, not the “nvidia-uvm” directory.
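If you are digging through a make.log like the one above, a small helper can pull out just the paths that make could not find. This is only a sketch: the function name missing_files_from_log is made up for illustration, and it assumes the log uses the same "No such file or directory" message format shown above.

```shell
# Hypothetical helper: print each path that make reported as missing.
# Assumes lines of the form
#   Makefile:213: /path/to/file: No such file or directory
missing_files_from_log() {
    log_file=$1
    # Keep only the path between the "file:line: " prefix and the error text
    sed -n 's/^[^:]*:[0-9]*: \(.*\): No such file or directory$/\1/p' "$log_file"
}

# Example (path taken from the post):
#   missing_files_from_log /var/lib/dkms/nvidia-uvm/340.29/build/make.log
```

Any path it prints is a file the build expected but did not find, which is what points at the “nvidia” directory here.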

Looking at the directories in the “nvidia-uvm” directory I see,

[kinghorn@tower 340.29]$ ls /var/lib/dkms/nvidia-uvm/340.29/
build  source

and in the “nvidia” directory we have,

[kinghorn@tower 340.29]$ ls /var/lib/dkms/nvidia/340.29/
2.6.32-504.el6.x86_64  source

THERE IS NO “build” DIRECTORY! However the “source” directory does contain the nvidia-modules-common.mk file, so …

HERE’S THE FIX:

In /var/lib/dkms/nvidia/340.29/, create a symbolic link named build that points to source,

[root@tower 340.29]# ln -s source build

[root@tower 340.29]# ls -l
total 4
drwxr-xr-x 3 root root 4096 Nov  3 16:46 2.6.32-504.el6.x86_64
lrwxrwxrwx 1 root root    6 Nov  3 17:31 build -> source
lrwxrwxrwx 1 root root   22 Nov  3 16:46 source -> /usr/src/nvidia-340.29
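If you need to repeat this on several machines, the same step can be wrapped in a guarded function. This is a sketch, not something from NVIDIA or DKMS: the name link_build_to_source is hypothetical, and it assumes the directory layout shown above (a source entry containing nvidia-modules-common.mk and no build entry).

```shell
# Hypothetical helper: create build -> source inside a DKMS module directory,
# but only when build is missing and source holds the file the uvm build needs.
link_build_to_source() {
    module_dir=$1
    if [ -e "$module_dir/build" ] || [ -L "$module_dir/build" ]; then
        echo "build already exists in $module_dir; leaving it alone"
        return 0
    fi
    if [ ! -f "$module_dir/source/nvidia-modules-common.mk" ]; then
        echo "nvidia-modules-common.mk not found under $module_dir/source" >&2
        return 1
    fi
    # Relative link, the same as running 'ln -s source build' in that directory
    ( cd "$module_dir" && ln -s source build )
}

# Example, run as root (path taken from the post):
#   link_build_to_source /var/lib/dkms/nvidia/340.29
```

The checks mean it does nothing when a build entry is already there, so it should be safe to run on a machine that does not have the problem.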

****
reboot
****

Now dkms triggers correctly, the modules build and we have CUDA joy!
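To confirm the module really is loaded after the reboot, you can look for it in /proc/modules. A minimal sketch follows; the function name module_loaded is hypothetical, and the optional second argument exists only so the check can be pointed at a sample file. Note that the kernel lists the module with an underscore (nvidia_uvm), as in the modprobe error above.

```shell
# Hypothetical helper: return 0 if the named kernel module appears in a
# /proc/modules style listing (one module per line, name in the first field).
module_loaded() {
    name=$1
    listing=${2:-/proc/modules}
    awk -v m="$name" '$1 == m { found = 1 } END { exit !found }' "$listing"
}

# Example:
#   module_loaded nvidia_uvm && echo "nvidia_uvm is loaded"
```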

Happy computing! --dbk

Tags: CentOS-6.6, CUDA
Miguel

Kudos to you.. Thanks :)

Posted on 2014-12-02 12:22:33
Dan Hogan

Thanks for the write up, this additional step will build the nvidia-uvm without the reboot:

dkms autoinstall -k $(uname -r)

Posted on 2014-12-09 17:33:30

I am getting a kdump failed issue, can anyone help?

Posted on 2015-02-12 10:02:48
TexasDex

Fixed my issue. Thanks!

Posted on 2015-02-27 16:12:44
Hadayat Seddiqi

This worked for me (I'm running RHEL 6.6, it broke my CUDA 6.5 installation but now, after several reinstalls, I'm using CUDA 7). But I wonder, why did this happen? My CUDA programs just stopped working one day. Should we file a bug report with RHEL/CentOS, or contact Nvidia themselves?

Posted on 2015-03-26 17:22:38
Michael Danziger

Thanks for the great fix. Given how simple it is, it's frustrating that the package maintainers (NVIDIA) haven't taken care of it upstream.

Posted on 2015-05-12 12:14:17