NVIDIA CUDA install on CentOS 6.6 [SOLVED]

If you have done a fresh install of CentOS 6.6, or “updated” to it from a 6.5 install, and you are setting up NVIDIA CUDA 6.5, you may be having trouble with a failed build of the nvidia-uvm kernel module. Read on for a fix …

I was setting up a machine for GPU compute using NVIDIA CUDA a few days ago and hit a snag. I did a CentOS 6.5 install, ran updates and noticed that it had updated to release 6.6 (which had just gotten to the repo mirrors). I checked the NVIDIA CUDA download pages and saw there wasn’t any specific CUDA release for RHEL / CentOS 6.6, so I decided to go ahead and try the setup using the cuda repo package anyway.

The CUDA yum repo makes it really easy to set up CUDA and get the NVIDIA drivers working. Just grab the repo rpm

wget http://developer.download.nvidia.com/compute/cuda/repos/rhel6/x86_64/cuda-repo-rhel6-6.5-14.x86_64.rpm

install that with yum and then do

yum install cuda

easy!
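For reference, here is the whole install sequence as one small script. This is only a sketch of the steps above: the run helper, the DRY_RUN variable and the install_cuda function name are my own additions for illustration, and I am assuming "install that with yum" means yum localinstall on the downloaded rpm.

```shell
# Repo package name and URL taken from the wget line in this post.
REPO_RPM=cuda-repo-rhel6-6.5-14.x86_64.rpm
REPO_URL=http://developer.download.nvidia.com/compute/cuda/repos/rhel6/x86_64/$REPO_RPM

# Hypothetical helper: print the command when DRY_RUN is set, otherwise run it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

# Download the repo rpm, register the repo, then pull in CUDA and the driver.
# Stops at the first step that fails.
install_cuda() {
    run wget "$REPO_URL" &&
    run yum localinstall "$REPO_RPM" &&
    run yum install cuda
}
```

Setting DRY_RUN=1 before calling install_cuda (as root for a real run) just prints the three commands so you can look them over first.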

Everything looked fine during the install, so I compiled the code from the samples directory and fired up the nbody job as a test. I saw the following,

FATAL: Module nvidia_uvm not found.
Error: only 0 Devices available, 1 requested.  Exiting.

Trying to load the module gives,

[root@tower cuda]# modprobe nvidia-uvm
FATAL: Module nvidia_uvm not found.

Then started the long process of trying to track down a log file to see what happened … I finally found this,

/var/lib/dkms/nvidia-uvm/340.29/build/make.log

[root@tower build]# cat make.log
DKMS make.log for nvidia-uvm-340.29 for kernel 2.6.32-504.el6.x86_64 (x86_64)
Mon Nov  3 16:50:28 PST 2014
Makefile:213: /var/lib/dkms/nvidia/340.29/build/nvidia-modules-common.mk: No such file or directory
make: *** No rule to make target `/var/lib/dkms/nvidia/340.29/build/nvidia-modules-common.mk'.  Stop.

OK, so it looks like there is a problem in the “nvidia” directory, not the “nvidia-uvm” directory.
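If you are hunting for the same log on your own machine, something like the following would have saved me some time. It is a rough sketch: the function names find_dkms_logs and show_make_errors are made up for this post, and it assumes the DKMS tree lives under /var/lib/dkms as it does here.

```shell
# List every make.log under a DKMS tree.
# The tree defaults to /var/lib/dkms (the location on this machine).
# Note: find does not follow symlinked directories here.
find_dkms_logs() {
    dkms_root="${1:-/var/lib/dkms}"
    if [ ! -d "$dkms_root" ]; then
        echo "no DKMS tree at $dkms_root" >&2
        return 1
    fi
    find "$dkms_root" -name make.log -type f
}

# Print only the lines of a log that look like make failures:
# missing files, or make's own "***" error lines.
show_make_errors() {
    grep -E 'No such file or directory|\*\*\*' "$1"
}
```

Running find_dkms_logs with no argument lists the logs, and show_make_errors on the nvidia-uvm log picks out the two error lines quoted above. The dkms status command is also worth a look, since it reports which modules DKMS thinks are built and installed.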

Looking at the directories in the “nvidia-uvm” directory I see,

[kinghorn@tower 340.29]$ ls /var/lib/dkms/nvidia-uvm/340.29/
build  source

and in the “nvidia” directory we have,

[kinghorn@tower 340.29]$ ls /var/lib/dkms/nvidia/340.29/
2.6.32-504.el6.x86_64  source

THERE IS NO “build” DIRECTORY! However, the “source” directory does contain the nvidia-modules-common.mk file, so …

HERE’S THE FIX:

In /var/lib/dkms/nvidia/340.29, create a symbolic link named build that points to source,

[root@tower 340.29]# ln -s source build

[root@tower 340.29]# ls -l
total 4
drwxr-xr-x 3 root root 4096 Nov  3 16:46 2.6.32-504.el6.x86_64
lrwxrwxrwx 1 root root    6 Nov  3 17:31 build -> source
lrwxrwxrwx 1 root root   22 Nov  3 16:46 source -> /usr/src/nvidia-340.29
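If you need to repeat this on more than one machine, the same fix can be wrapped in a small function that refuses to clobber an existing build entry. This is a sketch only; link_build_to_source is a name I made up, and the directory you pass in is whatever version directory is missing its build entry on your system.

```shell
# Create the missing "build" link inside a DKMS module version directory.
# Does nothing if "build" already exists, so it is safe to re-run.
link_build_to_source() {
    module_dir="$1"    # e.g. /var/lib/dkms/nvidia/340.29
    if [ ! -e "$module_dir/source" ]; then
        echo "no source entry in $module_dir" >&2
        return 1
    fi
    if [ -e "$module_dir/build" ] || [ -L "$module_dir/build" ]; then
        echo "build already present in $module_dir"
        return 0
    fi
    # Relative link, same result as: cd "$module_dir" && ln -s source build
    ln -s source "$module_dir/build"
}
```

As root, link_build_to_source /var/lib/dkms/nvidia/340.29 gives the same result as the ln -s command above.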

****
reboot
****

Now dkms triggers correctly, the modules build and we have CUDA joy!
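A quick way to confirm the modules really are loaded after the reboot is to look for them in /proc/modules (which is what lsmod reads). Here is a small sketch; module_loaded is a hypothetical helper name, and the optional second argument exists only so it can be pointed at a different file.

```shell
# Succeed if a kernel module is listed in /proc/modules.
# The kernel lists module names with underscores (nvidia_uvm),
# so hyphens in the requested name are converted first.
module_loaded() {
    name=$(printf '%s' "$1" | tr '-' '_')
    modules_file="${2:-/proc/modules}"
    grep -q "^${name} " "$modules_file"
}
```

For example, module_loaded nvidia-uvm && echo "nvidia-uvm is loaded". If it reports nothing, try modprobe nvidia-uvm again before rerunning the nbody sample.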


Happy computing! –dbk
