Read this article at https://www.pugetsystems.com/guides/2005
Dr Donald Kinghorn (Scientific Computing Advisor)

How To Install TensorFlow 1.15 for NVIDIA RTX30 GPUs (without docker or CUDA install)

Written on December 9, 2020 by Dr Donald Kinghorn

Introduction

If you are one of the lucky folks who have managed to get one (or several) of the new NVIDIA RTX3090 or RTX3080 GPUs and want to do some ML/AI work with TensorFlow 1.15, you may have run into some trouble! The new GPUs need the latest NVIDIA driver, and you will need a build of TensorFlow that is linked against the new CUDA 11.1 and cuDNN 8.0 libraries (or newer versions). If you look at the official Google build you will find it is linked to CUDA 10 and cuDNN 7. What about the Anaconda build? Nope, it's an old build.

So, what are you going to do? You could do a CUDA development setup and try to build TensorFlow yourself. Doesn't sound like fun? You could follow the "best practices" solution: install docker or another container runtime and use the NVIDIA NGC docker image. That's my highest recommended solution, but it may not be what you really want to do, especially if you are not familiar with docker and container usage or just want a good local install!

In this post I will show you how to install NVIDIA's build of TensorFlow 1.15 into an Anaconda Python conda environment. This is the same TensorFlow 1.15 that you would have in the NGC docker container, but no docker install required and no local system CUDA install needed either. Let's do it.

This setup will be with Ubuntu 20.04 and a recent Anaconda3 or Miniconda3 Python.

UPDATE! Quick Setup -- Wed 15 Sep 2021

Update your NVIDIA Driver

The following should work to update your driver. You could use the distro repo or a manual update too, just be sure you have things up-to-date.

sudo apt-get update
sudo apt-get dist-upgrade
sudo shutdown -r now
sudo apt-get install dkms build-essential
sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt-get install nvidia-driver-455
sudo shutdown -r now

Run "nvidia-smi" to confirm your update and check that the driver reports CUDA version 11.1 (or newer).
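
If you would rather script that check than read the banner by eye, here is a minimal sketch. It assumes the nvidia-smi banner contains a "CUDA Version: X.Y" field and that GNU sort (with -V) is available, as it is on Ubuntu. The function names cuda_version_from_smi and version_at_least are made up for this example and are not part of any NVIDIA tool.

```shell
# Extract the "CUDA Version" field from nvidia-smi banner text read on stdin.
# Assumption: the banner has a field of the form "CUDA Version: 11.1".
cuda_version_from_smi() {
    sed -n 's/.*CUDA Version: *\([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p' | head -n 1
}

# Succeed (exit 0) if version $1 is at least version $2.
# Relies on GNU "sort -V" for version-aware ordering.
version_at_least() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Usage on a machine with the driver installed:
#   ver=$(nvidia-smi | cuda_version_from_smi)
#   version_at_least "$ver" 11.1 && echo "driver reports CUDA $ver, OK"
```

Comparing with sort -V rather than as plain strings keeps something like 11.10 from sorting below 11.2.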

Installing NVIDIA's build of TensorFlow 1.15 in a conda env

NVIDIA maintains a lot of great software and configuration setup material on GitHub. You should check it out if you haven't been there.

That includes their source builds of TensorFlow.

I'm including a copy of NVIDIA's license notice from the link above:


License information

By using the software you agree to fully comply with the terms and conditions of the SLA (Software License Agreement):

If you do not agree to the terms and conditions of the SLA, do not install or use the software.


Step 1) Set up a conda env

We will create a conda env named "tf1-nv" and initialize it with Python version 3.6 to support the build. You will also want a recent release of pip (newer than 19.0) in this environment.

conda create --name tf1-nv python=3.6

conda activate tf1-nv

conda install pip

Step 2) Create a local index for the "wheel" and supporting dependencies

Pip will be used for the install but this NVIDIA package index is not available on PyPI.org. The index will be added locally in

$HOME/.local/bin

The following command sets up the index. (Note: we are still in the activated tf1-nv conda env from step 1.)

pip install --user nvidia-pyindex

Here is the output on my system so you can see what happens when you run it.

(tf1-nv) kinghorn@PSML1:~$ pip install --user nvidia-pyindex
Collecting nvidia-pyindex
  Downloading nvidia-pyindex-1.0.5.tar.gz (6.1 kB)
Building wheels for collected packages: nvidia-pyindex
  Building wheel for nvidia-pyindex (setup.py) ... done
  Created wheel for nvidia-pyindex: filename=nvidia_pyindex-1.0.5-py3-none-any.whl size=4169 sha256=aa680b8c1d986bd867af1af2bda1ca46ab0e11b1470e4e7086a223cdb7b98d97
  Stored in directory: /home/kinghorn/.cache/pip/wheels/93/56/f1/2609d85af643eac34c360dd01b95feb483afd8f856f2fc9953
Successfully built nvidia-pyindex
Installing collected packages: nvidia-pyindex
  WARNING: The script nvidia_pyindex is installed in '/home/kinghorn/.local/bin' which is not on PATH.
  Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
Successfully installed nvidia-pyindex-1.0.5

There is a note about "... .local/bin" not being on PATH. On Ubuntu 20.04 the directory '$HOME/.local/bin' will automatically be on your PATH if it exists. For other distributions you may need to add the following to your .bashrc file so that it is on PATH when you start up a shell.

export PATH=$PATH:$HOME/.local/bin

You can look in that directory to see what has been included from pip install of nvidia-pyindex.
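
If you do add that export to your .bashrc, every nested shell will append the directory again. The following is a small sketch of a guard that adds it only once. The helper name path_append is hypothetical, chosen for this example.

```shell
# Append a directory to PATH only if it is not already listed.
# Wrapping both sides in ":" lets one pattern match the first,
# middle, and last entries alike.
path_append() {
    case ":$PATH:" in
        *":$1:"*) ;;                # already present, nothing to do
        *) PATH="$PATH:$1" ;;
    esac
}

path_append "$HOME/.local/bin"
export PATH
```

Running it twice leaves PATH unchanged the second time, so it is safe to keep in a startup file.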

Step 3) Set up MPI dependencies for Horovod multi-GPU

Horovod is used for multi-GPU support in this build and you will need an MPI config available for that. There are OpenMPI components installed with the nvidia-pyindex packages but I had difficulties getting that working correctly. You could also have some conflicts if you have a local MPI install on your system. The simplest thing to do to resolve issues will be to add the OpenMPI package to the conda env.

(note, we are still in the activated tf1-nv conda env from step 1)

conda install -c conda-forge openmpi

That adds the needed OpenMPI components to your tf1-nv env. You will need to add these to your LD_LIBRARY_PATH.

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HOME/anaconda3/envs/tf1-nv/lib/

Note: I have "anaconda3" in that path, you should substitute "miniconda3" if that is what you are using. You can add that export line to your .bashrc file if you want that to be set automatically when you start a new shell.
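
As an alternative to hard-coding the anaconda3 or miniconda3 path, here is a sketch that builds the same entry from CONDA_PREFIX, the variable "conda activate" sets to the root of the active env. The function name add_conda_lib is made up for this example. It has to be run after activating tf1-nv.

```shell
# Append "<active env>/lib" to LD_LIBRARY_PATH unless it is already there.
# CONDA_PREFIX is set by "conda activate"; with no env active we stop early.
add_conda_lib() {
    [ -n "${CONDA_PREFIX:-}" ] || { echo "no conda env is active" >&2; return 1; }
    case ":${LD_LIBRARY_PATH:-}:" in
        *":$CONDA_PREFIX/lib:"*) ;;   # already present, nothing to do
        *) LD_LIBRARY_PATH="${LD_LIBRARY_PATH:+$LD_LIBRARY_PATH:}$CONDA_PREFIX/lib" ;;
    esac
    export LD_LIBRARY_PATH
}

# Usage:
#   conda activate tf1-nv
#   add_conda_lib
```

The ${LD_LIBRARY_PATH:+...} form avoids a leading ":" when the variable starts out empty, since the dynamic linker treats an empty entry as the current directory.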

Step 4) Install the NVIDIA TensorFlow Build (along with Horovod)

The following command will "pip" install the NVIDIA TensorFlow 1.15 build using the nvidia-pyindex files installed in step 2).

pip install --user "nvidia-tensorflow[horovod]"

That's it! You now have the same highly optimized TensorFlow 1.15 build that NVIDIA uses in their NGC TensorFlow-1 docker container.

Note: There are two separate groups of files installed; the conda tf1-nv env ($HOME/anaconda3/envs/tf1-nv) and the nvidia-pyindex files installed in $HOME/.local/bin and $HOME/.local/lib.
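
Before moving on to a full benchmark, a quick sanity check can confirm that the env's Python imports the new build and sees a GPU. This is a sketch, not part of NVIDIA's instructions. The wrapper name check_tf is made up, while tf.__version__ and tf.test.is_gpu_available() are standard TensorFlow 1.x calls.

```shell
# Print the TensorFlow version and whether a GPU is usable.
# $1 optionally names the Python interpreter (defaults to "python",
# which is the conda env's Python when tf1-nv is activated).
check_tf() {
    "${1:-python}" - <<'EOF'
import tensorflow as tf
print("TensorFlow", tf.__version__)
print("GPU available:", tf.test.is_gpu_available())
EOF
}

# Usage:
#   conda activate tf1-nv
#   check_tf
```

The first run can take a while and prints a lot of log lines as the CUDA libraries load. The two lines printed by the script come at the end.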

Test your install with a ResNet-50 benchmark

I placed part of the nvidia-examples directory in a GitHub repo for convenience. You can download a copy with:

wget https://github.com/dbkinghorn/NGC-TF1-nvidia-examples/archive/main/NGC-TF1-nvidia-examples.tar.gz

Then un-tar the archive and cd to the cnn directory so you can run a ResNet50 benchmark with synthetic data:

tar xf NGC-TF1-nvidia-examples.tar.gz

cd NGC-TF1-nvidia-examples-main/cnn/

Activate the tf1-nv conda env if you had deactivated it.

conda activate tf1-nv

and run the benchmark

(for single NVIDIA GPU)

python resnet.py --layers=50 --batch_size=64 --precision=fp32

(for multi-GPU, use -np 2 for 2 GPUs; be sure you have the library path set for MPI, see step 3)

mpiexec --bind-to socket -np 2 python resnet.py --layers=50 --batch_size=128 --precision=fp32
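
If you move between machines with different GPU counts, the -np value can be derived rather than typed. Here is a sketch that assumes "nvidia-smi -L" prints one "GPU N: ..." line per device. The helper name count_gpus is made up for this example.

```shell
# Count GPUs from "nvidia-smi -L" style text on stdin.
# Assumption: each device is listed on its own line starting with "GPU N:".
# grep -c exits non-zero when the count is 0, so "|| true" keeps the
# function from failing in that case.
count_gpus() {
    grep -c '^GPU [0-9][0-9]*:' || true
}

# Usage (same benchmark flags as the command above):
#   n=$(nvidia-smi -L | count_gpus)
#   mpiexec --bind-to socket -np "$n" python resnet.py --layers=50 --batch_size=128 --precision=fp32
```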

Enjoy!

Happy computing! --dbk @dbkinghorn

