If you are one of the lucky folks who have managed to get one (or several) of the new NVIDIA RTX3090 or RTX3080 GPUs and are wanting to do some ML/AI work with TensorFlow 1.15 you may have run into some trouble! The new GPUs need the latest NVIDIA driver and you will need/want a build of TensorFlow that is linked against the new CUDA 11.1 and cuDNN 8.0 libraries (or newer versions). If you look at the official Google build you will find it is linked to CUDA 10 and cuDNN 7. What about the Anaconda build? Nope, it's an old build.
So, what are you going to do? You could do a CUDA development setup and try to build TensorFlow yourself. Doesn't sound like fun? You could do the "best practices" solution: install docker or another container runtime and use the NVIDIA NGC docker image. That's the solution I recommend most highly, but it may not be what you really want to do for many reasons, especially if you are not familiar with docker and container usage, or really just want a good local install!
In this post I will show you how to install NVIDIA's build of TensorFlow 1.15 into an Anaconda Python conda environment. This is the same TensorFlow 1.15 that you would have in the NGC docker container, but no docker install required and no local system CUDA install needed either. Let's do it.
This setup will be with Ubuntu 20.04 and a recent Anaconda3 or Miniconda3 Python.
UPDATE! Quick Setup — Wed 15 Sep 2021
Do this:
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh
conda update conda
conda update --all
conda create --name TF1.15 python=3.8
conda activate TF1.15
pip install nvidia-pyindex
pip install nvidia-tensorflow[horovod]
conda install -c conda-forge openmpi
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HOME/miniconda3/envs/TF1.15/lib/
mkdir tf-test
cd tf-test
wget https://github.com/dbkinghorn/NGC-TF1-nvidia-examples/archive/main/NGC-TF1-nvidia-examples.tar.gz
tar xf NGC-TF1-nvidia-examples.tar.gz
cd NGC-TF1-nvidia-examples-main/cnn/
python resnet.py --layers=50 --batch_size=64 --precision=fp32
You now have a tested and working TF 1.15. This is the latest build of what they are using on NGC nv21.08.
If you have already installed Anaconda Python instead of miniconda3 then just start at "conda update conda" and "conda update --all". (That is important so that you are using a new enough version of pip to resolve the dependencies properly.)
Update your NVIDIA Driver
The following should work to update your driver. You could use the distro repo or a manual update too, just be sure you have things up-to-date.
sudo apt-get dist-upgrade
sudo shutdown -r now
sudo apt-get install dkms build-essential
sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt-get install nvidia-driver-455
sudo shutdown -r now
Run "nvidia-smi" to confirm your update and check that it is on the 11.1 (or newer) CUDA runtime.
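If you would rather script that check, here is a minimal sketch. It assumes the nvidia-smi banner contains a "CUDA Version: X.Y" field and that GNU "sort -V" is available (it is on Ubuntu). The function names cuda_version_from_smi and version_ge are made up for this illustration and are not part of any NVIDIA tool.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration only.

# Read nvidia-smi output on stdin and print the reported CUDA version, e.g. "11.1".
cuda_version_from_smi() {
    sed -n 's/.*CUDA Version: *\([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Succeed if version $1 >= version $2 (relies on GNU "sort -V").
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Only query the GPU when the driver tools are actually installed.
if command -v nvidia-smi >/dev/null 2>&1; then
    cuda="$(nvidia-smi | cuda_version_from_smi)"
    if version_ge "$cuda" "11.1"; then
        echo "OK: driver reports CUDA $cuda"
    else
        echo "Driver reports CUDA '${cuda}'; 11.1 or newer is needed"
    fi
fi
```

The version comparison is done with "sort -V" rather than a numeric test so that a value like 11.10 is ordered after 11.2.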
Installing NVIDIA’s build of TensorFlow 1.15 in a conda env
NVIDIA maintains a lot of great software and configuration setup material on GitHub. You should check it out if you haven't been there.
That includes their source builds of TensorFlow.
I'm including a copy of NVIDIA's license notice from the link above:
By using the software you agree to fully comply with the terms and
conditions of the SLA (Software License Agreement):
If you do not agree to the terms and conditions of the SLA, do not install or use the software.
Step 1) Set up a conda env
We will create a conda env named "tf1-nv" and initialize it with Python version 3.6 to support the build. You will want a newer release of pip (newer than 19.0) in this environment too.
conda create --name tf1-nv python=3.6
conda activate tf1-nv
conda install pip
Step 2) Create a local index for the “wheel” and supporting dependencies
Pip will be used for the install but this NVIDIA package index is not available on PyPI.org. The index will be added locally in
The following command sets up the index (note: we are still in the activated tf1-nv conda env from step 1).
pip install --user nvidia-pyindex
Here is the output on my system so you can see what happens when you run that.
(tf1-nv) kinghorn@PSML1:~$ pip install --user nvidia-pyindex
Collecting nvidia-pyindex
  Downloading nvidia-pyindex-1.0.5.tar.gz (6.1 kB)
Building wheels for collected packages: nvidia-pyindex
  Building wheel for nvidia-pyindex (setup.py) ... done
  Created wheel for nvidia-pyindex: filename=nvidia_pyindex-1.0.5-py3-none-any.whl size=4169 sha256=aa680b8c1d986bd867af1af2bda1ca46ab0e11b1470e4e7086a223cdb7b98d97
  Stored in directory: /home/kinghorn/.cache/pip/wheels/93/56/f1/2609d85af643eac34c360dd01b95feb483afd8f856f2fc9953
Successfully built nvidia-pyindex
Installing collected packages: nvidia-pyindex
  WARNING: The script nvidia_pyindex is installed in '/home/kinghorn/.local/bin' which is not on PATH.
  Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
Successfully installed nvidia-pyindex-1.0.5
There is a note about "… .local/bin" not being on PATH. On Ubuntu 20.04 the directory '$HOME/.local/bin' will automatically be on your PATH if it exists. For other distributions you may need to add the following to your .bashrc file so that it is on PATH when you start up a shell.
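The exact line is not shown here. A common way to do it, offered as an assumption rather than a quote from the original post, is a guarded export that only prepends the directory when it is missing. The helper name path_prepend is hypothetical.

```shell
# Hypothetical helper: prepend a directory to PATH unless it is already there.
path_prepend() {
    case ":$PATH:" in
        *":$1:"*) ;;                 # already on PATH, nothing to do
        *) PATH="$1:$PATH" ;;        # otherwise put it at the front
    esac
}

path_prepend "$HOME/.local/bin"
export PATH
```

The guard keeps PATH from growing every time a new shell sources .bashrc.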
You can look in that directory to see what has been included from pip install of nvidia-pyindex.
Step 3) Set up MPI dependencies for Horovod multi-GPU
Horovod is used for multi-GPU support in this build and you will need an MPI config available for that. There are OpenMPI components installed with the nvidia-pyindex packages but I had difficulties getting that working correctly. You could also have some conflicts if you have a local MPI install on your system. The simplest thing to do to resolve issues will be to add the OpenMPI package to the conda env.
(note, we are still in the activated tf1-nv conda env from step 1)
conda install -c conda-forge openmpi
That adds the needed OpenMPI components to your tf1-nv env. You will need to add these to your LD_LIBRARY_PATH.
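The export line itself is not shown here. Judging from the Quick Setup commands and the env location used in this post, it would presumably look like the following sketch. The ENV_LIB variable is introduced only for illustration, and the path is an assumption about where your conda install lives.

```shell
# Assumed location of the env's shared libraries; adjust to your install.
ENV_LIB="$HOME/anaconda3/envs/tf1-nv/lib"

# Append to LD_LIBRARY_PATH, avoiding a stray leading ":" when it was unset.
export LD_LIBRARY_PATH="${LD_LIBRARY_PATH:+$LD_LIBRARY_PATH:}$ENV_LIB"
```

An empty entry in LD_LIBRARY_PATH is treated as the current directory, which is why the sketch avoids a leading colon.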
Note: I have "anaconda3" in that path, you should substitute "miniconda3" if that is what you are using. You can add that export line to your .bashrc file if you want that to be set automatically when you start a new shell.
Step 4) Install the NVIDIA TensorFlow Build (along with Horovod)
The following command will "pip" install the NVIDIA TensorFlow 1.15 build using the nvidia-pyindex files installed in step 2).
pip install --user nvidia-tensorflow[horovod]
That's it! You now have the same highly optimized TensorFlow 1.15 build that NVIDIA uses in their NGC TensorFlow-1 docker container.
Note: There are two separate groups of files installed; the conda tf1-nv env ($HOME/anaconda3/envs/tf1-nv) and the nvidia-pyindex files installed in $HOME/.local/bin and $HOME/.local/lib.
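A quick way to confirm that the install is visible from the activated env is to ask pip where the package lives and then import it. This is a sketch and not part of the original instructions. The function name check_tf_install is hypothetical, and it assumes that "python" resolves to the tf1-nv env.

```shell
# Hypothetical helper: report where nvidia-tensorflow is installed and
# which TensorFlow version Python actually imports.
check_tf_install() {
    if ! command -v python >/dev/null 2>&1; then
        echo "python not found on PATH"
        return 1
    fi
    # "pip show" prints package metadata, including a "Location:" line.
    python -m pip show nvidia-tensorflow 2>/dev/null | grep -E '^(Name|Version|Location):'
    # Importing is the real test; a failure here often points to a path problem.
    python -c 'import tensorflow as tf; print("TensorFlow", tf.__version__)' 2>/dev/null \
        || { echo "TensorFlow could not be imported"; return 1; }
}

# Usage (inside the activated env):
#   check_tf_install
```

With a "--user" install as in step 4, the Location line should point under $HOME/.local/lib rather than the conda env.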
Test your install with a ResNet-50 benchmark
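I placed part of the nvidia-examples directory in a GitHub repo for convenience. You can download a copy with,
wget https://github.com/dbkinghorn/NGC-TF1-nvidia-examples/archive/main/NGC-TF1-nvidia-examples.tar.gz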
Then, un-tar the directory, cd to cnn and run a ResNet50 benchmark with synthetic data,
tar xf NGC-TF1-nvidia-examples.tar.gz
cd NGC-TF1-nvidia-examples-main/cnn/
Activate the tf1-nv conda env if you had deactivated it.
conda activate tf1-nv
and run the benchmark
(for single NVIDIA GPU)
python resnet.py --layers=50 --batch_size=64 --precision=fp32
(for multi-GPU; use -np 2 for 2 GPUs) Be sure you have the library path set for MPI; see step 3.
mpiexec --bind-to socket -np 2 python resnet.py --layers=50 --batch_size=128 --precision=fp32
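To avoid hard-coding the process count, the number of GPUs can be read from "nvidia-smi -L", which prints one line per device. The sketch below only builds and prints the command so you can inspect it first. The helper name build_mpi_cmd is hypothetical, and the batch size is passed through unchanged since the right value depends on your GPUs.

```shell
# Hypothetical helper: print the mpiexec command line for N GPUs.
# $1 = number of GPUs (MPI ranks), $2 = value for --batch_size
build_mpi_cmd() {
    echo "mpiexec --bind-to socket -np $1 python resnet.py --layers=50 --batch_size=$2 --precision=fp32"
}

# Count GPUs if the driver tools are present; otherwise fall back to 1.
if command -v nvidia-smi >/dev/null 2>&1; then
    ngpu="$(nvidia-smi -L | grep -c '^GPU ' || true)"
else
    ngpu=1
fi
# Guard against an empty or zero count.
[ "${ngpu:-0}" -ge 1 ] || ngpu=1

build_mpi_cmd "$ngpu" 128
```

Run the printed command from the cnn directory with the tf1-nv env active.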
Happy computing! –dbk @dbkinghorn