https://www.pugetsystems.com
Read this article at https://www.pugetsystems.com/guides/2005
Dr Donald Kinghorn (Scientific Computing Advisor )

How To Install TensorFlow 1.15 for NVIDIA RTX30 GPUs (without docker or CUDA install)

Written on December 9, 2020 by Dr Donald Kinghorn

Introduction

If you are one of the lucky folks who have managed to get one (or several) of the new NVIDIA RTX3090 or RTX3080 GPUs and are wanting to do some ML/AI work with TensorFlow 1.15 you may have run into some trouble! The new GPUs need the latest NVIDIA driver and you will need/want a build of TensorFlow that is linked against the new CUDA 11.1 and cuDNN 8.0 libraries (or newer versions). If you look at the official Google build you will find it is linked to CUDA 10 and cuDNN 7. What about the Anaconda build? Nope, it's an old build.

So, what are you going to do? You could do a CUDA development setup and try to build TensorFlow yourself. Doesn't sound like fun? You could follow the "best practices" solution and install docker or another container runtime and use the NVIDIA NGC docker image. That's the solution I recommend most, but it may not be what you really want for many reasons, especially if you are not familiar with docker and container usage, or really just want a good local install!

In this post I will show you how to install NVIDIA's build of TensorFlow 1.15 into an Anaconda Python conda environment. This is the same TensorFlow 1.15 that you would have in the NGC docker container, but no docker install required and no local system CUDA install needed either. Let's do it.

This setup uses Ubuntu 20.04 and a recent Anaconda3 or Miniconda3 Python.

Update your NVIDIA Driver

The following should work to update your driver. You could use the distro repo or a manual update too, just be sure you have things up-to-date.

sudo apt-get dist-upgrade
sudo shutdown -r now
sudo apt-get install dkms build-essential
sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt-get install nvidia-driver-455
sudo shutdown -r now

Run "nvidia-smi" to confirm the update and check that it reports CUDA version 11.1 (or newer).
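
If you want to script that check, a minimal sketch might look like the following. It assumes the 455 driver branch installed above is the minimum you want, and driver_is_new_enough is a made-up helper name, not part of any NVIDIA tool.

```shell
# Minimum driver branch used in this post (assumption: adjust if you need a newer one).
MIN_DRIVER_MAJOR=455

# Hypothetical helper: succeeds when a version string such as "455.45.01"
# has a major number of at least MIN_DRIVER_MAJOR.
driver_is_new_enough() {
    # ${1%%.*} keeps everything before the first dot
    [ "${1%%.*}" -ge "$MIN_DRIVER_MAJOR" ]
}

# Only query the GPU when nvidia-smi is actually installed.
if command -v nvidia-smi >/dev/null 2>&1; then
    version="$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)"
    if driver_is_new_enough "$version"; then
        echo "Driver $version looks new enough for RTX30 GPUs"
    else
        echo "Driver $version is older than $MIN_DRIVER_MAJOR, consider updating"
    fi
fi
```

The CUDA version itself is easiest to read from the header of the plain "nvidia-smi" output.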

Installing NVIDIA's build of TensorFlow 1.15 in a conda env

NVIDIA maintains a lot of great software and configuration setup material on GitHub. You should check it out if you haven't been there.

That includes their source builds of TensorFlow.

I'm including a copy of NVIDIA's license notice from their TensorFlow repository:


License information

By using the software you agree to fully comply with the terms and conditions of the SLA (Software License Agreement):

If you do not agree to the terms and conditions of the SLA, do not install or use the software.


Step 1) Set up a conda env

We will create a conda env named "tf1-nv" and initialize it with Python version 3.6 to support the build. You will also want a newer release of pip (> 19.0) in this environment.

conda create --name tf1-nv  python=3.6

conda activate tf1-nv

conda install pip
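
To confirm which pip you ended up with, something like this sketch could be run inside the activated env. pip_major is a hypothetical helper name, and the check only looks at the major version, so it treats any 19.x as acceptable, which is slightly looser than the "> 19.0" above.

```shell
# Hypothetical helper: extract the major version from the banner printed by
# `pip --version`, e.g. "pip 20.3.1 from /path/to/pip (python 3.6)".
pip_major() {
    version="${1#pip }"      # drop the leading "pip "
    echo "${version%%.*}"    # keep the digits before the first dot
}

# Only inspect pip when it is on PATH; nothing is installed or upgraded here.
if command -v pip >/dev/null 2>&1; then
    major="$(pip_major "$(pip --version)")"
    if [ "$major" -lt 19 ]; then
        echo "pip $major.x is too old, try: python -m pip install --upgrade pip"
    else
        echo "pip major version $major should be fine"
    fi
fi
```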

Step 2) Create a local index for the "wheel" and supporting dependencies

Pip will be used for the install, but NVIDIA's packages are not available on PyPI.org, so NVIDIA's package index has to be added locally. The index will be set up locally, with its nvidia_pyindex script installed in

$HOME/.local/bin

The following command sets up the index. (Note: we are still in the activated tf1-nv conda env from step 1.)

pip install --user nvidia-pyindex

Here is the output on my system so you can see what happens when you run it.

(tf1-nv) kinghorn@PSML1:~$ pip install --user nvidia-pyindex
Collecting nvidia-pyindex
  Downloading nvidia-pyindex-1.0.5.tar.gz (6.1 kB)
Building wheels for collected packages: nvidia-pyindex
  Building wheel for nvidia-pyindex (setup.py) ... done
  Created wheel for nvidia-pyindex: filename=nvidia_pyindex-1.0.5-py3-none-any.whl size=4169 sha256=aa680b8c1d986bd867af1af2bda1ca46ab0e11b1470e4e7086a223cdb7b98d97
  Stored in directory: /home/kinghorn/.cache/pip/wheels/93/56/f1/2609d85af643eac34c360dd01b95feb483afd8f856f2fc9953
Successfully built nvidia-pyindex
Installing collected packages: nvidia-pyindex
  WARNING: The script nvidia_pyindex is installed in '/home/kinghorn/.local/bin' which is not on PATH.
  Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
Successfully installed nvidia-pyindex-1.0.5

There is a warning about "... .local/bin" not being on PATH. On Ubuntu 20.04 the directory '$HOME/.local/bin' is added to your PATH automatically at login if it exists, so you may need to log out and back in the first time. For other distributions you may need to add the following to your .bashrc file so that it is on PATH when you start up a shell.

export PATH=$PATH:$HOME/.local/bin

You can look in that directory to see what has been included from pip install of nvidia-pyindex.
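
If you put the export line in .bashrc, every nested shell appends the directory again. A small sketch of an idempotent variant follows; path_contains and append_to_path are made-up helper names used only for illustration.

```shell
# Hypothetical helper: succeeds when directory $2 is already an entry of the
# colon-separated list $1.
path_contains() {
    case ":$1:" in
        *":$2:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical helper: print list $1 with directory $2 appended, unless it is
# already present. An empty list yields just the directory.
append_to_path() {
    if path_contains "$1" "$2"; then
        echo "$1"
    else
        echo "${1:+$1:}$2"
    fi
}

PATH="$(append_to_path "$PATH" "$HOME/.local/bin")"
export PATH
```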

Step 3) Set up MPI dependencies for Horovod multi-GPU

Horovod is used for multi-GPU support in this build, and it needs a working MPI setup. There are OpenMPI components installed with the nvidia-pyindex packages, but I had difficulty getting them to work correctly. You could also run into conflicts if you have a local MPI install on your system. The simplest way to avoid these issues is to add the OpenMPI package to the conda env.

(note, we are still in the activated tf1-nv conda env from step 1)

conda install -c conda-forge openmpi

That adds the needed OpenMPI components to your tf1-nv env. You will need to add these to your LD_LIBRARY_PATH.

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HOME/anaconda3/envs/tf1-nv/lib/

Note: I have "anaconda3" in that path. Substitute "miniconda3" if that is what you are using. You can add that export line to your .bashrc file if you want it set automatically when you start a new shell.
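
If you would rather not hard-code the anaconda3 or miniconda3 path, one option is to rely on the CONDA_PREFIX variable that conda sets when an env is activated. This is only a sketch: env_lib_dir is a made-up helper name, and the snippet assumes the tf1-nv env is the one currently active.

```shell
# Hypothetical helper: given an env prefix, print its lib directory
# (a single trailing slash on the prefix is tolerated).
env_lib_dir() {
    echo "${1%/}/lib"
}

# CONDA_PREFIX is set by `conda activate`; do nothing when no env is active.
if [ -n "${CONDA_PREFIX:-}" ]; then
    LD_LIBRARY_PATH="${LD_LIBRARY_PATH:+$LD_LIBRARY_PATH:}$(env_lib_dir "$CONDA_PREFIX")"
    export LD_LIBRARY_PATH
    echo "LD_LIBRARY_PATH=$LD_LIBRARY_PATH"
fi
```

As in the export line above, the env's lib directory is appended, so anything already on LD_LIBRARY_PATH still takes precedence.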

Step 4) Install the NVIDIA TensorFlow Build (along with Horovod)

The following command will "pip" install the NVIDIA TensorFlow 1.15 build using the nvidia-pyindex files installed in step 2).

pip install --user nvidia-tensorflow[horovod]

That's it! You now have the same highly optimized TensorFlow 1.15 build that NVIDIA uses in their NGC TensorFlow-1 docker container.

Note: There are two separate groups of files installed; the conda tf1-nv env ($HOME/anaconda3/envs/tf1-nv) and the nvidia-pyindex files installed in $HOME/.local/bin and $HOME/.local/lib.
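
Before moving on to the benchmark, a quick sanity check along these lines may be useful. It is a sketch rather than part of NVIDIA's instructions: tf_check_snippet is a made-up helper name, and it relies on tf.test.is_gpu_available(), which is part of the TensorFlow 1.15 API.

```shell
# Hypothetical helper: print a short Python program that reports the
# TensorFlow version and whether a GPU is visible.
tf_check_snippet() {
    cat <<'EOF'
import tensorflow as tf
print("TensorFlow version:", tf.__version__)
print("GPU available:", tf.test.is_gpu_available())
EOF
}

# Run the check only when python can actually import tensorflow.
if command -v python >/dev/null 2>&1 && python -c "import tensorflow" >/dev/null 2>&1; then
    tf_check_snippet | python -
fi
```

Run it with the tf1-nv env activated so that "python" is the Python 3.6 from that env.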

Test your install with a ResNet-50 benchmark

I placed part of the nvidia-examples directory in a GitHub repo for convenience. You can download a copy with,

wget https://github.com/dbkinghorn/NGC-TF1-nvidia-examples/archive/main/NGC-TF1-nvidia-examples.tar.gz

Then, un-tar the directory, cd to cnn and run a ResNet50 benchmark with synthetic data,

tar xf NGC-TF1-nvidia-examples.tar.gz

cd NGC-TF1-nvidia-examples-main/cnn/

Activate the tf1-nv conda env if you had deactivated it.

conda activate tf1-nv

and run the benchmark

(for single NVIDIA GPU)

python resnet.py --layers=50 --batch_size=64 --precision=fp32

(for multi-GPU, use -np 2 for 2 GPUs; be sure you have the library path set for MPI, see step 3)

mpiexec --bind-to socket -np 2 python resnet.py --layers=50 --batch_size=128 --precision=fp32
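
If you have more than two GPUs, the -np value can be derived from what nvidia-smi reports. The sketch below only prints the command rather than running it. horovod_cmd is a made-up helper name, and the batch size of 128 is simply the value used above, not a tuned recommendation.

```shell
# Hypothetical helper: print the mpiexec line for $1 GPUs and batch size $2.
horovod_cmd() {
    echo "mpiexec --bind-to socket -np $1 python resnet.py --layers=50 --batch_size=$2 --precision=fp32"
}

# Count the GPUs listed by `nvidia-smi -L` (one "GPU N: ..." line per device).
if command -v nvidia-smi >/dev/null 2>&1; then
    gpu_count="$(nvidia-smi -L 2>/dev/null | grep -c '^GPU ' || true)"
    horovod_cmd "$gpu_count" 128
fi
```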

Enjoy!

Happy computing! --dbk @dbkinghorn



Tags: oneAPI, Intel, Programming, Scientific Computing
ksk8863

How can I install it on Windows 10 platform?

Posted on 2020-12-13 18:11:31
Angga Febrian Sahid

As their GitHub page mentions at https://github.com/NVIDIA/t... ,
Python 3.6 and Ubuntu 18.04 or later (64-bit) are required for installation.

I'm still wondering whether WSL can be used or not.
But to enable GPU on WSL, you need to upgrade your Windows build version using the Windows Insider Program.

Posted on 2020-12-14 01:16:50
Donald Kinghorn

NVIDIA only supports Linux for the TF build, ... this is the code used for the NGC container build ... My guess is that WSL2 will work fine but I haven't tried it ... yet :-) On the insider dev-channel releases of WSL2 you can enable GPU support. It basically virtualizes the GPU and shares the Windows driver. Pretty cool trick! I haven't done anything with that for a few months. I have been wondering what improvements have been done recently.

If you are using an insider Win10 release it would probably be worth a try. Just have to remember to NOT install the Linux NVIDIA driver in WSL

Maybe I'll try this myself sounds fun :-)

Posted on 2020-12-15 01:44:58
Arpit

hi professor, have you tried to install on the windows 10 platform? if yes please guide how to do so

Posted on 2021-04-01 10:23:54
Donald Kinghorn

No new news from NVIDIA on this. I don't think they have any plans to support Windows. Unfortunately neither does Google.
However, I have recently been working with TensorFlow 2.4 and it seems to be a big step forward. There is a compatibility layer for 1.x and a guide for porting to version 2. I highly recommend trying this. https://www.tensorflow.org/...

However, I do understand the bigger problem! There is a lot of interesting project code available for TF 1.x and it is now out of date for new NVIDIA hardware.

My honest recommendation is to learn to use Ubuntu 20.04 Linux. Windows will likely always be a secondary consideration for this type of work on the research side of things. It is unlikely that large players like NVIDIA, Google or Facebook will put much effort toward porting code to Windows since they do not use it internally or for their public products.

Having said that, I may try to compile the source that NVIDIA is maintaining for TF 1.15 for Windows, but that could be quite difficult!
Best wishes --Don

Posted on 2021-04-01 16:48:30
Angga Febrian Sahid

Thank you so much for the article Professor.
This really helped me a lot, and this is way easier than using nvidia docker.

I also tried with an NVIDIA RTX 3070 and it works flawlessly!

Best Regards- Angga

Posted on 2020-12-14 01:15:24
Tibbers

What cudnn version does it use? The newest version(8.0.5.39-1) provides significant improvements.

Posted on 2020-12-15 20:43:30
Donald Kinghorn

cudnn 8.0.4.30
Here's a list of lib info dirs from the install I did (this is the GitHub branch tagged as tr1.15-4+nv20.11):

nvidia_cublas-11.2.1.74.dist-info
nvidia_cuda_cupti-11.1.69.dist-info
nvidia_cuda_nvcc-11.1.74.dist-info
nvidia_cuda_nvrtc-11.1.74.dist-info
nvidia_cuda_runtime-11.1.74.dist-info
nvidia_cudnn-8.0.4.30.dist-info
nvidia_cufft-10.3.0.74.dist-info
nvidia_curand-10.2.2.74.dist-info
nvidia_cusolver-11.0.0.74.dist-info
nvidia_cusparse-11.2.0.275.dist-info
nvidia_dali_cuda110-0.27.0.dist-info
nvidia_dali_nvtf_plugin-0.27.0+nv20.11.dist-info
nvidia_horovod-0.20.2+nv20.11.dist-info
nvidia_nccl-2.8.2.dist-info
nvidia_pyindex
nvidia_pyindex-1.0.5.dist-info
nvidia_tensorboard-1.15.0+nv20.11.dist-info
nvidia_tensorflow-1.15.4+nv20.11.dist-info
nvidia_tensorrt-7.2.1.6.dist-info

If you are adventurous and familiar with the bazel build system you could try your own build with the newer cudnn ... I'm not feeling that adventurous :-)
take care --Don

Posted on 2020-12-15 22:23:25
Tibbers

I compiled and built PyTorch for cudnn 8.0.5. I haven't done much in TensorFlow and will stick to my custom PyTorch for now. Improvements were in excess of 10% compared to the nightly build for my 3080.

Posted on 2020-12-15 22:54:54
Donald Kinghorn

nice! that's a good improvement from a minor number lib update!

Posted on 2020-12-15 23:51:38
Ampere

https://developer.nvidia.co...
https://docs.nvidia.com/cud...

Nvidia CUDA Toolkit 11.2.0
Last updated December 15, 2020

Posted on 2020-12-16 01:23:53

Donald Kinghorn

I'm going to add a comment here since this post is getting linked from other projects and some folks are running into issues.

My first recommendation to anyone is to use docker or some other container runtime along with NVIDIA's TF containers on NGC, they are really good and will save you much grief in the long run.

A common error to hit is this

Value 'sm_86' is not defined for option 'gpu-architecture'

Here is my reply to someone who hit this problem. Hopefully others will find this and it will help resolve their troubles:

+++++

I think you have a path problem. It looks like you still have an
older cudatoolkit on your path for cuda 10.x (or even 11.0) which
doesn't know about the newer RTX30 GPU's. (ie. sm_86/compute_86) You
need to have cudatoolkit 11.1 or greater on your path. (greater is
probably better)

I know it's a mess! This is why I'm so sold on using containers! I
wrote up this post to help folks that cannot for whatever reason use
containers. It's the TF build that NVIDIA puts in their container. It's
a messy pip install since their stuff is not on PyPi

The really messy part is that not everything is isolated to the conda
env. I'm using that mostly for the Python 3.6. The NVIDIA stuff is
unfortunately not on PyPi which is why that index is being installed
under $HOME/.local/share ...

If you have other cuda stuff on your PATH or LD_LIBRARY_PATH (even
system default stuff) then you need to get the newer stuff "in front of"
that.
+++++

Posted on 2021-01-05 16:39:06
Angga Febrian Sahid

Thank you for the update Professor.
Maybe I didn't hit that error because I had just freshly installed Ubuntu and the drivers.

Of course, using this (nvidia-tensorflow) method is the easiest and fastest way to install TensorFlow 1.15 on a new NVIDIA GPU, especially for people who are already used to TensorFlow on Windows alongside Anaconda.

Since Docker will save us much grief in the long run, it is worth a try (I have never tried it before).
But after all, it is nice to have several options for installing TF 1.15 on a new GPU.

Posted on 2021-01-06 06:25:47
Donald Kinghorn

issues like the above can sometimes be hard to sort out ... the problem went away when they updated to driver 460 and cudatoolkit 11.2. In general trying release updates is a good first step in trouble shooting. That process may even just fix something that was not quite right from the prior config :-)

Posted on 2021-01-06 16:22:45
kala gulla

Thanks a lot Sir, I have successfully installed all the dependencies and am able to use my RTX3070 GPU with around 90% usage. Now I am planning to use horovod for distributed training. I have installed openmpi using the commands you provided in this article, but I am not able to run "horovodrun --check-build".

I am planning to use NCCL with 2 machines. Can you please provide the steps to set up horovod distributed training in the conda env tf1-nv?
Thanks

Posted on 2021-03-02 07:45:10
Donald Kinghorn

I'm afraid I can't give you very good advice on this since I haven't done it (I always use the NGC containers for this and everything is already set up).
However, I just checked on Anaconda cloud for packages that might help ... nccl is there in the nvidia and conda-forge channels. I would make a new env for TF and then install nccl into that. It may bring a lot of needed dependencies along with it. There are also packages for horovod but not in the nvidia or conda-forge channels ... that may be the easiest way forward and you could keep everything isolated in an env. For your cluster you should be OK since you would want your /home mounted on both nodes ... I'm afraid this would be an "experiment" since I don't have better advice.

Keep me posted on how it goes!

Posted on 2021-03-02 17:52:21
Dirk H

Thank you for the great tutorial. I have a 2x3090 machine with Ubuntu 18.04. Would it work, or is Ubuntu 20 mandatory? Ubuntu 20 is mentioned here: https://github.com/NVIDIA/t... Does anyone have experience with this?

Posted on 2021-03-09 10:34:12
Donald Kinghorn

I'm not sure if there are any dependencies on 20.04 if you are installing the wheel with pip like I have in this post. I wouldn't think it would be a problem, but I can't confirm that. You certainly need python 3.8 and a newer pip in your env. If your system is working OK with the 3090's then you should be able to try doing the pip install like in this post. Just keep in mind that it is not a really clean install, since a bunch of dependencies will get loaded into .local/ in your home dir and you could have issues with PATH.

I had some trouble with the install a few weeks after I did this post. I have been experimenting a lot with TF 2.4 and using different base python setups. I'm currently using Miniforge because I was having so many dependency conflicts with Miniconda defaults. I removed all of my python setups and envs, including everything that was installed from doing this pip install of NVIDIA TF 1.15!

Unfortunately getting proper dev environments setup with new hardware can be difficult. The containers from NGC are very good and that is the easiest way to get things going. However, then you have to deal with setting up a container runtime. I don't use docker anymore and have been using NVIDIA Enroot (I know and like the devs :-) Podman is a pretty good "docker" compatible setup and works well. There is also Singularity but I don't use it.

I think you should try what I have in this post and see if it works OK for you. If not then clean everything up and remove the env etc.. Then consider making an effort to try setting up Podman and the cuda-runtime. Once you get a container setup going you'll have a lot more freedom for trying things since it seems "everybody" is making containers for their stuff. ... I have love-hate with containers :-) but mostly like the idea. I have container setup blog posts on my list of things to do but that's a long list :-)

Posted on 2021-03-09 16:23:17