Read this article at https://www.pugetsystems.com/guides/1419
Dr Donald Kinghorn (Scientific Computing Advisor )

How to Install TensorFlow with GPU Support on Windows 10 (Without Installing CUDA) UPDATED!

Written on April 26, 2019 by Dr Donald Kinghorn
Introduction

In June of 2018 I wrote a post titled The Best Way to Install TensorFlow with GPU Support on Windows 10 (Without Installing CUDA). That post has served many individuals as a guide for getting a good GPU accelerated TensorFlow work environment running on Windows 10 without needless installation complexity. It's very satisfying to me personally to have been able to help so many to get started with TensorFlow on Windows 10! However, that guide is nearly a year old now and has needed an update for some time. I've been promising to do this in my comment replies, so here it is.

This post will guide you through a relatively simple setup for a good GPU accelerated work environment with TensorFlow (with Keras and Jupyter notebook) on Windows 10. You will not need to install CUDA for this!

I'll walk you through the best way I have found so far to get a good TensorFlow work environment on Windows 10 including GPU acceleration. This will be a complete working environment including,

  • System preparation and NVIDIA driver update
  • Anaconda Python installation
  • Creating an environment for your TensorFlow configuration using "conda"
  • Installing the latest stable build of TensorFlow in that environment
  • Setting up Jupyter Notebook to work with your new "env"
  • An example deep learning problem using TensorFlow with GPU acceleration, Keras, Jupyter Notebook, and TensorBoard visualization.

Let's do it.


Step 1) System Preparation - NVIDIA Driver Update and checking your PATH variable (Possible "Gotchas")

This is a step that was left out of the original post and the issues presented here were the source of most difficulties that people had with the old post. The current state of your Windows 10 configuration may cause difficulties. I'll try to give guidance on things to look out for.

The primary testing for this post is on a fresh install of Windows 10 Home "October 2018 Update" on older hardware (Intel Core i7 4770 + NVIDIA GTX 980 GPU). This turns out to be a good test system because it would have failed with the old guide without the information in this step.

Check your NVIDIA Driver

This is important and I'll show you why.

Don't assume Microsoft gave you the latest NVIDIA driver! Check it and update if there is a newer version.

Right click on your desktop and then "NVIDIA Control Panel"

nvidia control panel 1

nvidia control panel 2

You can see that my fresh install of Windows 10 gave me a version 388 driver. That is way too old! Now click on "System Information" and then the "Components" panel. The next image shows why that 388 driver won't work with the newest TensorFlow,

nvidia control panel 3

The CUDA "runtime" is part of the NVIDIA driver. The CUDA runtime version has to support the version of CUDA you are using for any special software like TensorFlow that will be linking to other CUDA libraries (DLL's). As of this writing TensorFlow (v1.13) is linking to CUDA 10.0. The runtime has to be as new, or newer, than the extra CUDA libraries you need.
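If you would rather check the driver from a script than from the control panel, here is a minimal sketch. It assumes `nvidia-smi` (installed with the NVIDIA driver) can be found on your PATH, which is not always the case on Windows. It also assumes a minimum Windows driver of 411.31 for CUDA 10.0, a figure recalled from NVIDIA's CUDA release notes that you should verify yourself. The function names are made up for illustration.

```python
import shutil
import subprocess

# Assumption: minimum Windows driver for CUDA 10.0, as listed in NVIDIA's
# CUDA release notes. Verify this number against the current notes.
MIN_DRIVER_FOR_CUDA_10_0 = (411, 31)


def parse_driver_version(text):
    """Turn a version string such as '388.13' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def driver_is_new_enough(version_text, minimum=MIN_DRIVER_FOR_CUDA_10_0):
    """Return True if the driver version is at least the given minimum."""
    return parse_driver_version(version_text) >= minimum


def query_driver_version():
    """Ask nvidia-smi for the driver version; return None if unavailable."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return None
    result = subprocess.run(
        [exe, "--query-gpu=driver_version", "--format=csv,noheader"],
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
        universal_newlines=True,
    )
    if result.returncode != 0:
        return None
    lines = result.stdout.strip().splitlines()
    return lines[0].strip() if lines else None


if __name__ == "__main__":
    version = query_driver_version()
    if version is None:
        print("Could not query nvidia-smi; use the NVIDIA Control Panel instead.")
    elif driver_is_new_enough(version):
        print("Driver", version, "should be new enough for CUDA 10.0.")
    else:
        print("Driver", version, "looks too old for CUDA 10.0; update it.")
```

With the version 388 driver from my fresh install, `driver_is_new_enough` would return False, which matches what the control panel showed.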

Update the NVIDIA Display Driver

Even if you think you have the latest NVIDIA driver check to be sure.

Go to [https://www.nvidia.com/Download/index.aspx] and enter the information for your GPU. Then click "search".

driver page 1

Click "search" to go to the download page,

driver page 2

It doesn't matter too much which GPU you enter on the search page; the latest driver supports cards all the way back to the 600 series.

Download and install the driver following the prompts.

Note: I used the "Standard" driver. If you are using an install that was done by Dell or HP etc., they may have put their own OEM version on your system. If the standard driver doesn't work, try the "DCH" driver. Also, NVIDIA now has 2 drivers because some video processing applications were not working right. I used the "Game Ready Driver". After all, it's "Workstation by day, Battle-station by night". Right?

Check your PATH environment variable

This may not be something you think about very often, but it's a good idea to know the state of your PATH environment variable. Why? Development tools will often alter your PATH variable. If you are trying to run some code and getting errors that some library or executable cannot be found, or just having strange problems that don't seem to make sense, then your system may be grabbing something by looking at your PATH and finding a version that you are not expecting.

If you answer yes to any of the following then you should really look at your PATH,

  • Have you installed Visual Studio?
  • Did you install some version of CUDA?
  • Have you installed Python.org Python?
  • Have you tried a "pip" install of TensorFlow?

You may be reading this because you tried and failed to install TensorFlow following Google's instructions. If you feel that you made a mess on your system then you can try to do some clean-up by uninstalling what you did. But, you may not have to clean up. Try to do what I suggest for the TensorFlow install. However, first look at your PATH so you know its state in case you run into strange errors.

Go to the "Start menu" and start typing PATH Variable. You should get a search result for the control panel "System Properties" advanced panel.

control panel path

Click on "Environment Variables"

control panel sys

The PATH on my testing system is short because I haven't installed anything that would modify it.

If you have a long string then there is a great "Edit.." panel that will show you each entry and allow you to move things up or down and delete or add new entries.

The main idea to keep in mind is that when your system searches for an executable or library it will start by looking in the current directory (folder) and then go through the directories listed in your PATH in order. (Windows builds the effective PATH from the System PATH entries followed by your User PATH entries.) It keeps going until it finds the first thing that satisfies what you asked for (or fails) ... but it might not be the thing you want it to find. It takes the first thing it finds. If you have folder entries in your PATH that contain different versions of an executable or DLL with the same name, you can move the entry for the one you want toward the beginning of your PATH so it's found first.
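To see which copy of a file would be found first, you can walk the PATH yourself. The following is a rough sketch that only looks at PATH order; Windows applies additional rules when loading DLLs, so treat it as a trouble-shooting aid and not a full model of the search. The helper name `find_on_path` is my own invention.

```python
import os


def find_on_path(filename, path_string=None):
    """Return every match for filename, in PATH order (the first one wins)."""
    if path_string is None:
        path_string = os.environ.get("PATH", "")
    matches = []
    for folder in path_string.split(os.pathsep):
        if not folder:
            continue  # skip empty entries left by stray separators
        candidate = os.path.join(folder, filename)
        if os.path.isfile(candidate):
            matches.append(candidate)
    return matches


if __name__ == "__main__":
    # "python.exe" is only an example; try a DLL name if you suspect
    # that two installs are providing different versions of a library.
    for position, hit in enumerate(find_on_path("python.exe"), start=1):
        print(position, hit)
```

If the list shows more than one hit, the entry printed first is the one your shell will normally pick up.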

Be very careful with your PATH. Don't make changes unless you know what you are doing. It should mostly be something that you are aware of for trouble-shooting.

A special note for laptops

If you have a laptop with an NVIDIA GPU (like a nice gaming laptop) then you should succeed with the instructions in this post. However, one unique problem on laptops is that you will likely have power saving control that switches your display driver back to the CPU's integrated display. A current Windows 10 setup on your laptop along with the latest driver should automatically switch your display to the NVIDIA driver when you start TensorFlow (same as starting up a game) but, if you have trouble that looks like TensorFlow is not finding your GPU then you may need to manually switch your display. You will likely find options by right clicking on your desktop.


Step 2) Python Environment Setup with Anaconda Python

I highly recommend Anaconda Python. If you need some arguments for using Python take a look at my post Should You Learn to Program with Python. For arguments on why you should use the Anaconda Python distribution see, How to Install Anaconda Python and First Steps for Linux and Windows. Another reason for using Anaconda Python in the context of installing GPU accelerated TensorFlow is that by doing so you will not have to do a CUDA install on your system.

Anaconda is focused on data science, machine learning and scientific computing. It installs cleanly on your system in a single directory so it doesn't make a mess in your system's application and library directories. It is also performance optimized for important numerical packages like numpy, scipy, etc.

Download and Install Anaconda Python

Anaconda download

You can download and "Run" at the same time, or download to your machine and double click on the "exe" file to start the installer.

  • You will be asked to accept a license agreement ...
  • "Select Install Type" I recommend you choose "Just Me" since this is part of your personal development environment.
  • "Choose Install Location" I recommend you keep the default, which is at the top level of your user directory.
  • "Advanced Installation Options"

Advanced install opts

"Register Anaconda as my default Python 3.7" is recommended. "Add Anaconda to my PATH environment variable" is OK to select. However, you don't really need to do that. If you use the GUI, Anaconda Navigator, the (DOS) shell or the PowerShell link in the Anaconda folder on your start menu they will temporarily set the proper PATH environment for you without making a "permanent" change to your PATH variable. For this install I will leave it un-checked.

My personal preference is to "Add Anaconda to my PATH" because I want it to be found whenever I use Python.

Note: This version of the Anaconda distribution supports "Python environments" in PowerShell, which is my personal preferred way to work with "conda" on Windows.

Check and Update your Anaconda Python Install

Go to the "Start menu" find the "Anaconda3" item and then click on the "Anaconda Powershell Prompt",

Powershell prompt for Anaconda

With "Anaconda Powershell" opened do a quick check to see that you now have Anaconda3 Python 3.7 as your default Python.

(base) PS>python
Python 3.7.3 (default, Mar 27 2019, 17:13:21) [MSC v.1915 64 bit (AMD64)] :: Anaconda, Inc. on win32
Type "help", "copyright", "credits" or "license" for more information.
>>>

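Type `exit()` (or CTRL-Z followed by Enter) to exit the Python prompt. The CTRL-D shortcut familiar from Linux does not exit this Python version in a Windows console.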

Update your base Anaconda packages

`conda` is a powerful package and environment management tool for Anaconda. We'll use `conda` from Powershell to update the base Python install. Run the following commands. It may take some time to do this since there may be a lot of modules to update.

conda update conda

conda update anaconda

conda update python

conda update --all

That should bring your entire base Anaconda install up to the latest packages. (Everything may already be up to date.)

Anaconda Navigator

There is a GUI for Anaconda called `anaconda-navigator`. I personally find it distracting/confusing/annoying and prefer using `conda` from the command-line. Your taste may differ! ... and my opinion is subject to change if they keep improving it. If you are new to Anaconda then I highly recommend that you read up on `conda` even (or especially!) if you are thinking about using the "Navigator" GUI.


Step 3) Create a Python "virtual environment" for TensorFlow using conda

You should set up an environment for TensorFlow separate from your base Anaconda Python environment. This keeps your base clean and will give TensorFlow a space for all of its dependencies. It is in general good practice to keep separate environments for projects, especially when they have special package dependencies. Think of it as a separate "name-space" for your project.

There are many possible options when creating an environment with conda including adding packages with specific version numbers and specific Python base versions. This is sometimes useful if you want fine control and it also helps with version dependency resolution. Here we will keep it simple and just create a named environment, then activate that environment and install the packages we want inside of that.

  • From the "Anaconda Powershell Prompt" command line do,
conda create --name tf-gpu

I named the environment 'tf-gpu' but you can use any name you want. For example you could add the version number.

NOTE: avoid using spaces in names! Python will not handle that well and you could get strange errors. "-" and "_" are fine. (Python programmers often use underscores.)

  • Now exit from the PowerShell you are using and then open a new one before you activate the new "env". This is an annoying quirk, but PowerShell will not re-read its environment until you restart it. If you activate the new "env" before you restart, you will not be able to do any package installs because the needed utilities will not be on the path in the current shell until after a restart.
  • "activate" the environment, (I'll show my full Powershell prompt and output instead of just the commands)
(base) PS C:\Users\don> conda info --envs
# conda environments:
#
base * C:\Users\don\Anaconda3
tf-gpu C:\Users\don\Anaconda3\envs\tf-gpu


(base) PS C:\Users\don> conda activate tf-gpu

(tf-gpu) PS C:\Users\don>

The `conda info --envs` command shows the "envs" you have available.

After doing `conda activate tf-gpu` you can see that the prompt is now preceded by the name of the environment `(tf-gpu)`. Any conda package installs will now be local to this environment.
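If you are ever unsure which environment a Python session belongs to, you can ask Python itself. This is a small sketch that relies on the `CONDA_DEFAULT_ENV` variable that `conda activate` sets in the shell; the helper name `describe_environment` is just an illustration.

```python
import os
import sys


def describe_environment(environ=None, prefix=None):
    """Collect the active conda env name and the interpreter location."""
    environ = os.environ if environ is None else environ
    prefix = sys.prefix if prefix is None else prefix
    return {
        # Set by "conda activate"; missing if no env has been activated.
        "conda_env": environ.get("CONDA_DEFAULT_ENV", "(none)"),
        # Folder of the running interpreter, e.g. ...\Anaconda3\envs\tf-gpu
        "prefix": prefix,
    }


if __name__ == "__main__":
    info = describe_environment()
    print("Active conda env:", info["conda_env"])
    print("Python prefix:   ", info["prefix"])
```

Run from a shell where `tf-gpu` is active, the env name should read `tf-gpu` and the prefix should point into the `envs` folder of your Anaconda install.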


Step 4) Install TensorFlow-GPU from the Anaconda Cloud Repositories

There is an "official" Anaconda maintained TensorFlow-GPU package for Windows 10!

A search for "tensorflow" on the Anaconda Cloud will list the available packages from Anaconda and the community. There is a package "anaconda / tensorflow-gpu 1.13.1" listed near the top that has builds for Linux and Windows. This is what we will be installing from the commands below.

This command will install the latest stable version of TensorFlow with GPU acceleration in this conda environment. (It will be the latest version maintained by the Anaconda team and may lag by a few weeks from any fresh release from Google.)

(tf-gpu) C:\Users\don> conda install tensorflow-gpu

That's it! You now have TensorFlow with NVIDIA CUDA GPU support!

This includes, TensorFlow, Keras, TensorBoard, CUDA 10.0 toolkit, cuDNN 7.3 along with all of the dependencies. It's all in your new "tf-gpu" env ready to use and isolated from other env's or packages on your system.


Step 5) Simple check to see that TensorFlow is working with your GPU

You can use the PowerShell in which you activated the tf-gpu env and installed TensorFlow, or open a new one and do `conda activate tf-gpu`.

With your tf-gpu env active type the following,

python

Your prompt will change to the Python interpreter prompt. This will be a simple test, and we'll use a nice feature of recent TensorFlow releases: eager execution.

>>> import tensorflow as tf

>>> tf.enable_eager_execution()

>>> print( tf.constant('Hello from TensorFlow ' + tf.__version__) )

(that is 2 underscores before and after "version")

My session including the output looked like this,

(base) PS>conda activate tf-gpu

(tf-gpu) PS>python

Python 3.7.3 (default, Mar 27 2019, 17:13:21) [MSC v.1915 64 bit (AMD64)] :: Anaconda, Inc. on win32

Type "help", "copyright", "credits" or "license" for more information.

>>> import tensorflow as tf

>>> tf.enable_eager_execution()

>>> print( tf.constant( 'Hellow from TensorFlow ' + tf.__version__ ) )

2019-04-24 18:08:58.248433: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX AVX2

2019-04-24 18:08:58.488035: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties:

name: GeForce GTX 980 major: 5 minor: 2 memoryClockRate(GHz): 1.2785

pciBusID: 0000:01:00.0

totalMemory: 4.00GiB freeMemory: 3.30GiB

2019-04-24 18:08:58.496081: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0

2019-04-24 18:08:58.947914: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:

2019-04-24 18:08:58.951226: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0

2019-04-24 18:08:58.953130: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N

2019-04-24 18:08:58.955149: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 3005 MB memory) -> physical GPU (device: 0, name: GeForce GTX 980, pci bus id: 0000:01:00.0, compute capability: 5.2)

tf.Tensor(b'Hellow from TensorFlow 1.13.1', shape=(), dtype=string)

>>>

When you first run TensorFlow it outputs a bunch of information about the execution environment it is in. You can see that it found the GTX 980 in this system and added it as an execution device.
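If you want a yes/no answer instead of reading the log lines, the sketch below lists the GPU devices TensorFlow reports. It uses `device_lib.list_local_devices()` from `tensorflow.python.client`, which is commonly used with TensorFlow 1.x but is not part of the documented public API, so consider that an assumption. The helper names are hypothetical, and the import is guarded so the script still runs (and says so) in an env without TensorFlow.

```python
def gpu_names(devices):
    """Pick the names of GPU devices out of a list of device records."""
    return [d.name for d in devices if d.device_type == "GPU"]


def list_tensorflow_gpus():
    """Return GPU device names seen by TensorFlow, or None if TF is missing."""
    try:
        from tensorflow.python.client import device_lib
    except ImportError:
        return None
    return gpu_names(device_lib.list_local_devices())


if __name__ == "__main__":
    names = list_tensorflow_gpus()
    if names is None:
        print("TensorFlow is not installed in this environment.")
    elif names:
        print("TensorFlow can see these GPUs:", names)
    else:
        print("TensorFlow is installed but reports no GPU devices.")
```

An empty list on a machine with an NVIDIA card usually points back to Step 1: an old driver or something unexpected on your PATH.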

Next we will do something a little more useful and fun with Keras, after we configure Jupyter notebook to use our 'tf-gpu' environment.


Step 6) Create a Jupyter Notebook Kernel for the TensorFlow Environment

You can work with an editor and the command line and you often want to do that but, Jupyter notebooks are great for doing machine learning development work. In order to get Jupyter notebook to work the way you want with this new TensorFlow environment you will need to add a "kernel" for it.

With your tf-gpu environment activated do,

conda install ipykernel jupyter

Note: I installed both ipykernel and jupyter above since jupyter was not installed by default when we created the tf-gpu env. jupyter is installed by default in the (base) env.

Now create the Jupyter kernel,

python -m ipykernel install --user --name tf-gpu --display-name "TensorFlow-GPU-1.13"

You can set the "display-name" to anything you like. I included the version number here.

With this "tf-gpu" kernel installed, when you start Jupyter notebook you will now have an option to open a new notebook using this kernel.

Start a Jupyter notebook,

jupyter notebook

Look at the "New" menu,

Jupyter kernel for TF

Note: If you start a jupyter notebook from the (base) env you will see "TensorFlow-GPU-1.13" option but you will not be able to import tensorflow in that notebook because TensorFlow is only installed in the "tf-gpu" env. [You could have installed into your (base) env but, I recommend that you keep separate env's.]
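A kernel is described by a small `kernel.json` file, and the first entry of its `argv` list is the Python executable that the notebook will launch. That is why the kernel keeps pointing at the "tf-gpu" env no matter where you start Jupyter from. The sketch below parses an example file; the JSON is modeled on what `ipykernel` writes, and the path in it is illustrative only. `jupyter kernelspec list` will show where your real kernel folders are.

```python
import json


def kernel_interpreter(kernel_json_text):
    """Return the Python executable a Jupyter kernel spec will launch."""
    spec = json.loads(kernel_json_text)
    return spec["argv"][0]


# Example contents only; the user name and install path are assumptions.
EXAMPLE_KERNEL_JSON = r"""
{
  "argv": ["C:\\Users\\don\\Anaconda3\\envs\\tf-gpu\\python.exe",
           "-m", "ipykernel_launcher", "-f", "{connection_file}"],
  "display_name": "TensorFlow-GPU-1.13",
  "language": "python"
}
"""

if __name__ == "__main__":
    print(kernel_interpreter(EXAMPLE_KERNEL_JSON))
```

Inside a running notebook, `import sys; print(sys.executable)` gives the same information for the kernel you are actually using.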


Step 7) An Example Convolution Neural Network training using Keras with TensorFlow

In order to check everything out let's set up the classic neural network LeNet-5 with Keras in a Jupyter notebook using our "TensorFlow-GPU-1.13" kernel. We'll train the model on the MNIST digits data-set and then use TensorBoard to look at some plots of the job run.

You do not need to install Keras or TensorBoard separately since they are now included with the TensorFlow install.

Activate your "tf-gpu" env

Launch "Anaconda Powershell" and then do,

conda activate tf-gpu

Create a working directory (and log directory for TensorBoard)

I like to have a directory called "projects" in my user home directory. In the projects directory I create directories for things I'm working on. Of course, you can organize your work however you like. ... But I do highly recommend that you learn to use the command-line if you are not familiar with working like that. You can thank me later!

In PowerShell the following commands are useful for managing directories,

To see what directory you are in,

pwd

(if you just opened "Anaconda Powershell" you should be in your "user home directory")

To create a new directory (and additional subdirectories all at once)

Note: when you are working with "code" I highly recommend that you **do not use spaces in directory or file names**.

mkdir projects/tf-gpu-MNIST/logs

That one command above gives you a work directory, "tf-gpu-MNIST", and a "logs" subdirectory.

Note: In PowerShell you can use "/" or "\" to separate directories. (It has many commands that would be the same in Linux and you can use those as alternatives to "DOS" like commands.)
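If `mkdir` gives you trouble in your shell, the same directory layout can be created from Python, which takes care of the path separator for you. This is a minimal sketch; the function name and its default project name are my own choices.

```python
import os


def make_project_dirs(base_dir, project="tf-gpu-MNIST"):
    """Create <base_dir>/projects/<project>/logs and return the logs path."""
    logs_dir = os.path.join(base_dir, "projects", project, "logs")
    # exist_ok=True means no error if the directories are already there.
    os.makedirs(logs_dir, exist_ok=True)
    return logs_dir
```

For example, `make_project_dirs(os.path.expanduser("~"))` would create the folders under your user home directory and return the full path to "logs".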

To change directory use "cd"

cd projects/tf-gpu-MNIST

(For completeness) To delete a directory you can use the `rmdir` command.

Launch a Jupyter Notebook

After "cd'ing" into your working directory and with the tf-gpu environment activated start a Jupyter notebook,

jupyter notebook

From the 'New' drop-down menu select the 'TensorFlow-GPU-1.13' kernel that you added (as seen in the image in the last section). You can now start writing code!


MNIST hand written digits example

The following "code blocks" can be treated as Jupyter notebook "Cells". You can type them in (recommended for practice) or copy and paste. To execute the code in a cell use `Shift-Return`.

We will setup and train LeNet-5 with the MNIST handwritten digits data.

Import TensorFlow

import tensorflow as tf

Load and process the MNIST data

mnist = tf.keras.datasets.mnist

(train_images, train_labels), (test_images, test_labels) = mnist.load_data()
# reshape and rescale data for the CNN

train_images = train_images.reshape(60000, 28, 28, 1)

test_images = test_images.reshape(10000, 28, 28, 1)

train_images, test_images = train_images/255, test_images/255

Create the LeNet-5 convolution neural network architecture

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, (3,3), activation='relu', input_shape=(28,28,1)),
    tf.keras.layers.Conv2D(64, (3,3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2,2),
    tf.keras.layers.Dropout(0.25),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(10, activation='softmax')
])
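The size of this network can be worked out by hand, which is a useful sanity check on the layer definitions. The sketch below assumes the Keras defaults used above (3x3 kernels, "valid" padding, stride 1, one bias per filter or unit); `model.summary()` should report the same total.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Kernel weights plus one bias per filter."""
    return kernel_h * kernel_w * in_channels * filters + filters


def dense_params(inputs, units):
    """One weight per input per unit, plus one bias per unit."""
    return inputs * units + units


conv1 = conv2d_params(3, 3, 1, 32)    # 28x28x1  -> 26x26x32
conv2 = conv2d_params(3, 3, 32, 64)   # 26x26x32 -> 24x24x64
# MaxPooling2D(2,2) halves each spatial dimension: 24x24x64 -> 12x12x64.
# Pooling, Dropout and Flatten have no trainable parameters.
flattened = 12 * 12 * 64              # 9216 values feed the first Dense layer
dense1 = dense_params(flattened, 128)
dense2 = dense_params(128, 10)

total = conv1 + conv2 + dense1 + dense2
print(total)  # 1199882
```

Nearly all of the roughly 1.2 million parameters sit in the first Dense layer.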

Compile the model

model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

Set log data to feed to TensorBoard for visual analysis

tensor_board = tf.keras.callbacks.TensorBoard('./logs/LeNet-MNIST-1')

Train the model (with timing)

import time

start_time=time.time()

model.fit(train_images, train_labels, batch_size=128, epochs=15, verbose=1,
         validation_data=(test_images, test_labels), callbacks=[tensor_board])

print('Training took {} seconds'.format(time.time()-start_time))

The results

After running that training for 15 epochs the last epoch gave,

Train on 60000 samples, validate on 10000 samples
Epoch 1/15
60000/60000 [==============================] - 6s 105us/sample - loss: 0.2400 - acc: 0.9276 - val_loss: 0.0515 - val_acc: 0.9820
...
...
Epoch 15/15
60000/60000 [==============================] - 5s 84us/sample - loss: 0.0184 - acc: 0.9937 - val_loss: 0.0288 - val_acc: 0.9913
Training took 79.47694969177246 seconds

Not bad! Training accuracy 99.37% and Validation accuracy 99.13%

It took about 80 seconds on my old Intel i7-4770 box with an NVIDIA GTX 980 GPU (it's about 17 times slower on the CPU).


Look at the job run with TensorBoard

Open another "Anaconda Powershell" and activate your tf-gpu env, and "cd" to your working directory,

conda activate tf-gpu

cd projects/tf-gpu-MNIST

Then startup TensorBoard

tensorboard --logdir=./logs --port 6006

It will give you a local web address with the name of your computer (like the lovely name I got from this test Win10 install)

tensorboard start

Open that address in your browser and you will be greeted with (the wonderful) TensorBoard. These are the plots it had for that job run,

TensorBoard output

Note: For a long training job you can run TensorBoard on a log file during the training. It will monitor the log file and let you refresh the plots as it progresses.
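If TensorBoard starts but shows no data, a common cause is that the log directory given to the callback and the one given to `--logdir` do not match. A quick way to check is to look for the event files that the callback writes, whose names start with `events.out.tfevents`. This is a rough sketch with a made-up helper name.

```python
import os


def find_event_files(log_dir):
    """Return paths of TensorBoard event files found under log_dir."""
    found = []
    for folder, _subdirs, files in os.walk(log_dir):
        for name in files:
            if name.startswith("events.out.tfevents"):
                found.append(os.path.join(folder, name))
    return sorted(found)


if __name__ == "__main__":
    hits = find_event_files("./logs")
    if hits:
        for path in hits:
            print(path)
    else:
        print("No event files under ./logs; check the path used in the callback.")
```

Run it from the working directory where you started the notebook, so that "./logs" means the same thing as it does in the training code.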


Conclusion

That MNIST digits training example was a model with 1.2 million trainable parameters and a dataset with 60,000 images. **It took 80 seconds utilizing the NVIDIA GTX 980 on my old test system! For reference it took 1345 seconds using all cores at 100% on the Intel i7-4770 CPU in that machine. That's a 17 fold speedup on the GPU. That's why you use GPUs for this stuff!**

Note: I used the same procedure for doing the CPU version. I created a new "env" naming it "tf-CPU" and installed the CPU only version of TensorFlow i.e. `conda install tensorflow` without the "-gpu" part. I then ran the same Jupyter notebook using a "kernel" created for that env.
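For anyone who wants to redo the arithmetic with their own timings, here is the calculation spelled out. The two times are the ones reported in this post; the throughput figure is only approximate because the measured time also includes the validation pass at the end of each epoch.

```python
gpu_seconds = 79.47694969177246   # from the training output in this post
cpu_seconds = 1345                # CPU-only run reported in this post
samples_seen = 60000 * 15         # training images per epoch x epochs

speedup = cpu_seconds / gpu_seconds
gpu_rate = samples_seen / gpu_seconds

print("Speedup: {:.1f}x".format(speedup))
print("GPU throughput: about {:.0f} images/second".format(gpu_rate))
```

The speedup works out to 16.9x, which is the "17 fold" figure quoted above.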

I sincerely hope this guide helps get you up-and-running with TensorFlow. Feel free to add comments if you have any trouble. Either I or someone else in the community will likely be able to help you!

Happy computing! --dbk


Tags: Windows, Machine Learning, Tensorflow, NVIDIA, GPU
Bernardo Rufino

Thank you for such a detailed article!

It helped me a lot not only for installing tensorflow with GPU support, but also do see how great it's to use the Shell when you know the commands!

Thanks!

Posted on 2019-05-03 02:48:21
Donald Kinghorn

You are welcome. I really like where things are headed with Powershell (and WSL) on Windows 10. I was happy to see that Anaconda has full support for PS now. The command-line will give you great powers :-)

It's even getting to where you can go back and forth between Windows and Linux pretty seamlessly. I have SSH, both client and server, running on Win 10. That gives some very interesting usage possibilities in a heterogeneous environment.

I'll be doing more posts about this kind of thing.

Posted on 2019-05-03 15:17:05
dt

There is a missing reference to the TensorBoard callback in the MNIST example. I modified my notebook in this part to look like this

import time

tbCallBack = tf.keras.callbacks.TensorBoard(log_dir='./Graph', histogram_freq=0,

write_graph=True, write_images=True)

start_time=time.time()

model.fit(train_images, train_labels, batch_size=128, epochs=15, verbose=1,

validation_data=(test_images, test_labels), callbacks=[tbCallBack])

print('Training took {} seconds'.format(time.time()-start_time))

Posted on 2019-05-03 05:43:40
Donald Kinghorn

I think you just missed it :-) I have this in there between the compile and fit

tensor_board = tf.keras.callbacks.TensorBoard('./logs/LeNet-MNIST-1')

but I only set the log directory and left everything else as defaults

There are a whole bunch of options when you instantiate a TensorBoard callback. The args you passed made me go look at the new docs (which I hadn't done yet :-)
https://www.tensorflow.org/...

Thanks --Don

Posted on 2019-05-03 14:23:31
Adam Kowalczewski

This is a fantastic alternative to any other instructions I've seen. Got me up and running with a gpu in the most direct way. Thank you so much.

Posted on 2019-05-11 21:41:56
Werner van Waesberghe

Many thanks. First test runs approx. 15 times faster.

Posted on 2019-05-11 23:53:42
Franz Hahn

Unfortunately I cannot get Tensorboard to work. The process runs, the python.exe in my tf-gpu env is allowed through my firewall, but when I open the tensorboard url I get an ERR_CONNECTION_REFUSED. Any ideas?

Posted on 2019-05-17 14:12:18
Florian Bautry

Try to go directly to localhost:6006 in your navigator.

Posted on 2019-05-24 03:54:34
Franz Hahn

I did, and I also ran the tensor board server on other ips to see if it was an IP conflict. Not the case, doesn't work on any IP.

Posted on 2019-05-24 06:33:10
Pranabesh Das

This one (http://localhost:6006/#scalars) worked for me. Thanks!

Posted on 2019-06-08 05:22:17
Donald Kinghorn

Tensorboard sometimes gives people trouble. I'm not completely sure why.

The two things that I believe cause the most trouble are,
1) the directory path to the log file that the "callback" needs to write the data to is not correct somewhere in your code and
2) Tensorboard permissions to read that data.

(I've messed up with tensorboard before by using "logs" in one place and "log" in another and not catching it)

I had some strange trouble myself last weekend (not TensorBoard related) on a new Win10 install that had OneDrive enabled. I was creating Python directories and files with PowerShell and they disappeared within File Explorer!! (and didn't show up in OneDrive either) I suspect that sometimes OneDrive can do unexpected things on your system and could cause difficulty for programs like TensorBoard ??? It's possible that your "connection refused" error is because the log file isn't where you think it is and access to it is restricted because of OneDrive (applications have to be registered to access those files (?)) [ I lost some of my work! I got so frustrated with it that I reinstalled the system and left OneDrive disabled ]

My advice is to double check that your "logs" directory is where you think it is and then make sure that OneDrive has nothing to do with the directories you are using.
Another thing that might help is to create a directory in your user directory ( I use "projects") and do all of your work in there (sub-directories). ... and if in doubt give full paths to file/dir names i.e. C:\Users\don\projects\TFtest1\logs

Posted on 2019-05-24 16:11:22
Dinesh Muniandy

Hi Donald,

Just a question, since this article doesn't focus on installing CUDA (or it's libraries) to use Tensorflow GPU with; are we going to be experiencing any form of lost in performance - meaning, the speed/efficiency that we usually get while training models (with CUDA) ?

Update 1:
* mkdir projects/tf-gpu-MNIST/logs (didn't work for me)
* mkdir projects\tf-gpu-MNIST\logs (changing this to backslash, worked for me)

Posted on 2019-05-18 11:28:04
Donald Kinghorn

Hi Dinesh, Yes! the mkdir command is the same in CMD and in Powershell but in Powershell you can use either directory separator.

Also, the cuda toolkit cuBLAS and cuDNN that get installed with the conda command are the same packages (libraries DLL's) that would be installed from a direct system wide install. But they are localized and the correct versions for the particular build of TF

Posted on 2019-05-24 15:33:25
S A

This was a great article. It is tough to find guides this detailed that review all the potential pitfalls as well.

Out of curiosity, could you deploy your Jupyter notebook onto Google Colab? Especially if you wanted to make your model available for inference by others? Have you had any experience using NiftyNet or similar packages within this environment?

Thanks!

Posted on 2019-05-26 01:22:44
Donald Kinghorn

Thanks! covering the pitfalls was part of my motivation for doing this post. The one last year left out just enough that a lot of people ran into trouble.

Colab is really nice and I wouldn't be surprised to see this MNIST with LeNet example already up there as an example.

NiftyNet looks really interesting, I hadn't seen that before.

Something I looked at today that got me motivated was the recently released TensorFlow-graphics https://github.com/tensorfl... It's unsupervised learning with training simultaneously on an encoder and decoder for images ... based on "differentiable graphics" ... There are some tutorials on Colab that are linked from the github page. I think I want to dig into this one a bit :-)

Posted on 2019-05-29 01:56:54
Prachi Sharma

This is the best post ! Everything is clearly written here. :)

Posted on 2019-06-04 13:52:00
Donald Kinghorn

Thank you :-)

Posted on 2019-06-05 16:30:50
Pranabesh Das

Great post! Thanks a ton.

Posted on 2019-06-08 05:23:15
BobVan

This is terrific. I spent the past two days trying to set up a new laptop with an RTX 2060 card installed. Your instructions worked perfectly.

THANK YOU!!!!!!!!

Posted on 2019-06-10 02:13:15
Prachi Sharma

Hi! I did the same way many days back. Everything was working fine whenever I was doing only transfer learning but when I started doing training using CNNs, this error is shown on notebook:

UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[{{node block1_conv1/convolution}}]]

I am not able to solve it. If you have any suggestions to solve it, then do let me know.

Specs: Windows 10, tensorflow-gpu = 10.0.0, CUDA = 10.0.0, Cudnn = 7.7.0, Builtin GPU = GeForce GTX 1060 3GB, Compute capability = 6.1

Thanks in Advance !

Posted on 2019-06-13 14:19:30
Donald Kinghorn

Hi Prachi, I just checked on my laptop (similar, with a 1060). This system has updated to Win10 1903 since I last used Anaconda on it. It was badly broken! I have reinstalled the latest Anaconda version as described in this post. Just now checking some things...

The anaconda developers have been making a lot of changes recently and there have been problems reported. I just now had errors when trying to use Powershell. These went away when I did an update i.e.
conda update conda
conda update --all

I re-did the tf-gpu install and ran the MNIST notebook everything looks OK.

I noticed one strange thing in what you reported: Cudnn = 7.7.0. The install I just did added these (as shown by conda list):
cudatoolkit 10.0.130 0
cudnn 7.6.0 cuda10.0_0
tensorflow-gpu 1.13.1 h0d30ee6_0

You may have gotten a cudnn 7.7.0 somehow ??? (that would be a version conflict) I suggest that you create a new env (maybe tf-gpu-new) and then do a fresh tensorflow-gpu install in that.

I'm hopeful this will fix things for you. Post back with more info if it doesn't. --Don
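
If you want to check this from a script instead of reading the conda list output by eye, here is a rough sketch. It assumes the three-column "name version build" layout shown above, and it assumes that the cudnn build string names the CUDA version it was built for (as in cuda10.0_0). That naming is only what I see in this output, not a documented rule. The function names are made up for illustration.

```python
def parse_conda_list(text):
    """Map package name -> (version, build) from `conda list` style text."""
    packages = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the "#" header lines that conda prints.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) >= 3:
            packages[fields[0]] = (fields[1], fields[2])
    return packages


def cudnn_matches_toolkit(packages):
    """True if the cudnn build string names the installed CUDA major.minor."""
    toolkit_version = packages["cudatoolkit"][0]             # e.g. "10.0.130"
    major_minor = ".".join(toolkit_version.split(".")[:2])   # -> "10.0"
    cudnn_build = packages["cudnn"][1]                       # e.g. "cuda10.0_0"
    return cudnn_build.startswith("cuda" + major_minor + "_")


# The three lines quoted above, used as sample input.
SAMPLE = """\
cudatoolkit 10.0.130 0
cudnn 7.6.0 cuda10.0_0
tensorflow-gpu 1.13.1 h0d30ee6_0
"""
```

Paste the output of conda list from your own env in place of SAMPLE. If cudnn_matches_toolkit returns False, that points at the kind of version conflict described above.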

Posted on 2019-06-15 01:54:15
Suhaas Valanjoo

I installed Anaconda for Python 3.7 on Windows 10. If I use the activate or "conda activate" command at the PowerShell prompt, it just does not work, but it works very well in the Windows command line. Basically, in PowerShell I do not see the asterisk move to the newly activated environment, but I clearly see it with the Windows prompt. There is an open issue on GitHub. https://github.com/conda/co...

https://uploads.disquscdn.c...

Pictures attached https://uploads.disquscdn.c... https://uploads.disquscdn.c... https://uploads.disquscdn.c...

Posted on 2019-06-15 02:29:29
Donald Kinghorn

They made some changes recently that have caused some trouble. I had a similar problem and I think I resolved it by doing "conda init" in Powershell ...

It has been a bit of a mess. Try doing a conda init and then conda activate tf-gpu

Posted on 2019-06-18 21:24:21
Suhaas Valanjoo

Thanks, Donald. Big mess, true :-). I spent hours trying to figure out the workaround.

Posted on 2019-06-18 22:41:51
Suhaas Valanjoo

'Conda init' did not solve the issue (picture attached)

Posted on 2019-06-20 01:52:45
Donald Kinghorn

Dang! I know I had the same issue but don't remember what fixed it. Try this (if you haven't already),
open PS then do cmd to change to dos shell then
conda update conda
conda update --all
exit
exit
Then try again.
I was frustrated with Anaconda for a couple of weeks because of some of their changes. I messed with it a few times, including re-installs of older releases and then finally the latest update. Everything is working the way I want now, but the PS issues were the most annoying!
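
When the prompt doesn't change, it can be hard to tell whether the activation actually happened. A small sketch for checking from Python: it relies on the CONDA_DEFAULT_ENV environment variable that conda sets on activation, and the helper names are made up for illustration.

```python
import os


def active_conda_env(environ=None):
    """Return the name of the active conda env, or None if none is recorded."""
    # Accepting a mapping makes the function easy to try without a real shell.
    environ = os.environ if environ is None else environ
    return environ.get("CONDA_DEFAULT_ENV")


def is_env_active(expected, environ=None):
    """True if the active conda env is the one named `expected`."""
    return active_conda_env(environ) == expected
```

After "conda activate tf-gpu", starting python in that same shell and calling is_env_active("tf-gpu") should return True. If it returns False, the activation did not take effect in that shell.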

Posted on 2019-06-20 16:26:17
Padmakumar Nambiar

Hello Donald, Great article... Thanks for sharing!

Now, I'm facing the following issue related to performance of my ML code and thought you may be able to help please...

This particular call in "mrcnn.model.py" takes about 13 secs to return !!

r = model.detect([image], verbose=0)[0]

… which finally ends in the following call in the "sessions.py" file,...

ret = tf_session.TF_SessionRunCallable(self._session._session, self._handle, args, status, run_metadata_ptr)

… which in turn calls the following function call in "pywrap_tensorflow_internal.py" (which consumes most of the time delay mentioned above):

def TF_SessionRunCallable(session, handle, feed_values, out_status, run_metadata):

return _pywrap_tensorflow_internal.TF_SessionRunCallable(session, handle, feed_values, out_status, run_metadata)

TF_SessionRunCallable = _pywrap_tensorflow_internal.TF_SessionRunCallable

My TF version is: 1.13.1

Any help will be greatly appreciated. Thanks.

Posted on 2019-06-18 09:48:17
Donald Kinghorn

I don't have any good ideas on that. It does look like it is something with TensorFlow itself, since the time is going into the Python wrapper call. There is a lot going on in the model with mask-rcnn and it could be using resnet-50 or -101 ... I've never used this so I don't know what would be good/bad performance timing.

I wish I could offer you better advice but I'm afraid I don't have any ideas.
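
One generic thing that might help narrow it down is to time several calls in a row and compare the first call with the later ones. This is not a diagnosis of this particular problem, just a measurement sketch. The helper name time_calls is made up, and the function you pass in stands for whatever call you want to measure.

```python
import time


def time_calls(func, repeats=5):
    """Call func() `repeats` times.

    Returns (seconds for the first call, list of seconds for the later calls).
    """
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        durations.append(time.perf_counter() - start)
    return durations[0], durations[1:]
```

With the call from your message that would be something like time_calls(lambda: model.detect([image], verbose=0)). If only the first number is large, the time is going into one-time setup. If every call is slow, the cost is in the per-call work.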

Posted on 2019-06-18 21:17:55
Padmakumar Nambiar

Hi Donald, Thanks for your reply... Just in case it helps, I uninstalled my existing TF version, and installed tf-gpu-1.3.1 instead, and now I find a bunch of errors around the pywrap function mentioned above (pasted below). I changed it back to 1.13.1, those errors disappeared, but it started taking 13 secs for each frame in the video.

<<Anaconda Prompt>> python video_demo.py
Traceback (most recent call last):
File "C:\Users\user\Anaconda3\lib\site-packages\tensorflow\python\pywrap_tensorflow.py", line 58, in <module> from tensorflow.python.pywrap_tensorflow_internal import *
File "C:\Users\user\Anaconda3\lib\site-packages\tensorflow\python\pywrap_tensorflow_internal.py", line 28, in <module> _pywrap_tensorflow_internal = swig_import_helper()
File "C:\Users\user\Anaconda3\lib\site-packages\tensorflow\python\pywrap_tensorflow_internal.py", line 24, in swig_import_helper _mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
File "C:\Users\user\Anaconda3\lib\imp.py", line 242, in load_module return load_dynamic(name, filename, file)
File "C:\Users\user\Anaconda3\lib\imp.py", line 342, in load_dynamic return _load(spec)
ImportError: DLL load failed: The specified module could not be found.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "video_demo.py", line 2, in <module> from visualize_cv2 import model, display_instances, class_names
File "C:\Users\user\eclipse-workspace\New_mask_RCNN\visualize_cv2.py", line 6, in <module> from mrcnn import utils
File "C:\Users\user\eclipse-workspace\New_mask_RCNN\mrcnn\utils.py", line 15, in <module> import tensorflow as tf
File "C:\Users\user\Anaconda3\lib\site-packages\tensorflow\__init__.py", line 24, in <module> from tensorflow.python import pywrap_tensorflow # pylint: disable=unused-import
File "C:\Users\user\Anaconda3\lib\site-packages\tensorflow\python\__init__.py", line 49, in <module> from tensorflow.python import pywrap_tensorflow
File "C:\Users\user\Anaconda3\lib\site-packages\tensorflow\python\pywrap_tensorflow.py", line 74, in <module> raise ImportError(msg)

ImportError: Traceback (most recent call last):
File "C:\Users\user\Anaconda3\lib\site-packages\tensorflow\python\pywrap_tensorflow.py", line 58, in <module> from tensorflow.python.pywrap_tensorflow_internal import *
File "C:\Users\user\Anaconda3\lib\site-packages\tensorflow\python\pywrap_tensorflow_internal.py", line 28, in <module> _pywrap_tensorflow_internal = swig_import_helper()
File "C:\Users\user\Anaconda3\lib\site-packages\tensorflow\python\pywrap_tensorflow_internal.py", line 24, in swig_import_helper _mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
File "C:\Users\user\Anaconda3\lib\imp.py", line 242, in load_module return load_dynamic(name, filename, file)
File "C:\Users\user\Anaconda3\lib\imp.py", line 342, in load_dynamic return _load(spec)
ImportError: DLL load failed: The specified module could not be found.

Posted on 2019-06-19 03:52:37
nandodmelo

I will try your approach, but I have a trivial question, what’s the difference between GPU acceleration without installing CUDA and Installing CUDA/cuDNN? Or is it the same thing?

Posted on 2019-06-24 15:31:04
Donald Kinghorn

That is a very good question and it's important to understand!

I say "without installing CUDA" but really all of the libraries are installed along with tensorflow-gpu when you use the Anaconda package. They are kept in the resources sub-directories along with the TensorFlow-GPU package install. They are isolated from the rest of your setup by creating the "env" i.e. tf-gpu. That does 3 things for you;

1) It keeps you from having to do a full manual install of CUDA (which requires that you also install MS Visual Studio)

2) It gives you the correct versions of the CUDA libraries that you need! That last point is really important because if you just did a CUDA install you would likely be getting the latest version from NVIDIA which right now is 10.1. Google is compiling TensorFlow 1.13, 1.14 and 2.0.0-beta1 using CUDA 10.0 i.e. CUDA 10.1 wouldn't work!

3) Lastly, it gives you an easy way to have multiple packages installed that use different versions of the CUDA and cuDNN libraries. For example, you could do the same kind of install for PyTorch linked against CUDA 10.1 or 9.2 or whatever.

It also makes upgrade paths a lot cleaner: just make a new env and install a new version. Bottom line: it helps keep you from having a mess on your system.
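
If you are curious where those libraries ended up, here is a small sketch that searches an env directory for them. The file name patterns (cudart*.dll, cublas*.dll, cudnn*.dll) are my assumption about how the DLLs are named, and find_cuda_libraries is a made-up helper name. By default it searches the env that the running Python belongs to.

```python
import sys
from pathlib import Path


def find_cuda_libraries(env_root=None,
                        patterns=("cudart*.dll", "cublas*.dll", "cudnn*.dll")):
    """Return a sorted list of files under env_root matching the patterns."""
    # sys.prefix is the root directory of the Python env that is running.
    root = Path(sys.prefix if env_root is None else env_root)
    found = []
    for pattern in patterns:
        # rglob searches the whole directory tree below root.
        found.extend(root.rglob(pattern))
    return sorted(found)
```

Run it from Python started inside the tf-gpu env. An empty list from the base env and a non-empty one from tf-gpu would be consistent with the libraries being local to that env.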

P.S. I'm working on a short post right now that will use the tensorflow-gpu setup in this post as the basis for setting up a TensorFlow 2.0.0-beta1 install for testing (we can do that because TF 2 beta has the same CUDA dependencies as the other versions :-) ) --Don

Posted on 2019-06-25 17:17:29

Thank you. It works. I got about a 20 times speedup using a GTX 1060 compared to my CPU, an i7-3770 @ 3.4GHz 3.9GHz.
The GPU utilization was around 40%. A very crucial detail is the neural network structure.
If your network is too small you won't gain any speedup. I tried the simple MNIST model example from the TensorFlow tutorial and I gained nothing.
That mistake made me think I had installed the GPU version incorrectly, until I tried the LeNet-5 model and saw a 20 times speedup.

Posted on 2019-06-24 20:28:46
Donald Kinghorn

Good point! I didn't really discuss that in the post. Small test jobs and a lot of "learning" examples might not show much or any improvement in performance on GPU. It's when the model and/or the data-set gets larger that the GPU really starts to be a BIG advantage.
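
A toy way to see why: suppose the GPU run pays some fixed cost (startup, moving data to the card) before it does the actual work faster. The sketch below is only that toy model. The function name and every number you feed it are hypothetical, not measurements.

```python
def estimated_speedup(cpu_seconds, gpu_factor, fixed_overhead_seconds):
    """Toy model: GPU time = fixed overhead + CPU time / gpu_factor.

    All inputs are assumptions chosen by the caller, not measured values.
    """
    gpu_seconds = fixed_overhead_seconds + cpu_seconds / gpu_factor
    return cpu_seconds / gpu_seconds
```

With a made-up overhead of 1 second and a made-up factor of 20, a job that takes 1 second on the CPU comes out slightly slower on the GPU in this model, while a job that takes 1000 seconds gets nearly the full factor of 20.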

Posted on 2019-06-25 16:59:19
Sri Harsha

Hey,

I see the TensorFlow page lists Python 3.6 as the requirement. I don't understand how you were able to install it on Python 3.7.
Can you please explain that?

Thanks

Posted on 2019-06-28 15:04:42
Donald Kinghorn

It's not strictly a requirement. Although it's probably good not to use a Python version older than that. The dependencies are determined when the code is compiled. The devs working on the build for Anaconda probably used the latest Python in their environment when they compiled TensorFlow-GPU for their package.

"conda" is a package manager. When you do "conda install tensorflow-gpu" it is going to pull the package from the official build on Anaconda cloud. conda will have a list of the dependencies for the package and make sure that they are met when it does the install (it will warn you if it needs to downgrade any existing package in the env). You don't have to worry about it.

However, you probably could use Python 3.6 if you really wanted to. You can set a Python base version when you create an env. For example you could try something like

conda create --name tf-gpu-py36 python=3.6
conda activate tf-gpu-py36
conda install tensorflow-gpu

let me try that ... OK... in that env "conda list" shows (leaving out most of the output)
...
cudatoolkit 10.0.130 0
cudnn 7.6.0 cuda10.0_0
...
python 3.6.8 h9f7ef89_7
...
tensorboard 1.13.1 py36h33f27b4_0
tensorflow 1.13.1 gpu_py36h9006a92_0
tensorflow-base 1.13.1 gpu_py36h871c8ca_0
tensorflow-estimator 1.13.0 py_0
tensorflow-gpu 1.13.1 h0d30ee6_0

I tested it and it works fine :-) Personally I would keep the latest Python 3.7 unless you have some code that has a real conflict with it that you want to use in the same env. You have a LOT of control over versioning with conda!

Posted on 2019-06-28 16:24:03
Mr 53,461

Hi Donald,
I want to add one of the the tensorflow/examples/ that I find on github
Where is the 'right' place to put it?

(base) PS C:\Users\jeff> conda info --envs
base * C:\jeff\anaconda3
tf-gpu C:\jeff\anaconda3\envs\tf-gpu

Thanks!

Posted on 2019-07-02 18:17:19
Donald Kinghorn

You can put source anywhere you like. I usually create a directory under my user directory called projects (I do this on Linux and Windows) and then I create directories in there for anything that I'm working on.

If you open Powershell you can do
pwd (to see what dir you are in; it should be C:\Users\jeff)
mkdir projects
cd projects
mkdir tf-examples
cd tf-examples

And then put your stuff from GitHub in there (of course you can use any dir name you like and you can use the GUI file manager to do this too)

One thing to note, I advise you to *not* use spaces in any directory or file names. Windows allows that but it can cause problems with Python.
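
If you want to check an existing path for this, here is a tiny sketch. The helper name parts_with_spaces is made up. It only inspects the text of the path and never touches the disk.

```python
from pathlib import PureWindowsPath


def parts_with_spaces(path):
    """Return the components of a Windows-style path that contain a space."""
    # PureWindowsPath parses backslash-separated paths on any OS.
    return [part for part in PureWindowsPath(path).parts if " " in part]
```

For C:\Users\jeff\projects\tf-examples this returns an empty list. Anything it does return is a directory or file name you may want to rename.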

Posted on 2019-07-02 23:13:43
Mr 53,461

Works!
For anyone else's benefit...
In Anaconda PowerShell I say 'conda activate tf-gpu' and the prompt looks like this: (tf-gpu) PS C:\Users\jeff\projects\tf-examples.
Then I did a big github clone: git clone https://github.com/tensorfl...

Now when I start a jupyter notebook I can import models

Thanks a lot Donald!!

Posted on 2019-07-03 14:54:30
Lina Chato

Hi,
Thank you for the great article....
I have a Quadro P2000 GPU and it is not good for DL/ML applications, so I bought a TITAN. I would be thankful if you could advise: should I reinstall TensorFlow on my system after mounting the new GPU and installing its driver?

Posted on 2019-07-05 16:42:37
Donald Kinghorn

I don't think you will need to reinstall TensorFlow if your setup has been working well for you.

You should be able to install the Titan and then restart. The system should detect the card and then update the driver. Be sure to check the driver version like I have suggested in this post. It would be good to update it to the latest release (430 right now I believe)
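
If you prefer to check the driver version from Python, here is a sketch that asks nvidia-smi for it. It assumes nvidia-smi is on your PATH (on some Windows installs it is not, in which case the function just returns None). The helper names are made up, and the "430.86" in the comment is only an example string.

```python
import shutil
import subprocess


def parse_driver_major(version_text):
    """Turn text like '430.86' into 430 (uses the first line, one line per GPU)."""
    first_line = version_text.strip().splitlines()[0]
    return int(first_line.split(".")[0])


def query_driver_version():
    """Return the driver version text reported by nvidia-smi, or None."""
    if shutil.which("nvidia-smi") is None:
        return None  # nvidia-smi not found on PATH
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True, text=True, check=False,
    )
    return result.stdout.strip() if result.returncode == 0 else None
```

If query_driver_version() returns a string, parse_driver_major of that string gives the number to compare against the release you expect.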

Posted on 2019-07-05 18:20:09
Thomas Chu

Thanks a lot!
It works for my system(HP Z4 G4+Titan).

Posted on 2019-07-08 19:52:56
pl709

Sir, your post is like a shining light in a stormy sky. I've been battling with the system/CUDA/cuDNN/py/tf-gpu compatibility problems for a looooong time

Posted on 2019-07-26 02:33:05
Donald Kinghorn

Ha ha, I'm glad you were able to see the light! I do appreciate your (and everyone's) kind words :-) --Don

Posted on 2019-07-26 15:27:24
Divyansh Jain

Excellent Article!! Thank you so much :)

Posted on 2019-08-03 12:46:23
Brandon Elford

Amazing, you have ended two weeks' worth of time spent trying to install TF-GPU. Thank you so much!

Posted on 2019-09-09 15:35:16