
Read this article at https://www.pugetsystems.com/guides/1419
Dr Donald Kinghorn (Scientific Computing Advisor)

How to Install TensorFlow with GPU Support on Windows 10 (Without Installing CUDA) UPDATED!

Written on April 26, 2019 by Dr Donald Kinghorn

Introduction

In June of 2018 I wrote a post titled The Best Way to Install TensorFlow with GPU Support on Windows 10 (Without Installing CUDA). That post has served many individuals as a guide for getting a good GPU accelerated TensorFlow work environment running on Windows 10 without needless installation complexity. It's very satisfying to me personally to have been able to help so many people get started with TensorFlow on Windows 10! However, that guide is nearly a year old now and has needed an update for some time. I've been promising to do this in my comment replies, so here it is.

This post will guide you through a relatively simple setup for a good GPU accelerated work environment with TensorFlow (with Keras and Jupyter notebook) on Windows 10. You will not need to install CUDA for this!

I'll walk you through the best way I have found so far to get a good TensorFlow work environment on Windows 10 including GPU acceleration. This will be a complete working environment including,

  • System preparation and NVIDIA driver update
  • Anaconda Python installation
  • Creating an environment for your TensorFlow configuration using "conda"
  • Installing the latest stable build of TensorFlow in that environment
  • Setting up Jupyter Notebook to work with your new "env"
  • An example deep learning problem using TensorFlow with GPU acceleration, Keras, Jupyter Notebook, and TensorBoard visualization.

Let's do it.


Step 1) System Preparation - NVIDIA Driver Update and checking your PATH variable (Possible "Gotchas")

This is a step that was left out of the original post and the issues presented here were the source of most difficulties that people had with the old post. The current state of your Windows 10 configuration may cause difficulties. I'll try to give guidance on things to look out for.

The primary testing for this post was done on a fresh install of Windows 10 Home "October 2018 Update" on older hardware (Intel Core i7 4770 + NVIDIA GTX 980 GPU). This turns out to be a good test system because it would have failed with the old guide without the information in this step.

Check your NVIDIA Driver

This is important and I'll show you why.

Don't assume Microsoft gave you the latest NVIDIA driver! Check it and update if there is a newer version.

Right-click on your desktop and then select "NVIDIA Control Panel",

nvidia control panel 1

nvidia control panel 2

You can see that my fresh install of Windows 10 gave me a version 388 driver. That is way too old! Now click on "System Information" and then the "Components" panel. The next image shows why that 388 driver won't work with the newest TensorFlow,

nvidia control panel 3

The CUDA "runtime" is part of the NVIDIA driver. The CUDA runtime version has to support the version of CUDA used by any special software, like TensorFlow, that links to other CUDA libraries (DLLs). As of this writing TensorFlow (v1.13) links to CUDA 10.0. The runtime has to be as new as, or newer than, the extra CUDA libraries you need.
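The rule can be stated as a simple version comparison: the driver's CUDA runtime version must be greater than or equal to the CUDA version that TensorFlow links against. A small illustrative sketch of that rule (the version strings here are examples, not values read from your system):

```python
# Illustrative sketch of the driver/runtime compatibility rule.
# The version strings are examples only -- not read from a real system.

def parse_version(v):
    """Turn a version string like '10.0' into a comparable tuple (10, 0)."""
    return tuple(int(part) for part in v.split('.'))

def runtime_supports(driver_runtime, required_cuda):
    """The driver's CUDA runtime must be as new as, or newer than,
    the CUDA version the framework links against."""
    return parse_version(driver_runtime) >= parse_version(required_cuda)

# TensorFlow 1.13 links against CUDA 10.0:
required = '10.0'

print(runtime_supports('9.1', required))   # too old a driver -> False
print(runtime_supports('10.1', required))  # updated driver -> True
```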

Update the NVIDIA Display Driver

Even if you think you have the latest NVIDIA driver check to be sure.

Go to [https://www.nvidia.com/Download/index.aspx] and enter the information for your GPU. Then click "search".

driver page 1

Click "search" to go to the download page,

driver page 2

It doesn't matter too much which GPU you enter on the search page; the latest driver supports cards all the way back to the 600 series.

Download and install the driver following the prompts.

Note: I used the "Standard" driver. If your system was set up by Dell, HP, etc., they may have put their own OEM version on it; if the standard driver doesn't work, try the "DCH" driver. Also, NVIDIA now has two drivers because some video processing applications were not working right. I used the "Game Ready Driver". After all, it's "Workstation by day, Battle-station by night". Right?

Check your PATH environment variable

This may not be something you think about very often, but it's a good idea to know the state of your PATH environment variable. Why? Development tools will often alter your PATH. If you are trying to run some code and getting errors that some library or executable cannot be found, or just hitting strange problems that don't seem to make sense, then your system may be searching your PATH and finding a version that you are not expecting.

If you answer yes to any of the following then you should really look at your PATH,

  • Have you installed Visual Studio?
  • Did you install some version of CUDA?
  • Have you installed Python.org Python?
  • Have you tried a "pip" install of TensorFlow?

You may be reading this because you tried and failed to install TensorFlow following Google's instructions. If you feel that you made a mess of your system then you can try to clean up by uninstalling what you did. But you may not have to. Try the TensorFlow install I suggest; just look at your PATH first so you know its state in case you run into strange errors.

Go to the "Start menu" and start typing "PATH Variable"; you should get a search result for the "System Properties" control panel's Advanced tab.

control panel path

Click on "Environment Variables"

control panel sys

The PATH on my testing system is short because I haven't installed anything that would modify it.

If you have a long string then there is a great "Edit..." panel that will show you each entry and allow you to move things up or down, delete entries, or add new ones.

The main idea to keep in mind is that when your system searches for an executable or library it starts in the current directory (folder), then goes through the directories listed in your User PATH entries, followed by the System PATH. It stops at the first thing that satisfies what you asked for (or fails) ... but that might not be the thing you want it to find. It takes the first thing it finds. If folders in your PATH contain different versions of an executable or DLL with the same name, you can move the entry for the one you want toward the beginning of your PATH so it's found first.

Be very careful with your PATH. Don't make changes unless you know what you are doing. Mostly it should just be something you are aware of for troubleshooting.
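If you want to see the search order programmatically, Python's standard library can list the PATH entries and show which copy of an executable would actually be found first. A quick sketch (works in any Python):

```python
import os
import shutil

# PATH entries in the order the system searches them.
entries = os.environ['PATH'].split(os.pathsep)
for i, entry in enumerate(entries):
    print(i, entry)

# shutil.which() resolves a name roughly the way the system search does,
# so it shows which copy of an executable would win.
print(shutil.which('python'))
```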

A special note for laptops

If you have a laptop with an NVIDIA GPU (like a nice gaming laptop) then you should succeed with the instructions in this post. However, one problem unique to laptops is that you will likely have a power-saving control that switches your display back to the CPU's integrated graphics. A current Windows 10 setup along with the latest driver should automatically switch to the NVIDIA GPU when you start TensorFlow (same as starting up a game), but if you have trouble that looks like TensorFlow not finding your GPU, you may need to switch your display manually. You will likely find options by right-clicking on your desktop.


Step 2) Python Environment Setup with Anaconda Python

I highly recommend Anaconda Python. If you need some arguments for using Python take a look at my post Should You Learn to Program with Python. For arguments on why you should use the Anaconda Python distribution see, How to Install Anaconda Python and First Steps for Linux and Windows. Another reason for using Anaconda Python in the context of installing GPU accelerated TensorFlow is that by doing so you will not have to do a CUDA install on your system.

Anaconda is focused on data science, machine learning, and scientific computing. It installs cleanly on your system in a single directory so it doesn't make a mess of your system's application and library directories. It is also performance-optimized for important numerical packages like numpy, scipy, etc.

Download and Install Anaconda Python

Anaconda download

You can download and "Run" at the same time, or download to your machine and double-click on the "exe" file to start the installer.

  • You will be asked to accept a license agreement ...
  • "Select Install Type": I recommend you choose "Just Me" since this is part of your personal development environment.
  • "Choose Install Location": I recommend you keep the default, which is at the top level of your user directory.
  • "Advanced Installation Options"

Advanced install opts

"Register Anaconda as my default Python 3.7" is recommended. "Add Anaconda to my PATH environment variable" is OK to select, but you don't really need it. If you use the GUI (Anaconda Navigator), or the (DOS) shell or PowerShell links in the Anaconda folder on your Start menu, they will temporarily set the proper PATH environment for you without making a "permanent" change to your PATH variable. For this install I will leave it unchecked.

My personal preference is to "Add Anaconda to my PATH" because I want it to be found whenever I use Python.

Note: This version of the Anaconda distribution supports "Python environments" in PowerShell, which is my preferred way to work with "conda" on Windows.

Check and Update your Anaconda Python Install

Go to the "Start menu", find the "Anaconda3" item, and then click on the "Anaconda Powershell Prompt",

Powershell prompt for Anaconda

With "Anaconda Powershell" opened do a quick check to see that you now have Anaconda3 Python 3.7 as your default Python.

(base) PS>python
Python 3.7.3 (default, Mar 27 2019, 17:13:21) [MSC v.1915 64 bit (AMD64)] :: Anaconda, Inc. on win32
Type "help", "copyright", "credits" or "license" for more information.
>>>

Type exit() (or CTRL-Z then Enter) to exit the Python prompt.

Update your base Anaconda packages

`conda` is a powerful package and environment management tool for Anaconda. We'll use `conda` from Powershell to update the base Python install. Run the following commands. It may take some time to do this since there may be a lot of modules to update.

conda update conda

conda update anaconda

conda update python

conda update --all

That should bring your entire base Anaconda install up to the latest packages. (Everything may already be up to date.)

Anaconda Navigator

There is a GUI for Anaconda called `anaconda-navigator`. I personally find it distracting/confusing/annoying and prefer using `conda` from the command-line. Your taste may differ! ... and my opinion is subject to change if they keep improving it. If you are new to Anaconda then I highly recommend that you read up on `conda` even (or especially!) if you are thinking about using the "Navigator" GUI.


Step 3) Create a Python "virtual environment" for TensorFlow using conda

You should set up an environment for TensorFlow separate from your base Anaconda Python environment. This keeps your base clean and gives TensorFlow a space for all of its dependencies. In general it is good practice to keep separate environments for projects, especially when they have special package dependencies. Think of it as a separate "name-space" for your project.

There are many possible options when creating an environment with conda including adding packages with specific version numbers and specific Python base versions. This is sometimes useful if you want fine control and it also helps with version dependency resolution. Here we will keep it simple and just create a named environment, then activate that environment and install the packages we want inside of that.

  • From the "Anaconda Powershell Prompt" command line do,
conda create --name tf-gpu

I named the environment 'tf-gpu' but you can use any name you want. For example you could add the version number.

NOTE: avoid using spaces in names! Python will not handle that well and you could get strange errors. "-" and "_" are fine. (Python programmers often use underscores.)

  • Now exit the PowerShell you are using and open a new one before you activate the new "env". This is an annoying quirk: PowerShell will not re-read its environment until you restart it. If you activate the new "env" before restarting, you will not be able to do any package installs because the needed utilities will not be on the path in the current shell.
  • "activate" the environment, (I'll show my full Powershell prompt and output instead of just the commands)
(base) PS C:\Users\don> conda info --envs
# conda environments:
#
base  *  C:\Users\don\Anaconda3
tf-gpu   C:\Users\don\Anaconda3\envs\tf-gpu


(base) PS C:\Users\don> conda activate tf-gpu

(tf-gpu) PS C:\Users\don>

The `conda info --envs` command shows the "envs" you have available.

After doing `conda activate tf-gpu` you can see that the prompt is now preceded by the name of the environment, `(tf-gpu)`. Any conda package installs will now be local to this environment.


Step 4) Install TensorFlow-GPU from the Anaconda Cloud Repositories

There is an "official" Anaconda maintained TensorFlow-GPU package for Windows 10!

A search for "tensorflow" on the Anaconda Cloud will list the available packages from Anaconda and the community. There is a package "anaconda / tensorflow-gpu 1.13.1" listed near the top that has builds for Linux and Windows. This is what we will be installing from the commands below.

This command will install the latest stable version of TensorFlow with GPU acceleration in this conda environment. (It will be the latest version maintained by the Anaconda team and may lag by a few weeks from any fresh release from Google.)

(tf-gpu) C:\Users\don> conda install tensorflow-gpu

That's it! You now have TensorFlow with NVIDIA CUDA GPU support!

This includes TensorFlow, Keras, TensorBoard, the CUDA 10.0 toolkit, and cuDNN 7.3, along with all of the dependencies. It's all in your new "tf-gpu" env, ready to use and isolated from other envs or packages on your system.


Step 5) Simple check to see that TensorFlow is working with your GPU

You can use the PowerShell session where you activated the tf-gpu env and did the TensorFlow install, or open a new one and do `conda activate tf-gpu`.

With your tf-gpu env active type the following,

python

Your prompt will change to the Python interpreter prompt. This will be a simple test, and we'll use a nice feature of recent TensorFlow releases: eager execution.

>>> import tensorflow as tf

>>> tf.enable_eager_execution()

>>> print( tf.constant('Hello from TensorFlow ' + tf.__version__) )

(that is 2 underscores before and after "version")

My session including the output looked like this,

(base) PS>conda activate tf-gpu

(tf-gpu) PS>python

Python 3.7.3 (default, Mar 27 2019, 17:13:21) [MSC v.1915 64 bit (AMD64)] :: Anaconda, Inc. on win32

Type "help", "copyright", "credits" or "license" for more information.

>>> import tensorflow as tf

>>> tf.enable_eager_execution()

>>> print( tf.constant( 'Hellow from TensorFlow ' + tf.__version__ ) )

2019-04-24 18:08:58.248433: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX AVX2

2019-04-24 18:08:58.488035: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties:

name: GeForce GTX 980 major: 5 minor: 2 memoryClockRate(GHz): 1.2785

pciBusID: 0000:01:00.0

totalMemory: 4.00GiB freeMemory: 3.30GiB

2019-04-24 18:08:58.496081: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0

2019-04-24 18:08:58.947914: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:

2019-04-24 18:08:58.951226: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0

2019-04-24 18:08:58.953130: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N

2019-04-24 18:08:58.955149: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 3005 MB memory) -> physical GPU (device: 0, name: GeForce GTX 980, pci bus id: 0000:01:00.0, compute capability: 5.2)

tf.Tensor(b'Hellow from TensorFlow 1.13.1', shape=(), dtype=string)

>>>

When you first run TensorFlow it outputs a bunch of information about the execution environment it is in. You can see that it found the GTX 980 in this system and added it as an execution device.

Next we will do something a little more useful and fun with Keras, after we configure Jupyter notebook to use our 'tf-gpu' environment.


Step 6) Create a Jupyter Notebook Kernel for the TensorFlow Environment

You can work with an editor and the command line, and you'll often want to, but Jupyter notebooks are great for machine learning development work. To get Jupyter notebook working the way you want with this new TensorFlow environment, you will need to add a "kernel" for it.

With your tf-gpu environment activated do,

conda install ipykernel jupyter

Note: I installed both ipykernel and jupyter above since jupyter was not installed by default when we created the tf-gpu env. jupyter is installed by default in the (base) env.

Now create the Jupyter kernel,

python -m ipykernel install --user --name tf-gpu --display-name "TensorFlow-GPU-1.13"

You can set the "display-name" to anything you like. I included the version number here.
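Under the hood, `ipykernel install` just writes a small `kernel.json` spec into your Jupyter kernels directory; the `--display-name` is what shows up in the "New" menu. A sketch of roughly what that spec contains (the `argv` entries are illustrative; the real file points at the python.exe inside your tf-gpu env):

```python
import json

# Roughly what `python -m ipykernel install --user --name tf-gpu
# --display-name "TensorFlow-GPU-1.13"` writes to kernel.json.
# The argv below is illustrative -- the installed file uses the full
# path to the Python interpreter in the tf-gpu env.
kernel_spec = {
    "argv": [
        "python",
        "-m", "ipykernel_launcher",
        "-f", "{connection_file}",
    ],
    "display_name": "TensorFlow-GPU-1.13",
    "language": "python",
}

print(json.dumps(kernel_spec, indent=2))
```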

With this "tf-gpu" kernel installed, when you start Jupyter notebook you will now have an option to open a new notebook using this kernel.

Start a Jupyter notebook,

jupyter notebook

Look at the "New" menu,

Jupyter kernel for TF

Note: If you start a Jupyter notebook from the (base) env you will see the "TensorFlow-GPU-1.13" option, but you will not be able to import tensorflow in that notebook because TensorFlow is only installed in the "tf-gpu" env. [You could have installed into your (base) env but, I recommend that you keep separate env's.]


Step 7) An Example Convolution Neural Network training using Keras with TensorFlow

In order to check everything out, let's set up the classic LeNet-5 neural network with Keras in a Jupyter notebook using our "TensorFlow-GPU-1.13" kernel. We'll train the model on the MNIST digits dataset and then use TensorBoard to look at some plots of the job run.

You do not need to install Keras or TensorBoard separately since they are now included with the TensorFlow install.

Activate your "tf-gpu" env

Launch "Anaconda Powershell" and then do,

conda activate tf-gpu

Create a working directory (and log directory for TensorBoard)

I like to have a directory called "projects" in my user home directory. In the projects directory I create directories for things I'm working on. Of course, you can organize your work however you like ... but I do highly recommend that you learn to use the command line if you are not familiar with working like that. You can thank me later!

In PowerShell the following commands are useful for managing directories,

To see what directory you are in,

pwd

(if you just opened "Anaconda Powershell" you should be in your "user home directory")

To create a new directory (and additional subdirectories all at once)

Note: when you are working with "code" I highly recommend that you **do not use spaces in directory or file names**.

# in the new version 1.14 you no longer need to create the logs file for Tensorboard
# It is still good to create a working directory 
# mkdir projects/tf-gpu-MNIST/logs
mkdir projects/tf-gpu-MNIST

That command gives you a working directory, "tf-gpu-MNIST". (With version 1.14 you no longer need to create the "logs" subdirectory; it will be created automatically.)

Note: In PowerShell you can use "/" or "\" to separate directories. (Many of its commands match their Linux equivalents, and you can use those as alternatives to the "DOS"-like commands.)

To change directory use "cd"

cd projects/tf-gpu-MNIST

(For completeness) To delete a directory you can use the `rmdir` command.

IMPORTANT!

***********************************************************

The older version (1.13.1) was able to use UNIX like file paths on Windows but it looks like version 1.14 does not! You need to change this,

tensor_board = tf.keras.callbacks.TensorBoard('./logs/LeNet-MNIST-1')

to this,

tensor_board = tf.keras.callbacks.TensorBoard('.\logs\LeNet-MNIST-1') 

I also noticed that you no longer need to create the directory beforehand, i.e. if the directory .\logs\LeNet-MNIST-1 doesn't exist when you start the job run it will be created automatically.

*************************************************************
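If you'd rather not worry about which separator a given TensorFlow version accepts, you can build the log path with Python's standard library instead of hard-coding it; `os.path.join` and `pathlib` use the right separator for whatever platform the code runs on. A sketch (the `PureWindowsPath`/`PurePosixPath` calls just show what you'd get on each platform):

```python
import os
from pathlib import PureWindowsPath, PurePosixPath

# Building the path with the platform's own rules avoids hard-coding
# '/' vs '\' in the string passed to the TensorBoard callback.
log_dir = os.path.join('logs', 'LeNet-MNIST-1')

# The same two components rendered under each platform's convention:
print(str(PureWindowsPath('logs', 'LeNet-MNIST-1')))  # logs\LeNet-MNIST-1
print(str(PurePosixPath('logs', 'LeNet-MNIST-1')))    # logs/LeNet-MNIST-1
```

You could then pass the resulting `log_dir` string to the `tf.keras.callbacks.TensorBoard(...)` call shown above.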

Launch a Jupyter Notebook

After "cd'ing" into your working directory, and with the tf-gpu environment activated, start a Jupyter notebook,

jupyter notebook

From the 'New' drop-down menu select the 'TensorFlow-GPU-1.13' kernel that you added (as seen in the image in the last section). You can now start writing code!


MNIST hand written digits example

The following "code blocks" can be treated as Jupyter notebook "cells". You can type them in (recommended for practice) or cut and paste. To execute the code in a cell use `Shift-Return`.

We will set up and train LeNet-5 with the MNIST handwritten digits data.

Import TensorFlow

import tensorflow as tf

Load and process the MNIST data

mnist = tf.keras.datasets.mnist

(train_images, train_labels), (test_images, test_labels) = mnist.load_data()
# reshape and rescale data for the CNN

train_images = train_images.reshape(60000, 28, 28, 1)

test_images = test_images.reshape(10000, 28, 28, 1)

train_images, test_images = train_images/255, test_images/255
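To see what the reshape and rescale do without downloading MNIST, here is the same preprocessing applied to random stand-in data (a hypothetical `fake_images` array in place of the real set):

```python
import numpy as np

# Stand-in for the MNIST arrays: 100 fake 28x28 grayscale "images"
# with uint8 pixel values 0-255 (the real training set has 60,000).
fake_images = np.random.randint(0, 256, size=(100, 28, 28), dtype=np.uint8)

# Same preprocessing as above: add the trailing channel axis the CNN
# expects, and rescale pixel values into the [0, 1] range.
fake_images = fake_images.reshape(100, 28, 28, 1) / 255

print(fake_images.shape)  # (100, 28, 28, 1)
```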

Create the LeNet-5 convolution neural network architecture

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, (3,3), activation='relu', input_shape=(28,28,1)),
    tf.keras.layers.Conv2D(64, (3,3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2,2),
    tf.keras.layers.Dropout(0.25),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(10, activation='softmax')
])
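As a sanity check on the architecture, you can count its trainable parameters by hand, no TensorFlow required; the total should reproduce the roughly 1.2 million parameters that `model.summary()` reports for the layers above:

```python
# Hand-counting trainable parameters for the network above.
# Conv2D params = (kh * kw * in_channels + 1) * out_channels
conv1 = (3 * 3 * 1 + 1) * 32        # 28x28x1 -> 26x26x32
conv2 = (3 * 3 * 32 + 1) * 64       # 26x26x32 -> 24x24x64
# MaxPooling2D(2,2): 24x24x64 -> 12x12x64, no parameters
flat = 12 * 12 * 64                 # 9216 inputs to the first Dense layer
dense1 = (flat + 1) * 128           # weights plus biases
dense2 = (128 + 1) * 10

total = conv1 + conv2 + dense1 + dense2
print(total)  # 1199882 -- about 1.2 million
```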

Compile the model

model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

Set log data to feed to TensorBoard for visual analysis

tensor_board = tf.keras.callbacks.TensorBoard('./logs/LeNet-MNIST-1')

Train the model (with timing)

import time

start_time=time.time()

model.fit(train_images, train_labels, batch_size=128, epochs=15, verbose=1,
         validation_data=(test_images, test_labels), callbacks=[tensor_board])

print('Training took {} seconds'.format(time.time()-start_time))

The results

After running that training for 15 epochs the last epoch gave,

Train on 60000 samples, validate on 10000 samples
Epoch 1/15
60000/60000 [==============================] - 6s 105us/sample - loss: 0.2400 - acc: 0.9276 - val_loss: 0.0515 - val_acc: 0.9820
...
...
Epoch 15/15
60000/60000 [==============================] - 5s 84us/sample - loss: 0.0184 - acc: 0.9937 - val_loss: 0.0288 - val_acc: 0.9913
Training took 79.47694969177246 seconds

Not bad! Training accuracy 99.37% and Validation accuracy 99.13%

It took about 80 seconds on my old Intel i7-4770 box with an NVIDIA GTX 980 GPU (it's about 17 times slower on the CPU).


Look at the job run with TensorBoard

Open another "Anaconda Powershell" and activate your tf-gpu env, and "cd" to your working directory,

conda activate tf-gpu

cd projects/tf-gpu-MNIST

Then startup TensorBoard

tensorboard --logdir=./logs --port 6006

It will give you a local web address with the name of your computer (like the lovely name I got from this test Win10 install)

tensorboard start

Open that address in your browser and you will be greeted with (the wonderful) TensorBoard. These are the plots it had for that job run,

Note: on Chrome I had to use localhost:6006 instead of the address returned from Tensorboard

TensorBoard output

Note: For a long training job you can run TensorBoard on a log file during the training. It will monitor the log file and let you refresh the plots as it progresses.


Conclusion

That MNIST digits training example was a model with 1.2 million training parameters and a dataset with 60,000 images. **It took 80 seconds utilizing the NVIDIA GTX 980 on my old test system! For reference it took 1345 seconds using all cores at 100% on the Intel i7-4770 CPU in that machine. That's a 17-fold speedup on the GPU. That's why you use GPUs for this stuff!**
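The 17-fold figure is just the ratio of the two wall-clock times reported above:

```python
# GPU vs CPU wall-clock times from the runs above (seconds).
gpu_seconds = 79.48   # GTX 980 run ("about 80 seconds")
cpu_seconds = 1345.0  # i7-4770, all cores at 100%

speedup = cpu_seconds / gpu_seconds
print(round(speedup))  # 17
```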

Note: I used the same procedure for doing the CPU version. I created a new "env" naming it "tf-CPU" and installed the CPU only version of TensorFlow i.e. `conda install tensorflow` without the "-gpu" part. I then ran the same Jupyter notebook using a "kernel" created for that env.

I sincerely hope this guide helps get you up-and-running with TensorFlow. Feel free to add comments if you have any trouble. Either myself or someone else in the community will likely be able to help you!

Happy computing! --dbk



Tags: Windows, Machine Learning, Tensorflow, NVIDIA, GPU
Bernardo Rufino

Thank you for such a detailed article!

It helped me a lot not only for installing tensorflow with GPU support, but also do see how great it's to use the Shell when you know the commands!

Thanks!

Posted on 2019-05-03 02:48:21
Donald Kinghorn

You are welcome. I really like where things are headed with Powershell (and WSL) on Windows 10. I was happy to see that Anaconda has full support for PS now. The command-line will give you great powers :-)

It's even getting to where you can go back and forth between Windows and Linux pretty seamlessly. I have SSH, both client and server, running on Win 10. That gives some very interesting usage possibilities in a heterogeneous environment.

I'll be doing more posts about this kind of thing.

Posted on 2019-05-03 15:17:05
dt

There is a missing reference to the TensorBoard callback in the MNIST example. I modified my notebook in this part to look like this

import time

tbCallBack = tf.keras.callbacks.TensorBoard(log_dir='./Graph', histogram_freq=0,
                                            write_graph=True, write_images=True)

start_time = time.time()

model.fit(train_images, train_labels, batch_size=128, epochs=15, verbose=1,
          validation_data=(test_images, test_labels), callbacks=[tbCallBack])

print('Training took {} seconds'.format(time.time()-start_time))

Posted on 2019-05-03 05:43:40
Donald Kinghorn

I think you just missed it :-) I have this in there between the compile and fit

tensor_board = tf.keras.callbacks.TensorBoard('./logs/LeNet-MNIST-1')

but I only set the log directory and left everything else as defaults

There are a whole bunch of options when you instantiate a TensorBoard callback. The args you passed made me go look at the new docs (which I hadn't done yet :-)
https://www.tensorflow.org/...

Thanks --Don

Posted on 2019-05-03 14:23:31
Adam Kowalczewski

This is a fantastic alternative to any other instructions I've seen. Got me up and running with a gpu in the most direct way. Thank you so much.

Posted on 2019-05-11 21:41:56
Werner van Waesberghe

Many thanks. First test runs approx. 15 times faster.

Posted on 2019-05-11 23:53:42
Franz Hahn

Unfortunately I cannot get Tensorboard to work. The process runs, the python.exe in my tf-gpu env is allowed through my firewall, but when I open the tensorboard url I get an ERR_CONNECTION_REFUSED. Any ideas?

Posted on 2019-05-17 14:12:18
Florian Bautry

Try to go directly to localhost:6006 in your navigator.

Posted on 2019-05-24 03:54:34
Franz Hahn

I did, and I also ran the tensor board server on other ips to see if it was an IP conflict. Not the case, doesn't work on any IP.

Posted on 2019-05-24 06:33:10
Pranabesh Das

This one (http://localhost:6006/#scalars) worked for me. Thanks!

Posted on 2019-06-08 05:22:17
Donald Kinghorn

Tensorboard sometimes gives people trouble. I'm not completely sure why.

The two things that I believe cause the most trouble are,
1) the directory path to the log file that the "callback" needs to write the data to is not correct somewhere in your code and
2) Tensorboard permissions to read that data.

(I've messed up with tensorboard before by using "logs" in one place and "log" in another and not catching it)

I had some strange trouble myself last weekend (not TensorBoard related) on a new Win10 install that had OneDrive enabled. I was creating Python directories and files with PowerShell and they disappeared within File Explorer!! (and didn't show up in OneDrive either) I suspect that sometimes OneDrive can do unexpected things on your system and could cause difficulty for programs like TensorBoard ??? It's possible that your "connection refused" error is because the log file isn't where you think it is and access to it is restricted because of OneDrive (applications have to be registered to access those files (?)) [I lost some of my work! I got so frustrated with it that I reinstalled the system and left OneDrive disabled]

My advice is to double-check that your "logs" directory is where you think it is and then make sure that OneDrive has nothing to do with the directories you are using.
Another thing that might help is to create a directory in your user directory (I use "projects") and do all of your work in there (sub-directories). ... and if in doubt give full paths to file/dir names i.e. C:\Users\don\projects\TFtest1\logs

Posted on 2019-05-24 16:11:22
Dinesh Muniandy

Hi Donald,

Just a question, since this article doesn't focus on installing CUDA (or it's libraries) to use Tensorflow GPU with; are we going to be experiencing any form of lost in performance - meaning, the speed/efficiency that we usually get while training models (with CUDA) ?

Update 1:
* mkdir projects/tf-gpu-MNIST/logs (didn't work for me)
* mkdir projects\tf-gpu-MNIST\logs (changing this to backslash, worked for me)

Posted on 2019-05-18 11:28:04
Donald Kinghorn

Hi Dinesh, yes! The mkdir command is the same in CMD and in PowerShell, but in PowerShell you can use either directory separator.

Also, the CUDA toolkit, cuBLAS, and cuDNN that get installed with the conda command are the same packages (libraries/DLLs) that would be installed by a direct system-wide install. But they are localized to the env and are the correct versions for the particular build of TF.

Posted on 2019-05-24 15:33:25
S A

This was a great article. It is tough to find guides this detailed that review all the potential pitfalls as well.

Out of curiosity, could you deploy your Jupyter notebook onto Google Colab? Especially if you wanted to make your model available for inference by others? Have you had any experience using NiftyNet or similar packages within this environment?

Thanks!

Posted on 2019-05-26 01:22:44
Donald Kinghorn

Thanks! covering the pitfalls was part of my motivation for doing this post. The one last year left out just enough that a lot of people ran into trouble.

Colab is really nice and I wouldn't be surprised to see this MNIST with LeNet example already up there as an example.

NiftyNet looks really interesting, I hadn't seen that before.

Something I looked at today that got me motivated was the recently released TensorFlow-graphics https://github.com/tensorfl... It's unsupervised learning with training simultaneously on an encoder and decoder for images ... based on "differentiable graphics" ... There are some tutorials on Colab that are linked from the github page. I think I want to dig into this one a bit :-)

Posted on 2019-05-29 01:56:54
Prachi Sharma

This is the best post ! Everything is clearly written here. :)

Posted on 2019-06-04 13:52:00
Donald Kinghorn

Thank you :-)

Posted on 2019-06-05 16:30:50
Pranabesh Das

Great post! Thanks a ton.

Posted on 2019-06-08 05:23:15
BobVan

This is terrific. I spent the past two days trying to set up a new laptop with an RTX 2060 card installed. Your instructions worked perfectly.

THANK YOU!!!!!!!!

Posted on 2019-06-10 02:13:15
Prachi Sharma

Hi! I did the same way many days back. Everything was working fine whenever I was doing only transfer learning but when I started doing training using CNNs, this error is shown on notebook:

UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[{{node block1_conv1/convolution}}]]

I am not able to solve it. If you have any suggestions to solve it, then do let me know.

Specs: Windows 10, tensorflow-gpu = 10.0.0, CUDA = 10.0.0, Cudnn = 7.7.0, Builtin GPU = GeForce GTX 1060 3GB, Compute capability = 6.1

Thanks in Advance !

Posted on 2019-06-13 14:19:30
Donald Kinghorn

Hi Prachi, I just checked on my laptop (similar with 1060). This systems has updated to Win10 19.03 since I had last used anaconda on it. It was badly broken! I have reinstalled to the latest anaconda version as described in this post. Just now checking some things...

The anaconda developers have been making a lot of changes recently and there have been problems reported. I just now had errors when trying to use Powershell. These went away when I did an update i.e.
conda update conda
conda update --all

I re-did the tf-gpu install and ran the MNIST notebook everything looks OK.

I noticed one thing strange in what you reported Cudnn = 7.7.0 In the install I just did it added these, (conda list shows this)
cudatoolkit 10.0.130 0
cudnn 7.6.0 cuda10.0_0
tensorflow-gpu 1.13.1 h0d30ee6_0

You may have gotten a cudnn 7.7.0 somehow??? (That would be a version conflict.) I suggest that you create a new env (maybe tf-gpu-new) and then do a fresh tensorflow-gpu install in that.

I'm hopeful this will fix things for you. Post back with more info if it doesn't. --Don

Posted on 2019-06-15 01:54:15
Suhaas Valanjoo

I installed Anaconda for Python 3.7 on Windows 10. If I use activate or "conda activate" in the PowerShell prompt, it just does not work, but it works very well in the Windows command line. Basically, I could not see the asterisk indicator move to precede the newly activated environment, but I clearly see it with the Windows prompt. There is an open issue on GitHub. https://github.com/conda/co...

https://uploads.disquscdn.c...

Pictures attached https://uploads.disquscdn.c... https://uploads.disquscdn.c... https://uploads.disquscdn.c...

Posted on 2019-06-15 02:29:29
Donald Kinghorn

They made some changes recently that have caused some trouble. I had a similar problem and I think I resolved it by doing "conda init" in Powershell ...

It has been a bit of a mess. Try doing a conda init and then conda activate tf-gpu

Posted on 2019-06-18 21:24:21
Suhaas Valanjoo

Thanks, Donald. Big mess, true :-). I spent hours trying to figure out the workaround.

Posted on 2019-06-18 22:41:51
Suhaas Valanjoo

'conda init' did not solve the issue (picture attached)

Posted on 2019-06-20 01:52:45
Donald Kinghorn

Dang! I know I had the same issue but don't remember what fixed it. Try this (if you haven't already):
open PS, then do cmd to change to the DOS shell, then
conda update conda
conda update --all
exit
exit
Then try again.
I was frustrated with anaconda for a couple of weeks because of some of their changes. I messed with it a few times, including re-installs to older releases, and then finally the latest update. Everything is working the way I want now, but the PS issues were the most annoying!

Posted on 2019-06-20 16:26:17
Padmakumar Nambiar

Hello Donald, Great article... Thanks for sharing!

Now, I'm facing the following issue related to performance of my ML code and thought you may be able to help please...

This particular call in "mrcnn.model.py" takes about 13 secs to return !!

r = model.detect([image], verbose=0)[0]

… which finally ends in the following call in the "sessions.py" file,...

ret = tf_session.TF_SessionRunCallable(self._session._session, self._handle, args, status, run_metadata_ptr)

… which in turn calls the following function call in "pywrap_tensorflow_internal.py" (which consumes most of the time delay mentioned above):

def TF_SessionRunCallable(session, handle, feed_values, out_status, run_metadata):

return _pywrap_tensorflow_internal.TF_SessionRunCallable(session, handle, feed_values, out_status, run_metadata)

TF_SessionRunCallable = _pywrap_tensorflow_internal.TF_SessionRunCallable

My TF version is: 1.13.1

Any help will be greatly appreciated. Thanks.

Posted on 2019-06-18 09:48:17
Donald Kinghorn

I don't have any good ideas on that. It does look like it is something with TensorFlow itself, since the time is going into the python wrapper call. There is a lot going on in the model with Mask R-CNN, and it could be using ResNet-50 or -101 ... I've never used this, so I don't know what would be good/bad performance timing.

I wish I could offer you better advice, but I'm afraid I don't have any ideas.

Posted on 2019-06-18 21:17:55
Padmakumar Nambiar

Hi Donald, Thanks for your reply... Just in case it helps, I uninstalled my existing TF version, and installed tf-gpu-1.3.1 instead, and now I find a bunch of errors around the pywrap function mentioned above (pasted below). I changed it back to 1.13.1, those errors disappeared, but it started taking 13 secs for each frame in the video.

<<Anaconda prompt>> python video_demo.py
Traceback (most recent call last):
File "C:\Users\user\Anaconda3\lib\site-packages\tensorflow\python\pywrap_tensorflow.py", line 58, in <module> from tensorflow.python.pywrap_tensorflow_internal import *
File "C:\Users\user\Anaconda3\lib\site-packages\tensorflow\python\pywrap_tensorflow_internal.py", line 28, in <module> _pywrap_tensorflow_internal = swig_import_helper()
File "C:\Users\user\Anaconda3\lib\site-packages\tensorflow\python\pywrap_tensorflow_internal.py", line 24, in swig_import_helper _mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
File "C:\Users\user\Anaconda3\lib\imp.py", line 242, in load_module return load_dynamic(name, filename, file)
File "C:\Users\user\Anaconda3\lib\imp.py", line 342, in load_dynamic return _load(spec)
ImportError: DLL load failed: The specified module could not be found.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "video_demo.py", line 2, in <module> from visualize_cv2 import model, display_instances, class_names
File "C:\Users\user\eclipse-workspace\New_mask_RCNN\visualize_cv2.py", line 6, in <module> from mrcnn import utils
File "C:\Users\user\eclipse-workspace\New_mask_RCNN\mrcnn\utils.py", line 15, in <module> import tensorflow as tf
File "C:\Users\user\Anaconda3\lib\site-packages\tensorflow\__init__.py", line 24, in <module> from tensorflow.python import pywrap_tensorflow # pylint: disable=unused-import
File "C:\Users\user\Anaconda3\lib\site-packages\tensorflow\python\__init__.py", line 49, in <module> from tensorflow.python import pywrap_tensorflow
File "C:\Users\user\Anaconda3\lib\site-packages\tensorflow\python\pywrap_tensorflow.py", line 74, in <module> raise ImportError(msg)

ImportError: Traceback (most recent call last):
File "C:\Users\user\Anaconda3\lib\site-packages\tensorflow\python\pywrap_tensorflow.py", line 58, in <module> from tensorflow.python.pywrap_tensorflow_internal import *
File "C:\Users\user\Anaconda3\lib\site-packages\tensorflow\python\pywrap_tensorflow_internal.py", line 28, in <module> _pywrap_tensorflow_internal = swig_import_helper()
File "C:\Users\user\Anaconda3\lib\site-packages\tensorflow\python\pywrap_tensorflow_internal.py", line 24, in swig_import_helper _mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
File "C:\Users\user\Anaconda3\lib\imp.py", line 242, in load_module return load_dynamic(name, filename, file)
File "C:\Users\user\Anaconda3\lib\imp.py", line 342, in load_dynamic return _load(spec)
ImportError: DLL load failed: The specified module could not be found.

Posted on 2019-06-19 03:52:37
nandodmelo

I will try your approach, but I have a trivial question, what’s the difference between GPU acceleration without installing CUDA and Installing CUDA/cuDNN? Or is it the same thing?

Posted on 2019-06-24 15:31:04
Donald Kinghorn

That is a very good question and it's important to understand!

I say "without installing CUDA" but really all of the libraries are installed along with tensorflow-gpu when you use the Anaconda package. They are kept in the resources sub-directories along with the TensorFlow-GPU package install. They are isolated from the rest of your setup by creating the "env", i.e. tf-gpu. That does 3 things for you:

1) It keeps you from having to do a full manual install of CUDA (which requires that you also install MS Visual Studio)

2) It gives you the correct versions of the CUDA libraries that you need! That point is really important, because if you just did a CUDA install you would likely be getting the latest version from NVIDIA, which right now is 10.1. Google is compiling TensorFlow 1.13, 1.14, and 2.0.0-beta1 using CUDA 10.0, i.e. CUDA 10.1 wouldn't work!

3) Lastly, it gives you an easy way to have multiple packages installed that use different versions of the CUDA/cuDNN libraries. For example, you could do the same kind of install for PyTorch linked against CUDA 10.1 or 9.2 or whatever.

It also makes upgrade paths a lot cleaner: just make a new env and install the new version. Bottom line, it helps you keep from having a mess on your system.
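If you want to see the isolation for yourself, here's a small sketch (assuming the default Anaconda directory layout on Windows) that lists the CUDA-related DLLs inside whichever env is currently active:

```python
import sys
from pathlib import Path

# Each conda env is a self-contained prefix; on Windows, the CUDA/cuDNN DLLs
# pulled in by "conda install tensorflow-gpu" live under <env>\Library\bin
# (assumption: default Anaconda layout).
env_prefix = Path(sys.prefix)
dll_dir = env_prefix / "Library" / "bin"

print("active env prefix:", env_prefix)
if dll_dir.exists():
    print("CUDA-related DLLs:", sorted(p.name for p in dll_dir.glob("cu*.dll")))
else:
    print("no Library/bin here (probably not a Windows conda env)")
```

Run it once in the base env and once in tf-gpu and you'll see that each prefix has its own copies of the libraries.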

P.S. I'm working on a short post right now that will use the tensorflow-gpu setup in this post as the basis for setting up a TensorFlow 2.0.0-beta1 install for testing (we can do that because the TF 2 beta has the same CUDA dependencies as the other versions :-) --Don

Posted on 2019-06-25 17:17:29

Thank you. It works. I got about a 20 times speedup using a GTX 1060 compared to my CPU, an i7-3770 @ 3.4GHz/3.9GHz.
GPU utilization was around 40%. A very crucial detail is the neural network structure.
If your network is too small you won't gain any speedup. I tried the simple MNIST model from the TensorFlow tutorial and gained nothing.
That mistake made me think I had installed the GPU version incorrectly, until I tried the LeNet-5 model and saw a 20 times speedup.

Posted on 2019-06-24 20:28:46
Donald Kinghorn

Good point! I didn't really discuss that in the post. Small test jobs and a lot of "learning" examples might not show much or any improvement in performance on GPU. It's when the model and/or the data-set gets larger that the GPU really starts to be a BIG advantage.

Posted on 2019-06-25 16:59:19
Sri Harsha

Hey,

I see the TensorFlow page lists Python 3.6 as the requirement; I don't understand how you were able to install it on Python 3.7.
Can you please explain that?

Thanks

Posted on 2019-06-28 15:04:42
Donald Kinghorn

It's not strictly a requirement, although it's probably good not to use a Python version older than that. The dependencies are determined when the code is compiled. The devs working on the build for Anaconda probably used the latest Python in their environment when they compiled TensorFlow-GPU for their package.

"conda" is a package manager. When you do "conda install tensorflow-gpu" it is going to pull the package from the official build on Anaconda cloud. conda will have a list of the dependencies for the package and make sure that they are met when it does the install (it will warn you if it needs to downgrade any existing package in the env). You don't have to worry about it.

However, you probably could use Python 3.6 if you really wanted to. You can set a base Python version when you create an env. For example, you could try something like

conda create --name tf-gpu-py36 python=3.6
conda activate tf-gpu-py36
conda install tensorflow-gpu

let me try that ... OK... in that env "conda list" shows (leaving out most of the output)
...
cudatoolkit 10.0.130 0
cudnn 7.6.0 cuda10.0_0
...
python 3.6.8 h9f7ef89_7
...
tensorboard 1.13.1 py36h33f27b4_0
tensorflow 1.13.1 gpu_py36h9006a92_0
tensorflow-base 1.13.1 gpu_py36h871c8ca_0
tensorflow-estimator 1.13.0 py_0
tensorflow-gpu 1.13.1 h0d30ee6_0

I tested it and it works fine :-) Personally I would keep the latest Python 3.7 unless you have some code that has a real conflict with it that you want to use in the same env. You have a LOT of control over versioning with conda!
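As a quick sanity check after activating an env like the tf-gpu-py36 example above, you can confirm from Python itself which interpreter version and env prefix you actually got:

```python
import sys

# Confirm the interpreter version and location for the active env;
# useful after pinning a version with e.g. "python=3.6" at env creation.
major, minor = sys.version_info[:2]
print(f"Python {major}.{minor} running from {sys.prefix}")
```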

Posted on 2019-06-28 16:24:03
Mr 53,461

Hi Donald,
I want to add one of the tensorflow/examples/ that I found on GitHub.
Where is the 'right' place to put it?

(base) PS C:\Users\jeff> conda info --envs
base * C:\jeff\anaconda3
tf-gpu C:\jeff\anaconda3\envs\tf-gpu

Thanks!

Posted on 2019-07-02 18:17:19
Donald Kinghorn

You can put source anywhere you like. I usually create a directory under my user directory called projects (I do this on Linux and Windows) and then I create directories in there for anything that I'm working on.

If you open Powershell you can do
pwd (to see what dir you are in; should be C:\Users\jeff)
mkdir projects
cd projects
mkdir tf-examples
cd tf-examples

And then put your stuff from GitHub in there (of course you can use any dir name you like, and you can use the GUI file manager to do this too)

One thing to note: I advise you *not* to use spaces in any directory or file names. Windows allows that, but it can cause problems with Python.
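A trivial guard for that, if you ever build paths in code (purely illustrative; the paths shown are made up):

```python
def risky_path(path: str) -> bool:
    """Flag directory/file names containing spaces; Windows allows them,
    but they can trip up Python tooling and command-line arguments."""
    return " " in path

print(risky_path(r"C:\Users\jeff\projects\tf-examples"))     # → False
print(risky_path(r"C:\Users\jeff\My Projects\tf-examples"))  # → True
```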

Posted on 2019-07-02 23:13:43
Mr 53,461

Works!
For anyone else's benefit...
In the Anaconda PowerShell I say 'conda activate tf-gpu' and the prompt looks like this: (tf-gpu) PS C:\Users\jeff\projects\tf-examples.
Then I did a big github clone: git clone https://github.com/tensorfl...

Now when I start a jupyter notebook I can import models

Thanks a lot Donald!!

Posted on 2019-07-03 14:54:30
Lina Chato

Hi,
Thank you for the great article....
I have a Quadro P2000 GPU and this is not good for DL/ML applications, so I bought a TITAN. I would be thankful if you could advise! Should I reinstall TensorFlow on my system after mounting the new GPU and installing its driver?

Posted on 2019-07-05 16:42:37
Donald Kinghorn

I don't think you will need to reinstall TensorFlow if your setup has been working well for you.

You should be able to install the Titan and then restart. The system should detect the card and then update the driver. Be sure to check the driver version like I suggested in this post. It would be good to update it to the latest release (430 right now, I believe).
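If you'd rather check the driver from a script than from the GUI, this sketch shells out to nvidia-smi (assuming it is on your PATH; it returns None if it isn't):

```python
import subprocess

def nvidia_driver_version():
    """Return the installed NVIDIA driver version via nvidia-smi,
    or None if nvidia-smi isn't available on this machine."""
    try:
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
            capture_output=True, text=True, check=True)
        return out.stdout.strip() or None
    except (FileNotFoundError, subprocess.CalledProcessError):
        return None

print("driver version:", nvidia_driver_version())
```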

Posted on 2019-07-05 18:20:09
Thomas Chu

Thanks a lot!
It works for my system (HP Z4 G4 + Titan).

Posted on 2019-07-08 19:52:56
pl709

Sir, your post is like a shining light in a stormy sky! I've been battling the system/CUDA/cuDNN/py/tf-gpu compatibility problems for a looooong time.

Posted on 2019-07-26 02:33:05
Donald Kinghorn

Ha ha, I'm glad you were able to see the light! I do appreciate your (and everyone's) kind words :-) --Don

Posted on 2019-07-26 15:27:24
Divyansh Jain

Excellent Article!! Thank you so much :)

Posted on 2019-08-03 12:46:23
Brandon Elford

Amazing, you have ended two weeks worth of time trying to install TF-GPU. Thank you so much!

Posted on 2019-09-09 15:35:16
Andrew Samsock

Great install guide! Thank you. I'm getting the following error when trying to run the MNIST code sample.

2019-09-23 14:46:59.020150: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'cupti64_100.dll'; dlerror: cupti64_100.dll not found
2019-09-23 14:46:59.024270: W tensorflow/core/profiler/lib/profiler_session.cc:182] Encountered error while starting profiler: Unavailable: CUPTI error: CUPTI could not be loaded or symbol could not be found.
2019-09-23 14:46:59.035553: I tensorflow/core/platform/default/device_tracer.cc:641] Collecting 0 kernel records, 0 memcpy records.
2019-09-23 14:46:59.038756: E tensorflow/core/platform/default/device_tracer.cc:68] CUPTI error: CUPTI could not be loaded or symbol could not be found.

Posted on 2019-09-23 21:49:08
Donald Kinghorn

Hummm, I'm not sure about that one. Did you try to do a pip install as outlined on the TensorFlow docs page? If so, you may be having some library conflicts. If not, and you have just followed what is in this guide, then I'm not sure ... let me try something ...

OK, I tried my existing install on this Win10 laptop; it worked fine (was using Python 3.7.3 and TensorFlow 1.13.1). Then in that env I did conda update --all, which moved Python to 3.7.4 and TensorFlow to 1.14. BROKEN! I'm getting "ModuleNotFoundError: No module named 'win32api'" when I start up a Jupyter notebook.

... I've been messing around for a couple of hours trying to find the problem ... I tried creating an env with Python 3.7.3 and TensorFlow 1.13.1 but hit the same problem. I can run everything fine from the console, but Jupyter and JupyterLab are broken (and I'm not sure if it's Jupyter or something that Jupyter depends on??)

Looks like something is messed up with some recent update in Anaconda. I was not able to reproduce the error you saw but got some new ones of my own.

This may take a while to sort out. YUK! ... sorry ... I'll hack on it some more, but right now my newer envs are all broken for Jupyter on Windows.

Posted on 2019-09-24 00:57:09
Donald Kinghorn

If you are still running into problems ... TensorFlow 2rc seems to be working OK ... see above

Posted on 2019-09-26 16:27:33
Donald Kinghorn

Just to let everyone know ... Today (Sept. 24, 2019) I did a fresh install of Anaconda Python 2019.07 and it is not working correctly! My Jupyter notebooks launch but fail to load a kernel, with a missing win32api error. (Command-line runs seem to be OK.) This is probably a broken Anaconda distribution, since the 2019.03 install I had on the system before this reinstall was working fine.

I did all of the updates like I mention in this post. Things might work OK with this 2019.07 install if you don't do the updates ???

For anyone having trouble. Here is a link to the Anaconda installer archives where you can find the 2019.03 package. I'm going to revert to that and see if things are working again ... Crud! That didn't work either ... this may just be broken for a while until things get fixed upstream --Don

Posted on 2019-09-25 03:49:14
Isaac Mather

Hi Don,

Great article! Wow!

Quick q, did things get fixed upstream?

Cheers,
Isaac

Posted on 2020-01-07 19:12:16
Donald Kinghorn

Yes, thankfully! It's not uncommon for projects to go through some "rough times" where things are broken for a while. Once you have a good working environment set up, you usually don't want to be too quick to update things :-) With this kind of work, Windows is mostly done after Linux and doesn't get as much testing. It's getting better and better though!

It's getting to be time for me to do another post fully updated for TensorFlow 2. ... and I'm testing on Win10 20H1 now too. ... that is looking really good :-)

Posted on 2020-01-07 21:43:08
Isaac Mather

Oh most excellent!

Your guide came up while I was researching my first PyTorch, TensorFlow, and TensorboardX installation. You laid out such a detailed explanation of the process of installing TensorFlow, and cleared up many questions I had about installing PyTorch, like what CUDA do I need, and what Visual Studio do I need. (Turns out neither, with a fresh Anaconda install.)

I'm following this tutorial: https://www.ahmedbesbes.com... which requires PyTorch 0.4.1 and suggests TensorboardX 1.8, both of which are outdated versions. If I follow your Install TensorFlow 2 beta1 (GPU) guide on top of your How to Install TensorFlow with GPU Support on Windows 10 guide, will installation of these older versions go smoothly? Assuming I use some install instructions like this: https://stackoverflow.com/q...

Edit: thank you for posting this guide in the first place! Having reviewed your blog post history, your content is hugely valuable for an aspiring machine learning engineer!

Second edit: i think, having read more of the comments, that everything should be all good!

Posted on 2020-01-10 18:54:18
Donald Kinghorn

You are on the right track. With conda you can create an env and install very specific versions of packages without messing up your base install.

If you go to https://anaconda.org/pytorc... you will see a tab called "Files" those are all of the builds that are in that repository. They go way back!

After you create your env you should be able to install PyTorch 0.4.1 with

conda install -c pytorch pytorch=0.4.1

That should pull all of the correct dependencies for that version.

Posted on 2020-01-11 00:19:27
Isaac Mather

Thank you for your help sir!

Posted on 2020-01-11 00:31:00
Isaac Mather

Update: I followed your guide and it works! The MNIST handwritten-digits example above runs. Hooray! Thank you!

Posted on 2020-01-24 00:21:15
Ian

Thank you for all the great articles. Now that I am finally ready to try, I notice this comment. Is there an ETA for "another post fully updated for TensorFlow 2" for Windows 10?
Also, any suggestions on 'localizing' CUDA (as in your TF on Linux articles) on Windows 10, in case it is required for non-TF use, so it doesn't muck up TF?

Posted on 2020-02-24 14:26:28
김학배

Thank you very much.
In the meantime, I couldn't install the tensorflow-gpu version on my PC.

thank you.

Posted on 2019-09-26 04:33:42
Donald Kinghorn

TensorFlow 2rc seems to be working OK ... see above

Posted on 2019-09-26 16:26:31
Donald Kinghorn

For everyone that has been having trouble with TensorFlow 1.14 working correctly in a Jupyter notebook (see my comment from Sept 24) I have good news...
TensorFlow 2rc seems to be working fine!
If you want to start using TensorFlow 2 (and you should), then you can install it using pip over the top of the install that was outlined above.

I wrote a post about doing this in June when beta1 had come out...
"Install TensorFlow 2 beta1 (GPU) on Windows 10 and Linux with Anaconda Python (no CUDA install needed)"
https://www.pugetsystems.co...

In that post the TensorFlow 2 version was beta1. It is now a release candidate, and the version number is 2.0.0-rc1, so you would want to use the following command:

pip install tensorflow-gpu==2.0.0-rc1

Please read the post mentioned above. It will have you clone your existing env to a new name and then do the pip TF2 install.

Posted on 2019-09-26 16:26:11
Pranab Das

First, thanks for the excellent guide :-)

Getting the following exception at the last stage of MNIST training:
NotFoundError Traceback (most recent call last)
<ipython-input-8-792ff921728a> in <module>
4
5 model.fit(train_images, train_labels, batch_size=128, epochs=15, verbose=1,
----> 6 validation_data=(test_images, test_labels), callbacks=[tensor_board])
7
8 print('Training took {} seconds'.format(time.time()-start_time))

~\AppData\Local\Continuum\anaconda3\envs\tf-gpu\lib\site-packages\tensorflow\python\keras\engine\training.py in fit(self, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, validation_freq, max_queue_size, workers, use_multiprocessing, **kwargs)
778 validation_steps=validation_steps,
779 validation_freq=validation_freq,
--> 780 steps_name='steps_per_epoch')
781
782 def evaluate(self,

~\AppData\Local\Continuum\anaconda3\envs\tf-gpu\lib\site-packages\tensorflow\python\keras\engine\training_arrays.py in model_iteration(model, inputs, targets, sample_weights, batch_size, epochs, verbose, callbacks, val_inputs, val_targets, val_sample_weights, shuffle, initial_epoch, steps_per_epoch, validation_steps, validation_freq, mode, validation_in_fit, prepared_feed_values_from_dataset, steps_name, **kwargs)
372 # Callbacks batch end.
373 batch_logs = cbks.make_logs(model, batch_logs, batch_outs, mode)
--> 374 callbacks._call_batch_hook(mode, 'end', batch_index, batch_logs)
375 progbar.on_batch_end(batch_index, batch_logs)
376

~\AppData\Local\Continuum\anaconda3\envs\tf-gpu\lib\site-packages\tensorflow\python\keras\callbacks.py in _call_batch_hook(self, mode, hook, batch, logs)
246 for callback in self.callbacks:
247 batch_hook = getattr(callback, hook_name)
--> 248 batch_hook(batch, logs)
249 self._delta_ts[hook_name].append(time.time() - t_before_callbacks)
250

~\AppData\Local\Continuum\anaconda3\envs\tf-gpu\lib\site-packages\tensorflow\python\keras\callbacks.py in on_train_batch_end(self, batch, logs)
529 """
530 # For backwards compatibility.
--> 531 self.on_batch_end(batch, logs=logs)
532
533 def on_test_batch_begin(self, batch, logs=None):

~\AppData\Local\Continuum\anaconda3\envs\tf-gpu\lib\site-packages\tensorflow\python\keras\callbacks_v1.py in on_batch_end(self, batch, logs)
360 self._total_batches_seen += 1
361 if self._is_profiling:
--> 362 profiler.save(self.log_dir, profiler.stop())
363 self._is_profiling = False
364 elif (not self._is_profiling and

~\AppData\Local\Continuum\anaconda3\envs\tf-gpu\lib\site-packages\tensorflow\python\eager\profiler.py in save(logdir, result)
142 logdir, 'plugins', 'profile',
143 datetime.datetime.now().strftime('%Y-%m-%d_%H-%M-%S'))
--> 144 gfile.MakeDirs(plugin_dir)
145 maybe_create_event_file(logdir)
146 with gfile.Open(os.path.join(plugin_dir, 'local.trace'), 'wb') as f:

~\AppData\Local\Continuum\anaconda3\envs\tf-gpu\lib\site-packages\tensorflow\python\lib\io\file_io.py in recursive_create_dir(dirname)
436 errors.OpError: If the operation fails.
437 """
--> 438 recursive_create_dir_v2(dirname)
439
440

~\AppData\Local\Continuum\anaconda3\envs\tf-gpu\lib\site-packages\tensorflow\python\lib\io\file_io.py in recursive_create_dir_v2(path)
451 errors.OpError: If the operation fails.
452 """
--> 453 pywrap_tensorflow.RecursivelyCreateDir(compat.as_bytes(path))
454
455

NotFoundError: Failed to create a directory: ./logs/LeNet-MNIST-1\plugins\profile\2019-10-10_15-12-16; No such file or directory

Posted on 2019-10-10 09:48:13
Pranab Das

Somehow my post from yesterday got deleted; hence reposting. While training the model with timing in the MNIST example, I get the following error. Any suggestion?
Train on 60000 samples, validate on 10000 samples
Epoch 1/15
128/60000 [..............................] - ETA: 13:23 - loss: 2.3159 - acc: 0.0625

---------------------------------------------------------------------------
NotFoundError Traceback (most recent call last)
<ipython-input-7-792ff921728a> in <module>
4
5 model.fit(train_images, train_labels, batch_size=128, epochs=15, verbose=1,
----> 6 validation_data=(test_images, test_labels), callbacks=[tensor_board])
7
8 print('Training took {} seconds'.format(time.time()-start_time))

~\Anaconda3\envs\tf-gpu\lib\site-packages\tensorflow\python\keras\engine\training.py in fit(self, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, validation_freq, max_queue_size, workers, use_multiprocessing, **kwargs)
778 validation_steps=validation_steps,
779 validation_freq=validation_freq,
--> 780 steps_name='steps_per_epoch')
781
782 def evaluate(self,

~\Anaconda3\envs\tf-gpu\lib\site-packages\tensorflow\python\keras\engine\training_arrays.py in model_iteration(model, inputs, targets, sample_weights, batch_size, epochs, verbose, callbacks, val_inputs, val_targets, val_sample_weights, shuffle, initial_epoch, steps_per_epoch, validation_steps, validation_freq, mode, validation_in_fit, prepared_feed_values_from_dataset, steps_name, **kwargs)
372 # Callbacks batch end.
373 batch_logs = cbks.make_logs(model, batch_logs, batch_outs, mode)
--> 374 callbacks._call_batch_hook(mode, 'end', batch_index, batch_logs)
375 progbar.on_batch_end(batch_index, batch_logs)
376

~\Anaconda3\envs\tf-gpu\lib\site-packages\tensorflow\python\keras\callbacks.py in _call_batch_hook(self, mode, hook, batch, logs)
246 for callback in self.callbacks:
247 batch_hook = getattr(callback, hook_name)
--> 248 batch_hook(batch, logs)
249 self._delta_ts[hook_name].append(time.time() - t_before_callbacks)
250

~\Anaconda3\envs\tf-gpu\lib\site-packages\tensorflow\python\keras\callbacks.py in on_train_batch_end(self, batch, logs)
529 """
530 # For backwards compatibility.
--> 531 self.on_batch_end(batch, logs=logs)
532
533 def on_test_batch_begin(self, batch, logs=None):

~\Anaconda3\envs\tf-gpu\lib\site-packages\tensorflow\python\keras\callbacks_v1.py in on_batch_end(self, batch, logs)
360 self._total_batches_seen += 1
361 if self._is_profiling:
--> 362 profiler.save(self.log_dir, profiler.stop())
363 self._is_profiling = False
364 elif (not self._is_profiling and

~\Anaconda3\envs\tf-gpu\lib\site-packages\tensorflow\python\eager\profiler.py in save(logdir, result)
142 logdir, 'plugins', 'profile',
143 datetime.datetime.now().strftime('%Y-%m-%d_%H-%M-%S'))
--> 144 gfile.MakeDirs(plugin_dir)
145 maybe_create_event_file(logdir)
146 with gfile.Open(os.path.join(plugin_dir, 'local.trace'), 'wb') as f:

~\Anaconda3\envs\tf-gpu\lib\site-packages\tensorflow\python\lib\io\file_io.py in recursive_create_dir(dirname)
436 errors.OpError: If the operation fails.
437 """
--> 438 recursive_create_dir_v2(dirname)
439
440

~\Anaconda3\envs\tf-gpu\lib\site-packages\tensorflow\python\lib\io\file_io.py in recursive_create_dir_v2(path)
451 errors.OpError: If the operation fails.
452 """
--> 453 pywrap_tensorflow.RecursivelyCreateDir(compat.as_bytes(path))
454
455

NotFoundError: Failed to create a directory: ./logs/LeNet-MNIST-1\plugins\profile\2019-10-11_10-51-33; No such file or directory

Posted on 2019-10-11 05:26:27
Donald Kinghorn

Yes, it looks like it was getting blocked because of the links or something ...
The trouble you are having is most likely from not having the "logs" directory created before the call to TensorBoard. This happens often (to me too); there are some other comments with the same problem.

Check that your directories exist and that they are visible from the directory that you are starting your Jupyter notebook in --Don
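One way to guard against this in the notebook (using the "LeNet-MNIST-1" run name from the MNIST example) is to build the log path with os.path.join, so the separators stay consistent on Windows, and to create the directory before training starts:

```python
import os

# Build the TensorBoard log path with os.path.join so Windows doesn't end up
# with mixed "/" and "\" separators, and make sure the directory exists
# before the TensorBoard callback tries to write into it.
log_dir = os.path.join("logs", "LeNet-MNIST-1")
os.makedirs(log_dir, exist_ok=True)

print("log dir:", log_dir, "exists:", os.path.isdir(log_dir))
```

Then pass log_dir to the TensorBoard callback instead of a hand-built "./logs/..." string.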

Posted on 2019-10-11 14:53:00
Donald Kinghorn

Hi Pranab, I finally found the problem! Look at the most recent comment (I made changes in the post too) -- Best wishes --Don

Posted on 2019-10-16 17:11:53
Pranab Das

Thanks :-)

Posted on 2019-10-18 06:01:25
Pranab Das

While other things went well, the utilization of NVIDIA GPU remained 0%. Could you throw some light on how to address?

Posted on 2019-10-30 03:24:23

Where are you looking at GPU utilization? If that is in Task Manager, it is worth noting that the default GPU usage shown there is "3D" (the way that games and other graphics-intensive applications use the GPU). If you want to see CUDA usage instead, go to Task Manager -> Performance tab -> GPU (on the left side) -> select "Cuda" from one of the drop-down menus on the right side.

Posted on 2019-10-30 15:43:21
Pranab Das

Thanks :-)
The problem manifested in different ways. One is what you responded to. The other is the GPU running out of memory when I increase the number of deep neurons; at least that is how the terse message is interpreted by the TensorFlow community. I thought it should not happen with the NVIDIA GPU kicking in (after following your instructions). That the code is OK was verified by running it in a cloud env.

Posted on 2019-10-30 16:18:20
Donald Kinghorn

Running out of GPU mem is, unfortunately, common! [That's why some folks end up getting a Titan RTX with 24GB, but usually 12GB is OK for a lot of work; 2-4GB cards can't really do anything but small models.] One thing to try if you think you are getting an OOM (out of memory) error but shouldn't: sometimes code crashes can cause a memory leak in GPU mem in a Jupyter notebook. You can try shutting down the kernel and notebook and restarting the session.

Most cloud GPU instances will start up with a Tesla GPU with at least 16GB (and give your instance exclusive access to the GPU ... that's why they are kind of expensive)

Posted on 2019-10-30 16:39:03
Donald Kinghorn

Thanks William, that's a great tip!

Posted on 2019-10-30 16:28:38
Donald Kinghorn

try:
conda activate tf-gpu
python

import tensorflow as tf

tf.test.is_gpu_available()

tf.test.gpu_device_name()

There are several test functions in TensorFlow. Those two should show you information about your current TF env.

If you have made an env with TF 2.0 (that's now the default if you conda install tensorflow-gpu)
then you could play with this (you can change it to tf.device("/cpu:0") to see the difference in time):

import tensorflow as tf
import time

n = 1000
dtype = tf.float32
start = time.time()
with tf.device("/gpu:0"):
    a = tf.Variable(tf.ones((n, n), dtype=dtype))
    b = tf.Variable(tf.ones((n, n), dtype=dtype))
    c = tf.matmul(a, b)
    c_norm = tf.linalg.norm(c)

print('cnorm =', c_norm)
print('took', time.time() - start, 'sec')

Posted on 2019-10-30 16:19:11
Torben Andersen

I followed your instructions at the beginning of 2019 and the system is still working beautifully (thanks Donald!). I just got a paper published based on my work with the system. I now wish to upgrade to TensorFlow 2 but I am scared that I'll break something if I try to update/upgrade. Has anybody done this, and did it work? Any instructions?

Posted on 2019-10-11 06:28:36
Torben Andersen

My apologies for asking the above question. I thought that I had double-checked that it had not already been dealt with but I was wrong. It's explained in all detail. Again thanks for a wonderful instruction.

Posted on 2019-10-11 10:34:36
Donald Kinghorn

Hi Torben, I was going to add another comment related to this. There is now a problem with what I posted below: the Anaconda folks updated the build of TF 1.14 to be linked with CUDA 10.1. That will fail with the pip install of TF 2.0 because it is linked against CUDA 10.0.

You might be able to do the clone procedure in the post I linked below. But check to see what CUDA version you have in your existing TF env. From inside your existing TF env,

conda list

will show all the packages that are installed. Look for the cudatoolkit package; if it is version 10.0 then you can do the clone env and pip install of TF 2.0.
If you see version 10.1 then it will not work.

The Anaconda folks will likely have an official TF 2 package built soon ... I just checked and it's not there today.

Basically, something like the following in a new env will work too:
conda install tensorflow-gpu=1.13.1
pip install tensorflow-gpu==2.0

1.13.1 sets up the correct CUDA libs, and then the pip install for 2.0 will work

Posted on 2019-10-11 14:29:15
夏天成

Thanks for such a helpful article! It really helped me a lot, since plenty of the tutorials on the internet that I tried were not useful.
But I got a problem when I tried to run the code you posted in the article. After I execute "Create the LeNet-5 convolution neural network architecture", I got this warning below the cell:

"WARNING:tensorflow:From C:\Users\Jason\.conda\envs\tf-gpu\lib\site-packages\tensorflow\python\ops\init_ops.py:1251: calling VarianceScaling.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor"

I ignored this, but my training run errors out like this:

Train on 60000 samples, validate on 10000 samples
Epoch 1/15
128/60000 [..............................] - ETA: 8:56 - loss: 2.3053 - acc: 0.0703
---------------------------------------------------------------------------
NotFoundError Traceback (most recent call last)
<ipython-input-7-792ff921728a> in <module>
4
5 model.fit(train_images, train_labels, batch_size=128, epochs=15, verbose=1,
----> 6 validation_data=(test_images, test_labels), callbacks=[tensor_board])
7
8 print('Training took {} seconds'.format(time.time()-start_time))

~\.conda\envs\tf-gpu\lib\site-packages\tensorflow\python\keras\engine\training.py in fit(self, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, validation_freq, max_queue_size, workers, use_multiprocessing, **kwargs)
778 validation_steps=validation_steps,
779 validation_freq=validation_freq,
--> 780 steps_name='steps_per_epoch')
781
782 def evaluate(self,

~\.conda\envs\tf-gpu\lib\site-packages\tensorflow\python\keras\engine\training_arrays.py in model_iteration(model, inputs, targets, sample_weights, batch_size, epochs, verbose, callbacks, val_inputs, val_targets, val_sample_weights, shuffle, initial_epoch, steps_per_epoch, validation_steps, validation_freq, mode, validation_in_fit, prepared_feed_values_from_dataset, steps_name, **kwargs)
372 # Callbacks batch end.
373 batch_logs = cbks.make_logs(model, batch_logs, batch_outs, mode)
--> 374 callbacks._call_batch_hook(mode, 'end', batch_index, batch_logs)
375 progbar.on_batch_end(batch_index, batch_logs)
376

~\.conda\envs\tf-gpu\lib\site-packages\tensorflow\python\keras\callbacks.py in _call_batch_hook(self, mode, hook, batch, logs)
246 for callback in self.callbacks:
247 batch_hook = getattr(callback, hook_name)
--> 248 batch_hook(batch, logs)
249 self._delta_ts[hook_name].append(time.time() - t_before_callbacks)
250

~\.conda\envs\tf-gpu\lib\site-packages\tensorflow\python\keras\callbacks.py in on_train_batch_end(self, batch, logs)
529 """
530 # For backwards compatibility.
--> 531 self.on_batch_end(batch, logs=logs)
532
533 def on_test_batch_begin(self, batch, logs=None):

~\.conda\envs\tf-gpu\lib\site-packages\tensorflow\python\keras\callbacks_v1.py in on_batch_end(self, batch, logs)
360 self._total_batches_seen += 1
361 if self._is_profiling:
--> 362 profiler.save(self.log_dir, profiler.stop())
363 self._is_profiling = False
364 elif (not self._is_profiling and

~\.conda\envs\tf-gpu\lib\site-packages\tensorflow\python\eager\profiler.py in save(logdir, result)
142 logdir, 'plugins', 'profile',
143 datetime.datetime.now().strftime('%Y-%m-%d_%H-%M-%S'))
--> 144 gfile.MakeDirs(plugin_dir)
145 maybe_create_event_file(logdir)
146 with gfile.Open(os.path.join(plugin_dir, 'local.trace'), 'wb') as f:

~\.conda\envs\tf-gpu\lib\site-packages\tensorflow\python\lib\io\file_io.py in recursive_create_dir(dirname)
436 errors.OpError: If the operation fails.
437 """
--> 438 recursive_create_dir_v2(dirname)
439
440

~\.conda\envs\tf-gpu\lib\site-packages\tensorflow\python\lib\io\file_io.py in recursive_create_dir_v2(path)
451 errors.OpError: If the operation fails.
452 """
--> 453 pywrap_tensorflow.RecursivelyCreateDir(compat.as_bytes(path))
454
455

NotFoundError: Failed to create a directory: ./logs/LeNet-MNIST-1\plugins\profile\2019-10-16_13-44-58; No such file or directory

Any idea? Would appreciate your reply!

Posted on 2019-10-16 13:09:43
Donald Kinghorn

This has been coming up more often recently. You can ignore all of the message except the last line :-) What is happening is that TensorBoard is not able to write into the log directory.

I am going to try to reproduce the error and see if I can find what is going on...
**************************
OK Found it!
The older version (1.13.1) was able to use UNIX-like file paths on Windows, but it looks like version 1.14 does not.
You need to change this,
tensor_board = tf.keras.callbacks.TensorBoard('./logs/LeNet-MNIST-1')
to this,
tensor_board = tf.keras.callbacks.TensorBoard('.\logs\LeNet-MNIST-1')

I also noticed that you no longer need to create the directory beforehand, i.e. if the directory .\logs\LeNet-MNIST-1 doesn't exist when you start the job run it will be created automatically.
****************************
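One portable way to sidestep the forward-slash vs. back-slash issue entirely is to let os.path.join build the path for whatever platform you are on. A sketch, using the same log directory name as the post (the TensorBoard callback line is shown commented out so this runs without TensorFlow installed):

```python
import os

# os.path.join uses the platform's native separator, so the same
# line produces ".\logs\LeNet-MNIST-1" on Windows and
# "./logs/LeNet-MNIST-1" on Linux.
log_dir = os.path.join('.', 'logs', 'LeNet-MNIST-1')

# tensor_board = tf.keras.callbacks.TensorBoard(log_dir)
```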

I'll add a note to the post text! --dbk

Posted on 2019-10-16 17:09:52
夏天成

Hi Donald! Thanks for such a quick reply! My test session is running and I finally can use the GPU for my study! But I got a problem when I followed your last step to open TensorBoard: my browser just won't open the address for me. I tried many ways posted on the internet and still can't get to the TensorBoard page. Does anyone else face the same problem?

Posted on 2019-10-17 15:55:43
Donald Kinghorn

Yes! I just tried this too ... It works fine on Firefox and Edge but Chrome does not open the page! I use Firefox when I work with Jupyter notebooks so I didn't see this.

It may be because the local server that TensorBoard is running from is using http:// instead of https://, and Chrome is starting to block that!

I tried using localhost:6006 in Chrome and that did work OK. ... I added a note in the post, thanks! :-)

Posted on 2019-10-17 17:49:28
夏天成

Thanks a lot!

Posted on 2019-10-21 13:34:10
107united

Do we need to install Visual Studio as a prerequisite for CUDA and cuDNN?

Posted on 2019-11-11 11:22:03
Donald Kinghorn

No. This is the beauty of using Anaconda: you don't need to install CUDA. The needed libraries are installed along with tensorflow-gpu in the env that you create.
After you install tensorflow-gpu, do

conda list

that will show you all the packages that are in the env. You should see something like,
cudatoolkit 10.0.130 0
cudnn 7.6.0 cuda10.0_0
in the list.

Posted on 2019-11-11 18:13:20
Seth Barton

This is incredible. Thank you! Managing these libraries is tough to learn how to do, especially as a student. This article helped me a lot.

Posted on 2019-11-24 02:33:59
Md.Mamunur Rashid

Great article , helpful

Posted on 2019-12-24 13:41:04
Guillaume Carrier

You are my savior! I had so much trouble getting TensorFlow installed and this solved all my problems. There are many out there that have the same issues and you should be the reference for solving this. There is way too much confusing information on Google. Simply THANK YOU.

Posted on 2020-01-10 03:23:21
Jack

Thank you very much!!
One question though: why do we need to create a new kernel for TensorFlow ('tensorflow-gpu-1.14' in your case)? Can we just import tensorflow from the 'python 3' kernel? What is the difference?
Using conda tensorflow installation is MUCH easier!
Thank you again!

Posted on 2020-01-27 05:55:45
Donald Kinghorn

You are welcome :-) I make the new kernel for Jupyter to have the env available as a notebook startup choice.

You can install some version of TF into the default "base" python env and then load it from that default Py3 env/kernel. However, I recommend that you keep separate envs for different versions and functionally isolate the different environments. I typically have at least 2 versions of TF in different envs then just pick what I want from the notebook menu ... and often have multiple versioned notebook kernels open at the same time!

Think of the jupyter kernel as "activating" an env for the notebook

Posted on 2020-01-27 15:48:45
Jack

Thanks for the quick response. I didn't make myself very clear, or I definitely have some misunderstandings of Jupyter notebook.
What I mean is: after I created a separate environment called "tf" and installed TensorFlow along with all the packages I will use, I could just open the Jupyter notebook within the "tf" environment, choose python 3 as the kernel, and import tensorflow.

So my question is what is the advantage if I create a new kernel like you did instead of what I did.

Thanks,
Jack

Posted on 2020-01-27 20:17:22
Donald Kinghorn

OK got it ... the main advantage is the warm feeling you get from having your envs listed in the notebook "New" startup list :-) ...and you can switch envs from the kernel menu in a running notebook. Also, you can start a notebook from any env including "base" and then just pick the kernel you want when the notebook launches. So, there are some advantages but really it's a matter of preference

...there is certainly nothing wrong with doing what you are doing i.e. activating an env and starting a notebook from there. In fact I do that a lot myself

Posted on 2020-01-27 23:43:22

Thanks for the article, very helpful!! A note for newbies like me: don't run the update python command. If you update to Python 3.8 it doesn't support TensorFlow yet... so you'd need to backpedal and create a new environment. Thanks!!

Posted on 2020-02-14 17:24:33
Donald Kinghorn

Hi Keith, thanks for posting that! I haven't checked 3.8 against anything; I'm not surprised TF is broken against it.

Another thing to keep in mind is that creating environments is very powerful. You can specify versions of modules to load, including the base Python, and you can create as many envs as you want for using different versions. For example, right now that update would give you a default 3.8 Python, and the default tensorflow-gpu would be 2.0. If you want an env that matches what I did in this post, you could make the env like this (I'm using a long name just to make the versions clear):

conda create --name tf-gpu-1.14-py3.7 python=3.7 tensorflow-gpu=1.14

then conda activate tf-gpu-1.14-py3.7 will give you a similar env to what was originally in this post (I actually used TF 1.13.1 at first)

Posted on 2020-02-14 17:51:38

Excellent, thanks!!

Posted on 2020-02-14 18:11:29

Totally a noob question, but if my Python script says my NVIDIA card only has compute capability 1.3, and the minimum required CUDA capability is 3.0... does that mean my graphics card is useless for this? (NVIDIA Quadro FX 5800) Thanks!

Posted on 2020-02-14 20:24:50
Donald Kinghorn

Hi Keith, I didn't see this question the other day ... that GPU is pretty ancient :-) Here's a link to what those compute capability numbers refer to https://en.wikipedia.org/wi...

I would recommend at a minimum a GTX 970. Anything from there up will be nice for computing and should hold up to the load stress. A used 970 around $100 wouldn't be too bad. For a new card I would go with at least a GTX 1650. This post has a good comparison of performance from the 1660 and 1070 up:
https://www.pugetsystems.co...

Posted on 2020-02-24 17:17:56
Jay Couture

This is so straightforward that everything worked the first time. I have a small amount of experience with each of the technologies involved and am very thankful for you taking the time to write this and get my environment jump-started. Thank you!
-Jay

Posted on 2020-02-24 01:01:51
Donald Kinghorn

You are welcome :-)

Posted on 2020-02-24 17:18:10
Arnan Yasamorn

Thank you for such a detailed article!

I'll try installing with an RTX 2060.

Posted on 2020-03-08 07:32:48
pranjal saxena

Thank you for your kind document. It is my favorite document every time I purchase a new machine or install a fresh Windows. :)

Posted on 2020-03-17 02:48:14
Mariana Abuhattoum

Hello Donald,
I just do exactly the same steps, and I got an exception in the 5th part.

(tensorflow-od) PS C:\Users\maria> python
Python 3.7.7 (default, Mar 23 2020, 23:19:08) [MSC v.1916 64 bit (AMD64)] :: Anaconda, Inc. on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
2020-04-02 16:28:52.264978: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
>>> import object_detection
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'object_detection'
>>> tf.enable_eager_execution()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: module 'tensorflow' has no attribute 'enable_eager_execution'
>>> tf.enable_eager_execution()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: module 'tensorflow' has no attribute 'enable_eager_execution'
>>> print( tf.constant('Hello from TensorFlow ' + tf.__version__) )
2020-04-02 16:31:36.312726: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library nvcuda.dll
2020-04-02 16:31:37.148821: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties:
pciBusID: 0000:04:00.0 name: GeForce GT 740M computeCapability: 3.5
coreClock: 1.0325GHz coreCount: 2 deviceMemorySize: 2.00GiB deviceMemoryBandwidth: 11.92GiB/s
2020-04-02 16:31:37.158607: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
2020-04-02 16:31:37.180255: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_10.dll
2020-04-02 16:31:37.199985: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cufft64_10.dll
2020-04-02 16:31:37.209568: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library curand64_10.dll
2020-04-02 16:31:37.230298: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusolver64_10.dll
2020-04-02 16:31:37.244872: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusparse64_10.dll
2020-04-02 16:31:37.275412: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-04-02 16:31:37.287370: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
2020-04-02 16:31:37.292837: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX AVX2
2020-04-02 16:31:37.306204: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties:
pciBusID: 0000:04:00.0 name: GeForce GT 740M computeCapability: 3.5
coreClock: 1.0325GHz coreCount: 2 deviceMemorySize: 2.00GiB deviceMemoryBandwidth: 11.92GiB/s
2020-04-02 16:31:37.318547: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
2020-04-02 16:31:37.324120: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_10.dll
2020-04-02 16:31:37.332073: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cufft64_10.dll
2020-04-02 16:31:37.338403: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library curand64_10.dll
2020-04-02 16:31:37.344495: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusolver64_10.dll
2020-04-02 16:31:37.352889: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusparse64_10.dll
2020-04-02 16:31:37.358552: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-04-02 16:31:37.372676: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Users\maria\anaconda3\envs\tensorflow-od\lib\site-packages\tensorflow_core\python\framework\constant_op.py", line 258, in constant
allow_broadcast=True)
File "C:\Users\maria\anaconda3\envs\tensorflow-od\lib\site-packages\tensorflow_core\python\framework\constant_op.py", line 266, in _constant_impl
t = convert_to_eager_tensor(value, ctx, dtype)
File "C:\Users\maria\anaconda3\envs\tensorflow-od\lib\site-packages\tensorflow_core\python\framework\constant_op.py", line 95, in convert_to_eager_tensor
ctx.ensure_initialized()
File "C:\Users\maria\anaconda3\envs\tensorflow-od\lib\site-packages\tensorflow_core\python\eager\context.py", line 509, in ensure_initialized
context_handle = pywrap_tensorflow.TFE_NewContext(opts)
tensorflow.python.framework.errors_impl.InternalError: cudaGetDevice() failed. Status: CUDA driver version is insufficient for CUDA runtime version
>>>

Do you think I should delete CUDA?

https://uploads.disquscdn.c...

Posted on 2020-04-02 13:41:41
Mariana Abuhattoum

did*

Posted on 2020-04-02 13:42:07
Donald Kinghorn

You need to do Step 1.

Your driver is way too old. The CUDA runtime environment is provided by the NVIDIA driver ... not the dev libraries that get installed when you install tensorflow-gpu. The dev libs that TensorFlow is linking to are (mostly) up to date, version 10.1 (within the last year). Your driver is several years old, with a CUDA 9.0 runtime.

If you look at those messages you will see
2020-04-02 16:31:37.158607: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
That "101" part at the end is referring to CUDA 10.1.
In your screenshot your driver is showing CUDA 9.0.176.

Version conflicts between linked libraries (DLLs) are one of the biggest problems you run into with development code.

If you install the newest NVIDIA driver you will probably end up with a reported CUDA version of 10.2. That's OK, since it will cover older code.

Welcome to scientific software development :-) Everyone doing this kind of work runs into problems like this at some point :-)

Posted on 2020-04-02 15:02:39
Mariana Abuhattoum

thanks a lot Donald that was helpful. i'll consider that :) :)

Posted on 2020-04-02 17:54:57
Eric White

Hi Donald, this is gold, thank you!

I end up with this error, which I couldn't solve by searching the entire internet. They all say it is about the log folder; I've tried every way: full path, os.path.join, reversed slashes, etc.

Please help.

Train on 60000 samples, validate on 10000 samples
Epoch 1/15
WARNING:tensorflow:Trace already enabled
128/60000 [..............................] - ETA: 7s - loss: 1.4812 - accuracy: 0.6562
---------------------------------------------------------------------------
ProfilerNotRunningError Traceback (most recent call last)
~\anaconda3\envs\tf\lib\site-packages\tensorflow_core\python\keras\engine\training_v2.py in on_start(self, model, callbacks, use_samples, verbose, mode)
752 try:
--> 753 yield
754 model._successful_loop_finish = True

~\anaconda3\envs\tf\lib\site-packages\tensorflow_core\python\keras\engine\training_v2.py in fit(self, model, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, validation_freq, max_queue_size, workers, use_multiprocessing, **kwargs)
341 training_context=training_context,
--> 342 total_epochs=epochs)
343 cbks.make_logs(model, epoch_logs, training_result, ModeKeys.TRAIN)

~\anaconda3\envs\tf\lib\site-packages\tensorflow_core\python\keras\engine\training_v2.py in run_one_epoch(model, iterator, execution_function, dataset_size, batch_size, strategy, steps_per_epoch, num_samples, mode, training_context, total_epochs)
180 cbks.make_logs(model, batch_logs, batch_outs, mode)
--> 181 step += 1
182

~\anaconda3\envs\tf\lib\contextlib.py in __exit__(self, type, value, traceback)
118 try:
--> 119 next(self.gen)
120 except StopIteration:

~\anaconda3\envs\tf\lib\site-packages\tensorflow_core\python\keras\engine\training_v2.py in on_batch(self, step, mode, size)
787 self.callbacks._call_batch_hook(
--> 788 mode, 'end', step, batch_logs)
789 self.progbar.on_batch_end(step, batch_logs)

~\anaconda3\envs\tf\lib\site-packages\tensorflow_core\python\keras\callbacks.py in _call_batch_hook(self, mode, hook, batch, logs)
238 batch_hook = getattr(callback, hook_name)
--> 239 batch_hook(batch, logs)
240 self._delta_ts[hook_name].append(time.time() - t_before_callbacks)

~\anaconda3\envs\tf\lib\site-packages\tensorflow_core\python\keras\callbacks.py in on_train_batch_end(self, batch, logs)
1694 if self._is_tracing:
-> 1695 self._log_trace()
1696 elif (not self._is_tracing and

~\anaconda3\envs\tf\lib\site-packages\tensorflow_core\python\keras\callbacks.py in _log_trace(self)
1746 step=step,
-> 1747 profiler_outdir=os.path.join(self._log_write_dir, 'train'))
1748 self._is_tracing = False

~\anaconda3\envs\tf\lib\site-packages\tensorflow_core\python\ops\summary_ops_v2.py in trace_export(name, step, profiler_outdir)
1239 if profiler:
-> 1240 _profiler.save(profiler_outdir, _profiler.stop())
1241

~\anaconda3\envs\tf\lib\site-packages\tensorflow_core\python\eager\profiler.py in stop()
98 raise ProfilerNotRunningError(
---> 99 'Cannot stop profiling. No profiler is running.')
100 if context.default_execution_mode == context.EAGER_MODE:

ProfilerNotRunningError: Cannot stop profiling. No profiler is running.

During handling of the above exception, another exception occurred:

ProfilerNotRunningError Traceback (most recent call last)
<ipython-input-16-792ff921728a> in <module>
4
5 model.fit(train_images, train_labels, batch_size=128, epochs=15, verbose=1,
----> 6 validation_data=(test_images, test_labels), callbacks=[tensor_board])
7
8 print('Training took {} seconds'.format(time.time()-start_time))

~\anaconda3\envs\tf\lib\site-packages\tensorflow_core\python\keras\engine\training.py in fit(self, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, validation_freq, max_queue_size, workers, use_multiprocessing, **kwargs)
817 max_queue_size=max_queue_size,
818 workers=workers,
--> 819 use_multiprocessing=use_multiprocessing)
820
821 def evaluate(self,

~\anaconda3\envs\tf\lib\site-packages\tensorflow_core\python\keras\engine\training_v2.py in fit(self, model, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, validation_freq, max_queue_size, workers, use_multiprocessing, **kwargs)
395 total_epochs=1)
396 cbks.make_logs(model, epoch_logs, eval_result, ModeKeys.TEST,
--> 397 prefix='val_')
398
399 return model.history

~\anaconda3\envs\tf\lib\contextlib.py in __exit__(self, type, value, traceback)
128 value = type()
129 try:
--> 130 self.gen.throw(type, value, traceback)
131 except StopIteration as exc:
132 # Suppress StopIteration *unless* it's the same exception that

~\anaconda3\envs\tf\lib\site-packages\tensorflow_core\python\keras\engine\training_v2.py in on_start(self, model, callbacks, use_samples, verbose, mode)
755 finally:
756 # End of all epochs
--> 757 self.callbacks._call_end_hook(mode)
758
759 @tf_contextlib.contextmanager

~\anaconda3\envs\tf\lib\site-packages\tensorflow_core\python\keras\callbacks.py in _call_end_hook(self, mode)
260 """Helper function for on_{train|test|predict}_end methods."""
261 if mode == ModeKeys.TRAIN:
--> 262 self.on_train_end()
263 elif mode == ModeKeys.TEST:
264 self.on_test_end()

~\anaconda3\envs\tf\lib\site-packages\tensorflow_core\python\keras\callbacks.py in on_train_end(self, logs)
377 """
378 for callback in self.callbacks:
--> 379 callback.on_train_end(logs)
380
381 def on_test_begin(self, logs=None):

~\anaconda3\envs\tf\lib\site-packages\tensorflow_core\python\keras\callbacks.py in on_train_end(self, logs)
1718 def on_train_end(self, logs=None):
1719 if self._is_tracing:
-> 1720 self._log_trace()
1721 self._close_writers()
1722

~\anaconda3\envs\tf\lib\site-packages\tensorflow_core\python\keras\callbacks.py in _log_trace(self)
1745 name='batch_%d' % step,
1746 step=step,
-> 1747 profiler_outdir=os.path.join(self._log_write_dir, 'train'))
1748 self._is_tracing = False
1749

~\anaconda3\envs\tf\lib\site-packages\tensorflow_core\python\ops\summary_ops_v2.py in trace_export(name, step, profiler_outdir)
1238
1239 if profiler:
-> 1240 _profiler.save(profiler_outdir, _profiler.stop())
1241
1242 trace_off()

~\anaconda3\envs\tf\lib\site-packages\tensorflow_core\python\eager\profiler.py in stop()
97 if _profiler is None:
98 raise ProfilerNotRunningError(
---> 99 'Cannot stop profiling. No profiler is running.')
100 if context.default_execution_mode == context.EAGER_MODE:
101 context.context().executor.wait()

ProfilerNotRunningError: Cannot stop profiling. No profiler is running.

Posted on 2020-04-11 18:39:18
fabrice saadoun

Don,
Thanks a lot for this tutorial. The clearest one I met until now.

Just 2 errors that occurred during the tutorial:
- tf.enable_eager_execution() returned an error
- tensor_board = tf.keras.callbacks.TensorBoard('.\logs\LeNet-MNIST-1') also returned an error

But it does not impact on the example.
Thanks again!

Posted on 2020-04-12 12:05:45
Donald Kinghorn

I have been wanting to update this post to reflect changes in the new TensorFlow 2. "eager" is default now and Keras probably has a bunch of changes that I don't know about. It's tightly integrated now ... I just haven't been back to this for a while ... been working on a project that's taking a lot of time and I haven't been doing any science! Looking forward to getting back to it :-)

Posted on 2020-04-13 16:04:18
sai vardhan reddy

thanks bro it helped me a lot and you have got a new subscriber

Posted on 2020-05-13 11:47:58
Donald Kinghorn

:-) you are welcome my friend!

Posted on 2020-05-14 02:02:23
Chuck Schultz

Dr Kinghorn - I got this example and the Linux version to work some time ago. Now I am going back over that work and I have a question: where are the 60,000 images physically located in this example? I have a CNN algorithm with a dataset on my workstation that I am trying to set up to run, and the novice in me is a little lost. Thanks.

Chuck Schultz

Posted on 2020-05-27 01:49:43
Donald Kinghorn

The data is getting pulled down by this nifty Keras utility function and then read into the variable mnist (it's not a very big dataset so it easily fits in memory):
mnist = tf.keras.datasets.mnist

(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

The image data and labels are in those variables, split up nicely by that "load_data" function.

You can get the raw MNIST from Yann LeCun's site here http://yann.lecun.com/exdb/... The dataset is also available from Kaggle and probably other places too. Kaggle has loads of datasets!

Posted on 2020-05-27 15:17:43
Chuck Schultz

Sir

Thanks, I suspected the dataset was in the cloud. What I am looking to do is have the dataset on my workstation and run the example exclusively on my workstation. In truth, I am trying to run a somewhat similar CNN imaging algorithm for a different and harder problem. I guess I am going to have to ponder this to see if I can figure it out...

Chuck Schultz

Posted on 2020-05-28 00:39:34
Donald Kinghorn

You can certainly download the datasets and use them from your local system directly. That Keras function is just a convenience utility for some small datasets. You can get those files directly from Yann LeCun's site. He was one of the original researchers for that work.

And be sure to check out Kaggle https://www.kaggle.com/ you can find and download lots of datasets from there. It's a treasure trove really... You may find something closer to what you are thinking about working on. Maybe something that can supplement the data you are planning on using to give you a larger training set??

I'll warn you up front that preparing datasets for use in training can be one of the most difficult parts of doing ML/AI in the real world. Kaggle is again a good resource for a lot of practical advice and code snippets for all aspects of the field. But, yes, data prep is hard, and it's not really covered well in documentation and instructional material because it's not as "glamorous" as doing a training run or inference. Expect some challenges and be persistent :-)
Best wishes --Don

Posted on 2020-05-28 14:48:27
Aryan Jain

Please help me!! It has been 4 days. I have followed every step given above but still no luck!

(tf-gpu) C:\Users\aryan>python
Python 3.7.7 (default, May 6 2020, 11:45:54) [MSC v.1916 64 bit (AMD64)] :: Anaconda, Inc. on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow
Traceback (most recent call last):
  File "C:\Users\aryan\AppData\Roaming\Python\Python37\site-packages\tensorflow\python\pywrap_tensorflow.py", line 58, in <module>
    from tensorflow.python.pywrap_tensorflow_internal import *
  File "C:\Users\aryan\AppData\Roaming\Python\Python37\site-packages\tensorflow\python\pywrap_tensorflow_internal.py", line 28, in <module>
    _pywrap_tensorflow_internal = swig_import_helper()
  File "C:\Users\aryan\AppData\Roaming\Python\Python37\site-packages\tensorflow\python\pywrap_tensorflow_internal.py", line 24, in swig_import_helper
    _mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
  File "C:\Users\aryan\anaconda3\envs\tf-gpu\lib\imp.py", line 242, in load_module
    return load_dynamic(name, filename, file)
  File "C:\Users\aryan\anaconda3\envs\tf-gpu\lib\imp.py", line 342, in load_dynamic
    return _load(spec)
ImportError: DLL load failed: The specified module could not be found.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\aryan\AppData\Roaming\Python\Python37\site-packages\tensorflow\__init__.py", line 41, in <module>
    from tensorflow.python.tools import module_util as _module_util
  File "C:\Users\aryan\AppData\Roaming\Python\Python37\site-packages\tensorflow\python\__init__.py", line 50, in <module>
    from tensorflow.python import pywrap_tensorflow
  File "C:\Users\aryan\AppData\Roaming\Python\Python37\site-packages\tensorflow\python\pywrap_tensorflow.py", line 69, in <module>
    raise ImportError(msg)
ImportError: Traceback (most recent call last):
  (the inner traceback above repeats verbatim)
ImportError: DLL load failed: The specified module could not be found.

It constantly shows me this import error! It worked once, but I deleted that env because I wanted to change the TF version. Now it does not work at all!

Posted on 2020-05-29 14:54:12
Donald Kinghorn

It looks like the problem is that you have multiple Python installs. ... maybe you followed instructions from Google? ... (and that's not good)
Looking in the TensorFlow issues on GitHub I see that there are LOTS of people having this problem ...

It's basically a PATH issue. (maybe PYTHONPATH)

Here is the hint of what's going on;
"C:\Users\aryan\AppData\Roaming\Python\Python37\site-packages\tensorflow\python\pywrap_tensorflow.py"
Look at the path to that file. It is not part of Anaconda Python. Most everything for Anaconda stays in your C:\Users\aryan\Anaconda3 directory.

The directory that has what you are trying to run is C:\Users\aryan\Anaconda3\envs\tf-gpu\site-packages\

It looks like you installed Python outside of Anaconda (i.e. from python.org) and then pip installed TensorFlow with that. It has inserted itself into some part of your PATH. "C:\Users\aryan\anaconda3\envs\tf-gpu\lib\imp.py" is being called inside your conda env and is finding/getting the wrong pywrap_tensorflow.py and then throwing that DLL error ... perhaps a bigger question is why is that happening??? imp.py should be finding its own local env files first!

The easiest thing to do is uninstall the other "Python" and just use Anaconda3
[ dev environments on Windows can be particularly difficult to manage]

It's possible to have multiple Python installs on Windows but it is difficult to keep things straight and working correctly. If you want to try to keep multiple versions then you need to become very good friends with your Windows environment variables ... that's not fun! ... but, take a look at your system/user environment variables before you make changes and see if you can spot something that looks like it may have caused the trouble.
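As a quick sanity check (just a sketch, nothing Anaconda-specific), you can ask Python itself which interpreter is running and where "import tensorflow" would come from:

```python
# Print which Python is running and where tensorflow would be imported from.
# If the origin path is under AppData\Roaming instead of anaconda3\envs\tf-gpu,
# a second Python install is shadowing the conda env.
import sys
import importlib.util

print(sys.executable)
spec = importlib.util.find_spec("tensorflow")
print(spec.origin if spec else "tensorflow not found on sys.path")
```

Run that inside the activated env; the two paths printed should both point into the env directory.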

Check all of this out and post again if you are still having trouble ... if it is actually an Anaconda problem then I'd really like to know! ... I'll help you fix it!

Posted on 2020-05-29 16:24:48
Aryan Jain

Thank you so so so so much! It works like a charm!
I deleted the older version of Python.
One last thing though: is there a way to keep track of the GPU? I mean, the processes that are using it and so on. Thank you.

Posted on 2020-05-29 18:16:48
Donald Kinghorn

You are welcome :-) I'm glad that took care of it ...

My preferred way to watch the GPU is with nvidia-smi (I use that a lot on Linux). It's not on the Windows PATH, but you can find it here:
C:\Program Files\NVIDIA Corporation\NVSMI\nvidia-smi.exe
You would need to run that from PowerShell or CMD

If you run it as nvidia-smi -l it will loop continuously; use -h to see the help message. It has a lot of options and is very powerful, i.e. be careful with it :-) ... you can change things about your GPU with it ...

Another nice option that has a graphical interface is GPU-Z https://www.techpowerup.com...

Posted on 2020-05-29 23:06:10
Aryan Jain

Thank you!

https://media0.giphy.com/me...

Posted on 2020-05-30 07:06:18
Oscar Cañón

Thank you very much!

Posted on 2020-05-30 00:38:09
Dhananjaya (Jay)

Thank you Don; luckily I visited your blog, and it has helped me install and set up TensorFlow-GPU effortlessly.
Have you written any posts on WSL on Windows 10? I'd love to read them.

Thanks.

Posted on 2020-06-12 04:37:13
Donald Kinghorn

I'm glad it was helpful!

I use WSL2 and have been for some time. It is now official in the 2004 Win10 release. The performance is much better than WSL1. I bumped my "Win10 insider" system to "Fast ring" and I'm hoping to get to do some testing with GPU compute access from WSL2! (this month I hope!) I have run docker and NVIDIA's enroot on WSL2 including installing NV drivers ... just need one more piece of the puzzle and MS and NVIDIA are working on that right now. I'm hoping to be able to install a full ML/AI environment on WSL2 using JupyterHub. I have worked up a nice setup for Linux servers but it should work on WSL also once a couple more WSL updates happen.

WSL2 still needs (for me) GPU compute access and systemd init capability. After that it should be a game changer.

I'll be writing about this soon. Best wishes --Don

Posted on 2020-06-12 15:34:29
Michael Nix

So this is only for NVIDIA users? Ff sakes man. I'm following a tutorial from YouTube about using Spleeter and I get this at the end of a mile-long error: "Failed to load the native TensorFlow runtimes." Guess the YouTuber forgot to mention that part.

Posted on 2020-06-24 22:56:58
Donald Kinghorn

Oh man, yes, they should have mentioned that! Spleeter looks really cool. I'll have to check that out. I'm a musician too :-)

CUDA is an NVIDIA-only thing. It's really amazing what they did with that whole "ecosystem" for developers. It's thanks to NVIDIA that so much of the ML/AI development of the last several years has happened. Most of the ML/AI stuff that has GPU acceleration uses CUDA under the hood. Having an NVIDIA GPU for compute is pretty much a must-have. If you are thinking about getting one and budget is tight, an RTX 2070 Super gives a lot of compute capability for the cost. Any older 1070 or better GPUs are still really usable too.

AMD GPUs are great hardware and very capable of high-performance compute, but the developer environment never really happened. OpenCL is just not as good to work with as CUDA.

Microsoft is doing some really cool stuff that will level the playing field, but it won't be mainstream for a bit. They have developed DirectML, which uses DirectX under the hood. Google has ported a version of TensorFlow to run on it. I will probably try it. It's in bleeding-edge insider Win10 now (Developer channel, fast ring). Since it's using DirectX at its core it should work with all DirectX 12 capable GPUs. This stuff probably won't be ready for release until fall at the earliest though.

My best advice for you is to try to pickup a decent NVIDIA GPU if you can. That is where all the development work gets done so you will have a lot better chance of getting things working. Take care --Don

Posted on 2020-06-25 01:41:56
Donald Kinghorn

Just noticed some other stuff for Spleeter ... I would check out the Colab setup they did to try it. Also, you can do this without a GPU. It should work fine on CPU. Install Anaconda Python and then you can make an env like I describe in this post but use the CPU version of TensorFlow (in fact they have tensorflow==1.15.2 in their requirements.txt file; that is the CPU version of the newer TF 1.x release).

I think Spleeter will pull in all of its dependencies during install.

conda create -n spleeter -c conda-forge spleeter

then

conda activate spleeter

and try it out!

Posted on 2020-06-25 02:02:33
petec2

<repeat the usual: OMFG, this article is literally the best thing ever. We should delete all other DIY articles that are not this clear, concise and logical>

BUT
Windows 10 (May 2004 build; seriously, 2004? You don't have anyone with working brain cells to point out "maybe we should skip build numbers all the way to 3000"?)
Followed your instructions religiously from a clean install.

import tensorflow as tf
Got the << ImportError: DLL load failed: The specified module could not be found. >>
While it was trying to load the '_pywrap_tensorflow_internal' pyd file.

Searches led me here, and it still failed.
<more head banging>
downloaded cudatoolkit and cudnn, installed cudatoolkit, copied cudnn to the cuda dir
and

WTF
import worked
ran the rest of your example and ... SHOCKING ... it worked
I could not get tensorboard to work and that is yet another error msg for another day

thanks a billion

Posted on 2020-07-10 02:12:36
Donald Kinghorn

Ha ha :-) You are welcome! This post is in serious need of a refresh too. The principles are sound but entropy has a tendency to make things rot over time.
There are multiple ways to make things work and sometimes things can be fragile ... and break ... More often than not it comes down to issues with PATH ... It's a developer's nightmare sometimes and it's been that way for decades.

This problem with the DLL is really concerning. You are the 2nd person to bring this to my attention. It really got my attention because you started from a clean install. I'm almost at a break point on a big project and I'm anxious to get back to fixing some of these "useful" posts :-)

I'm glad you got stuff working! (tensorboard stuff is rolled into TensorFlow 2 now and I haven't messed with it for a while ...)
Best wishes --Don

Posted on 2020-07-10 15:14:35
Eric

Hello, good article but sadly it wont work for me.

I managed to replicate all your stuff but when I want to start training I will get the following error:
UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[node sequential/conv2d/Conv2D (defined at <ipython-input-7-792ff921728a>:6) ]] [Op:__inference_distributed_function_1035]

conda list cudnn in the prompt shell gives me:
Name: cudnn

Version: 7.6.5
Build: cuda10.1_0
my installed tensorflow version is 2.1.0

I had installed this CUDA version on my own before trying this tutorial. Is this a problem?

Thanks in advance!

Posted on 2020-07-31 12:32:52
Donald Kinghorn

This post definitely needs a refresh to see if there are any new gotchas. In principle though it should all still be valid.
I think you are on the right track for troubleshooting. It looks like a PATH problem. If you had installed CUDA and cuDNN natively then runtime links may still be looking for that. Check your environment variables to see if you have the old install still in there.
I have anaconda on the Win 10 laptop I'm using right now ... I'll check if I can create a new env and fire up TF2 ...
This system was a bit out of date so I did:
conda update conda
conda update --all

conda create --name tf2new-gpu tensorflow-gpu ipykernel
conda activate tf2new-gpu
python -m ipykernel install --user --name tf2new-gpu --display-name "TensorFlow-GPU-2.1"

jupyter notebook

ran a MNIST training example with GPU ... worked like a charm

So, yes, I think you are hitting a path problem from the old install.
Best wishes --Don

Posted on 2020-07-31 23:48:29
Eric

Hello Don , thanks for the very fast response!
You are right, your post is still valid and shows the best guide to get started with Tensorflow and explore its possibilities!
After I wrote you yesterday I created a conda environment using TF 1.14 and CUDA 9.0 and this worked out fine for me with the MNIST example.

Now I checked my environment variables and there is no PATH entry for CUDA anymore. The only thing I'm wondering about is that when I start nvidia-smi.exe it shows "CUDA 10.2" in the top right corner. Could I get in trouble with this? It's still there even after uninstalling all the NVIDIA packages that came along with CUDA.

Good news at the end: I just did the same (or I think so) as yesterday with the TF 2.1 env and it worked out fine now. No warnings or errors in the MNIST script and the GPU was used for training as well! Seems like always: the biggest problem is in front of the machine.

Finally I have a couple of questions at the end:
- Do I have to activate the conda environment every time before running the notebook script?
- There should always be just one kernel running, right?

- I want to use this environment in my PyCharm IDE to do a bigger YOLO object detection project; is there anything I really have to watch out for, or should it be plug and play after the environment is set up correctly in Anaconda?

Again, thanks so much for your help and this article! I had a lot of frustrating days before I found this guidance.

Best regards

Eric

Posted on 2020-08-01 09:56:15
Donald Kinghorn

:-) Glad you got everything working!

First, the CUDA 10.2 that you see from nvidia-smi refers to the runtime that is part of the NVIDIA driver install. It's independent of the libraries that are installed in the conda envs.

You don't necessarily have to activate the env before the notebook but I usually do. You can change the "kernel" that you are using in the notebook interface.

You can have multiple kernels running but a BIG warning! TensorFlow is annoying sometimes because it does not release GPU memory until you shut down the kernel that you were using. If you were doing work and had a training job running that used up a lot of GPU memory and left that kernel running ... then started up another notebook for some other work, the GPU memory will still be full. I'm sure you will see what I mean at some point. PyTorch has a way to release the GPU memory without stopping the kernel (thus preserving the variable values defined in the notebook in CPU memory). You will be able to find a good workflow but it is a bit of a nuisance sometimes --Don
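A small sketch, assuming TF 2.x: this doesn't make TF release memory either, but enabling "memory growth" before any op runs at least keeps it from pre-allocating the whole card, so a second kernel has a chance:

```python
# Ask TensorFlow to allocate GPU memory on demand instead of grabbing it all
# up front. This must run before any op touches the GPU. Assumes TF 2.x.
import tensorflow as tf

for gpu in tf.config.list_physical_devices("GPU"):
    tf.config.experimental.set_memory_growth(gpu, True)
```

Put it at the top of the notebook, before building any model.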

Posted on 2020-08-03 18:08:26
manontoilet

Donald, I appreciate your share to deal with the installation of tensorflow gpu in simple and clean way, it works for me.
I recently noticed that TF 2.3 was released but conda on Windows only supports 2.1; it seems the only disadvantage of a conda install is the delay in getting new versions.
Since TF is released for pip install first, it takes some work to get the conda install package right? Do you know any way to do the upgrade instead of waiting for the conda install to become available?
Thank you

Posted on 2020-08-21 17:14:34
Donald Kinghorn

Yes, there is a lag from release to conda packaging. And it's worse for Windows (usually 1 release behind Linux) ... let me check some things ... I think they are still linking against CUDA 10.1 so the dependencies from 2.1 may still be OK.

An approach like I did last year for TF 2-beta might work. I cloned an existing TF env and pip installed the new TF on top of it. It worked :-)
https://www.pugetsystems.co...

Dang! I messed around with this for a while ... too long :-) I could not get it working right. If I get more time I'll try again and post back --Don

Posted on 2020-08-21 21:36:05
Camila Ribeiro

Thank you so much for the tutorial! It really helped me out.
One further question: is there any way to run this through Visual Studio Code? I would love to continue using VS Code instead of the Jupyter notebook in the browser, but cannot do that with the GPU support. I do have the option in VS Code to choose the kernel I created (TensorFlow-GPU-1.13), but even with this option on, it does not use the GPU.
Thank you!

Posted on 2020-08-26 09:45:21
Donald Kinghorn

That is a great question! I really like vscode too. Yesterday I was thinking I should try out the updated "notebook" features in the python extension. I haven't really explored the extension yet.
I'm really curious about what the problem could be. It seems that if the env is activated it should work because the needed GPU/cuda stuff is in the env.

OK, I just checked this out. I see what you mean! It loads the env and activates it but shows no available GPU devices. If I open a PowerShell terminal in vscode I can activate the env and running tf.config.list_physical_devices('GPU') shows the GPU, but from the notebook in vscode the result is an empty list.

This looks like a bug. I'll report it as an issue on github and see if I get a response. This will be great if we can get it working right :-)

Posted on 2020-08-26 19:17:49
Donald Kinghorn

OK, now it's working! I was writing the issue report and tried it again and it worked right :-) I tested this on a laptop, so I added vscode as a program for GPU acceleration in the NVIDIA control panel.

If you are using a laptop right click on your desktop and open "NVIDIA Control Panel" then go to Manage 3D settings --> Program Settings click add and add vscode and then set it to use NV GPU

https://uploads.disquscdn.c...

You might need to restart Windows?? ... and for sure restart vscode.

If you are not using a laptop let me know --Don

Posted on 2020-08-26 21:03:40
Yayosawa

Thank you so much, friend! You gave me the solution I needed. It had taken me a lot of work and time installing and reinstalling versions of CUDA and cuDNN, and with this post my environment finally ended up perfect and clean. Greetings from Chile!

Posted on 2020-09-08 02:06:37
Donald Kinghorn

You're welcome, my friend :-)

Posted on 2020-09-08 16:01:59
Sehej Bakshi

Hello sir!

I followed your tutorial to the last word and it works like a charm!
Thank You!

Posted on 2020-09-14 17:53:41