TensorFlow Installation CPU version



Installing TensorFlow can be easy or hard depending on what you want to achieve. A CPU-only install is relatively simple to do in several different ways. However, if you want GPU acceleration (and yes, you will want GPU acceleration!) it can be a problem to get a TensorFlow install that is "the way you want it"! In this post I’m going to look at getting a basic CPU version of TensorFlow installed and running with "standard" Python and Anaconda Python.

In later posts I will go through installs with GPU acceleration, installs on Windows 10, and a "good" GPU-accelerated install for Anaconda Python. I will also do a post on using TensorFlow with NVIDIA Docker (which is the easiest way to use TensorFlow). For most of those posts I will likely be recompiling TensorFlow from source to get it "the way I want it"!


Why start with the CPU version

The basic problem of installing TensorFlow with CUDA support … Dependencies!

When it comes to installing GPU-accelerated TensorFlow, the following clip from a post I wrote about the "motivation for using NVIDIA Docker" applies,

"Must be able to (handle, fix, maintain), + (library, configuration, version, environment) + ( nightmare, hell)!"

OK, it’s not that bad … but it kind of is. As a teaser, using Docker is the most well supported and easiest way to use TensorFlow but it may not be "what you want".

The TensorFlow developers do a great job! The documentation is very good. The install instructions on their site are clear and accurate. They provide pre-built binary images that are easy to install for Python. However, there are dependencies. If you want GPU support you have to have the appropriate CUDA libraries available on your system.

As of this writing the TensorFlow binary (stable) images with CUDA support are linked to an "out-of-date" CUDA install. The current CUDA install is version 9.1 and the TensorFlow binary builds require 9.0.

Things like that are always the "basic problem" with big research-oriented software projects. If you try to do a "local" install on your system you risk breaking other programs that have different requirements/dependencies. It is unfortunately easy to make a mess on your system.

I will come back to this issue and get a good GPU accelerated TensorFlow (local) install method that does not require you to "make a mess" on your system. … I promise!


Analysis of the TensorFlow install documentation

Let’s look at the official install documentation. The Python API is the primary way to use TensorFlow. There are APIs for Java, C and Go, but they are for "deployment" of TensorFlow models, not development work. For Python there are 4 guides listed,

  • Installing TensorFlow for Ubuntu — This is the main environment for use of TensorFlow.
  • Installing TensorFlow on macOS — I don’t care about this since Macs don’t use NVIDIA GPUs, i.e. no CUDA. … and I don’t have a MacBook (even though sometimes I wish I did.)
  • Installing TensorFlow on Windows — This I’ll look at in a separate post.
  • Installing TensorFlow from Sources — I will do this in later posts to get "what I want" in a build.

CPU or GPU version

The next thing they say in the install documentation is essentially,

CPU == easy!, GPU == hard!

They recommend you install the CPU-only version even if you want the GPU version. Why? Dependencies! As I mentioned above, the current GPU version needs runtime access to CUDA 9.0 libraries. It also needs version 7.0 of cuDNN … and it needs cuda-command-line-tools to get the "cupti" libs, which look like they may have version dependencies too. So, yes, CPU easy, GPU hard/messy!
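One quick way to see whether the specific CUDA runtime and cuDNN libraries a binary wants are even visible to the dynamic linker is to try to load them. This is a hedged sketch, not part of any official tooling; the library names assume Linux, and `libcudart.so.9.0` / `libcudnn.so.7` are the versions the TF 1.6 GPU binaries link against:

```python
import ctypes

def can_load(libname):
    """Try to dlopen a shared library; True if the dynamic linker finds it."""
    try:
        ctypes.CDLL(libname)
        return True
    except OSError:
        return False

# The TF 1.6 GPU binaries want CUDA 9.0 and cuDNN 7.0 specifically.
for lib in ("libcudart.so.9.0", "libcudnn.so.7"):
    print(lib, "found" if can_load(lib) else "NOT found")
```

If either line says "NOT found" the GPU build will fail at import time, no matter what else is installed.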

Docker + runtime=nvidia == real easy! … if you know how to set up and use Docker

If you have Docker and NVIDIA Docker set up on your workstation then firing up TensorFlow is pretty trivial for both CPU and GPU versions. It is obvious that the development work on TensorFlow is being done in Docker containers. The continuous integration (CI) and deployment (CD) system puts up a nightly build in a Docker container. I’ll discuss using Docker in a separate post. Really, I have already addressed this in the series of posts I did starting with How-To Setup NVIDIA Docker and NGC Registry on your Workstation – Part 1 Introduction and Base System Setup. There are 5 posts in that series and they go through full details of how to get Docker working for you. That is what I use for most of my work with TensorFlow.


Installing TensorFlow (CPU version)

If you are just getting started and/or not working on any demanding projects then this is a good option. I’ll go over two ways to do this on a workstation system.

Using "Standard" Python (Python.org) and pip

The system default Python on Ubuntu 16.04 is from their .deb packaging of the "official" Python from Python.org (in Ubuntu 16.04 it is not the latest Python build though). That is the recommended Python from the TensorFlow developers. It’s their default-build Python environment. (It’s not my recommendation since I prefer Anaconda Python.)

I’m not going to go into much detail here because if this is what you want for your TensorFlow install then you can look at the install documentation.

If you are using your system Python then I assume you are using Virtualenv to isolate your project setups. [This is always a good idea especially if you are going to be using pip for package installs.]

Make sure you have a few prerequisites installed (and please, just use Python 3 unless you have a really good reason to use the ancient "but I’m not dead yet!" Python 2).

sudo apt-get install python3-pip python3-dev python-virtualenv

Then create a virtualenv for your install. I have a directory in my home directory called envs and I’ll use tf-cpu for the virtual environment name.

virtualenv --system-site-packages -p python3 ~/envs/tf-cpu

Activate that,

source ~/envs/tf-cpu/bin/activate

Now that you are in that environment use pip3 to install TensorFlow, (for example, in account kingh2 on my personal machine, "i9"),

(tf-cpu) kingh2@i9:~$ pip3 install --upgrade tensorflow

Now just make a quick check that it works (I’m still in the environment created above). Note that I’m doing something a little more interesting than "hello world": matrix multiplication and a "summation reduction" in TensorFlow with two 10000 x 10000 matrices … and I timed it. Following is the terminal output from an interactive command-line session.

(tf-cpu) kingh2@i9:~$ python3
Python 3.5.2 (default, Nov 23 2017, 16:37:01)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> import time
>>> tf.set_random_seed(42)
>>> A = tf.random_normal([10000,10000])
>>> B = tf.random_normal([10000,10000])
>>> def checkMM() :
...     start_time = time.time()
...     with tf.Session() as sess:
...             print( sess.run( tf.reduce_sum(tf.matmul(A,B)) )  )
...     print(" took {} seconds ".format(time.time() - start_time))
...
>>> checkMM()
2018-03-22 16:03:11.890694: I tensorflow/core/platform/cpu_feature_guard.cc:140]
Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA

-873818.56
 took 3.3814306259155273 seconds
>>>

The first time you run TensorFlow in a session it outputs a message from its "feature_guard.cc" code. Observe that it is letting me know that the TensorFlow build was not compiled to use the nice Intel AVX512 vector unit I have on my Xeon-W 2175 CPU!

2018-03-22 16:03:11.890694: I tensorflow/core/platform/cpu_feature_guard.cc:140]
Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA

That’s the default TensorFlow build you get from a pip install. Now let’s do an install with Anaconda Python.
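If you want to check which of those SIMD features your own CPU actually has, you can read them straight from /proc/cpuinfo. A minimal, Linux-only sketch (it falls back to an empty set on other systems):

```python
def cpu_flags():
    """Return the set of CPU feature flags from /proc/cpuinfo (Linux only)."""
    try:
        with open("/proc/cpuinfo") as f:
            for line in f:
                if line.startswith("flags"):
                    return set(line.split(":", 1)[1].split())
    except OSError:
        pass
    return set()

flags = cpu_flags()
for feat in ("avx", "avx2", "avx512f", "fma"):
    print(feat, "yes" if feat in flags else "no")
```

Comparing that list against the feature_guard message tells you how much vector hardware a given binary build is leaving on the table.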

Using Anaconda Python with conda (My recommendation)

We will do a TensorFlow install using conda with the Anaconda Python distribution. I’m doing this on the same machine as above but in a different account, "kingh" (so I keep the user environments "clean").

I like Anaconda Python and I have recommended it in other posts. You can argue that there are some downsides to using it. However, it is very popular and well maintained and heavily oriented toward data-science users. It is by default optimized for performance and important numerical modules are linked against the excellent Intel MKL (Math Kernel Library). … but there is a surprise a little later…

What does the TensorFlow team say,

NOTE: The conda package is community supported, not officially supported. That is, the TensorFlow team neither tests nor maintains the conda package. Use that package at your own risk.

Well, what I’m going to install is the default TensorFlow build maintained by the official Anaconda Python maintainers. Unfortunately, there is not a GPU version maintained in the default Anaconda packages. There are builds (including GPU) that you can install from Anaconda Cloud that are "community maintained", but I don’t necessarily recommend you use any of them. [In a later post I’ll go through building your own GPU accelerated TensorFlow as a conda package without needing a CUDA install on your system!]

This is the Python version in the Anaconda version I’m using.

kingh@i9:~$ python
Python 3.6.4 |Anaconda custom (64-bit)| (default, Mar 13 2018, 01:15:57)
[GCC 7.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.

Note that this is Python 3.6.4 and it was built with gcc 7.2 (more up-to-date than the standard system Python).

Here’s a terminal session where I use conda to create a virtual environment called tf-cpu-conda with TensorFlow installed (the official default Anaconda Python TensorFlow, version 1.6.0).

kingh@i9:~$ conda create --name tf-cpu-conda tensorflow

That’s it! You now have TensorFlow installed in a conda environment.

Let’s activate the environment and run the interactive test session again (this is my terminal session running that test),

kingh@i9:~$ source activate tf-cpu-conda
(tf-cpu-conda) kingh@i9:~$ python
Python 3.6.4 |Anaconda, Inc.| (default, Mar 13 2018, 01:15:57)
[GCC 7.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> import time
>>> tf.set_random_seed(42)
>>> A = tf.random_normal([10000,10000])
>>> B = tf.random_normal([10000,10000])
>>> def checkMM() :
...     start_time = time.time()
...     with tf.Session() as sess:
...             print( sess.run( tf.reduce_sum(tf.matmul(A,B)) )  )
...     print(" took {} seconds ".format(time.time() - start_time))
...
>>> checkMM()
2018-03-22 17:00:43.627237: I tensorflow/core/platform/cpu_feature_guard.cc:140]
Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 AVX512F FMA

-873820.3
 took 1.521228313446045 seconds
>>>

Two things to note about the Anaconda conda install of TensorFlow

  • The version installed with conda is 2.22 times faster than the pip install in "standard" Python.
  • This build is not linked against MKL, unlike many Anaconda-built numerical packages. And no AVX vectorization support either!
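That speedup number is just the ratio of the two wall-clock times from the sessions above:

```python
# Times (seconds) from the two interactive test runs above
std_pip_time = 3.3814306259155273   # "standard" Python pip build
conda_time = 1.521228313446045      # Anaconda conda build

speedup = std_pip_time / conda_time
print(round(speedup, 2))  # 2.22
```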

I will investigate the performance difference when linked against MKL in a later post when I compile from source. It could be that linking to MKL just doesn’t make much difference. I’ll compile it myself to see if it does. Also, gcc has AVX512 support too! They may not have used flags to build with AVX because they would need to build a multi-branched executable to accommodate different hardware, and that can make the executables much larger (and they are already large!)

Why is the Anaconda Python TensorFlow so much faster?

I think it is just that a newer version of the gcc compiler was used in the Anaconda build. I ran help(tf) to get some info about the builds and found these differences,

Standard Python build

   COMPILER_VERSION = '4.8.4'
   CXX11_ABI_FLAG = 0
   VERSION = '1.6.0'

Anaconda Python build

   COMPILER_VERSION = '7.2.0'
   CXX11_ABI_FLAG = 1
   VERSION = '1.6.0'
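Here’s a hedged sketch of how to pull those constants programmatically. The attribute names match the TF 1.x module top level (they were moved in later releases), and the function degrades to None if TensorFlow isn’t installed in the current environment:

```python
def tf_build_info():
    """Return build metadata from an installed TensorFlow 1.x, or None."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    return {"version": getattr(tf, "VERSION", None),
            "compiler": getattr(tf, "COMPILER_VERSION", None),
            "cxx11_abi": getattr(tf, "CXX11_ABI_FLAG", None)}

info = tf_build_info()
print(info if info else "TensorFlow is not installed in this environment")
```

Running this in each of the two environments is a quicker way to get the comparison above than digging through the full help(tf) output.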

Bonus section — Intel Python

While I was finishing up this post I got an email from Intel announcing that they had released an update for Intel Python. So, I decided to install it for a quick test. I made another account on my system, "kingh3", and did a simple install from the downloaded distribution. (Just "untar" and run the install program.) I quickly ran the same little test that I did for standard Python and Anaconda Python. The Intel Python distribution has a TensorFlow build included but it is from an older code base, version 1.3.1. That means it should not be compared directly with the other test runs.

Here’s the terminal output of the test,

Python 3.6.3 |Intel Corporation| (default, Feb 12 2018, 06:37:09)
[GCC 4.8.2 20140120 (Red Hat 4.8.2-15)] on linux
Type "help", "copyright", "credits" or "license" for more information.
Intel(R) Distribution for Python is brought to you by Intel Corporation.
Please check out: https://software.intel.com/en-us/python-distribution
>>> import tensorflow as tf
>>> import time
>>> tf.set_random_seed(42)
>>> A = tf.random_normal([10000,10000])
>>> B = tf.random_normal([10000,10000])                
>>> def checkMM() :
... 	start_time = time.time()
... 	with tf.Session() as sess:
... 		print( sess.run( tf.reduce_sum(tf.matmul(A,B)) )  )
... 	print(" took {} seconds ".format(time.time() - start_time))
...
>>> checkMM()
2018-03-23 16:00:08.258739: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2018-03-23 16:00:08.258789: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2018-03-23 16:00:08.258805: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX512F instructions, but these are available on your machine and could speed up CPU computations.
2018-03-23 16:00:08.258818: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
-873820.3
 took 1.6182544231414795 seconds
>>>

Here’s some info from help(tf),

DATA
    COMPILER_VERSION = '4.8.2 20140120 (Red Hat 4.8.2-15)'    
    VERSION = '1.3.1'

It’s not linked to MKL! … and no AVX support. Its performance is nearly the same as the Anaconda build, but it is a different version of TensorFlow. It’s interesting to me that Intel didn’t build and link TensorFlow with their own compilers and libraries in their own Python distribution. Maybe I’m missing something (probably)! I’ll find out when I build TensorFlow from source myself.


That’s enough for the CPU install. Please don’t take that little test I did as a benchmark! I really just wanted to do something more interesting than "hello world".

I have been using TensorFlow with Docker so I was a little surprised that local installs are somewhat messy. I didn’t expect that! I’ll do more guides for getting a good install. I feel a performance test in the works too!

Happy computing –dbk