Read this article at https://www.pugetsystems.com/guides/1133
Dr Donald Kinghorn (Scientific Computing Advisor )

Build TensorFlow-CPU with MKL and Anaconda Python 3.6 using a Docker Container

Written on April 6, 2018 by Dr Donald Kinghorn

While I was at GTC18, TensorFlow 1.7 was released. I was anxious to try it out, especially to see if there was a "better" CPU build, since I was disappointed in my recent testing. This release is supposed to include Intel MKL and have more CPU optimizations like AVX ... so I fired it up and ... was disappointed! I'll show why a little later. It seems that to get a TensorFlow build the way I want it, I'm going to have to compile it myself. That's OK, I'm used to that sort of thing. However, I didn't want to clutter up my local machine with the environment I would need to do the build, so I decided to use Docker to create a build environment. This is really a nice use of Docker, in my opinion: it saves me from polluting my main computer system with libraries and build tools that I don't really want on there. In this post I'll go through how I did that. You might find it useful. [This idea will be really handy when I do this in a later post for a custom GPU build using CUDA.]


The Problem -- Official TensorFlow 1.7 first try and disappointment

Here is one of my first sessions with the new TensorFlow, doing a quick "matmul" test. I'm using the official Docker container, which is the most convenient way to use the binary builds, but ... it's not what I want:

kinghorn@i9:~$ docker run --rm -it tensorflow/tensorflow:1.7.0-py3 /bin/bash

root@05839d72e18e:/notebooks# python
Python 3.5.2 (default, Nov 23 2017, 16:37:01)
[GCC 5.4.0 20160609] on linux

>>> import tensorflow as tf
>>> import time
>>> tf.set_random_seed(42)
>>> A = tf.random_normal([10000,10000])
>>> B = tf.random_normal([10000,10000])
>>> def checkMM():
...     start_time = time.time()
...     with tf.Session() as sess:
...             print( sess.run( tf.reduce_sum(tf.matmul(A,B)) ) )
...     print(" took {} seconds".format(time.time() - start_time) )
...
>>> checkMM()
2018-04-05 21:37:29.258701: I tensorflow/core/platform/cpu_feature_guard.cc:140]
Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA

-873818.56
 took 3.3772947788238525 seconds

>>> help(tf)
...
COMPILER_VERSION = '5.4.0 20160609'
CXX11_ABI_FLAG = 1
...
VERSION
    1.7.0


...this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA!!

This is not a good performance benchmark, but it does give me an idea of what the CPU performance is like, and it is just as bad as what I saw before.

OK, this build does at least have AVX support, but nothing higher. I want AVX512 support because my great Intel Xeon-W 2175 has it. Also, the build is not linked to MKL, and I want that too.
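You can see which SIMD instruction sets your own CPU supports (as opposed to what the binary was compiled for) by looking at the "flags" line in /proc/cpuinfo. Here's a little sketch of that check; this helper is my own illustration and is not part of the build procedure:

```python
# Sketch (illustrative helper, not part of the build): check which SIMD
# instruction sets the CPU advertises, from the "flags" line of /proc/cpuinfo.

def simd_flags(cpuinfo_text):
    """Return the interesting SIMD flags found in a /proc/cpuinfo dump."""
    interesting = {"avx", "avx2", "avx512f", "fma"}
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            found = set(line.split(":", 1)[1].split())
            return sorted(interesting & found)
    return []

# Usage (on Linux):
#   with open("/proc/cpuinfo") as f:
#       print(simd_flags(f.read()))
```

On a Skylake-X/W CPU like the Xeon-W 2175 you would expect to see avx, avx2, avx512f, and fma in that list, which is exactly what the official binary leaves on the table.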

What I don't want, is to clutter up my system with libraries and build tools to get the TensorFlow configuration I'm after.


The Solution -- A TensorFlow build environment in a Docker container

Don't make a mess on your system just to build some big research package. Use a "throwaway" Docker container instead!

If you go to the TensorFlow documentation there are good instructions on how to compile from source. Let's make a container to do that!

Here's what I want for my TensorFlow build,

  • I want Anaconda Python 3.6 as the base Python
  • I want a build with a gcc version that supports my hardware AVX512 and FMA
  • I want Intel MKL linked into the build for BLAS, LAPACK, and the Intel ML libraries, i.e. MKL-DNN.

Here's what I'll use to get that,

  • Ubuntu 16.04 base image with a few standard build tools
    • This has gcc 5.4 which supports Skylake-X,W,SP AVX512 and FMA3
  • A current Anaconda3 Python install for Python dependencies
  • The current Bazel "make" tools which TensorFlow uses.
    • OpenJDK, since Bazel is written in Java.
  • TensorFlow 1.7.0 sources from GitHub

The above sources will go into a directory along with a Docker container that I'll create to do the build.

Step-by-Step Instructions

1) If needed, follow my guide to install and configure Docker

2) Download the sources for TensorFlow

  • Make a directory to do the build,
mkdir TF-build
cd TF-build
  • Get the TensorFlow source tree and "checkout" the branch you want,
git clone https://github.com/tensorflow/tensorflow
cd tensorflow/
git checkout r1.7

3) Setup the docker container build directory

  • From the TF-build directory create a directory for your Dockerfile and some other files we will copy into the container.
mkdir dockerfile
cd dockerfile

4) Get the Anaconda3 install shell archive file and put it in the dockerfile directory. The Dockerfile below expects a file matching Anaconda3*.sh, e.g. Anaconda3-5.1.0-Linux-x86_64.sh from the Anaconda download archive.

5) Create a file called bazel.list in the dockerfile directory containing the following line (for the bazel apt repo)

deb [arch=amd64] http://storage.googleapis.com/bazel-apt stable jdk1.8

6) Create the Dockerfile to build the container

  • Put the following in a file named Dockerfile in the dockerfile directory (note the capital "D" in the file name)
# Dockerfile to setup a build environment for TensorFlow
# using Intel MKL and Anaconda3 Python

FROM ubuntu:16.04

# Add a few needed packages to the base Ubuntu 16.04
# OK, maybe *you* don't need emacs :-)
RUN \
    apt-get update && apt-get install -y \
    build-essential \
    curl \
    emacs-nox \
    git \
    openjdk-8-jdk \
    && rm -rf /var/lib/apt/lists/*

# Add the apt repo for bazel and install it.
# I put the repo line in a file named bazel.list and copy it in;
# the file contains the following line:
# deb [arch=amd64] http://storage.googleapis.com/bazel-apt stable jdk1.8
COPY bazel.list /etc/apt/sources.list.d/
RUN \
  curl https://bazel.build/bazel-release.pub.gpg | apt-key add - && \
  apt-get update && apt-get install -y bazel

# Copy in and install Anaconda3 from the shell archive
# Anaconda3-5.1.0-Linux-x86_64.sh
COPY Anaconda3* /root/
RUN \
  cd /root; chmod 755 Anaconda3*.sh && \
  ./Anaconda3*.sh -b && \
  echo 'export PATH="$HOME/anaconda3/bin:$PATH"' >> .bashrc && \
  rm -f Anaconda3*.sh

# That's it! That should be enough to do a TensorFlow 1.7 CPU build
# using Anaconda Python 3.6 Intel MKL with gcc 5.4

This Dockerfile will,

  • use the official Ubuntu 16.04 base
  • install some needed packages
  • add the apt repo for bazel and install it
  • Install Anaconda3 Python

7) Create the container

docker build -t tf-build-1.7-cpu .

That will create the container image we will do the TensorFlow build in. This is a large image! It will take a while to build and install everything. When you are finished doing the TensorFlow build you might want to delete the image. You can always reproduce it from the Dockerfile!

8) Start the container and bind the directory with the source tree

docker run --rm -it -v $HOME/projects/TF-build:/root/TF-build tf-build-1.7-cpu

That will start the container. Note that I have my directory for the build in $HOME/projects/TF-build and that is being bound into the container at /root/TF-build.

9) Configure TensorFlow

Now that you are in the container,

cd /root/TF-build/tensorflow/

./configure

You should be looking at the TensorFlow source build documentation now to finish up. ./configure will ask a lot of questions. It should see Anaconda Python 3.6 as the system Python and use that. You will probably want to answer "No" to most of the questions. Also, answer "No" to GPU support since we didn't set up CUDA in this container. (we'll do that another time)

Here are my answers to configure,

root@52dbf03709b0:~/TF-build/tensorflow# ./configure
/root/anaconda3/bin/python
.
Extracting Bazel installation...
You have bazel 0.11.1 installed.
Please specify the location of python. [Default is /root/anaconda3/bin/python]:


Found possible Python library paths:
  /root/anaconda3/lib/python3.6/site-packages
Please input the desired Python library path to use.  Default is [/root/anaconda3/lib/python3.6/site-packages]

Do you wish to build TensorFlow with jemalloc as malloc support? [Y/n]:
jemalloc as malloc support will be enabled for TensorFlow.

Do you wish to build TensorFlow with Google Cloud Platform support? [Y/n]: n
No Google Cloud Platform support will be enabled for TensorFlow.

Do you wish to build TensorFlow with Hadoop File System support? [Y/n]: n
No Hadoop File System support will be enabled for TensorFlow.

Do you wish to build TensorFlow with Amazon S3 File System support? [Y/n]: n
No Amazon S3 File System support will be enabled for TensorFlow.

Do you wish to build TensorFlow with Apache Kafka Platform support? [y/N]: n
No Apache Kafka Platform support will be enabled for TensorFlow.

Do you wish to build TensorFlow with XLA JIT support? [y/N]:
No XLA JIT support will be enabled for TensorFlow.

Do you wish to build TensorFlow with GDR support? [y/N]:
No GDR support will be enabled for TensorFlow.

Do you wish to build TensorFlow with VERBS support? [y/N]:
No VERBS support will be enabled for TensorFlow.

Do you wish to build TensorFlow with OpenCL SYCL support? [y/N]:
No OpenCL SYCL support will be enabled for TensorFlow.

Do you wish to build TensorFlow with CUDA support? [y/N]:
No CUDA support will be enabled for TensorFlow.

Do you wish to build TensorFlow with MPI support? [y/N]:
No MPI support will be enabled for TensorFlow.

Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native]:


Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]:
Not configuring the WORKSPACE for Android builds.

Preconfigured Bazel build configs. You can use any of the below by adding "--config=<>" to your build command. See tools/bazel.rc for more details.
	--config=mkl         	# Build with MKL support.
	--config=monolithic  	# Config for mostly static monolithic build.
Configuration finished

I kept -march=native since on my machine that will give me AVX512 and FMA3.

10) Build TensorFlow

After you are finished with configure you can do the build. I used,

bazel build --config=opt --config=mkl //tensorflow/tools/pip_package:build_pip_package

Note that I used --config=mkl, which will cause the build to link in the Intel MKL-ML libs. Those libs are now included in the TensorFlow source tree. (Thank you, Intel, for making those important libraries available to the public.)

It will take some time to build since TensorFlow is a big package! I was greeted with the wonderful message,

INFO: Elapsed time: 417.744s, Critical Path: 95.61s
INFO: Build completed successfully, 4730 total actions

11) Create the pip package

After your build finishes you will want to create the pip package,

bazel-bin/tensorflow/tools/pip_package/build_pip_package ../tensorflow_pkg

You should now have a "whl" file in your TF-build/tensorflow_pkg directory. You can install that pip package in a conda environment on your local machine if you have Anaconda installed there. That is what I was planning to do with this build. I also want this pip package for use in other Docker containers.
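The wheel's filename tells you where it can be installed: the "cp36-cp36m" tags mean it will only install into a CPython 3.6 environment, which is why the Anaconda Python 3.6 base matters. Here's a small sketch (my own illustration, simplified to ignore the optional build tag a wheel name may carry) of picking the filename apart:

```python
# Sketch (illustrative, simplified): decode the compatibility tags
# encoded in a wheel filename.

def parse_wheel_name(filename):
    """Split a wheel filename into its name/version/tag components."""
    stem = filename[: -len(".whl")]
    name, version, py_tag, abi_tag, platform = stem.split("-")
    return {"name": name, "version": version,
            "python": py_tag, "abi": abi_tag, "platform": platform}

# parse_wheel_name("tensorflow-1.7.0-cp36-cp36m-linux_x86_64.whl")
# python tag "cp36" and abi tag "cp36m" -> needs a CPython 3.6 environment
```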

12) Install the pip package and do some quick tests

I'll test in the current Docker container. First create a conda env.

conda create -n tftest

source activate tftest

pip install tensorflow_pkg/tensorflow-1.7.0-cp36-cp36m-linux_x86_64.whl

The first thing I want to see is whether MKL was linked in correctly. I used "ldd" to check the linked libraries. Here are the MKL libs I was concerned about,

ldd ~/anaconda3/lib/python3.6/site-packages/tensorflow/libtensorflow_framework.so
...
libmklml_intel.so => /root/anaconda3/lib/python3.6/site-packages/tensorflow/../_solib_k8/_U_S_Sthird_Uparty_Smkl_Cintel_Ubinary_Ublob___Uexternal_Smkl_Slib/libmklml_intel.so (0x00007fbec553d000)
libiomp5.so => /root/anaconda3/lib/python3.6/site-packages/tensorflow/../_solib_k8/_U_S_Sthird_Uparty_Smkl_Cintel_Ubinary_Ublob___Uexternal_Smkl_Slib/libiomp5.so (0x00007fbec5199000)
...

Success! Linked to Intel MKL-ML.
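The same check can be scripted instead of eyeballed. This little sketch (my addition, not part of the original procedure) just scans ldd output for the two MKL-ML libraries:

```python
# Sketch (my addition): script the ldd check for the Intel MKL-ML libraries.
import subprocess

MKL_LIBS = ("libmklml_intel.so", "libiomp5.so")

def mkl_linked(ldd_output):
    """True if all of the MKL-ML libraries appear in the ldd output text."""
    return all(lib in ldd_output for lib in MKL_LIBS)

def check_library(path):
    """Run ldd on a shared library and report whether MKL-ML is linked in."""
    out = subprocess.check_output(["ldd", path]).decode()
    return mkl_linked(out)

# Usage:
#   check_library("~/anaconda3/lib/python3.6/site-packages/"
#                 "tensorflow/libtensorflow_framework.so")
```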

Now for the quick "matmul" test,

(tftest) root@52dbf03709b0:~/TF-build# python
Python 3.6.4 |Anaconda, Inc.| (default, Jan 16 2018, 18:10:19)
[GCC 7.2.0] on linux

>>> import tensorflow as tf
>>> import time
>>> tf.set_random_seed(42)
>>> A = tf.random_normal([10000,10000])
>>> B = tf.random_normal([10000,10000])
>>> def checkMM():
...     start_time = time.time()
...     with tf.Session() as sess:
...             print( sess.run( tf.reduce_sum( tf.matmul(A,B) ) ) )
...     print(" took {} seconds".format(time.time() - start_time))
...
>>> checkMM()

-873836.25
 took 1.4292142391204834 seconds

That's what I wanted to see. It is over twice as fast as the official build.
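To put those two timings in rough perspective: a 10000 x 10000 matmul is about 2*N^3 = 2 TFLOPs of work, so 3.38 s vs 1.43 s corresponds very roughly to 0.6 vs 1.4 TFLOPS. Treat these as loose lower bounds, since the measured time also includes session startup and generating the random input matrices. A quick back-of-the-envelope sketch:

```python
# Back-of-the-envelope estimate (my addition): an n x n dense matmul
# costs about 2*n**3 floating point operations.

def matmul_gflops(n, seconds):
    """Approximate GFLOPS achieved by an n x n matmul that took `seconds`."""
    return 2 * n**3 / seconds / 1e9

# matmul_gflops(10000, 3.377)  # official binary: roughly 590 GFLOPS
# matmul_gflops(10000, 1.429)  # custom MKL/AVX512 build: roughly 1400 GFLOPS
```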

Happy computing! --dbk

Tags: TensorFlow, Intel MKL, Anaconda Python, Docker
george

Thanks for this! However when I'm inside the docker container, the 'tensorflow' directory cannot be found. 'root/TF-build/' directory does exist. Do you know why this might be? Thanks!

Posted on 2018-07-09 14:08:20
george

nevermind, I hadn't changed the docker run command to point to my build location (as you suggested!). Great article.

Posted on 2018-07-09 14:15:06
Donald Kinghorn

Thanks! Docker can be a big help for problems like this. I really love keeping my system "clean". Some of the trouble I had when I did this originally has been resolved with TF 1.8 ... like that bazel version issue ... It's always a "moving target", but core ideas like using Docker for stuff like this can take care of lots of problems.
best wishes -- Don

Posted on 2018-07-10 22:34:45
Andre

Thanks for this guide, it's very helpful! I'm able to get TensorFlow working properly using your instructions, but whenever I exit and re-start the container my conda environment has disappeared and I am forced to re-install it. Is this intended behaviour? Do you know what I could be messing up?

Posted on 2018-07-11 18:58:56
Donald Kinghorn

[This is a long reply because it's a common annoyance with docker ... I'm glad you asked this!]

yes ... a container does not save state automatically. Look at the docker documentation for "commit" if you want to keep changed versions of a container. I use that when I make adjustments to a container. If I make major changes then I build a new container with a fresh Dockerfile.

This is one of the good/bad features of docker ... what you start up is just an "instance" of a container. There are two main things you can do: one is to explicitly use commit to create a new container version with your changes in it, or, mount a file system into the container from the host. That is how I generally work with containers, i.e. I use -v $HOME/projects:/projects to bind my projects dir into the container so anything I do in there is saved on my host. ...Anything outside of that in the container is gone when you shut it down. You can bind any volumes "-v" you want into the container, but be careful because you are root in the container. Also, you have to be careful about file ownership, root vs your user ... I usually set up docker with UserNamespaces for my user. I have a series of posts on how I do a docker setup for a personal workstation. It works pretty well! Here is the last post in that series. It has links at the top to the other 4 posts.

https://www.pugetsystems.co...

In general there are good things and bad things about working in containers ... and you just ran into one of the annoyances :-)

You could create a generic container first and then do the Anaconda install into your bound /projects dir. [install to /projects/anaconda3 in the container, for example] Then you would have that available inside AND outside of the container. You might still have some trouble from the "dot" files that get created in your home directory, and you have to be careful with PATH ...

I mainly did the stuff in this post to get that .whl file built. You can take that file and then create a conda environment on your host system (same as you did inside the container) and then pip install that whl file into your own local Anaconda setup. That will get a local dev environment with your fresh custom TF build in it.

The Anaconda folks have been good lately about getting packages updated and built so I usually use what they have done but I really wanted to do a custom TF build! I still personally like to maintain a local development environment (for Anaconda Python at least). Containers can be incredibly useful for stuff that you don't use often or that would otherwise be difficult to install and maintain. It can get pretty crazy working with development environments. docker can be a life saver or an annoyance!

Best wishes --Don

Posted on 2018-07-12 15:20:25
Evgeny Matusov

Hi, thanks for the detailed instructions!
My question is, whether Anaconda gives you any additional speed-up or is it just convenient to use. I would like to stay with standard Python 3.6 and just compile TF with bazel and MKL support, if possible and if I get the same performance improvements as with Anaconda.

Posted on 2018-08-28 09:17:22
Donald Kinghorn

Hi Evgeny, I like Anaconda for the convenience. I also like conda for package management, and it has numpy linked against the latest Intel MKL ... but it really doesn't matter. Especially when using your own build of TensorFlow!

Posted on 2018-08-28 15:25:28