TensorFlow is on its way to becoming the “standard” framework for machine learning. There are many reasons for that, and it is not just for machine learning! In this post I’ll give a descriptive introduction to TensorFlow. This is the first post in a series on how to work with TensorFlow.
Why am I writing this?
Short answer — I’m learning to use TensorFlow myself!
I have had several occasions to use TensorFlow over the past year or so, but mostly by running someone else’s code. Not that I feel bad about that, but I have learned enough to know that I really want to learn more and use it for my own work. I’ve also been using TensorFlow in the excellent Deep Learning “specialization” by Andrew Ng on Coursera. I highly recommend that series of courses if you are interested in modern Deep Learning taught by one of the masters of the craft.
What is TensorFlow?
When you go to the TensorFlow website, the first thing you see is,
“An open-source machine learning framework for everyone”
Well, OK, it’s open-source and it’s a machine learning framework, but I doubt that it’s really for “everyone”.
A little lower there is “About TensorFlow” and this starts with,
“TensorFlow™ is an open source software library for numerical computation using data flow graphs.”
Now we are getting somewhere. Let’s start with that.
TensorFlow is open-source (Where did it come from?)
What became TensorFlow started in 2011 as an internal-use framework called DistBelief for deep neural network programming in the Google Brain group. That work was presented in the paper “Large Scale Distributed Deep Networks” at the NIPS 2012 conference. 2012 is also the year the current deep learning craze took off, when Geoffrey Hinton’s group from the University of Toronto won the “ImageNet Large-Scale Visual Recognition Challenge” with a Convolutional Neural Network. Two years later Google won it with GoogLeNet … and then NVIDIA GPUs came into play. Since then, interest and work in machine learning and AI have exploded.
DistBelief was refactored and became TensorFlow. (The “why did we do that!” bad stuff was rethought and rewritten; that’s what refactoring means.) Google released an implementation of TensorFlow as open-source in 2015 (Apache 2.0 License). I remember it being a pain to build, and the performance didn’t look very good. That changed rapidly! The open-source community grew quickly. TensorFlow is extensible and based on some very good ideas at its core. Development has been rapid, thanks both to the large amount of resources Google provides and to the many outside contributors to the code base.
TensorFlow is a software library for numerical computation
This is really what the core of TensorFlow is all about (but there is more to it). Saying that it is a “library for numerical computation” is a lot better than saying “it’s a machine learning framework … for everyone”. You can do “plain old” numerical computation with TensorFlow. It does have a special way of organizing the calculations, but the operations would look familiar to anyone who has worked with a high-level library or language for numerical linear algebra or statistical computing. For example, the documentation for the tf.linalg module lists operations for adjoint, cholesky, det, eigvalsh, svd, norm, eye, trace … familiar stuff for anyone who has done mathematical computation. That module certainly caught my eye, but I counted 33 modules listed in the documentation for the Python API. One of them, tf.nn, has the operations you would associate with machine (deep) learning, like conv2d, dropout, relu, max_pool, softmax, etc. There are other machine learning oriented modules like tf.layers, tf.losses, tf.train, tf.keras, etc. TensorFlow is a very large package with lots of bits and pieces that you would need for low-level machine learning or general numerical computation … and the list of those modules and operations keeps growing and improving!
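To make that concrete, here is a minimal sketch of “plain old” numerical computation with a few tf.linalg operations, using the TensorFlow 1.x Python API (the matrix values are just made-up examples):

```python
import tensorflow as tf

# A small symmetric, positive-definite matrix (made-up example values)
A = tf.constant([[2.0, 1.0],
                 [1.0, 3.0]])

det_A  = tf.linalg.det(A)       # determinant
eig_A  = tf.linalg.eigvalsh(A)  # eigenvalues of a symmetric matrix
chol_A = tf.linalg.cholesky(A)  # Cholesky factorization

with tf.Session() as sess:
    det_val, eig_val, chol_val = sess.run([det_A, eig_A, chol_A])
    print(det_val)   # 5.0
    print(eig_val)
    print(chol_val)
```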
I mentioned the Python API. TensorFlow itself is written in C++, and the main API for using it is Python. There are APIs for C++, Java, and Go, but the Python API is the most stable, best supported, and easiest to use. There are bindings for other languages too, but Python is really what you want unless you have to use something else for project integration purposes.
TensorFlow has enough low-level functionality that you can develop detailed models with it, but it still has enough high-level operations to make it easy to use. For times when your focus is on rapid prototyping, there are other high-level “frameworks” that use it as a back-end. Keras is excellent for this.
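As a sketch of what that high-level style looks like, here is a tiny classifier defined with tf.keras (the layer sizes and input shape are placeholder assumptions, not from any particular project):

```python
import tensorflow as tf

# A minimal fully-connected classifier, e.g. for 28x28 images
# flattened to 784 inputs and 10 output classes (assumed shapes)
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu', input_shape=(784,)),
    tf.keras.layers.Dense(10, activation='softmax')
])

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# model.fit(x_train, y_train, epochs=5)  # train on your own data
```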
Tensors and data flow graphs
The next part of the “About TensorFlow” description gets to the “how” of its design.
“… numerical computation using data flow graphs. Nodes in the graph represent mathematical operations, while the graph edges represent the multidimensional data arrays (tensors) communicated between them.”
What is a Tensor?
A tensor is a mathematical object in some n-dimensional vector space; in computer science terms, it is a kind of numerical data structure. It is a generalization of what is normally called a matrix (or vector). For the purposes of TensorFlow it is a data structure. This means it’s just a bunch of numbers arranged in some specified n-dimensional layout. For example,
A 0-d or 0-order tensor is a single number, i.e. 1 or 4.8 …
A 1-d tensor is a vector, something like a stack of values.
A 2-d tensor is a matrix, a grid of values.
A 3-d tensor with m rows, n columns, and 2 “channels” would be something like two matrices stacked on top of each other, i.e. a “cube” or “box”.
Those kinds of structures are very common in numerical “matrix” algebra and are often used for numerical computing … just a bunch of numbers laid out in a way that makes some kind of sense when doing mathematical operations and implementing them in a computer program.
A simple and common tensor is an RGB photo image. It is a 3-dimensional array (tensor) with some number of pixels, m x n, describing its “size” and 3 channels for Red, Green, and Blue. It’s like a stack of 3 matrices of pixel color values representing an image.
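Here is a small sketch of those layouts as TensorFlow tensors (the shapes are arbitrary examples):

```python
import tensorflow as tf

scalar = tf.constant(4.8)                       # 0-d tensor, shape ()
vector = tf.constant([1.0, 2.0, 3.0])           # 1-d tensor, shape (3,)
matrix = tf.constant([[1.0, 2.0],
                      [3.0, 4.0]])              # 2-d tensor, shape (2, 2)
cube   = tf.zeros([4, 4, 2])                    # 3-d tensor: 4x4 with 2 "channels"
image  = tf.zeros([480, 640, 3])                # an RGB image: 480x640 pixels, 3 channels

print(scalar.shape, vector.shape, matrix.shape, cube.shape, image.shape)
```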
What is a “flow graph”?
A graph is another mathematical object. It roughly consists of points called nodes and relations between those nodes called edges, i.e. lines connecting them. Graph theory is a very beautiful branch of mathematics in my opinion! A computation flow graph is a directed graph with,
- “operations” for nodes
- and data tensors for edges.

The tensor data flows through the graph while being operated on at the nodes. Thus the name TensorFlow.
Here’s a simple flow graph example of what part of a common machine learning calculation might look like: z = matmul(A, x) + b followed by a = sigmoid(z).
This is the flow diagram of those two sequential expressions. A is a matrix, and x is a vector that undergoes a matrix multiplication (matmul) with A. That result, which is a vector, is added (+) to the vector b, and that result flows into the (sigmoid) operation to produce a. The function sigmoid(z) = 1/(1 + exp(-z)) is known as a “sigmoid function”, and (sigmoid) is an implemented “operation” in TensorFlow (as is matmul).
That is the essence of the program structure design of TensorFlow. There is a lot more to it than that but that’s the basic idea.
When you use TensorFlow you basically do two things,
- add a series of operations to a graph object
- then create a “session” to “run” the graph.
It may sound strange, but there are some very nice things that TensorFlow can do for you because of that structure.
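Here is a minimal sketch of those two steps for the flow graph above, using the TensorFlow 1.x graph-and-session style (the matrix and vector values are just made-up examples):

```python
import tensorflow as tf

# Step 1: add operations to the (default) graph
A = tf.constant([[1.0, 2.0],
                 [3.0, 4.0]])          # a 2x2 matrix
x = tf.constant([[1.0], [1.0]])        # a 2x1 column vector
b = tf.constant([[0.5], [0.5]])        # a 2x1 bias vector

z = tf.matmul(A, x) + b                # the (matmul) and (+) nodes
a = tf.sigmoid(z)                      # the (sigmoid) node

# Step 2: create a session to run the graph
with tf.Session() as sess:
    print(sess.run(a))
```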
Why use tensors and flow graphs?
You may be thinking — OK, that’s interesting, but why did they do that? What are those “nice” things that TensorFlow can do for me?
Here are a few of the reasons that the tensor flow-graph is a good design,
Parallel execution
Having a computation graph facilitates parallel execution. The graph can be broken up into independent pieces that can be executed on multiple CPUs, GPUs, and across system nodes, i.e. clusters. Some of those TensorFlow “operations” can be things like (send) and (receive). Lots of problems scale reasonably well with TensorFlow.
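As a small sketch of steering pieces of a graph to particular devices, TensorFlow lets you pin operations with tf.device (this assumes a system with at least one CUDA-capable GPU; adjust the device strings to your hardware):

```python
import tensorflow as tf

# Build one piece of the graph on the CPU ...
with tf.device('/cpu:0'):
    a = tf.random_normal([1000, 1000])

# ... and another piece on the first GPU
with tf.device('/gpu:0'):
    b = tf.matmul(a, a)

# allow_soft_placement falls back to the CPU if no GPU is present
config = tf.ConfigProto(allow_soft_placement=True)
with tf.Session(config=config) as sess:
    print(sess.run(b)[0, 0])
```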
Automatic-differentiation and Back-Propagation
With the operations in small pieces as graph nodes, it is feasible to do automatic differentiation. In case you don’t understand the implication of that, let me assure you it is a big deal! Machine learning and, in general, much of scientific computing use numerical optimization, which usually depends on the availability of derivatives or gradients. One of the key methodologies in training neural networks is called Back-Propagation. (That’s basically differentiation using the chain rule.) It’s one of the most important ideas in machine learning. The gradients can be very difficult to derive mathematically and hard to implement as code. You basically get them for free with TensorFlow. (I’ll test how good it is in a later post.)
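For a quick sketch of what “for free” means, tf.gradients builds the derivative of an expression as more graph nodes (the function here is just a toy example):

```python
import tensorflow as tf

x = tf.Variable(3.0)
y = x ** 2 + 2.0 * x                 # y = x^2 + 2x (a toy function)

# TensorFlow constructs dy/dx = 2x + 2 automatically
grad = tf.gradients(y, x)[0]

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(grad))            # 8.0 when x = 3.0
```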
“Operations” as CUDA Kernels
Many of the TensorFlow “operations”, i.e. nodes, are implemented as NVIDIA CUDA kernels ready for execution on GPUs. Again, that’s a big deal! You can take advantage of the compute capability of GPUs without having to write low-level GPU code.
There’s more!
TensorFlow is part of a set of packages.
TensorBoard
TensorBoard is graph visualization software for the flow-graphs. It is a great tool for visualizing a complicated network and useful for debugging code.
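Getting a graph into TensorBoard is a small sketch like this (the ./logs directory is an arbitrary choice):

```python
import tensorflow as tf

a = tf.constant(2.0, name='a')
b = tf.constant(3.0, name='b')
c = tf.add(a, b, name='c')

with tf.Session() as sess:
    # Write the graph so TensorBoard can display it; then run:
    #   tensorboard --logdir=./logs
    writer = tf.summary.FileWriter('./logs', sess.graph)
    print(sess.run(c))
    writer.close()
```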
TensorFlow Serving
TensorFlow Serving is a high-performance server for deploying trained TensorFlow models in a production environment.
You have
- TensorFlow for development
- TensorBoard for design and debugging
- TensorFlow Serving for deployment
There is a lot to like about TensorFlow and I’m eager to dig deeper into it myself. I’ll keep you posted on how that goes!
Happy computing –dbk