Why You Should Consider PyTorch (includes Install and a few examples)

PyTorch is a relative newcomer to the list of ML/AI frameworks. It was launched in January of 2017 and has seen rapid development and adoption, especially since the beginning of 2018. It is also nearing the 1.0 release, and it looks like the recently released 0.4 version is a freeze of the API in preparation for version 1. If you have been curious about PyTorch but haven’t tried it yet, now is probably a good time. I have just started looking at it myself and so far I’m liking it a lot.

What is PyTorch?

From the name you can guess that it is a Python port of the Lua based Torch framework. That is only partly correct. PyTorch is really taking on a life of its own. It does implement “legacy” Torch compatibility but it is moving rapidly beyond that because of the popularity of Python. It’s mostly a mix of 3 pieces/ideas:

  • Torch – Lua: A great computer vision and general compute framework with a fair amount of use in game development, and visualization. It has a small user base. Torch – Lua is a Facebook AI Research project. Lua is a scripting language that integrates nicely with C/C++. It’s been around for 25 years! Torch – Lua has good CUDA GPU acceleration.
  • Chainer: Chainer is a Deep Neural Network framework using Python with GPU acceleration from CuPy. The development is led by the Japanese venture company Preferred Networks. One of the most notable features of Chainer is “Define-by-Run”. That’s a principal feature that PyTorch has adopted. This means “dynamic” model execution: the computation graph is built as the code runs. It’s as opposed to “Define-AND-Run”, which means the model has to be defined and compiled statically before it’s run.
  • HIPS Autograd: HIPS Autograd is an automatic differentiation library that can differentiate native Python and Numpy code. It came from the Harvard Intelligent Probabilistic Systems lab. It uses dynamic define-by-run and immediate (eager) execution. This fits the basic PyTorch design and is the basis of the automatic gradient functionality in PyTorch.
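To make “Define-by-Run” a bit more concrete, here is a minimal sketch (my own illustration, not taken from the PyTorch docs). The function name `repeated_scale` and the thresholds in it are hypothetical. The point is that an ordinary Python `while` loop, whose length depends on tensor values at run time, is recorded by autograd as it executes.

```python
import torch

def repeated_scale(x, w):
    # Hypothetical example function: plain Python control flow decides
    # how many multiplications end up in the graph.
    y = x * w
    steps = 1
    while y.norm() < 10.0 and steps < 20:
        y = y * w
        steps += 1
    return y.sum(), steps

x = torch.ones(3)
w = torch.tensor(2.0, requires_grad=True)

out, steps = repeated_scale(x, w)
out.backward()  # differentiate through whatever path was actually executed

print("steps taken: ", steps)
print("d(out)/dw: ", w.grad)
```

With these inputs the loop runs until y = x * w^3, so the output is 3 * w^3 and the gradient is 9 * w^2. A different starting value for w would build a different graph without any change to the code.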

Nice Features of PyTorch

PyTorch is picking up a lot of users. I believe this is largely because it has a very “pythonic” feel to it. It’s not like TensorFlow, where the feel is more like Python is a wrapper around an external programming language (which is basically true). PyTorch seems to be nice for experimenting with algorithms and it’s simple to debug. It is essentially like using Numpy with the option of using GPU acceleration if you want. It has some useful modules for neural networks, optimization and auto gradients.

I mentioned TensorFlow above. I don’t really like the PyTorch vs TensorFlow arguments. It’s almost impossible to not make that comparison though. TensorFlow is the leader in usage for ML/AI development by a wide margin over all of the other frameworks. TensorFlow has some great developers and a strong community and you can go from development to production including mobile with it. The biggest problems with TensorFlow are the learning curve, debugging and “feel”.

My main interest in PyTorch is for experimentation! I want to use it for exploring new ML/AI algorithms and for use outside of that domain, i.e. as a general purpose GPU accelerated numerical linear algebra play-ground.

Here are some (a lot) of the strengths of PyTorch,

  • Strong and very active development community. Many of the lead developers are at Facebook AI but it is not really a Facebook project. It is very open. It is used internally at Facebook for research, with Caffe2 being used for production.
  • Easy to learn. If you are moderately skilled with python and numpy you will be able to get started quickly.
  • It is becoming tightly aligned with Caffe2 to meet production requirements.
  • It has ONNX support for porting models to other frameworks. This is a big plus. It’s not limited to use together with Caffe2.
  • Fast.ai has switched from Keras/TensorFlow to PyTorch. If you are not familiar with Fast.ai you should follow the link and check them out. They are doing great stuff.
  • Dynamic execution graphs. This is a big one, and it’s why PyTorch has a nice “feel”. You can execute your model graphs as you develop them. This is the influence from Chainer. This is the “Define-by-Run” feature. It’s a large part of what makes PyTorch fast and easy to use.
  • Easy debugging. This is largely a result of the item above. It is more like plain old python debugging. Not too bad.
  • PyTorch tensors are essentially equivalent to numpy arrays. You can switch back and forth with ease, and a converted tensor can use the same memory space as the numpy array it came from.
  • Strong GPU acceleration. NVIDIA CUDA is well utilized and it is very simple to load and execute code on the GPU.
  • It has TensorBoard support through tensorboard-pytorch.
  • It has multi-processing and distributed computing modules for multi-GPU multi-node communication and execution.
  • Did I mention that it is being rapidly developed and the community is extending functionality at a quick pace … It has great momentum right now and version 1.0 will be out soon. It looks very promising.
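As a small illustration of the numpy interoperability item in the list above, here is a sketch of the difference between sharing memory and copying. As far as I know, `torch.from_numpy` and `Tensor.numpy()` share the underlying buffer for CPU tensors, while `torch.tensor(...)` makes a copy. The variable names are just for this example.

```python
import numpy as np
import torch

a = np.zeros(3, dtype=np.float32)

t = torch.from_numpy(a)  # shares memory with the numpy array
c = torch.tensor(a)      # makes an independent copy

t[0] = 5.0               # write through the tensor ...
a[1] = 7.0               # ... and through the numpy array

b = t.numpy()            # back to numpy, still the same memory

print("numpy array:   ", a)
print("shared tensor: ", t)
print("copied tensor: ", c)
print("round trip:    ", b)
```

The shared tensor sees both writes; the copy stays at zeros.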


Yes, there are missing features, bugs and all that. However, if you read any criticism that is more than a few months old there is a good chance that the deficiencies have been addressed. PyTorch is very young but it is already quite useful and fun to work with. Will it be “just another” framework? I don’t think so since a lot of researchers seem to be getting fond of it. Is it going to overtake TensorFlow? I highly doubt that! I think it fits best in the ML/AI/HPC researcher’s bag-of-tricks for exploration, working up new ideas, and rapid prototyping.

Installing PyTorch

PyTorch is very easy to install. They also have good support for Anaconda Python. Pytorch.org has an “Organization” repository on Anaconda Cloud with their latest builds (even nightly builds). Nice! [That is something that I’ve felt TensorFlow was falling way short with.] The PyTorch package on Anaconda Cloud is optimized for recent CUDA, cuDNN and MKL. It fits in nicely with the excellent “data science stack” that Anaconda Python is.

I’ll show you the PyTorch.org home page with the simple install selector and then describe how to, optionally, create a specific pytorch conda “environment” to install into and how to add a “kernel” for Jupyter Notebooks using that environment.


Notice those buttons. Just use them to select the install configuration you want and it will give you a command line to run for the install. Also, notice that Windows is now officially supported.

For information on installing Anaconda Python you might find the following posts generally useful.
Install TensorFlow with GPU Support the Easy Way on Ubuntu 18.04 (without installing CUDA) and The Best Way to Install TensorFlow with GPU Support on Windows 10 (Without Installing CUDA).

Optional: Create a conda environment and Jupyter Notebook Kernel for PyTorch

You could just run the command that is suggested from the install configuration buttons. That would install to your “base” Python environment. There is nothing wrong with that if you are adding well maintained packages. However, my preference is to keep extra packages like frameworks separate from my “base” environment. [It’s also good practice to create “per project” environments that include just-what-you-need when you are working.]

Here’s how you can create a conda environment for PyTorch.

From your shell (or CMD in Windows) do,

conda create --name pytorch
source activate pytorch

For Windows it’s activate pytorch

Now you can install PyTorch in this activated environment using the command you got from pytorch.org.

conda install pytorch torchvision cuda91 -c pytorch

Now set up a kernel for Jupyter Notebooks

conda install ipykernel
python -m ipykernel install --user --name pytorch --display-name "PyTorch"

Now when you are in the “pytorch” environment starting a Jupyter notebook will list “PyTorch” in the “New” menu.
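If you want to confirm that a notebook is really running on the new kernel, a quick check like the following can help. This is only a sketch. It assumes the environment is named “pytorch” as above, so the interpreter path would be expected to contain that name.

```python
import sys

# Path of the Python interpreter behind the running kernel.
print("Interpreter: ", sys.executable)

# Root directory of the active environment.
print("Environment prefix: ", sys.prefix)

# Assumption: the conda environment was created with --name pytorch,
# so its directory name should show up in the prefix.
in_pytorch_env = "pytorch" in sys.prefix
print("Looks like the pytorch environment: ", in_pytorch_env)
```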

Some Examples (Experiments)

I’m not going to just go over one of the tutorials. There are some good resources listed on pytorch.org, and Ritchie Ng’s GitHub repo “The Incredible PyTorch” has a lot of great links.

I’m going to show you a few simple testing experiments. What follows are clips of “In” and “Out” cells from a Jupyter notebook using PyTorch.

import numpy as np
import torch  # Yes, it's "torch" not "pytorch"

import time

print("PyTorch version: ", torch.__version__ )
print("CUDA available: ", torch.cuda.is_available())
print("CUDA version: ", torch.version.cuda)
PyTorch version:  0.4.0
CUDA available:  True
CUDA version:  9.1.85

Norm of matrix product: numpy array, pytorch tensor, GPU tensor

For a first test we can see how variables are defined with PyTorch and do a little performance testing.

I’m using a system with a Xeon-W 2175 14-core CPU and an NVIDIA 1080Ti GPU. The CPU calculation utilizes all 14 cores.

First numpy

I’ll force “float32” type so I have the same type on the CPU and GPU. [I’m using the new infix matrix multiply operator “@”]

n = 10000

A = np.random.randn(n,n).astype('float32')
B = np.random.randn(n,n).astype('float32')

start_time = time.time()
nrm = np.linalg.norm(A@B)
print(" took {} seconds ".format(time.time() - start_time))
print(" norm = ",nrm)
took 1.2215766906738281 seconds
norm =  1000197.3

Now PyTorch

I’ll generate PyTorch tensors and do the same thing. Here “norm” is a method of the Tensor class. [The random number generation code in PyTorch was much faster than numpy but I didn’t time it.]

tA = torch.randn(n,n)
tB = torch.randn(n,n)

start_time = time.time()
tnrm = (tA@tB).norm()
print(" took {} seconds ".format(time.time() - start_time))
print(" norm = ",tnrm)
took 1.2832603454589844 seconds
norm =  tensor(1.00000e+06 * 1.0001)

About the same as numpy.

PyTorch with CUDA

Now let’s do this on the GPU.

gA = torch.randn(n,n, device="cuda")
gB = torch.randn(n,n, device="cuda")

start_time = time.time()
gnrm = (gA@gB).norm()
print(" took {} seconds ".format(time.time() - start_time))
print(" norm = ",gnrm)
took 0.32875895500183105 seconds
norm =  tensor(1.00000e+06 * 1.0002, device='cuda:0')

All I had to do to make use of the GPU was add device="cuda", nice! It ran about 4 times faster on the 1080Ti than on the 14 cores of the Xeon-W.
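One caveat on timing GPU code: CUDA operations are launched asynchronously, so a wall-clock measurement can stop before the GPU has finished, and the first CUDA call also carries some start-up cost. Here is a hedged sketch of a more careful measurement. The helper name `timed_matmul_norm` is my own, it falls back to the CPU when no GPU is present, and I have not re-run the numbers above with it.

```python
import time
import torch

def timed_matmul_norm(n, device):
    # Hypothetical helper: time norm(A @ B) for n x n random matrices.
    a = torch.randn(n, n, device=device)
    b = torch.randn(n, n, device=device)
    if device.type == "cuda":
        torch.cuda.synchronize()  # make sure setup work is finished
    start = time.perf_counter()
    nrm = (a @ b).norm()
    if device.type == "cuda":
        torch.cuda.synchronize()  # wait for the GPU before reading the clock
    return nrm.item(), time.perf_counter() - start

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

timed_matmul_norm(256, device)  # warm-up run, result discarded
value, seconds = timed_matmul_norm(1000, device)

print(" took {} seconds on {}".format(seconds, device))
print(" norm = ", value)
```

The matrix size is kept small here so that it runs quickly anywhere; set it to 10000 to match the experiment above.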

Gradient (Jacobian Matrix) of the norm of a matrix product

This is a little more challenging/interesting test. I want to see if the PyTorch auto-differentiation is doing the right thing.

The norm of a matrix product AX is, [note: I use ‘ to represent matrix transpose]

norm(AX) = (trace[(AX)’ AX])^(1/2)

and from that (if you know matrix calculus) you can find the derivative with respect to the matrix X, (trust me!)

d(norm(AX))/dX = A’AX/norm(AX)
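For anyone who would rather not take that on trust, here is a short sketch of the derivation. It assumes the norm is the Frobenius norm (the default for a matrix in both np.linalg.norm and the Tensor norm method, as far as I know) and it writes the transpose as a superscript T instead of ‘.

```latex
\begin{aligned}
\|AX\|_F^2 &= \operatorname{tr}\!\left[(AX)^{\top} AX\right]
            = \operatorname{tr}\!\left[X^{\top} A^{\top} A X\right] \\
\frac{\partial \|AX\|_F^2}{\partial X} &= 2\, A^{\top} A X \\
\frac{\partial \|AX\|_F}{\partial X}
  &= \frac{1}{2\,\|AX\|_F}\,\frac{\partial \|AX\|_F^2}{\partial X}
   = \frac{A^{\top} A X}{\|AX\|_F}
\end{aligned}
```

The second line is the standard derivative of a quadratic trace form, and the third is the chain rule applied to the square root.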

Since I know the closed form analytic formula I can check the PyTorch “autograd”. I’ll use small matrices so we can see the output.

First numpy to check the norm and derivative formula

A = np.random.randn(4,4).astype('float32')
X = np.random.randn(4,4).astype('float32')

array([[-2.0457231e-01,  1.1254210e+00, -1.2376943e-01, -3.7727800e-01],
       [-3.4825677e-01,  3.9902925e-01, -3.2082230e-01, -4.6699187e-01],
       [ 3.2728058e-01, -8.5915554e-01,  1.7892727e+00,  1.5172162e-03],
       [-1.4879596e+00, -2.7159253e-01, -3.5554385e-01,  4.1141266e-01]],
array([[ 1.224036  ,  0.7789602 ,  0.56167126, -0.33754435],
       [ 0.17940208, -0.46755362,  0.1660711 ,  0.8927178 ],
       [ 0.31906456, -0.1333178 ,  1.022022  ,  0.9054658 ],
       [-0.18456306,  0.43555623,  0.05201409, -0.5809525 ]],

Now get the norm,

nrmAX = np.linalg.norm(A@X)

And, now the derivative (Jacobian) from the analytic formula,

dnrmAX = (A.transpose()@A@X)/nrmAX
array([[ 0.9124136 ,  0.45334324,  0.68848616,  0.0766317 ],
       [-0.0819881 , -0.34912252, -0.40818632,  0.25145066],
       [ 0.6097719 ,  0.3521573 ,  1.0386969 ,  0.30083472],
       [-0.17536718,  0.07175894, -0.06515761, -0.20556062]],

PyTorch norm and derivative (Jacobian) using autograd

This time we will get our PyTorch tensors by converting from the numpy arrays (this is a great feature!)

tA = torch.tensor(A)
tensor([[-0.2046,  1.1254, -0.1238, -0.3773],
        [-0.3483,  0.3990, -0.3208, -0.4670],
        [ 0.3273, -0.8592,  1.7893,  0.0015],
        [-1.4880, -0.2716, -0.3555,  0.4114]])

For tX I’ll set “requires_grad=True”. This tells PyTorch to keep track of this variable for gradient computation.

tX = torch.tensor(X, requires_grad=True)
tensor([[ 1.2240,  0.7790,  0.5617, -0.3375],
        [ 0.1794, -0.4676,  0.1661,  0.8927],
        [ 0.3191, -0.1333,  1.0220,  0.9055],
        [-0.1846,  0.4356,  0.0520, -0.5810]])

Compute the norm of tAtX,

nrmtAX = (tA@tX).norm()

Now, use autograd to compute the derivative (Jacobian)

torch.autograd.grad(nrmtAX, tX)
(tensor([[ 0.9124,  0.4533,  0.6885,  0.0766],
         [-0.0820, -0.3491, -0.4082,  0.2515],
         [ 0.6098,  0.3522,  1.0387,  0.3008],
         [-0.1754,  0.0718, -0.0652, -0.2056]]),)

Excellent! It works!
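Comparing the two printed matrices by eye works for a 4x4 case. For a programmatic check, something like the following sketch could be used. It uses fresh random matrices with a fixed seed, not the values shown above, and the tolerances are my own choice for float32.

```python
import torch

torch.manual_seed(0)  # fixed seed so the check is repeatable

A = torch.randn(4, 4)
X = torch.randn(4, 4, requires_grad=True)

# Forward pass: Frobenius norm of the product.
nrm = (A @ X).norm()

# Gradient from autograd (returned as a one-element tuple).
(grad_auto,) = torch.autograd.grad(nrm, X)

# Gradient from the closed form A'AX / norm(AX).
grad_formula = (A.t() @ A @ X).detach() / nrm.detach()

match = torch.allclose(grad_auto, grad_formula, rtol=1e-4, atol=1e-5)
print("autograd matches the analytic formula: ", match)
```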

I’m really impressed with PyTorch and having fun using it. I will certainly spend some quality time with PyTorch.

I hope this has inspired you to check out PyTorch. It’s at version 0.4 right now but should just be getting bug fixes and optimization on the way to 1.0. Enjoy!

Happy computing –dbk