PyTorch for Scientific Computing - Quantum Mechanics Example Part 3) Code Optimizations - Batched Matrix Operations, Cholesky Decomposition and Inverse
Written on August 31, 2018 by Dr Donald Kinghorn
An amazing result in this testing is that "batched" code ran in constant time on the GPU. That means that doing the Cholesky decomposition on 1 million matrices took the same amount of time as it did with 10 matrices! In this post we start looking at performance optimization for the Quantum Mechanics problem/code presented in the first 2 posts. This is the start of the promise to make the code over 15,000 times faster! I still find the speedup hard to believe but it turns out little things can make a big difference.
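For readers who have not seen a "batched" call, here is a minimal sketch of the idea. It is not the code from the post: it assumes a recent PyTorch where `torch.linalg.cholesky` and `torch.cholesky_solve` accept a stack of matrices, and the function name, sizes, and random test matrices are all made up for illustration. The whole batch is factored in one call and inverted in another, with no Python loop over matrices.

```python
import torch


def batched_cholesky_inverse(batch_size=1000, n=4, seed=0, device="cpu"):
    """Factor and invert a stack of random SPD matrices, one call each.

    Hypothetical helper for illustration only. Returns (spd, lower, inverse),
    each with shape (batch_size, n, n).
    """
    gen = torch.Generator().manual_seed(seed)
    a = torch.randn(batch_size, n, n, dtype=torch.float64, generator=gen)
    eye = torch.eye(n, dtype=torch.float64)

    # A @ A^T + n*I is symmetric positive definite, so Cholesky always succeeds.
    spd = (a @ a.transpose(-1, -2) + n * eye).to(device)

    # One call factors every matrix in the stack: spd = lower @ lower^T.
    lower = torch.linalg.cholesky(spd)

    # Solving spd @ X = I with the Cholesky factors gives the inverse of
    # every matrix in the stack, again in a single call.
    identity = eye.to(device).expand(batch_size, n, n).contiguous()
    inverse = torch.cholesky_solve(identity, lower)
    return spd, lower, inverse


if __name__ == "__main__":
    spd, lower, inverse = batched_cholesky_inverse(batch_size=10, n=3)
    print(lower.shape, inverse.shape)
```

Passing `device="cuda"` on a machine with a supported GPU runs the same two calls on the GPU. The sketch makes no timing claim of its own; the constant-time behavior described above is the result reported in the post.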
PyTorch for Scientific Computing - Quantum Mechanics Example Part 2) Program Before Code Optimizations
Written on August 16, 2018 by Dr Donald Kinghorn
This is the second post on using PyTorch for scientific computing. I'm doing an example from Quantum Mechanics. In this post we go through the formulas that need to be coded, write them up in PyTorch, and give everything a test.
Doing Quantum Mechanics with a Machine Learning Framework: PyTorch and Correlated Gaussian Wavefunctions: Part 1) Introduction
Written on July 31, 2018 by Dr Donald Kinghorn
A Quantum Mechanics problem coded up in PyTorch?! Sure! Why not? I'll explain just enough of the Quantum Mechanics and Mathematics to make the problem and solution (kind of) understandable. The focus is on how easy it is to implement in PyTorch. This first post will give some explanation of the problem and do some testing of a couple of the formulas that will need to be coded up.
NAMD Custom Build for Better Performance on your Modern GPU Accelerated Workstation -- Ubuntu 16.04, 18.04, CentOS 7
Written on July 20, 2018 by Dr Donald Kinghorn
In this post I will be compiling NAMD from source for good performance on modern GPU accelerated Workstation hardware. Doing a custom NAMD build from source code gives a moderate but significant boost in performance. This can be important considering that large simulations over many time-steps can run for days or weeks. I wanted to do some custom NAMD builds to ensure that modern Workstation hardware was being well utilized. I include some results for the STMV benchmark showing the custom build performance boost. I've included some results using NVIDIA 1080Ti and Titan V GPU's as well as an "experimental" build using an Ubuntu 18.04 base.
PyTorch is a relatively new ML/AI framework. It combines some great features of other packages and has a very "Pythonic" feel. It has excellent and easy to use CUDA GPU acceleration. It is fun to use and easy to learn. Read on for some reasons you might want to consider trying it. I've got some unique example code you might find interesting too.
In this post I go through a simple modification to the VGG Image Annotator that adds easy to use buttons for adding labels to image object bounding-boxes. It is a very fast way to do what could be a tedious machine learning data preparation task.
In this post I'll walk you through the best way I have found so far to get a good TensorFlow work environment on Windows 10 including GPU acceleration. YOU WILL NOT HAVE TO INSTALL CUDA! I'll also go through setting up Anaconda Python, creating an environment for TensorFlow, and making that available for use with Jupyter notebook. As a "non-trivial" example of using this setup we'll go through training LeNet-5 with Keras using TensorFlow with GPU acceleration. We'll get a setup that is 18 times faster than using the CPU alone.
In this post I'll be going over details of installing Ubuntu 18.04 including the NVIDIA display driver and any one of the available desktop environments. I'll do this starting from a base server install. I'll go over a few possible pitfalls and end with a short discussion on the new netplan configuration tool for Ubuntu networking.
In this post I'll walk you through the best way I have found so far to get a good TensorFlow work environment on Windows 10 including GPU acceleration. I'll go through how to install just the needed libraries (DLL's) from CUDA 9.0 and cuDNN 7.0 to support TensorFlow 1.8. I'll also go through setting up Anaconda Python, creating an environment for TensorFlow, and making that available for use with Jupyter notebook. As a "non-trivial" example of using this setup we'll go through training LeNet-5 with Keras using TensorFlow with GPU acceleration. We'll get a setup that is 18 times faster than using the CPU alone.
TensorFlow is a very important Machine/Deep Learning framework and Ubuntu Linux is a great workstation platform for this type of work. If you want to set up a workstation using Ubuntu 18.04 with CUDA GPU acceleration support for TensorFlow then this guide will hopefully help you get your machine learning environment up and running without a lot of trouble. And, you don't have to do a CUDA install!
One of the questions I get asked frequently is "how much difference does PCIe X16 vs PCIe X8 really make?" Well, I got some testing done using 4 Titan V GPU's in a machine that will do 4 X16 cards. I ran several jobs with TensorFlow with the GPU's at both X16 and X8. Read on to see how it went.
I attended the Microsoft Build 2018 developers conference this week and really enjoyed it. I wanted to share my "big picture" feelings about it and some of the things that stood out to me. I'm not going to give you a "reporter's" view or repeat press-release items. This is just my personal impression of the conference.
I have been qualifying a 4 GPU workstation for Machine Learning and HPC use. The last confirmation testing I wanted to do was running it with TensorFlow benchmarks on 4 NVIDIA Titan V GPU's. I have that system up and running and the multi-GPU scaling looks very good.
GPU Memory Size and Deep Learning Performance (batch size) 12GB vs 32GB -- 1080Ti vs Titan V vs GV100
Written on April 27, 2018 by Dr Donald Kinghorn
Batch size is an important hyper-parameter for Deep Learning model training. When using GPU accelerated frameworks for your models the amount of memory available on the GPU is a limiting factor. In this post I look at the effect of setting the batch size for a few CNN's running with TensorFlow on 1080Ti and Titan V with 12GB memory, and GV100 with 32GB memory.
Tensor-cores are one of the compelling new features of the NVIDIA Volta architecture. In this post I discuss some thoughts on mixed precision and FP16 related to Tensor-cores. I have some performance results for large convolutional neural network training that make a good argument for trying to use them. Performance looks very good.
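To make the "mixed precision" idea concrete, here is a small sketch that is not taken from the post. It uses only Python's standard `struct` module to round values to IEEE 754 half precision (format code `"e"`) and single precision (format code `"f"`). The function names and the toy input are hypothetical. It compares keeping a running sum in FP16 against keeping FP16 inputs but accumulating in FP32, which is the general pattern mixed precision relies on.

```python
import struct


def to_fp16(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


def to_fp32(x):
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def sum_all_fp16(values):
    """Inputs and running total are both rounded to FP16 at every step."""
    total = 0.0
    for v in values:
        total = to_fp16(total + to_fp16(v))
    return total


def sum_fp16_inputs_fp32_accumulator(values):
    """Inputs are FP16, but the running total is kept in FP32."""
    total = 0.0
    for v in values:
        total = to_fp32(total + to_fp16(v))
    return total


if __name__ == "__main__":
    ones = [1.0] * 4096
    print("all FP16:         ", sum_all_fp16(ones))
    print("FP32 accumulation:", sum_fp16_inputs_fp32_accumulator(ones))
```

With this toy input the all-FP16 sum stalls at 2048, because FP16 values between 2048 and 4096 are spaced 2 apart and adding 1 rounds back down. The FP32 accumulator reaches the exact total of 4096. Real training workloads are far more complicated than this, so treat it only as an illustration of why the accumulator precision matters.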
Building TensorFlow from source is challenging but the end result can be a version tailored to your needs. This post will provide step-by-step instructions for building TensorFlow 1.7 linked with Anaconda3 Python, CUDA 9.1, cuDNN 7.1, and Intel MKL-ML. I do the build in a docker container and show how the container is generated from a Dockerfile.
In this post I go through how to use Docker to create a container with all of the libraries and tools needed to compile TensorFlow 1.7. The build will include links to Intel MKL-ML (Intel's math kernel library plus extensions for Machine Learning) and optimizations for AVX512.
NVIDIA's GPU Technology Conference (GTC) is probably my all-time favorite conference. It's an interesting blend of "Scientific Research meeting" and Trade-Show. It's put on by a hardware vendor but still feels like a scientific meeting. It's not just a "Kool-Aid" fest! In this post I present some of my thoughts about this year's conference.
TensorFlow is a very powerful numerical computing framework. However, like any large research level program it can be challenging to install and configure. In this post I'll try to give some guidance on relatively easy ways to get started with TensorFlow. I'll only look at relatively simple "CPU only" installs with "standard" Python and Anaconda Python in this post. (I also have a quick test with Intel Python.)
TensorFlow is on its way to becoming the "standard" framework for machine learning. There are many reasons for that, and, it is not just for machine learning! In this post I'll give a descriptive introduction to TensorFlow. This is the first post in a series on how to work with TensorFlow. Hopefully after reading this you will have a better understanding of the What? and Why? of TensorFlow.
This post will look at the molecular dynamics program, NAMD. NAMD has good GPU acceleration but is heavily dependent on CPU performance as well. It achieves best performance when there is a proper balance between CPU and GPU. The system under test has 2 Xeon 8180 28-core CPU's. That's the current top of the line Intel processor. We'll see how many GPU's we can add to those Xeon 8180 CPU's to get optimal CPU/GPU compute balance with NAMD.
TensorFlow Scaling on 8 1080Ti GPUs - Billion Words Benchmark with LSTM on a Docker Workstation Configuration
Written on March 2, 2018 by Dr Donald Kinghorn
In this post I present some Multi-GPU scaling tests running TensorFlow on a very nice system with 8 1080Ti GPU's. I use the Docker Workstation setup that I have recently written about. The job I ran for this testing was the "Billion Words Benchmark" using an LSTM model. Results were very good and better than expected.
How-To Setup NVIDIA Docker and NGC Registry on your Workstation - Part 5 Docker Performance and Resource Tuning
Written on February 23, 2018 by Dr Donald Kinghorn
This should be the last post in this series dealing with the Docker setup for accessing the NVIDIA NGC Docker registry on your workstation. There are a couple of configuration tuning changes that you may want to make. These will improve performance and ensure that you have proper system "user limit" resources to handle large application and job runs with docker.
This post will go through how to get access to the NVIDIA NGC container registry on your workstation. The first 3 posts in this series gave instructions on how to install and configure a base Ubuntu 16.04 workstation system with Docker and NVIDIA-Docker for a usable work-flow. With that taken care of we can get set up to use the many useful docker images in the NGC container registry on your local system.