GPU Memory Size and Deep Learning Performance (batch size) 12GB vs 32GB -- 1080Ti vs Titan V vs GV100
Written on April 27, 2018 by Dr Donald Kinghorn
Batch size is an important hyper-parameter for Deep Learning model training. When using GPU accelerated frameworks for your models, the amount of memory available on the GPU is a limiting factor. In this post I look at the effect of setting the batch size for a few CNNs running with TensorFlow on a 1080Ti with 11GB memory, a Titan V with 12GB memory, and a GV100 with 32GB memory.
Tensor-cores are one of the compelling new features of the NVIDIA Volta architecture. In this post I discuss some thoughts on mixed precision and FP16 related to Tensor-cores. I have some performance results for large convolutional neural network training that make a good argument for trying to use them. Performance looks very good.
Building TensorFlow from source is challenging, but the end result can be a version tailored to your needs. This post will provide step-by-step instructions for building TensorFlow 1.7 linked with Anaconda3 Python, CUDA 9.1, cuDNN 7.1, and Intel MKL-ML. I do the build in a Docker container and show how the container is generated from a Dockerfile.
In this post I go through how to use Docker to create a container with all of the libraries and tools needed to compile TensorFlow 1.7. The build will include links to Intel MKL-ML (Intel's math kernel library plus extensions for Machine Learning) and optimizations for AVX512.
NVIDIA's GPU Technology Conference (GTC) is probably my all-time favorite conference. It's an interesting blend of "Scientific Research meeting" and Trade-Show. It's put on by a hardware vendor but still feels like a scientific meeting. It's not just a "Kool-Aid" fest! In this post I present some of my thoughts about this year's conference.
TensorFlow is a very powerful numerical computing framework. However, like any large research-level program, it can be challenging to install and configure. In this post I'll try to give some guidance on relatively easy ways to get started with TensorFlow. I'll only look at relatively simple "CPU only" installs with "standard" Python and Anaconda Python in this post. (I also have a quick test with Intel Python.)
TensorFlow is on its way to becoming the "standard" framework for machine learning. There are many reasons for that, and it is not just for machine learning! In this post I'll give a descriptive introduction to TensorFlow. This is the first post in a series on how to work with TensorFlow. Hopefully after reading this you will have a better understanding of the What? and Why? of TensorFlow.
This post will look at the molecular dynamics program NAMD. NAMD has good GPU acceleration but is heavily dependent on CPU performance as well. It achieves best performance when there is a proper balance between CPU and GPU. The system under test has 2 Xeon 8180 28-core CPUs. That's the current top-of-the-line Intel processor. We'll see how many GPUs we can add to those Xeon 8180 CPUs to get optimal CPU/GPU compute balance with NAMD.
TensorFlow Scaling on 8 1080Ti GPUs - Billion Words Benchmark with LSTM on a Docker Workstation Configuration
Written on March 2, 2018 by Dr Donald Kinghorn
In this post I present some Multi-GPU scaling tests running TensorFlow on a very nice system with 8 1080Ti GPUs. I use the Docker Workstation setup that I have recently written about. The job I ran for this testing was the "Billion Words Benchmark" using an LSTM model. Results were very good and better than expected.
How-To Setup NVIDIA Docker and NGC Registry on your Workstation - Part 5 Docker Performance and Resource Tuning
Written on February 23, 2018 by Dr Donald Kinghorn
This should be the last post in this series dealing with the Docker setup for accessing the NVIDIA NGC Docker registry on your workstation. There are a couple of configuration tuning changes that you may want to make. These will improve performance and ensure that you have proper system "user limit" resources to handle large application and job runs with Docker.
This post will go through how to get access to the NVIDIA NGC container registry on your workstation. The first 3 posts in this series gave instructions on how to install and configure a base Ubuntu 16.04 workstation system with Docker and NVIDIA-Docker for a usable work-flow. With that taken care of, we can get set up to use the many useful Docker images in the NGC container registry on your local system.
In this post I'll go through setting up Docker to use User-Namespaces. This is a very important step to achieving a comfortable docker work-flow on a personal Workstation. I will show you how to configure Docker so that instead of files and processes being owned by root they will be owned by your personal user account. This will make using Docker containers on your system safer and feel much the same as a "normally" installed application.
How-To Setup NVIDIA Docker and NGC Registry on your Workstation - Part 2 Docker and NVIDIA-Docker-v2
Written on February 2, 2018 by Dr Donald Kinghorn
This post will build on top of the base system setup described in Part 1. We will go through installing, configuring and testing Docker and NVIDIA-Docker version 2.
How-To Setup NVIDIA Docker and NGC Registry on your Workstation - Part 1 Introduction and Base System Setup
Written on January 26, 2018 by Dr Donald Kinghorn
One of my New Year's resolutions was to adopt a Docker based workflow. I had also promised in my recent post on testing the Titan V that I would do a series of How-To's on setting up Docker and ultimately configuring and using the excellent NVIDIA NGC Docker registry. This is the first post of that series and covers the base system setup, motivation and references.
In this post I'll be going over details of Installing Ubuntu 16.04 including the NVIDIA display driver and, optionally, NVIDIA CUDA. I have found the method presented here to be the most likely to succeed no matter what hardware configuration you are installing onto.
The Intel CPU flaw and the Meltdown and Spectre security exploits are causing a lot of concern. There is a possibility of application slowdown from the kernel patches that mitigate the exploits. This slowdown is a concern for GPU accelerated applications because of the system calls they require for moving data between CPU and GPU memory space. I did some testing on a couple of large TensorFlow and Caffe machine learning jobs along with the creation of an LMDB database from 1.3 million images.
New Year's resolutions are notorious for being overly ambitious, vague, and quickly forgotten. But I'm not going to let that stop me from making some! In order to keep myself from forgetting what I resolve to do, I'm going to write them down in public! These are my resolutions for when I'm wearing my System Administrator and Developer hats.
I've been exposed to enough computing "teasers" in 2017 that I feel I can stick my neck out a little and make some predictions for 2018. Some of these are pretty wild, i.e. unlikely, but I want to put them out there anyway.
NVIDIA announced availability of the Titan V card Friday December 8th. We had a couple in hand for testing on Monday December 11th, nice! I ran through many of the machine learning and simulation testing problems that I have done on Titan cards in the past. Results are not the near doubling in performance of past generations... but read on.
The new Intel core-i9 and core-i7 "enthusiast" "X", Skylake-X processors and the single socket Xeon Skylake-W (Workstation) processors seem nearly identical. I'll discuss the differences and make my recommendation on which to use.
Intel Purley platform, Skylake-SP, Xeon "Scalable" processors (Platinum, Gold, Silver, Bronze) are here. All 58 of them! Hopefully this post will help you to decide which of these (excellent) processors may be of use for your applications. I trim the list down to just a few of my favorites and break them down by use-case.
ARM for HPC? Supercomputers using ARM processors? Yes! I was at SC17 last week and ARM was a hot topic. There are new ARM processor designs that are fully competitive with Intel and AMD CPUs for high performance computing.
Which Intel CPU is better for heavy numerical compute workloads, Skylake-X core i7 7800X or Coffee-Lake core i7 8700K? They are priced nearly the same. The 8700K has high core clock frequencies and good power management but the 7800X has AVX-512. I show you which one comes out on top using an Intel optimized Linpack benchmark.
Intel Core-i9 7900X and 7980XE are very good desktop processors for mathematical computing workloads. This post is a short listing of results for the Linpack benchmark which is still my personal favorite CPU performance metric.
I can't think of a trending field of scientific research that has ever been better suited for "beginners" than Machine Learning and AI. Even though the field has been around for decades, it feels like day one. There is now a perfect convergence of resources to facilitate the learning and doing of Machine Learning.