PCIe X16 vs X8 with 4 x Titan V GPUs for Machine Learning

One of the questions I get asked frequently is “how much difference does PCIe X16 vs PCIe X8 really make?” Well, I got some testing done using 4 Titan V GPUs in a machine that supports 4 cards at X16. I ran several TensorFlow jobs with the GPUs at both X16 and X8. Read on to see how it went.
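
If you want to confirm what link width your own cards are actually negotiating, nvidia-smi can report it directly. Here is a minimal Python sketch, assuming the NVIDIA driver and nvidia-smi are installed and on the PATH (the query field names come from nvidia-smi's --help-query-gpu list):

```python
import subprocess

# Ask nvidia-smi for each GPU's current PCIe generation and link width.
# (Assumes nvidia-smi is on the PATH; field names are standard query-gpu fields.)
query = "index,name,pcie.link.gen.current,pcie.link.width.current"
result = subprocess.run(
    ["nvidia-smi", f"--query-gpu={query}", "--format=csv"],
    capture_output=True, text=True, check=True,
)
print(result.stdout)
```

Note that the reported link generation (though usually not the width) can drop when a card is idle, so it is best to read it while a job is running.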

Microsoft Build 2018 — impressions

I attended the Microsoft Build 2018 developers conference this week and really enjoyed it. I wanted to share my “big picture” feelings about it and some of the things that stood out to me. I’m not going to give you a “reporter’s” view or repeat press-release items. This is just my personal impression of the conference.

GPU Memory Size and Deep Learning Performance (batch size) 12GB vs 32GB — 1080Ti vs Titan V vs GV100

Batch size is an important hyper-parameter for Deep Learning model training. When using GPU-accelerated frameworks for your models, the amount of memory available on the GPU is a limiting factor. In this post I look at the effect of setting the batch size for a few CNNs running with TensorFlow on the 1080Ti and Titan V with 12GB of memory, and the GV100 with 32GB.
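
As a rough illustration of how that memory limit shows up in practice, here is a small TensorFlow/Keras sketch (the toy CNN and image size are placeholders, not the networks benchmarked in the post) that keeps increasing the batch size until the GPU runs out of memory:

```python
import numpy as np
import tensorflow as tf

# Placeholder CNN used only to demonstrate the batch-size vs GPU-memory trade-off.
def build_model():
    return tf.keras.Sequential([
        tf.keras.layers.Conv2D(64, 3, activation="relu", input_shape=(224, 224, 3)),
        tf.keras.layers.Conv2D(64, 3, activation="relu"),
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])

for batch_size in [32, 64, 128, 256, 512]:
    tf.keras.backend.clear_session()          # free graph/session state between trials
    model = build_model()
    model.compile(optimizer="sgd", loss="sparse_categorical_crossentropy")
    x = np.random.rand(batch_size, 224, 224, 3).astype("float32")
    y = np.random.randint(0, 10, size=batch_size)
    try:
        model.fit(x, y, batch_size=batch_size, epochs=1, verbose=0)
        print(f"batch size {batch_size}: OK")
    except tf.errors.ResourceExhaustedError:
        print(f"batch size {batch_size}: out of GPU memory")
        break
```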

GTC 2018 Impressions

NVIDIA’s GPU Technology Conference (GTC) is probably my all-time favorite conference. It’s an interesting blend of “scientific research meeting” and trade show. It’s put on by a hardware vendor but still feels like a scientific meeting. It’s not just a “Kool-Aid” fest! In this post I present some of my thoughts about this year’s conference.

NAMD Performance on Xeon-Scalable 8180 and 8 GTX 1080Ti GPUs

This post will look at the molecular dynamics program NAMD. NAMD has good GPU acceleration but is heavily dependent on CPU performance as well. It achieves its best performance when there is a proper balance between CPU and GPU. The system under test has 2 Xeon 8180 28-core CPUs, the current top-of-the-line Intel processor. We’ll see how many GPUs we can add to those Xeon 8180 CPUs to get an optimal CPU/GPU compute balance with NAMD.
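
The sweep itself is simple to script. Below is a rough Python sketch of that kind of run; the namd2 binary, the +p/+devices flags from the multicore-CUDA build, and the standard ApoA1 benchmark input (apoa1.namd) in the working directory are all assumptions about the environment, not the exact setup used in the post:

```python
import re
import subprocess

# Run the ApoA1 benchmark with 1..8 GPUs and all 56 cores of the two Xeon 8180s,
# then report the last "days/ns" figure from NAMD's "Benchmark time:" output lines.
for num_gpus in range(1, 9):
    devices = ",".join(str(i) for i in range(num_gpus))
    out = subprocess.run(
        ["namd2", "+p56", "+devices", devices, "apoa1.namd"],
        capture_output=True, text=True,
    ).stdout
    times = re.findall(r"([\d.]+)\s*days/ns", out)
    if times:
        print(f"{num_gpus} GPU(s): {times[-1]} days/ns")
```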

Intel CPU flaw kernel patch effects – GPU compute, TensorFlow, Caffe, and LMDB database creation

The Intel CPU flaw and the Meltdown and Spectre security exploits are causing a lot of concern. There is a possibility of application slowdown from the kernel patches to mitigate the exploits. This slowdown is a particular concern for GPU-accelerated applications because of the system calls they require for moving data between CPU and GPU memory space. I did some testing on a couple of large TensorFlow and Caffe machine learning jobs, along with the creation of an LMDB database from 1.3 million images.
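
If you want to check whether the kernel page-table isolation (KPTI) patch is active on a machine before comparing numbers, here is a quick Python sketch; it assumes a kernel new enough to expose the sysfs mitigation files, with a fallback to the "pti" CPU flag in /proc/cpuinfo:

```python
from pathlib import Path

# Report the kernel's view of the Meltdown/Spectre mitigations. The sysfs
# vulnerabilities directory only exists on patched kernels; otherwise fall back
# to looking for the "pti" flag in /proc/cpuinfo.
vuln_dir = Path("/sys/devices/system/cpu/vulnerabilities")
if vuln_dir.is_dir():
    for entry in sorted(vuln_dir.iterdir()):
        print(f"{entry.name}: {entry.read_text().strip()}")
else:
    flags = next((line for line in open("/proc/cpuinfo") if line.startswith("flags")), "")
    print("KPTI enabled" if "pti" in flags.split() else "no pti flag found in /proc/cpuinfo")
```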

NVIDIA Titan V vs Titan Xp Preliminary Machine Learning and Simulation Tests

NVIDIA announced availability of the Titan V card on Friday, December 8th. We had a couple in hand for testing on Monday, December 11th. Nice! I ran through many of the machine learning and simulation testing problems that I have run on Titan cards in the past. The results don’t show the near doubling in performance of past generations… but read on.