NVIDIA has released the Titan Xp, an update to the Titan X Pascal (both use the Pascal GPU core). They also recently released the GTX 1080Ti, which proved to be every bit as good as the Titan X Pascal but at a much lower price. The new Titan Xp does offer better performance and is currently their fastest GeForce card. How much faster? I decided to find out by running a large Deep Learning image classification job to see how it performs for GPU accelerated Machine Learning.
The Titan Xp offers 10-20% performance gain over the Titan X Pascal and the GTX1080Ti for training a large Deep Neural Network.
Visually the only difference between the Titan X and Titan Xp is the lack of DVI on the Xp!
The details about the test system and how the jobs were set up follow the results. The primary results are for training a Deep Neural Network (GoogLeNet) for image classification with a 1.3 million image dataset from ImageNet. I also have comparative nbody benchmark performance for several cards.
I have included results from a couple of older posts for comparison.
NVIDIA GTX 1080Ti Performance for Machine Learning — as Good as TitanX?
GoogLeNet model training with Caffe on 1.3 million image dataset for 30 epochs
| GPU(s) | Model training runtime | ~ GPU(s) cost ($) |
|--------|------------------------|-------------------|
| (1) GTX 1070 | 32hr | 400 |
| (2) GTX 1070 | 19hr 32min | 800 |
| (4) GTX 1070 | 12hr 43min | 1600 |
| (1) GTX 1080Ti | 19hr 43min | 700 |
| (2) GTX 1080Ti | 13hr 12min | 1400 |
| (4) GTX 1080Ti | 7hr 43min | 2800 |
| (1) Titan X Pascal | 19hr 34min | 1400 |
| (2) Titan X Pascal | 13hr 21min | 2800 |
| (4) Titan X Pascal | 8hr 1min | 5600 |
| (1) Titan Xp | 17hr 33min | 1400 |
| (2) Titan Xp | 10hr 40min | 2800 |
The Titan Xp offers a 10-20% performance gain over the Titan X Pascal and the GTX 1080Ti, but at twice the cost of the GTX 1080Ti.

The (1) and (2) GTX 1070 runs and the (1) Titan Xp run were done with an image batch size of 64; all others used an image batch size of 128.

It's not unusual to see fluctuations in run times on the order of 30 minutes.
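To put numbers on that claim, here is a quick calculation of the single-GPU speedups implied by the runtimes in the table (a small Python sketch; the times are the table values converted to minutes, and keep in mind the (1) Titan Xp run used a different batch size):

```python
# Sanity check of the speedup figures implied by the single-GPU
# runtimes in the table above (times converted to minutes).
# Note: the (1) Titan Xp run used batch size 64 vs 128 for the others.

def minutes(hours, mins=0):
    return hours * 60 + mins

single_gpu = {
    "GTX 1080Ti": minutes(19, 43),
    "Titan X Pascal": minutes(19, 34),
    "Titan Xp": minutes(17, 33),
}

xp = single_gpu["Titan Xp"]
for card, runtime in single_gpu.items():
    if card != "Titan Xp":
        # Relative speedup = (other runtime / Titan Xp runtime) - 1
        print(f"Titan Xp is {runtime / xp - 1:.1%} faster than {card}")
```

That works out to roughly 11-12% on a single card, consistent with the 10-20% claim.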
The next table shows the results of nbody -benchmark -numbodies=256000 (nbody is from the CUDA samples source code).
GTX 1070, 1080Ti, Titan X Pascal and Titan Xp nbody Benchmark
| GPU(s) | nbody GFLOP/s |
|--------|---------------|
| (1) GTX 1070 | 4137 |
| (1) GTX 1080Ti | 7514 |
| (1) Titan X Pascal | 7524 |
| (1) Titan Xp | 7904 |
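For anyone who wants to reproduce these numbers, the benchmark is built from the CUDA samples. A sketch of the steps, assuming a default CUDA 8.0 toolkit install (the sample-install script name and paths may differ on your system, and a CUDA-capable GPU is required, so this is for reference only):

```shell
# Copy the CUDA 8.0 samples to a writable location, build nbody, and run
# the same benchmark used for the table above. Paths assume a default
# CUDA 8.0 toolkit install -- adjust for your setup.
cuda-install-samples-8.0.sh ~/cuda-samples
cd ~/cuda-samples/NVIDIA_CUDA-8.0_Samples/5_Simulations/nbody
make
./nbody -benchmark -numbodies=256000
```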
Video cards used for testing (data from nvidia-smi):

| Card | CUDA cores | GPU clock (MHz) | Memory clock (MHz)* | Application clock (MHz)* | FB memory (MiB) |
|------|------------|-----------------|---------------------|--------------------------|-----------------|
| TITAN X Pascal | 3584 | 1911 | 5005 | 1417 | 12186 |

\* Clocks can vary by manufacturer and are not always displayed by nvidia-smi.
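If you want to pull the same data for your own card, nvidia-smi can query it directly. The field names below are listed by nvidia-smi --help-query-gpu; this of course requires an NVIDIA GPU and driver, so it is shown for reference only:

```shell
# Query max clocks, application clocks, and framebuffer memory as CSV.
# Field names per `nvidia-smi --help-query-gpu`.
nvidia-smi --query-gpu=name,clocks.max.graphics,clocks.max.memory,clocks.applications.graphics,memory.total --format=csv
```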
The testing was done on my test-bench layout of our Peak Single (DIGITS GPU Workstation) recommended system for DIGITS/Caffe.
The Peak Single (“DIGITS” GPU Workstation)
CPU: Intel Core i7 6850K 6-core @ 3.6GHz (3.7GHz All-Core-Turbo)
Memory: 128 GB DDR4 2133MHz Reg ECC
PCIe: (4) X16-X16 v3
Motherboard: ASUS X99-E-10G WS
Heavy compute on GeForce cards can shorten their lifetime! I believe it is perfectly fine to use these cards but keep in mind that you may fry one now and then!
The OS I used for this testing was Ubuntu 16.04.2, installed with the Docker and NVIDIA-Docker workstation configuration I've been working on. See these posts for information about that:
- Docker and NVIDIA-docker on your workstation: Motivation
- Docker and NVIDIA-docker on your workstation: Installation
- Docker and NVIDIA-docker on your workstation: Setup User Namespaces
- Docker and NVIDIA-docker on your workstation: Using Graphical Applications
- Docker and Nvidia-Docker on your workstation: Common Docker Commands Tutorial
Following is a list of the software in the nvidia/digits Docker image used in the testing.
- Ubuntu 14.04
- CUDA 8.0.61
- DIGITS 5.0.0
- caffe-nv (0.15.13-3ubuntu14.04+cuda8.0), cuDNN 5
The host environment was:
- Ubuntu 16.04
- Docker version 17.03.0-ce
- NVIDIA-Docker version 1.0.1
- NVIDIA display driver 375.39
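For reference, a container like the one above can be started with nvidia-docker 1.x along these lines (the port mapping and data-volume path are illustrative, not taken from my setup, and the command requires the NVIDIA driver and nvidia-docker on the host):

```shell
# Launch the DIGITS container in the background and expose its web UI.
# -v mounts a host directory for datasets; adjust the path to your setup.
nvidia-docker run --name digits -d -p 5000:5000 \
    -v /path/to/data:/data nvidia/digits
```

Once running, the DIGITS web interface is available at http://localhost:5000.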
Test job image dataset
I used the training image set from
IMAGENET Large Scale Visual Recognition Challenge 2012 (ILSVRC2012)
I only used the training set images from the "challenge". All 138GB of them! I used the tools in DIGITS to partition this set into a training set and validation set and then used the GoogLeNet 22-layer network.
- Training set — 960893 images
- Validation set — 320274 images
- Model — GoogLeNet
- Duration — 30 Epochs
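DIGITS handles the training/validation partitioning itself when you point it at the image folder; the split above works out to roughly 75/25. As an illustration only (the helper below is a hypothetical sketch, not DIGITS code), such a split looks like:

```python
# Hypothetical sketch of a 75/25 train/validation split over a list of
# image paths, roughly what DIGITS does internally when building a dataset.
import random

def split_dataset(image_paths, val_fraction=0.25, seed=0):
    """Shuffle and partition image paths into (train, validation) lists."""
    rng = random.Random(seed)          # fixed seed for a reproducible split
    paths = list(image_paths)
    rng.shuffle(paths)
    n_val = int(len(paths) * val_fraction)
    return paths[n_val:], paths[:n_val]

# Example with placeholder names (the real set has ~1.28M images):
train, val = split_dataset([f"img_{i}.jpg" for i in range(1000)])
print(len(train), len(val))
```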
Many of the images in the IMAGENET collection are copyrighted. This means that usage and distribution is somewhat restricted. One of the things listed in the conditions for download is this,
“You will NOT distribute the above URL(s)”
So, I won't. Please see the IMAGENET site for information on obtaining datasets.
Olga Russakovsky*, Jia Deng*, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, Alexander C. Berg and Li Fei-Fei. (* = equal contribution) ImageNet Large Scale Visual Recognition Challenge. arXiv:1409.0575, 2014.
The NVIDIA Titan Xp is a great card for GPU accelerated machine learning workloads and offers a noticeable improvement over the Titan X Pascal card that it replaces. However, for these workloads running on a workstation, the GTX 1080Ti offers much better value. There is also a compelling argument for the GTX 1070, given the respectable performance it delivers at its price.
Happy computing –dbk