It's time for a "Docker with NVIDIA GPU support" update. This post will guide you through a useful Workstation setup (including User-name-spaces and performance tuning) with the new versions of Docker and the NVIDIA GPU container toolkit.
In this post I've done more testing with Ryzen 3900X looking at the effect of BLAS libraries on a simple but computationally demanding problem with Python numpy. The results may surprise you! I start with a little bit of history of Intel vs AMD performance to give you what may be a new perspective on the issue.
This is a short post showing a performance comparison with the RTX2070 Super and several GPU configurations from recent testing. The comparison is with TensorFlow running a ResNet-50 and Big-LSTM benchmark.
I was able to spend a little time with an AMD Ryzen 3900X. Of course the first thing I wanted to know was the double precision floating point performance. My two favorite applications for a "first look" at a new processor are Linpack and NAMD. The Ryzen 3900X is a pretty impressive processor!
Docker is a great Workstation tool. It is mostly used for command-line applications or servers, but what if you want to run an application in a container AND use an X Window GUI with it? What if you are doing development work with CUDA and are including OpenGL graphic visualization along with it? You CAN do that!
Install TensorFlow 2 beta1 (GPU) on Windows 10 and Linux with Anaconda Python (no CUDA install needed)
Written on June 26, 2019 by Dr Donald Kinghorn
TensorFlow 2.0.0-beta1 is available now and ready for testing. What if you want to try it but don't want to mess with doing an NVIDIA CUDA install on your system? The official TensorFlow install documentation has you do that, but it's really not necessary.
Being able to run Jupyter Notebooks on remote systems adds tremendously to the versatility of your workflow. In this post I will show a simple way to do this by taking advantage of some nifty features of secure shell (ssh). What I'll do is mostly OS independent but I am putting an emphasis on Windows 10 since many people are not familiar with tools like ssh on that OS.
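As a rough illustration of the kind of ssh feature involved, here is a minimal sketch that builds a local port-forwarding command for reaching a remote Jupyter server from a local browser. It assumes port forwarding (`ssh -L`) is the mechanism and that Jupyter listens on port 8888. The function name, user name, and host name are hypothetical and not taken from the post.

```python
def jupyter_tunnel_command(user, host, remote_port=8888, local_port=8888):
    """Build an ssh command that forwards a local port to a remote Jupyter port.

    Hypothetical helper for illustration; user, host and ports are assumptions.
    """
    # local_port on this machine -> localhost:remote_port as seen from the remote host
    forward = f"{local_port}:localhost:{remote_port}"
    # -N: do not run a remote command, -L: local port forwarding
    return ["ssh", "-N", "-L", forward, f"{user}@{host}"]


cmd = jupyter_tunnel_command("user", "remote.example")
print(" ".join(cmd))  # prints: ssh -N -L 8888:localhost:8888 user@remote.example
```

With the tunnel running, pointing a local browser at `localhost:8888` would reach the notebook server on the remote system. The same command line works from the OpenSSH client on Windows 10 and on Linux.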
This post is a setup guide and introduction to ssh client and server on Windows 10. Microsoft has a native OpenSSH client AND server on Windows. They are standard (and in stable versions) on Windows 10 since the 1809 "October Update". This guide should be helpful to both Windows and Linux users who want better interoperability.
Being able to get Docker and the NVIDIA-Docker runtime working on Ubuntu 19.04 makes this new and (currently) mostly unsupported Linux distribution a lot more useful. In this post I'll go through the steps that I used to get everything working nicely.
This post is the needed update to a post I wrote nearly a year ago (June 2018) with essentially the same title. This time I have presented more details in an effort to prevent many of the "gotchas" that some people had with the old guide. This is a detailed guide for getting the latest TensorFlow working with GPU acceleration without needing to do a CUDA install.
Ubuntu 19.04 will be released soon so I decided to see if CUDA 10.1 could be installed on it. Yes, it can and it seems to work fine. In this post I walk through the install and show that docker and nvidia-docker also work. I ran TensorFlow 2.0-alpha on Ubuntu 19.04 beta.
TensorFlow Performance with 1-4 GPUs -- RTX Titan, 2080Ti, 2080, 2070, GTX 1660Ti, 1070, 1080Ti, and Titan V
Written on March 14, 2019 by Dr Donald Kinghorn
I have updated my TensorFlow performance testing. This post contains up-to-date versions of all of my testing software and includes results for 1 to 4 RTX and GTX GPU's. It gives a good comparative overview of most of the GPU's that are useful in a workstation intended for machine learning and AI development work.
There are 2 recent Intel processors that are really strange, the Xeon W-3175X 28-core, and the Core i9 9990XE overclocked 14-core. I was able to get a little time in on these processors. I ran a couple of numerical compute performance tests with the Intel MKL Linpack benchmark and NAMD. I used the same system image that I had used recently to look at 3 Intel 8-core processors so I will include those results here as well. **There will be results for W-3175X, 9990XE, 9800X, W-2145, and 9900K**.
RTX Titan TensorFlow performance with 1-2 GPUs (Comparison with GTX 1080Ti, RTX 2070, 2080, 2080Ti, and Titan V)
Written on January 30, 2019 by Dr Donald Kinghorn
I've done some testing with 2 NVIDIA RTX Titan GPU's running machine learning jobs with TensorFlow. The RTX Titan is a great card but there is good news and bad news.
In this post I'll take a brief look at the numerical computing performance of three very capable 8-core processors -- i9 9900K, i9 9800X and Xeon W-2145. All three are great CPU's but there are some significant differences that can cause confusion. I'll discuss these differences and see how the processors stack up when running Linpack and NAMD molecular dynamics simulations.
There has been some concern about Peer-to-Peer (P2P) on the NVIDIA RTX Turing GPU's. P2P is not available over PCIe as it has been in past cards. It is available with very good performance when using NVLINK with 2 cards. I did some testing to see how the performance compared between the GTX 1080Ti and RTX 2080Ti. There were some interesting results!
In my recent testing with the AMD Threadripper 2990WX I was impressed by the CPU based performance with the molecular dynamics program NAMD. NAMD makes a good benchmark for looking at CPU/GPU performance since it requires a balance and is usually limited by CPU. After some discussions I decided it would be good to look at multi-GPU performance with NAMD on Threadripper.
I recently wrote a post about building and running AMD Threadripper 2990WX with HPL Linpack - a "How-To". Most of the time I had with the processor went into getting that to work. However, I did run a few other test jobs that I thought the 2990WX would do well with. I compared that against my personal workstation with a Xeon-W 2175. In this post I share those test runs with you. It's not thorough testing by any means but it was interesting and I was surprised a couple of times with the results.
How to Run an Optimized HPL Linpack Benchmark on AMD Ryzen Threadripper -- 2990WX 32-core Performance
Written on November 30, 2018 by Dr Donald Kinghorn
The AMD Ryzen Threadripper 2990WX with 32 cores is an intriguing processor. I've been asked about performance for numerical computing and decided to find out how well it would do with my favorite benchmark, the "High Performance Linpack" benchmark. This is used to rank Supercomputers on the Top500 list. It is not always simple to run this test since it can require building a few libraries from source. This includes the all-important BLAS library which AMD has optimized in their BLIS package. I give you a complete How-To guide for getting this running to see what the 2990WX is capable of.
RTX 2080Ti with NVLINK - TensorFlow Performance (Includes Comparison with GTX 1080Ti, RTX 2070, 2080, 2080Ti and Titan V)
Written on October 26, 2018 by Dr Donald Kinghorn
More Machine Learning testing with TensorFlow on the NVIDIA RTX GPU's. This post adds dual RTX 2080 Ti with NVLINK and the RTX 2070 along with the other testing I've recently done. Performance in TensorFlow with 2 RTX 2080 Ti's is very good! Also, the NVLINK bridge with 2 RTX 2080 Ti's gives a bidirectional bandwidth of nearly 100 GB/sec!
NVLINK is one of the more interesting features of NVIDIA's new RTX GPU's. In this post I'll take a look at the performance of NVLINK between 2 RTX 2080 GPU's along with a comparison against the single GPU testing I've recently done. The testing will be a simple look at the raw peer-to-peer data transfer performance and a couple of TensorFlow job runs with and without NVLINK.
Are the NVIDIA RTX 2080 and 2080Ti good for machine learning? Yes, they are great! The RTX 2080 Ti rivals the Titan V for performance with TensorFlow. The RTX 2080 seems to perform as well as the GTX 1080 Ti (although the RTX 2080 only has 8GB of memory). I've done some testing using **TensorFlow 1.10** built against **CUDA 10.0** running on **Ubuntu 18.04** with the **NVIDIA 410.48 driver**.
NVIDIA recently released version 10.0 of CUDA. This is an upgrade from the 9.x series and has support for the new Turing GPU architecture. This CUDA version has full support for Ubuntu 18.04 as well as 16.04 and 14.04. The CUDA 10.0 release is bundled with the new 410.x display driver for Linux which will be needed for the 20xx Turing GPU's. If you are doing development work with CUDA or running packages that require you to have the CUDA toolkit installed then you will probably want to upgrade to this. I'll go through how to do the install of CUDA 10.0 either by itself or along with an existing CUDA 9.2 install.
PyTorch for Scientific Computing - Quantum Mechanics Example Part 4) Full Code Optimizations -- 16000 times faster on a Titan V GPU
Written on September 14, 2018 by Dr Donald Kinghorn
This post presents the code optimizations that give the 16000 times speedup for the scientific computing with PyTorch Quantum Mechanics example. The following quote says a lot, "The big magic is that on the Titan V GPU, with batched tensor algorithms, those million terms are all computed in the same time it would take to compute 1!!!"
PyTorch for Scientific Computing - Quantum Mechanics Example Part 3) Code Optimizations - Batched Matrix Operations, Cholesky Decomposition and Inverse
Written on August 31, 2018 by Dr Donald Kinghorn
An amazing result in this testing is that "batched" code ran in constant time on the GPU. That means that doing the Cholesky decomposition on 1 million matrices took the same amount of time as it did with 10 matrices! In this post we start looking at performance optimization for the Quantum Mechanics problem/code presented in the first 2 posts. This is the start of the promise to make the code over 15,000 times faster! I still find the speedup hard to believe but it turns out little things can make a big difference.
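To make the idea of a "batched" operation concrete, here is a minimal sketch that factors a whole stack of matrices with a single Cholesky call instead of looping over them one at a time. It is not the code from the posts: NumPy stands in for PyTorch and runs on the CPU, so it shows the batched calling pattern only and does not reproduce the constant-time GPU behavior. The helper name, batch size, and matrix size are assumptions chosen for illustration.

```python
import numpy as np


def random_spd_batch(batch, n, seed=0):
    """Return a (batch, n, n) stack of symmetric positive definite matrices.

    Hypothetical helper for illustration only.
    """
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((batch, n, n))
    # A @ A^T is symmetric positive semi-definite; adding n*I makes it positive definite
    return a @ a.transpose(0, 2, 1) + n * np.eye(n)


mats = random_spd_batch(batch=1000, n=4)

# One call factors every matrix in the stack; no Python loop over the batch
lower = np.linalg.cholesky(mats)

# Check the factorization: L @ L^T should reproduce each input matrix
recon = lower @ lower.transpose(0, 2, 1)
print(lower.shape, np.allclose(recon, mats))
```

The point of the pattern is that the batch dimension is handed to the library as a whole, which is what lets a GPU backend work on all of the matrices at once.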