Docker is a great workstation tool. It is mostly used for command-line applications or servers, but … what if you want to run an application in a container AND use an X Window GUI with it? What if you are doing development work with CUDA and want to include OpenGL graphics visualization along with it? You CAN do that!
TensorFlow 2.0.0-beta1 is available now and ready for testing. What if you want to try it but don’t want to mess with doing an NVIDIA CUDA install on your system? The official TensorFlow install documentation has you do that, but it’s really not necessary.
Ubuntu 19.04 will be released soon, so I decided to see if CUDA 10.1 could be installed on it. Yes, it can, and it seems to work fine. In this post I walk through the install and show that docker and nvidia-docker also work. I ran TensorFlow 2.0-alpha on Ubuntu 19.04 beta.
There has been some concern about Peer-to-Peer (P2P) on the NVIDIA RTX Turing GPUs. P2P is not available over PCIe as it was on past cards. It is available, with very good performance, when using NVLINK with 2 cards. I did some testing to see how the performance compared between the GTX 1080 Ti and RTX 2080 Ti. There were some interesting results!
More machine learning testing with TensorFlow on the NVIDIA RTX GPUs. This post adds dual RTX 2080 Ti with NVLINK and the RTX 2070 to the other testing I’ve recently done. Performance in TensorFlow with 2 RTX 2080 Ti cards is very good! Also, the NVLINK bridge with 2 RTX 2080 Ti cards gives a bidirectional bandwidth of nearly 100 GB/sec!
NVLINK is one of the more interesting features of NVIDIA’s new RTX GPUs. In this post I’ll take a look at the performance of NVLINK between 2 RTX 2080 GPUs, along with a comparison against the single-GPU testing I’ve recently done. The testing will be a simple look at the raw peer-to-peer data transfer performance and a couple of TensorFlow job runs with and without NVLINK.
Are the NVIDIA RTX 2080 and 2080 Ti good for machine learning?
Yes, they are great! The RTX 2080 Ti rivals the Titan V for performance with TensorFlow. The RTX 2080 seems to perform as well as the GTX 1080 Ti (although the RTX 2080 only has 8GB of memory). I’ve done some testing using **TensorFlow 1.10** built against **CUDA 10.0** running on **Ubuntu 18.04** with the **NVIDIA 410.48 driver**.
NVIDIA recently released version 10.0 of CUDA. This is an upgrade from the 9.x series and has support for the new Turing GPU architecture. This CUDA version has full support for Ubuntu 18.04 as well as 16.04 and 14.04. The CUDA 10.0 release is bundled with the new 410.x display driver for Linux, which will be needed for the 20xx Turing GPUs. If you are doing development work with CUDA or running packages that require the CUDA toolkit to be installed, then you will probably want to upgrade to this. I’ll go through how to install CUDA 10.0 either by itself or alongside an existing CUDA 9.2 install.
This post covers the code optimizations behind the 16,000 times speedup for the scientific computing with PyTorch quantum mechanics example. The following quote says a lot:
“The big magic is that on the Titan V GPU, with batched tensor algorithms, those million terms are all computed in the same time it would take to compute 1!!!”
An amazing result in this testing is that “batched” code ran in constant time on the GPU. That means that doing the Cholesky decomposition on 1 million matrices took the same amount of time as it did with 10 matrices!
In this post we start looking at performance optimization for the Quantum Mechanics problem/code presented in the first 2 posts. This is the start of the promise to make the code over 15,000 times faster! I still find the speedup hard to believe but it turns out little things can make a big difference.