This post is the needed update to a post I wrote nearly a year ago (June 2018) with essentially the same title. This time I have presented more details in an effort to prevent many of the “gotchas” that some people had with the old guide. This is a detailed guide for getting the latest TensorFlow working with GPU acceleration without needing to do a CUDA install.
How To Install CUDA 10.1 on Ubuntu 19.04
Ubuntu 19.04 will be released soon, so I decided to see if CUDA 10.1 could be installed on it. Yes, it can, and it seems to work fine. In this post I walk through the install and show that docker and nvidia-docker also work. I ran TensorFlow 2.0-alpha on Ubuntu 19.04 beta.
TensorFlow Performance with 1-4 GPUs — RTX Titan, 2080Ti, 2080, 2070, GTX 1660Ti, 1070, 1080Ti, and Titan V
I have updated my TensorFlow performance testing. This post contains up-to-date versions of all of my testing software and includes results for 1 to 4 RTX and GTX GPUs. It gives a good comparative overview of most of the GPUs that are useful in a workstation intended for machine learning and AI development work.
Intel Xeon W-3175X and i9 9990XE Linpack and NAMD on Ubuntu 18.04
There are 2 recent Intel processors that are really strange: the 28-core Xeon W-3175X and the overclocked 14-core Core i9 9990XE. I was able to get a little time in on these processors. I ran a couple of numerical compute performance tests with the Intel MKL Linpack benchmark and NAMD. I used the same system image that I had used recently to look at 3 Intel 8-core processors, so I will include those results here as well. **There will be results for W-3175X, 9990XE, 9800X, W-2145, and 9900K**.
RTX Titan TensorFlow performance with 1-2 GPUs (Comparison with GTX 1080Ti, RTX 2070, 2080, 2080Ti, and Titan V)
I’ve done some testing with 2 NVIDIA RTX Titan GPUs running machine learning jobs with TensorFlow. The RTX Titan is a great card, but there is good news and bad news.
Numerical Computing Performance of 3 Intel 8-core CPUs – i9 9900K vs i7 9800X vs Xeon 2145W
In this post I’ll take a brief look at the numerical computing performance of three very capable 8-core processors: the i9 9900K, i7 9800X, and Xeon 2145W. All three are great CPUs, but there are some significant differences that can cause confusion. I’ll discuss these differences and see how the processors stack up when running Linpack and NAMD molecular dynamics simulations.
P2P peer-to-peer on NVIDIA RTX 2080Ti vs GTX 1080Ti GPUs
There has been some concern about Peer-to-Peer (P2P) on the NVIDIA RTX Turing GPUs. P2P is not available over PCIe as it was on past cards. It is available, with very good performance, when using NVLINK with 2 cards. I did some testing to see how the performance compared between the GTX 1080Ti and RTX 2080Ti. There were some interesting results!
AMD Threadripper and (1-4) NVIDIA 2080Ti and 2070 for NAMD Molecular Dynamics
In my recent testing with the AMD Threadripper 2990WX I was impressed by the CPU-based performance with the molecular dynamics program NAMD. NAMD makes a good benchmark for looking at CPU/GPU performance since it requires a balance between the two and is usually limited by the CPU. After some discussions I decided it would be good to look at multi-GPU performance with NAMD on Threadripper.
AMD Threadripper 2990WX 32-core vs Intel Xeon-W 2175 14-core – Linpack NAMD and Kernel Build Time
I recently wrote a post about building and running HPL Linpack on the AMD Threadripper 2990WX – a “How-To”. Most of the time I had with the processor went into getting that to work. However, I did run a few other test jobs that I thought the 2990WX would do well with. I compared those against my personal workstation with a Xeon-W 2175. In this post I share those test runs with you. It’s not thorough testing by any means, but it was interesting and I was surprised a couple of times by the results.
How to Run an Optimized HPL Linpack Benchmark on AMD Ryzen Threadripper — 2990WX 32-core Performance
The AMD Ryzen Threadripper 2990WX with 32 cores is an intriguing processor. I’ve been asked about its performance for numerical computing and decided to find out how well it would do with my favorite benchmark, the “High Performance Linpack” benchmark. This is used to rank supercomputers on the Top500 list. It is not always simple to run this test since it can require building a few libraries from source. This includes the all-important BLAS library, which AMD has optimized in their BLIS package. I give you a complete How-To guide for getting this running to see what the 2990WX is capable of.




