Table of Contents
- Test systems: AMD 2990WX and Intel Xeon-W 2175
- Linux kernel build time
- How do you know if the programs you want to run will do well on the high core count AMD Threadripper processors?
When I did my recent AMD Threadripper 2990WX HPL Linpack “How-To”, most of the time I had with the processor went into getting that to work. However, I did run a few other test jobs that I thought the 2990WX would do well with, and I compared those runs against my personal workstation with a Xeon-W 2175. In this post I share those test runs with you. It’s not thorough testing by any means, but it was interesting and I was surprised a couple of times by the results.
The post How to Run an Optimized HPL Linpack Benchmark on AMD Ryzen Threadripper — 2990WX 32-core Performance is a good read for a high performance compute perspective on Threadripper.
Note: I have results for the NAMD job runs including an NVIDIA Titan V. NAMD has good GPU acceleration but needs a lot of CPU performance to balance that.
Test systems: AMD 2990WX and Intel Xeon-W 2175
The AMD Threadripper system I used was a test-bed build with the following main components:
- AMD Ryzen Threadripper 2990WX 32-Core @ 3.00GHz (4.2GHz Turbo)
- Gigabyte X399 AORUS XTREME-CF Motherboard
- 128GB DDR4 2666 MHz memory
- Samsung 970 PRO 512GB M.2 SSD
- NVIDIA Titan V GPU
The Intel system is my personal workstation:
- Puget Systems Peak Single
- Intel Xeon-W 2175 14-core @ 2.5GHz (4.3GHz turbo)
- ASUS C422 Pro SE (my system board; the Peak Single uses the very nice ASUS WS C422 SAGE/10G)
- 128GB DDR4 2400 MHz Reg ECC memory
- Samsung 960 EVO 1TB NVMe M.2
- NVIDIA Titan V GPU
Intel MKL for Linpack benchmark on Xeon-W 2175
NAMD, NAMD_2.13_Linux-x86_64-multicore and NAMD_2.13_Linux-x86_64-multicore-CUDA
Docker 18.06.1-ce; see How-To Setup NVIDIA Docker and NGC Registry on your Workstation – Part 5 Docker Performance and Resource Tuning and links therein for my normal Docker setup.
I’ll start with the Linpack benchmark that I went to great pains to optimize and compile for AMD Threadripper. For the Intel benchmark I’ll just use the omp threaded binary included with Intel MKL.
Linpack is my favorite benchmark for CPU performance because it exposes near maximum processor double precision floating point numerical performance. It is a good measure of how well optimized (vectorized) numerical linear algebra and matrix/vector algorithms will perform. That is the basis for the majority of scientific high performance computing software.
The Intel “X-series” and Xeon -W or -SP are much faster for this kind of intense compute workload. Here’s why:
- AMD Threadripper 2990WX: 32 cores, each core has 1 AVX2 (256-bit) vector unit == 597 GFLOPS
- Intel Xeon-W 2175: 14 cores, each core has 2 AVX512 (512-bit) vector units == 838 GFLOPS
The Intel AVX512 vector units provide great performance for well optimized code … but keep in mind not all code is well optimized (or at least not well vectorized).
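To see where that gap comes from, here is a rough back-of-the-envelope sketch of theoretical peak double precision throughput. It assumes each vector unit retires one fused multiply-add (2 FLOPs) per 64-bit lane per cycle and that every core holds the base clock from the system lists above. Real clocks under heavy AVX load will differ, so treat the output as a ballpark only. The function name and its parameters are made up for illustration.

```python
def peak_gflops(cores, ghz, vector_bits, units_per_core):
    """Rough theoretical peak double precision GFLOPS.

    Assumes one fused multiply-add (2 FLOPs) per 64-bit lane per cycle
    on every vector unit, with all cores held at the given clock.
    """
    lanes = vector_bits // 64          # doubles per vector register
    flops_per_cycle = lanes * 2 * units_per_core
    return cores * ghz * flops_per_cycle


# Core counts, base clocks and vector unit counts as listed in this post.
tr_2990wx = peak_gflops(cores=32, ghz=3.0, vector_bits=256, units_per_core=1)
xeon_w_2175 = peak_gflops(cores=14, ghz=2.5, vector_bits=512, units_per_core=2)

print(f"2990WX peak estimate:      {tr_2990wx:.0f} GFLOPS")
print(f"Xeon-W 2175 peak estimate: {xeon_w_2175:.0f} GFLOPS")
```

Under these assumptions the estimates come out to roughly 768 and 1120 GFLOPS, so the Xeon-W is ahead on paper despite having fewer than half the cores. That is the same direction as the Linpack numbers above.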
NAMD is a molecular dynamics program with very good parallel scaling on systems ranging from multi-core workstations to the largest supercomputers. It also has very good GPU acceleration. GPUs greatly improve performance, but NAMD has a significant portion of code that needs to run on the CPU, so it is important to balance CPU and GPU for best hardware utilization.
NAMD ran really well on the 2990WX!
Note that the Threadripper 2990WX was nearly twice as fast as the Xeon-W 2175 for the CPU-only runs. With the addition of the Titan V the performance of the two processors was much closer. This may indicate that the 2990WX would do better with more GPUs, but I don’t know that for sure because I didn’t test with multiple GPUs. Balance is important with NAMD and you need a lot of CPU performance to keep up with newer NVIDIA GPUs. It would be interesting to test the 2990WX with 2 x 2080Ti’s or perhaps 4 x 2070’s. If there is interest I may see if I can do that testing.
Linux kernel build time
I ran this benchmark from the Phoronix test suite. I wanted to see the non-floating point performance on the 2990WX, and compiling a large code base is an important application. Compiling a software package consisting of a large number of source files can often be done with a lot of parallelism. I expected that the Threadripper 2990WX with its 32 cores would be a good processor for this and, indeed, it did quite well.
That’s all of the performance testing results I have for the Threadripper 2990WX. I still feel that the Intel X-series and Xeon -W (and -SP) are “generally” better processors for HPC workloads. But, note that I said generally! I’m thinking about code that I would compile and optimize myself! I would make an effort to get good vectorization and for that the Intel AVX512 vector units are great.
How do you know if the programs you want to run will do well on the high core count AMD Threadripper processors?
Here are a few considerations:
Try to find trusted performance evaluations for the software you are going to use (with similar types of job runs). This can be tricky because despite Google’s great power some stuff just doesn’t get posted online (or it’s just very poor quality). You could try asking for advice in user or developer forums. You could also try bugging me to do it but you might not succeed with that.
Understand the parallel scaling performance of your code. Try running it on your current system using from 1 core up to the maximum number of cores you have. If it doesn’t scale close to linearly then having a large number of cores is likely not going to help. Keep in mind that scaling may fall off rapidly beyond 4-8 cores. Think Amdahl’s Law!
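As a rough illustration of that point, here is a minimal sketch of Amdahl’s Law, plus its inverse for estimating the parallel fraction of a job from two timings. The timings in the example are hypothetical, not measurements from these systems, and the function names are just for illustration.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Ideal speedup on `cores` cores when `parallel_fraction` of the
    single-core runtime can be spread across them (Amdahl's Law)."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


def estimate_parallel_fraction(t_one_core, t_n_cores, cores):
    """Invert Amdahl's Law: estimate the parallel fraction from two
    measured wall times (1 core vs. `cores` cores, cores > 1)."""
    speedup = t_one_core / t_n_cores
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / cores)


# Hypothetical timings: 100 s on 1 core, 20 s on 8 cores.
p = estimate_parallel_fraction(100.0, 20.0, 8)
for n in (8, 14, 32):
    print(f"{n:2d} cores: predicted speedup {amdahl_speedup(p, n):.1f}x")
```

With those made-up timings the job is about 91% parallel, and the model predicts less than a 9x speedup on 32 cores. A quick check like this on your current machine can tell you whether a high core count is likely to pay off.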
If you know that your code or job run workflow scales nearly perfectly because it’s “embarrassingly parallel”, or you just have lots of jobs to run simultaneously, then a high core count processor may be great. Keep in mind that you could run into memory contention, space limitations or I/O limitations.
If you can, try to see if your program has good vectorization or not. You may be able to turn off AVX in the BIOS. If you are building from source try compiling with noavx (or similar for your compiler). If you don’t see much performance change with AVX disabled then those great AVX512 vector units on the Intel processors won’t be doing you much good. If you cannot fix that then maybe you are best off just considering lots of cores.
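One way to make that comparison less hand-wavy is to time both builds on the same input. The sketch below is only an outline under some assumptions: you already have two builds of your program, one with AVX and one without, and each can be launched from the command line. The commands shown are placeholders that just run the Python interpreter, so swap in your own binaries.

```python
import statistics
import subprocess
import sys
import time


def median_wall_time(cmd, repeats=3):
    """Run `cmd` (a list of arguments) `repeats` times and return the
    median wall-clock time in seconds."""
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
        times.append(time.perf_counter() - start)
    return statistics.median(times)


def vectorization_gain(t_with_avx, t_without_avx):
    """Ratio > 1 means the AVX build is faster by that factor."""
    return t_without_avx / t_with_avx


if __name__ == "__main__":
    # Placeholder commands: replace with your own AVX and no-AVX builds,
    # e.g. ["./myapp_avx", "input.dat"] and ["./myapp_noavx", "input.dat"].
    avx_cmd = [sys.executable, "-c", "sum(range(10**6))"]
    noavx_cmd = [sys.executable, "-c", "sum(range(10**6))"]
    gain = vectorization_gain(median_wall_time(avx_cmd),
                              median_wall_time(noavx_cmd))
    print(f"AVX build is {gain:.2f}x the speed of the no-AVX build")
```

A gain close to 1.0 suggests the code is not getting much out of the vector units, in which case core count probably matters more than AVX512.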
I enjoyed the Threadripper 2990WX. Having 32 cores to play with makes you think differently. It’s like a “single node cluster” … kind of … really it felt a lot like a quad-socket system that would have cost $25000 a few years ago!
Happy computing –dbk