AMD 3900X (Brief) Compute Performance Linpack and NAMD

Introduction

I was able to spend a little time with an AMD Ryzen 3900X. Of course, the first thing I wanted to know was the double precision floating point performance. My two favorite applications for a "first look" at a new processor are Linpack and NAMD.

I expect to get some more time with this great new processor from AMD in the coming weeks and will likely do at least one more post on numerical compute performance with the Ryzen 3900X. You can take this post as a, hopefully informative, teaser.

I would like to start by saying my first impressions of the 3900X are quite positive. I didn't have any difficulties during testing. Performance was good and the subjective "feel" of the system was that it was quite "snappy". I had hoped (expected) to get a little better performance with Linpack than what I got. I think this may be partly because the architecture is new and the libraries needed for performance have not yet been optimized for Zen2.

System Configuration

Hardware:

AMD Ryzen 3900X
Motherboard Gigabyte X570 AORUS ULTRA
Memory 4x DDR4-3200 16GB (64GB total)
2TB Intel 660p NVMe M.2
NVIDIA 1660 GPU

Software:

Ubuntu 18.04
Compiler gcc 9.1
AMD BLIS library v 1.3
HPL Linpack 2.2
OpenMPI 3.1.4 (compiled with gcc 9.1)
NAMD 2.13 (Molecular Dynamics)

Linpack

Notes:

gcc9.1 was used for code compilation in order to use the newly added -march=znver2 for Zen2 optimizations. However, the Zen2 support is incomplete in gcc9.1. Full Zen2 optimizations are not expected until the release of gcc10.0. I don't know the status of support in LLVM or the status of the AMD AOCC compiler.
AMD BLIS (a.k.a. AMD's BLAS library) has not been updated with specific support for Zen2. I compiled it with gcc9.1 but did not see any performance improvements.
I tried several combinations of MPI ranks together with OMP threads. The best results were obtained using only OMP threads and the pre-built binary of the multi-threaded (OMP) BLIS, without MPI. **1 OMP thread per "real" core, i.e. 12 OMP threads, gave the best result.**
I did not test with any other BLAS libraries (OpenBLAS).

I have a detailed description of HPL Linpack with instructions for building it using AMD BLIS in the post, How to Run an Optimized HPL Linpack Benchmark on AMD Ryzen Threadripper — 2990WX 32-core Performance

I used a large problem size, approx. 90% of available memory (64GB), in order to maximize performance results: Ns=85200.
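The sizing above can be sanity-checked with a few lines of Python. This is only a sketch of the usual rule of thumb (an N x N matrix of 8-byte doubles, with N rounded down to a multiple of the block size NB). The function names are mine, and reading "64GB" as 64e9 bytes is an assumption.

```python
import math

BYTES_PER_DOUBLE = 8


def matrix_bytes(n):
    """Memory needed for the N x N double precision HPL matrix."""
    return n * n * BYTES_PER_DOUBLE


def hpl_problem_size(mem_bytes, fraction, nb):
    """Largest N (a multiple of nb) whose matrix fits in fraction * mem_bytes."""
    budget_doubles = int(mem_bytes * fraction) // BYTES_PER_DOUBLE
    n = math.isqrt(budget_doubles)
    return n - n % nb


if __name__ == "__main__":
    mem = 64 * 10**9  # assumption: "64GB" read as decimal gigabytes
    # Rule-of-thumb N for a 90% memory budget and NB=240
    print(hpl_problem_size(mem, 0.90, 240))
    # Fraction of memory the matrix for Ns=85200 actually occupies
    print(matrix_bytes(85200) / mem)
```

For Ns=85200 the matrix takes about 58 GB, which is roughly 90% of 64e9 bytes, and 85200 is an exact multiple of NB=240, so no partial blocks are left over.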

Here is the HPL.dat file used:

HPLinpack benchmark input file
Innovative Computing Laboratory, University of Tennessee
HPL.out      output file name (if any)
6            device out (6=stdout,7=stderr,file)
1            # of problems sizes (N)
85200        Ns
1            # of NBs
240          NBs
0            PMAP process mapping (0=Row-,1=Column-major)
1            # of process grids (P x Q)
1            Ps
1            Qs
16.0         threshold
1            # of panel fact
2            PFACTs (0=left, 1=Crout, 2=Right)
1            # of recursive stopping criterium
4            NBMINs (>= 1)
1            # of panels in recursion
2            NDIVs
1            # of recursive panel fact.
2            RFACTs (0=left, 1=Crout, 2=Right)
1            # of broadcast
2            BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)
1            # of lookahead depth
1            DEPTHs (>=0)
1            SWAP (0=bin-exch,1=long,2=mix)
64           swapping threshold
0            L1 in (0=transposed,1=no-transposed) form
0            U  in (0=transposed,1=no-transposed) form
1            Equilibration (0=no,1=yes)
8            memory alignment in double (> 0)

The following environment variables were set for the Ryzen 3900X Linpack run:

export OMP_PROC_BIND=TRUE
export OMP_PLACES=cores
export OMP_NUM_THREADS=12
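If you prefer to script the run, the same settings can be passed from Python. This is a minimal sketch, not the way the benchmark above was launched: the helper names are mine, and `./xhpl` is only an assumed path to the HPL binary, which is expected to sit in the directory containing HPL.dat.

```python
import os
import subprocess


def hpl_environment(physical_cores, base_env=None):
    """Return a copy of the environment with the OpenMP settings listed above."""
    env = dict(os.environ if base_env is None else base_env)
    env["OMP_PROC_BIND"] = "TRUE"   # pin threads so they do not migrate
    env["OMP_PLACES"] = "cores"     # one place per physical core
    env["OMP_NUM_THREADS"] = str(physical_cores)
    return env


def run_hpl(binary="./xhpl", workdir=".", physical_cores=12):
    """Run HPL (hypothetical binary path) from the directory holding HPL.dat."""
    return subprocess.run(
        [binary],
        cwd=workdir,
        env=hpl_environment(physical_cores),
        check=True,  # raise if the benchmark exits with an error
    )
```

Setting the thread count to the number of physical cores (12 on the 3900X) rather than the number of hardware threads matches the configuration that gave the best result here.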

Now what you have been waiting for …

The following plot shows HPL Linpack results (in GFLOPS) for the Ryzen 3900X and a few other CPUs that I have recently tested.

Even though the Ryzen 3900X is at the bottom of this list, it is a pretty good result! It is essentially the same as the Intel i9 9900K and is the same cost as that CPU. All of the others are much more expensive.

The Intel processors with AVX-512 vector units have a big advantage for Linpack. Also, the Linpack used for the Intel processors is built with the BLAS library from Intel's excellent MKL (Math Kernel Library). The Linpack for the Ryzen 3900X was built with AMD's current v1.3 BLIS library (BLAS), which has not yet been optimized for Zen2. I expect performance for this type of workload to improve for Zen2 processors once better optimized libraries and compiler support are available.
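To put the vector-width advantage in rough numbers, here is a small sketch of the usual theoretical-peak estimate (cores x clock x double precision FLOPs per cycle). The function names are mine. The inputs in the example are assumptions rather than measurements from this post: two 256-bit FMA units per core, and the 3.8 GHz base clock. Sustained clocks under heavy vector load are often different from the nominal value.

```python
def flops_per_cycle(vector_bits, fma_units):
    """Double precision FLOPs per cycle per core for FMA-capable vector units."""
    doubles_per_vector = vector_bits // 64
    return doubles_per_vector * 2 * fma_units  # one FMA counts as 2 FLOPs


def peak_gflops(cores, clock_ghz, vector_bits, fma_units):
    """Theoretical peak in GFLOPS; real Linpack results land below this."""
    return cores * clock_ghz * flops_per_cycle(vector_bits, fma_units)


if __name__ == "__main__":
    # Assumed example: 12 cores, 3.8 GHz, two 256-bit FMA units per core
    print(peak_gflops(12, 3.8, 256, 2))
    # Same core count and clock with two 512-bit FMA units, for comparison
    print(peak_gflops(12, 3.8, 512, 2))
```

With everything else equal, doubling the vector width doubles the theoretical peak, which is why the AVX-512 parts sit higher in a Linpack chart.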

NAMD

Now on to the real world! … sort of … NAMD is one of my favorite programs to use for benchmarking because it has great parallel scaling across cores (and cluster nodes). It does not significantly benefit from linking with the Intel MKL library and it runs on a wide variety of hardware and OS platforms. It's also a very important Molecular Dynamics research program.

When I said "sort of" above, I was referring to the fact that NAMD also has very good GPU acceleration. Adding CUDA-capable GPUs will increase throughput by an order of magnitude. However, with NAMD and other codes like it, only some of the heavy compute can be offloaded to the GPU. A good CPU is necessary to achieve balanced performance. I like NAMD as a CPU benchmark because I believe it is an excellent representative of scientific applications and reflects the performance characteristics of many other programs in this domain.

This plot shows the performance of a molecular dynamics simulation on the million-atom "stmv" (satellite tobacco mosaic virus) system. These job runs are CPU only. Performance is in "day/ns" (days to compute a nanosecond of simulation time), so lower is better.
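Since "day/ns" is an inverse throughput, it can help to see how it relates to wall-clock time per step and to the more intuitive ns/day. The sketch below shows the arithmetic only; the function names are mine and the numbers in the example are made up for illustration, not results from these runs.

```python
SECONDS_PER_DAY = 86400.0
FS_PER_NS = 1.0e6  # femtoseconds in one nanosecond


def days_per_ns(seconds_per_step, timestep_fs):
    """Wall-clock days needed to simulate 1 ns at the given cost per step."""
    steps_per_ns = FS_PER_NS / timestep_fs
    return seconds_per_step * steps_per_ns / SECONDS_PER_DAY


def ns_per_day(days_per_ns_value):
    """Reciprocal view: simulated nanoseconds per wall-clock day."""
    return 1.0 / days_per_ns_value


if __name__ == "__main__":
    # Hypothetical values: 0.0864 s per step with a 1 fs timestep
    d = days_per_ns(0.0864, 1.0)
    print(d, ns_per_day(d))
```

Halving the time per step, or doubling the timestep, halves day/ns, so smaller values in the plot mean a faster system.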

With these results the excellent performance of the Ryzen 3900X is more apparent. It gives significantly better performance than the Intel 9900K and rivals much more expensive processors.

Conclusion (and a couple of caveats)

From the brief time that I spent with the Ryzen 3900X, my impression is that it's a very good processor and an excellent value. I expect that performance for heavy compute applications will improve when the development tools catch up to the new architecture. My guess is that by the end of the year there will be better optimized libraries and full compiler support for Zen2. This will be needed for the next-gen Threadripper and Epyc processors too. It is an exciting platform. I am curious about performance with the new PCIe v4 bus, etc.

Caveats:

I have a few reservations that are keeping me from making a strong recommendation (at this time). The CPU seems great, but my main concern is the platform as a whole. Several of us at Puget Systems have been testing this new hardware and we have encountered bugs, quirks, etc. This is expected for a new platform, but it is still concerning. I think we need to see BIOS updates and perhaps revision updates on motherboards before we are comfortable with stability. There are some reported issues with Linux too, although I did not have any trouble with Ubuntu 18.04.

I will be doing more testing with the 3900X next week so expect another post. I know a lot of people are excited about the new Ryzen Zen2 processors and anxious to see more results … myself included.

It's nice to see AMD back in the game!

Happy computing! –dbk @dbkinghorn