
https://www.pugetsystems.com

Read this article at https://www.pugetsystems.com/guides/1540
Dr Donald Kinghorn (Scientific Computing Advisor)

AMD 3900X (Brief) Compute Performance Linpack and NAMD

Written on July 26, 2019 by Dr Donald Kinghorn

Introduction

I was able to spend a little time with an AMD Ryzen 3900X. Of course, the first thing I wanted to know was the double precision floating point performance. My two favorite applications for a "first look" at a new processor are Linpack and NAMD.

I expect to get some more time with these great new processors from AMD in the coming weeks and will likely do at least one more post on numerical compute performance with the Ryzen 3900X. You can take this post as a (hopefully informative) teaser.

I would like to start by saying my first impressions of the 3900X are quite positive. I didn't have any difficulties during testing. Performance was good and the subjective "feel" of the system was that it was quite "snappy". I had hoped (expected) to get a little better performance with Linpack than what I got. I think this may be partly because the architecture is new and the libraries needed for performance have not yet been optimized for Zen2.

System Configuration

Hardware:

  • AMD Ryzen 3900X
  • Motherboard Gigabyte X570 AORUS ULTRA
  • Memory 4x DDR4-3200 16GB (64GB total)
  • 2TB Intel 660p NVMe M.2
  • NVIDIA 1660 GPU

Software:

Linpack

Notes:

  • gcc9.1 was used for code compilation in order to use the newly added -march=znver2 flag for Zen2 optimizations. However, Zen2 support in gcc9.1 is incomplete, and full Zen2 optimizations are not expected until the release of gcc10. I don't know the status of Zen2 support in LLVM or in AMD's AOCC compiler.
  • AMD BLIS (a.k.a. AMD's BLAS library) has not been updated with specific support for Zen2. I compiled it with gcc9.1 but did not see any performance improvements.
  • I tried several combinations of MPI ranks together with OMP threads. The best results were obtained using only OMP threads with the pre-built multi-threaded (OMP) BLIS binary, without MPI. **1 OMP thread per "real" core, i.e. 12 OMP threads, gave the best result.**
  • I did not test with any other BLAS libraries (OpenBLAS).
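The post doesn't list the exact rank/thread combinations that were tried. As a hypothetical sketch (the function name `hybrid_layouts` is illustrative, not part of HPL or BLIS), the layouts available on 12 physical cores can be enumerated by requiring MPI ranks × OMP threads = 12 and an HPL process grid with P × Q = ranks. The P ≤ Q filter follows the usual HPL tuning advice of a roughly square grid with Q at least as large as P. The 1 rank × 12 threads, P=Q=1 case corresponds to the Ps/Qs values in the HPL.dat file used for this run.

```python
def hybrid_layouts(cores: int):
    """Return (mpi_ranks, omp_threads, P, Q) tuples where
    mpi_ranks * omp_threads == cores and P * Q == mpi_ranks, with P <= Q."""
    layouts = []
    for ranks in range(1, cores + 1):
        if cores % ranks:
            continue  # ranks must divide the physical core count evenly
        threads = cores // ranks
        for p in range(1, ranks + 1):
            q, rem = divmod(ranks, p)
            if rem == 0 and p <= q:  # keep only grids with P <= Q
                layouts.append((ranks, threads, p, q))
    return layouts


# Enumerate candidate layouts for a 12-core Ryzen 3900X (one thread per physical core).
for ranks, threads, p, q in hybrid_layouts(12):
    print(f"{ranks:2d} MPI ranks x {threads:2d} OMP threads, grid P={p} Q={q}")
```

Each line printed is one candidate to try; per the notes above, the single-rank, all-threads layout won on this system.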

I have a detailed description of HPL Linpack, with instructions for building it using AMD BLIS, in the post "How to Run an Optimized HPL Linpack Benchmark on AMD Ryzen Threadripper -- 2990WX 32-core Performance".

  • I used a large problem size, Ns=85200, which takes approximately 90% of the available memory (64GB), in order to maximize performance results.
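A quick check of the problem size: HPL factors a dense N × N matrix of doubles, so it needs roughly 8·N² bytes (plus a small workspace). The sketch below assumes "64GB" means 64×10⁹ bytes. It shows that Ns=85200 uses about 58 GB, roughly 90% of RAM, and that it is an exact multiple of the NB=240 block size in the HPL.dat file. The `suggest_n` helper is hypothetical; rounding down from exactly 90% gives a slightly smaller N than the one used here.

```python
import math

BYTES_PER_DOUBLE = 8


def hpl_matrix_bytes(n: int) -> int:
    """Memory for the N x N double-precision matrix HPL factors (workspace ignored)."""
    return BYTES_PER_DOUBLE * n * n


def suggest_n(total_bytes: float, fraction: float, nb: int) -> int:
    """Largest N whose matrix fits in `fraction` of memory, rounded down to a multiple of NB."""
    n_max = math.isqrt(int(fraction * total_bytes / BYTES_PER_DOUBLE))
    return (n_max // nb) * nb


total = 64e9  # assumption: 64GB counted as decimal bytes
n, nb = 85200, 240  # Ns and NBs from the HPL.dat file
used = hpl_matrix_bytes(n)
print(f"N={n}: {used / 1e9:.1f} GB, {used / total:.1%} of RAM")
print(f"N is a multiple of NB: {n % nb == 0}")
print(f"Suggested N at exactly 90%: {suggest_n(total, 0.90, nb)}")
```

Keeping N a multiple of NB avoids a ragged final block; `math.isqrt` needs Python 3.8 or newer.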

Here is the HPL.dat file used:

HPLinpack benchmark input file
Innovative Computing Laboratory, University of Tennessee
HPL.out      output file name (if any)
6            device out (6=stdout,7=stderr,file)
1            # of problems sizes (N)
85200        Ns
1            # of NBs
240          NBs
0            PMAP process mapping (0=Row-,1=Column-major)
1            # of process grids (P x Q)
1            Ps
1            Qs
16.0         threshold
1            # of panel fact
2            PFACTs (0=left, 1=Crout, 2=Right)
1            # of recursive stopping criterium
4            NBMINs (>= 1)
1            # of panels in recursion
2            NDIVs
1            # of recursive panel fact.
2            RFACTs (0=left, 1=Crout, 2=Right)
1            # of broadcast
2            BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)
1            # of lookahead depth
1            DEPTHs (>=0)
1            SWAP (0=bin-exch,1=long,2=mix)
64           swapping threshold
0            L1 in (0=transposed,1=no-transposed) form
0            U  in (0=transposed,1=no-transposed) form
1            Equilibration (0=no,1=yes)
8            memory alignment in double (> 0)

The following environment variables were set for the Ryzen 3900X Linpack run:

export OMP_PROC_BIND=TRUE
export OMP_PLACES=cores
export OMP_NUM_THREADS=12

Now what you have been waiting for ...

The following plot shows HPL Linpack results (in GFLOPS) for the Ryzen 3900X and a few other CPUs that I have recently tested.

Ryzen 3900X Linpack

Even though the Ryzen 3900X is at the bottom of this list, it is a pretty good result! It essentially matches the Intel i9 9900K, which costs about the same; all of the others are much more expensive.

The Intel processors with AVX-512 vector units have a big advantage for Linpack. Also, the Linpack used for the Intel processors is built with the BLAS library from Intel's excellent MKL (Math Kernel Library). The Linpack build for the Ryzen 3900X used AMD's current v1.3 BLIS library (BLAS), which has not yet been optimized for Zen2. I expect performance for this type of workload to improve for Zen2 processors once better-optimized libraries and compiler support are available.
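A back-of-the-envelope peak calculation helps explain the AVX-512 advantage. Theoretical double-precision peak is cores × clock × DP FLOPs per cycle per core, and each FMA unit delivers (vector width / 64 bits) × 2 FLOPs per cycle. The sketch assumes two FMA units per core in both cases and uses the 3900X's 3.8 GHz base clock as an assumed all-core clock. Sustained clocks under Linpack load will differ (AVX-512 in particular tends to lower clocks), so treat the result as a ceiling rather than a prediction.

```python
def peak_gflops(cores: int, ghz: float, dp_flops_per_cycle: int) -> float:
    """Theoretical double-precision peak: cores x clock (GHz) x DP FLOPs/cycle/core."""
    return cores * ghz * dp_flops_per_cycle


def fma_flops_per_cycle(fma_units: int, vector_bits: int) -> int:
    """DP FLOPs per cycle per core: units x doubles per vector x 2 (an FMA counts as 2 FLOPs)."""
    return fma_units * (vector_bits // 64) * 2


AVX2_2FMA = fma_flops_per_cycle(2, 256)    # 16: two 256-bit FMA pipes (e.g. Zen2)
AVX512_2FMA = fma_flops_per_cycle(2, 512)  # 32: two 512-bit FMA units (assumed Intel part)

# Ryzen 3900X: 12 cores at an assumed 3.8 GHz all-core clock.
print(f"3900X peak: {peak_gflops(12, 3.8, AVX2_2FMA):.1f} GFLOPS")
# Per core and per clock, a 2-FMA AVX-512 core has twice the AVX2 peak.
print(f"AVX-512 / AVX2 per-cycle ratio: {AVX512_2FMA / AVX2_2FMA:.1f}")
```

The 2x per-cycle ratio is the best case; in practice the lower AVX-512 clocks eat into it.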

NAMD

Now on to the real world! ... sort of ... NAMD is one of my favorite programs to use for benchmarking because it has great parallel scaling across cores (and cluster nodes). It does not significantly benefit from linking with the Intel MKL library and it runs on a wide variety of hardware and OS platforms. It's also a very important Molecular Dynamics research program.

When I said "sort of" above, I was referring to the fact that NAMD also has very good GPU acceleration. Adding CUDA-capable GPUs will increase throughput by an order of magnitude. However, with NAMD and other codes like it, only some of the heavy compute can be offloaded to the GPU, so a good CPU is necessary to achieve balanced performance. I like NAMD as a CPU benchmark because I believe it is an excellent representative of scientific applications and reflects the performance characteristics of many other programs in this domain.

This plot shows the performance of a molecular dynamics simulation on the million-atom "stmv" (satellite tobacco mosaic virus) system. These jobs were run on the CPU only. Performance is in "day/ns" (days to compute one nanosecond of simulation time), so lower is better.
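For reference, day/ns follows from the per-step wall time and the simulation timestep: one nanosecond is 10⁶ fs, so with a timestep of Δt fs a run needs 10⁶/Δt steps per simulated ns. NAMD reports this figure itself; the sketch below only shows the arithmetic. The 2 fs timestep and 0.5 s/step values are hypothetical, not the stmv settings or measurements from this post.

```python
SECONDS_PER_DAY = 86_400
FS_PER_NS = 1_000_000


def days_per_ns(seconds_per_step: float, timestep_fs: float) -> float:
    """Wall-clock days needed to simulate 1 ns (lower is better)."""
    steps_per_ns = FS_PER_NS / timestep_fs
    return seconds_per_step * steps_per_ns / SECONDS_PER_DAY


def ns_per_day(days_per_ns_value: float) -> float:
    """Inverse metric: simulated nanoseconds per wall-clock day (higher is better)."""
    return 1.0 / days_per_ns_value


# Hypothetical inputs, not measurements from this post.
d = days_per_ns(seconds_per_step=0.5, timestep_fs=2.0)
print(f"{d:.3f} day/ns  ->  {ns_per_day(d):.3f} ns/day")
```

Halving the time per step halves day/ns, which is why this metric tracks CPU throughput directly.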

NAMD Ryzen 3900X

With these results the excellent performance of the Ryzen 3900X is more apparent. It gives significantly better performance than the Intel 9900K and rivals much more expensive processors.

Conclusion (and a couple of caveats)

From the brief time that I spent with the Ryzen 3900X, my impression is that it's a very good processor and an excellent value. I expect that performance for heavy compute applications will improve when the development tools catch up to the new architecture. My guess is that by the end of the year there will be better-optimized libraries and full compiler support for Zen2. This will be needed for the next-gen Threadripper and Epyc processors too. It is an exciting platform. I am curious about performance with the new PCIe 4.0 bus, etc.

Caveats:

I have a few reservations that are keeping me from making a strong recommendation (at this time). The CPU seems great, but my main concern is the platform as a whole. Several of us at Puget Systems have been testing this new hardware and we have encountered bugs/quirks, etc. This is expected for a new platform, but it is still concerning. I think we need to see BIOS updates and perhaps motherboard revisions before we are comfortable with stability. There are some reported issues with Linux too, although I did not have any trouble with Ubuntu 18.04.

I will be doing more testing with the 3900X next week so expect another post. I know a lot of people are excited about the new Ryzen Zen2 processors and anxious to see more results ... myself included.

It's nice to see AMD back in the game!

Happy computing! --dbk @dbkinghorn



Tags: AMD, HPL, linpack, NAMD, Ryzen
English(sorryForMyBadHi)

Very good article, thanks. I was expecting a bit more Linpack score for the 3900X as well, but still this CPU is unmatched in the value department.

I was thinking that the doubled floating point units + IPC gains (larger caches, better branch predictor, etc.) + higher clocks thanks to TSMC 7nm should allow the 12-core to match or exceed the 32-core 2990WX.

Still hopeful for the 16-core Ryzen to beat the 2990WX.

Posted on 2019-07-27 06:02:45
Donald Kinghorn

Thank you! I had hoped that AMD would have used AVX512 on Zen2 ... not yet, maybe on the next one. Intel has performance nailed with MKL + AVX512

I'm looking forward to testing on the 16 core ... and especially on the new TR!

Posted on 2019-09-06 16:31:50

As always, thank you for the review, Dr. Kinghorn. I especially appreciate and value your recommendation note regarding bugs and quirks. With that, my heart goes out to the brave early adopters.

Posted on 2019-07-28 23:07:29
Donald Kinghorn

Thanks Kacey, sorry I'm late with replies ... There definitely have been bumps in the road on the early platform.

Posted on 2019-09-06 16:28:24
Methylzero

FYI, if you do decide to test OpenBLAS, make sure to grab the latest git revision. As of now, several new assembly optimizations for Zen/Zen2 have been merged very recently. Single core perf is looking quite nice, parallel scaling less so, but that is sometimes problematic even on Intel.

Posted on 2019-07-31 18:56:02
Donald Kinghorn

Thank you ... I will do that ... I want to include OpenBLAS. I hope they get the AVX512 code working again too!

Posted on 2019-09-06 16:25:07
Reuven Meir

Hi - very nice! It would be VERY helpful if you could add HPCG to your benchmarking as it is a memory-bound code whose underlying algorithms are used widely in the physical sciences. (http://www.hpcg-benchmark.o.... Often CPUs with great FLOP performance can have very low memory-bound algorithm performance and vice-versa.

Posted on 2019-08-01 20:56:44
Donald Kinghorn

HPCG would be nice! I've wanted to do it for some time ... should probably get on with that :-) I agree with you

Posted on 2019-09-06 16:22:42
solaric

Zen2 server (Rome) is out and wow, what a show by AMD. I sure hope Puget will be taking a hard look at creating some EPYC options for Genesis/Peak/Summit, because I don't think it'll be possible to justify going with Intel for a high end system until at least Ice Lake sometime in 2020 (if they hit that), and even then only if they cut it out with some of their BS about fusing off features for artificial segmentation.

Posted on 2019-08-08 17:46:06

Hopefully availability and choice are better with the new EPYC platforms. We had a really, really hard time getting our hands on anything that wasn't geared towards pure server or storage rackmounts. There was apparently no interest from motherboard manufacturers in really investing in EPYC for workstations; I guess they view Threadripper as the platform for that kind of thing? It looks like things have gotten better, but there is still a really big lack of choice out there. Having 2 USB ports is fine on a server, but on a workstation?

Hopefully the performance of these new chips prompts some better options from Gigabyte, Asus, Supermicro, etc. The CPU performance looks amazing, but the boards are terrible...

Posted on 2019-08-08 18:06:27
Jeff

Looks like AMD BLIS 2.0 is out. It may be worthwhile to re-run some of these tests as mentioned in the article.

Posted on 2019-08-09 03:50:02
Donald Kinghorn

I will definitely be redoing this! Really hoping for early samples of the new TR :-) I will do the work to get the best performance I can out of these.

System adoption looks promising, but we have to get a better overall platform together before we can commit seriously to it. (Like Matt said, the CPUs look great, but we need a really solid platform...) I expect that to happen.

Posted on 2019-09-06 16:22:39
Tim Artz

I do software development on FEA programs that use the MKL library for solving. I am hoping to get the 3950x and around that price is what I'm looking at for my next compute machine. Do you have any recommendations or data on what might make the best processor in that price range?

Posted on 2019-09-21 00:43:11
Donald Kinghorn

With 16 cores the 3950X could be formidable. For what you are doing, Intel Core-X with AVX512 will be hard to beat though. MKL can be a significant advantage on Intel ... but for the money, $750!, that 16-core 3950X could win out on cost per core. AVX2-256 actually runs at a higher clock than AVX512 too ... (AVX512 clocks down the cores when it's active, so it's not twice as fast as AVX2-256 even in the best case.) It's likely that your code will scale well on 16 cores ....

It's hard to directly answer your question! It will depend to a large extent on what Intel does with pricing on the new Core-X line coming up. They may be very aggressive with price cuts to fend off AMD. A new 12 core Core-X could do better than 16 core 3950x for this kind of work ?? (not sure) I just don't know what 10th gen core-X price and performance will be like ...yet ...

Posted on 2019-09-23 16:17:49
Tim Artz

That's about in line with what I was expecting, with some added great info. Thanks! I'll have to wait and see how the new pricing shakes out with some numbers. I appreciate your work on this!

Posted on 2019-09-23 16:26:08