Introduction
This post presents scientific application performance testing on the new AMD Ryzen 7950X. I am impressed! Seven applications that are heavily parallel numerical compute workloads were tested. The 7950X outperformed the Ryzen 5950X by 25-40%. For some of the applications it provided nearly 50% of the performance of the much larger and more expensive Threadripper Pro 5995WX 64-core processor. That's remarkable for a $700 CPU! The Ryzen 7950X is not in the same platform class as the Threadripper Pro, but it is a respectable, budget-friendly numerical computing processor.
Note:
The Ryzen 7950X uses Zen4 cores, while the 5950X and Threadripper Pro 5995WX use Zen3. I created docker containers with applications compiled from source and optimized for Zen3 using AMD compilers and libraries. The currently available AMD compilers (and gcc) do not have Zen4-specific optimization paths. This is significant because the Ryzen 7000 CPUs have AVX512 vector units that were not exploited by the builds used in this testing. That means Ryzen 7000 performance should only get better once compilers with full Zen4 support are available. I will be watching for the new compiler releases and will re-optimize the applications and write up the results.
The motherboard used during this testing had "auto-overclocking" features enabled by default.
In other testing we found that this greatly increases CPU temperatures at runtime. For more information, see: AMD Ryzen 7950X: Impact of Precision Boost Overdrive (PBO) on Thermals and Content Creation Performance
I redid the testing with these features disabled. About half of the results were unaffected and a few were degraded; the worst was approx. 9% slower.
Results were mixed: some were actually faster and some slower without the auto-overclock. Each entry is listed as old result (auto-overclock on) to new result (auto-overclock off).
- HPL 972 to 928 GFLOP/s (approx. 5% slower)
- HPCG 7.43 to 7.43 GFLOP/s (unchanged)
- NAMD apoa1 0.166 to 0.158, f1atpase 0.526 to 0.505, stmv 2.29 to 2.18 day/ns (approx. 4-5% faster; lower is better)
- LAMMPS rhodo 2.287 to 2.115, Cu_u3 1.443 to 1.333, lj 0.968 to 0.946 step/s (2-8% slower)
- NWChem 1245 to 1359 sec (approx. 9% slower)
- OpenFOAM 126 to 124 sec (slightly faster; lower is better)
- WRF 25 to 24 min (slightly faster; lower is better)
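The percentages in the list above can be re-derived from the raw numbers. The following is a minimal sketch; the `percent_change` helper and the `higher_is_better` flag are illustrative names and not part of the benchmark code. The sign convention is that a positive value means the run with auto-overclocking disabled performed better.

```python
# (benchmark, old result, new result, higher_is_better), values copied from the list above.
RESULTS = [
    ("HPL GFLOP/s", 972.0, 928.0, True),
    ("HPCG GFLOP/s", 7.43, 7.43, True),
    ("NAMD apoa1 day/ns", 0.166, 0.158, False),
    ("NAMD f1atpase day/ns", 0.526, 0.505, False),
    ("NAMD stmv day/ns", 2.29, 2.18, False),
    ("LAMMPS rhodo step/s", 2.287, 2.115, True),
    ("LAMMPS Cu_u3 step/s", 1.443, 1.333, True),
    ("LAMMPS lj step/s", 0.968, 0.946, True),
    ("NWChem sec", 1245.0, 1359.0, False),
    ("OpenFOAM sec", 126.0, 124.0, False),
    ("WRF min", 25.0, 24.0, False),
]


def percent_change(old, new, higher_is_better):
    """Return the performance change in percent.

    Positive means the new run (auto-overclock off) performed better,
    negative means it performed worse.
    """
    raw = (new - old) / old * 100.0
    # For time-like metrics (lower is better) a decrease is an improvement.
    return raw if higher_is_better else -raw


for name, old, new, higher_is_better in RESULTS:
    change = percent_change(old, new, higher_is_better)
    print(f"{name:24s} {old:>9} -> {new:>9}  {change:+.1f}%")
```

Treating the direction of "better" explicitly avoids mixing up rate metrics (GFLOP/s, step/s) with time metrics (sec, min, day/ns).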
Test Systems
AMD Ryzen 7950X Test Platform
- CPU AMD Ryzen 9 7950X 16 Core
- CPU Cooler Noctua NH-U12A
- Motherboard Gigabyte X670E AORUS MASTER
- RAM 2x DDR5-4800 32GB (64GB total)
AMD Ryzen 5950X Test Platform
- CPU AMD Ryzen 9 5950X 16 Core
- CPU Cooler Noctua NH-U12A
- Motherboard Gigabyte X570 AORUS ULTRA
- RAM 4x DDR4-3200 16GB (64GB total)
AMD Threadripper Pro Test Platform
- CPU AMD Threadripper PRO 5995WX 64 Core
- CPU Cooler Noctua NH-U14S TR4-SP3
- Motherboard Asus Pro WS WRX80E-SAGE SE WIFI
- RAM 8x DDR4-3200 16GB ECC Reg. (128GB total)
Shared Hardware and Software
- Video Card NVIDIA GeForce RTX 3080 10GB
- Hard Drive Seagate Firecuda 530 4TB Gen4 M.2 SSD
- Ubuntu 22.04 Linux
- Docker 20.10.12
- Spack 0.19.0.dev0
- AMD AOCC 3.2.0 Compiler
- AMD AOCL 3.2 Numerical libraries
Containerized Applications (recompiled and optimized)
- HPL 2.3 High Performance Linpack
- HPCG 3.1 High Performance Conjugate Gradient solver
- NAMD 2.14 Molecular Dynamics
- LAMMPS 20220623 Molecular Dynamics
- NWChem 7.0 Quantum Chemistry
- OpenFOAM 2012 Computational Fluid Dynamics
- WRF 4.3 Weather Simulation
Application Containers Optimized for AMD
This testing is the first use of a (large) project I'm working on: optimized application containers targeted to specific hardware platforms (AMD, Intel, and NVIDIA GPU). The applications used in this post are built from source using AMD AOCC compilers and AOCL libraries. I'm using a package build tool called "spack" to manage the build process, along with templates for multi-stage Dockerfiles. These containers will be available to the public on Docker Hub in the near future. All containers will have full application builds and include easy-to-use benchmark code. Full specifications for the container builds will be hosted on GitHub. This is a long-term project. I plan to maintain the container builds and automate the process as much as possible. In a few weeks I'll be writing posts on how this is being done.
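To give a rough idea of what a templated multi-stage Dockerfile could look like, here is a minimal sketch. It is not the actual template from this project. The image names, the install prefix, and the `render_dockerfile` helper are hypothetical placeholders, and the example spec is assembled from the package and compiler versions listed in this post.

```python
from string import Template

# Hypothetical two-stage layout: build with Spack in the first stage,
# then copy only the installed software into a smaller runtime image.
DOCKERFILE_TEMPLATE = Template("""\
# Stage 1: build the application from source with Spack
FROM ${builder_image} AS builder
RUN spack install ${spec}

# Stage 2: copy the installed software into the runtime image
FROM ${runtime_image}
COPY --from=builder ${install_prefix} ${install_prefix}
""")


def render_dockerfile(spec, builder_image, runtime_image, install_prefix):
    """Fill in the template; every argument here is an illustrative assumption."""
    return DOCKERFILE_TEMPLATE.substitute(
        spec=spec,
        builder_image=builder_image,
        runtime_image=runtime_image,
        install_prefix=install_prefix,
    )


dockerfile_text = render_dockerfile(
    spec="hpl@2.3 +openmp target=zen3 %aocc@3.2.0",
    builder_image="example/spack-builder",  # placeholder image name
    runtime_image="example/runtime-base",   # placeholder image name
    install_prefix="/opt/software",         # placeholder install location
)
print(dockerfile_text)
```

The point of the multi-stage split is that compilers and build dependencies stay in the builder stage, so the final image only carries what is needed to run the application.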
Benchmarks
The following charts show the excellent numerical performance of the Ryzen 7950X in relation to the previous generation 5950X and current high-end Threadripper Pro 5995WX 64-core.
For each package, the Spack spec used in the application build is listed, followed by the job execution line from the benchmark code.
Note on GROMACS: I would have included GROMACS benchmarks in this post, but I had a mistake in my testing code. GROMACS writes its result output to stderr rather than stdout. I only captured stdout, so there were no results, and I didn't catch the problem until I was analyzing the performance data.
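One way to avoid losing output like this is to capture stderr together with stdout when launching a benchmark. The sketch below uses Python's standard `subprocess` module. The `run_and_capture` helper is an illustrative name and not part of the actual testing code, and the demo command is a stand-in child process that writes to stderr, not a real GROMACS run.

```python
import subprocess
import sys


def run_and_capture(cmd):
    """Run cmd and return (returncode, combined stdout + stderr text)."""
    proc = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into the captured stdout stream
        text=True,
        check=False,
    )
    return proc.returncode, proc.stdout


# Stand-in for a benchmark that reports its result on stderr.
demo_cmd = [
    sys.executable,
    "-c",
    "import sys; print('result written to stderr', file=sys.stderr)",
]
code, output = run_and_capture(demo_cmd)
print(code, output.strip())
```

Merging the two streams keeps the log in the order the program wrote it. If the streams need to be kept apart, passing `stderr=subprocess.PIPE` instead returns them separately as `proc.stdout` and `proc.stderr`.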
HPL (Linpack)
specs: [hpl@2.3%aocc@3.2.0+openmp ^[email protected] threads=openmp arch=linux-None-zen3]
HPCG
specs: [hpcg@3.1%aocc@3.2.0+openmp arch=linux-None-zen3 ^[email protected]]
mpirun --allow-run-as-root -np ${NUM_CORES} --map-by l3cache --mca btl self,vader -x OMP_NUM_THREADS=1 xhpcg
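The job lines in this section depend on a `${NUM_CORES}` variable. The post does not say how it is set, so the following is only a sketch under the assumption that it means the number of physical cores. It counts unique (socket, core) pairs in text of the form produced by `lscpu -p=CORE,SOCKET` and expands the HPCG launch line. The helper names and the sample text are made up for illustration.

```python
import shlex


def count_physical_cores(lscpu_parse_output):
    """Count unique (socket, core) pairs in `lscpu -p=CORE,SOCKET` style text."""
    cores = set()
    for line in lscpu_parse_output.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the comment header
        core, socket = line.split(",")[:2]
        cores.add((socket, core))
    return len(cores)


def hpcg_command(num_cores):
    """Expand the HPCG job line shown above for a given core count."""
    return [
        "mpirun", "--allow-run-as-root", "-np", str(num_cores),
        "--map-by", "l3cache", "--mca", "btl", "self,vader",
        "-x", "OMP_NUM_THREADS=1", "xhpcg",
    ]


# Made-up sample: 2 physical cores, each exposing 2 hardware threads.
SAMPLE = """# CORE,SOCKET
0,0
1,0
0,0
1,0
"""

num_cores = count_physical_cores(SAMPLE)
print(num_cores)
print(shlex.join(hpcg_command(16)))
```

Counting (socket, core) pairs rather than logical CPUs keeps SMT hardware threads from inflating the MPI rank count.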
NAMD
specs: [namd@2.14%aocc@3.2.0 fftw=amdfftw arch=linux-None-zen3 ^[email protected]]
namd2 +p${NUM_CORES} +setcpuaffinity +idlepoll ${JOB}/${JOB}.namd
LAMMPS
specs: [lammps@20220623%aocc@3.2.0+asphere+class2+granular~kim+kspace+manybody+molecule+mpiio+openmp+openmp-package+opt+replica+rigid build_type=Release arch=linux-None-zen3 ^[email protected] ^openmpi fabrics=auto]
mpirun --allow-run-as-root -np ${NUM_CORES} --oversubscribe --use-hwthread-cpus --map-by hwthread --bind-to core lmp -var x ${SCALE} -var y ${SCALE} -var z ${SCALE} -sf omp -in in.${JOB}
NWChem
specs: [[email protected]%[email protected]+mpipr+openmp arch=linux-None-zen3 ^[email protected] threads=openmp ^[email protected] ^[email protected] ^[email protected] ^openmpi]
mpirun -np ${NUM_CORES} --map-by l3cache -x KMP_WARNINGS=0 -x OMP_NUM_THREADS=1 -x OMP_STACKSIZE="32M" nwchem ./c240_631gs.nw
OpenFOAM
specs: [openfoam@2012%aocc@3.2.0 arch=linux-None-zen3 ^[email protected] ^openmpi]
mpirun --allow-run-as-root -np ${NUM_CORES} --map-by core simpleFoam -parallel
WRF
specs: [[email protected]%[email protected] build_type=dm+sm arch=linux-None-zen3 ^hdf5+fortran ^jemalloc ^netcdf-c ^netcdf-fortran ^openmpi+cxx]
OMP_NUM_THREADS=${NUM_OMP} mpirun -np ${NUM_MPI} --allow-run-as-root --map-by ppr:4:l3cache $wrf_exe
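The `--map-by ppr:4:l3cache` option in the WRF line places four MPI ranks on each L3 cache domain, so `NUM_MPI` and `NUM_OMP` have to match the CPU's cache topology. The post does not list the values that were used. The sketch below shows one consistent choice, assuming all physical cores are used with one OpenMP thread per core and one L3 cache per 8-core CCD (2 on the 16-core Ryzen parts, 8 on the 5995WX). The `hybrid_layout` helper is an illustrative name.

```python
def hybrid_layout(total_cores, l3_caches, ranks_per_l3=4):
    """Return (num_mpi, num_omp) for a ppr:<ranks_per_l3>:l3cache mapping.

    Assumes every physical core runs exactly one OpenMP thread.
    """
    num_mpi = ranks_per_l3 * l3_caches
    if total_cores % num_mpi != 0:
        raise ValueError("cores do not divide evenly across MPI ranks")
    num_omp = total_cores // num_mpi
    return num_mpi, num_omp


# Assumed topologies: one L3 cache per 8-core CCD.
for name, cores, l3 in [("Ryzen 7950X", 16, 2), ("Threadripper Pro 5995WX", 64, 8)]:
    num_mpi, num_omp = hybrid_layout(cores, l3)
    print(f"{name}: NUM_MPI={num_mpi} NUM_OMP={num_omp}")
```

With these assumptions, each MPI rank's threads share an L3 cache, which is the usual reason for mapping ranks by cache domain.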
Conclusions
- Numerical computing performance of the Ryzen 7950X, coupled with fast DDR5 memory, is 25-40% above its predecessor, the 5950X. It even makes a good showing against the current top-of-the-line Threadripper Pro 5995WX 64-core processor.
- The Ryzen 7950X is the first CPU with AMD's new Zen4 architecture. That architecture includes AVX512 vector units that were not exploited in this testing. Recompiling the applications with compilers that support Zen4 should only improve the results presented here, maybe significantly.
- The performance-optimized container project I'm working on proved its usefulness here, which was quite satisfying. Compiling the tested applications from source with AMD-specific optimization using AMD AOCC and AOCL gave a nice performance improvement over "standard" builds of these applications. I didn't measure that improvement in this post but will write more about it in future posts.
Happy computing! –dbk @dbkinghorn