Read this article at https://www.pugetsystems.com/guides/1666
Dr Donald Kinghorn (Scientific Computing Advisor )

AMD Threadripper 3990x 64-core Linpack and NAMD Performance (Linux)

Written on February 7, 2020 by Dr Donald Kinghorn


64 cores! The latest AMD Threadripper is out, the 3990x 64-core. I've spent the last couple of days running benchmarks and have some results showing raw numerical compute performance using my standard CPU testing applications, HPL Linpack and the molecular dynamics program NAMD. The 3990x is a great processor. However, there were difficulties and some disappointments during the testing.

It is nice having AMD making exceptionally great processors again. This 64-core Threadripper 3990x is the pinnacle of the "consumer" Zen2 core processors. (EPYC Rome is the server line based on Zen2 core)


Version 2.0 of the AMD "BLIS" library was used which gives very good performance with Linpack. I did have some scaling anomalies with the 3990x that I have not resolved yet (but still achieved very good results).

This post revisits the recent Ryzen and Threadripper posts and adds in new results for the Threadripper 3990x. I'm including NAMD Molecular Dynamics results for my usual test molecule, STMV, as well as a smaller molecular system, ApoA1. ApoA1 seems to be a popular system for benchmarking on CPU with NAMD. GPU acceleration results are reported for the STMV and ApoA1 job runs.

Other recent posts related to this testing are: "AMD Threadripper 3970x Compute Performance Linpack and NAMD", "AMD Ryzen 3950x Compute Performance Linpack and NAMD" and "AMD 3900X (Brief) Compute Performance Linpack and NAMD".


I'll start with some of the problems I encountered during the testing that prompted me to remark that these "results are preliminary". This should temper the results that follow. There is room for improvement!

Install issues

  • Ubuntu with any kernel newer than 5.0.0 would hang during install (on the hardware I was using).
  • Ubuntu 18.04 with HWE kernel would boot but would hang after update
  • Ubuntu 19.10 would hang during install
  • I had to drop back to Ubuntu 18.04 with 4.15 kernel for a stable install. That is too old to be fully "Zen2 aware".

I expect this to be a better platform using the final release of Ubuntu 20.04 in April.

HPL Linpack anomalies

  • HPL Linpack did not achieve expected performance based on comparison with 32-core 3970x
  • Performance was better with 3990x than 3970x but I could drop 16 cores from the 3990x with only minimally lower performance.

I did experiments with OpenMP threads alone and with hybrid parallelism combining OpenMP threads and MPI ranks. Results are approximately 25% lower than expected.

NAMD performance was very good but I can't help but think results could be better based on what I saw with Linpack.

System Configuration


(see the posts linked in the Introduction for older test configurations)

  • AMD Threadripper 3990x
  • Motherboard Gigabyte TRX40 AORUS
  • Memory 8x DDR4-2933 16GB (128GB total)
  • 1TB Samsung 960 EVO M.2
  • 2 x NVIDIA RTX Titan GPU's



  • The Ryzen 3900x and 3950x worked well on Ubuntu 19.10. Both the Threadripper 3970x and 3990x required dropping back to 18.04.

  • New results in this post are for Threadripper 3990x only. The other results are from previous testing.



  • I'm using the same HPL binary that was used for testing the 3970x, i.e. the pre-built multi-threaded HPL binary provided by AMD. This is the "MT" build but it still looks for MPI header files on start-up and uses the HPL.dat file for job run configuration. This is why an OpenMPI install is needed to run this benchmark.
  • AMD BLIS (a.k.a. AMD's BLAS library) version 2.0 with specific support for Zen2 was used.
  • Several combinations with MPI ranks together with OMP threads were tried. The best results obtained were using only OMP threads and the pre-built binary without MPI. 1 OMP thread per "real" core gave the best result. (WITH SMT DISABLED IN THE BIOS)

  • There is a detailed description of HPL Linpack testing for Threadripper 2990WX in the post "How to Run an Optimized HPL Linpack Benchmark on AMD Ryzen Threadripper -- 2990WX 32-core Performance". The 2990WX testing in this post and the result presented could probably be improved with the new BLIS lib.
  • The Intel CPU's were tested with the (highly) optimized Linpack benchmark program included with Intel MKL performance library.
  • A large problem size approx. 90% of available memory (128GB) was used in order to maximize performance results, Ns=116000.
T/V                N    NB     P     Q               Time                 Gflops
WR12R2R4      116000  1024     1     1             662.21             1.5714e+03
HPL_pdgesv() start time Thu Feb  6 10:50:10 2020

HPL_pdgesv() end time   Thu Feb  6 11:01:12 2020

||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)=   3.80355268e-03 ...... PASSED
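The Gflops figure in the output above can be cross-checked from N and the run time. The following is a minimal sketch, assuming the standard HPL operation count of 2/3·N³ + 3/2·N² floating point operations; the function names are illustrative and not part of HPL. It also reports how much memory the N x N double-precision matrix itself occupies.

```python
def hpl_flops(n: int) -> float:
    """Floating point operations HPL credits for an n x n solve (assumed standard count)."""
    return (2.0 / 3.0) * n**3 + 1.5 * n**2


def hpl_gflops(n: int, seconds: float) -> float:
    """Sustained GFLOPS for a run of problem size n that took `seconds`."""
    return hpl_flops(n) / seconds / 1e9


def matrix_gb(n: int) -> float:
    """Memory for the n x n matrix of 8-byte doubles, in decimal GB."""
    return 8 * n * n / 1e9


# Values taken from the HPL output above.
n, seconds = 116000, 662.21
print(f"{hpl_gflops(n, seconds):.1f} GFLOPS")
print(f"{matrix_gb(n):.1f} GB for the matrix")
```

With N=116000 and 662.21 seconds this gives roughly 1571 GFLOPS, in agreement with the reported 1.5714e+03, and a matrix of about 108 GB.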

Here is the HPL.dat file used. [This file automates using 3 problem sizes (Ns) and 5 block sizes (NBs). Also note that P and Q are set to 1, i.e. 1 MPI rank; parallelism was from OMP threads.]

HPLinpack benchmark input file
Innovative Computing Laboratory, University of Tennessee
HPL.out      output file name (if any)
6            device out (6=stdout,7=stderr,file)
3            # of problems sizes (N)
112000 114000 116000 Ns
5            # of NBs
512 640 768 896 1024  NBs
0            PMAP process mapping (0=Row-,1=Column-major)
1            # of process grids (P x Q)
1            Ps
1            Qs
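To see how many benchmark runs an input file like this produces, the combinations can be enumerated. This is a small sketch, assuming each listed N is paired with each listed NB on each process grid; the iteration order shown is only illustrative and may not match the order HPL itself uses.

```python
from itertools import product

# Values copied from the HPL.dat listing above.
ns = [112000, 114000, 116000]        # problem sizes (Ns)
nbs = [512, 640, 768, 896, 1024]     # block sizes (NBs)
grids = [(1, 1)]                     # process grids (P, Q): a single MPI rank

# Every (N, NB, grid) combination is one benchmark run.
runs = list(product(ns, nbs, grids))

print(f"{len(runs)} runs in total")
for n, nb, (p, q) in runs:
    print(f"N={n:6d}  NB={nb:4d}  P={p}  Q={q}")
```

Since each run at these sizes takes on the order of ten minutes (the N=116000 run above took 662 seconds), the full sweep is a multi-hour job.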

The following environment variables were set for the Ryzen Linpack runs

export OMP_PLACES=cores
export OMP_NUM_THREADS=32   # 16 for 3950x ...
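The same settings can be prepared from a script when testing several CPUs. This is only a sketch: the core counts are the physical core counts of each processor, the choice of one thread per physical core follows the "1 OMP thread per real core" note above (the 64-thread value for the 3990x is an assumption, not a setting quoted in this post), and the `./xhpl` binary name in the comment is hypothetical.

```python
import os

# Physical core counts per processor (SMT threads not counted).
PHYSICAL_CORES = {"3900x": 12, "3950x": 16, "3970x": 32, "3990x": 64}


def hpl_env(cpu: str) -> dict:
    """Return a copy of the environment with the OpenMP settings for `cpu`."""
    env = dict(os.environ)
    env["OMP_PLACES"] = "cores"                         # pin threads to cores
    env["OMP_NUM_THREADS"] = str(PHYSICAL_CORES[cpu])   # one thread per physical core
    return env


env = hpl_env("3990x")
print(env["OMP_PLACES"], env["OMP_NUM_THREADS"])

# Hypothetical launch of the benchmark with these settings:
# import subprocess
# subprocess.run(["./xhpl"], env=env, check=True)
```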

The AMD Threadripper 3990x results are not as high as expected and can likely be improved.

The following plot shows HPL Linpack results (in GFLOPS).

TR3990X Linpack

The TR3990x results are impressive for a processor with AVX2 (rather than AVX512) but I expect that these results could be better (see the "issues" section). I will repeat this testing after Ubuntu 20.04 is released.

The Intel processors with AVX-512 vector units have an advantage for Linpack. Also, the Linpack used for the Intel processors is built with the BLAS library from Intel's excellent MKL (Math Kernel Library).


NAMD is one of my favorite programs to use for benchmarking because it has great parallel scaling across cores (and cluster nodes). It does not significantly benefit from linking with the Intel MKL library and it runs on a wide variety of hardware and OS platforms. It's also a very important Molecular Dynamics research program.

NAMD also has very good GPU acceleration. Adding CUDA capable GPU's will increase throughput significantly. However, with NAMD and other codes like it, only a portion of the heavy compute can be offloaded to GPU. A good CPU is necessary to achieve balanced performance.

This plot shows the performance of a molecular dynamics simulation on the million atom "stmv" (satellite tobacco mosaic virus). These job runs are with CPU and with 1 or 2 RTX Titan GPU's added. Performance is in "day/ns" (days to compute a nanosecond of simulation time). This is the standard output for NAMD. If you prefer ns/day then just take the reciprocal.
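For readers who prefer ns/day, the conversion is a one-liner. A minimal sketch follows; the function name is illustrative, and the sample values are the ApoA1 GPU figures quoted further down in this post.

```python
def ns_per_day(days_per_ns: float) -> float:
    """Convert NAMD's day/ns figure to ns/day by taking the reciprocal."""
    if days_per_ns <= 0:
        raise ValueError("day/ns must be positive")
    return 1.0 / days_per_ns


# ApoA1 results with GPUs, as reported later in the post (day/ns).
for label, value in [("1 x RTX Titan", 0.031), ("2 x RTX Titan", 0.020)]:
    print(f"{label}: {value} day/ns = {ns_per_day(value):.1f} ns/day")
```

Keep in mind that lower is better for day/ns while higher is better for ns/day.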


The Threadripper 3990x gave excellent performance for NAMD. Results are exceptionally good for CPU alone and with added GPU's. These are the best results I have ever obtained for these job runs.

This last set of results is using the smaller ApoA1 problem (it's still pretty big with 92000 atoms!). These results are CPU only.

*Results with added RTX Titan are 0.031 day/ns and 2 x RTX Titan 0.020 day/ns. These results are similar to those with the 3970x and 1 or 2 2080Ti GPU's. The job runs are so fast that there is little difference because of communication dominating the calculation time.

TR 3990x  NAMD ApoA1

I expected the TR3990x 64-core CPU together with 2-4 high-end NVIDIA GPU's to "set the bar" for performance as a workstation platform for this class of applications. I believe this is indeed the case!


The AMD Threadripper 3990x is a milestone in computing. A 64-core desktop workstation processor was unimaginable a few years ago. This is definitely a "specialty" processor. This is a processor for large parallel computing problems that have excellent scalability. It will be a very compelling Scientific Workstation processor.

Happy computing! --dbk @dbkinghorn


Tags: AMD, HPL, linpack, NAMD, Threadripper

Hi Donald, very nice review. One question: did you notice any application being limited by memory bandwidth? I always see people complaining about this, but TR seems to be holding up very well against 6-channel Xeons, for example.

Posted on 2020-02-07 18:09:54
Donald Kinghorn

I did some testing with the HPCG benchmark during the last couple of hours I had access to the sys (hopefully I get access again soon)

The results were not wonderful but a colleague that has experience with that benchmark assured that the results were not unexpected and not too bad...

I did tweet those results on Friday (7th). HPCG makes heavy demands on the memory subsystem ... (I think you may have seen these :-) ... I'm late responding to your comment here) I will probably write this stuff up but maybe not until I get my hands on the system again.

with nx=ny=nz=104 (large enough to not run in cache)
GB/s Summary::Total with convergence and optimization phase overhead=70.3232
Final Summary::HPCG result is VALID with a GFLOP/s rating of=9.27327

(dual Xeon Scalable 16-core does about 4 times that. I'd like to do more optimization work on HPCG for Zen2! my result above was pretty much just a "reference" run)

Posted on 2020-02-10 20:08:23

Many tks for the answer Donald, I had seen your tweet.

Posted on 2020-02-11 01:34:05

How did you cool this beast? Assuming it will run above 80% utilization for a week (a typical deep learning training run)

Posted on 2020-02-09 16:48:56

A Noctua NH-U14S TR4-SP3 cooler should be OK, probably, with a high airflow case. As long as you run it stock, no OC on the cores.

Posted on 2020-02-10 12:12:19

Thanks, but I'm looking for actual experience rather than "should be", and "probably". In my case, there will also be four Quadro 8000 cards (blower type).

Posted on 2020-02-10 15:48:32

The system Don was testing on used a Noctua NH-U12S TR4-SP3. We had been using that cooler on previous Threadripper 3rd Gen processors up here in Labs, and continued to use it for our initial round of testing in the last couple of weeks. In our open-air test bed systems it was sufficient to keep the 3990X from thermally throttling, but since Don's testing was going to be putting it under extended load for longer periods than most of our other benchmarks I did throw a second fan on (in a push-pull configuration).

Our hardware qualification department found that, when inside one of our normal chassis, the U12S was borderline with a single fan. It was sufficient in both single-threaded and fully-threaded situations, though the temps were quite high, but with the right combination of workloads it could actually have *slight* thermal performance degradation when some (but not all) of the cores were active and running at a higher clock speed. In such situations, adding a second fan resolved the throttling but still left higher temperatures than they liked... but moving to the U14S dropped the temps by several more degrees Celsius, into a comfortable range. Because of that, we are going to be using the U14S going forward on our systems.

HOWEVER, it may not be a good choice for your specific situation. I say that because that heatsink is so wide that it can block the top PCI-Express slot on many motherboards. It doesn't on the particular board we are using, but that board only supports three full-size GPUs. Since you mentioned using four Quadro RTX 8000 video cards, I assume your board will have a different slot layout than ours - and there is a very high chance that the U14S would then block one of the slots and prevent you from having the video card configuration you want.

In your situation, the U12S with dual fans could be an option - if your chassis has sufficient airflow, which you will want since you are going to have four video cards - or else a nice AIO liquid cooler. Selecting one of those is highly chassis dependent, though, since you need to consider where you can mount the radiator and how that will impact the airflow within the system. Good luck :)

Posted on 2020-02-10 17:46:38

Thank you for the detailed response! So if you decide to sell a TR 3970X + 4xQuadro 8000 workstation (btw do you plan to sell them?), which combination of motherboard and cooler would you use? There are currently just two boards (Gigabyte Aorus Xtreme and ASRock Creator) that support quad GPU systems with TR3, and only 2-3 AIO coolers with a proper TR base plate (Thermaltake Floe Triple Riing TR4 Edition seems to have the best ratings). As far as cases, there's Corsair Carbide 540 case (not my favorite design tbh), and some nice high airflow cases from Fractal Design (Meshify S2 looks good, but might not fit the XL-ATX Gigabyte board, need to check). Any advice?

Posted on 2020-02-10 18:34:17

Wait... are you going with the 3990X or 3970X? If you are doing the 3970X (per your last comment) then I would go with the U12S... and probably toss a second fan on, just to give you some extra headroom. I personally *far* prefer heatsinks over AIOs, due to a number of factors (less expensive, easier to install, fewer points of failure, no risk of catastrophic failure (leak), etc).

If you are going for the 3990X, then I don't have any specific advice as I don't have much recent experience with the larger AIOs :/

As for which motherboard... I guess the Gigabyte Aorus Xtreme? We used that in our early 3970X and 3960X testing, but it is physically too large for most of our current cases. I haven't done a good job of keeping up with chassis options outside of those we carry, but if Fractal Design has something large enough for that motherboard and with good airflow / fan layout (keeping in mind the potential mounting needs of an AIO) then that is probably the way to go. I've heard good things about Corsair's cases as well, but never used one personally.

In terms of what we actually sell, currently it is limited to three GPUs on the Threadripper platform, which conveniently allows us to use the U14S for cooling. We are still looking into qualifying a quad-GPU capable motherboard + chassis + cooling combination, but I don't know if we will end up finding a setup that passes our qualification process or not :(

Posted on 2020-02-10 18:42:55

I haven't decided yet on 3990X vs 3970X - 32 higher frequency cores might actually be better to feed four GPUs than 64 slower ones. And if this is true, then 3970X would have higher utilization (and therefore require better cooling) than 3990X.

My main concern is having four GPUs in the case - without them I'd not hesitate to go with an air cooler, but I'm afraid the GPUs will heat up the inside of the case quite a bit, and then the air cooler will struggle. Can you please point me to your build with 3 GPUs on TR platform?

Posted on 2020-02-10 19:09:20

If you are using Quadro GPUs, then they won't be adding much heat inside the chassis. They use blower-style fans, which do a pretty good job of pushing the heat out the back of the card (and thus outside of the system / case). What you do need to ensure, with such cards, is that you have enough fresh air intake to keep those fans fed with cool air from in front or on the side of the system (you don't want to pull air from behind it back in, since that will be heated already).

One of the easiest places to see our options for a Threadripper with up to three GPUs is our V-Ray recommended system. Here is a link to that page, with the 3990X, a lot of RAM, and three RTX 8000s pre-selected (though you can, of course, adjust those options around as you see fit):


Posted on 2020-02-10 19:23:38
Donald Kinghorn

Thanks for jumping in on this William!

Posted on 2020-02-10 20:09:16
Lol Lollo

Hi, thank you for the awesome review. Just a question out of the blue, may I ask what PSU you were using for the test?

Posted on 2020-04-08 04:51:01
Donald Kinghorn

I had this running on a test-bed open air platform with a big 1600W EVGA https://www.pugetsystems.co...
That's a bit overkill for that setup but it's a great power supply. A 1200W would have been plenty. The 1600W will take care of a high-end system with 4 GPU's under constant load without any problem. Its only drawbacks are that it's really big and has a beefy "hospital grade" power cable. It's a standard grounded plug but the connector to the PSU itself is different than common PSU cables.

Posted on 2020-04-08 15:33:15
Lol Lollo

I was thinking of using the Corsair AX1600i for my setup (Ryzen TR 3990x, 4x RTX 2080Ti) and thanks to your help I know that it'll be enough! As for the cables, I actually prefer having bigger cables since it provides a sense of relief for some unfounded reason. Thank you for your answer!

Posted on 2020-04-09 01:34:59

Hi Donald,

I am considering a system with an AMD 3970X CPU and 1-2 GPUs for doing various molecular modeling tasks, including virtual screening and MD simulations. I had originally looked at configurations with 2 x RTX2080Ti GPUs, but now that the RTX 30-series have been introduced, I think it would be better to go with 2 x RTX3080 GPUs or perhaps 1 x RTX3090 GPU.

If you would care to speculate, what would you project as the optimal workstation configuration for these molecular modeling applications (i.e., with the 3970X CPU and 1 or 2 of the above-mentioned GPUs)? Some concerns I have about the 30-series include price, power consumption, cooling, and accommodating their large size on motherboards and computer cases.

Posted on 2020-09-09 15:11:59
Donald Kinghorn

:-) I should be able to give you a good answer to this soon. I'll be testing with NAMD for comparison ... I honestly don't know what to expect and I'm eager to find out. ... soon.

We will be doing extensive power and cooling testing and design qualification. We'll sort out all the issues ... and there are plenty :-)

Your base platform sounds great. The 3970x + 1 or 2 GPU's is excellent. I won't be doing initial testing on that platform but I will have a good idea of CPU-GPU balance. (and will test specifically on 3970x after the dust settles from the first release of the new cards)

Posted on 2020-09-09 17:42:54

Hi Donald, I am also using an Nvidia GPU (GTX 1080 Ti) + Threadripper 3990x, and I tried the Ubuntu 18.04 you mentioned, but I just can't get it installed. I keep getting a black screen.

How exactly should one get Ubuntu installed on such system?

I mean I tried mce=off and nomodeset, still does not work.

Posted on 2020-12-04 15:09:36
Donald Kinghorn

First I would go ahead and move to Ubuntu 20.04 now. If you get a recent install image it may detect your 1080Ti at boot and do the right thing.

If you have trouble with the 20.04 install using the "desktop live" installer then I recommend trying the server installer. After you have the basic server installed you can do,
sudo apt-get dist-upgrade
sudo shutdown -r now
sudo apt-get install dkms build-essential
sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt-get install nvidia-driver-455

Then do sudo tasksel and pick the desktop environment you want to use and install it
sudo shutdown -r now
... and you should be all set!

If the 20.04 server "live" installer still gives you trouble then grab the "legacy" installer. I personally use the legacy installer for every Ubuntu install that I do since it almost always works on any hardware (it uses the Debian installer). Of course they are not supporting it any more! Arrrggg!! You have to search for it since they don't link to it. You can google search for "Ubuntu 20.04 legacy server" and should find the page ...
here it is ... https://cdimage.ubuntu.com/...

They pop up a question panel asking why you want to use it. They don't have an option saying " [x] because the live installers suck!" :-)

Posted on 2020-12-04 18:04:24

Hi Donald,

Do I have to format the disk first? I mean it's a fresh new PC and have no OS installed previously.

Right now I tried the desktop version if I boot into the usb drive with the UEFI I got black screen, if I boot into the same usb drive without the UEFI it says not enough memory.

I am now trying the server edition...

Posted on 2020-12-04 19:46:29
Donald Kinghorn

No, you don't need to format your drive first ... I think server will work for you. If the live server installer still gives you trouble then try the legacy image. ... you might just want to go straight to the "legacy server" USB image

The "not enough memory" message seems a little strange but I don't know what it's about ... You should try to stick with UEFI in any case.

Posted on 2020-12-07 16:35:51

Hello, Donald, after trying the server version you mentioned, which still did not work, I finally identified the problem, and it's a trivial one. The BIOS for my Designare TRX40 motherboard has to be updated to be fully compatible with the Threadripper 3990X. Silly me. Sorry for all the trouble, and thank you very much for your time.

Posted on 2020-12-07 21:13:38
Donald Kinghorn

Yes indeed :-) I'm happy you got it sorted out. And, thanks for posting back, that may help someone else. I didn't think of that but it's one of the things to look out for when a newer CPU is going onto an older motherboard ... We sometimes have trouble with that here when vendors send us boards that haven't been flashed to recent BIOS. The builders hate it! They have to interrupt their workflow to put in an old CPU just to flash the BIOS before they can proceed with the build ... and they may not find out they need to do that until they have the system built up with a new CPU that the BIOS doesn't support. ... like what just happened to you!
Best wishes --Don

Posted on 2020-12-08 15:45:30

Ok, I tried the server version, but still got a black screen. What I did is the following:

Boot into the UEFI usb drive directory, then the first option is install Ubuntu; hit enter and I got a black screen. I even tried putting in mce=off and nomodeset, but it still gives me a black screen https://uploads.disquscdn.c... https://uploads.disquscdn.c...

Posted on 2020-12-04 20:17:20
Uxia Pavlowa

Hey, Now after so long, new updates, new AGESA, new BIOSes, new drivers ... I think this test could get an update in March 2021 after all these multi-tier improvements to the whole platform and system. The results would probably vary considerably.

Posted on 2021-03-06 08:41:19