Table of Contents
- Test systems: AMD 2990WX and Intel Xeon-W 2175
- Testing Results
In my recent testing with the AMD Threadripper 2990WX I was impressed by the CPU-based performance of the molecular dynamics program NAMD. Of course, adding NVIDIA GPU’s to the system gives a dramatic improvement, since NAMD has good GPU acceleration. I like NAMD for many reasons, and one of them is that it makes a pretty good benchmark for looking at CPU/GPU performance. NAMD requires a balance between CPU and GPU for the best results. It is also not very sensitive to speedup from AVX vector units. NAMD generally scales well with lots of cores (or lots of cluster nodes). After some discussions I decided it would be good to look at multi-GPU performance with NAMD on Threadripper, the assumption being that there would be enough cores to keep up with NVIDIA’s powerful new GPU’s.
My last post, “AMD Threadripper 2990WX 32-core vs Intel Xeon-W 2175 14-core – Linpack NAMD and Kernel Build Time”, is good background for the present post and has some interesting comparisons with an Intel 14-core Xeon-W system.
I spent a long afternoon on the same basic system I used in the last post. I was able to get a little testing done with the 24-core Threadripper 2970WX but most of the results are utilizing the 2990WX 32-core processor.
I had 2 “side fan” cooled NVIDIA RTX 2070 GPU’s. It is not practical to use more than 2 of these types of cards in a system because of thermal throttling issues (very bad), see NVIDIA Dual-Fan GeForce RTX Coolers Ruining Multi-GPU Performance. A couple of days after doing the testing we got in our first batch of RTX 2070’s with blower fans! You should be able to configure systems with these now.
We did have blower fan versions of the RTX 2080Ti so I was able to test with 1 to 4 of these great cards.
Test systems: AMD 2990WX and Intel Xeon-W 2175
The AMD Threadripper system I used was a test-bed build with the following main components:
- AMD Ryzen Threadripper 2990WX 32-Core @ 3.00GHz (4.2GHz Turbo)
- AMD Ryzen Threadripper 2970WX 24-Core @ 3.00GHz (4.0GHz Turbo)
- Gigabyte X399 AORUS XTREME-CF Motherboard
- 128GB DDR4 2666 MHz memory
- Samsung 970 PRO 512GB M.2 SSD
- NVIDIA RTX 2070
- NVIDIA RTX 2080Ti
NAMD builds: NAMD_2.13_Linux-x86_64-multicore (CPU only) and NAMD_2.13_Linux-x86_64-multicore-CUDA (GPU accelerated)
When I sat down in front of the system it had a TR 2970WX 24-core processor in it, so I did a few job runs with that before I swapped in the 2990WX. The first jobs I ran were CPU only. The results were very satisfying in that the scaling with increasing number of threads was very uniform. It is interesting that NAMD performance improved uniformly with SMT “hyper-threads”. This is not always the case, and you often see that only “real” cores improve performance.
The graph shows how well the SMT threads worked with NAMD. Note that lower is better! What is being reported is the default NAMD performance output in day/ns, i.e. the days needed to do 1 nanosecond of simulation. Yes, this is a very compute-intensive task! Big jobs can run for weeks or months. My job runs were for 500 time steps of the simulation.
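For anyone who wants to try a similar thread-scaling sweep, here is a minimal sketch of how the command lines could be generated. The exact commands used for this post are not shown, so treat everything here as an assumption: the binary path `./namd2`, the config name `benchmark.namd`, and the thread counts are hypothetical placeholders. `+p`, `+setcpuaffinity` and `+devices` are standard options of the NAMD multicore builds (`+devices` applies to the CUDA build only).

```python
import shlex


def namd_command(binary, config, threads, devices=None):
    """Build the argument list for one multicore NAMD run.

    binary and config are hypothetical placeholders; substitute your own paths.
    devices is an optional list of CUDA device indices (CUDA build only).
    """
    cmd = [binary, f"+p{threads}", "+setcpuaffinity"]
    if devices:
        # Comma-separated device list, e.g. "+devices 0,1" for two GPUs.
        cmd += ["+devices", ",".join(str(d) for d in devices)]
    cmd.append(config)
    return cmd


def thread_sweep(binary, config, thread_counts, devices=None):
    """Return one command per thread count, in the order given."""
    return [namd_command(binary, config, n, devices) for n in thread_counts]


if __name__ == "__main__":
    # Assumed thread counts: half the cores, all 32 cores, all 64 SMT threads.
    for cmd in thread_sweep("./namd2", "benchmark.namd", [16, 32, 64]):
        print(shlex.join(cmd))
```

Each printed line can be pasted into a shell, or the argument lists can be handed to `subprocess.run` to collect the logs.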
That is very good CPU performance for that job run! In an older post, NAMD Performance on Xeon-Scalable 8180 and 8 GTX 1080Ti GPUs, using a dual Xeon 8180 system with a total of 56 CPU cores I had a result of 2.93 day/ns with 32 cores. Those processors cost over $10,000 each. So the 32-core Threadripper is a bargain by comparison. [Using all 56 cores on that Intel system I got 1.68 day/ns]. Note: if you look at that older post you will see that I took the inverse of the normal NAMD output and reported ns/day. Keep that in mind if you make a comparison. (sorry about that)
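Since the two posts report inverse units, a small sketch of the conversion may help when comparing them. It uses only the dual Xeon 8180 numbers quoted above; the function names are my own for illustration.

```python
def ns_per_day(days_per_ns):
    """Convert NAMD's default day/ns figure (lower is better) to ns/day (higher is better)."""
    if days_per_ns <= 0:
        raise ValueError("days_per_ns must be positive")
    return 1.0 / days_per_ns


def days_for(days_per_ns, nanoseconds):
    """Estimated wall-clock days to simulate the given number of nanoseconds."""
    return days_per_ns * nanoseconds


# Dual Xeon 8180 results quoted from the older post.
quoted = [("dual Xeon 8180, 32 cores", 2.93), ("dual Xeon 8180, 56 cores", 1.68)]
for label, value in quoted:
    print(f"{label}: {value} day/ns = {ns_per_day(value):.3f} ns/day")
```

The second helper makes the "weeks or months" point concrete: multiply the day/ns figure by the length of the simulation you need.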
GPU accelerated results
The first thing I should say about the GPU results is that, even with the good performance from the 32 cores of the 2990WX, it’s just not enough to keep up with more than 1 or 2 of the new NVIDIA RTX GPU’s. Going from the worst result (1 RTX 2070) to the best result (4 RTX 2080Ti’s) is only a speedup of 1.6.
I’m not saying these results are bad! They are actually very good and they clearly show how much performance gain there is from adding even a “modest” GPU like the RTX 2070 which gives a speedup of nearly 5 over the CPU only result. However, by the time you have added 2 of the RTX 2070’s or 2080Ti’s you are being limited by the CPU.
In the older post I mentioned above, the dual Xeon 8180’s provided enough CPU capability to get 0.438 day/ns with 1 GTX 1080Ti and using 2 1080Ti’s gave 0.248 day/ns. Additional GPU’s only made a small performance improvement over that, again being limited by CPU. (I tested with up to 8 GPU’s).
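To put a number on "limited by CPU", speedup and scaling efficiency can be computed directly from the day/ns figures. This is a minimal sketch using the two GTX 1080Ti results quoted above; the helper names are hypothetical.

```python
def speedup(baseline_days_per_ns, new_days_per_ns):
    """Speedup of the new result over the baseline (day/ns is lower-is-better)."""
    return baseline_days_per_ns / new_days_per_ns


def scaling_efficiency(observed_speedup, resource_ratio):
    """Fraction of ideal scaling achieved, e.g. resource_ratio=2 when going from 1 to 2 GPUs."""
    return observed_speedup / resource_ratio


# Dual Xeon 8180 with 1 and 2 GTX 1080Ti's, as quoted from the older post.
one_gpu, two_gpu = 0.438, 0.248
s = speedup(one_gpu, two_gpu)
print(f"1 -> 2 GPUs: speedup {s:.2f}, efficiency {scaling_efficiency(s, 2):.0%}")
```

An efficiency well below 100% when adding a GPU is the sign that the CPU side is no longer keeping up.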
Another thing to note in these results is the effect of the SMT “hyper-threads”. With the CPU only runs there was a nice improvement with more SMT threads. When the GPU’s were added the results were not as predictable. With more than 1 GPU it seemed that the SMT threads were a detriment to performance.
The following table has all of the results of the testing.
In this chart you can see that there is not much performance difference for many of the configurations. Also note that there can be significant performance variation between job runs. I only did two job runs on each test configuration and took the best one. It is clear that the TR 2990WX is providing more CPU performance than what is balanced with 1 RTX 2070. Adding a second RTX 2070 or 1-2 RTX 2080Ti’s provided more GPU performance than the CPU could effectively keep up with.
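The "best of two runs" bookkeeping can be automated along these lines. This is a sketch under assumptions, not the procedure used for this post: it assumes the log contains NAMD 2.x style lines of the form `Info: Benchmark time: <N> CPUs <x> s/step <y> days/ns ...`, and it assumes the last such line in each log is the figure to keep. Check both against your own logs before relying on it.

```python
import re

# Assumed NAMD 2.x log line format (verify against your own output):
#   Info: Benchmark time: 32 CPUs <seconds> s/step <value> days/ns <mem> MB memory
BENCHMARK_RE = re.compile(
    r"Benchmark time:\s+(\d+)\s+CPUs\s+(\S+)\s+s/step\s+(\S+)\s+days/ns"
)


def benchmark_days_per_ns(log_text):
    """Return every days/ns value found in one run's log, in order of appearance."""
    return [float(m.group(3)) for m in BENCHMARK_RE.finditer(log_text)]


def best_of_runs(logs):
    """Return the lowest (best) days/ns across several run logs.

    Assumption: the last benchmark line of each log represents that run.
    """
    per_run = []
    for text in logs:
        values = benchmark_days_per_ns(text)
        if values:
            per_run.append(values[-1])
    if not per_run:
        raise ValueError("no benchmark lines found in any log")
    return min(per_run)
```

With more than two runs per configuration it would also be worth recording the spread, given the run-to-run variation mentioned above.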
The following chart gives an easier to see, more uniform, performance scaling as the system specs are improved.
No matter what CPU you have in your system, if you are running NAMD then adding an NVIDIA GPU will be a significant performance boost. Hopefully this post shows that, and also makes clear the need for significant CPU performance to efficiently balance with modern GPU’s. My recommendation for an AMD CPU based NAMD system would be the TR 2990WX and either 2 RTX 2070’s or 1 RTX 2080Ti.
I will be doing a more comprehensive test with many GPU’s for jobs including NAMD (but with more focus on Machine Learning/AI). That will be using the new Intel Core-X processors. In general I personally prefer an Intel CPU with AVX512 vector units as the basis of any scientific workstation. However, the high core count AMD Threadripper did really well in this NAMD testing … but see the note below.
As a last note, I had to cut my testing short because the system failed after a normal OS update that I did in preparation to install CUDA for CPU-GPU memory bandwidth testing. I had 4 RTX 2080Ti’s in the system and had booted to that with no problems. After a simple “apt-get upgrade” the system would no longer get to the boot prompt. I didn’t have the time to try to find the problem.
Happy computing! –dbk