Read this article at https://www.pugetsystems.com/guides/1339
Dr Donald Kinghorn (Scientific Computing Advisor)

Numerical Computing Performance of 3 Intel 8-core CPUs - i9 9900K vs i7 9800X vs Xeon 2145W

Written on January 25, 2019 by Dr Donald Kinghorn

Intel makes a lot of different CPU's! There are the very expensive multi-socket Xeon Scalable (Purley) processors, 58 of them! There are low power mobile and embedded processors and of course the single socket "Desktop PC", "Enthusiast" and "Workstation" processors. In this post I'll take a brief look at the numerical computing performance of three very capable 8-core processors. These CPU's are in the "sweet-spot" with 8-cores and high core clock frequencies. All three are great CPU's but there are some significant differences that can cause confusion.

i9 9900K or i7 9800X or Xeon 2145W -- which processor is best for you? The answer is, as always, "it depends"... Hopefully this post will help you decide which processor fits best with your dependencies.


The 8-core "sweet-spot"(?)

Why do I say 8-cores is the CPU "sweet-spot"?

  • 8-core systems in the form of dual socket 4-core CPU's from both Intel and AMD were the foundation of modern parallel computing. That was the standard scientific workstation configuration through most of the 2000's. Dual 4-core system nodes in clusters were the base of distributed parallel super-computing.
  • There are a lot of applications that will scale in parallel efficiently on 8-cores. Writing parallel code can be very difficult and scaling can fall off rapidly after 4-8 processes. There are inherently parallel applications that will scale to 10's of thousands of processor cores but a typical target for a programmer is to get good scaling with 4-8 cores on a single system. That, in itself, can be a remarkable achievement!
  • Modern 8-core processors as presented here offer very good performance for the cost.
  • 8-cores allow simultaneous application and job runs, which makes for an efficient workflow and good hardware utilization.
  • A system with a good 8-core CPU makes a great platform for GPU accelerated computing!

For a very simple low cost workstation you could use a processor with fewer cores but these days I feel an 8-core is a good base-line for a compute oriented workstation.

It can certainly be advantageous to have more cores. If you have code that scales well in parallel or a heavy multi-tasking workflow an Intel X-series or Xeon-W 18-core processor offers excellent performance for a very reasonable cost. In fact the 18-core processors are so good that I generally don't recommend dual Xeon workstations very often anymore.


Important differences between i9 9900K, i7 9800X, and 2145W Xeon

The following table lists some of the specification differences between these processors that are relevant for a numerical computing workstation configuration.

Intel 8-Core i9 9900K, i7 9800X, Xeon 2145W Features

Features          i9 9900K      i7 9800X     Xeon 2145W
Code Name         Coffee Lake   Skylake-X    Skylake-W
Base Clock        3.6 GHz       3.8 GHz      3.7 GHz
Max Turbo         5.0 GHz       4.5 GHz      4.5 GHz
All-Core Turbo    4.7 GHz       4.1 GHz      4.3 GHz
Cache             16 MB         16.5 MB      11 MB
TDP               95 W          165 W        140 W
Max Memory        64 GB         128 GB       512 GB (Reg ECC)
Memory Channels   2             4            4
Max PCIe Lanes    16            44           48
X16 GPU Support   1             2            3 (4 w/PLX)
Vector Unit       AVX2          AVX512       AVX512
Price             $500          $600         $1113

The features that will have the biggest impact on compute performance are core clocks and the AVX unit. The high clock speeds, fast memory, large cache and low power consumption of the Coffee Lake processor are very compelling. However, the vector unit listed in the table above (AVX2 vs AVX512) can have a significant impact on numerical compute performance.
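On Linux, one quick way to see which vector extensions a CPU reports is to read the flags line of /proc/cpuinfo. The following is a minimal sketch and not part of the testing in this post; the helper names are hypothetical, and it only shows what the kernel advertises (avx, avx2, avx512f), not how fast the units actually run.

```python
from pathlib import Path


def cpu_flags(cpuinfo_text):
    """Return the set of feature flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            # The line looks like: "flags : fpu vme ... avx2 ... avx512f ..."
            return set(line.split(":", 1)[1].split())
    return set()


def widest_vector_unit(flags):
    """Classify the widest AVX level present in a set of flags."""
    if "avx512f" in flags:
        return "AVX512"
    if "avx2" in flags:
        return "AVX2"
    if "avx" in flags:
        return "AVX"
    return "none reported"


if __name__ == "__main__":
    path = Path("/proc/cpuinfo")  # Linux only
    if path.exists():
        print(widest_vector_unit(cpu_flags(path.read_text())))
    else:
        print("/proc/cpuinfo not found (not Linux?)")
```

A reported AVX512 flag only helps if the software is built to use it, which is the point of the MKL-linked Linpack results later in the post.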

Important differences to note for a system specification are the maximum amount of memory that can be used and the number of PCIe lanes. The number of PCIe lanes is particularly important for GPU accelerated workstations since it is good (but not essential) to use X16 slots for multi-GPU configurations.

Note: 32GB non-Reg DDR4 memory modules are becoming available so it may be possible soon to have 128GB memory in a 9900K system and 256GB in a 9800X system.


Hardware under test:

I used open test-beds with the hardware but you can try different configurations using all of these components on our general "Custom Computers" page. (We do have more application oriented pages too so feel free to explore.)

  • Intel Core i9 9900K 3.6GHz 8-Core
    • Gigabyte Z390 Designare Motherboard (1 x X16 PCIe)
    • 64 GB DDR4-2666 Memory
    • 1 TB Intel 660p M.2 SSD
    • NVIDIA RTX 2080Ti
  • Intel Core i7 9800X 3.8GHz 8-Core
    • Gigabyte X299 Designare Motherboard (2 x X16 PCIe)
    • 128GB DDR4-2666 Memory
    • 1 TB Intel 660p M.2 SSD
    • NVIDIA RTX 2080Ti
  • Intel Xeon 2145W 3.7GHz 8-Core
    • Asus WS C422 SAGE/10G Motherboard (4 x X16 PCIe)
    • 256GB DDR4-2666 Reg ECC Memory
    • 1 TB Intel 660p M.2 SSD
    • NVIDIA RTX 2080Ti

Software:

I had the OS and applications installed on the Intel 660p M.2 drive and swapped it between the test systems.

I am running Linux for this testing but there is no reason to expect that the same types of workloads on Windows 10 would show any significant difference in performance.


Results

Linpack

An optimized Linpack benchmark can achieve near theoretical peak performance for double precision floating point on a CPU. It is the first benchmark I run on any new CPU. It is the benchmark (still) used to rank the Top500 supercomputers in the world. I feel it is the best performance indicator for numerical computation with maximally optimized software. I even went to the trouble of building an optimized Linpack for AMD Threadripper recently. The Intel optimized Linpack makes great use of the excellent MKL library. There are many programs that link to MKL for performance. This includes the very useful "numerical compute scripting" packages Anaconda Python and Mathworks MATLAB.

linpack chart

Clearly the AVX512 vector units in the 9800X and 2145W have a significant impact on Linpack performance. This is basic numerical linear algebra, which is the core of a lot of compute intensive applications.
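A rough way to see why the vector unit matters so much here: theoretical double precision peak is cores x clock x FLOPs per cycle per core. The sketch below assumes two FMA units per core, which gives 16 FLOPs per cycle with 256-bit AVX2 and 32 with AVX512, and it plugs in the all-core turbo clocks from the feature table. That clock choice is an assumption: CPUs usually run heavy AVX code below the nominal all-core turbo, so treat these numbers as upper bounds and not as predictions of the measured results.

```python
# Double precision FLOPs per cycle per core, assuming two FMA units per core.
# Each FMA counts as 2 FLOPs; a 256-bit register holds 4 doubles, a 512-bit one holds 8.
FLOPS_PER_CYCLE = {
    "AVX2": 2 * 2 * 4,
    "AVX512": 2 * 2 * 8,
}


def peak_gflops(cores, clock_ghz, vector_unit):
    """Theoretical double precision peak in GFLOPS."""
    return cores * clock_ghz * FLOPS_PER_CYCLE[vector_unit]


# (cores, all-core turbo in GHz, vector unit) taken from the feature table
CPUS = {
    "i9 9900K": (8, 4.7, "AVX2"),
    "i7 9800X": (8, 4.1, "AVX512"),
    "Xeon 2145W": (8, 4.3, "AVX512"),
}

for name, (cores, clock, unit) in CPUS.items():
    print(f"{name}: {peak_gflops(cores, clock, unit):.1f} GFLOPS (upper bound)")
```

Even with lower clocks, doubling the vector width gives the two AVX512 parts a much higher ceiling than the 9900K for code that can use it.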

Note: These jobs ran with 8 "real" threads since "Hyperthreads" are not useful for this calculation.

Note: These results are with a large problem size of 75000 simultaneous equations (a 75000 x 75000 "triangular solve") and used approximately 44GB of system memory.
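The memory use follows mostly from the size of the coefficient matrix: N x N double precision values at 8 bytes each. Here is a minimal sketch of that arithmetic, with hypothetical helper names. It ignores the right-hand side, workspace and any padding the benchmark uses, so it will not match the reported figure exactly. The operation count uses only the leading 2/3 N^3 term for an LU-based solve.

```python
def matrix_bytes(n, bytes_per_value=8):
    """Bytes needed to store a dense n x n matrix of double precision values."""
    return n * n * bytes_per_value


def lu_flops(n):
    """Leading-order floating point operation count for an LU-based solve."""
    return 2 * n**3 / 3


n = 75000
size = matrix_bytes(n)
print(f"matrix storage: {size / 1e9:.1f} GB ({size / 2**30:.1f} GiB)")
print(f"approx. operations: {lu_flops(n):.3e}")
```

Dividing the operation count by a measured run time gives the sustained FLOPS figure that Linpack charts report.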

NAMD

I also tested with the Molecular Dynamics package NAMD. NAMD scales really well across multiple cores and it is not specifically optimized for Intel hardware. It is highly optimized code and it uses the very interesting Charm++ for its parallel capabilities. NAMD is an important program and I like it for testing since it is a good example of well optimized code that scales to massive numbers of processes and also has very good GPU acceleration that needs to be balanced by good CPU performance.

NAMD CPU

For these job runs the high all-core-turbo clock of the 9900K has the advantage. The AVX512 vector units are not that important for this code that is designed to run well on a wide variety of hardware.

Note: These jobs ran with 16 threads since "Hyperthreads" help with the way NAMD uses threads. It is always worth experimenting with Hyperthreads to see if they help or not.
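For OpenMP or MKL based programs, the thread count is commonly controlled with the OMP_NUM_THREADS and MKL_NUM_THREADS environment variables. The following is a minimal sketch for trying a job with and without Hyperthreads. The helper is hypothetical, and the default of 2 threads per core is an assumption that holds for these three CPUs only when Hyperthreading is enabled.

```python
import os


def thread_env(use_hyperthreads, threads_per_core=2, logical_cpus=None):
    """Build environment variables that set OpenMP/MKL thread counts."""
    if logical_cpus is None:
        # os.cpu_count() reports logical CPUs, i.e. Hyperthreads are included
        logical_cpus = os.cpu_count() or 1
    physical = max(1, logical_cpus // threads_per_core)
    n = logical_cpus if use_hyperthreads else physical
    env = dict(os.environ)
    env["OMP_NUM_THREADS"] = str(n)
    env["MKL_NUM_THREADS"] = str(n)
    return env


# On an 8-core CPU with Hyperthreading: 8 "real" threads vs 16 threads
print(thread_env(False, logical_cpus=16)["OMP_NUM_THREADS"])
print(thread_env(True, logical_cpus=16)["OMP_NUM_THREADS"])
```

The returned dictionary can be passed as the env argument of subprocess.run to time the same job under both settings.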

Note: The performance units here are "days per nano-second" of simulation time. The 9900K would save 1 day out of a week long job run to get 1 nano-second of simulation time. Adding a GPU will dramatically increase the performance as will be seen in the next chart.
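Since "days per nano-second" is an inverse rate (lower is better), it can help to convert it to ns/day or to wall-clock days for a given trajectory length. A small sketch of the arithmetic follows; the 7.0 and 6.0 days/ns values are hypothetical and are chosen only to mirror the "1 day out of a week" example, they are not the measured results.

```python
def ns_per_day(days_per_ns):
    """Convert days/ns (lower is better) to ns/day (higher is better)."""
    return 1.0 / days_per_ns


def days_saved(simulated_ns, slow_days_per_ns, fast_days_per_ns):
    """Wall-clock days saved by the faster system for a given simulation length."""
    return simulated_ns * (slow_days_per_ns - fast_days_per_ns)


# Hypothetical rates for illustration only
slow, fast = 7.0, 6.0
print(f"slow: {ns_per_day(slow):.3f} ns/day, fast: {ns_per_day(fast):.3f} ns/day")
print(f"days saved on a 1 ns run: {days_saved(1.0, slow, fast):.1f}")
```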

NAMD GPU

The first thing to notice is that the performance has increased by over a factor of 10 by including the NVIDIA RTX 2080Ti! There seems to be an advantage for the 9800X and 2145W when the GPU is added to the system. I'm not sure exactly why that is. These CPU's do have a lot more PCIe lanes than the 9900K but all 3 of these systems were running with 1 GPU in a full X16 slot.

Conclusions and Recommendations

All 3 of these CPU's are great!

Given that my focus is high performance numerical computing I would probably not recommend the i9 9900K. It is a very good processor and the high core clocks will give many applications excellent performance. It is limited by not having the newest Intel core architecture. At its center it is basically a Haswell core (with lots of incremental tweaks). It is also very limited as a platform CPU for a GPU accelerated system since it only supplies 16 PCIe lanes.

The i7 9800X and Xeon 2145W share the same core architecture as the high-end Intel Xeon Scalable (Purley) CPU's. There are 2 AVX512 vector units per core and the numerical compute performance is outstanding (for code that is optimized for it!). I like both of these processors a lot! The i7 9800X is part of the newly released "X-Series" processors. They offer tremendous performance value. The Xeon 2145W, and the Xeon-W series in general, also offer great performance for the cost compared to the much more expensive Xeon Scalable (Skylake-SP) processors. Both "X-series" and Xeon-W CPU's are available in a variety of core counts up to 18-core. They are great alternatives for what in the past would have been a dual socket Workstation. The Xeon-W also has the advantages of being a Xeon processor, i.e. it has more PCIe lanes, supports a larger memory footprint, has ECC memory support, works with very high-end motherboards, etc.

Note: Skylake is the newest "core" architecture for Intel. It is the basis for their high-end processors. There was a "Skylake" CPU that was based on the Haswell "core" in the desktop core-i7 line a few "generations" ago. Marketing! I believe we will see a new "core" architecture from Intel by the end of 2019 (hopefully along with a new PCIe v4 capable chipset).

Here are my recommendations for a CPU intended for the base of a numerical compute oriented Workstation.

  • For a system where cost is a significant concern I would certainly recommend the "X-series" CPU's and the 8-core i7 9800X is easy to like. If you are working with code capable of GPU acceleration you can configure a system with 2 GPU's at X16 and have what is probably the best high-end performance per dollar you can get.

  • For a more high-end Workstation capable of using 4 GPU's for acceleration and capable of large memory configurations (the best overall platform configuration), the single socket Xeon-W CPU's are the way to go. I recommend these CPU's in a single socket configuration over a dual Xeon configuration for most applications since it avoids problems that can be caused by memory contention in multi-socket systems.

I hope this post has cleared up any confusion you may have had about these different CPU's. If you still have questions go ahead and ask in the comments!

Happy computing --dbk

Tags: Intel, i9 9900K, i7 9800X, Xeon 2145W, RTX 2080Ti, Linpack, NAMD
Homeo Morphism

Don,

could you recommend a workstation for someone with a budget of 5k USD with a very real possibility of adding another 5k to upscale it by the end of the year? By the end of the year what will we have got?

If single-socket, which particular CPU would you recommend? I've read this review, and you examine 8-core processors in it but mention that the best option is 18-core. Which one in particular? Should we eventually aim for 4 GPUs, or is 2 only slightly worse? Are there any substantial benefits to having a 2080 over a 1080? TI over non-TI?

Say, we bought an 18-core CPU that you would recommend and a single fast GPU now and added three more GPUs by the end of the year... how much do you think our performance gains would be compared to a more conservative/cost-benefit rig? It looks like we'll be running protein-ligand sims that don't involve too many atoms but we want longer sims instead (microsecond trajectories). So it's not like a single task would run for months where a 10% improvement could mean saving us weeks -- it's more like it would be saving us mere hours. Which makes me somewhat hesitant about splurging on TIs, 20xx instead of 10xx, 4 GPUs instead of 2, etc. What's your opinion?

And if it's a dual-socket system, what would your recommendations be? I've read your early analysis, where you examine workstations with 1070 and 1080, but it dates back to 2016. The recommendations must have changed since then...

Also, to clear something up, there's no reason we should go for Tesla, right? I mean they are for supercomputers and their effects are really noticeable when there are thousands of them running in conjunction, which is when their individual gains really add up. Is that correct?

Thank you very much.

Posted on 2019-01-28 22:56:52
Donald Kinghorn

You have a lot of questions :-) I can answer a bunch of them easily ... you really need to look at some of my more recent posts. I'm guessing you landed here from a google search. I've got a lot of posts with testing the new GPU's etc. on our HPC blog https://www.pugetsystems.co...

A good place to play with configurations is https://www.pugetsystems.co... That page is not oriented to anything specific but you can try a lot of ideas there. Do please remember that when you get a system from us you are getting a lot more than just a list of parts!

Upgrading a system later almost never works out the way you think it might. Things change, components go out of production, and there are only so many things that can be updated. My recommendation is usually to get as good a system as will fit your budget and go to work on jobs you have at hand. If you have a similar budget a year later then you can consider upgrading or adding to some key components like GPU accelerators or system memory. However, you will often find that your workflow is best served by an additional system or an updated replacement, or that you can simply keep using what you have and save your budget for a replacement after 2-3 years.

Your question about the number of GPU's is something that a lot of people think about. The answer is always "it depends". 4 GPU's (or even 2 instead of 1) may not double your throughput. It depends on the code you are running and your overall workflow. The newer GPU's are really fast for a lot of ML tasks and you may not need more than 1 for your job size. If you have code that works well with 4 GPU's and you can benefit from the increase in performance then that can be an incredible amount of computation throughput. You also may have code that will only run on 1 GPU but you have lots of jobs so you could run 4 jobs at a time. Then you have to decide whether to get one larger system or maybe a couple of smaller ones that are more task specific.

Some general guidelines:
Try to evaluate the code you are planning on using before you make a big investment in hardware.
For Multi-GPU I recommend 2-4 CPU cores for each GPU and twice as much system memory as your total GPU memory.
Keep GPU's on X16 PCIe if possible (PLX chips are like network switches for PCIe and they work really well).
Keep in mind that some code may only be partially GPU accelerated and you may need a balance with CPU power. NAMD is a great example of this.
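The multi-GPU rule of thumb above (2-4 CPU cores per GPU, system memory at twice the total GPU memory) can be written out as a short calculation. This is only a sketch of that guideline with a hypothetical helper name; the 4 x 11 GB configuration is an illustrative input, not a recommendation.

```python
def multi_gpu_sizing(num_gpus, gpu_mem_gb):
    """Apply the rule of thumb: 2-4 CPU cores per GPU, system RAM = 2x total GPU memory."""
    total_gpu_mem_gb = num_gpus * gpu_mem_gb
    return {
        "min_cpu_cores": 2 * num_gpus,
        "max_cpu_cores": 4 * num_gpus,
        "system_mem_gb": 2 * total_gpu_mem_gb,
    }


# Hypothetical example: 4 GPUs with 11 GB of memory each
print(multi_gpu_sizing(4, 11))
```

By this guideline an 8-core CPU sits at the low end for 4 GPUs, which is one reason partially GPU accelerated codes like NAMD may want more cores.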

I hardly ever recommend dual or quad socket systems anymore. Not that they are not good, it's just that you can get a great single socket system for less than half the cost that will likely be all the compute capability you need (and it avoids problems with memory contention across CPU mem spaces).

This was a long reply but I hope it helps you and others. I should add that our sales consultants here at Puget are really good and if they are unsure about some specific scientific workload questions they come to me and we can usually work up a good recommendation.

Best wishes! --Don

Posted on 2019-01-31 17:27:44
Donald Kinghorn

I wanted to add one more comment ... since you asked ... I don't have any firm insight or timelines but I do expect to see platform, i.e. chipset, updates by the end of the year that include PCIe v4. That should be backward compatible initially but sometime in 2020 I expect significant overall platform updates; boards, CPU's, GPU's etc. That should provide a nice bump up in compute performance ... In the meantime get what you can afford and get some work done! Things are still getting better for compute :-)

Posted on 2019-01-31 17:37:10
Homeo Morphism

Thank you for the reply. Much food for thought... We are not in the US though -- considering getting a rig from you is not an option, unfortunately.

Posted on 2019-02-03 14:51:50
Alexandre Gandini

Will numpy code from the Anaconda Python distribution automatically take advantage of AVX512, or is it necessary to tweak the code somehow?

I have some highly intense numpy/pandas financial calculations and if it's the case I will be glad to upgrade my machine to a 9800X processor.

Don't know where to buy it for MSRP, though.

Thanks a lot, congrats for your awesome work comparing hardware for numerical calculations.

Posted on 2019-01-30 01:46:22
Donald Kinghorn

Thanks! Much to their credit, Anaconda Python includes Intel's MKL by default. If you run conda list you will see several MKL packages.

The new X-series CPU's are really good (as were the old ones). They go all the way up to an 18-core. Any of them are great value for compute on CPU. You will need a platform that supports them, i.e. the X299 chipset, since these are socket LGA 2066 processors.

Posted on 2019-01-31 16:49:54
Alexandre Gandini

Thanks for the reply, you convinced me, X-series CPU and X299 mobo is the way to go.

Posted on 2019-01-31 17:01:17
Nathan Zechar

Would you be able to re-run these tests, and possibly some others, with an Nvidia RTX Quadro?

The article here - https://www.velocitymicro.c...

states that the Quadro is better for double precision computation. I'm wondering just how big a difference in performance this would be.

Posted on 2019-03-11 04:30:19

Don could probably elaborate on this more, but the tasks he ran in this post are purely CPU-based. So changing the GPU won't affect the results at all. He does have many other posts in his HPC Blog that are focused on GPU performance, however, like his recent RTX Titan TensorFlow performance post: https://www.pugetsystems.co...

Be very wary of that article you linked to - it has some true facts in it, but also a ton of inaccurate statements. Their talk about double precision is unfortunately one of the areas where they are not correct. A long time ago, it was true that Quadro had better FP64 (double precision) performance, but it hasn't been that way for at least several generations. I think you have to go back to 2011 with the original Quadro 2000-6000 series (after FX, but before the ones with the K/M/P prefix) in order to get better FP64 performance from Quadro compared to GeForce.

Today, most Quadro and GeForce cards run at 1:32 for FP64, meaning that the raw performance for that is going to be about 32x slower than FP32 (single precision). There are a few exceptions, but they are not specific to any one line. For example, the Titan V and Quadro GP100 both have good FP64 performance, but pretty much no other modern card is going to be good for FP64 unless you get into the Tesla line.
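As a rough illustration of what a 1:32 ratio means in practice, the estimate is simply the FP32 rate divided by 32. The helper below is hypothetical and the 16.0 TFLOPS input is a made-up round number, not the specification of any particular card.

```python
def fp64_tflops(fp32_tflops, ratio=32):
    """Estimate raw FP64 throughput for a card whose FP64 rate is 1:ratio of FP32."""
    return fp32_tflops / ratio


# Hypothetical card with 16.0 TFLOPS of FP32 throughput
print(f"1:32 card: {fp64_tflops(16.0):.2f} TFLOPS FP64")
# The same FP32 rate at a hypothetical 1:2 ratio, for comparison
print(f"1:2 card:  {fp64_tflops(16.0, ratio=2):.2f} TFLOPS FP64")
```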

Double precision is really only used for a handful of workflows, most commonly high-end engineering simulations and some AI/machine learning. Even in machine learning, however, most developers are going to stick with single precision since double precision either isn't necessary or the tradeoff in performance isn't worth it. Don would be able to talk more about the why behind that though.

If you have a workflow that uses FP64 or are writing code to use double precision on the GPU, then you pretty much need a Titan V (recently discontinued, but you can still find them), a Quadro GP100, or a Tesla card.

Posted on 2019-03-11 18:25:01
Nathan Zechar

Thank you for the detailed reply Matt!

I'm a little confused about your statement that the tasks that were run were purely CPU-based, because Don is showing a NAMD benchmark with GPU acceleration.

I accidentally ran across this website when doing some research on building a home scientific computing workstation.

And I'm so glad I did.

Posted on 2019-03-12 00:16:41

You are right, I missed that he included the one GPU accelerated benchmark to show how each platform does with that kind of workload as well. So if he switched to Quadro, the results would be slightly different on that test but it completely depends on the GPU model. Switching to an RTX 8000 would be the same as switching to a Titan RTX, and switching to an RTX 5000 would be the same as an RTX 2080. I'm pretty sure that is how it works out, but I may be off by a model or so.

The main thing you get from Quadro for this kind of stuff is the higher VRAM capacities. More VRAM does allow for some optimizations in the code if you are doing your own development, but that is starting to get beyond what I am personally familiar with so I can't go into detail.

Really, the short answer is that GeForce and Quadro perform the same now for double precision (with the few exceptions we mentioned).

Posted on 2019-03-12 00:22:27
Donald Kinghorn

Matt gave a very good reply! There are two Workstation cards that have good FP64 (double precision) support, the Titan V and the Quadro GV100 (also the last gen GP100 Quadro). Those cards used the same "core" design as the Tesla GPU's but they have display output. My personal feeling is that the Titan V was the best workstation compute device ever created! ... so far... I'm really sad that they have stopped making it. I think it was a bargain at $3000. The reason I think that is because I used it for a really cool calculation where I really needed double precision and I was blown away by how easy it was to work with and how good the performance was. See my series of posts on doing Quantum Mechanics with PyTorch ending with https://www.pugetsystems.co...

I'm looking forward to seeing another NVIDIA Titan that is based on a Tesla GPU "core" but that won't happen until they do a new Tesla (a "real" Tesla, not the T4 which is for ML/AI inference) :-)

Posted on 2019-03-11 20:15:42
Nathan Zechar

Thanks for the reply Don.

I was gearing up to purchase an i9-9900K for use with Anaconda Python, but started reading up on ECC RAM and wasn't sure if I should get a Xeon system instead. Then I came across this article and it appears a system with an i7-9800X would give me the best performance per dollar.

And you code in python.

This is awesome!

Posted on 2019-03-12 00:24:26
Donald Kinghorn

Yes, indeed! Python together with a framework like PyTorch or TensorFlow is amazingly useful and can give very good performance (for any kind of scientific programming, not just ML/AI stuff).

numpy (and other math packages) in Anaconda Python will get a nice speed up from the AVX512 vector units. They bundle the excellent Intel MKL library with the distribution. You really don't need the ECC memory and the new Core-X processors are all really nice!

The RTX 2080Ti is a great card and I would recommend it. It's a little expensive but it has 11GB mem which is nice. Double precision on GPU's has always been a problem so nearly everyone ends up using FP32 and that usually works out fine. My crazy QM stuff would be an exception. It does work OK in single (fp32) but I really needed fp64 because I was optimizing energy values down to 9 or 10 digits of accuracy!

I'm working on a post now with some ML testing with several GPU's.

Posted on 2019-03-12 01:02:48
Nathan Zechar

I look forward to it.

I have several GTX 1070's lying around that were once used for mining. The motherboard they were attached to could handle up to 12 of them, but only utilized one PCIe lane per card, and the CPU it supported wasn't very powerful.

I believe I've found a X299 that can support up to 4 cards, which would be PCIe 8.0 x 4, giving me 32GB to play with. They aren't the RTX series cards, but it would be neat to start playing around them.

Posted on 2019-03-13 00:37:41
Alexandre Gandini

Apparently the 9800X is sold out everywhere for some months now. Could you get one?

Posted on 2019-04-08 20:03:31
Donald Kinghorn

The Intel shortages are affecting everyone! This is why they released the "F" processors (they have bad graphics silicon so they disable it. It's otherwise a good chip!) We do have a good supply at the moment of the 9800X but of course they are for our builds. It is a really good processor, I was impressed with the performance!

Posted on 2019-04-09 15:43:22