
Read this article at https://www.pugetsystems.com/guides/1339
Dr Donald Kinghorn (Scientific Computing Advisor)

Numerical Computing Performance of 3 Intel 8-core CPUs - i9 9900K vs i7 9800X vs Xeon 2145W

Written on January 25, 2019 by Dr Donald Kinghorn

Intel makes a lot of different CPUs! There are the very expensive multi-socket Xeon Scalable (Purley) processors, 58 of them! There are low power mobile and embedded processors and, of course, the single socket "Desktop PC", "Enthusiast" and "Workstation" processors. In this post I'll take a brief look at the numerical computing performance of three very capable 8-core processors. These CPUs are in the "sweet-spot" with 8 cores and high core clock frequencies. All three are great CPUs, but there are some significant differences that can cause confusion.

i9 9900K or i7 9800X or Xeon 2145W -- which processor is best for you? The answer is, as always -- it depends... Hopefully this post will help you decide which processor fits best with your needs.


The 8-core "sweet-spot"(?)

Why do I say 8-cores is the CPU "sweet-spot"?

  • 8-core systems in the form of dual socket 4-core CPUs from both Intel and AMD were the foundation of modern parallel computing. That was the standard scientific workstation configuration through most of the 2000's. Dual 4-core system nodes in clusters were the base of distributed parallel super-computing.
  • There are a lot of applications that will scale in parallel efficiently on 8 cores. Writing parallel code can be very difficult and scaling can fall off rapidly after 4-8 processes. There are inherently parallel applications that will scale to tens of thousands of processor cores, but a typical target for a programmer is to get good scaling with 4-8 cores on a single system. That, in itself, can be a remarkable achievement!
  • Modern 8-core processors as presented here offer very good performance for the cost.
  • 8 cores allow simultaneous application and job runs, enabling an efficient workflow and good hardware utilization.
  • A system with a good 8-core CPU makes a great platform for GPU accelerated computing!

For a very simple low cost workstation you could use a processor with fewer cores, but these days I feel an 8-core is a good baseline for a compute oriented workstation.

It can certainly be advantageous to have more cores. If you have code that scales well in parallel, or a heavy multi-tasking workflow, an Intel X-series or Xeon-W 18-core processor offers excellent performance for a very reasonable cost. In fact, the 18-core processors are so good that I don't recommend dual Xeon workstations very often anymore.


Important differences between i9 9900K, i7 9800X, and Xeon 2145W

The following table lists some of the specification differences between these processors that are relevant when configuring a numerical computing workstation.

Intel 8-Core i9 9900K, i7 9800X, Xeon 2145W Features

Feature          | i9 9900K    | i7 9800X  | Xeon 2145W
Code Name        | Coffee Lake | Skylake-X | Skylake-W
Base Clock       | 3.6 GHz     | 3.8 GHz   | 3.7 GHz
Max Turbo        | 5.0 GHz     | 4.5 GHz   | 4.5 GHz
All-Core Turbo   | 4.7 GHz     | 4.1 GHz   | 4.3 GHz
Cache            | 16 MB       | 16.5 MB   | 11 MB
TDP              | 95 W        | 165 W     | 140 W
Max Memory       | 64 GB       | 128 GB    | 512 GB (Reg ECC)
Memory Channels  | 2           | 4         | 4
Max PCIe Lanes   | 16          | 44        | 48
X16 GPU Support  | 1           | 2         | 3 (4 w/PLX)
Vector Unit      | AVX2        | AVX512    | AVX512
Price            | $500        | $600      | $1113

The features that will have the biggest impact on compute performance are the core clocks and the AVX vector unit. The high clock speeds, fast memory, large cache and low power consumption of the Coffee Lake processor are very compelling. However, the Vector Unit row in the table above can have a significant impact on numerical compute performance.
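As a rough illustration of why the vector unit matters, here is a back-of-the-envelope peak double precision FLOPS estimate. This is a sketch only: it uses the all-core turbo clocks from the table above and ignores the reduced clock frequencies these CPUs actually run at under heavy AVX load, so real sustained numbers will be lower.

```python
def peak_dp_gflops(clock_ghz, simd_doubles, fma_units, cores):
    """Peak double precision GFLOP/s:
    clock * SIMD width (doubles) * 2 (FMA = multiply + add) * FMA units * cores."""
    return clock_ghz * simd_doubles * 2 * fma_units * cores

# i9 9900K: AVX2, 256-bit vectors (4 doubles), 2 FMA units per core
print(peak_dp_gflops(4.7, 4, 2, 8))   # ~601.6 GFLOP/s
# i7 9800X: AVX512, 512-bit vectors (8 doubles), 2 FMA units per core
print(peak_dp_gflops(4.1, 8, 2, 8))   # ~1049.6 GFLOP/s
```

Even with a lower clock, the wider AVX512 unit nearly doubles the theoretical peak. That is roughly the gap you will see in the Linpack results below.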

Important differences to note for a system specification are the maximum amount of memory that can be used and the number of PCIe lanes. The number of PCIe lanes is particularly important for GPU accelerated workstations since it is good (but not essential) to use X16 slots for multi-GPU configurations.

Note: 32GB non-Reg DDR4 memory modules are becoming available, so it may soon be possible to have 128GB of memory in a 9900K system and 256GB in a 9800X system.


Hardware under test:

I used open test-beds for this testing, but you can try different configurations with all of these components on our general "Custom Computers" page. (We have more application oriented pages too, so feel free to explore.)

  • Intel Core i9 9900K 3.6GHz 8-Core
    • Gigabyte Z390 Designare Motherboard (1 x X16 PCIe)
    • 64 GB DDR4-2666 Memory
    • 1 TB Intel 660p M.2 SSD
    • NVIDIA RTX 2080Ti
  • Intel Core i7 9800X 3.8GHz 8-Core
    • Gigabyte X299 Designare Motherboard (2 x X16 PCIe)
    • 128GB DDR4-2666 Memory
    • 1 TB Intel 660p M.2 SSD
    • NVIDIA RTX 2080Ti
  • Intel Xeon 2145W 3.7GHz 8-Core
    • Asus WS C422 SAGE/10G Motherboard (4 x X16 PCIe)
    • 256GB DDR4-2666 Reg ECC Memory
    • 1 TB Intel 660p M.2 SSD
    • NVIDIA RTX 2080Ti

Software:

I had the OS and applications installed on the Intel 660p M.2 drive and swapped it between the test systems.

I am running Linux for this testing but there is no reason to expect that the same types of workloads on Windows 10 would show any significant difference in performance.


Results

Linpack

An optimized Linpack benchmark can achieve near theoretical peak double precision floating point performance on a CPU. It is the first benchmark I run on any new CPU. It is (still) the benchmark used to rank the Top500 supercomputers in the world. I feel it is the best performance indicator for numerical computation with maximally optimized software. I even went to the trouble to build an optimized Linpack for AMD Threadripper recently. The Intel optimized Linpack makes great use of the excellent MKL library. There are many programs that link to MKL for performance, including the very useful "numerical compute scripting" packages Anaconda Python and MathWorks MATLAB.
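An MKL-linked numpy hits these optimized code paths through ordinary matrix operations. Here is a minimal timing sketch you can run yourself (not my benchmark harness; the GFLOP/s you see depends entirely on your BLAS build and CPU):

```python
import time
import numpy as np

# Double precision matrix multiply -- numpy dispatches this to the
# underlying BLAS dgemm (MKL when using Anaconda's default numpy).
n = 2000
rng = np.random.default_rng(42)
a = rng.standard_normal((n, n))
b = rng.standard_normal((n, n))

t0 = time.perf_counter()
c = a @ b
elapsed = time.perf_counter() - t0

# dgemm performs roughly 2*n^3 floating point operations
gflops = 2 * n**3 / elapsed / 1e9
print(f"{n}x{n} matrix multiply: {elapsed:.3f} s, ~{gflops:.0f} GFLOP/s")
```

On an AVX512 CPU with MKL this simple test will already report a large fraction of the Linpack numbers in the chart below.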

[Chart: Linpack benchmark results]

Clearly the AVX512 vector units in the 9800X and 2145W have a significant impact on Linpack performance. This is basic numerical linear algebra, which is at the core of a lot of compute intensive applications.

Note: These jobs ran with 8 "real" threads since "Hyperthreads" are not useful for this calculation.

Note: These results are with a large problem size of 75000 simultaneous equations (a dense 75000 x 75000 linear solve) and used approximately 44GB of system memory.
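As a quick sanity check on that memory figure, the storage for the coefficient matrix alone (double precision, 8 bytes per element) works out to:

```python
n = 75_000
bytes_per_double = 8
matrix_bytes = n * n * bytes_per_double

print(f"{matrix_bytes / 1e9:.1f} GB")     # 45.0 GB (decimal)
print(f"{matrix_bytes / 2**30:.1f} GiB")  # ~41.9 GiB (binary)
```

That matches the roughly 44GB observed, with the small remainder going to workspace and the right-hand-side vectors.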

NAMD

I also tested with the Molecular Dynamics package NAMD. NAMD scales really well across multiple cores and it is not specifically optimized for Intel hardware. It is highly optimized code and it uses the very interesting Charm++ for its parallel capabilities. NAMD is an important program and I like it for testing since it is a good example of well optimized code that scales to massive numbers of processes and also has very good GPU acceleration that needs to be balanced by good CPU performance.

[Chart: NAMD CPU-only performance]

For these job runs the high all-core turbo clock of the 9900K has the advantage. The AVX512 vector units are not that important for this code, which is designed to run well on a wide variety of hardware.

Note: These jobs ran with 16 threads since "Hyperthreads" help with the way NAMD uses threads. It is always worth experimenting with Hyperthreads to see if they help or not.

Note: The performance units here are "days per nanosecond" of simulation time. The 9900K would save about 1 day out of a week long job run to get 1 nanosecond of simulation time. Adding a GPU dramatically increases performance, as will be seen in the next chart.
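To make the "days per nanosecond" unit concrete, the conversion to wall-clock time is just a multiplication. The numbers below are made-up illustrations, not the measured values from the chart:

```python
def wall_clock_days(days_per_ns, sim_ns):
    """Total wall-clock days needed to simulate sim_ns nanoseconds
    at a given NAMD performance of days_per_ns (lower is better)."""
    return days_per_ns * sim_ns

# Hypothetical example: CPU A at 6.0 days/ns vs CPU B at 7.0 days/ns,
# each running a 1 ns simulation -- CPU A finishes 1 day sooner.
saved = wall_clock_days(7.0, 1.0) - wall_clock_days(6.0, 1.0)
print(saved)  # 1.0 day
```

For long trajectories the metric compounds: at 6.0 days/ns, a 10 ns run is two months of wall-clock time, which is why the GPU speedup in the next chart matters so much.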

[Chart: NAMD performance with NVIDIA RTX 2080Ti GPU]

The first thing to notice is that performance has increased by over a factor of 10 by including the NVIDIA RTX 2080Ti! There seems to be an advantage for the 9800X and 2145W when the GPU is added to the system. I'm not sure exactly why that is. These CPUs do have a lot more PCIe lanes than the 9900K, but all 3 of these systems were running with 1 GPU in a full X16 slot.

Conclusions and Recommendations

All 3 of these CPUs are great!

Given that my focus is high performance numerical computing, I would probably not recommend the i9 9900K. It is a very good processor and the high core clocks will give many applications excellent performance. However, it is limited by not having the newest Intel core architecture: at its heart it is basically a Haswell core (with lots of incremental tweaks). It is also very limited as a platform CPU for a GPU accelerated system since it only supplies 16 PCIe lanes.

The i7 9800X and Xeon 2145W share the same core architecture as the Intel Scalable (Purley) high-end Xeon CPUs. There are 2 AVX512 vector units per core and the numerical compute performance is outstanding (for code that is optimized for it!). I like both of these processors a lot! The i7 9800X is part of the newly released "X-series" processors, which offer tremendous performance value. The Xeon 2145W, and the Xeon-W series in general, also offer great performance for the cost compared to the much more expensive Xeon Scalable (Skylake-SP) line. Both X-series and Xeon-W CPUs are available in a variety of core counts up to 18 cores. They are great alternatives for what in the past would have been a dual socket workstation. The Xeon-W also has the advantages of being a Xeon processor, i.e. more PCIe lanes, support for a larger memory footprint, ECC memory support, very high-end motherboards, etc.

Note: Skylake is the newest "core" architecture from Intel and is the basis for their high-end processors. There was a "Skylake" CPU based on the Haswell "core" in the desktop Core i7 line a few "generations" ago. Marketing! I believe we will see a new "core" architecture from Intel by the end of 2019 (hopefully along with a new PCIe v4 capable chipset).

Here are my recommendations for a CPU intended as the base of a numerical compute oriented workstation.

  • For a system where cost is a significant concern I would certainly recommend the "X-series" CPUs, and the 8-core i7 9800X is easy to like. If you are working with code capable of GPU acceleration you can configure a system with 2 GPUs at X16 and have what is probably the best high-end performance per dollar you can get.

  • For a higher-end workstation capable of using 4 GPUs for acceleration and large memory configurations (the best overall platform configuration), the single socket Xeon-W CPUs are the way to go. I recommend these CPUs in a single socket configuration over a dual Xeon configuration for most applications since it avoids problems that can be caused by memory contention in multi-socket systems.

I hope this post has cleared up any confusion you may have had about these different CPUs. If you still have questions, go ahead and ask in the comments!

Happy computing --dbk

Tags: Intel, i9 9900K, i7 9800X, Xeon 2145W, RTX 2080Ti, Linpack, NAMD
Homeo Morphism

Don,

could you recommend a workstation for someone with a budget of 5k USD with a very real possibility of adding another 5k to upscale it by the end of the year? By the end of the year what will we have got?

If single-socket, which particular CPU would you recommend? I've read this review, but you examine 8-core processors in it while mentioning that the best option is 18-core. But which one in particular? Should we eventually aim for 4 GPUs, or is 2 only slightly worse? Are there any substantial benefits to having a 2080 over a 1080? Ti over non-Ti?

Say, we bought an 18-core CPU that you would recommend and a single fast GPU now and added three more GPUs by the end of the year... how much do you think our performance gains would be compared to a more conservative/cost-benefit rig? It looks like we'll be running protein-ligand sims that don't involve too many atoms but we want longer sims instead (microsecond trajectories). So it's not like a single task would run for months where a 10% improvement could mean saving us weeks -- it's more like it would be saving us mere hours. Which makes me somewhat hesitant about splurging on TIs, 20xx instead of 10xx, 4 GPUs instead of 2, etc. What's your opinion?

And if it's a dual-socket system, what would your recommendations be? I've read your early analysis, where you examine workstations with 1070 and 1080, but it dates back to 2016. The recommendations must have changed since then...

Also, to clear something up, there's no reason we should go for Tesla, right? I mean they are for supercomputers and their effects are really noticeable when there are thousands of them running in conjunction, which is when their individual gains really add up. Is that correct?

Thank you very much.

Posted on 2019-01-28 22:56:52
Donald Kinghorn

You have a lot of questions :-) I can answer a bunch of them easily ... you really need to look at some of my more recent posts. I'm guessing you landed here from a Google search. I've got a lot of posts testing the new GPUs etc. on our HPC blog https://www.pugetsystems.co...

A good place to play with configurations is https://www.pugetsystems.co... That page is not oriented to anything specific but you can try a lot of ideas there. Do please remember that when you get a system from us you are getting a lot more than just a list of parts!

Upgrading a system later almost never works out the way you think it might. Things change, components go out of production, and there are only so many things that can be updated. My recommendation is usually to get as good a system as will fit your budget and go to work on the jobs you have at hand. If you have a similar budget a year later, then you can consider upgrading or adding key components like GPU accelerators or system memory. However, you will often find that your workflow might be best served with an additional system or an updated replacement, or that you can simply keep using what you have and save your budget for a replacement after 2-3 years.

Your question about the number of GPUs is something a lot of people think about. The answer is always "it depends". 4 GPUs (or even 2 instead of 1) may not double your throughput. It depends on the code you are running and your overall workflow. The newer GPUs are really fast for a lot of ML tasks and you may not need more than 1 for your job size. If you have code that works well with 4 GPUs and you can benefit from the increase in performance, then that can be an incredible amount of computational throughput. You may also have code that will only run on 1 GPU but lots of jobs, so you could run 4 jobs at a time. Then you have to decide whether to get one larger system or maybe a couple of smaller, more task specific ones.

Some general guidelines:

  • Try to evaluate the code you are planning on using before you make a big investment in hardware.
  • For multi-GPU, I recommend 2-4 CPU cores for each GPU and twice the amount of system memory as your total GPU memory.
  • Keep GPUs on X16 PCIe if possible (PLX chips are like network switches for PCIe; they work really well).
  • Keep in mind that some code may be only partially GPU accelerated and you may need a balance with CPU power. NAMD is a great example of this.

I hardly ever recommend dual or quad socket systems anymore. Not that they are not good, it's just that you can get a great single socket system for less than half the cost that will likely be all the compute capability you need. (It also avoids problems with memory contention across CPU memory spaces.)

This was a long reply but I hope it helps you and others. I should add that our sales consultants here at Puget are really good and if they are unsure about some specific scientific workload questions they come to me and we can usually work up a good recommendation.

Best wishes! --Don

Posted on 2019-01-31 17:27:44
Donald Kinghorn

I wanted to add one more comment ... since you asked ... I don't have any firm insight or timelines, but I do expect to see platform (i.e. chipset) updates by the end of the year that include PCIe v4. That should be backward compatible initially, but sometime in 2020 I expect significant overall platform updates: boards, CPUs, GPUs, etc. That should provide a nice bump in compute performance ... In the meantime, get what you can afford and get some work done! Things are still getting better for compute :-)

Posted on 2019-01-31 17:37:10
Homeo Morphism

Thank you for the reply. Much food for thought... We are not in the US, though, so getting a rig from you is not an option, unfortunately.

Posted on 2019-02-03 14:51:50
Alexandre Gandini

Will numpy code from the Anaconda Python distribution automatically take advantage of AVX512, or is it necessary to tweak the code somehow?

I have some highly intense numpy/pandas financial calculations, and if that's the case I will gladly upgrade my machine to a 9800X processor.

Don't know where to buy it for MSRP, though.

Thanks a lot, congrats for your awesome work comparing hardware for numerical calculations.

Posted on 2019-01-30 01:46:22
Donald Kinghorn

Thanks! Much to their credit, Anaconda Python includes Intel's MKL by default. If you run "conda list" you will see several MKL packages.

The new X-series CPUs are really good (as were the old ones). They go all the way up to an 18-core. Any of them are a great value for compute on CPU. You will need a platform that supports them, i.e. the X299 chipset; these are socket LGA 2066 processors.
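If you want to confirm which BLAS your numpy build is linked against, numpy can print its build configuration (under Anaconda's defaults you should see MKL entries; other builds may show OpenBLAS or similar):

```python
import numpy as np

# Prints the BLAS/LAPACK libraries this numpy was built against;
# look for "mkl" in the output when using Anaconda's default numpy.
np.show_config()
```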

Posted on 2019-01-31 16:49:54
Alexandre Gandini

Thanks for the reply, you convinced me: an X-series CPU and an X299 mobo is the way to go.

Posted on 2019-01-31 17:01:17