
https://www.pugetsystems.com

Read this article at https://www.pugetsystems.com/guides/1059
Dr Donald Kinghorn (Scientific Computing Advisor )

Intel Core-i9 7900X and 7980XE Skylake-X Linux Linpack Performance

Written on October 10, 2017 by Dr Donald Kinghorn

The Intel Core i9 7900X and 7980XE are very good desktop processors for mathematical computing workloads. This post is a short listing of results for the Linpack benchmark, which is still my personal favorite CPU performance metric.

These Skylake-X Core i9 "desktop" processors benefit from having Intel's latest vectorization hardware, AVX512. AVX512 doubles the vector width over AVX2, from 256 bits to 512 bits. This feature is one of the most significant technologies in Intel's new high-end "scalable processor" architecture Xeon processors, a.k.a. Purley, a.k.a. Skylake-SP. It is nice to see that this technology is included in these "desktop" CPUs. There is also a Xeon Skylake-W processor that is very similar to the Skylake-X "desktop" processors.
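
To make the vector-width doubling concrete, here is a minimal sketch of the arithmetic for double precision (64-bit) values. It assumes a fused multiply-add (FMA) instruction is available at both widths and counts a multiply plus an add per lane. The function names are illustrative only, and the sketch says nothing about how many such instructions a given core can issue per cycle.

```python
DOUBLE_BITS = 64  # size of one double precision value


def doubles_per_vector(vector_bits):
    """Number of double precision lanes in one vector register."""
    return vector_bits // DOUBLE_BITS


def flops_per_fma_instruction(vector_bits):
    """Floating point operations done by one vector FMA (multiply + add per lane)."""
    return 2 * doubles_per_vector(vector_bits)


for name, bits in (("AVX2", 256), ("AVX512", 512)):
    print(f"{name}: {doubles_per_vector(bits)} doubles per vector, "
          f"{flops_per_fma_instruction(bits)} FLOPs per FMA instruction")
```

Per instruction, the 512-bit width does twice the work of the 256-bit width; how much of that shows up in a benchmark depends on clock speed and memory behavior as well.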

The systems I tested were based on the Puget Systems "Peak Mini" compact HPC workstations. The relevant specs are:

Hardware:

  • Intel Core i9 7900X 3.3GHz Ten Core

    • Base clock 3.3GHz

    • All-Core-Turbo 4.0GHz

    • Max Turbo 4.5GHz

  • Intel Core i9 7980XE 2.6GHz Eighteen Core

    • Base clock 2.6GHz

    • All-Core-Turbo 3.4GHz

    • Max Turbo 4.4GHz

  • EVGA X299 Micro mATX Motherboard

  • 64 GB DDR4-2400 Memory

[I also had 2 NVIDIA Titan Xp GPUs in the 7980XE system but that's another story :-) ]

Software:

  • Ubuntu 16.04 kernels 4.4.0, 4.11.0 and 4.13 ( Performance was identical on all kernels, but see note below )

  • Intel MKL 2018 (Math Kernel Library)

  • Intel optimized Linpack Benchmark

Note: The CPU frequency reported in /proc/cpuinfo was always either the base clock or the 1.2GHz low-power state. I tested different kernels but never saw any difference. I did install linux-tools and used "cpupower", which did report correct CPU core frequencies extracted from "Intel P-state" information. The performance numbers presented here are consistent with the listed single-core and many-core "Turbo" frequencies. Information at the end of this post shows this.
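
For readers who want to check what /proc/cpuinfo reports on their own system, here is a minimal sketch that pulls the "cpu MHz" fields out of its text. The function name and the two-CPU sample excerpt are hypothetical; the sample values simply mirror the 1.2GHz and base-clock readings described in the note.

```python
import re


def parse_cpu_mhz(cpuinfo_text):
    """Return the 'cpu MHz' value for each logical CPU listed in the text."""
    matches = re.findall(r"^cpu MHz\s*:\s*([\d.]+)", cpuinfo_text, flags=re.MULTILINE)
    return [float(m) for m in matches]


# Hypothetical excerpt in the /proc/cpuinfo layout (not captured from the test systems).
SAMPLE = (
    "processor\t: 0\n"
    "cpu MHz\t\t: 1200.000\n"
    "\n"
    "processor\t: 1\n"
    "cpu MHz\t\t: 2600.000\n"
)

print(parse_cpu_mhz(SAMPLE))
```

On a live Linux system the same function can be fed the contents of /proc/cpuinfo, read with `open("/proc/cpuinfo").read()`.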


Linpack Benchmark Results

Now for what you came here to see! I was unable to find any reports of these Linpack benchmark numbers with a Google search. The performance is very good! I'm not really doing a comparative discussion in this post, but I will include one number from a dual Xeon 2690v4 Broadwell system on which I recently ran the same benchmark, so you can see how well these "desktop" Core i9 processors do under heavy mathematical compute load.

GFLOPS = "Billions of Floating Point Operations per Second"

  • Intel Core i9 7900X -- 638.9 GFLOPS

  • Intel Core i9 7980XE -- 977.0 GFLOPS


  • Intel Dual Xeon 2690v4 -- 1123 GFLOPS

Note: The dual Xeon 2690v4 system has 28 "real" cores and had 512GB of memory. The Linpack number for that system was obtained at a very large problem size of 100,0000 equations, using nearly all of that 512GB of memory! A more typical result is around 980 GFLOPS.
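
As a rough guide to how Linpack problem size relates to memory, here is a minimal sketch. It assumes that a dense N x N matrix of 8-byte double precision values dominates memory use, and it treats "GB" as GiB (2^30 bytes). Both are simplifying assumptions, and the function names are illustrative only. A real run also needs room for workspace and the operating system, so usable sizes are somewhat smaller.

```python
import math

BYTES_PER_DOUBLE = 8
GIB = 2**30


def matrix_bytes(n):
    """Bytes needed to hold a dense n x n double precision matrix."""
    return n * n * BYTES_PER_DOUBLE


def max_problem_size(mem_bytes):
    """Largest n whose dense n x n double matrix fits in mem_bytes."""
    return math.isqrt(mem_bytes // BYTES_PER_DOUBLE)


for mem_gib in (64, 512):
    n = max_problem_size(mem_gib * GIB)
    print(f"{mem_gib} GiB holds at most n = {n} "
          f"({matrix_bytes(n) / GIB:.1f} GiB for the matrix)")
```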


These Core i9 processors were remarkably good on this standard HPC benchmark. They were both near 100 GFLOPS with a single core thanks to the high CPU clock boost from "Max-Turbo". The CPU clocks on Intel processors decrease from the Max-Turbo frequency down to the All-Core-Turbo frequency with increasing power draw, i.e. with the number of loaded cores. In the table below you can see that the 18-core 7980XE did exceptionally well at 8 and 10 cores. That's because it is still operating near its max frequency at that point, whereas the 7900X is at its all-core frequency by then.


Intel Core i9 7900X and 7980XE Linpack GFLOPS

CPU Cores    i9 7900X GFLOPS    i9 7980XE GFLOPS
18           -                  977.0
16           -                  954.5
10           638.9              773.9
8            576.0              648.3
4            341.8              347.6
2            191.0              185.7
1            97.5               95.2
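
The scaling behavior in this table can be summarized as GFLOPS per core and parallel efficiency relative to the single-core result. The following is a minimal sketch using only the numbers from the table; the dictionary layout and function name are illustrative choices, not part of the benchmark.

```python
# GFLOPS by number of cores, taken from the table above.
RESULTS = {
    "i9 7900X": {1: 97.5, 2: 191.0, 4: 341.8, 8: 576.0, 10: 638.9},
    "i9 7980XE": {1: 95.2, 2: 185.7, 4: 347.6, 8: 648.3, 10: 773.9,
                  16: 954.5, 18: 977.0},
}


def parallel_efficiency(gflops_by_cores):
    """Measured GFLOPS divided by (cores x single-core GFLOPS), per core count."""
    single = gflops_by_cores[1]
    return {n: g / (n * single) for n, g in sorted(gflops_by_cores.items())}


for cpu, data in RESULTS.items():
    for n, eff in parallel_efficiency(data).items():
        print(f"{cpu:10s} {n:2d} cores: {data[n] / n:5.1f} GFLOPS/core, "
              f"efficiency {eff:.2f}")
```

The falling efficiency at higher core counts is in line with the clock drop from Max-Turbo to All-Core-Turbo described above, although memory bandwidth and other factors may contribute as well.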


The following plot gives a visual representation of the Linpack performance of these outstanding processors.

[Plot: CPU Linpack GFLOPS]


Incorrectly reported Linux ACPI CPU frequency states

CPU frequency values in /proc/cpuinfo and those reported by cpufreq-info were incorrect, as seen in the cpufreq-info output below from the Core i9 7980XE:

analyzing CPU 0:
  driver: acpi-cpufreq
  CPUs which run at the same hardware frequency: 0
  CPUs which need to have their frequency coordinated by software: 0
  maximum transition latency: 10.0 us.
  hardware limits: 1.20 GHz - 2.60 GHz
  available frequency steps: 2.60 GHz, 2.60 GHz, 2.50 GHz, 2.40 GHz, 2.30 GHz, 2.20 GHz, 2.10 GHz, 2.00 GHz, 1.90 GHz, 1.80 GHz, 1.70 GHz, 1.60 GHz, 1.50 GHz, 1.40 GHz, 1.30 GHz, 1.20 GHz

Using "cpupower", which accesses intel_pstate, does seem to report correct frequencies, but they are very dynamic under load.

Following is output from running the Linpack benchmark with OMP_NUM_THREADS=1. This is consistent with the listed Max Turbo frequency.

kinghorn@mini:~$ sudo cpupower monitor -m Mperf | sort -k2 -r
CPU | C0   | Cx   | Freq 
   0| 99.77|  0.23|  4357

Below is output from the benchmark run with OMP_NUM_THREADS=18. This is consistent with the All-Core-Turbo frequency.

kinghorn@mini:~$ sudo cpupower monitor -m Mperf | sort -k2 -r
CPU | C0   | Cx   | Freq 
   5| 99.99|  0.01|  3355
   8| 99.99|  0.01|  3354
  11| 99.99|  0.01|  3349
  10| 99.99|  0.01|  3349
   1| 99.99|  0.01|  3348
   3| 99.99|  0.01|  3342
   2| 99.99|  0.01|  3341
   7| 99.99|  0.01|  3308
   6| 99.99|  0.01|  3308
  12| 99.98|  0.02|  3363
   9| 99.98|  0.02|  3354
  15| 99.98|  0.02|  3353
  14| 99.98|  0.02|  3352
   0| 99.98|  0.02|  3347
  16| 99.97|  0.03|  3355
  17| 99.83|  0.17|  3355
  13| 99.76|  0.24|  3363
   4| 99.72|  0.28|  3354
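
To summarize a listing like the one above, here is a minimal sketch that parses the pipe-separated rows and reports the minimum, maximum and mean core frequency. It assumes the four-column layout shown (CPU, C0, Cx, Freq) with Freq in MHz; other cpupower versions may print extra columns or header lines, which this sketch does not handle. The function name is illustrative only.

```python
def parse_mperf(output):
    """Parse 'CPU | C0 | Cx | Freq' rows into (cpu, c0, cx, freq) tuples."""
    rows = []
    for line in output.splitlines():
        fields = [f.strip() for f in line.split("|")]
        # Skip the header and anything that is not a four-column data row.
        if len(fields) != 4 or not fields[0].isdigit():
            continue
        rows.append((int(fields[0]), float(fields[1]),
                     float(fields[2]), int(fields[3])))
    return rows


# The 18-thread listing from the post.
SAMPLE = """CPU | C0   | Cx   | Freq
   5| 99.99|  0.01|  3355
   8| 99.99|  0.01|  3354
  11| 99.99|  0.01|  3349
  10| 99.99|  0.01|  3349
   1| 99.99|  0.01|  3348
   3| 99.99|  0.01|  3342
   2| 99.99|  0.01|  3341
   7| 99.99|  0.01|  3308
   6| 99.99|  0.01|  3308
  12| 99.98|  0.02|  3363
   9| 99.98|  0.02|  3354
  15| 99.98|  0.02|  3353
  14| 99.98|  0.02|  3352
   0| 99.98|  0.02|  3347
  16| 99.97|  0.03|  3355
  17| 99.83|  0.17|  3355
  13| 99.76|  0.24|  3363
   4| 99.72|  0.28|  3354
"""

freqs = [row[3] for row in parse_mperf(SAMPLE)]
mean_mhz = sum(freqs) / len(freqs)
print(f"{len(freqs)} cores: min {min(freqs)} MHz, max {max(freqs)} MHz, "
      f"mean {mean_mhz:.1f} MHz")
```

For this listing, all 18 cores sit within a few tens of MHz of the 3.4GHz All-Core-Turbo figure.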

Happy computing! --dbk

Tags: Intel Core i9, 7900X, 7980XE, Skylake-X, Linpack, Linux, HPC
Brad Jascob

I'm curious under what conditions you tested your systems. My 7940x is giving as high as 1035 GFlops using the pre-compiled l_mklb_p_2018.0.006 from intel (running their runme_xeon64 script which tests multiple sized arrays) under Linux 1710. LDAs over 10,000 are giving between 900 and 1035 GFlops. Intel advertised that the 7980xe was their first TFlops class consumer processor but I was surprised to see my 14 core hitting these numbers.
BTW, on my system, looking at /sys/devices/system/cpu/cpuXX/cpufreq/scaling_cur_freq shows 3.8GHz through most of the tests, though it did sometimes drop back to the base clock of 3.1GHz for a few seconds. The 3.8GHz is its max 14-core turbo, whereas the 7980xe has a 3.4GHz max turbo when all 18 cores are running. I'm sort of curious if the 18-core is actually not performing any better than the 14-core because it's generally running at a lower clock rate.

Posted on 2017-11-01 03:04:48
Kivanc KARANIS

I don't want to be annoying (especially after an embarrassing intro), but I'd like to share my experience with the 7980XE.

This article is about "Linux Linpack Performance" and what is said here is perfectly true, but benchmarks are a bit disturbing.
I love benchmarks because they give you a common platform for discussion.
But I hate benchmarks because they are like solving a physics problem in a frictionless environment. It is never real.

The first difference between the i9-7980XE and a Xeon is scalability. You cannot scale to dual or quad i9, but you can scale Xeons.
The second difference is overclocking.
If your workload can be handled without scaling, you don't need a Xeon. This is true if you can handle every overclocking step with care.

Currently we are using WRF to forecast regional weather on CPU (without GPU), and my mission is to beat a target Xeon-based system with a desktop.
WRF does a lot of double precision floating point math in Fortran code, and that uses AVX512, but it has many more packages to execute in pre-execution and post-execution, and not all of them trigger AVX512. (I'll come to that.)

The hardware I've built is an ASUS Rampage Extreme VI (a gamer thing), i9-7980XE, 128GB 3200MHz Dominator Platinum, Corsair hx1200i, closed loop hx150i, 2x 4mm Noctua fans on the VRM and misc fans in the case for proper airflow.
The reason I've chosen a flagship gamer board is the ease of overclocking, since I'd push everything to its limits.

Just a small note: if you are willing to push the limits, you have to use a custom-compiled kernel. Otherwise, you may not even be able to get correct CPU frequency readings from /proc or cpufreq-info. Furthermore, compiler flags are your friends for getting the most out of your source code, so you have to focus on them a lot. Using PGI was needed in my case because Fortran code compiles very nicely there. (NVIDIA has worked hard since CUDA 1.0.)

The important thing on my board was the ability to scale down the clock multiplier for AVX512, and that was everything that mattered.
I could achieve a standard 4.6GHz and AVX512 at 4.3GHz, stable under standard workload conditions, and complete WRF runs faster than my target.
I could not finish a Linpack run.
I knew that with more tuning I could achieve that too, but I did not build this system for Linpack performance.

When it comes to overclocking, your focus is to manage the speed and the amount of increased voltage sent to components while keeping temperatures safe.
This is why I do not like benchmarks. Benchmarks always stress everything, but in real life you generally do not do that; the CPU has time to breathe while writing to disk, parsing input, etc.

As a conclusion,
the i9-7980XE is a bit far removed from talk about benchmarks. Since it is an overclocking CPU and limited to 1 processor, you have to buy one, optimize your kernel and compiler flags, and keep temps low. It will crash, and you'll tune it again. BUT you will catch it, and then it works rock solid.
Do not confuse yourself by comparing benchmark results, because they will not reflect your actual usage of this CPU: it behaves very differently depending on whether or not it is running AVX512 code.

I haven't gotten into delidding on this first CPU yet, but it is worth doing; once this one is working stably, the next one will get delidded.

Xeons, ah yes, I love them. But if your workload does not need scaling, you can actually beat them with a 7980XE with proper configuration.

Posted on 2018-05-07 20:58:29