Intel Core-i9 7900X and 7980XE Skylake-X Linux Linpack Performance


The Intel Core i9 7900X and 7980XE are very good desktop processors for mathematical computing workloads. This post is a short listing of results for the Linpack benchmark, which is still my personal favorite CPU performance metric.

These Skylake-X Core i9 “desktop” processors benefit from having Intel’s latest vectorization hardware, AVX512. AVX512 doubles the vector width over AVX2, from 256 bits to 512 bits, so a single vector register holds eight double-precision values instead of four. This feature is one of the most significant technologies in Intel’s new high-end “scalable processor” architecture Xeon processors, a.k.a. Purley, a.k.a. Skylake-SP. It is nice to see that this technology is included in these “desktop” CPUs. There is also a Xeon Skylake-W processor that is very similar to the Skylake-X “desktop” processors.

The systems I tested were based on the Puget Systems “Peak Mini” compact HPC workstations. The relevant specs are:

Hardware:

  • Intel Core i9 7900X 3.3GHz Ten Core

    • Base clock 3.3GHz

    • All-Core-Turbo 4.0GHz

    • Max Turbo 4.5GHz

  • Intel Core i9 7980XE 2.6GHz Eighteen Core

    • Base clock 2.6GHz

    • All-Core-Turbo 3.4GHz

    • Max Turbo 4.4GHz

  • EVGA X299 Micro mATX Motherboard

  • 64 GB DDR4-2400 Memory

[I also had 2 NVIDIA Titan Xp GPUs in the 7980XE system but that’s another story 🙂 ]

Software:

  • Ubuntu 16.04 with kernels 4.4.0, 4.11.0 and 4.13 (performance was identical on all kernels, but see the note below)

  • Intel MKL 2018 (Math Kernel Library)

  • Intel optimized Linpack Benchmark

Note: The CPU frequency reported in /proc/cpuinfo was always either the base clock or the 1.2GHz low-power state. I tested different kernels but never saw any difference. I did install linux-tools and used “cpupower”, which did report correct CPU core frequencies extracted from “Intel P-state” information. The performance numbers presented here are consistent with the listed single-core and many-core “Turbo” frequencies. I have information listed at the end of this post that shows this.
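
For anyone who wants to repeat the /proc/cpuinfo check from the note, here is a minimal Python sketch that pulls the "cpu MHz" values out of /proc/cpuinfo-style text. The function name cpuinfo_mhz and the SAMPLE text are made up for illustration; the sample is not output captured from the test systems.

```python
import re


def cpuinfo_mhz(text):
    """Return the 'cpu MHz' value of each logical CPU in /proc/cpuinfo-style text."""
    matches = re.findall(r"^cpu MHz\s*:\s*([\d.]+)", text, flags=re.MULTILINE)
    return [float(m) for m in matches]


# Hypothetical sample in the /proc/cpuinfo layout (illustrative values only).
SAMPLE = """processor\t: 0
cpu MHz\t\t: 1200.000
processor\t: 1
cpu MHz\t\t: 2600.000
"""

if __name__ == "__main__":
    print(cpuinfo_mhz(SAMPLE))
```

On a live Linux system the same function could be applied to the text returned by open("/proc/cpuinfo").read() while the benchmark is running, and the values compared with what cpupower reports.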


Linpack Benchmark Results

Now for what you came here to see! I was unable to find any reports of these Linpack benchmark numbers from a Google search. The performance is very good! I’m not really doing a comparative discussion in this post, but I will include one number from a dual Xeon 2690v4 Broadwell system that I recently ran the same benchmark on, so you can see how well these “desktop” Core i9 processors do under heavy mathematical compute load.
GFLOPS = “Billions of Floating Point Operations per Second”

  • Intel Core i9 7900X — 638.9 GFLOPS

  • Intel Core i9 7980XE — 977.0 GFLOPS


  • Intel Dual Xeon 2690v4 — 1123 GFLOPS

Note: The dual Xeon 2690v4 system has 28 “real” cores and had 512GB of memory. The Linpack number for that system was at a very large problem size of 100,0000 equations, using nearly all of that 512GB of memory! A more typical result for that system is around 980 GFLOPS.


These Core i9 processors were remarkably good on this standard HPC benchmark. They were both near 100 GFLOPS with a single core thanks to the high CPU clock boost from “Max-Turbo”. The CPU clocks on Intel processors decrease from the Max-Turbo frequency down to the All-Core-Turbo frequency with increasing power draw, i.e. with the number of loaded cores. In the table below you can see that the 18-core 7980XE did exceptionally well at 8 and 10 cores. That’s because it is still operating near its max frequency at that point, whereas the 7900X is already at its all-core frequency by then.

Intel Core i9 7900X and 7980XE Linpack GFLOPS

CPU Cores    i9 7900X GFLOPS    i9 7980XE GFLOPS
18           -                  977.0
16           -                  954.5
10           638.9              773.9
8            576.0              648.3
4            341.8              347.6
2            191.0              185.7
1            97.5               95.2
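
To put numbers on the scaling described above, the following is a small Python sketch that computes GFLOPS per core and the fraction of ideal linear scaling relative to each processor's single-core result. The GFLOPS values are copied from the table; the names RESULTS and scaling are only illustrative, and "efficiency" here is simply my own ratio of measured GFLOPS to cores times the 1-core GFLOPS.

```python
# GFLOPS values copied from the table above.
# None marks core counts that do not apply to the 10-core 7900X.
RESULTS = {
    # cores: (i9 7900X, i9 7980XE)
    1: (97.5, 95.2),
    2: (191.0, 185.7),
    4: (341.8, 347.6),
    8: (576.0, 648.3),
    10: (638.9, 773.9),
    16: (None, 954.5),
    18: (None, 977.0),
}


def scaling(results, column):
    """Return {cores: (GFLOPS per core, fraction of linear scaling)} for one CPU column."""
    single = results[1][column]
    out = {}
    for cores, row in sorted(results.items()):
        gflops = row[column]
        if gflops is None:
            continue
        out[cores] = (gflops / cores, gflops / (cores * single))
    return out


if __name__ == "__main__":
    for name, column in (("i9 7900X", 0), ("i9 7980XE", 1)):
        print(name)
        for cores, (per_core, eff) in scaling(RESULTS, column).items():
            print(f"  {cores:2d} cores: {per_core:6.1f} GFLOPS/core, {eff:6.1%} of linear")
```

At 10 cores this works out to 77.39 GFLOPS per core on the 7980XE against 63.89 on the 7900X, which is the gap attributed above to the 7980XE still running closer to its max frequency.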

The following plot gives a visual representation of the Linpack performance of these outstanding processors.

[Plot: CPU Linpack GFLOPS]


Incorrectly reported Linux ACPI CPU frequency states

CPU frequency values in /proc/cpuinfo and those reported by cpufreq-info were incorrect, as seen in the following cpufreq-info output from the Core i9 7980XE:

analyzing CPU 0:
  driver: acpi-cpufreq
  CPUs which run at the same hardware frequency: 0
  CPUs which need to have their frequency coordinated by software: 0
  maximum transition latency: 10.0 us.
  hardware limits: 1.20 GHz - 2.60 GHz
  available frequency steps: 2.60 GHz, 2.60 GHz, 2.50 GHz, 2.40 GHz, 2.30 GHz, 2.20 GHz, 2.10 GHz, 2.00 GHz, 1.90 GHz, 1.80 GHz, 1.70 GHz, 1.60 GHz, 1.50 GHz, 1.40 GHz, 1.30 GHz, 1.20 GHz

Using “cpupower”, which accesses the intel_pstate information, does seem to report correct frequencies, but they are very dynamic under load.

The following is output from running the Linpack benchmark with OMP_NUM_THREADS=1. This is consistent with the listed Max Turbo frequency of 4.4GHz:

kinghorn@mini:~$ sudo cpupower monitor -m Mperf | sort -k2 -r
CPU | C0   | Cx   | Freq 
   0| 99.77|  0.23|  4357

Below is output from the benchmark run with OMP_NUM_THREADS=18. This is consistent with the listed All-Core-Turbo frequency of 3.4GHz:

kinghorn@mini:~$ sudo cpupower monitor -m Mperf | sort -k2 -r
CPU | C0   | Cx   | Freq 
   5| 99.99|  0.01|  3355
   8| 99.99|  0.01|  3354
  11| 99.99|  0.01|  3349
  10| 99.99|  0.01|  3349
   1| 99.99|  0.01|  3348
   3| 99.99|  0.01|  3342
   2| 99.99|  0.01|  3341
   7| 99.99|  0.01|  3308
   6| 99.99|  0.01|  3308
  12| 99.98|  0.02|  3363
   9| 99.98|  0.02|  3354
  15| 99.98|  0.02|  3353
  14| 99.98|  0.02|  3352
   0| 99.98|  0.02|  3347
  16| 99.97|  0.03|  3355
  17| 99.83|  0.17|  3355
  13| 99.76|  0.24|  3363
   4| 99.72|  0.28|  3354
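
If you want to summarize output like this rather than eyeball it, here is a minimal Python sketch that parses the "CPU | C0 | Cx | Freq" layout shown above and reports the minimum, maximum and mean frequency. It assumes the four-column layout printed in this post (other cpupower versions may print different columns), and the names parse_mperf and SAMPLE are hypothetical. SAMPLE holds the first three data rows of the 18-thread output above.

```python
def parse_mperf(text):
    """Map CPU number -> reported frequency in MHz from 'cpupower monitor -m Mperf' text."""
    freqs = {}
    for line in text.splitlines():
        fields = [f.strip() for f in line.split("|")]
        # Data rows have four fields starting with a CPU number; the header row is skipped.
        if len(fields) == 4 and fields[0].isdigit():
            freqs[int(fields[0])] = float(fields[3])
    return freqs


# First three data rows of the 18-thread output shown in this post.
SAMPLE = """CPU | C0   | Cx   | Freq
   5| 99.99|  0.01|  3355
   8| 99.99|  0.01|  3354
  11| 99.99|  0.01|  3349
"""

if __name__ == "__main__":
    freqs = parse_mperf(SAMPLE)
    values = list(freqs.values())
    mean = sum(values) / len(values)
    print(f"{len(values)} CPUs: min {min(values):.0f}, max {max(values):.0f}, mean {mean:.1f} MHz")
```

The text to parse could come from redirecting the cpupower command above to a file while the benchmark is running.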

Happy computing! –dbk