Intel Core-i9 7900X and 7980XE Skylake-X Linux Linpack Performance
Written on October 10, 2017 by Dr Donald Kinghorn
Intel Core-i9 7900X and 7980XE are very good desktop processors for mathematical computing workloads. This post is a short listing of results for the Linpack benchmark which is still my personal favorite CPU performance metric.
These Skylake-X Core i9 "desktop" processors benefit from having Intel's latest vectorization hardware, AVX512. AVX512 doubles the vector width over AVX2, from 256 bits to 512 bits. This feature is one of the most significant technologies in Intel's new high-end "scalable processor" architecture Xeon processors, a.k.a. Purley, a.k.a. Skylake-SP. It is nice to see that this technology is included in these "desktop" CPUs. There is also a Xeon Skylake-W processor that is very similar to the Skylake-X "desktop" processors.
The systems I tested were based on the Puget Systems "Peak Mini" compact HPC workstations. The relevant specs are:
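The doubling of vector width translates directly into double-precision values processed per instruction. A minimal sketch of that arithmetic follows. Counting a fused multiply-add as 2 FLOPs per lane and the `fma_units` parameter are assumptions made for illustration, not figures taken from this post.

```python
# Back-of-the-envelope arithmetic for vector width (illustrative only).

DOUBLE_BITS = 64  # size of an IEEE 754 double-precision value


def lanes(vector_bits: int) -> int:
    """Number of double-precision values that fit in one vector register."""
    return vector_bits // DOUBLE_BITS


def flops_per_cycle(vector_bits: int, fma_units: int) -> int:
    """Peak double-precision FLOPs per core per cycle.

    Assumes each FMA unit completes one fused multiply-add per lane per
    cycle, counted as 2 FLOPs. `fma_units` is a hypothetical parameter;
    check the vendor documentation for the value on a specific CPU.
    """
    return lanes(vector_bits) * 2 * fma_units


if __name__ == "__main__":
    for name, bits in (("AVX2", 256), ("AVX512", 512)):
        print(f"{name}: {lanes(bits)} doubles per vector, "
              f"{flops_per_cycle(bits, fma_units=2)} FLOPs/cycle "
              f"assuming 2 FMA units")
```

For the same number of FMA units, the per-cycle peak for AVX512 comes out at twice the AVX2 value, which is the sense in which the wider vectors matter for a benchmark like Linpack.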
Intel Core i9 7900X 3.3GHz Ten Core
Base clock 3.3GHz
Max Turbo 4.5GHz
Intel Core i9 7980XE 2.6GHz Eighteen Core
Base clock 2.6GHz
Max Turbo 4.4GHz
EVGA X299 Micro mATX Motherboard
64 GB DDR4-2400 Memory
[I also had 2 NVIDIA Titan Xp GPUs in the 7980XE system but that's another story :-) ]
Ubuntu 16.04 kernels 4.4.0, 4.11.0 and 4.13 ( Performance was identical on all kernels, but see note below )
Intel MKL 2018 (Math Kernel Library)
Intel optimized Linpack Benchmark
Note: The CPU frequency reported in /proc/cpuinfo was always either the base clock or the 1.2GHz low power state. I tested different kernels but never saw any difference. I did install linux-tools and used "cpupower", which did report correct CPU core frequencies extracted from "Intel P-state" information. The performance numbers presented here are consistent with the listed frequencies for single core and many core "Turbo" frequencies. I have information listed at the end of this post that shows this.
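For readers who want to check what their own kernel reports, here is a minimal sketch that pulls the per-CPU "cpu MHz" values out of /proc/cpuinfo. The function names are hypothetical, and the script only shows what the kernel exposes in that file; it says nothing about whether those values are correct.

```python
import re
from pathlib import Path
from typing import List

# Matches lines such as "cpu MHz\t\t: 1200.000" in /proc/cpuinfo.
_MHZ_LINE = re.compile(r"^cpu MHz\s*:\s*([0-9.]+)\s*$", re.MULTILINE)


def parse_cpu_mhz(cpuinfo_text: str) -> List[float]:
    """Return the 'cpu MHz' value of every logical CPU found in the text."""
    return [float(value) for value in _MHZ_LINE.findall(cpuinfo_text)]


def read_cpu_mhz(path: str = "/proc/cpuinfo") -> List[float]:
    """Read and parse the given cpuinfo file (Linux only)."""
    return parse_cpu_mhz(Path(path).read_text())


if __name__ == "__main__":
    if Path("/proc/cpuinfo").exists():
        freqs = read_cpu_mhz()
        if freqs:
            print(f"{len(freqs)} logical CPUs, "
                  f"min {min(freqs):.0f} MHz, max {max(freqs):.0f} MHz")
        else:
            print("no 'cpu MHz' lines found")
    else:
        print("/proc/cpuinfo not available on this system")
```

Running this while a benchmark is active and comparing the result with cpupower is one way to see whether the two sources agree on a given kernel.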
Linpack Benchmark Results
Now for what you came here to see! I was unable to find any reports of these Linpack benchmark numbers from a Google search. The performance is very good! I'm not really doing a comparative discussion in this post, but I will include one number from a dual Xeon 2690v4 Broadwell system on which I recently ran the same benchmark, so you can see how well these "desktop" Core i9 processors do under heavy mathematical compute load.
GFLOPS = "Billions of Floating Point Operations per Second"
Intel Core i9 7900X -- 638.9 GFLOPS
Intel Core i9 7980XE -- 977.0 GFLOPS
Intel Dual Xeon 2690v4 -- 1123 GFLOPS
Note: The dual Xeon 2690v4 system has 28 "real" cores and had 512GB memory. The Linpack number for that system was at a very large problem size of 100,0000 equations using nearly all of that 512GB memory! More typical is around 980 GFLOPS.
These Core i9 processors were remarkably good on this standard HPC benchmark. They were both near 100 GFLOPS with a single core thanks to the high CPU clock boost from "Max-Turbo". The CPU clocks on Intel processors decrease from the Max-Turbo frequency down to the All-Core-Turbo frequency with increasing power draw, i.e. with the number of loaded cores. In the table below you can see that the 18-core 7980XE did exceptionally well at 8 and 10 cores. That's because it is still operating near its max frequency at that point, whereas the 7900X is at its all-core frequency by then.
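The link between problem size and memory use can be sketched with a little arithmetic. Linpack solves a dense N x N system in double precision, so the coefficient matrix alone needs 8 x N x N bytes. The sketch below assumes decimal gigabytes and ignores workspace, the right-hand side and operating system overhead, so real limits are somewhat lower. The function names are hypothetical.

```python
import math

BYTES_PER_DOUBLE = 8


def matrix_bytes(n: int) -> int:
    """Bytes needed to store a dense n x n double-precision matrix."""
    return BYTES_PER_DOUBLE * n * n


def largest_problem_size(memory_bytes: int) -> int:
    """Largest n whose n x n double-precision matrix fits in memory_bytes."""
    return math.isqrt(memory_bytes // BYTES_PER_DOUBLE)


if __name__ == "__main__":
    # Memory sizes of the systems mentioned in this post.
    for gb in (64, 512):
        mem = gb * 10**9  # assumption: decimal gigabytes
        n = largest_problem_size(mem)
        print(f"{gb} GB holds at most an N = {n} matrix "
              f"({matrix_bytes(n) / 1e9:.1f} GB)")
```

Because memory grows with the square of N, a system with eight times the memory only allows a problem size about 2.8 times larger.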
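The effect of falling clocks shows up when the all-core totals listed earlier are divided by the core count. The sketch below uses only the two Core i9 totals from this post. The `min_clock_ghz` helper is hypothetical and assumes a peak of 32 double-precision FLOPs per core per cycle, which is an assumption and not a figure from this post; under that assumption it gives a lower bound on the sustained clock, since real runs do not reach 100% of peak.

```python
# All-core Linpack totals reported in this post: (cores, GFLOPS).
RESULTS = {
    "Core i9 7900X": (10, 638.9),
    "Core i9 7980XE": (18, 977.0),
}


def gflops_per_core(cores: int, total_gflops: float) -> float:
    """Average throughput of one core when all cores are loaded."""
    return total_gflops / cores


def min_clock_ghz(per_core_gflops: float, flops_per_cycle: int = 32) -> float:
    """Lowest clock that could deliver per_core_gflops at the assumed peak rate.

    flops_per_cycle=32 is an assumption (see the text above), not measured.
    """
    return per_core_gflops / flops_per_cycle


if __name__ == "__main__":
    for cpu, (cores, total) in RESULTS.items():
        per_core = gflops_per_core(cores, total)
        print(f"{cpu}: {per_core:.1f} GFLOPS per core, "
              f"clock of at least {min_clock_ghz(per_core):.2f} GHz "
              f"under the stated assumption")
```

Comparing these per-core figures with the single-core result of nearly 100 GFLOPS gives a feel for how much throughput per core is given up as more cores are loaded.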
Intel Core i9 7900X and 7980XE Linpack GFLOPS
|CPU Cores||i9 7900X GFLOPS||i9 7980XE GFLOPS|
The following plot gives a visual representation of the Linpack performance of these outstanding processors.
Incorrectly reported Linux ACPI CPU frequency states
CPU frequency values in /proc/cpuinfo and those reported by cpufreq-info were incorrect, as seen in the output from cpufreq-info below from the Core i9 7980XE.
Using "cpupower", which accesses the intel_pstate driver, does seem to report correct frequencies, but they are very dynamic under load.
Following is output from running the Linpack benchmark with OMP_NUM_THREADS=1. This is consistent with the listed Max Turbo frequency.
Below is output from the benchmark run with OMP_NUM_THREADS=18. This is consistent with the All-Core-Turbo frequencies.
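Since the frequencies move around a lot under load, it can help to sample them a few times rather than look at a single reading. The sketch below reads the standard Linux cpufreq file `scaling_cur_freq` (reported in kHz) for each CPU under sysfs. The function name is hypothetical, and whether these values match what cpupower reports on a given kernel is not something this post establishes.

```python
import time
from pathlib import Path
from typing import Dict


def read_cur_freq_mhz(sysfs_root: str = "/sys/devices/system/cpu") -> Dict[str, float]:
    """Return {cpu name: current frequency in MHz} from cpufreq sysfs files."""
    root = Path(sysfs_root)
    if not root.is_dir():
        return {}
    freqs = {}
    for freq_file in sorted(root.glob("cpu[0-9]*/cpufreq/scaling_cur_freq")):
        cpu_name = freq_file.parent.parent.name  # e.g. "cpu0"
        freqs[cpu_name] = int(freq_file.read_text().strip()) / 1000.0  # kHz -> MHz
    return freqs


if __name__ == "__main__":
    # Take three samples half a second apart while a benchmark is running.
    for _ in range(3):
        freqs = read_cur_freq_mhz()
        if not freqs:
            print("no cpufreq entries found under sysfs")
            break
        print(f"min {min(freqs.values()):.0f} MHz, "
              f"max {max(freqs.values()):.0f} MHz")
        time.sleep(0.5)
```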
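The single-core and all-core runs differ only in the OMP_NUM_THREADS setting, so a sweep over thread counts is easy to script. Here is a minimal sketch. The helper names are hypothetical, and the command used in the example is a stand-in that just echoes the variable; substitute the actual benchmark command from your own installation.

```python
import os
import subprocess
import sys
from typing import Dict, List, Optional


def env_with_threads(n_threads: int,
                     base_env: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Copy an environment and set OMP_NUM_THREADS in the copy."""
    if n_threads < 1:
        raise ValueError("n_threads must be at least 1")
    env = dict(os.environ if base_env is None else base_env)
    env["OMP_NUM_THREADS"] = str(n_threads)
    return env


def run_with_threads(command: List[str], n_threads: int) -> subprocess.CompletedProcess:
    """Run a command with OMP_NUM_THREADS set and capture its output."""
    return subprocess.run(command, env=env_with_threads(n_threads),
                          capture_output=True, text=True, check=True)


if __name__ == "__main__":
    # Stand-in for the benchmark: a child process that prints the variable.
    stand_in = [sys.executable, "-c",
                "import os; print(os.environ['OMP_NUM_THREADS'])"]
    for n in (1, 18):
        result = run_with_threads(stand_in, n)
        print(f"requested {n} threads, child saw {result.stdout.strip()}")
```

Setting the variable in a copied environment keeps each run independent and leaves the parent shell's settings untouched.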
Happy computing! --dbk