Kabylake vs Skylake for compute on Linux: Linpack on Ubuntu 16.10

Intel Core i7 7700K Kabylake … how about some gigaflops! Being a science/computer/numerical-computing nerd, the first thing I want to know about new CPU hardware is “how does it do on the Linpack benchmark?” There is a reason. When I was doing my doctoral thesis work, the computing hardware I had available had floating point performance measured in megaflops. Yes, millions of floating point operations per second, NOT billions or trillions or, soon to be, millions of billions of operations per second. So, yes, Linpack GFLOPS is a number that warms my heart. You can argue about how meaningful a performance metric it is, but the Top500 supercomputers are still (for now) ranked by their Linpack performance, and for me it’s the first thing I want to know about a new CPU.

Intel Kabylake is the successor to Core i7 Skylake. These processors are not necessarily intended for compute-intensive workloads; that is usually the realm of the Xeon family of processors. However, the Skylake 6700K is a great processor, and for its intended use as a standard 4-core desktop processor it is remarkably good. Kabylake Core i7 is a “tuned” update to Skylake. The main difference seems to be slightly higher core clock frequencies.

Kabylake Core i7 7700K

  • Base clock: 4.2GHz
  • All-core-turbo: 4.4GHz (that’s what I saw when I ran the benchmark)
  • Max turbo: 4.5GHz

Skylake Core i7 6700K

  • Base clock: 4.0GHz
  • All-core-turbo: 4.0GHz
  • Max turbo: 4.2GHz

Are there any differences other than clock frequencies?

Yes, I’m sure there are differences, but if there are, they are not obvious! I don’t have detailed architecture information…

Here are the CPUID flags for the Kabylake 7700K

fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 
clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp 
lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc 
aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 
sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer 
aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb intel_pt tpr_shadow 
vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 
erms invpcid rtm mpx rdseed adx smap clflushopt xsaveopt xsavec xgetbv1 xsaves 
dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp

… the CPUID flags for Skylake 6700K are

 EXACTLY THE SAME AS KABYLAKE!
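For anyone who wants to repeat that comparison, here is a minimal sketch of how two flag lists could be diffed. It assumes the flags come from the `flags` line of `/proc/cpuinfo` on Linux; the helper names `parse_flags` and `diff_flags` are made up for illustration, and the sample text is a shortened excerpt of the list above, not a full dump.

```python
def parse_flags(cpuinfo_text):
    """Return the set of CPU flags from the first 'flags' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()  # no 'flags' line found


def diff_flags(a, b):
    """Return (flags only in a, flags only in b) as sorted lists."""
    return sorted(a - b), sorted(b - a)


# Shortened sample lines (illustrative excerpts of the flag list above).
kabylake_sample = "flags\t\t: fpu vme sse sse2 fma avx avx2 bmi1 bmi2"
skylake_sample = "flags\t\t: fpu vme sse sse2 avx fma avx2 bmi2 bmi1"

only_kaby, only_sky = diff_flags(parse_flags(kabylake_sample),
                                 parse_flags(skylake_sample))
print("only in first :", only_kaby)
print("only in second:", only_sky)
```

On a real pair of machines you would save each system's `/proc/cpuinfo` to a file and pass the file contents to `parse_flags`. Two empty lists mean the flag sets are identical, regardless of the order in which the flags are printed.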

Linpack benchmark on Intel Core i7 7700K Kabylake

  • OS: Ubuntu 16.10
  • Intel MKL version: 2017.1.132

I was using an ASUS Z270 motherboard. There were no problems installing Ubuntu on this new platform.

  • Problem size 85000 (88% of the 64GB system memory): 253 GFLOP/s (best result)
  • Problem size 40000: 242 GFLOP/s
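The problem size N is the dimension of a dense N x N double-precision matrix, so the memory needed grows with N squared. A rough sketch of that arithmetic, assuming 8 bytes per element and ignoring the benchmark's small workspace overhead (`matrix_bytes` is a hypothetical helper name, not part of the benchmark):

```python
def matrix_bytes(n, bytes_per_element=8):
    """Bytes needed to hold a dense n x n matrix of doubles."""
    return n * n * bytes_per_element


for n in (40000, 45000, 85000):
    b = matrix_bytes(n)
    print(f"N={n}: {b} bytes = {b / 1e9:.1f} GB = {b / 2**30:.1f} GiB")
```

For N = 45000 this gives 16200000000 bytes, which is within about 1 MB of the 16200901024 bytes that the benchmark itself reports for that size in the output further down.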

Comparison of Kabylake 7700K, Skylake 6700K and Haswell 4790K at problem size 40000

CPU      All-Core-Turbo clock   Linpack GFLOP/s
7700K    4.4GHz                 242
6700K    4.0GHz                 255
4790K    4.2GHz                 234

So it looks like the Skylake 6700K outperforms the Kabylake 7700K running the Intel optimized Linpack benchmark (255 vs 242 GFLOP/s, about 5%) even though the 7700K is clocked 10% faster.
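One way to put the three results on a common footing is to divide each GFLOP/s figure by the clock. The sketch below does only that arithmetic on the numbers in the table; it assumes the all-core-turbo clock listed for each CPU is the clock actually sustained during the run, which was not measured here for every chip.

```python
# (all-core-turbo clock in GHz, Linpack GFLOP/s at problem size 40000)
results = {
    "7700K": (4.4, 242),
    "6700K": (4.0, 255),
    "4790K": (4.2, 234),
}

# Throughput normalized by clock, and throughput relative to the 7700K.
per_ghz = {cpu: gflops / ghz for cpu, (ghz, gflops) in results.items()}
baseline = results["7700K"][1]
relative = {cpu: gflops / baseline for cpu, (ghz, gflops) in results.items()}

for cpu in results:
    print(f"{cpu}: {per_ghz[cpu]:.1f} GFLOP/s per GHz, "
          f"{relative[cpu]:.3f}x the 7700K result")
```

Under that assumption the 7700K and 4790K land close together per GHz (about 55 and 56), while the 6700K comes out noticeably higher (about 64).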

OK, this is just the Linpack benchmark! Things may look better for Kabylake after Intel releases a new MKL (there was a big difference for Skylake after MKL 11.3 came out). Also, yes, the higher clock on Kabylake does make a difference for most real-world applications, and it is in general 5-10% faster for most tasks.

Here’s some of the output …

Intel(R) Core(TM) i7-7700K CPU

Number of CPUs: 1
Number of cores: 4
Number of threads: 4

Maximum memory requested that can be used=16200901024, at the size=45000

=================== Timing linear equation system solver ===================

Size   LDA    Align. Time(s)    GFlops   Residual     Residual(norm) Check

30000  30000  1      75.697     237.8152 8.725493e-10 3.439598e-02   pass
35000  35000  1      118.446    241.3400 1.161127e-09 3.370575e-02   pass
40000  40000  1      176.087    242.3227 1.573162e-09 3.498767e-02   pass

Intel(R) Core(TM) i7-6700K CPU

Number of CPUs: 1
Number of cores: 4
Number of threads: 4

Maximum memory requested that can be used=16200901024, at the size=45000

=================== Timing linear equation system solver ===================

Size   LDA    Align. Time(s)    GFlops   Residual     Residual(norm) Check
30000  30000  1      70.881     253.9736 6.426480e-10 2.533325e-02   pass
35000  35000  1      111.866    255.5348 7.896800e-10 2.292321e-02   pass
40000  40000  1      167.556    254.6599 1.071610e-09 2.383298e-02   pass

Intel(R) Core(TM) i7-4790K CPU

Number of CPUs: 1
Number of cores: 4
Number of threads: 4


Maximum memory requested that can be used=16200901024, at the size=45000

=================== Timing linear equation system solver ===================

Size   LDA    Align. Time(s)    GFlops   Residual     Residual(norm) Check
30000  30000  1      78.790     228.4792 6.018069e-10 2.372329e-02   pass
35000  35000  1      126.072    226.7424 8.098306e-10 2.350815e-02   pass
40000  40000  1      182.586    233.6979 1.081908e-09 2.406201e-02   pass
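The GFlops column in these tables can be reproduced from the Size and Time(s) columns using the standard Linpack operation count of 2/3·N³ + 2·N² floating point operations for solving a dense N x N system. A small sketch of that check (the function name `linpack_gflops` is my own; the sizes, times and reported values are copied from the output above):

```python
def linpack_gflops(n, seconds):
    """GFLOP/s for solving a dense n x n system in the given wall time."""
    flops = (2.0 / 3.0) * n**3 + 2.0 * n**2  # standard Linpack operation count
    return flops / seconds / 1e9


# (CPU, size, time in seconds, GFlops as reported in the output above)
runs = [
    ("7700K", 40000, 176.087, 242.3227),
    ("6700K", 40000, 167.556, 254.6599),
    ("4790K", 40000, 182.586, 233.6979),
]

for cpu, n, seconds, reported in runs:
    computed = linpack_gflops(n, seconds)
    print(f"{cpu}: computed {computed:.2f} GFLOP/s, reported {reported:.2f}")
```

The computed values agree with the reported ones to within the rounding of the printed times.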


Happy computing! –dbk