Kabylake vs Skylake for compute on Linux -- Linpack on Ubuntu 16.10

Written on January 3, 2017 by Dr Donald Kinghorn
Intel core i7 7700K Kabylake ... how about some giga flops!

Being a science/computer/numerical-computing nerd, the first thing I want to know about new CPU hardware is "how does it do on the Linpack benchmark?". There is a reason. When I was doing my doctoral thesis work the computing hardware I had available had floating point performance measured in mega flops. Yes, millions of floating point operations per second, NOT billions or trillions or, soon to be, millions of billions of operations per second. So, yes, Linpack GFLOPS is a number that warms my heart. You can argue about how meaningful a performance metric it is, but they are still (for now) ranking the top500 supercomputers by their Linpack performance, and for me it's the first thing I want to know about a new CPU.
Intel Kabylake is the successor to core i7 Skylake. These processors are not necessarily intended for compute intensive workloads. That is usually the realm of the Xeon family of processors. However, the Skylake 6700K is a great processor, and for its intended use as a standard 4-core desktop processor it is remarkably good. Kabylake core i7 is a "tuned" update to Skylake. The main difference seems to be slightly higher core clock frequencies.
- Kabylake core i7 7700K
  - Base clock: 4.2GHz
  - All-core-turbo: 4.4GHz (that's what I saw when I ran the benchmark)
  - Max turbo: 4.5GHz
- Skylake core i7 6700K
  - Base clock: 4.0GHz
  - All-core-turbo: 4.0GHz
  - Max turbo: 4.2GHz
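To put "slightly higher" into numbers, here is a minimal sketch that computes the percentage clock uplift of the 7700K over the 6700K. The clock values are copied from the list above; the dictionary layout and the `uplift_percent` helper are just illustrative names.

```python
# Clock frequencies in GHz, copied from the list above.
CLOCKS_GHZ = {
    "base": {"7700K": 4.2, "6700K": 4.0},
    "all-core turbo": {"7700K": 4.4, "6700K": 4.0},
    "max turbo": {"7700K": 4.5, "6700K": 4.2},
}


def uplift_percent(new_ghz, old_ghz):
    """Percentage by which new_ghz exceeds old_ghz."""
    return (new_ghz / old_ghz - 1.0) * 100.0


for name, clocks in CLOCKS_GHZ.items():
    pct = uplift_percent(clocks["7700K"], clocks["6700K"])
    print(f"{name}: {pct:.1f}% higher on the 7700K")
```

That works out to roughly 5% on base clock, 10% on all-core turbo and 7% on max turbo.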
Are there any differences other than clock frequencies?
Yes, I'm sure there are differences but, if there are, they are not obvious! I don't have detailed architecture information...
Here are the CPUID flags for the Kabylake 7700K
fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb intel_pt tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp
... the CPUID flags for Skylake 6700K are
EXACTLY THE SAME AS KABYLAKE!
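If you want to check this kind of thing yourself, a comparison like the following would do it. This is only a sketch: it assumes you have saved the contents of `/proc/cpuinfo` from each machine as text, and the two sample strings below are shortened, made-up stand-ins rather than the full flag lists. The function names `parse_flags` and `diff_flags` are my own.

```python
def parse_flags(cpuinfo_text):
    """Return the set of CPU feature flags from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()  # no "flags" line found


def diff_flags(flags_a, flags_b):
    """Return (only_in_a, only_in_b) as sorted lists."""
    return sorted(flags_a - flags_b), sorted(flags_b - flags_a)


# Shortened, illustrative samples (NOT the full flag lists).
sample_kabylake = "model name\t: example A\nflags\t\t: fpu sse2 avx avx2 fma\n"
sample_skylake = "model name\t: example B\nflags\t\t: fpu sse2 avx avx2 fma\n"

only_a, only_b = diff_flags(parse_flags(sample_kabylake),
                            parse_flags(sample_skylake))
if not only_a and not only_b:
    print("flag sets are identical")
else:
    print("only in first:", only_a)
    print("only in second:", only_b)
```

Comparing sets rather than raw strings means the order in which the kernel prints the flags does not matter.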
Linpack benchmark on Intel core i7 7700K Kabylake
- OS: Ubuntu 16.10
- Intel MKL version: 2017.1.132
I was using an ASUS Z270 motherboard. There were no problems installing Ubuntu on this new platform.
- Best result, with problem size of 85000 (88% of the 64GB system memory): 253 GFLOP/s
- At problem size 40000: 242 GFLOP/s
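As a rough guide to how problem size maps to memory use, the dominant cost is the N x N matrix of double-precision values, i.e. about N * N * 8 bytes. The sketch below assumes exactly that and ignores any extra workspace the benchmark allocates, so treat the numbers as lower bounds. `matrix_bytes` is an illustrative helper name.

```python
def matrix_bytes(n, bytes_per_element=8):
    """Approximate storage for an n x n matrix of doubles (8 bytes each)."""
    return n * n * bytes_per_element


for n in (40000, 45000, 85000):
    gb = matrix_bytes(n) / 1e9  # decimal gigabytes
    print(f"N = {n}: about {gb:.1f} GB for the matrix alone")
```

For comparison, the benchmark output further down reports 16200901024 bytes at size 45000, slightly more than the 16200000000 bytes this estimate gives.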
Comparison of Kabylake 7700K, Skylake 6700K and Haswell 4790K at problem size 40000
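| CPU | All-Core-Turbo clock | Linpack GFLOP/s |
|---|---|---|
| Kabylake 7700K | 4.4GHz | 242 |
| Skylake 6700K | 4.0GHz | 255 |
| Haswell 4790K | - | 234 |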
So it looks like the Skylake 6700K out-performs Kabylake 7700K running the Intel optimized Linpack benchmark even though the 7700K is clocked 10% faster.
OK, this is just the Linpack benchmark! Things may look better for Kabylake after Intel releases a new MKL (there was a big difference for Skylake after MKL 11.3 came out). Also, yes, the higher clock on Kabylake does make a difference for most real-world applications, and it is in general 5-10% faster for most tasks.
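One way to put the measured numbers in context is to compare them against a theoretical peak. The sketch below ASSUMES 16 double-precision FLOPs per cycle per core (two 256-bit FMA units, 4 doubles each, 2 operations per FMA) and ASSUMES the cores actually held the all-core-turbo clocks listed earlier for the whole run. Neither assumption was verified here, so the efficiency figures are only a rough indicator.

```python
# ASSUMPTION: 16 double-precision FLOPs per cycle per core
# (two 256-bit FMA units x 4 doubles x 2 operations per FMA).
FLOPS_PER_CYCLE_PER_CORE = 16


def peak_gflops(cores, clock_ghz, flops_per_cycle=FLOPS_PER_CYCLE_PER_CORE):
    """Theoretical peak in GFLOP/s for the given core count and clock."""
    return cores * clock_ghz * flops_per_cycle


def efficiency(measured_gflops, peak):
    """Fraction of the theoretical peak that was actually measured."""
    return measured_gflops / peak


# (cores, assumed all-core clock in GHz, measured GFLOP/s at size 40000)
RUNS = {
    "7700K": (4, 4.4, 242.3227),
    "6700K": (4, 4.0, 254.6599),
}

for cpu, (cores, ghz, measured) in RUNS.items():
    peak = peak_gflops(cores, ghz)
    print(f"{cpu}: peak {peak:.1f} GFLOP/s, measured {measured:.1f}, "
          f"efficiency {efficiency(measured, peak):.1%}")
```

If a result comes out very close to 100%, it is more likely that the clock assumption is off (the CPU may have run faster than the listed value) than that the run was near-perfect.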
Here's some of the output ...
Intel(R) Core(TM) i7-7700K CPU
    Number of CPUs: 1
    Number of cores: 4
    Number of threads: 4
    Maximum memory requested that can be used=16200901024, at the size=45000
    =================== Timing linear equation system solver ===================
    Size   LDA    Align. Time(s)  GFlops    Residual      Residual(norm) Check
    30000  30000  1      75.697   237.8152  8.725493e-10  3.439598e-02   pass
    35000  35000  1      118.446  241.3400  1.161127e-09  3.370575e-02   pass
    40000  40000  1      176.087  242.3227  1.573162e-09  3.498767e-02   pass
Intel(R) Core(TM) i7-6700K CPU
    Number of CPUs: 1
    Number of cores: 4
    Number of threads: 4
    Maximum memory requested that can be used=16200901024, at the size=45000
    =================== Timing linear equation system solver ===================
    Size   LDA    Align. Time(s)  GFlops    Residual      Residual(norm) Check
    30000  30000  1      70.881   253.9736  6.426480e-10  2.533325e-02   pass
    35000  35000  1      111.866  255.5348  7.896800e-10  2.292321e-02   pass
    40000  40000  1      167.556  254.6599  1.071610e-09  2.383298e-02   pass
Intel(R) Core(TM) i7-4790K CPU
    Number of CPUs: 1
    Number of cores: 4
    Number of threads: 4
    Maximum memory requested that can be used=16200901024, at the size=45000
    =================== Timing linear equation system solver ===================
    Size   LDA    Align. Time(s)  GFlops    Residual      Residual(norm) Check
    30000  30000  1      78.790   228.4792  6.018069e-10  2.372329e-02   pass
    35000  35000  1      126.072  226.7424  8.098306e-10  2.350815e-02   pass
    40000  40000  1      182.586  233.6979  1.081908e-09  2.406201e-02   pass
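The GFlops column can be sanity-checked from the Size and Time(s) columns. The sketch below assumes the conventional Linpack operation count of 2/3 N^3 + 2 N^2 floating point operations for solving a dense N x N system; that the benchmark uses exactly this count is my assumption, although it does reproduce the reported values to within the rounding of the printed times. `linpack_flops` and `gflops` are illustrative names.

```python
def linpack_flops(n):
    """Conventional operation count for solving a dense n x n system."""
    return (2.0 / 3.0) * n**3 + 2.0 * n**2


def gflops(n, seconds):
    """GFLOP/s given the problem size and wall-clock time in seconds."""
    return linpack_flops(n) / seconds / 1e9


# Time(s) at size 40000, copied from the output above.
TIMES_AT_40000 = {"7700K": 176.087, "6700K": 167.556, "4790K": 182.586}

for cpu, seconds in TIMES_AT_40000.items():
    print(f"{cpu}: {gflops(40000, seconds):.2f} GFLOP/s")

ratio = gflops(40000, 167.556) / gflops(40000, 176.087)
print(f"6700K / 7700K at size 40000: {ratio:.3f}")
```

Since the operation count is the same for every CPU at a given size, the ratio of two results is simply the inverse ratio of their run times.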
Happy computing! --dbk