Compute Performance: Ivy Bridge-E vs. Haswell

I was doing some GPU compute testing the other day on a nice setup with a new Ivy Bridge-E Core i7-4960X (Extreme Edition) processor, and I decided to take a break and see what the CPU would do with my favorite benchmark, Linpack. This is a pretty nice 6-core desktop processor, so I thought it would be interesting to see how it would do against my humble Haswell Core i7-4770.

…hint for the impatient, Haswell wins!

Keep in mind this is just a quick comparison running the Linpack benchmark from Intel MKL. I was not intending to do any kind of thorough testing. The system configurations just happened to be what I had set up at the time.

System configurations — briefly

The processor is really the only relevant component since the jobs I ran fit in 16GB.

 
              Ivy Bridge-E                Haswell
Motherboard   ASUS Rampage IV Gene X79    ASUS Gryphon Z87
CPU           Intel Core i7-4960X         Intel Core i7-4770
Memory        16GB DDR3 1600              32GB DDR3 1600
OS            Fedora 19                   Fedora 20
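As a rough check on the claim that the jobs fit in 16GB, here is a small sketch that estimates the memory needed for the largest problem in the runs reported later in this post (size 45000). It assumes the benchmark stores one double-precision matrix of LDA x N elements at 8 bytes each; the function name is made up for illustration.

```python
def linpack_matrix_bytes(n, lda=None, bytes_per_element=8):
    """Estimate the bytes needed for an n-column matrix with leading dimension lda."""
    if lda is None:
        lda = n  # assume no padding unless a leading dimension is given
    return lda * n * bytes_per_element


size = 45000  # largest problem size in the runs
needed = linpack_matrix_bytes(size)
print(f"size={size}: {needed / 1e9:.2f} GB ({needed / 2**30:.2f} GiB)")
```

The benchmark log reports 16200901024 bytes requested at this size, slightly more than this estimate, so the largest job comes close to the full 16GB on the Ivy Bridge-E system.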

There are lots of differences between these two processors, and some may be more important than others depending on what you are trying to do. For example, the current Haswell has 16 PCIe lanes and the Ivy Bridge-E has 40!

Notable Differences in CPU Architecture

 
                             Ivy Bridge-E           Haswell
Name 🙂                      Intel Core i7-4960X    Intel Core i7-4770
# of Cores                   6                      4
Clock Speed                  3.6GHz                 3.4GHz
Max Turbo Frequency          4GHz                   3.9GHz
Cache                        15MB                   8MB
Instruction Set Extensions   SSE4.2, AVX            SSE4.2, AVX 2.0 (FMA3)
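To put the instruction-set row in context, here is a back-of-the-envelope estimate of theoretical peak double-precision throughput from the table values. It assumes 8 FLOPs per cycle per core for AVX on Ivy Bridge-E (one 4-wide add plus one 4-wide multiply) and 16 for Haswell (two 4-wide FMA units), and it uses the base clocks; real clocks under sustained AVX load may differ. The helper name is hypothetical.

```python
def peak_gflops(cores, ghz, flops_per_cycle):
    """Theoretical peak = cores x clock (GHz) x double-precision FLOPs per cycle per core."""
    return cores * ghz * flops_per_cycle


# (cores, base clock in GHz, assumed DP FLOPs per cycle per core)
cpus = {
    "i7-4960X (Ivy Bridge-E, AVX)": (6, 3.6, 8),
    "i7-4770 (Haswell, AVX2 + FMA3)": (4, 3.4, 16),
}

for name, (cores, ghz, fpc) in cpus.items():
    print(f"{name}: {peak_gflops(cores, ghz, fpc):.1f} GFLOPS peak at base clock")
```

Under these assumptions the 4-core Haswell has the higher peak despite having fewer cores, about 217.6 versus 172.8 GFLOPS.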

Surprising(?) Result

The Haswell processor is just wonderful for numerical linear-algebra type of compute tasks!

 
                                    Ivy Bridge-E    Haswell
# of Real Cores                     6               4
# of Threads (*)                    12              8
Approx. Price                       $1059.00        $312.00
Linpack Performance at size=35000   155.3 GFLOPS    182.8 GFLOPS

(*) I left Hyper-Threading on during these tests but it contributes nothing to this benchmark. It actually slows things down a bit, but you knew that …
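For a rough sense of value, the numbers in the table can be turned into GFLOPS per core and GFLOPS per dollar. This is only a sketch using the approximate prices listed above; the dictionary layout and function names are just for illustration.

```python
# Values copied from the results table (size=35000 run, approximate prices)
results = {
    "i7-4960X": {"cores": 6, "price": 1059.00, "gflops": 155.3},
    "i7-4770": {"cores": 4, "price": 312.00, "gflops": 182.8},
}


def gflops_per_core(r):
    return r["gflops"] / r["cores"]


def gflops_per_dollar(r):
    return r["gflops"] / r["price"]


for name, r in results.items():
    print(f"{name}: {gflops_per_core(r):.1f} GFLOPS/core, "
          f"{gflops_per_dollar(r):.3f} GFLOPS/$")
```

On these numbers the Haswell part delivers about four times the Linpack throughput per dollar.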

It's interesting to me to see how much difference AVX2 (with FMA3) makes with the Haswell. However, don't read too much into this result. The i7-4960X is a great processor and it has some nice features that Haswell lacks at this point. But for my favorite benchmark, Haswell rocks!

I've included the terminal output for the job runs. It's kind of interesting to see how the processors do with smaller jobs, etc.
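The GFlops column in the output can be reproduced from the Size and Time(s) columns. The sketch below assumes the usual LINPACK operation count of 2/3·n³ + 2·n² floating-point operations for solving an n x n system; the function name is hypothetical.

```python
def linpack_gflops(n, seconds):
    """GFLOPS for solving an n x n dense system in the given wall time."""
    flops = (2.0 / 3.0) * n**3 + 2.0 * n**2  # assumed LINPACK operation count
    return flops / seconds / 1e9


# Size and time taken from the size=35000 rows of the two runs
print(f"i7-4960X: {linpack_gflops(35000, 184.086):.1f} GFLOPS")
print(f"i7-4770:  {linpack_gflops(35000, 156.340):.1f} GFLOPS")
```

With the times from the size=35000 rows, this agrees with the reported 155.3 and 182.8 GFLOPS to within the rounding of the printed times.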

Intel MKL Linpack Benchmark Core i7-4960X Output

Intel(R) Optimized LINPACK Benchmark data

Current date/time: Sat Feb 22 19:24:53 2014

CPU frequency:    3.601 GHz
Number of CPUs: 1
Number of cores: 6
Number of threads: 12

Parameters are set to:

Number of tests: 15
Number of equations to solve (problem size) : 1000  2000  5000  10000 15000 18000 20000 22000 25000 26000 27000 30000 35000 40000 45000
Leading dimension of array                  : 1000  2000  5008  10000 15000 18008 20016 22008 25000 26000 27000 30000 35000 40000 45000
Number of trials to run                     : 4     2     2     2     2     2     2     2     2     2     1     1     1     1     1    
Data alignment value (in Kbytes)            : 4     4     4     4     4     4     4     4     4     4     4     1     1     1     1    

Maximum memory requested that can be used=16200901024, at the size=45000

=================== Timing linear equation system solver ===================

Size   LDA    Align. Time(s)    GFlops   Residual     Residual(norm) Check
1000   1000   4      0.011      60.5545  1.031675e-12 3.518276e-02   pass
1000   1000   4      0.011      62.9133  1.031675e-12 3.518276e-02   pass
1000   1000   4      0.010      64.6291  1.031675e-12 3.518276e-02   pass
1000   1000   4      0.010      64.1308  1.031675e-12 3.518276e-02   pass
2000   2000   4      0.073      73.3250  4.382272e-12 3.812040e-02   pass
2000   2000   4      0.073      73.6438  4.382272e-12 3.812040e-02   pass
5000   5008   4      0.741      112.5025 2.581643e-11 3.599893e-02   pass
5000   5008   4      0.742      112.3135 2.581643e-11 3.599893e-02   pass
10000  10000  4      5.021      132.8041 8.700884e-11 3.068020e-02   pass
10000  10000  4      5.011      133.0682 8.700884e-11 3.068020e-02   pass
15000  15000  4      15.177     148.2760 2.225641e-10 3.505422e-02   pass
15000  15000  4      15.232     147.7491 2.225641e-10 3.505422e-02   pass
18000  18008  4      25.853     150.4133 2.894987e-10 3.170367e-02   pass
18000  18008  4      25.854     150.4091 2.894987e-10 3.170367e-02   pass
20000  20016  4      35.233     151.3944 4.097986e-10 3.627616e-02   pass
20000  20016  4      35.217     151.4647 4.097986e-10 3.627616e-02   pass
22000  22008  4      46.440     152.8784 4.548092e-10 3.331299e-02   pass
22000  22008  4      46.429     152.9131 4.548092e-10 3.331299e-02   pass
25000  25000  4      67.938     153.3455 6.089565e-10 3.462917e-02   pass
25000  25000  4      67.910     153.4068 6.089565e-10 3.462917e-02   pass
26000  26000  4      76.098     153.9946 6.669421e-10 3.506981e-02   pass
26000  26000  4      76.081     154.0296 6.669421e-10 3.506981e-02   pass
27000  27000  4      84.994     154.4037 6.672171e-10 3.253690e-02   pass
30000  30000  1      116.300    154.7882 8.421348e-10 3.319704e-02   pass
35000  35000  1      184.086    155.2852 1.085509e-09 3.151068e-02   pass
40000  40000  1      277.433    153.8026 1.466774e-09 3.262155e-02   pass
45000  45000  1      424.129    143.2444 1.711494e-09 3.011194e-02   pass

Performance Summary (GFlops)

Size   LDA    Align.  Average  Maximal
1000   1000   4       63.0569  64.6291 
2000   2000   4       73.4844  73.6438 
5000   5008   4       112.4080 112.5025
10000  10000  4       132.9362 133.0682
15000  15000  4       148.0125 148.2760
18000  18008  4       150.4112 150.4133
20000  20016  4       151.4295 151.4647
22000  22008  4       152.8958 152.9131
25000  25000  4       153.3761 153.4068
26000  26000  4       154.0121 154.0296
27000  27000  4       154.4037 154.4037
30000  30000  1       154.7882 154.7882
35000  35000  1       155.2852 155.2852
40000  40000  1       153.8026 153.8026
45000  45000  1       143.2444 143.2444

Residual checks PASSED

End of tests

Done: Sat Feb 22 20:02:53 PST 2014

Intel MKL Linpack Benchmark Core i7-4770 Output

Intel(R) Optimized LINPACK Benchmark data

Current date/time: Sat Feb 22 19:26:26 2014

CPU frequency:    3.897 GHz
Number of CPUs: 1
Number of cores: 4
Number of threads: 8

Parameters are set to:

Number of tests: 15
Number of equations to solve (problem size) : 1000  2000  5000  10000 15000 18000 20000 22000 25000 26000 27000 30000 35000 40000 45000
Leading dimension of array                  : 1000  2000  5008  10000 15000 18008 20016 22008 25000 26000 27000 30000 35000 40000 45000
Number of trials to run                     : 4     2     2     2     2     2     2     2     2     2     1     1     1     1     1    
Data alignment value (in Kbytes)            : 4     4     4     4     4     4     4     4     4     4     4     1     1     1     1    

Maximum memory requested that can be used=16200901024, at the size=45000

=================== Timing linear equation system solver ===================

Size   LDA    Align. Time(s)    GFlops   Residual     Residual(norm) Check
1000   1000   4      0.011      60.9602  1.194739e-12 4.074366e-02   pass
1000   1000   4      0.009      73.9902  1.194739e-12 4.074366e-02   pass
1000   1000   4      0.009      77.3679  1.194739e-12 4.074366e-02   pass
1000   1000   4      0.009      77.1606  1.194739e-12 4.074366e-02   pass
2000   2000   4      0.063      84.1916  4.536926e-12 3.946570e-02   pass
2000   2000   4      0.063      84.4081  4.536926e-12 3.946570e-02   pass
5000   5008   4      0.694      120.1819 2.471656e-11 3.446525e-02   pass
5000   5008   4      0.655      127.3289 2.471656e-11 3.446525e-02   pass
10000  10000  4      4.021      165.8472 9.436774e-11 3.327502e-02   pass
10000  10000  4      4.071      163.8038 9.436774e-11 3.327502e-02   pass
15000  15000  4      12.967     173.5532 2.169435e-10 3.416896e-02   pass
15000  15000  4      12.963     173.6057 2.169435e-10 3.416896e-02   pass
18000  18008  4      21.790     178.4627 2.645608e-10 2.897266e-02   pass
18000  18008  4      21.800     178.3777 2.645608e-10 2.897266e-02   pass
20000  20016  4      29.777     179.1361 3.504283e-10 3.102058e-02   pass
20000  20016  4      29.840     178.7565 3.504283e-10 3.102058e-02   pass
22000  22008  4      39.711     178.7811 4.267059e-10 3.125453e-02   pass
22000  22008  4      39.694     178.8594 4.267059e-10 3.125453e-02   pass
25000  25000  4      58.599     177.7833 5.194889e-10 2.954147e-02   pass
25000  25000  4      58.609     177.7516 5.194889e-10 2.954147e-02   pass
26000  26000  4      65.728     178.2908 6.593495e-10 3.467057e-02   pass
26000  26000  4      65.860     177.9332 6.593495e-10 3.467057e-02   pass
27000  27000  4      72.205     181.7522 6.135402e-10 2.991934e-02   pass
30000  30000  1      98.935     181.9566 7.133177e-10 2.811906e-02   pass
35000  35000  1      156.340    182.8432 1.085449e-09 3.150893e-02   pass
40000  40000  1      234.554    181.9194 1.338275e-09 2.976370e-02   pass
45000  45000  1      338.077    179.7047 1.782676e-09 3.136431e-02   pass

Performance Summary (GFlops)

Size   LDA    Align.  Average  Maximal
1000   1000   4       72.3697  77.3679 
2000   2000   4       84.2999  84.4081 
5000   5008   4       123.7554 127.3289
10000  10000  4       164.8255 165.8472
15000  15000  4       173.5795 173.6057
18000  18008  4       178.4202 178.4627
20000  20016  4       178.9463 179.1361
22000  22008  4       178.8202 178.8594
25000  25000  4       177.7674 177.7833
26000  26000  4       178.1120 178.2908
27000  27000  4       181.7522 181.7522
30000  30000  1       181.9566 181.9566
35000  35000  1       182.8432 182.8432
40000  40000  1       181.9194 181.9194
45000  45000  1       179.7047 179.7047

Residual checks PASSED

End of tests

Done: Sat Feb 22 19:59:03 PST 2014

Happy Computing! –dbk