Introduction
I was able to get a little time in on the Intel Xeon W-3175X and the Core i9 9990XE processors. I ran a couple of numerical compute performance tests with the Intel MKL Linpack benchmark and NAMD. I used the same system image that I had used recently to look at 3 Intel 8-core processors, so I will include those results here as well. There will be results for the W-3175X, 9990XE, 9800X, W-2145, and 9900K.
Intel has been doing some strange things recently. They are in shortage on many of their processors, so we are seeing some models that would not normally appear in their line. For example, some of the processors with integrated GPUs that are coming out of the fab with faulty GPU sections, but perfectly good CPU sections, are being released with the GPU disabled.
There are 2 recent Intel processors that are really strange: the Xeon W-3175X 28-core and the Core i9 9990XE overclocked 14-core. I don’t know for sure if these processors are the result of fab “problems” or not. The W-3175X could be an overclocked Xeon Scalable 8180 with a defect that renders it useless in a multi-socket system, and the 9990XE could be a 9980XE 18-core with a few bad cores that, once disabled, allowed the remaining 14 cores to be overclocked. That’s pure speculation based only on my wild imagination!
Neither of these processors is actually available other than by an odd auction process in batches to OEMs. The 9990XE does not have a warranty and I cannot find it on “Intel Ark”. The W-3175X is listed on Intel Ark. To me, the i9 9990XE does not appear to be a real product and I don’t understand why Intel would sell it to anyone without a warranty. They are not supporting it in any tangible way. The W-3175X at least “looks” like it might be a real product, but with no promise of availability or predictable pricing.
The W-3175X requires a special motherboard and cooler and surprisingly both ASUS and Gigabyte have made boards available. They are extremely large boards and they are using the Xeon Scalable C621 chipset.
We did get these two odd-balls in for testing at Puget Systems. My colleagues Matt and William did a fair amount of testing with popular software programs running on Windows 10. You can find several of their posts listed in the Puget Systems “Articles” section. This one is particularly good for the 9990XE: Intels Core Xperiment i9 9990XE. I very much agree with the sentiment in this post that the 9990XE is just an experiment; it’s not a viable product. In practical terms the W-3175X is not a viable product either.
Caveats aside, I was of course curious about the raw double precision floating point performance of these monsters so I popped into Puget labs and fired up Ubuntu 18.04 on them and ran the Intel optimized Linpack benchmark from MKL. That is in my opinion the best relative measure of numerical compute performance for Intel processors.
Processor Specs i9 9900K, i7 9800X, Xeon W-2145, i9 9990XE, Xeon W-3175
The following table lists some of the specification differences between these processors that are relevant when configuring a numerical computing workstation.
i9 9900K, i7 9800X, Xeon W-2145, i9 9990XE, Xeon W-3175 Features
Features | i9-9900K | i7 9800X | Xeon W-2145 | i9 9990XE | Xeon W-3175 |
---|---|---|---|---|---|
Code Name | Coffee Lake | Skylake-X | Skylake-W | Skylake-X | Skylake-W |
Cores | 8 | 8 | 8 | 14 | 28 |
Base Clock | 3.6GHz | 3.8GHz | 3.7GHz | 4.0GHz | 3.1GHz |
Max Turbo | 5.0GHz | 4.5GHz | 4.5GHz | 5.1GHz* | 3.8GHz* |
All Core | 4.7GHz | 4.1GHz | 4.3GHz | 5.0GHz* | 3.7GHz* |
Cache | 16 MB | 16.5 MB | 11 MB | 19.25 MB | 38.5 MB |
TDP | 95 W | 165 W | 140 W | 255W | 255W |
Max Mem | 64 GB | 128 GB | 512 GB (Reg ECC) | 128 GB | 512 GB (Reg ECC) |
Mem Channels | 2 | 4 | 4 | 4 | 6 |
Max PCIe lanes | 16 | 44 | 48 | 44 | 48 |
X16 GPU support | 1 | 2 | 3 (4 w/PLX) | 2 | 3 (4 w/PLX)* |
Vector Unit | AVX2 | AVX512 | AVX512 | AVX512 | AVX512 |
Price | $500 | $600 | $1113 | * | $3000* |
Notes:
Clock Frequencies: I have included some raw frequency monitoring output in the appendix. What I observed when running Linpack was this: for the 9990XE the job started with an initial frequency of 5.0GHz and it stayed there on all cores for the initialization of the job. When the AVX512 units went under load the clock for all but 2 cores dropped to 3.1GHz, while 2 cores remained near 5.0GHz. 3.1GHz is presumably the AVX clock frequency. For the job run with the W-3175 the initial clock was 4.3GHz, dropping to 3.7GHz all-core during initialization and then to 2.8GHz when AVX512 started.
PCIe: It is common for Xeon-W systems to support 2 or 3 X16 cards without a PLX switch. The motherboard we used had an X16,X8,X16,X8 layout.
Pricing: There is no official price for the 9990XE (it’s not a product) see Intels Core Xperiment i9 9990XE. The W-3175 is listed as a real product on Intel Ark. It has an MSRP of approx. $3000. It also requires a special (massive!) socket 3647 motherboard which would cost close to $2000 and a really good cooler. There are many details that make a system utilizing the W-3175 processor a non-viable product. I understand the temptation to think that you “want one of those” but really, it looks like it is not supportable as a product.
Hardware under test:
There were 4 platforms used in this testing.
- Intel Core i9 9900K 3.6GHz 8-Core
  - Gigabyte Z390 Designare Motherboard (1 x X16 PCIe)
  - 64 GB DDR4-2666 Memory
  - 1 TB Intel 660p M.2 SSD
  - NVIDIA RTX 2080Ti
- Intel Core i9 9990XE 5.0GHz 14-Core and Core i7 9800X 3.8GHz 8-Core
  - Gigabyte X299 Designare Motherboard (2 x X16 PCIe)
  - 128GB DDR4-2666 Memory
  - 1 TB Intel 660p M.2 SSD
  - NVIDIA RTX 2080Ti
- Intel Xeon W-2145 3.7GHz 8-Core
  - Asus WS C422 SAGE/10G Motherboard (4 x X16 PCIe)
  - 256GB DDR4-2666 Reg ECC Memory
  - 1 TB Intel 660p M.2 SSD
  - NVIDIA RTX 2080Ti
- Intel Xeon W-3175 3.1GHz 28-Core
  - Asus ROG Dominus Extreme
  - Asetek 690LX-PN liquid cooler
  - 192GB DDR4-2666
  - 1 TB Intel 660p M.2 SSD
  - NVIDIA RTX 2080Ti
Big thank you to Asus for the ROG Dominus Extreme motherboard and Asetek for the 690LX-PN CPU cooler! Without Asus and Asetek providing samples, we would not have been able to test the Intel Xeon W-3175X.
Software:
I had the OS and applications installed on the Intel 660p M.2 drive and swapped it between the test systems.
- Ubuntu 18.04
- Intel MKL 2019 (update 1) (Math Kernel Library)
- Intel optimized Linpack Benchmark (from MKL)
- NAMD 2.13 (Molecular Dynamics)
I am running Linux for this testing but there is no reason to expect that the same types of workloads on Windows 10 would show any significant difference in performance.
Results
Linpack
An optimized Linpack benchmark can achieve near theoretical peak performance for double precision floating point on a CPU. It is the first benchmark I run on any new CPU. It is the benchmark (still) used to rank the Top500 supercomputers in the world. I feel it is the best performance indicator for numerical computation with maximally optimized software. The Intel optimized Linpack makes great use of the excellent MKL library. There are many programs that link to MKL for performance. This includes the very useful “numerical compute scripting” packages Anaconda Python and Mathworks MATLAB.
This is not necessarily a good selection of comparative results but hopefully it does give you an idea of the relative performance. These are results utilizing the same test install system image and software versions.
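To put "theoretical peak" in numbers, here is a minimal sketch of the usual estimate: cores x clock x FLOPs per cycle. The figure of 32 double precision FLOPs per cycle per core is an assumption (two AVX512 FMA units per core, 8 doubles each, 2 operations per FMA) and is not something measured in this post. The clocks are the AVX512 frequencies noted under "Clock Frequencies" above, and the measured GFLOPS come from the raw Linpack output in the appendix.

```python
# Rough double precision peak estimate, under stated assumptions.
# ASSUMPTION: 32 DP FLOPs per cycle per core with AVX512
# (2 FMA units x 8 doubles x 2 ops). Not stated in the article.
FLOPS_PER_CYCLE_AVX512 = 32


def peak_gflops(cores, avx_clock_ghz, flops_per_cycle=FLOPS_PER_CYCLE_AVX512):
    """Estimated peak in GFLOPS: cores x GHz x FLOPs per cycle."""
    return cores * avx_clock_ghz * flops_per_cycle


def efficiency(cores, avx_clock_ghz, measured_gflops):
    """Measured Linpack result as a fraction of the estimated peak."""
    return measured_gflops / peak_gflops(cores, avx_clock_ghz)


# name: (cores, observed AVX512 clock in GHz, measured Linpack GFLOPS)
results = {
    "i9 9990XE": (14, 3.1, 974.7487),
    "Xeon W-3175X": (28, 2.8, 1759.4013),
}

for name, (cores, clock, measured) in results.items():
    peak = peak_gflops(cores, clock)
    frac = efficiency(cores, clock, measured)
    print(f"{name}: estimated peak {peak:.0f} GFLOPS, "
          f"measured {measured:.0f} GFLOPS, {frac:.0%} of estimate")
```

Under these assumptions both processors land at roughly 70% of the estimated peak. Treat that as a ballpark only, since the clock values are single snapshots from frequency monitoring rather than sustained averages.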
The double precision floating point performance of the W-3175 is very impressive, as expected.
Note: These jobs ran with “real” threads since “Hyperthreads” are not useful for this calculation.
Note: The 8-core results are with a large problem size of 75000 simultaneous equations (a 75000 x 75000 “triangular solve”) and used approximately 44GB of system memory. The 9990XE and W-3175 were tested with a problem size of 110016 using approximately 94GB of system memory. Also, note that the 9900K has a disadvantage on this benchmark since it has the older AVX2 vector unit.
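The memory figures follow mostly from the problem size: the benchmark stores a dense N x N matrix of 8-byte doubles. The sketch below computes only that matrix storage, so it is a lower bound. The helper name is hypothetical, and the benchmark's extra work arrays and the OS's own accounting are not modeled.

```python
def linpack_matrix_bytes(n, bytes_per_element=8):
    """Bytes needed for a dense n x n matrix of double precision values."""
    return n * n * bytes_per_element


for n in (75000, 110016):
    size = linpack_matrix_bytes(n)
    # Report both decimal GB and binary GiB, since tools differ on which they show.
    print(f"N = {n}: {size / 1e9:.1f} GB ({size / 2**30:.1f} GiB)")
```

For N = 110016 the matrix alone is 96,828,162,048 bytes, which is very close to the "Maximum memory requested" value of 96830363392 reported in the Linpack output in the appendix.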
NAMD
I also tested with the Molecular Dynamics package NAMD. NAMD scales really well across multiple cores and it is not specifically optimized for Intel hardware. It is highly optimized code and it uses the very interesting Charm++ for its parallel capabilities. NAMD is an important program and I like it for testing since it is a good example of well optimized code that scales to massive numbers of processes and also has very good GPU acceleration that needs to be balanced by good CPU performance.
The AVX512 vector units are not that important for this code since it is designed to run well on a wide variety of hardware. Higher core counts are a big advantage for performance since NAMD has very good parallel scaling.
Note: These jobs ran with “Hyperthreads” since they help with the way NAMD uses threads. It is always worth experimenting with Hyperthreads to see if they help or not.
Note: The performance units here are “days per nano-second” of simulation time. Adding a GPU will dramatically increase the performance as will be seen in the next chart.
The first thing to notice is that the performance has increased by over a factor of 10 by including the NVIDIA RTX 2080Ti!
Conclusions and Recommendations
I have to emphasize that the 9990XE and W-3175 processors are not really viable components for supportable products. They are more enthusiast curiosities than workstation components. This is especially true for the 9990XE. It has no support of any kind from Intel, and I don’t even know what they were thinking. The W-3175 is more interesting but it is still not viable as a product because of the lack of commitment and supply as well as the “extreme” nature of the overall system platform needed to run it. So, don’t even think about it!
On the positive side, 2019 should be an interesting year for new hardware. We expect a new architecture design from Intel toward the end of the year (after a hardware security bug-fix refresh). The future platform should be a significant change over what we are using now, including new chipsets supporting PCIe v4 among other niceties. We also expect the current supply issues to be resolved. Intel also has other interesting hardware projects in the works and we may see some results from them for new compute accelerator hardware. And, that’s just Intel … AMD and ARM are looking really interesting too!
Happy computing –dbk
Appendix
9990XE raw data snippets
kinghorn@utest:~$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 28
On-line CPU(s) list: 0-27
Thread(s) per core: 2
Core(s) per socket: 14
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 85
Model name: Intel(R) Core(TM) i9-9990XE CPU @ 4.00GHz
Stepping: 4
CPU MHz: 1200.741
CPU max MHz: 5100.0000
CPU min MHz: 1200.0000
BogoMIPS: 8000.00
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 1024K
L3 cache: 19712K
NUMA node0 CPU(s): 0-27
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush
dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon
pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl
vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt
tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3
cdp_l3 invpcid_single pti ssbd mba ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase
tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm mpx rdt_a avx512f avx512dq rdseed adx
smap clflushopt clwb intel_pt avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc
cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts hwp hwp_act_window hwp_epp
hwp_pkg_req flush_l1d
kinghorn@utest:~/projects/benchmarks/linpack$ ./runme_xeon64
Current date/time: Fri Feb 8 11:35:49 2019
CPU frequency: 4.999 GHz
Number of CPUs: 1
Number of cores: 14
Number of threads: 14
Parameters are set to:
Number of tests: 1
Number of equations to solve (problem size) : 110016
Leading dimension of array : 110016
Number of trials to run : 1
Data alignment value (in Kbytes) : 1
Maximum memory requested that can be used=96830363392, at the size=110016
=================== Timing linear equation system solver ===================
Size LDA Align. Time(s) GFlops Residual Residual(norm) Check
110016 110016 1 910.742 974.7487 9.762934e-09 2.885014e-02 pass
Performance Summary (GFlops)
Size LDA Align. Average Maximal
110016 110016 1 974.7487 974.7487
Residual checks PASSED
End of tests
Start of job run, (showing active “hyperthreads”)
kinghorn@utest:~$ sudo cpupower monitor -m Mperf | sort -k2 -r
24| 39.76| 60.24| 5009
15| 0.64| 99.36| 5009
14| 0.40| 99.60| 5009
21| 0.13| 99.87| 5008
20| 0.12| 99.88| 5010
19| 0.12| 99.88| 5009
16| 0.12| 99.88| 5006
18| 0.10| 99.90| 5016
25| 0.10| 99.90| 5008
17| 0.07| 99.93| 5005
26| 0.06| 99.94| 5009
27| 0.06| 99.94| 4997
23| 0.05| 99.95| 5017
22| 0.05| 99.95| 5005
|Mperf|
Frequencies during AVX512 load,
kinghorn@utest:~$ sudo cpupower monitor -m Mperf | sort -k2 -r
CPU | C0 | Cx | Freq
14| 1.17| 98.83| 3116
24| 0.77| 99.23| 3104
23| 0.21| 99.79| 3479
19| 0.12| 99.88| 3106
25| 0.10| 99.90| 3221
17| 0.10| 99.90| 3097
21| 0.09| 99.91| 3144
26| 0.08| 99.92| 5007
18| 0.08| 99.92| 3101
20| 0.07| 99.93| 3106
22| 0.06| 99.94| 3211
15| 0.04| 99.96| 3100
27| 0.03| 99.97| 4871
16| 0.00|100.00| 3072
|Mperf
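The listing above can be summarized with a short script. This is a minimal sketch that parses rows in the `CPU | C0 | Cx | Freq` layout printed by `cpupower monitor -m Mperf`, using the rows copied from the AVX512 load sample. The 4500 MHz cut-off for "still near turbo" is an arbitrary, hypothetical threshold chosen for illustration.

```python
from statistics import median

# Rows copied from the "Frequencies during AVX512 load" listing above.
SAMPLE = """\
CPU | C0 | Cx | Freq
14| 1.17| 98.83| 3116
24| 0.77| 99.23| 3104
23| 0.21| 99.79| 3479
19| 0.12| 99.88| 3106
25| 0.10| 99.90| 3221
17| 0.10| 99.90| 3097
21| 0.09| 99.91| 3144
26| 0.08| 99.92| 5007
18| 0.08| 99.92| 3101
20| 0.07| 99.93| 3106
22| 0.06| 99.94| 3211
15| 0.04| 99.96| 3100
27| 0.03| 99.97| 4871
16| 0.00|100.00| 3072
|Mperf
"""


def parse_freqs(text):
    """Return {cpu_id: freq_mhz} from 'CPU | C0 | Cx | Freq' style rows."""
    freqs = {}
    for line in text.splitlines():
        parts = [p.strip() for p in line.split("|")]
        # Skip the header row and the trailing "|Mperf" line.
        if len(parts) != 4 or not parts[0].isdigit():
            continue
        freqs[int(parts[0])] = int(parts[3])
    return freqs


HIGH_MHZ = 4500  # hypothetical threshold for "still near the turbo clock"

freqs = parse_freqs(SAMPLE)
median_mhz = median(freqs.values())
high_cpus = sorted(cpu for cpu, mhz in freqs.items() if mhz > HIGH_MHZ)

print(f"logical CPUs sampled: {len(freqs)}")
print(f"median frequency: {median_mhz:.0f} MHz")
print(f"CPUs above {HIGH_MHZ} MHz: {high_cpus}")
```

On this sample the median is about 3.1GHz with two logical CPUs (26 and 27) still near 5.0GHz, which matches the behavior described in the Notes on clock frequencies.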
W-3175 data snippets
from /proc/cpuinfo
processor : 55
vendor_id : GenuineIntel
cpu family : 6
model : 85
model name : Intel(R) Xeon(R) W-3175X CPU @ 3.10GHz
stepping : 4
microcode : 0x2000059
cpu MHz : 3800.392
cache size : 39424 KB
physical id : 0
siblings : 56
core id : 30
cpu cores : 28
apicid : 61
initial apicid : 61
fpu : yes
fpu_exception : yes
cpuid level : 22
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx
fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts
rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est
tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes
xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single pti
ssbd mba ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2
smep bmi2 erms invpcid rtm cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb
intel_pt avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc
cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts hwp hwp_act_window hwp_pkg_req pku ospke
flush_l1d
bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf
bogomips : 6200.00
clflush size : 64
cache_alignment : 64
address sizes : 46 bits physical, 48 bits virtual
power management:
kinghorn@utest:~/projects/benchmarks/linpack$ ./runme_xeon64
Current date/time: Fri Feb 15 05:17:29 2019
CPU frequency: 4.289 GHz
Number of CPUs: 1
Number of cores: 28
Number of threads: 28
Parameters are set to:
Number of tests: 10
Number of equations to solve (problem size) : 10000 15000 18000 20000 22000 25000 26000 27000 30000 110016
Leading dimension of array : 10000 15000 18008 20016 22008 25000 26000 27000 30000 110016
Number of trials to run : 2 2 2 2 1 1 1 1 1 1
Data alignment value (in Kbytes) : 4 4 4 4 4 4 4 4 1 1
Maximum memory requested that can be used=96830363392, at the size=110016
=================== Timing linear equation system solver ===================
Size LDA Align. Time(s) GFlops Residual Residual(norm) Check
10000 10000 4 0.456 1463.3087 1.051521e-10 3.707768e-02 pass
10000 10000 4 0.450 1482.7231 1.051521e-10 3.707768e-02 pass
15000 15000 4 1.399 1608.9285 2.253401e-10 3.549145e-02 pass
15000 15000 4 1.395 1613.5717 2.253401e-10 3.549145e-02 pass
18000 18008 4 2.431 1599.5280 2.774894e-10 3.038850e-02 pass
18000 18008 4 2.430 1600.4747 2.774894e-10 3.038850e-02 pass
20000 20016 4 3.459 1542.0288 3.665729e-10 3.244973e-02 pass
20000 20016 4 3.459 1541.9453 3.665729e-10 3.244973e-02 pass
22000 22008 4 4.408 1610.6650 4.682967e-10 3.430089e-02 pass
25000 25000 4 6.509 1600.4551 5.435008e-10 3.090695e-02 pass
26000 26000 4 7.131 1643.3722 5.904530e-10 3.104779e-02 pass
27000 27000 4 7.888 1663.8254 6.503383e-10 3.171380e-02 pass
30000 30000 1 10.731 1677.5287 8.712018e-10 3.434286e-02 pass
110016 110016 1 504.572 1759.4013 1.061083e-08 3.135573e-02 pass
Performance Summary (GFlops)
Size LDA Align. Average Maximal
10000 10000 4 1473.0159 1482.7231
15000 15000 4 1611.2501 1613.5717
18000 18008 4 1600.0013 1600.4747
20000 20016 4 1541.9871 1542.0288
22000 22008 4 1610.6650 1610.6650
25000 25000 4 1600.4551 1600.4551
26000 26000 4 1643.3722 1643.3722
27000 27000 4 1663.8254 1663.8254
30000 30000 1 1677.5287 1677.5287
110016 110016 1 1759.4013 1759.4013
Residual checks PASSED
End of tests
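The GFlops column can be cross-checked from the problem size and run time. This sketch assumes the conventional LINPACK operation count of 2/3 N^3 + 2 N^2 for solving a dense system. That formula is the standard convention rather than something printed in the output, and the helper names are hypothetical.

```python
def linpack_flops(n):
    """Conventional LINPACK operation count for a dense n x n solve (assumed)."""
    return (2.0 / 3.0) * n**3 + 2.0 * n**2


def gflops(n, seconds):
    """GFLOPS implied by solving an n x n system in the given wall time."""
    return linpack_flops(n) / seconds / 1e9


# (label, problem size, time in seconds, reported GFlops) from the runs in this appendix
runs = [
    ("i9 9990XE", 110016, 910.742, 974.7487),
    ("Xeon W-3175X", 110016, 504.572, 1759.4013),
]

for label, n, seconds, reported in runs:
    print(f"{label}: computed {gflops(n, seconds):.1f} GFLOPS, reported {reported:.1f}")
```

Both computed values agree with the reported GFlops to within rounding of the printed run time.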
kinghorn@utest:~/projects$ sudo cpupower monitor -m Mperf | sort -k2 -r
CPU | C0 | Cx | Freq
39| 0.99| 99.01| 3790
28| 0.47| 99.53| 3789
34| 0.14| 99.86| 3789
41| 0.07| 99.93| 3787
31| 0.06| 99.94| 3789
29| 0.06| 99.94| 3786
36| 0.05| 99.95| 3787
52| 0.03| 99.97| 3786
30| 0.02| 99.98| 3785
35| 0.02| 99.98| 3780
32| 0.02| 99.98| 3777
51| 0.01| 99.99| 3846
37| 0.01| 99.99| 3784
42| 0.01| 99.99| 3781
49| 0.01| 99.99| 3775
33| 0.01| 99.99| 3774
55| 0.01| 99.99| 3767
54| 0.01| 99.99| 3703
48| 0.00|100.00| 3756
40| 0.00|100.00| 3701
53| 0.00|100.00| 3640
46| 0.00|100.00| 3607
44| 0.00|100.00| 3577
45| 0.00|100.00| 3524
47| 0.00|100.00| 3515
43| 0.00|100.00| 3510
50| 0.00|100.00| 3500
38| 0.00|100.00| 3448
|Mperf
kinghorn@utest:~/projects$ sudo cpupower monitor -m Mperf | sort -k2 -r
CPU | C0 | Cx | Freq
28| 2.98| 97.02| 2792
39| 0.80| 99.20| 2792
34| 0.31| 99.69| 2792
44| 0.16| 99.84| 2792
41| 0.16| 99.84| 2791
36| 0.11| 99.89| 2793
29| 0.10| 99.90| 2788
52| 0.09| 99.91| 2794
42| 0.09| 99.91| 2785
30| 0.07| 99.93| 2787
45| 0.06| 99.94| 2787
46| 0.06| 99.94| 2783
37| 0.04| 99.96| 3264
35| 0.04| 99.96| 2799
50| 0.03| 99.97| 2796
32| 0.03| 99.97| 2766
53| 0.02| 99.98| 2824
48| 0.02| 99.98| 2806
51| 0.02| 99.98| 2787
31| 0.02| 99.98| 2784
54| 0.02| 99.98| 2777
43| 0.02| 99.98| 2775
55| 0.01| 99.99| 2808
40| 0.01| 99.99| 2754
38| 0.01| 99.99| 2751
33| 0.01| 99.99| 2733
49| 0.00|100.00| 2784
47| 0.00|100.00| 2765
|Mperf