Intel vs NVIDIA, IBM, Mellanox, AMD and everybody!

The next 18 months are going to see more shakeup and fragmentation in the computing world than we have seen in over a decade. Intel is pulling more and more of the compute architecture onto a single piece of silicon and tightly integrating the whole hardware stack. That’s good and bad. It may let them achieve better performance. However, it is going to leave users with a choice of “all Intel” or something else entirely. And the “something else” is starting to seriously take shape.

Intel Xeon E5 v3 Haswell-EP Buyers Guide

The new Xeon E5 v3 Haswell processors are here, all 30+ of them! There is a bewildering variety of clock speeds, core counts, and power usage. Processors in the new v3 family range from the single-socket E5-1620v3 with 4 cores at 3.5 GHz to the dual-socket E5-2699v3 with 18 cores at 2.3 GHz. How do you make a choice for a new system?

How do these new processors perform when your program’s parallel scaling is less than perfect?
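One rough way to think about that question is Amdahl's law. The sketch below is only an illustration, not the method used in the guide: it takes the two processors named above (4 cores at 3.5 GHz and 18 cores at 2.3 GHz) and assumes performance is simply clock speed times the Amdahl speedup. It ignores turbo, memory bandwidth, cache and socket count. The function names and the parallel fractions are made up for the example.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Amdahl's law: speedup over one core when only part of the work scales."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


def relative_throughput(parallel_fraction, cores, clock_ghz):
    """Crude score: clock speed times Amdahl speedup (an assumed model)."""
    return clock_ghz * amdahl_speedup(parallel_fraction, cores)


if __name__ == "__main__":
    # Core counts and clocks are the two extremes quoted in the text above.
    chips = {"E5-1620v3": (4, 3.5), "E5-2699v3": (18, 2.3)}
    for p in (0.50, 0.90, 0.99):  # hypothetical parallel fractions
        scores = {
            name: relative_throughput(p, cores, ghz)
            for name, (cores, ghz) in chips.items()
        }
        best = max(scores, key=scores.get)
        line = ", ".join(f"{name}: {score:.2f}" for name, score in scores.items())
        print(f"parallel fraction {p:.2f} -> {line} (higher: {best})")
```

Under this simple model, the fewer, faster cores come out ahead when half the work is serial, and the high core count part wins once the code is almost fully parallel.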

Xeon E5 v3 Haswell-EP Performance — Linpack

The Intel Xeon E5 v3 Haswell-EP processors are here. The floating point performance on these new processors is outstanding. We run a Linpack benchmark on a dual Xeon E5-2687W v3 system and show how it stacks up against several other processors.
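Linpack results are usually judged against a theoretical peak. As a back-of-the-envelope sketch, peak double precision GFLOPS can be estimated as cores × clock (GHz) × FLOPs per cycle per core. The value of 16 FLOPs per cycle used here is an assumption (two 256-bit FMA units per Haswell core) and is not taken from the post. The example uses the E5-2699v3 core count and clock quoted in the buyers guide above, and sustained clocks under heavy AVX load are typically lower than the nominal value. The helper names are hypothetical.

```python
def peak_gflops(cores, clock_ghz, flops_per_cycle=16):
    """Theoretical peak GFLOPS; flops_per_cycle=16 is an assumed Haswell value."""
    return cores * clock_ghz * flops_per_cycle


def efficiency(measured_gflops, peak):
    """Fraction of theoretical peak actually achieved by a benchmark run."""
    return measured_gflops / peak


if __name__ == "__main__":
    # 18 cores at 2.3 GHz: the E5-2699v3 figures quoted in the text above.
    per_socket = peak_gflops(cores=18, clock_ghz=2.3)
    print(f"estimated peak per socket: {per_socket:.1f} GFLOPS")
    print(f"estimated peak, two sockets: {2 * per_socket:.1f} GFLOPS")
```

Dividing a measured Linpack number by this estimate with `efficiency` gives the "percent of peak" figure that benchmark write-ups often quote.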

Memory Performance for Intel Xeon Haswell-EP DDR4

Memory bandwidth is often an important factor for compute- or data-intensive workloads. The STREAM benchmark has been used for many years as a measure of this bandwidth. We present STREAM results for the new Xeon E5 v3 Haswell processor with DDR4 memory and compare them with a Xeon E5 v2 Ivy Bridge system.

LAMMPS Optimized for Intel on Quad Socket Xeon

LAMMPS is a molecular dynamics program capable of running very large (billions of atoms) dynamics simulations. It is modular, with many contributed packages that add extra potential energy functions, atom types, etc. A recently added package, USER-INTEL, adds some nice code optimizations for Intel Xeon hardware. We grabbed the latest source code, did a build with this new code, fired it up on our quad Xeon test system, and got very good performance.

OpenFOAM performance on Quad socket Xeon and Opteron

OpenFOAM is a collection of programs and libraries for computational fluid dynamics (CFD) and general dynamical modelling, with many solver types. It can give linear scaling and excellent parallel performance on quad-socket many-core systems. Read on to see performance on a 40-core Xeon and a 48-core Opteron system.

Why quad Xeon? 95% of peak LINPACK on 40 cores!

I’ve been doing application performance testing on our quad-socket systems and I am especially liking the quad Xeon box on our test bench. I realized that I haven’t published any LINPACK performance numbers for this system (that’s my favorite benchmark). I’ll show the results for the Intel optimized multi-threaded binary that is included with Intel MKL and do a compile from source using Open MPI. It turns out that both OpenMP threads and MPI processes give outstanding, near theoretical peak performance. Building from source hopefully shows that it’s not just Intel “magic” that leads to this performance … although I guess it really is.

POV-ray on Quad Xeon and Opteron

POV-ray is an open source ray tracing package with a long history. It has been a favorite system performance testing package since its inception because of the heavy load it places on the CPU. It has had an SMP parallel implementation since the mid-2000s and is often used as a multi-core CPU parallel performance benchmark on both Linux and Windows.

So let’s try it on our quad-socket many-core systems!