LAMMPS is a molecular dynamics program capable of running very large (billions of atoms) dynamics simulations. It is modular, with many contributed packages that add extra potential energy functions, atom types, etc. A recently added package, USER-INTEL, adds some nice code optimizations for Intel Xeon hardware. We grabbed the latest source code, built it with this new package, fired it up on our quad Xeon test system, and got very good performance.
OpenFOAM is a collection of programs and libraries for computational fluid dynamics (CFD) and general dynamical modelling, with many solver types. It can give linear scaling and excellent parallel performance on quad socket many-core systems. Read on to see performance on a 40-core Xeon system and a 48-core Opteron system.
I’ve been doing application performance testing on our quad socket systems, and I am especially liking the quad Xeon box on our test bench. I realized that I haven’t published any LINPACK performance numbers for this system (that’s my favorite benchmark). I’ll show the results for the Intel optimized multi-threaded binary that is included with Intel MKL, and also for a build compiled from source using OpenMPI. It turns out that both OpenMP threads and MPI processes give outstanding performance, close to the theoretical peak. Building from source hopefully shows that it’s not just Intel “magic” that leads to this performance … although I guess it really is.
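For readers who want to see what "theoretical peak" means in practice, here is a minimal sketch of the usual back-of-the-envelope estimate: sockets × cores per socket × clock rate × double-precision FLOPs per cycle per core. The function names and every number in the example call are placeholders of mine, not specifications or measurements from the test system.

```python
def theoretical_peak_gflops(sockets, cores_per_socket, clock_ghz, flops_per_cycle):
    """Estimate peak double-precision GFLOP/s for a multi-socket system.

    flops_per_cycle is the number of double-precision floating point
    operations one core can retire per clock cycle. It depends on the
    vector width of the CPU and must be looked up for the actual part.
    """
    return sockets * cores_per_socket * clock_ghz * flops_per_cycle


def linpack_efficiency(measured_gflops, peak_gflops):
    """Fraction of theoretical peak that a LINPACK run achieved."""
    if peak_gflops <= 0:
        raise ValueError("peak_gflops must be positive")
    return measured_gflops / peak_gflops


if __name__ == "__main__":
    # Hypothetical values for illustration only: 4 sockets, 10 cores each,
    # a 2.0 GHz clock, and 8 double-precision FLOPs per cycle per core.
    peak = theoretical_peak_gflops(4, 10, 2.0, 8)
    print(f"Estimated peak: {peak:.1f} GFLOP/s")

    # measured is a made-up number, not a benchmark result.
    measured = 500.0
    print(f"Efficiency: {linpack_efficiency(measured, peak):.1%}")
```

Plug in the real core count, sustained clock, and per-cycle FLOP rate for your own processor. The clock under a heavy vector load can differ from the nominal one, so treat the result as an upper bound rather than a prediction.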
POV-ray is an open source ray tracing package with a long history. It has been a favorite system performance testing package since its inception because of the heavy load it places on the CPU. It has had an SMP parallel implementation since the mid-2000s and is often used as a multi-core CPU parallel performance benchmark on both Linux and Windows.
So let’s try it on our quad socket many-core systems!
Hyper-Threading, hyperthreading, or just HT for short, has been around on Intel processors for over a decade and it still confuses people. I’m not going to do much to help with the confusion. I just want to point out an example that surprised me from some recent testing with the ray-tracing application POV-ray. Hyper-Threading dramatically lowered the performance of parallel POV-ray runs on a multi-core test system running Windows.
We take a look at quad Xeon and quad Opteron performance and parallel scaling with Zemax OpticStudio, including an analysis using Amdahl’s Law. Based on this analysis, we then make performance predictions for other processors.
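As a quick illustration of the kind of analysis mentioned here, Amdahl’s Law says that if a fraction P of a job’s run time parallelizes perfectly, the speedup on n cores is 1 / ((1 − P) + P / n). The sketch below is mine, not code from the post; the function names and the parallel fraction of 0.95 are assumptions chosen only to show the shape of the curve.

```python
def amdahl_speedup(parallel_fraction, n_cores):
    """Speedup predicted by Amdahl's Law for n_cores.

    parallel_fraction is the share of single-core run time that
    parallelizes perfectly (between 0 and 1); the rest stays serial.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if n_cores < 1:
        raise ValueError("n_cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_cores)


def predicted_runtime(single_core_seconds, parallel_fraction, n_cores):
    """Run time on n_cores, given the measured single-core run time."""
    return single_core_seconds / amdahl_speedup(parallel_fraction, n_cores)


if __name__ == "__main__":
    # Hypothetical parallel fraction, not a value fitted to real data.
    p = 0.95
    for cores in (1, 10, 20, 40, 48):
        print(f"{cores:3d} cores: speedup {amdahl_speedup(p, cores):6.2f}x")
```

In a real analysis you would fit P to measured run times at several core counts and then use that fitted value to estimate performance at core counts you have not tested. The model ignores clock speed and memory bandwidth differences, so predictions across different processors need extra scaling.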
Need the most compute capability you can get in a single box for a well-written, multithreaded application? We’ll take a look at one such application, Zemax OpticStudio14, running on a quad socket Ivy Bridge Xeon system. Performance was excellent!