OK, you got one of the Intel “fire sale / crazy Eddie sale” Xeon Phi 31S1P cards … now what? I'll give you some tips on how to get this thing working!
The new Xeon E5 v3 Haswell processors are here, all 30+ of them! There is a bewildering variety of clock speeds, core counts, and power usage. Processors in the new v3 family range from the single-socket E5-1620 v3 with 4 cores at 3.5 GHz to the dual-socket E5-2699 v3 with 18 cores at 2.3 GHz. How do you choose for a new system?! How do these new processors perform when your program's parallel scaling is less than perfect?
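One rough way to frame that question is Amdahl's Law combined with clock speed. The sketch below is a minimal illustration, assuming single-core performance scales linearly with base clock and ignoring turbo, memory bandwidth, and AVX frequency effects. The parallel fractions are arbitrary example values, not measurements, and the function names are hypothetical.

```python
def amdahl_speedup(p: float, n: int) -> float:
    """Amdahl's Law: speedup on n cores when a fraction p of the work is parallel."""
    return 1.0 / ((1.0 - p) + p / n)


def relative_performance(p: float, cores: int, ghz: float) -> float:
    """Crude figure of merit: Amdahl speedup scaled by clock speed.

    Assumes single-core performance is proportional to clock speed, which
    ignores turbo boost, memory bandwidth, and per-clock differences.
    """
    return ghz * amdahl_speedup(p, cores)


# Core counts and base clocks as quoted above (one processor each).
processors = {
    "E5-1620 v3": (4, 3.5),
    "E5-2699 v3": (18, 2.3),
}

if __name__ == "__main__":
    for p in (0.5, 0.9, 0.99):
        scores = {name: relative_performance(p, c, f) for name, (c, f) in processors.items()}
        best = max(scores, key=scores.get)
        summary = ", ".join(f"{k} {v:.1f}" for k, v in scores.items())
        print(f"parallel fraction {p:.2f}: {summary} -> {best}")
```

Under these assumptions the 4-core, high-clock part comes out ahead when only half the work is parallel, while the 18-core part wins by a wide margin once the parallel fraction approaches 0.99.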
Sales Consultant Jeff Stubbers recently took home an Asus 4K monitor for personal use, and he liked it so much that he wrote a blog post about it.
The Intel Xeon E5 v3 Haswell EP processors are here, and their floating point performance is outstanding. We run the Linpack benchmark on a dual Xeon E5-2687W v3 system and show how it stacks up against several other processors.
Memory bandwidth is often an important factor for compute- or data-intensive workloads. The STREAM benchmark has been used for many years as a measure of this bandwidth. We present STREAM results for the new Xeon E5 v3 Haswell processor with DDR4 memory and compare them with a Xeon E5 v2 Ivy Bridge system.
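STREAM reports bandwidth by counting the bytes each kernel is defined to move and dividing by the best measured time. The sketch below shows that bookkeeping for 8-byte doubles, plus the theoretical DRAM peak it is usually compared against. The DDR4-2133, four-channels-per-socket, two-socket configuration is an assumption for illustration, not necessarily the configuration of the tested system, and the helper names are hypothetical.

```python
# Bytes STREAM counts per loop iteration for 8-byte (double precision) arrays.
# Write-allocate traffic is not counted, matching STREAM's convention.
BYTES_PER_ITER = {
    "Copy": 2 * 8,   # c[j] = a[j]
    "Scale": 2 * 8,  # b[j] = scalar * c[j]
    "Add": 3 * 8,    # c[j] = a[j] + b[j]
    "Triad": 3 * 8,  # a[j] = b[j] + scalar * c[j]
}


def stream_bandwidth_gbs(kernel: str, n: int, seconds: float) -> float:
    """Bandwidth in GB/s (1e9 bytes/s) for one kernel pass over arrays of length n."""
    return BYTES_PER_ITER[kernel] * n / seconds / 1e9


def peak_dram_bandwidth_gbs(channels: int, mt_per_s: int, sockets: int = 1) -> float:
    """Theoretical peak: each 64-bit DDR channel moves 8 bytes per transfer."""
    return sockets * channels * mt_per_s * 1e6 * 8 / 1e9


if __name__ == "__main__":
    # Hypothetical configuration: DDR4-2133, 4 channels per socket, 2 sockets.
    peak = peak_dram_bandwidth_gbs(channels=4, mt_per_s=2133, sockets=2)
    print(f"theoretical peak: {peak:.1f} GB/s")
```

Dividing a measured Triad figure by this peak gives the fraction of theoretical bandwidth the system actually sustains, which is usually the more useful number when comparing memory generations.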
Posted on August 29, 2014 by Dr Donald Kinghorn
The new Intel Haswell-E desktop Core i7 processors are out! We look at how the Core i7-5960X and 5930K stack up against some other processors for numerical computing with the Intel-optimized MKL Linpack benchmark.
LAMMPS is a molecular dynamics program capable of running very large (billions of atoms) dynamics simulations. It is modular, with many contributed packages that add extra potential energy functions, atom types, etc. A recently added package, USER-INTEL, adds some nice code optimizations for Intel Xeon hardware. We grabbed the latest source code, built it with the new package, fired it up on our quad Xeon test system, and got very good performance.
Posted on August 5, 2014 by Dr Donald Kinghorn
OpenFOAM is a collection of programs and libraries for computational fluid dynamics (CFD) and general dynamical modelling, with many solver types. It can give linear scaling and excellent parallel performance on quad-socket many-core systems. Read on to see performance on a 40-core Xeon system and a 48-core Opteron system.
I’ve been doing application performance testing on our quad-socket systems, and I especially like the quad Xeon box on our test bench. I realized that I haven’t published any LINPACK performance numbers for this system (that’s my favorite benchmark). I’ll show the results for the Intel-optimized multi-threaded binary that is included with Intel MKL, and for a build from source using OpenMPI. It turns out that both OpenMP threads and MPI processes give outstanding, near theoretical peak performance. Building from source hopefully shows that it’s not just Intel “magic” that leads to this performance … although I guess it really is.
POV-ray is an open source ray tracing package with a long history. It has been a favorite system performance testing package since its inception because of the heavy load it places on the CPU. It has had an SMP parallel implementation since the mid-2000s and is often used as a multi-core CPU parallel performance benchmark on both Linux and Windows. So let’s try it on our quad-socket many-core systems!
Posted on July 2, 2014 by Dr Donald Kinghorn
Hyper-Threading, hyperthreading, or just HT for short, has been around on Intel processors for over a decade, and it still confuses people. I’m not going to do much to help with the confusion. I just want to point out an example from some recent testing with the ray-tracing application POV-ray that surprised me: Hyper-Threading dramatically lowered POV-ray’s parallel performance on a multi-core test system running Windows.
I’m going to walk you through a basic install and configuration for a development system to do CUDA and OpenACC GPU programming. This is not a detailed how-to, but if you have some Linux admin skills it will be a reasonable guide to get you started. We’ll do a basic NVIDIA GPU programming setup including CentOS 6.5, the CUDA development environment, and a PGI compiler setup with OpenACC. The most interesting part may be the OpenACC setup. OpenACC is a relatively new option for GPU programming and allows for a directive (pragma) based coding model.
We take a look at Quad Xeon and Quad Opteron performance and parallel scaling with Zemax OpticStudio including an analysis using Amdahl's Law. Based on this analysis we then make performance predictions for other processors.
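The core of that kind of analysis can be sketched in a few lines. Rearranging Amdahl's Law, 1/S = (1 - P) + P/n, gives 1 - 1/S = P(1 - 1/n), which is linear in the parallel fraction P. So P can be estimated from several measured speedups with a one-parameter least-squares fit and then used to predict speedup at other core counts. This is a minimal sketch of that approach, not necessarily the exact procedure used in the post; the function names are hypothetical, and the "measurements" in the usage example are synthetic values generated from P = 0.95, not Zemax results.

```python
def fit_parallel_fraction(measurements: dict[int, float]) -> float:
    """Least-squares estimate of Amdahl's parallel fraction P.

    measurements maps core count n -> measured speedup S(n) relative to 1 core.
    Uses the linear form 1 - 1/S = P * (1 - 1/n) and fits P through the origin.
    """
    pairs = [(1.0 - 1.0 / n, 1.0 - 1.0 / s) for n, s in measurements.items()]
    denom = sum(x * x for x, _ in pairs)
    if denom == 0.0:
        raise ValueError("need at least one measurement with more than one core")
    return sum(x * y for x, y in pairs) / denom


def predict_speedup(p: float, n: int) -> float:
    """Amdahl's Law prediction for n cores given parallel fraction p."""
    return 1.0 / ((1.0 - p) + p / n)


if __name__ == "__main__":
    # Synthetic data generated from P = 0.95, standing in for real timings.
    synthetic = {n: predict_speedup(0.95, n) for n in (1, 2, 4, 8, 16)}
    p = fit_parallel_fraction(synthetic)
    print(f"fitted P = {p:.4f}; predicted speedup on 40 cores: {predict_speedup(p, 40):.1f}")
```

With real timings the fitted P absorbs noise and any non-Amdahl effects (memory contention, turbo changes), so predictions far beyond the measured core counts deserve some caution.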
NVIDIA Tesla K20 plus PGI Accelerator compilers with OpenACC, bundled as a package deal with a system. Yes, it's official. If you've wanted to do some development work with OpenACC on Tesla, this is a nice way to get started: a heavily discounted K20 and the PGI compiler package preloaded on a Peak Mini.
Here's a quick look at CUDA performance on the NVIDIA Jetson Tegra K1 developer board.
Posted on April 23, 2014 by Dr Donald Kinghorn
Need the most compute capability you can get in a single box for a well-written, multithreaded application? We’ll take a look at one such application, Zemax OpticStudio 14, running on a quad-socket Ivy Bridge Xeon system. Performance was excellent!
The annual Northwest pilgrimage for the Linux faithful to the Bellingham Technical College in Bellingham, WA is nearly upon us! Puget Systems is donating a great machine to the raffle, a Serenity mini with a commemorative case etching!
Where is NVIDIA heading with High Performance Computing hardware? Ever since Intel announced Xeon Phi Knights Landing as a stand-alone processor integrated at the board level as a full compute unit, I've been wondering what NVIDIA would do along these lines. It just makes sense that they would do something similar since getting the GPU off of the PCIe bus and tightly integrated with plentiful system memory would be a huge step forward for usability and performance. Here's my guess about where NVIDIA is heading.
I had the pleasure of attending the NVIDIA GPU Technology Conference (GTC) last week. Wonderful conference! If you have any doubts about the quality of the conference, you are in luck: they have most of the content online, so you can check it out yourself ...
How does the Ivy Bridge-E Core i7-4960X (Extreme Edition) do against the Haswell Core i7-4770 running the Linpack benchmark? The Ivy Bridge-E 4960X is a great processor: 6 cores, 4GHz max turbo clock, 4 memory channels, 40 PCIe lanes, big price tag ... However, the humble Haswell 4770 has its AVX2 and FMA3 secret weapons, which are really effective on linear/matrix algebra types of numerical computing problems. ...
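Why FMA matters shows up in the theoretical peak double-precision rate: cores × clock × FLOPs per cycle. Ivy Bridge can issue one 256-bit AVX add and one 256-bit AVX multiply per cycle (8 double-precision FLOPs), while Haswell's two 256-bit FMA units give 16. The sketch below assumes base clocks of 3.6 GHz for the 4960X and 3.4 GHz for the 4770 (from Intel's published specifications, but worth double-checking) and ignores turbo and AVX frequency behavior; measured Linpack results land somewhat below these peaks. The function and constant names are hypothetical.

```python
def peak_dp_gflops(cores: int, ghz: float, flops_per_cycle: int) -> float:
    """Theoretical peak double-precision GFLOPS = cores * clock (GHz) * FLOPs/cycle/core."""
    return cores * ghz * flops_per_cycle


# Double-precision FLOPs per cycle per core:
#   Ivy Bridge: one 256-bit AVX add + one 256-bit AVX multiply = 4 + 4 = 8
#   Haswell:    two 256-bit FMA units, each 4 doubles x 2 ops    = 16
IVY_BRIDGE_DP = 8
HASWELL_DP = 16

if __name__ == "__main__":
    # Assumed base clocks; turbo and AVX clock behavior are ignored.
    ivb = peak_dp_gflops(cores=6, ghz=3.6, flops_per_cycle=IVY_BRIDGE_DP)  # Core i7-4960X
    hsw = peak_dp_gflops(cores=4, ghz=3.4, flops_per_cycle=HASWELL_DP)     # Core i7-4770
    print(f"i7-4960X peak ~ {ivb:.1f} GFLOPS, i7-4770 peak ~ {hsw:.1f} GFLOPS")
```

Under these assumptions the 4-core Haswell's peak (about 218 GFLOPS) exceeds the 6-core Ivy Bridge-E's (about 173 GFLOPS), which is why the comparison is much closer than the core counts suggest.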
By now, most folks have seen Apple’s updated Mac Pro, or as I like to call it, the trash can. I kid, I kid! In all seriousness, though, we are often asked how our workstations, like the Genesis line, compare to the hardware Apple has put in the new, miniature Mac Pro. Read on to find out...
The NVIDIA Tesla accelerator is a well-established workhorse for many useful and important High Performance Computing applications, and we are happy to be able to provide Tesla acceleration for our "Peak" systems. The developer ecosystem around CUDA is well established; however, at Puget Systems we believe a new round of developer interest is on the horizon, catalyzed by the soon-to-be-released 6.x series of the CUDA platform, advances with OpenACC, new libraries, new hardware, and, perhaps significantly, NVIDIA's acquisition of The Portland Group and their excellent compilers and tools for working with Tesla. So, I've loaded up a Peak mini with a Tesla K40, and I'm ready to give Tesla programming a fresh look.
If you are thinking about getting a system for doing development work targeting the Intel Xeon Phi but hesitated because of the additional cost of the Intel developer tools you would need, then you should get a system with the "Xeon Phi developers starter kit". The savings on the Intel tools can completely offset the cost of the base system. It's a serious bargain!
Can you use the new RHEL/CentOS 6.5 release with the Xeon Phi ... yes! But, there is a gotcha that we will need to work around. Read on.
Unlike desktop computers, which sport large cases and ample power and generally remain stationary, laptops can be confusing to contrast and compare. Desktop computers are typically much easier to upgrade than laptops, so when you’re selecting a laptop, it’s a good idea to ensure it includes the level of performance you require from day one. For example, the graphics card that’s used to power games or render complex 3D objects can be simple to upgrade in a desktop computer, whereas you may have few, if any, options to upgrade the graphics performance on your laptop. Swapping out CPUs and drives can be done, but again, you usually have fewer options than with a desktop PC.