NVIDIA's GeForce GTX Titan X isn't for everyone - no $1000 video card ever will be - but there are some very specific roles where it excels. Click here to read about what the Titan X is and what it does well!
Modern high-end laptops can be treated as desktop system replacements, so it's expected that people will want to do some serious computing on them. GPU accelerated computing on a laptop is possible, and performance can be surprisingly good with a high-end NVIDIA GPU [I'm looking at the GTX 980M and 970M]. However, first you have to get it to work! Optimus technology can present serious problems for someone who wants to run a Linux-based CUDA laptop computing platform. Read on to see what worked.
Posted on March 2, 2015 by Dr Donald Kinghorn
The next 18 months are going to see more shakeup and fracturing in the computing world than we have seen in over a decade. Intel is pulling more and more of the compute architecture onto a single piece of silicon and tightly integrating the whole hardware stack. That's good and bad. It may let them achieve better performance. However, it is going to leave users with a choice of “all Intel” or something else entirely. And the “something else” is starting to seriously take shape.
OK, you got one of the Intel “fire sale / Crazy Eddie sale” Xeon Phi 31S1P cards … now what? I'll give you some tips on how to get this thing working!
The new Xeon E5 v3 Haswell processors are here, all 30+ of them! There is a bewildering variety of clock speeds, core counts, and power usage. Processors in the new v3 family range from the single-socket E5-1620v3 with 4 cores at 3.5 GHz to the dual-socket E5-2699v3 with 18 cores at 2.3 GHz. How do you make a choice for a new system?! How do these new processors perform when your program's parallel scaling is less than perfect?
Sales Consultant Jeff Stubbers recently took home an Asus 4K monitor for personal use, and he liked it so much that he wrote a blog post about it.
The Intel Xeon E5 v3 Haswell EP processors are here. The floating point performance on these new processors is outstanding. We run a Linpack benchmark on a dual Xeon E5-2687W v3 system and show how it stacks up against several processors.
Memory bandwidth is often an important factor for compute- or data-intensive workloads. The STREAM benchmark has been used for many years as a measure of this bandwidth. We present STREAM results for the new Xeon E5 v3 Haswell processor with DDR4 memory and compare them with a Xeon E5 v2 Ivy Bridge system.
Posted on August 29, 2014 by Dr Donald Kinghorn
The new Intel desktop Core i7 processors are out, Haswell E! We look at how the Core i7 5960X and 5930K stack up with some other processors for numerical computing with the Intel optimized MKL Linpack benchmark.
LAMMPS is a molecular dynamics program capable of running very large (billions of atoms) dynamics simulations. It is modular, with many contributed packages that add extra potential energy functions, atom types, etc. A package called USER-INTEL was recently added that provides some nice code optimizations for Intel Xeon hardware. We grabbed the latest source code, did a build with this new code, fired it up on our quad Xeon test system, and got very good performance.
Posted on August 5, 2014 by Dr Donald Kinghorn
OpenFOAM is a collection of programs and libraries for computational fluid dynamics, CFD, and general dynamical modelling with many solver types. It can give linear scaling and excellent parallel performance on Quad socket many-core systems. Read on to see performance on a 40-core Xeon and 48-core Opteron system.
I’ve been doing application performance testing on our quad socket systems and I am especially liking the quad Xeon box on our test bench. I realized that I haven’t published any LINPACK performance numbers for this system (that’s my favorite benchmark). I’ll show the results for the Intel-optimized multi-threaded binary that is included with Intel MKL and do a compile from source using OpenMPI. It turns out that both OpenMP threads and MPI processes give outstanding, near theoretical peak performance. Building from source hopefully shows that it’s not just Intel “magic” that leads to this performance … although I guess it really is.
POV-ray is an open source ray tracing package with a long history. It has been a favorite system performance testing package since its inception because of the heavy load it places on the CPU. It has had an SMP parallel implementation since the mid-2000s and is often used as a multi-core CPU parallel performance benchmark on both Linux and Windows. So let’s try it on our quad socket many-core systems!
Posted on July 2, 2014 by Dr Donald Kinghorn
Hyper-Threading, hyperthreading, or just HT for short, has been around on Intel processors for over a decade and it still confuses people. I’m not going to do much to help with the confusion. I just want to point out an example from some testing I was doing recently with the ray-tracing application POV-ray that surprised me. Hyper-threading dramatically lowered the performance on a multi-core test system running Windows when running POV-ray in parallel.
I’m going to walk you through a basic install and configuration of a development system for CUDA and OpenACC GPU programming. This is not a detailed howto, but if you have some Linux admin skills it will be a reasonable guide to get you started. We’ll do a basic NVIDIA GPU programming setup including CentOS 6.5, the CUDA development environment, and a PGI compiler setup with OpenACC. The most interesting part may be the OpenACC setup. OpenACC is a relatively new option for GPU programming and allows for a directive (pragma) based coding model.
We take a look at Quad Xeon and Quad Opteron performance and parallel scaling with Zemax OpticStudio including an analysis using Amdahl's Law. Based on this analysis we then make performance predictions for other processors.
NVIDIA Tesla K20 plus PGI Accelerator compilers with OpenACC in a package deal with a system. Yes, it's official. If you've wanted to do some development work with OpenACC on Tesla, this is a nice way to get started: a heavily discounted K20 and PGI compiler package preloaded on a Peak Mini.
Here's a quick look at CUDA performance on the NVIDIA Jetson Tegra K1 developer board.
Posted on April 23, 2014 by Dr Donald Kinghorn
Need the most compute capability you can get in a single box for a well written, multithreaded application? We’ll take a look at one such application, Zemax OpticStudio 14, running on a quad socket Ivy Bridge Xeon system. Performance was excellent!
The annual Northwest pilgrimage for the Linux faithful to the Bellingham Technical College in Bellingham, WA is nearly upon us! Puget Systems is donating a great machine to the raffle, a Serenity mini with a commemorative case etching!
Where is NVIDIA heading with High Performance Computing hardware? Ever since Intel announced Xeon Phi Knights Landing as a stand-alone processor integrated at the board level as a full compute unit, I've been wondering what NVIDIA would do along these lines. It just makes sense that they would do something similar since getting the GPU off of the PCIe bus and tightly integrated with plentiful system memory would be a huge step forward for usability and performance. Here's my guess about where NVIDIA is heading.
I had the pleasure of attending the NVIDIA GPU Technology Conference (GTC) last week. Wonderful conference! If you have any doubts about the quality of the conference, you are in luck. They have most of the content online, so you can check it out yourself ...
How does the Ivy Bridge-E Core i7-4960X (Extreme Edition) do against the Haswell Core i7-4770 running the Linpack benchmark? The Ivy Bridge-E 4960X is a great processor -- 6 cores, 4 GHz max turbo clock, 4 memory channels, 40 PCIe lanes, big price tag ... However, the humble Haswell 4770 has its AVX2 and FMA3 secret weapons, which are really effective on linear/matrix algebra types of numerical computing problems. ...
By now, most folks have seen Apple’s updated Mac Pro - or as I like to call it, the trash can. I kid, I kid! In all seriousness, though, we are often asked how our workstations - like the Genesis line - compare to the hardware Apple has put in the new, miniature Mac Pro. Read on to find out...
The NVIDIA Tesla accelerator is a well established workhorse for many useful and important High Performance Computing applications, and we are happy to be able to provide Tesla acceleration for our "Peak" systems. The developer ecosystem around CUDA is well established; however, at Puget Systems we believe there is a new round of developer interest on the horizon that will be catalyzed by the soon-to-be-released 6.x series of the CUDA platform, advances with OpenACC, new libraries, new hardware, and, perhaps most significantly, NVIDIA's acquisition of The Portland Group and their excellent compilers and tools for working with Tesla. So, I've loaded up a Peak Mini with a Tesla K40 and I'm ready to give Tesla programming a fresh look.