AMD Threadripper 3970x 32-core! The third new AMD processor I’ve had the pleasure of trying recently. I’m running it through the same double precision floating point performance tests as the recently tested Ryzen processors: Linpack and NAMD.
AMD Ryzen 3950x Compute Performance Linpack and NAMD
The much-anticipated AMD Ryzen 3950x 16-core processor is out! As always, the first thing I wanted to know was its double precision floating point performance. My two favorite applications for a “first look” at a new CPU are Linpack and NAMD.
AMD 3900X (Brief) Compute Performance Linpack and NAMD
I was able to spend a little time with an AMD Ryzen 3900X. Of course, the first thing I wanted to know was the double precision floating point performance. My two favorite applications for a “first look” at a new processor are Linpack and NAMD. The Ryzen 3900X is a pretty impressive processor!
PyTorch for Scientific Computing – Quantum Mechanics Example Part 4) Full Code Optimizations — 16000 times faster on a Titan V GPU
This post presents the code optimizations that deliver the 16,000 times speedup for the PyTorch Quantum Mechanics scientific computing example. The following quote says a lot,
“The big magic is that on the Titan V GPU, with batched tensor algorithms, those million terms are all computed in the same time it would take to compute 1!!!”
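As a minimal sketch of what a “batched” tensor operation looks like in PyTorch (the sizes here are illustrative, not the post’s actual problem): one `torch.bmm` call multiplies every matrix pair in the batch at once, which is what lets a GPU amortize kernel-launch overhead across the whole batch instead of paying it per matrix.

```python
import torch

# A batch of 1,000 independent 4x4 matrix products, computed in one call.
a = torch.randn(1000, 4, 4)
b = torch.randn(1000, 4, 4)
c = torch.bmm(a, b)  # shape (1000, 4, 4)

# Same result as a Python loop over the batch, but the loop runs inside
# a single fused operation -- the key to the "million terms in the time
# of 1" behavior on a GPU.
c_loop = torch.stack([a[i] @ b[i] for i in range(1000)])
assert torch.allclose(c, c_loop, atol=1e-5)
```

On a CUDA device the same code runs after moving `a` and `b` with `.to("cuda")`.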
PyTorch for Scientific Computing – Quantum Mechanics Example Part 3) Code Optimizations – Batched Matrix Operations, Cholesky Decomposition and Inverse
An amazing result in this testing is that “batched” code ran in constant time on the GPU. That means that doing the Cholesky decomposition on 1 million matrices took the same amount of time as it did with 10 matrices!
In this post we start looking at performance optimization for the Quantum Mechanics problem/code presented in the first 2 posts. This begins delivering on the promise to make the code over 15,000 times faster! I still find the speedup hard to believe, but it turns out little things can make a big difference.
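A minimal sketch of the batched Cholesky decomposition in PyTorch (the batch size and matrix dimensions are illustrative; the posts’ actual matrices come from the quantum mechanics problem): `torch.linalg.cholesky` factors every matrix in the batch with a single call.

```python
import torch

# Build a batch of 10 random symmetric positive-definite 8x8 matrices.
# (A @ A.T + n*I is guaranteed SPD.)
n, batch = 8, 10
A = torch.randn(batch, n, n, dtype=torch.float64)
spd = A @ A.transpose(-2, -1) + n * torch.eye(n, dtype=torch.float64)

# One call factors the whole batch -- no Python loop over matrices.
L = torch.linalg.cholesky(spd)

# Verify: L @ L.T reconstructs each matrix in the batch.
err = (L @ L.transpose(-2, -1) - spd).abs().max()
assert err < 1e-8
```

On a GPU the same call runs after `spd = spd.to("cuda")`; scaling the batch from 10 matrices toward 1 million is the experiment behind the constant-time observation above.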
PyTorch for Scientific Computing – Quantum Mechanics Example Part 2) Program Before Code Optimizations
This is the second post on using PyTorch for scientific computing. I’m doing an example from Quantum Mechanics. In this post we go through the formulas that need to be coded, write them up in PyTorch, and give everything a test.
Working around TDR in Windows for a better GPU computing experience
A brief description of graphics driver Timeout Detection and Recovery, why it can be problematic for intensive GPU codes, and how to work around it so that Windows can be a viable GPU computing platform.
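As a sketch of the kind of workaround involved (a registry config fragment, not a full recipe; the values below are illustrative and Microsoft-documented, but back up the registry and reboot after changing it):

```
Windows Registry Editor Version 5.00

; Raise the GPU timeout from the 2-second default to 60 seconds.
; TdrLevel=0 would disable TDR entirely (use with care).
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers]
"TdrDelay"=dword:0000003c
"TdrLevel"=dword:00000003
```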
Intel Python Preview
Intel is working on an optimized Python! I did a quick test with the preview version and it looks good.
GTX 980 Ti Linux CUDA performance vs Titan X and GTX 980
NVIDIA has just launched the GTX 980 Ti and I got to run some benchmarks on one. How is the Linux CUDA performance? Almost as good as the Titan X! This is another great card from NVIDIA for single precision compute loads. We’ve got some numbers to show it.
GTC 2015 Deep Learning and OpenPOWER
Another great GTC meeting. NVIDIA does this right! The most interesting aspects for me this year were the talks on “Deep Learning” (Artificial Neural Networks) and OpenPOWER. I have some observations and links to recordings of the keynotes and talks. Enjoy!