NVIDIA HPC future directions

Where is NVIDIA heading with HPC hardware?

Ever since Intel announced Xeon Phi "Knights Landing" as a stand-alone processor, integrated at the board level as a full compute unit, I've been wondering what NVIDIA would do along these lines. It just makes good sense that they would do something similar, since getting the GPU off of the PCIe bus and tightly integrated with plentiful system memory would be a huge step forward for usability and performance. Here's my guess about where NVIDIA is heading.

I’m not going to bore you with road-maps and hard facts. I’m going to dazzle you with wild unsubstantiated speculation and wishful thinking!

However, if you want some more concrete information, road-maps and the like, you really should check out NVIDIA CEO Jen-Hsun Huang’s keynote from the recent GTC conference. It’s really quite good! There was one thing that I wish he had announced, or at least hinted at more strongly, and that is what I’m going to talk about.

Intel is expected to launch the next generation of Xeon Phi, Knights Landing, sometime in 2015. What will NVIDIA come up with? NVIDIA has some great people and they are obviously not just sitting around designing video cards.

Here are a few of the rumored specs for Xeon Phi (Knights Landing):

Stand-alone processor (not a “coprocessor” anymore)
72 cores (Atom-based, with AVX-512 vector units)
3D stacked RAM on package
Integrated multi-channel DDR4 memory controller (up to 384GB?)
36 PCIe lanes, mostly for a future high-speed communication fabric
…

Now, here are some of the puzzle pieces on the NVIDIA side:

ARM: NVIDIA does ARM well with Tegra, and the “Jetson” developer board is a nice integration of an ARM Cortex-A15 CPU with 192 Kepler GPU cores. This is the year of 64-bit ARM and movement toward serious compute capability. ARM could become a usable HPC CPU for NVIDIA … maybe, maybe not, read on.
3D stacked memory has been announced for the Pascal GPU architecture. (This gets the high-speed memory onto the processor package, more like another layer of cache.)
NVLink: This is the CPU-GPU interconnect needed to eliminate the PCIe bottleneck. This is really significant! IBM is involved in this, and the first implementation is likely(?) to be on the IBM POWER architecture. POWER? … ARM? Yes, POWER is nice!
NVIDIA has joined the “OpenPOWER Foundation”.
CUDA 6: Unified Memory, which fits nicely with NVLink.
Open Compute: NVIDIA has joined this, mostly Facebook-led, open-hardware project. This could help them with developing a “completely integrated compute unit”.
The Portland Group: PGI is now under NVIDIA’s wing. There are a lot of really good software people at PGI, and after talking to a few of them at GTC I have the feeling that working more closely with NVIDIA is a breath of fresh air for them. It’s good to have tight, high-quality compiler support to go along with new hardware design.
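
To put the PCIe bottleneck from the NVLink item into rough numbers, here is a small back-of-envelope sketch in Python. It only computes ideal transfer times (size divided by bandwidth). The PCIe 3.0 x16 figure is the usual theoretical peak of roughly 16 GB/s in one direction; the "5x faster link" and the 12 GB working set are my own made-up round numbers for comparison, not specs for NVLink or any announced product.

```python
# Back-of-envelope: how long does it take to move a working set across a link?
# The bandwidths and data size below are illustrative assumptions,
# not vendor specifications.

GB = 1e9  # decimal gigabyte, in bytes


def transfer_seconds(size_bytes, bandwidth_bytes_per_s):
    """Ideal transfer time, ignoring latency and protocol overhead."""
    return size_bytes / bandwidth_bytes_per_s


# Assumed peak bandwidths in GB/s.
links = {
    "PCIe 3.0 x16 (theoretical, one direction)": 16.0,
    "hypothetical 5x faster CPU-GPU link": 80.0,  # assumption, not a spec
}

working_set = 12 * GB  # assumed data set size

for name, gb_per_s in links.items():
    t = transfer_seconds(working_set, gb_per_s * GB)
    print(f"{name}: {t:.2f} s to move {working_set / GB:.0f} GB")
```

Real transfers will be slower than these ideal figures, but the ratio is the point: every trip across the bus costs time that a tighter CPU-GPU connection could give back.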

One hardware component missing from this list is a high-speed network fabric. However, that could be accounted for within “open compute”(?).

My predictions …

… I predict that in the next 12 months we’ll see announcements of NVIDIA Pascal GPUs on board with ARM or POWER architecture CPUs, connected by NVLink to the system memory pool, along with some kind of on-board (or slotted) high-speed network fabric.

I could be completely wrong, but I don’t think so… there is just too much to be gained by getting accelerators and coprocessors off of the PCIe bus and directly connected to large memory spaces. One thing is for sure: the next couple of years will see significant steps forward for compute hardware. There is potential for disruption of our comfortable “assemble from commodity components” model for HPC, with more proprietary hardware on the horizon. We could see an abandonment of the traditional motherboard + CPUs + memory + storage type of system in favor of more of an all-in-one compute unit. Just thinking "out loud", we’ll see …

Happy computing –dbk
