NAMD Custom Build for Better Performance on your Modern GPU Accelerated Workstation — Ubuntu 16.04, 18.04, CentOS 7



Introduction

This post will focus on compiling NAMD from source for good performance on modern NVIDIA GPUs utilizing CUDA. Doing a custom NAMD build from source code gives a moderate but significant boost in performance. This can be important considering that large simulations over many time-steps can run for days or weeks. I wanted to do some custom NAMD builds to ensure that modern Workstation hardware was being well utilized.

I often use NAMD for hardware performance testing, and over the past year I have been using the nicely optimized build available in the NVIDIA NGC docker image. I am an advocate of using docker (see this link and links therein). However, for most workflows it is more appropriate and common to use a traditional Workstation setup. We will go through a relatively simple build process to get a NAMD build that will give excellent performance on modern Workstation hardware.

If you are impatient, there is a plot and table of performance results near the bottom of the post.

Hardware

I used the phrase “modern Workstation hardware” a couple of times above. By that I mean a Workstation (or node) based around a current Intel CPU with the Skylake-X, -W, or -SP architecture, together with an NVIDIA GPU based on the Pascal or Volta architecture, for example a 1080Ti, Titan Xp, or Titan V. In particular, I am using my personal machine with an Intel Xeon-W 2175 14-core CPU and an NVIDIA 1080Ti GPU, similar to what you could configure as our “Peak Single”.

I will be doing more comprehensive hardware performance testing with NAMD in a later post. I will wait for the final NAMD 2.13 version release before I do that. However, I will include some performance numbers near the end of this post so that you can see the nice speedup from doing your own NAMD build… I also include numbers using a Titan V.

If you search through the (many) posts I have on the Puget Systems HPC blog you will find several posts related to NAMD that I have done over the last few years. The last time I did a post that included building NAMD from source was nearly three years ago, Molecular Dynamics Performance on GPU Workstations — NAMD.

NAMD

NAMD is a molecular dynamics program developed and maintained by the Theoretical and Computational Biophysics Group at the University of Illinois at Urbana-Champaign. It is licensed by the University of Illinois and is made freely available, including source code, under a non-exclusive, non-commercial use license.

NAMD is a widely used molecular dynamics program capable of performing simulations on systems with millions of atoms (approaching billions of atoms!). It is also highly parallel and is often installed on large compute clusters. The parallelism is achieved using the parallel objects framework Charm++.

The group at UIUC working on NAMD were early pioneers in using GPUs for compute acceleration, and NAMD has very good performance acceleration using NVIDIA CUDA.

NAMD is representative of a larger group of Molecular Dynamics packages including LAMMPS, GROMACS, Amber … All of the modern Molecular Dynamics programs have very good GPU compute acceleration for important aspects of their functionality and performance.


Setup your Build Environment

For the NAMD build that I describe below you will need to have NVIDIA CUDA 9.0, Intel MKL and a standard set of gcc/g++ development tools. I did builds in an Ubuntu 16.04 and CentOS 7 environment. The build process is the same for both of these Linux distributions but the setup is of course a little different.

I assume you have your OS running and have recent NVIDIA drivers installed.
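
A quick way to check that the driver is working, and to see which GPU(s) are in the system, is,

nvidia-smi

That should list your GPU(s) along with the driver version and not report any errors.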

Development Tools

You may already have everything you need installed but the following should ensure that.

CentOS 7 (using “root” or sudo)

yum groupinstall "Development Tools"

Ubuntu 16.04

sudo apt-get install build-essential
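
Either way, it does not hurt to confirm which compilers you have before starting,

gcc --version
g++ --version
make --version

On Ubuntu 16.04 that should report gcc/g++ 5.4 and on CentOS 7 gcc/g++ 4.8. Both worked for my builds.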

Intel MKL

You don’t necessarily need to use MKL, but I do in general recommend it when you can. It will mostly be providing the FFT libraries, which are highly optimized in MKL.

Intel makes their optimized numerical libraries, MKL, available free of charge. Thank you Intel! MKL is simple to install and the procedure is the same on both Ubuntu 16.04 and CentOS 7.

Download MKL from Intel’s “MATH KERNEL LIBRARY” site. You will have to register on the site. After that you will have access to all of Intel’s performance libraries including MKL. I used version “2018 Update 3”. Installation is straightforward,

tar xf l_mkl_2018.3.222.tgz
cd l_mkl_2018.3.222
sudo ./install.sh

Then just follow the prompts. I assume you will install to the default location under /opt/intel.
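
Assuming the default install location, you can confirm that the static libraries we will link against later are in place,

ls /opt/intel/mkl/lib/intel64/libmkl_intel_lp64.a \
   /opt/intel/mkl/lib/intel64/libmkl_sequential.a \
   /opt/intel/mkl/lib/intel64/libmkl_core.a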

CUDA

Installing CUDA is a little more involved but not too bad. I used CUDA 9.0 in my build mostly because I know that the NAMD developers have tested with it. If you have some other version of CUDA already installed you can try using that. It would be similar to what I describe for 9.0.

I am assuming you have an up-to-date NVIDIA display driver installed (that provides the CUDA runtime). In that case it is best to install CUDA from the “.run” file so that you can exclude installing the driver during the install.

To get CUDA 9.0 you will need to go to the “CUDA Toolkit Archive”. Follow the “buttons” until you get to the “runfile (local)” download links. There is the main .run file and 3 patches. You should probably download the patches and install them too. Installation is straightforward but do be careful to read the prompts and say No to installing the driver.

sudo sh cuda_9.0.176_384.81_linux.run

The patches will install in the same way. It’s OK to install version 9.0 alongside any other versions you may already have installed. Use the default location in /usr/local, and you may want to skip creating the symbolic link if you have another version already linked. We will set the build to use /usr/local/cuda-9.0 in later steps.
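
A quick sanity check that the toolkit ended up where the later build steps expect it (paths assume the install defaults),

/usr/local/cuda-9.0/bin/nvcc --version
ls /usr/local/cuda-9.0/lib64/libcufft_static.a

nvcc should report release 9.0, and libcufft_static.a is one of the libraries the NAMD link step will use.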

Docker?

I actually used docker with the NVIDIA runtime for my build environments. My system is running Ubuntu 18.04 and I have a docker/nvidia-docker setup on my machine. I described how to set this up in a five-part series of posts, the last of which is How-To Setup NVIDIA Docker and NGC Registry on your Workstation – Part 5 Docker Performance and Resource Tuning. That post has links to the earlier posts in the series. Using my docker install I set up a CentOS 7 build environment from the following container,

docker run --runtime=nvidia -it -v $HOME/projects:/projects nvidia/cuda:9.0-devel-centos7

I have my “projects” directory bound into the container, with the NAMD source and the MKL install directory in subdirectories of it, so I can access the code from my host system and from running docker containers.

In that container I installed the dev tools and MKL. (The container runs as root, so sudo is not needed.)

yum groupinstall "Development Tools"
yum install emacs-nox which wget

cd l_mkl_2018.3.222
./install.sh

I then committed that container to an image (docker had assigned it the name “modest_mahavira”),

docker commit -a "dbk" -m "CentOS 7.4 with CUDA 9.0 and MKL build env" modest_mahavira cuda9.0-mkl-centos74

I set up a similar container for an Ubuntu 16.04 environment, as shown below.
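
If you want to do the same, something like the following should work (the image tag is from memory, check Docker Hub for the exact CUDA 9.0 devel tag),

docker run --runtime=nvidia -it -v $HOME/projects:/projects nvidia/cuda:9.0-devel-ubuntu16.04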

I like the flexibility of using docker but you can certainly do your NAMD build without it.


Configure and Build NAMD

With your build environment setup you can now configure and build NAMD from source. What follows are the details of the choices I made. You may want to do something else but these steps should help you get started.

Note: I am building from the “Nightly Build” source for the upcoming NAMD version 2.13. I’ll add some notes about building from the old 2.12 source in a later section.

The basic procedure is,

  • Get the NAMD source
  • Build and test “charm”
  • Edit a few “arch” files for the NAMD build
  • Run “config” with the appropriate options
  • Do “make” for that configuration
  • Create a “release” and test

1) Download the NAMD Source

Go to the NAMD download page and click on “Source Code” for the “Nightly Build” version. If you have not registered for the download you will need to create an account to access the files. Please don’t hesitate to do this. It’s part of how they get data to justify their continued funding for the development work.

Create a directory for your build and expand the source tar file(s). This would be something similar to the following,

mkdir NAMD-build
cd NAMD-build
mv ~/Downloads/NAMD_Git-2018-07-17_Source.tar.gz .
tar xf NAMD_Git-2018-07-17_Source.tar.gz
cd NAMD_Git-2018-07-17_Source/
tar xf charm-6.8.2.tar

2) Build and test “charm”

This is very simple because they have created a question-answer script to guide you through an appropriate configuration. All you have to do is cd into the charm directory and run build without any arguments.

cd charm-6.8.2/
./build

Here are my answers to the questions,

============================================================

Begin interactive charm configuration ...
If you are a poweruser expecting a list of options, please use ./build --help

============================================================


Are you building to run just on the local machine, and not across multiple nodes? [y/N]
y

Do you want to specify a compiler? [y/N]N

Do you want to specify any Charm++ build options, such as fortran compilers? [y/N]N

Choose a set of compiler flags [1-5]
	1) none
	2) debug mode                        -g -O0
	3) production build [default]        --with-production
	4) production build w/ projections   --with-production --enable-tracing
	5) custom

	5
	Enter compiler options: --with-production -march=native

What do you want to build?
	1) Charm++ [default] (choose this if you are building NAMD)
	2) Charm++ and AMPI
	3) Charm++, AMPI, ParFUM, FEM and other libraries

1

Do you want to compile in parallel?
        1) No
        2) Build with -j2
        3) Build with -j4
        4) Build with -j8
        5) Build with -j16 [default]
        6) Build with -j32
        7) Build with -j

4
We have determined a suitable build line is:
	./build charm++ multicore-linux-x86_64   -j8  --with-production -march=native


Do you want to start the build now? [Y/n]Y

Easy! The most “unusual” thing I did here was add the -march=native flag to give the compiler a better clue about optimizations that could be made.

Now give it a quick test,

cd tests/charm++/megatest/
make pgm

I have 14 cores in my system so I’ll use +p14,

./pgm +p14

This runs very fast on a modern system and most tests will report 0.00 sec for time. It should not give any errors.

3) Setup the configuration for the NAMD build

There are 3 things we’ll need to do here,

  • Satisfy the “tcl” dependency
  • Setup the correct path to the Intel MKL fftw3 libraries (2 files)
  • Setup the path for the CUDA libraries

We will be editing a few files in the “arch” directory. Our main “arch” file is Linux-x86_64-g++.arch. The 5 files we need to edit are Linux-x86_64.tcl, Linux-x86_64.mkl, Linux-x86_64.fftw3, Linux-x86_64.cuda9 and Linux-x86_64-g++.arch. We’ll also create a link from Linux-x86_64.cuda9 to Linux-x86_64.cuda. Most of the edits are trivial.
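
Before editing, you can confirm that those files are where we expect them by listing them from the top of the source directory (file names as in the nightly source I used),

ls arch/Linux-x86_64.tcl arch/Linux-x86_64.mkl arch/Linux-x86_64.fftw3 \
   arch/Linux-x86_64.cuda9 arch/Linux-x86_64-g++.arch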

3.a) tcl dependency

The easiest thing to do for this is to get the tar file that UIUC uses for their builds, put it in the source directory, and then configure the arch “.tcl” file.

wget http://www.ks.uiuc.edu/Research/namd/libraries/tcl8.5.9-linux-x86_64-threaded.tar.gz

tar xf tcl8.5.9-linux-x86_64-threaded.tar.gz

Edit arch/Linux-x86_64.tcl

It should look like this when you are done (we are just fixing the path),

#TCLDIR=/Projects/namd2/tcl/tcl8.5.9-linux-x86_64                                                                                                   
TCLDIR=../tcl8.5.9-linux-x86_64-threaded
TCLINCL=-I$(TCLDIR)/include
#TCLLIB=-L$(TCLDIR)/lib -ltcl8.5 -ldl                                                                                                               
TCLLIB=-L$(TCLDIR)/lib -ltcl8.5 -ldl -lpthread
TCLFLAGS=-DNAMD_TCL
TCL=$(TCLINCL) $(TCLFLAGS)

3.b) Set the path to the Intel MKL FFT libraries

If you are using the MKL libraries for a fast FFT, all you need to do is get the path corrected. I am assuming that you used the default paths when you installed MKL.

Edit arch/Linux-x86_64.mkl

Set MKLROOT and change options so that we do static linking. (“static” will make the executable larger but you will be able to run it on systems that do not have the MKL dynamic libraries installed.)

MKLROOT=/opt/intel/mkl/
FFTDIR=$(MKLROOT)
FFTINCL=-I$(FFTDIR)/include/fftw
#FFTLIB=-L$(FFTDIR)/lib/intel64 -lmkl_intel_lp64 -lmkl_sequential -lmkl_core
FFTLIBDIR=$(FFTDIR)/lib/intel64
FFTLIB=-Wl,--start-group $(FFTLIBDIR)/libmkl_intel_lp64.a $(FFTLIBDIR)/libmkl_sequential.a $(FFTLIBDIR)/libmkl_core.a -Wl,--end-group
FFTFLAGS=-DNAMD_FFTW -DNAMD_FFTW_3
FFT=$(FFTINCL) $(FFTFLAGS)

NAMD “config” will reference this file when we pass the “--with-mkl” flag. We also need to edit the Linux-x86_64.fftw3 file.

Edit arch/Linux-x86_64.fftw3

We are making several changes in this file. It should look like this when you are done,

FFTDIR=$(MKLROOT)
FFTINCL=-I$(MKLROOT)/include -I$(FFTDIR)/include/fftw
FFTLIB= -mkl
FFTFLAGS=-DNAMD_FFTW -DNAMD_FFTW_3
FFT=$(FFTINCL) $(FFTFLAGS)

3.c) Set CUDA configuration

For CUDA we will need to fix the path and create a symbolic link to the correct “.cuda” file so we can use the “--with-cuda” configure flag for the NAMD build.

Edit arch/Linux-x86_64.cuda9

I’m only changing the first line to get the path set to the CUDA install default, but here’s what the whole file should look like,


CUDADIR=/usr/local/cuda-9.0

CUDAINCL=-I$(CUDADIR)/include
CUBDIR=.rootdir/cub
CUBINCL=-I$(CUBDIR)
CUDALIB=-L$(CUDADIR)/lib64 -lcufft_static -lculibos -lcudart_static -lrt
CUDASODIR=$(CUDADIR)/lib64
LIBCUDARTSO=
CUDAFLAGS=-DNAMD_CUDA
CUDAOBJS=$(CUDAOBJSRAWSTATIC)
CUDA=$(CUDAFLAGS) -I. $(CUDAINCL) $(CUBINCL)
CUDACC=$(CUDADIR)/bin/nvcc -Xcompiler "-m64"

CUDACCOPTS=-O3 --maxrregcount 48 $(CUDAGENCODE) $(CUDA)

# limit CUDADLINKOPTS to architectures available in libcufft_static

CUDADLINKOPTS=-gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35 \
  -gencode arch=compute_50,code=sm_50 -gencode arch=compute_60,code=sm_60 \
  -gencode arch=compute_60,code=compute_60 -gencode arch=compute_70,code=sm_70 \
  -gencode arch=compute_70,code=compute_70

CUDAGENCODE=-gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35 \
  -gencode arch=compute_37,code=sm_37 -gencode arch=compute_50,code=sm_50 \
  -gencode arch=compute_52,code=sm_52 -gencode arch=compute_60,code=sm_60 \
  -gencode arch=compute_61,code=sm_61 -gencode arch=compute_61,code=compute_61 \
  -gencode arch=compute_70,code=sm_70 -gencode arch=compute_70,code=compute_70

Notice that we are generating CUDA code for architectures from Kepler through Volta. If you only wanted to include Pascal and Volta you could take out the entries smaller than “60” in CUDADLINKOPTS and CUDAGENCODE.
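
For example, a Pascal/Volta-only version of those two variables would look something like this (just the lines above with the pre-60 entries removed),

CUDADLINKOPTS=-gencode arch=compute_60,code=sm_60 -gencode arch=compute_60,code=compute_60 \
  -gencode arch=compute_70,code=sm_70 -gencode arch=compute_70,code=compute_70

CUDAGENCODE=-gencode arch=compute_60,code=sm_60 -gencode arch=compute_61,code=sm_61 \
  -gencode arch=compute_61,code=compute_61 -gencode arch=compute_70,code=sm_70 \
  -gencode arch=compute_70,code=compute_70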

The default file that gets read when you use the “--with-cuda” configure flag is Linux-x86_64.cuda, but that file is set up for CUDA 8.0. We’ll rename that file to Linux-x86_64.cuda8 and make a link from Linux-x86_64.cuda to Linux-x86_64.cuda9 (run the following from inside the arch directory),

mv Linux-x86_64.cuda Linux-x86_64.cuda8

ln -s Linux-x86_64.cuda9 Linux-x86_64.cuda

3.d) Add -march=native to the main configuration file

We are giving the compiler a hint to optimize for the system that we are building on.

Edit arch/Linux-x86_64-g++.arch

It should look like this when you are done,

NAMD_ARCH = Linux-x86_64
CHARMARCH = multicore-linux-x86_64

CXX = g++ -m64 -std=c++0x -O3 -march=native
CXXOPTS = -fexpensive-optimizations -ffast-math
CC = gcc -m64 -O3 -march=native
COPTS = -fexpensive-optimizations -ffast-math

4) Run config to create the build directory

With the setup from the above section, creating the build configuration is simple using the appropriate config flags. From the main source directory run,

./config Linux-x86_64-g++ --with-mkl --with-cuda

That creates the directory Linux-x86_64-g++ with all the appropriate links to the correct configuration files including the main “Makefile”.

5) Do the build (make)

cd to Linux-x86_64-g++ and run make. I’ll add -j14 since I have 14 cores I can use for the compile.

cd Linux-x86_64-g++
make -j14

That should build without errors. A few warnings are OK.
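
Since MKL and the CUDA libraries are linked statically, a quick check on the finished binary (run from the build directory) is,

ldd namd2

Ideally you will see only basic system libraries (glibc, libstdc++, libpthread, librt, etc.). If MKL or cuFFT shared libraries show up, re-check the static link settings in the arch files.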

6) Create a distribution release

make release

This is a nice feature of the build setup. This will create a compressed tar file for you. In my case it was,

NAMD_Git-2018-07-17_Linux-x86_64-multicore-CUDA.tar.gz

You can move this file around to other directories or machines and then un-tar it and go to work! Since we linked everything statically, there shouldn’t be any dependencies other than the main system libraries from the machine you built on, i.e. glibc etc.
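
As a quick smoke test after un-tarring on a target machine, you can try the tiny alanin example (it is included under src/ in the official binary distributions, and should be in the release tree as well; adjust the path if it is not there),

tar xf NAMD_Git-2018-07-17_Linux-x86_64-multicore-CUDA.tar.gz
cd NAMD_Git-2018-07-17_Linux-x86_64-multicore-CUDA
./namd2 +p2 src/alanin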

I did builds in an Ubuntu 16.04 environment and a CentOS 7 environment. The build from Ubuntu 16.04 used system libraries that were “too new” for it to run on CentOS 7. However, the build I did with CentOS 7 worked fine on Ubuntu 16.04 and 18.04 since they still support the older system libraries of CentOS 7.

Note on ‘+isomalloc_sync’

When you run namd2 on a machine with a new CPU architecture like the Intel Xeon-W 2175 I’m using (or any Skylake-X, -W, or -SP), you will see the following message at the top of the output.

Warning> Randomization of virtual memory (ASLR) is turned on in the kernel, thread migration may not work! Run ‘echo 0 > /proc/sys/kernel/randomize_va_space’ as root to disable it, or try running with ‘+isomalloc_sync’.

If you add +isomalloc_sync to your namd2 command line the message will go away and you may get better performance. In my simple testing it didn’t make much difference. However, I had a colleague report a substantial performance increase for some of his simulation runs. He is using a nice Puget Systems machine similar to mine!


NAMD Performance with Custom Build (STMV Benchmark)

[ I will do more comprehensive performance testing on a variety of hardware after NAMD 2.13 is finalized. ]

For the custom build performance testing I ran the million atom STMV benchmark. The command line for the job runs was,

namd2 +p14 +isomalloc_sync  stmv.namd
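
If you want to run the same benchmark, the STMV input files are available from the NAMD utilities page (URL from memory, check the NAMD site if it has moved),

wget http://www.ks.uiuc.edu/Research/namd/utilities/stmv.tar.gz
tar xf stmv.tar.gz
cd stmv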

I included a build with Ubuntu 18.04 using CUDA 9.2. That utilized a more up-to-date gcc version 7.3 as opposed to the version 5.4 in the Ubuntu 16.04 builds. It is faster but I don’t recommend it at this point because it is not officially supported …yet.

The following plot shows the gain in ns/day for the custom builds vs the default binary download of NAMD. I tested with the system I described at the beginning of the post with the addition of a Titan V. You should note that the performance with the Titan V would likely be better with more CPU cores available to balance the workload. Also, the numbers below are from Ubuntu 16.04 and 18.04 builds. The CentOS 7 build performance is essentially the same as that of Ubuntu 16.04.

[Plot: NAMD custom build performance (ns/day, custom builds vs. default binary)]

Here are the numbers,

NAMD Custom Build Performance, STMV Benchmark, 1080Ti and Titan V

Build                                    day/ns    ns/day
Download Binary, 1080Ti                  0.531     1.88
Ubuntu 16.04 Custom, 1080Ti              0.484     2.07
Ubuntu 18.04 CUDA 9.2 Custom, 1080Ti     0.467     2.14
Download Binary, Titan V                 0.440     2.27
Ubuntu 16.04 Custom, Titan V             0.428     2.34
Ubuntu 18.04 CUDA 9.2 Custom, Titan V    0.404     2.48

The custom builds are not a huge improvement but they are significant considering the job runs may go for days or weeks with large models and many time-steps.

I hope this helps you with your research!

Happy computing –dbk