NAMD Custom Build for Better Performance on your Modern GPU Accelerated Workstation -- Ubuntu 16.04, 18.04, CentOS 7
Written on July 20, 2018 by Dr Donald Kinghorn
- Setup your Build Environment
- Configure and Build NAMD
- 1) Download the NAMD Source
- 2) Build and test "charm"
- 3) Setup the configuration for the NAMD build
- 3.a) tcl dependency
- 3.b) Setup to link in Intel MKL fftw libraries
- 3.c) Set CUDA configuration
- 3.d) Add -march=native to the main configuration file
- 4) Run config to create the build directory
- 5) Do the build (make)
- 6) Create a distribution release
- Note on '+isomalloc_sync'
- NAMD Performance with Custom Build (STMV Benchmark)
* NAMD Custom Build Performance, STMV Benchmark, 1080Ti TitanV
This post will focus on compiling NAMD from source for good performance on modern NVIDIA GPUs utilizing CUDA. Doing a custom NAMD build from source code gives a moderate but significant boost in performance. This can be important considering that large simulations over many time-steps can run for days or weeks. I wanted to do some custom NAMD builds to ensure that modern Workstation hardware was being well utilized.
I often use NAMD for hardware performance testing and over the past year I have been using the nicely optimized build available in the NVIDIA NGC docker image. I am an advocate of using docker (see this link and links therein). However, for most workflows it is more appropriate and common to use a traditional Workstation setup. We will go through a relatively simple build process to get a NAMD that will give excellent performance on modern Workstation hardware.
If you are impatient, there is a plot and table of performance results near the bottom of the post.
I used the phrase "modern Workstation hardware" a couple of times above. By that I mean a Workstation (or node) based around a current Intel CPU with the Skylake-X, -W, or -SP architecture, together with an NVIDIA GPU based on the Pascal or Volta architecture, for example a 1080Ti, Titan Xp, or Titan V. In particular I am using my personal machine with an Intel Xeon-W 2175 14-core CPU and an NVIDIA 1080Ti GPU, similar to what you could configure as our "Peak Single".
I will be doing more comprehensive hardware performance testing with NAMD in a later post. I will wait for the final NAMD 2.13 version release before I do that. However, I will include some performance numbers near the end of this post so that you can see the nice speedup from doing your own NAMD build... I also include numbers using a Titan V.
If you search through the (many) posts I have on the Puget Systems HPC blog you will find several posts related to NAMD that I have done over the last few years. The last time I did a post that included building NAMD from source was nearly three years ago, Molecular Dynamics Performance on GPU Workstations -- NAMD.
NAMD is a molecular dynamics program developed and maintained by the Theoretical and Computational Biophysics Group at the University of Illinois at Urbana-Champaign. It is licensed by the University of Illinois and is made freely available, including source code, under a non-exclusive, non-commercial use license.
NAMD is a widely used molecular dynamics program capable of performing simulations on systems with millions of atoms (approaching billions of atoms!). It is also highly parallel and is often installed on large compute clusters. The parallelism is achieved by using the underlying parallel objects framework charm++.
The group at UIUC working on NAMD were early pioneers of using GPUs for compute acceleration and NAMD has very good performance acceleration using NVIDIA CUDA.
NAMD is representative of a larger group of Molecular Dynamics packages including LAMMPS, GROMACS, Amber ... All of the modern Molecular Dynamics programs have very good GPU compute acceleration for important aspects of their functionality and performance.
Setup your Build Environment
For the NAMD build that I describe below you will need to have NVIDIA CUDA 9.0, Intel MKL and a standard set of gcc/g++ development tools. I did builds in an Ubuntu 16.04 and CentOS 7 environment. The build process is the same for both of these Linux distributions but the setup is of course a little different.
I assume you have your OS running and have recent NVIDIA drivers installed.
You may already have everything you need installed but the following should ensure that.
CentOS 7 (using "root" or sudo)
yum groupinstall "Development Tools"
Ubuntu 16.04
sudo apt-get install build-essential
You don't necessarily need to use MKL but I do in general recommend it when you can. This will mostly be providing the FFT libraries which are highly optimized in MKL.
Intel makes their optimized numerical libraries, MKL, available free of charge. Thank you Intel! MKL is simple to install and is the same procedure on both Ubuntu 16.04 and CentOS 7.
Download MKL from Intel's "MATH KERNEL LIBRARY" site. You will have to register on the site. After that you will have access to all of Intel's performance libraries including MKL. I used version "2018 Update 3". Install is straightforward:
tar xf l_mkl_2018.3.222.tgz
cd l_mkl_2018.3.222
sudo ./install.sh
Then just follow the prompts. I assume you will install to the default location under /opt/intel.
Installing CUDA is a little more involved but not too bad. I used CUDA 9.0 in my build mostly because I know that the NAMD developers have tested with it. If you have some other version of CUDA already installed you can try using that. It would be similar to what I describe for 9.0.
I am assuming you have an up-to-date NVIDIA display driver installed (that provides the CUDA runtime). It is then best to install CUDA from the ".run" file so that you can skip the driver during the install.
To get CUDA 9.0 you will need to go to the "CUDA Toolkit Archive". Follow the "buttons" until you get to the "runfile (local)" download links. There is the main .run file and 3 patches. You should probably download the patches and install them too. Installation is straightforward but do be careful to read the prompts and say No to installing the driver.
sudo sh cuda_9.0.176_384.81_linux.run
The patches will install in the same way. It's OK to install version 9.0 alongside any other versions you may already have installed. Use the default location in /usr/local and you may want to skip creating the symbolic link if you have another version already linked. We will set the build to use /usr/local/cuda-9.0 in later steps.
I actually used docker with the NVIDIA runtime for my build environments. My system is running Ubuntu 18.04 and I have a docker/nvidia-docker setup on my machine. I described how to set this up in a 5 part series of posts, the last of which is How-To Setup NVIDIA Docker and NGC Registry on your Workstation - Part 5 Docker Performance and Resource Tuning. That post has links to the earlier posts in the series. Using my docker install I set up a CentOS 7 build environment from the following container,
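Before moving on it can be worth confirming that the pieces described above are actually in place. The following is only a minimal sketch: it assumes the default install locations used in this post (/opt/intel/mkl and /usr/local/cuda-9.0), and the helper names check_tool and check_dir are made up for illustration.

```shell
check_tool() {
    # usage: check_tool NAME ; succeeds if NAME is found on PATH
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1"
        return 1
    fi
}

check_dir() {
    # usage: check_dir PATH ; succeeds if PATH is an existing directory
    if [ -d "$1" ]; then
        echo "found: $1"
    else
        echo "missing: $1"
        return 1
    fi
}

# Compiler toolchain from "Development Tools" / build-essential
for tool in gcc g++ make; do
    check_tool "$tool" || true
done

# Default install locations assumed in this post; adjust if yours differ.
check_dir /opt/intel/mkl || true
check_dir /usr/local/cuda-9.0 || true
```

Anything reported as "missing" should be sorted out before starting the NAMD configuration steps.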
docker run --runtime=nvidia -it -v $HOME/projects:/projects nvidia/cuda:9.0-devel-centos7
I have my "projects" directory bound into the container and I have the NAMD source and the MKL install directory in subdirectories of that so I can access the code from my host systems and from running docker container images.
In that container I installed MKL and dev tools.
yum groupinstall "Development Tools"
yum install emacs-nox which wget
cd l_mkl_2018.3.222
sudo ./install.sh
I then saved that container (it had been assigned the name "modest_mahavira") as a new image,
docker commit -a "dbk" -m "CentOS 7.4 with CUDA 9.0 and MKL build env" modest_mahavira cuda9.0-mkl-centos74
I set up a similar container for an Ubuntu 16.04 environment.
I like the flexibility of using docker but you can certainly do your NAMD build without it.
Configure and Build NAMD
With your build environment setup you can now configure and build NAMD from source. What follows are the details of the choices I made. You may want to do something else but these steps should help you get started.
Note: I am building from the "Nightly Build" source for the upcoming NAMD version 2.13. I'll add some notes about building from the old 2.12 source in a later section.
The basic procedure is,
- Get the NAMD source
- Build and test "charm"
- Edit a few "arch" files for the NAMD build
- Run "config" with the appropriate options
- Do "make" for that configuration
- Create a "release" and test
1) Download the NAMD Source
Go to the NAMD download page and click on "Source Code" for the "Version Nightly Build (...) Platforms:". If you have not registered for the download you will need to create an account to access the files. Please don't hesitate to do this. It's part of how they get data to justify their continued funding for the development work.
Create a directory for your build and expand the source tar file(s). This would be something similar to the following,
mkdir NAMD-build
cd NAMD-build
mv ~/Downloads/NAMD_Git-2018-07-17_Source.tar.gz .
tar xf NAMD_Git-2018-07-17_Source.tar.gz
cd NAMD_Git-2018-07-17_Source/
tar xf charm-6.8.2.tar
2) Build and test "charm"
This is very simple because they have created a question-answer script to guide you through an appropriate configuration. All you have to do is cd into the charm directory and run build without any arguments.
cd charm-6.8.2/
./build
Here are my answers to the questions,
============================================================
Begin interactive charm configuration ...
If you are a poweruser expecting a list of options, please use ./build --help
============================================================
Are you building to run just on the local machine, and not across multiple nodes? [y/N]
y
Do you want to specify a compiler? [y/N]N
Do you want to specify any Charm++ build options, such as fortran compilers? [y/N]N
Choose a set of compiler flags [1-5]
1) none
2) debug mode -g -O0
3) production build [default] --with-production
4) production build w/ projections --with-production --enable-tracing
5) custom
5
Enter compiler options: --with-production -march=native
What do you want to build?
1) Charm++ [default] (choose this if you are building NAMD)
2) Charm++ and AMPI
3) Charm++, AMPI, ParFUM, FEM and other libraries
1
Do you want to compile in parallel?
1) No
2) Build with -j2
3) Build with -j4
4) Build with -j8
5) Build with -j16 [default]
6) Build with -j32
7) Build with -j
4
We have determined a suitable build line is:
./build charm++ multicore-linux-x86_64 -j8 --with-production -march=native
Do you want to start the build now? [Y/n]Y
Easy! The most "unusual" thing I did here was add the -march=native flag to give the compiler a better clue about optimizations that could be made.
Now give it a quick test,
cd tests/charm++/megatest/
make pgm
I have 14 cores in my system so I'll run the freshly built pgm test program with +p14.
This runs very fast on a modern system and most tests will report 0.00 sec for time. It should not give any errors.
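If you would rather not hard-code the core count, it can be picked up from the system. This is only a sketch: it assumes the test binary built by "make pgm" is ./pgm in the current directory, and the variable name NCORES is my own. Note that nproc reports logical CPUs, so with Hyper-Threading enabled it will be twice the physical core count; set the number by hand if you want one thread per physical core.

```shell
# Logical CPU count reported by the OS (nproc is part of GNU coreutils;
# getconf is used as a fallback where nproc is not available).
NCORES=$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN)

if [ -x ./pgm ]; then
    # +pN asks the multicore charm runtime for N worker threads
    ./pgm +p"${NCORES}"
else
    echo "pgm not found; run 'make pgm' in tests/charm++/megatest first"
fi
```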
3) Setup the configuration for the NAMD build
There are 3 things we'll need to do here,
- Satisfy the "tcl" dependency
- Setup the correct path to the Intel MKL fftw3 libraries (2 files)
- Setup the path for the CUDA libraries
We will be editing a few files in the "arch" directory. Our main "arch" file is Linux-x86_64-g++.arch. The 5 files we need to edit are Linux-x86_64.tcl, Linux-x86_64.mkl, Linux-x86_64.fftw3, Linux-x86_64.cuda9 and Linux-x86_64-g++.arch. We'll also create a link from Linux-x86_64.cuda9 to Linux-x86_64.cuda. Most of the edits are trivial.
3.a) tcl dependency
The easiest thing to do for this is to get the tar file that UIUC uses for their builds, put it in the source directory, and then configure the arch/Linux-x86_64.tcl file.
wget http://www.ks.uiuc.edu/Research/namd/libraries/tcl8.5.9-linux-x86_64-threaded.tar.gz
tar xf tcl8.5.9-linux-x86_64-threaded.tar.gz
The arch/Linux-x86_64.tcl file should look like this when you are done (just fixing the path),
#TCLDIR=/Projects/namd2/tcl/tcl8.5.9-linux-x86_64
TCLDIR=../tcl8.5.9-linux-x86_64-threaded
TCLINCL=-I$(TCLDIR)/include
#TCLLIB=-L$(TCLDIR)/lib -ltcl8.5 -ldl
TCLLIB=-L$(TCLDIR)/lib -ltcl8.5 -ldl -lpthread
TCLFLAGS=-DNAMD_TCL
TCL=$(TCLINCL) $(TCLFLAGS)
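The path fix can be done in any editor, but if you repeat builds often a one-line sed does the same job. The sketch below is an illustration rather than part of the NAMD build system: set_make_var is a hypothetical helper, and it assumes you run it from the top of the NAMD source tree so that arch/Linux-x86_64.tcl resolves. It only rewrites an uncommented "VAR=" line and keeps a .bak copy of the file.

```shell
set_make_var() {
    # usage: set_make_var FILE VAR VALUE
    # Rewrites the assignment "VAR=..." that starts in column one of FILE,
    # leaving commented-out lines alone. A backup is kept as FILE.bak.
    # VALUE must not contain '|', '&' or backslashes (sed replacement syntax).
    sed -i.bak "s|^$2=.*|$2=$3|" "$1"
}

TCL_ARCH=arch/Linux-x86_64.tcl
if [ -f "$TCL_ARCH" ]; then
    set_make_var "$TCL_ARCH" TCLDIR ../tcl8.5.9-linux-x86_64-threaded
    grep '^TCLDIR=' "$TCL_ARCH"
else
    echo "$TCL_ARCH not found; run this from the top of the NAMD source tree"
fi
```

The same helper could be pointed at MKLROOT or CUDADIR in the other arch files, since those edits are also simple path changes.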
3.b) Setup to link in Intel MKL fftw libraries
If you are using the MKL libraries for a fast FFT all you need to do is get the path corrected. I am assuming that you used the default paths when you installed MKL.
In arch/Linux-x86_64.mkl, set MKLROOT and change options so that we do static linking. ("static" will make the executable larger but you will be able to run it on systems that do not have the MKL dynamic libraries installed.)
MKLROOT=/opt/intel/mkl/
FFTDIR=$(MKLROOT)
FFTINCL=-I$(FFTDIR)/include/fftw
#FFTLIB=-L$(FFTDIR)/lib/intel64 -lmkl_intel_lp64 -lmkl_sequential -lmkl_core
FFTLIBDIR=$(FFTDIR)/lib/intel64
FFTLIB=-Wl,--start-group $(FFTLIBDIR)/libmkl_intel_lp64.a $(FFTLIBDIR)/libmkl_sequential.a $(FFTLIBDIR)/libmkl_core.a -Wl,--end-group
FFTFLAGS=-DNAMD_FFTW -DNAMD_FFTW_3
FFT=$(FFTINCL) $(FFTFLAGS)
NAMD "config" will reference this file when we pass the "--with-mkl" flag. We also need to edit the Linux-x86_64.fftw3 file.
We are making several changes in this file. It should look like this when you are done,
FFTDIR=$(MKLROOT)
FFTINCL=-I$(MKLROOT)/include -I$(FFTDIR)/include/fftw
FFTLIB= -mkl
FFTFLAGS=-DNAMD_FFTW -DNAMD_FFTW_3
FFT=$(FFTINCL) $(FFTFLAGS)
3.c) Set CUDA configuration
For CUDA we will need to fix the path and create a symbolic link to the correct ".cuda" file so we can use the "--with-cuda" configure flag for the NAMD build.
I'm only changing the first line of Linux-x86_64.cuda9 to get the path set to the CUDA install default, but here's what the whole file should look like,
CUDADIR=/usr/local/cuda-9.0
CUDAINCL=-I$(CUDADIR)/include
CUBDIR=.rootdir/cub
CUBINCL=-I$(CUBDIR)
CUDALIB=-L$(CUDADIR)/lib64 -lcufft_static -lculibos -lcudart_static -lrt
CUDASODIR=$(CUDADIR)/lib64
LIBCUDARTSO=
CUDAFLAGS=-DNAMD_CUDA
CUDAOBJS=$(CUDAOBJSRAWSTATIC)
CUDA=$(CUDAFLAGS) -I. $(CUDAINCL) $(CUBINCL)
CUDACC=$(CUDADIR)/bin/nvcc -Xcompiler "-m64"
CUDACCOPTS=-O3 --maxrregcount 48 $(CUDAGENCODE) $(CUDA)
# limit CUDADLINKOPTS to architectures available in libcufft_static
CUDADLINKOPTS=-gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_60,code=sm_60 -gencode arch=compute_60,code=compute_60 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_70,code=compute_70
CUDAGENCODE=-gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_37,code=sm_37 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_52,code=sm_52 -gencode arch=compute_60,code=sm_60 -gencode arch=compute_61,code=sm_61 -gencode arch=compute_61,code=compute_61 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_70,code=compute_70
Notice that we are generating code for CUDA architectures from Kepler through Volta.
If you only wanted to include Pascal and Volta you could take out the -gencode entries for architectures below "60" in CUDADLINKOPTS and CUDAGENCODE.
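To make the trimmed lists less error prone to type, the -gencode entries can be generated. This is a sketch under my own assumptions: gencode_flags is a hypothetical helper, and it simplifies things by emitting PTX (code=compute_NN) only for the newest architecture given, whereas the file shown above also keeps PTX entries for compute_60 and compute_61. The Pascal/Volta example leaves 61 out of CUDADLINKOPTS because the file's own comment limits that list to architectures available in libcufft_static.

```shell
gencode_flags() {
    # usage: gencode_flags CC [CC ...]   e.g. gencode_flags 60 61 70
    # Emits one SASS target (code=sm_CC) per compute capability, plus PTX
    # (code=compute_CC) for the last one listed.
    gf_out=""
    gf_last=""
    for gf_cc in "$@"; do
        gf_out="${gf_out} -gencode arch=compute_${gf_cc},code=sm_${gf_cc}"
        gf_last=${gf_cc}
    done
    if [ -n "${gf_last}" ]; then
        gf_out="${gf_out} -gencode arch=compute_${gf_last},code=compute_${gf_last}"
    fi
    # Strip the single leading space left by the loop
    echo "${gf_out# }"
}

# Pascal + Volta only
echo "CUDADLINKOPTS=$(gencode_flags 60 70)"
echo "CUDAGENCODE=$(gencode_flags 60 61 70)"
```

The two printed lines can then be pasted over the corresponding lines in Linux-x86_64.cuda9.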
Rename arch/Linux-x86_64.cuda and create a link to Linux-x86_64.cuda9
The default file that gets read when you use the "--with-cuda" configure flag is Linux-x86_64.cuda but that file is set up for CUDA 8.0. We'll rename that file to Linux-x86_64.cuda8 and make a link from Linux-x86_64.cuda to Linux-x86_64.cuda9. Run these from inside the arch directory,
mv Linux-x86_64.cuda Linux-x86_64.cuda8
ln -s Linux-x86_64.cuda9 Linux-x86_64.cuda
3.d) Add -march=native to the main configuration file
In arch/Linux-x86_64-g++.arch we are giving the compiler a hint to optimize for the system that we are building on.
NAMD_ARCH = Linux-x86_64
CHARMARCH = multicore-linux-x86_64
CXX = g++ -m64 -std=c++0x -O3 -march=native
CXXOPTS = -fexpensive-optimizations -ffast-math
CC = gcc -m64 -O3 -march=native
COPTS = -fexpensive-optimizations -ffast-math
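If you are curious what -march=native actually resolves to on your build machine, gcc can report it. This is a small sketch with a hypothetical helper name (native_march); it relies on gcc's "-Q --help=target" report and prints "unknown" if gcc is not installed or the report has no -march line. Keep in mind that a binary tuned this way may use instructions that older CPUs do not have, so it may fail to run on a machine with an older processor than the one you built on.

```shell
native_march() {
    # Prints the -march value gcc selects for this CPU, or "unknown".
    nm_arch=$(gcc -march=native -Q --help=target 2>/dev/null \
        | awk '$1 == "-march=" && !seen { print $2; seen = 1 }') || true
    echo "${nm_arch:-unknown}"
}

echo "gcc -march=native resolves to: $(native_march)"
```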
4) Run config to create the build directory
With the setup from the above section, creating the build configuration is simple using the appropriate config flags. From the main source directory run,
./config Linux-x86_64-g++ --with-mkl --with-cuda
That creates the directory Linux-x86_64-g++ with all the appropriate links to the correct configuration files including the main "Makefile".
5) Do the build (make)
cd to Linux-x86_64-g++ and run make. I'll add -j14 since I have 14 cores I can use for the compile.
cd Linux-x86_64-g++
make -j14
That should build without errors. A few warnings are OK.
6) Create a distribution release
This is a nice feature of the build setup. Running make release from the build directory will create a compressed tar file for you. In my case it was,
You can move this file around to other directories or machines and then un-tar it and go to work! Since we linked everything statically there shouldn't be any dependencies other than the main system libraries from the machine you built on, i.e. glibc etc.
I did builds from an Ubuntu 16.04 environment and a CentOS 7 environment. The build from 16.04 had "too new" of system libraries to run on CentOS 7. However, the build I did with CentOS 7 worked fine on Ubuntu 16.04 and 18.04 since they still support the older system libraries of CentOS 7.
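A quick way to see what a build still depends on at run time is to list its shared libraries with ldd. The sketch below makes a few assumptions: list_dynamic_deps is a hypothetical helper, NAMD_BIN points at the namd2 binary in your unpacked release directory, and the pattern only looks for MKL, cuFFT and the CUDA runtime, which this build links statically. System libraries such as glibc will always show up in the full list.

```shell
list_dynamic_deps() {
    # usage: list_dynamic_deps BINARY
    # Prints the name of each shared library BINARY is linked against.
    if ! command -v ldd >/dev/null 2>&1; then
        echo "ldd is not available on this system" >&2
        return 0
    fi
    ldd "$1" 2>/dev/null | awk '{ print $1 }'
}

NAMD_BIN=./namd2   # assumed location: inside the unpacked release directory
if [ -x "$NAMD_BIN" ]; then
    # Any hit here means MKL or the CUDA libraries were linked dynamically after all.
    list_dynamic_deps "$NAMD_BIN" | grep -E 'mkl|cufft|cudart' \
        || echo "no MKL/CUDA shared libraries listed"
else
    echo "$NAMD_BIN not found; point NAMD_BIN at your namd2 binary"
fi
```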
Note on '+isomalloc_sync'
When you run namd2 on a newer-architecture machine like the Intel Xeon-W 2175 I'm using (or any Skylake-X, -W, or -SP system) you will see the following message at the top of the output.
Warning> Randomization of virtual memory (ASLR) is turned on in the kernel, thread migration may not work! Run 'echo 0 > /proc/sys/kernel/randomize_va_space' as root to disable it, or try running with '+isomalloc_sync'.
If you add +isomalloc_sync to your namd2 command line the message will go away and you may get better performance. In my simple testing it didn't make much difference. However, I had a colleague report a substantial performance increase for some of his simulation runs. He is using a nice Puget Systems machine similar to mine!
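You can check the kernel setting that the warning refers to without changing anything. This is just a sketch; describe_aslr is a hypothetical helper that maps the value in /proc/sys/kernel/randomize_va_space to a label (0 is off, 1 randomizes the stack, mmap base and VDSO, 2 additionally randomizes the heap).

```shell
describe_aslr() {
    # usage: describe_aslr VALUE   (VALUE as read from randomize_va_space)
    case "$1" in
        0) echo "disabled" ;;
        1) echo "conservative randomization" ;;
        2) echo "full randomization" ;;
        *) echo "unknown" ;;
    esac
}

ASLR_FILE=/proc/sys/kernel/randomize_va_space
if [ -r "$ASLR_FILE" ]; then
    echo "ASLR: $(describe_aslr "$(cat "$ASLR_FILE")")"
else
    echo "ASLR setting not readable (is this a Linux system with /proc?)"
fi
```

Passing +isomalloc_sync leaves this system-wide setting alone, which is why I prefer it over turning ASLR off as root.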
NAMD Performance with Custom Build (STMV Benchmark)
[ I will do more comprehensive performance testing on a variety of hardware after NAMD 2.13 is finalized. ]
For the custom build performance testing I ran the million atom STMV benchmark. The command line for the job runs was,
namd2 +p14 +isomalloc_sync stmv.namd
I included a build with Ubuntu 18.04 using CUDA 9.2. That utilized a more up-to-date gcc version 7.3 as opposed to the version 5.4 in the Ubuntu 16.04 builds. It is faster but I don't recommend it at this point because it is not officially supported ...yet.
The following plot shows the gain in ns/day for the custom builds vs the default binary download of NAMD. I tested with the system I described at the beginning of the post with the addition of a Titan V. You should note that the performance with the Titan V would likely be better with more CPU cores available to balance the workload. Also, the numbers below are from Ubuntu 16.04 and 18.04 builds. The CentOS 7 build performance is essentially the same as that of Ubuntu 16.04.
Here are the numbers,
NAMD Custom Build Performance, STMV Benchmark, 1080Ti TitanV
| Build and GPU | day/ns | ns/day |
|---|---|---|
| Download Binary 1080Ti | 0.531 | 1.88 |
| Ubuntu 16.04 Custom 1080Ti | 0.484 | 2.07 |
| Ubuntu 18.04 CUDA 9.2 Custom 1080Ti | 0.467 | 2.14 |
| Download Binary Titan V | 0.440 | 2.27 |
| Ubuntu 16.04 Custom Titan V | 0.428 | 2.34 |
| Ubuntu 18.04 CUDA 9.2 Custom Titan V | 0.404 | 2.48 |
The custom builds are not a huge improvement (roughly 10 to 14% on the 1080Ti and 3 to 9% on the Titan V) but they are significant considering the job runs may go for days or weeks with large models and many time-steps.
I hope this helps you with your research!
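If you want to put your own runs on the same footing, the arithmetic is simple: ns/day is the reciprocal of day/ns, and the gain is the ratio of the baseline day/ns to the custom-build day/ns. Here is a small sketch using the day/ns values from the table; the helper names ns_per_day and speedup_pct are my own.

```shell
ns_per_day() {
    # usage: ns_per_day DAY_PER_NS
    LC_ALL=C awk -v d="$1" 'BEGIN { printf "%.2f\n", 1 / d }'
}

speedup_pct() {
    # usage: speedup_pct BASELINE_DAY_PER_NS CUSTOM_DAY_PER_NS
    # Percentage gain in throughput of the custom build over the baseline.
    LC_ALL=C awk -v base="$1" -v new="$2" \
        'BEGIN { printf "%.1f\n", (base / new - 1) * 100 }'
}

echo "1080Ti  16.04 custom:     $(ns_per_day 0.484) ns/day, +$(speedup_pct 0.531 0.484)%"
echo "1080Ti  18.04 + CUDA 9.2: $(ns_per_day 0.467) ns/day, +$(speedup_pct 0.531 0.467)%"
echo "Titan V 16.04 custom:     $(ns_per_day 0.428) ns/day, +$(speedup_pct 0.440 0.428)%"
echo "Titan V 18.04 + CUDA 9.2: $(ns_per_day 0.404) ns/day, +$(speedup_pct 0.440 0.404)%"
```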
Happy computing --dbk