PyTorch for Scientific Computing – Quantum Mechanics Example Part 2) Program Before Code Optimizations



In this post we’ll put together working code to return the ground state energy and gradient for small atoms using the correlated Gaussian basis functions and Hamiltonian energy operator discussed in part 1. I’m not going to repeat much of the theory so have a look at Part 1) if you have no idea of what I’m talking about (that may or may not help!).

The title of Part 1) is “Doing Quantum Mechanics with a Machine Learning Framework: PyTorch and Correlated Gaussian Wavefunctions: Part 1) Introduction“. I changed the name of the series in this post to better reflect what I’m trying to convey.

The important take-away in this post is to see how straight forward it is to write code for a complicated scientific application using PyTorch. We will optimize the code for high performance in later posts.

In Part 1) we established the feasibility of using PyTorch for this problem and confirmed that “Automatic Gradients” using PyTorch autograd works as expected.

The essence of the problem is to minimize the smallest eigenvalue, $E$, of the generalized eigenvalue problem,

$$(H-ES)c = 0$$

Where $H = T + V$ is a matrix of evaluated integrals using our basis functions. There are terms for kinetic $T$ and potential $V$ energy operators, $S$ an “overlap” matrix of integral of the basis functions with themselves. $c$ is a vector (eigenvector) of the linear coefficients of the basis expansion. [… and there is a symmetry projection operator mixed in there to make sure we are computing the right wavefunction for the ground state energy.] The basis functions and energy operator were discussed in Part 1).

The energy is the equivalent of the “loss” function in a typical machine learning problem. We want to minimize the energy.


The formulas that need to be coded up

The formulas we need to compute for the matrix elements look like, (for each $k,l$ basis function)

Overlap

$$S_{kl}= 2^{3n/2}\left( \frac{\left\| L_k\right\| \left\| L_l\right\| }{\left| A_{kl}\right| }\right) ^{3/2}$$

$L_k$ and $L_l$ are lower triangular matrices of of (independent variable) parameters that will be optimized. $A_{kl} = A_k + \tau_P^{\prime }A_l\tau_P$ with $\tau$ a symmetry projection matrix. $|A|$ means determinate.

Kinetic energy $T$

$$T_{kl}=6S_{kl}\,\mathrm{tr}\left[ MA_kA_{kl}^{-1}\tau_P^{\prime }A_l\tau_P\right]$$

This formula is implemented efficiently by utilizing $\mathrm{tr}\left[AB\right]=\sum_{i,j}\left(A_{i,j}B_{i,j}^{\prime}\right)$. That is, form the term by term product of $A$ and $B$ transpose, and then add up all of the components of the product. $M$ is a matrix of reduced mass constants.

Potential energy $V$

$$V_{kl}=\frac 2{\sqrt{\pi }}S_{kl}\, \sum_{i>j}Q*R_{ij}$$

$Q$ is a lower triangular matrix of “charge products” and $R_{ij}$ is a lower triangular matrix constructed from elements in $A_{kl}^{-1}$

$$R_{ij}=\,\mathrm{\,\,} \,\left( \left\{ \begin{array}{ll} \left[A_{kl}^{-1}\right]_{ii}^{-1/2} & i=j \\ \left(\left[A_{kl}^{-1}\right]_{ii}+\left[A_{kl}^{-1}\right]_{jj} -2\left[A_{kl}^{-1}\right]_{ij}\right)^{-1/2} & i>j \end{array} \right. \right)$$

Those are very compact matrix equations for what are really quite elaborate integrals derived with matrix calculus. If those formulas were written out with summation signs instead of matrix formulas there would be several pages of ugly formulas to consider. Having formulas in matrix form is a big plus for code implementation.

Energy function

For the energy (loss) function we will not do complete eigenvalue decomposition, we’ll use the Rayleigh quotient,

$$E = \min_c\left(\frac{c’Hc}{c’Sc} \right)$$

The minimum over $c$ of the Rayleigh quotient is the smallest eigenvalue and that minimal $c$ is the corresponding eigenvector. By doing this we can optimize the linear expansion coefficients in $c$ simultaneously with the non-linear parameters that are contained in the matrices $L_k$.

The independent parameter variables for our energy (loss) function $E$ is the collection of all of the lower triangular elements of the $L_k$ matrices for all of the basis functions along with the vector of expansion coefficients $c$. Those optimization parameters will be placed in a vector x. That will give $E$ as a function of x that will return the energy and gradient at the value of x. We will minimize that.

Got it? Let’s code it up!


First “Naive” Implementation of the Problem

The first thing to do is get working code. We will speed it up by a factor of over 15,000 in later posts! (see near the end of the post for a teaser)

Don’t worry about performance yet, we just want to have something working that can be checked for correctness. In the “old-days” I would have used MATLAB for this step. In fact I had old MATLAB code for for this problem that I ran in Octave (open source ‘clone’ of MATLAB) as a check to be sure I was getting correct results. It turns out that it is nearly as easy to do this with PyTorch as it was with MATLAB! [ note: I really like MATLAB and Octave! … but, I am becoming a huge fan of PyTorch!]

You should be able to cut and paste the code blocks below into Python Jupyter notebook cells. To get PyTorch setup to work in a notebook you can follow instructions in my post Why You Should Consider PyTorch (includes Install and a few examples).

Import PyTorch

import torch as th # PyTorch is imported as torch and will be referenced as "th"

import time

Setup compute device and data-type

For these calculation float32 is not enough precision! You will need double precision i.e. float64 in order to do the optimization runs. The Titan V has great float64 performance, GeForce cards do not! We will use the CPU until we do the code optimizations. (The code as-is will run on the GPU but performance is bad.)

dtype = th.float64

gpuid = 0
#device = th.device("cuda:"+ str(gpuid))
device = th.device("cpu")

print("Execution device: ",device)
print("PyTorch version: ", th.__version__ )
print("CUDA available: ", th.cuda.is_available())
print("CUDA version: ", th.version.cuda)
print("CUDA device:", th.cuda.get_device_name(gpuid))
Execution device:  cpu
PyTorch version:  0.4.1
CUDA available:  True
CUDA version:  9.0.176
CUDA device: TITAN V

Utility functions

# Utility functions for working with lower triangular matrices

# return the lower triangle of A in column order i.e. vech(A)
def vech(A):
    count = 0
    c = A.shape[0]
    v = th.zeros(c * (c + 1) // 2, device=device, dtype=dtype)
    for j in range(c):
        for i in range(j,c):
            v[count] = A[i,j]
            count += 1
    return v

# vech2L   create lower triangular matrix L from vechA
def vech2L(v,n):
    count = 0
    L = th.zeros((n,n), device=device, dtype=dtype)
    for j in range(n):
        for i in range(j,n):
            L[i,j]=v[count]
            count += 1
    return L

Compute formulas for matrix elements

The following function matrix_elements() implements the formulas presented in the fist part of this post. The code is mostly a direct translation from the math formulas.

def matrix_elements(n, vechLk, vechLl, Sym, Mass, vecQ):
    '''
    Returns: a dictionary with the skl, tkl, vkl matrix elements
    n : the size of the Lk matrices
    vechL(k,l): vector of parameters use for Lk and Ll matrices
    Sym: a single symmetry projection matrix
    Mass: a matrix of "reduced mass" values for the particles needed for tkl
    vecQ: an array of products of particle charges needed for vkl
    '''
    # build Lk and Ll

    Lk = vech2L(vechLk,n);
    Ll = vech2L(vechLl,n);

    # apply symmetry projection on Ll

    # th.t() is shorthand for th.transpose(X, 0,1)
    PLl = th.t(Sym) @ Ll;

    # build Ak, Al, Akl, invAkl

    Ak = [email protected](Lk);
    Al = [email protected](PLl);
    Akl = Ak+Al
    invAkl = th.inverse(Akl);

    # Overlap (normalized)
    skl = 2**(3*n/2) * th.sqrt( th.pow(th.abs(th.det(Lk))*th.abs(th.det(Ll))/th.det(Akl) ,3) );

    # Kinetic energy

    tkl = skl*(6*th.trace(Mass@Ak@invAkl@Al));

    # potential energy

    TWOoSqrtPI = 1.1283791670955126 # 2/sqrt(pi)

    RIJ = th.zeros((n,n), device=device, dtype=dtype);
    # 1/rij i~=j
    for j in range(0,n-1):
        for i in range(j+1,n):
            tmp2 = invAkl[i,i] + invAkl[j,j] - 2*invAkl[i,j];
            #RIJ[i,j] = TWOoSqrtPI * skl/th.sqrt(tmp2);
            RIJ[i,j] = 1/th.sqrt(tmp2)

    # 1/rij i=j
    for i in range(0,n):
        #RIJ[i,i] = TWOoSqrtPI * skl/th.sqrt(invAkl[i,i]);
        RIJ[i,i] = 1/th.sqrt(invAkl[i,i])

    RIJ = TWOoSqrtPI*skl*RIJ
    Q = vech2L(vecQ,n);

    vkl = th.sum(RIJ*Q)

    return {'skl':skl, 'tkl':tkl, 'vkl':vkl}

You can see how close the PyTorch/Python code is to the mathematical matrix formulas. This makes it very quick to get working code running.

Test the matrix element formulas.

def test_matrix_elements():
# test input with a known result
    n = 3;
    vechLk = th.tensor([ 1.00000039208682, 0.02548044275764261, 0.3525161612610669,
                         1.6669144815242515, 0.9630555318946559, 1.8382882034659822
                       ], device=device, dtype=dtype, requires_grad=True);

    vechLl = th.tensor([ 1.3353550436464964, 0.9153272033682132, 0.7958636766525028,
                         1.8326931436447955, 0.3450426931160630, 1.8711839323167831
                       ], device=device, dtype=dtype, requires_grad=True);

    Sym = th.tensor([[0,0,1],
                    [0,1,0],
                    [1,0,0]], device=device, dtype=dtype);

    Mass = th.tensor([[5.446170e-4, 2.723085077e-4, 2.723085077e-4],
                     [2.723085077e-4, .5002723085, 2.723085077e-4],
                     [2.723085077e-4, 2.723085077e-4, .5002723085 ]], device=device, dtype=dtype);

    vecQ = th.tensor([1, -1, -1, -1, 1, -1], device=device, dtype=dtype);

    matels = matrix_elements(n, vechLk, vechLl, Sym, Mass, vecQ)

    print('skl: ',matels['skl']) # should be 0.5334
    print('tkl: ',matels['tkl']) # should be 4.3509
    print('vkl: ',matels['vkl']) # should be -2.3840
start_time = time.time()
test_matrix_elements()
print(" took {:.6f} seconds ".format(time.time() - start_time))
skl:  tensor(0.5334, dtype=torch.float64, grad_fn=)
tkl:  tensor(4.3509, dtype=torch.float64, grad_fn=)
vkl:  tensor(-2.3840, dtype=torch.float64, grad_fn=)
 took 0.047023 seconds

Those numbers are the expected output for those parameters so we can move on to computing the energy.

Define the energy “loss” function

“energy()” loops over the lower triangle of S T and V making calls to the matrix element code. These matrices are then used to create the Hamiltonian matrix H and Overlap S that are then used to compute the energy.

def energy(x,n,nb,Mass,Charge,Sym,symc):
    '''
    Returns: the energy at the point (parameters) x
    n: number of particles defined in input
    nb: number of basis functions
    Mass: matrix of "reduced" mass constants for system
    Charge: vector of "charge products" for particle pairs
    Sym: a tensor containing the symmetry projection matrices
    symc: a vector of coefficients for the symmetry projection terms
    '''

    nx = len(x);
    nn = int(n*(n+1)/2);
    nsym = len(symc);

    # extract linear coefs "eigen vector" from x
    c = x[-nb:];
    # reshape non-linear variables for easier indexing
    X = th.reshape(x[:nb*nn], (nb,nn))

    # Init H S T V
    H = th.zeros((nb,nb), device=device, dtype=dtype);
    S = th.zeros((nb,nb), device=device, dtype=dtype);
    T = th.zeros((nb,nb), device=device, dtype=dtype);
    V = th.zeros((nb,nb), device=device, dtype=dtype);

    # outer loop is over symmetry terms
    for k in range(0,nsym):

        for j in range(0,nb):
            for i in range(j,nb):

                vechLi = X[i,:]
                vechLj = X[j,:]

                matels = matrix_elements(n,vechLi,vechLj,Sym[:,:,k],Mass,Charge);

                S[i,j] += symc[k]*matels['skl'];
                T[i,j] += symc[k]*matels['tkl'];
                V[i,j] += symc[k]*matels['vkl'];

    H = T + V

    # complete upper triangle of H and S
    for i in range(0,nb):
        for j in range(i+1,nb):
            H[i,j] = H[j,i];
            S[i,j] = S[j,i];

    # The energy from the Rayleigh quotient

    cHc = c@H@c;
    cSc = c@S@c;
    eng = cHc/cSc;

    return eng

Test the energy

The following function defines the constants for Li atom (infinite nuclear mass). There is sample input for an 8 term wavefunction with the non-linear parameters in x1.

def test_energy():

    Mass = th.tensor([[0.5, 0.0, 0.0],
                     [0.0, 0.5, 0.0],
                     [0.0, 0.0, 0.5]], device=device, dtype=dtype);

    Charge = th.tensor([-3, 1, 1, -3, 1, -3], device=device, dtype=dtype);

    # symmetry projection terms
    Sym = th.zeros((3,3,6), device=device, dtype=dtype)
    # (1)(2)(3)
    Sym[:,:,0] = th.tensor([[1,0,0],[0,1,0],[0,0,1]], device=device, dtype=dtype);
    # (12)
    Sym[:,:,1] = th.tensor([[0,1,0],[1,0,0],[0,0,1]], device=device, dtype=dtype);
    # (13)
    Sym[:,:,2] = th.tensor([[0,0,1],[0,1,0],[1,0,0]], device=device, dtype=dtype);
    # (23)
    Sym[:,:,3] = th.tensor([[1,0,0],[0,0,1],[0,1,0]], device=device, dtype=dtype);
    # (123)
    Sym[:,:,4] = th.tensor([[0,1,0],[0,0,1],[1,0,0]], device=device, dtype=dtype);
    # (132)
    Sym[:,:,5] = th.tensor([[0,0,1],[1,0,0],[0,1,0]], device=device, dtype=dtype);

    # coeff's
    symc = th.tensor([4.0,4.0,-2.0,-2.0,-2.0,-2.0], device=device, dtype=dtype);

    n=3;
    nb=8;

    xvechL=th.tensor([
         1.6210e+00, -2.1504e-01,  9.0755e-01,  9.7866e-01, -2.8418e-01,
        -3.5286e+00, -3.3045e+00, -4.5036e+00, -3.2116e-01, -7.1901e-02,
         1.5167e+00, -8.4489e-01, -2.1377e-01, -3.6127e-03, -5.3774e-03,
        -2.1263e+00, -2.5191e-01,  2.1235e+00, -2.1396e-01, -1.4084e-03,
        -1.0092e-02,  4.5349e+00,  9.4837e-03,  1.1225e+00, -2.1315e-01,
         5.8451e-02, -4.9410e-03,  5.0853e+00,  7.3332e-01,  5.0672e+00,
        -2.1589e-01, -6.8986e-03, -1.4310e-02,  1.5979e+00,  3.3946e-02,
        -8.7965e-01, -1.1121e+00, -2.1903e-03, -4.6925e-02,  2.1457e-01,
         3.3045e-03,  4.5120e+00, -2.1423e-01, -1.6493e-02, -2.3429e-03,
        -8.6715e-01, -6.7070e-02,  1.5998e+00
     ], device=device, dtype=dtype, requires_grad=False)

    evec = th.tensor([
      -6.0460e-02,  7.7708e-05, 1.6152e+00,  9.5443e-01,
      1.1771e-01,  3.2196e+00,  9.6344e-01, 3.1398e+00
    ], device=device, dtype=dtype, requires_grad=False)

    # join the parameters into x1
    x1 = th.tensor(th.cat((xvechL,evec)), device=device, dtype=dtype, requires_grad=True)

    eng = energy(x1,n,nb,Mass,Charge,Sym,symc)
    print(eng) # Should return -7.3615 at this point x1
    return x1
start_time = time.time()
xrestart = test_energy()
print(" took {} seconds ".format(time.time() - start_time))
tensor(-7.3615, dtype=torch.float64, grad_fn=)
 took 0.11813998222351074 seconds

With those values of the parameters for x1 the the energy is as expected. Note the “requires_grad=True” attribute for x1. That tells PyTorch to keep track of those parameters for computing gradients.

Optimize energy using PyTorch Autograd gradients

def opt_energy(steps=1, num_basis=8):

    ############################################################################
    # Input constants for Li atom (infinite nuclear mass)
    ############################################################################
    Mass = th.tensor([[0.5, 0.0, 0.0],
                     [0.0, 0.5, 0.0],
                     [0.0, 0.0, 0.5]], device=device, dtype=dtype);

    Charge = th.tensor([-3, 1, 1, -3, 1, -3], device=device, dtype=dtype);

    # symmetry projection terms
    Sym = th.zeros((3,3,6), device=device, dtype=dtype)
    # (1)(2)(3)
    Sym[:,:,0] = th.tensor([[1,0,0],[0,1,0],[0,0,1]], device=device, dtype=dtype);
    # (12)
    Sym[:,:,1] = th.tensor([[0,1,0],[1,0,0],[0,0,1]], device=device, dtype=dtype);
    # (13)
    Sym[:,:,2] = th.tensor([[0,0,1],[0,1,0],[1,0,0]], device=device, dtype=dtype);
    # (23)
    Sym[:,:,3] = th.tensor([[1,0,0],[0,0,1],[0,1,0]], device=device, dtype=dtype);
    # (123)
    Sym[:,:,4] = th.tensor([[0,1,0],[0,0,1],[1,0,0]], device=device, dtype=dtype);
    # (132)
    Sym[:,:,5] = th.tensor([[0,0,1],[1,0,0],[0,1,0]], device=device, dtype=dtype);

    # coeff's
    symc = th.tensor([4.0,4.0,-2.0,-2.0,-2.0,-2.0], device=device, dtype=dtype);
    ############################################################################


    n=3
    nb=num_basis
    th.manual_seed(3)
    x1 = th.empty(int(nb*n*(n+1)/2 + nb), device=device, dtype=dtype, requires_grad=True)
    th.nn.init.uniform_(x1, a=-.8, b=.8)

    #x1 = xrestart

    optimizer = th.optim.Rprop([x1], lr=0.001, etas=(0.5, 1.2), step_sizes=(1e-06, 50))

    for i in range(steps):
        optimizer.zero_grad()
        loss = energy(x1,n,nb,Mass,Charge,Sym,symc)
        loss.backward()
        optimizer.step()

        if (i<10 or not i%10):print('step: {:5}  f: {:4.12f}  gradNorm: {:.9f}'.format(i, loss, th.norm(x1.grad)))
    print('step: {:5}  f: {:4.12f}  gradNorm: {:.9f}'.format(i, loss, th.norm(x1.grad)))
    return x1

"opt_energy()" is the main program. I have the problem input constants setup here as well as the optimization code. This runs optimization steps on our energy function. The gradient for the optimization is evaluated by "back propagation" in the statement loss.backward(). I used the name loss in honor of standard Machine Learning name for the function being minimized.

I experimented with some of the machine learning optimization routines in PyTorch and found that "Rprop" worked really well for this problem. I used the a utility from the neural network module "nn" for the random parameter initialization. It was great having these kinds of tools in the framework for experimenting with!

You can run this code and it will optimize a wavefunction for the Li atom ground state energy. It will be converging toward a value of -7.478060 Hartrees' (an energy unit typically used in quantum chemistry). Running this for 100 "epochs" i.e. optimization steps gives the following output.

start_time = time.time()
xrestart = opt_energy(steps=100, num_basis=8)
print(" took {} seconds ".format(time.time() - start_time))
step:     0  f: -0.782443089544  gradNorm: 6.657278345
step:     1  f: -0.813405818204  gradNorm: 6.663333179
step:     2  f: -0.850565814089  gradNorm: 6.664066434
step:     3  f: -0.895095668330  gradNorm: 6.655362804
step:     4  f: -0.948326026794  gradNorm: 6.631036613
step:     5  f: -1.011714048225  gradNorm: 6.582179112
step:     6  f: -1.086766887327  gradNorm: 6.496718392
step:     7  f: -1.174915424078  gradNorm: 6.359865888
step:     8  f: -1.277512892162  gradNorm: 6.162854834
step:     9  f: -1.395322269810  gradNorm: 5.891109978
step:    10  f: -1.528494104459  gradNorm: 5.568453572
step:    20  f: -4.012585110663  gradNorm: 2.752546989
step:    30  f: -6.571789033344  gradNorm: 1.543067756
step:    40  f: -7.223244601602  gradNorm: 0.284565245
step:    50  f: -7.289228643587  gradNorm: 0.149059481
step:    60  f: -7.353758373450  gradNorm: 0.142130627
step:    70  f: -7.368628179269  gradNorm: 0.069057239
step:    80  f: -7.378787154917  gradNorm: 0.062784281
step:    90  f: -7.389791432408  gradNorm: 0.047265743
step:    99  f: -7.398805061984  gradNorm: 0.095278731
 took 28.317583560943604 seconds

This is a very small 8 term wavefunction so it wont give a great result and the current code is too slow for large numbers of basis functions.

The next run is for a wavefunction with 512 basis terms. This is one step of the optimization which will give us a time for one energy and gradient evaluation. (this is from a random start point)

start_time = time.time()
xrestart = opt_energy(steps=1, num_basis=512)
print(" took {} seconds ".format(time.time() - start_time))
step:     0  f: -0.720436686340  gradNorm: 4.060212842
 took 2768.9312148094177 seconds

2768 seconds for one step is too long!

I'll tease you and do the same thing with my fast version of the code. After all of the code optimizations in the next few posts and running on a Titan V GPU we'll be able to bring that execution time down to the following,

start_time = time.time()
xrestart = opt_energyrc(steps=1,num_basis=512)
print(" took {:.4f} seconds ".format(time.time() - start_time))
step:     0  f: 82.210051671516  gradNorm: 495.193096479
 took 0.1683 seconds

Yes!, from 2768 seconds down to 0.1683 seconds! That's 16452 Times Faster!! Yes, really, that much. Some of the simple code optimizations will be surprising by how much difference they make.

I'll be taking a vacation next week so it may be two weeks before I start writing up the code optimizations. I will be worth the wait. I promise!

Happy computing --dbk