Thermodynamic Integration

From AlchemistryWiki
Jump to navigation Jump to search

Thermodynamic Integration (TI) is one of the most common methods for calculating free energy differences, and the easiest to understand. The basic relationship can be calculated by taking the derivative of the free energy difference with respect to [math]\displaystyle{ \lambda }[/math]. It is one of the few methods that require calculation of [math]\displaystyle{ \frac{\partial U(\lambda,\vec{q})}{\partial\lambda} }[/math]. The need to calculate the derivative is also one of its limitations as many simulation packages will not calculate this natively, and can cause problems when numerically evaluating it at [math]\displaystyle{ r = 0 }[/math]. Despite this, it is still one of the most accurate methods if used correctly and it is generally recommended for those new to the field because of its simplicity and ease of use if available in your code of interest.

Derivation

Starting with the identity of free energy

[math]\displaystyle{ \displaystyle A = -\beta^{-1} \ln Q }[/math]

taking the derivative with respect to [math]\displaystyle{ \lambda }[/math] yields

[math]\displaystyle{ \displaystyle dA/d\lambda = -\beta^{-1}\frac{d}{d\lambda} \ln \int e^{-\beta U(\lambda,\vec{q})}d\vec{q} = -\beta^{-1} \frac{\frac{d}{d\lambda}\int e^{-\beta U(\lambda,\vec{q})}d\vec{q}}{Q} }[/math]

which can then be written as

[math]\displaystyle{ \displaystyle dA/d\lambda = -\beta^{-1}\frac{-\beta\int \frac{dU(\lambda,\vec{q})}{d\lambda} e^{-\beta U(\lambda,\vec{q})}d\vec{q}}{Q} = \left\langle \frac{dU(\lambda,\vec{q})}{d\lambda}\right\rangle_\lambda }[/math].

Finally, one can do integration over the whole range of [math]\displaystyle{ \lambda }[/math] to get the final TI equation

[math]\displaystyle{ \displaystyle \Delta A = \int_0^1 \left\langle \frac{dU(\lambda,\vec{q})}{d\lambda}\right\rangle_\lambda d\lambda }[/math].

Estimating Free Energies with TI

The above derivation makes it rather simple to estimate free energies from TI as there is no iterative solution needed, and information from only a single state is needed to calculate the derivative. Since we can only perform simulations at a number of [math]\displaystyle{ \lambda }[/math] states, numeric integration schemes are required.

All numeric integration schemes have the form

[math]\displaystyle{ \displaystyle \Delta A \approx \sum_{k=1}^K w_k \left\langle \frac{dU(\lambda,\vec{q})}{d\lambda}\right\rangle_k }[/math]

where the weights, [math]\displaystyle{ w_k }[/math] will depend on which numeric integration style is chosen. We recommend starting users in free energy calculations use the trapezoid rule as it is very straightforward and maximizes flexibility in choice of [math]\displaystyle{ \lambda }[/math] spacing. Under the trapezoid rule, even lambda spacing weights are [math]\displaystyle{ \displaystyle w_1 = w_k = 1/[2(K-1)] }[/math] and [math]\displaystyle{ w_{k \ne 1,K} = 1/(K-1) }[/math]. Other weighting schemes have also been tried but are not recommended for beginners as they each have their own benefits and disadvantages and are often system specific. [1][2] The worst case scenario is that you have to run a few extra intermediate states to get accurate results.

Calculating the Statistical Error in TI

The statistical error TI is fairly straightforward to calculate, but has one catch beginners should be aware of. Although the information required to calculate [math]\displaystyle{ \left\langle \frac{dU}{d\lambda} \right\rangle }[/math] requires information from only the one state, to calculate the free energy over single interval takes values from two or more states at a time, which means that the statistical error over the entire interval does not add independently.

Let's look at this with all the mathematical details. The total statistical error is the square root of the variance. Since each of the averages is generated from different simulations, the total variance for TI over the entire interval is a weighted sum of the variances:

[math]\displaystyle{ \mathrm{var}\left(\Delta A\right) = \sum_{k=1}^{K}w_k^2 \mathrm{var}\left(\frac{dU}{d\lambda}\right)_k }[/math].

where the weights are the weights determined by the numerical integration method. To illustrate how this is different from independent addition of the errors or free energies of each interval together, consider the trapezoid rule example. Taking into account the correct equation the variance would be

[math]\displaystyle{ \mathrm{var}\left(\Delta A_{1,K}\right) = \frac{1}{4}\mathrm{var}\left(\frac{dU}{d\lambda}\right)_1 + \mathrm{var}\left(\frac{dU}{d\lambda}\right)_2 + \cdots + \mathrm{var}\left(\frac{dU}{d\lambda}\right)_{K-1} +\frac{1}{4}\mathrm{var}\left(\frac{dU}{d\lambda}\right)_K }[/math].

Compare this to the incorrect method shown below.

[math]\displaystyle{ \mathrm{var}\left(\Delta A_{i,i+1}\right) = \frac{1}{4}\mathrm{var}\left(\frac{dU}{d\lambda}\right)_i + \frac{1}{4}\mathrm{var}\left(\frac{dU}{d\lambda}\right)_{i+1} }[/math]
[math]\displaystyle{ \begin{alignat}{2} \mathrm{badvar}\left(\Delta A_{1,K}\right) &=\sum_{i=1}^{K-1}\mathrm{var}\left(\Delta A_{i,i+1}\right) \\ &= \frac{1}{4}\mathrm{var}\left(\frac{dU}{d\lambda}\right)_1 + \frac{1}{2}\mathrm{var}\left(\frac{dU}{d\lambda}\right)_2 + \cdots + \frac{1}{2}\mathrm{var}\left(\frac{dU}{d\lambda}\right)_{K-1} +\frac{1}{4}\mathrm{var}\left(\frac{dU}{d\lambda}\right)_K \\ \end{alignat} }[/math]

The second set of equations is clearly quite different from the correct first set.

To get the correct statistical error, each individual average should be multiplied by [math]\displaystyle{ \sqrt{2\tau} }[/math] to correct for the correlation time at each state; error is then [math]\displaystyle{ \sqrt{\mathrm{var}\left(\Delta A_{i,K}\right)} }[/math].

Problems with TI

Although TI is one of the simplest free energy methods to analyze, it also suffers from some drawbacks that need to be carefully avoided. For instance, if the curvature of [math]\displaystyle{ \frac{dU}{d\lambda} }[/math] is large, the bias introduced by discrete [math]\displaystyle{ \lambda }[/math] states becomes significant. So when using TI it is very important that researchers verify that they have gathered data from sufficient numbers of states, such that the free energy becomes independent of the number of states to within statistical precision.

Infinite [math]\displaystyle{ dU/d\lambda }[/math]

One of the largest problems with TI is evaluation of [math]\displaystyle{ \frac{dU}{d\lambda} }[/math] at [math]\displaystyle{ r = 0 }[/math] when standard Lennard-Jones potentials are used. The simplest thermodynamic pathway one can chose is the linear pathway of

[math]\displaystyle{ U(\lambda,\vec{q}) = (1-\lambda)U_0(\vec{q}) + \lambda U_1(\vec{q}) + U_{unaffected}(\vec{q}) }[/math]

which is acceptable for changes in parameters, but not for annihilating or decoupling a site because of the [math]\displaystyle{ r^{-12} }[/math] term in the Lennard-Jones potentials in [math]\displaystyle{ U_i }[/math]. The linear transformation will always have an infinite potential at [math]\displaystyle{ r=0 }[/math] leading to numeric instabilities for evaluating [math]\displaystyle{ \frac{dU}{d\lambda} }[/math] in TI. Although there are ways to get around this, they do not converge very well with any function of [math]\displaystyle{ \lambda }[/math] taking the form shown above. However, if one were to use a soft core potential, this problem can be mostly avoided.

Modifying Simulations

Because most other free energy methods do not need to evaluate [math]\displaystyle{ \frac{dU}{d\lambda} }[/math], many simulation codes do not natively support evaluating this quantity. If the thermodynamic path is constructed with a linear transformation, then the derivative can be evaluated in post-processing knowing the energy of the system. However, if the transformation is done with soft core potentials, then the derivative will need to be evaluated in code, and it will often be necessary for researchers to modify the code in-house as many simulation packages do not evaluate this quantity. It would probably better to use another method if the derivative is not explicitly calculated.

References

  1. Jorge, M., Garrido, N., Queimada, A., Economou, I., and Macedo, E. (2010) Effect of the Integration Method on the Accuracy and Computational Efficiency of Free Energy Calculations Using Thermodynamic Integration. J. Chem. Theo. Comp. 6, 1018–1027.
  2. Shyu, C., and Ytreberg, F. M. (2009) Reducing the bias and uncertainty of free energy estimates by using regression to fit thermodynamic integration data. J. Comp. Chem. 30, 2297–2304.