Simulating States of Interest

From AlchemistryWiki
Jump to navigation Jump to search



This section should serve as a guide to preparation of the simulation as a whole and data collection, not as an answer to the question of "which simulation software should I use?" So long as the software can handle free energy calculations (or you can make it), and collects the information the analysis method that you choose needs, then any software is acceptable. Some software may be more suited to certain methods than others, but analysis of this is beyond the scope of this page.

There are three core things you need when running simulations (although more will be discussed below); those items are:

  1. Simulations must be run at equilibrium. Special rules apply when running nonequilibrium simulations.
  2. Samples must be collected from states of interest.
  3. Samples must be uncorrelated and independent.

Running Simulations at Equilibrium

Each and every simulation you run should be done under equilibrium. This is true not only for the end points, but also for every intermediate state you have. Because rare events are often given such a large importance in most simulation packages, running at nonequilibrium will give you the wrong results. That said, there are some methods which require nonequilibrium conditions, but those must start at equilibrium before perturbing the system. It should be said upfront that no hard and fast rules exists for determining equilibration time, meaning that it falls to the researcher to determine if the system is equilibration correctly.

It is actually rather difficult for your simulation to start in equilibrium, especially for explicit solvent systems where the solvent is automatically generated. Many simulation software packages actually run or recommend you run an energy minimization step prior to beginning to correct for possible overlapping molecules, however, these alone are not enough to sufficiently bring the system into equilibrium. The best way to do this is to actually run the simulation for a period of time and then simply exclude those samples from analysis. This equilibration must be done at every intermediate too.

One efficient scheme to equilibration is to run short (10-100 ps) simulations ate each state, then restart/begin collecting data with the configuration at the end of this time window. This will not guarantee a full relaxation of the system, but it does significantly help reduce potential instabilities in your simulation; full relation can take 100s to 1000s of ps or longer in some cases.[1] As [math]\displaystyle{ \lambda }[/math] changes, the volume should be allowed to change as well so the solvent density of the system does not change as a result of state. Liquids are incompressible, which means that any small change in volume can have stark changes in the thermodynamic properties. Even when running at NVT conditions, the average volume from an NPT equilibration should be used as box fluctuations can cause free energy differences of 0.1-0.3 kcal/mol for typical small molecule solvation.

Solvating small molecules may take 100-500ps, but this is a near best case scenario. Large protein-ligand systems started out of equilibrium have very lengthy correlation times and may take hundreds of nanoseconds to equilibrate. Things to watch for when checking equilibration are the system energy ([math]\displaystyle{ U }[/math]), [math]\displaystyle{ \left\langle \mathrm{d} U / \mathrm{d} \lambda \right\rangle }[/math], and the number of hydrogen bonds (for small molecules) for convergence; the hydrogen bonds may be slow to converge.[2]

Running Nonequilibrium Simulations

Collecting from States of Interest

Independent and Uncorrelated Samples

The samples must be independent, meaning they are uncorrelated in time; this is a basic assumption of all the theories presented here in the fundamentals section. However, for all but the simplest of systems, completely independent samples can be very difficult to generate. For protein-ligand binding affinities, the time scale for some motions may be 100s of ns, meaning truly uncorrelated samples may be impossible to generate in a reasonable amount of time with today's simulation technology. In this case, free energy calculations might provide some useful information, but will only be approximations to the correct free energy for that model and cannot be considered reliable.

There are methods for subsampling simulated and finding correlation times to work around the independent sampling problem, but those have been relegated to their own section.

References

  1. Fujitani, H., Tanida, Y., Ito, M., Shirts, M. R., Jayachandran, G., Snow, C. D., Sorin, E. J., and Pande, V. S. (2005) Direct calculation of the binding free energies of FKBP ligands. J. Chem. Phys. 123, 084108. - Find at Cite-U-Like
  2. Smith, L. J., Daura, X., and van Gunsteren, W. F. (2002) Assessing equilibration and convergence in biomolecular simulations. Proteins: Struct., Funct., Bioinf. 48, 487–496.