Difference between revisions of "Best Practices"
Line 88: | Line 88: | ||
{{cite|Chodera2006|Chodera, J. D.; Swope, W. C.; Pitera, J. W.; Seok, C.; Dill, K. A. Journal of Chemical Theory and Computation 2007, 3.|http://www.citeulike.org/group/14929/article/9052139}} | {{cite|Chodera2006|Chodera, J. D.; Swope, W. C.; Pitera, J. W.; Seok, C.; Dill, K. A. Journal of Chemical Theory and Computation 2007, 3.|http://www.citeulike.org/group/14929/article/9052139}} | ||
− | {{cite|Wu2005|Wu, D.; Kofke, D. A. Journal of Chemical Physics 2005, 123, 054103.|http://www.citeulike.org/group/14929/article/9052669}} | + | <!--{{cite|Wu2005|Wu, D.; Kofke, D. A. Journal of Chemical Physics 2005, 123, 054103.|http://www.citeulike.org/group/14929/article/9052669}}--> |
{{cite|Jarzynski2006|Jarzynski, C. Phys. Rev. E 2006, 73, 046105.|http://www.citeulike.org/group/14929/article/9052297}} | {{cite|Jarzynski2006|Jarzynski, C. Phys. Rev. E 2006, 73, 046105.|http://www.citeulike.org/group/14929/article/9052297}} | ||
− | {{cite|Fujitani2005|Fujitani, H.; Tanida, Y.; Ito, M.; Shirts, M. R.; Jayachandran, G.; Snow, C. D.; Sorin, E. J.; Pande, V. S. Journal of Chemical Physics 2005, 123, 084108.|http://www.citeulike.org/group/14929/article/9052204}} | + | <--{{cite|Fujitani2005|Fujitani, H.; Tanida, Y.; Ito, M.; Shirts, M. R.; Jayachandran, G.; Snow, C. D.; Sorin, E. J.; Pande, V. S. Journal of Chemical Physics 2005, 123, 084108.|http://www.citeulike.org/group/14929/article/9052204}} --> |
{{cite|Mobley2007c|Mobley, D. L.; Chodera, J. D.; Dill, K. A. Journal of Chemical Theory and Computation 2007, 3.|http://www.citeulike.org/group/14929/article/9052442}} | {{cite|Mobley2007c|Mobley, D. L.; Chodera, J. D.; Dill, K. A. Journal of Chemical Theory and Computation 2007, 3.|http://www.citeulike.org/group/14929/article/9052442}} | ||
</references> | </references> |
Revision as of 21:08, 12 July 2013
This is intended as an overview of best practices for free energy calculations, with references, when possible.
This document assumes you already have a basic idea of what these calculations are and what they do (if not, you can learn more from the Free Energy Fundamentals section), and that you already have a working knowledge of molecular simulation, convergence and equilibration. If not, start with a textbook like Leach's Molecular Modelling: Principles and Applications[1]). The discussion is simulation-package agnostic, focusing on the important ideas.
This document is open for editing. We hope this to be a place storing the community consensus on these issues. Do back up any edits with appropriate references. If you are not sure about changes, post questions on the discussion pages. The Previous Best Practices link is here.
Introduction
Free energies calculated from molecular simulation are appealing, because, in principle, they are rigorously correct, given a particular set of parameters and physical assumptions. Given this particular set of parameters and physical assumption, there exists a single right answer for a free energy calculation. The goal is to obtain that free energy with as little statistical error as possible -- and not necessarily to match experiment. Without convergence, matching experiments may be due only to fortuitous cancellation of error. Only after the simulations have converged can the underlying parameters and physical assumptions really be tested.
Free energy is a function of state, so there are many possible choices of pathway for a thermodynamic cycle connecting the same two endpoints. Some pathways may be more efficient than others (sometimes by many orders of magnitude). Methods which are in principle correct may not always be practical or useful. A large part of the following discussion will discuss exactly how to construct a useful and efficient pathways connecting the end states.
Free energy differences can be defined between any two physical states defined by a Hamiltonian. Here, we specifically discuss best practices in the context of two common practical situations; calculating solvation free energies and calculating binding free energies. Solvation free energies provide a basic starting point for considering a number of the key issues. For binding free energies, there are many more choices of pathway, which raises additional complications.
Guidelines for setting up your system files
Guideline 1: Automate when possible
It is extremely easy to make a mistake that affects the entire simulation system. When possible, automate simulation preparation tasks so that they do not need to be done manually. Validating the results of automated scripts is a very good idea too.
Guidelines for free energy calculation pathways
Alchemical free energies almost involve some insertion or deletion of atoms -- both in hydration free energy calculations, and in absolute and relative binding free energy calculations. By insertion and deletion, we mean decoupling or annihilation (see Decoupling and annihilation for definitions) of the interactions of the atoms in question. We believe that a basic list of rules and guidelines should be followed in any calculation that involves insertion or deletion:
Guideline 1: Always use soft-core potentials while decoupling or annihilating Lennard-Jones interactions
When removing atoms from a system, the atoms should be removed gradually. Because of how potentials are defined, what gradual removal means can be somewhat subtle. The soft-core potential formalism is a well-tested way to do this. Read more about how to choose the potential to decouple or annihilate Lennard-Jones sites.
Guideline 2: Never leave a partial atomic charge on an atom while its Lennard-Jones interactions are being removed
Partial atomic charge should never be allowed to remain on an atom while its Lennard-Jones interactions are being removed, because it can result in an instance of two bare charges. Read more about how to handle changes to both charges and Lennard-Jones sites.
Guideline 3: Use few insertions/deletions
Electrostatics transformations are usually smooth functions of lambda , and require relatively few lambda values, while Lennard-Jones transformations – especially those involving insertion and deletions – can require substantially more lambda values (even when using soft core potentials) to obtain good phase-space overlap and therefore accurate free energy differences.[2][3][4] Insertions and deletions of particles can be thought of as “difficult” transformations.[5]
Consequently, it is far more computationally efficient to modify existing particles (atoms) than to insert or delete new atoms. Construct mutation pathways for relative free energy calculations that minimize the number of times atoms, especially large atoms, need to be inserted and deletet, since multiple choices of mutation pathways between a set of molecules are typically possible.
This guideline is unfortunately not useful for absolute free energy calculations, since these involve inserting or deleting entire molecules.
Guideline 4: Think about whether your intermediate states are likely to converge or not
Many choices of pathway are possible. It can often be helpful to think about whether a particular choice of pathway makes convergence easier or more difficult at each lambda, thus reducing the amount of simulation time required.
Visualize what happens to a ligand when it is partially noniteracting with the protein (say at [math]\displaystyle{ \lambda=0.5 }[/math]) during an absolute binding free energy simulation. It will have significantly more freedom to move around the pocket than when fully interacting- but the kinetics may still be slow, since it will get trapped in free energy minima. It then may take quite a while to sample these states.
One can introduce distance or orientation restraints between the ligand and the protein. These reduce the free space of exploration of the ligand, thus increasing convergence. In the noninteracting state, the interaction potential can be removed analytically. In the interacting state, the free energy of applying these constraints need to be included. The free energy of applying constraints can be non-trivial, since the ligand will be pulled away from alternative minima, but convergence issues at other intermediates will be greatly reduced.
At the fully noninteracting state, the amount of configurational sampling the ligand undergoes is dictated by this choice. A ligand with a single reference distance restrained relative to the protein will need to sample a spherical shell in configuration space, and a ligand with all six relative degrees of freedom restrained would need to sample only a very small region of configuration space. These two can take drastically different amounts of time, so in fact it can be much more efficient, at least in some cases, to use the additional restraints.[6]
Guidelines for carrying out free energy calculations
Guideline 1: Equilibrium methods are easier to use for beginners than nonequilibrium or mixed-ensemble methods
There are a very large number of basic methods for performing free energy calculation: Slow-growth, fast-growth (Jarzynski-style), and equilibrium (or instantaneous growth) free energy methods. For advanced users, there are significant efficiency advantages with careful use of these methods.
However, for beginners, and for standard simulations, we believe the evidence is in favor of equilibrium methods: Run simulations at fixed values of lambda, collect equilibrium data from these simulations, and calculate the free energies from these observations. There are far fewer things that can go wrong. If it's a problem that really requires significantly more efficiency, it's probably worth doing a warmup problem for practice first!
Guideline 2: Use equilibrated data
For free energy methods to work, the simulations must sample from the correct Boltzmann distribution. Make sure that your simulation is well-equilibrated, and that you are not in the transient region. Simulations must separately be equilibrated at each lambda value.
Equilibration at only [math]\displaystyle{ \lambda=0 }[/math], followed by sequential short simulations at other lambda values may not be sufficient. For example in a study by Mobley et al.[7] the protein remains kinetically trapped in its starting [math]\displaystyle{ \lambda=0 }[/math] equilibrium conformational state in all calculations. If long equilibration is deemed necessary, equilibration periods should be equally long at each lambda value.
Guideline 3: Monitor whether your free energy calculations are actually statistically converging
- Trial and error: For a particular system, begin with very closely spaced lambda values, and then gradually increase separation until results begin to deteriorate.
- Error analysis:
Final Guideline : Verify, Verify, Verify!
In view of these facts, our recommendation is to use equilibrium simulations at a set of separate [math]\displaystyle{ \lambda }[/math] values to estimate free energy differences. But how should these simulations be performed? Should a [math]\displaystyle{ \lambda=0 }[/math] simulation be performed first, for example, and then the ending conformation used to seed a [math]\displaystyle{ \lambda=0.1 }[/math] simulation? Or should they be independent? There is no reason in principle that simulations cannot be performed serially – but this leads to several potential pitfalls and disadvantages. First, if simulations are performed serially, and it is later concluded that some of the intermediate simulations were not long enough, the entire set of calculations performed after those intermediate simulations must be repeated. Second, results at one [math]\displaystyle{ \lambda }[/math] value become coupled to those at previous lambda values, potentially leading to hysteresis (add references) and difficulties in troubleshooting. Independent simulations, on the other hand, provide the great practical advantage that if, at the analysis stage, it appears that data is insufficient at some range of [math]\displaystyle{ \lambda }[/math] values, these particular simulations can simply be extended without requiring repetition of other component simulations. Additionally, with large computer clusters now becoming relatively common, independent simulations can often be performed in parallel, leading to significant real-time savings over running the simulations serially.
References
- ↑ Leach, A. Molecular Modelling: Principles and Applications (2nd Edition); Prentice Hall: 2 ed.; 2001.
- ↑ Shirts, M. R.; Pande, V. S. J. Chem. Phys. 2005, 122, 134508. - Find at Cite-U-Like
- ↑ Mobley, D. L.; Dumont, Ã.; Chodera, J. D.; Dill, K. A. Journal of Physical Chemistry B 2007, 111. - Find at Cite-U-Like
- ↑ Mobley, D. L.; Graves, A. P.; Chodera, J. D.; McReynolds, A. C.; Shoichet, B. K.; Dill, K. A. Journal of Molecular Biology 2007, 371,. - Find at Cite-U-Like
- ↑ Jarzynski, C. Phys. Rev. E 2006, 73, 046105. - Find at Cite-U-Like
- ↑ 6.0 6.1 Mobley, D. L.; Chodera, J. D.; Dill, K. A. Journal of Chemical Physics 2006, 125, 084902. - Find at Cite-U-Like
- ↑ Mobley, D. L.; Chodera, J. D.; Dill, K. A. Journal of Chemical Theory and Computation 2007, 3. - Find at Cite-U-Like
- ↑ Chodera, J. D.; Swope, W. C.; Pitera, J. W.; Seok, C.; Dill, K. A. Journal of Chemical Theory and Computation 2007, 3. - Find at Cite-U-Like
Cite error: <ref>
tag with name "Fujitani2005" defined in <references>
is not used in prior text.