Difference between revisions of "Draft Standards for Post-Calculation Health Reports"

From AlchemistryWiki
(Created initial draft of Draft Standards for Post-Calculation Health Reports)
 
Revision as of 13:47, 22 December 2013

Purpose

This page discusses a draft standard practice for analyses to be run after an alchemical free energy calculation. These analyses provide a "health report" that indicates the quality of the calculation and identifies common modes of failure known to occur in calculations of this type. The hope is to begin a community discussion on this topic so that multiple implementations of these standard checks become available for different popular codes.

Division of simulation data into "equilibrated region" and "production analysis region"

While a Markov chain Monte Carlo simulation does not require that any initial "equilibration" region be identified and discarded to achieve asymptotically consistent results (see Charles Geyer's discussion "Burn-In is Unnecessary", http://users.stat.umn.edu/~geyer/mcmc/burn.html), the initial configuration and box volume for an alchemical free energy calculation are often so atypical of samples from the equilibrium distribution that failing to discard the initial transient relaxation to equilibrium will lead to highly biased free energy estimates. It is therefore recommended that all simulations identify a division between the initial "unequilibrated region", in which the system transiently relaxes to a more typical set of samples from equilibrium, and the "equilibrated production region" of data that is analyzed to produce estimates of the free energy. This division also allows the analyst to ensure that the "equilibrated region" is long enough to produce an estimate that is useful to the practitioner.

This should be done for each "leg" of the alchemical free energy simulation thermodynamic cycle.
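As a minimal sketch of this division, assuming the boundary index has already been chosen (by inspection or by an automated method), the data for one leg could be split as follows. The function name `split_equilibration`, the index `t0`, and the threshold `min_production` are hypothetical names introduced for illustration only; they are not part of any existing package or of this draft standard.

```python
import numpy as np

def split_equilibration(series, t0, min_production=100):
    """Split a time series at index t0 into unequilibrated and production parts.

    series         : 1D sequence of an observable (e.g. reduced potential or volume)
    t0             : first index of the equilibrated production region (assumed known)
    min_production : illustrative minimum number of production samples
    Returns (unequilibrated, production, long_enough).
    """
    series = np.asarray(series, dtype=float)
    if not 0 <= t0 < len(series):
        raise ValueError("t0 must lie within the time series")
    unequilibrated = series[:t0]   # discarded initial transient
    production = series[t0:]       # data analyzed for the free energy estimate
    long_enough = len(production) >= min_production
    return unequilibrated, production, long_enough
```

The same call would be repeated for each leg of the thermodynamic cycle, since each leg may need a different `t0`.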

Manual inspection of key properties

If properties of the simulation are to be inspected manually (which is not recommended, but may be useful for graphical illustration), the practitioner is advised to examine the following properties:

Box volume

Many preparation schemes for explicit solvent simulations replicate a small equilibrated water box (potentially generated with a different solvent or a different set of nonbonded simulation cutoffs or dispersion corrections), place the complex or ligand within this box, and then cull waters that overlap significantly with the solute molecules. Counterions are added through one of several heuristic schemes. This process can produce initial solvent densities that differ significantly from the average equilibrium density. Density deviations of even 0.1% are highly significant and can result in enormous artificial pressure deviations.
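To give a sense of scale for the 0.1% figure, the following sketch estimates the pressure deviation from the definition of the isothermal compressibility, κ_T = (1/ρ)(∂ρ/∂P)_T, so that ΔP ≈ (Δρ/ρ)/κ_T for small deviations. The value κ_T ≈ 4.5e-5 bar^-1 is an assumed approximate experimental value for liquid water near room temperature; simulation water models differ somewhat, so this is an order-of-magnitude illustration only. The function name is hypothetical.

```python
# Assumed approximate isothermal compressibility of liquid water near 298 K, in bar^-1.
KAPPA_T_WATER = 4.5e-5

def pressure_deviation_bar(relative_density_error, kappa_t=KAPPA_T_WATER):
    """Linear estimate of the pressure deviation (bar) for a small density error.

    Uses dP ~ (d rho / rho) / kappa_T, valid only for small deviations.
    """
    return relative_density_error / kappa_t

# A 0.1% density error corresponds to roughly 22 bar under this assumption.
deviation = pressure_deviation_bar(0.001)
```

Under this assumption, a density error of 0.1% corresponds to a pressure deviation of roughly 22 bar, which is large compared to a typical target pressure of about 1 bar.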

Additionally, a simulation containing an alchemically modified solute that is initiated from configurations typical of the unmodified solute will start with an incorrect solvent density.

We recommend that some quantity related to the box volume be computed over the course of the simulation and examined for transients.

Individual alchemical simulations can be inspected separately.
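For a single alchemical state, one simple way to look for a transient in the volume trace is to compare block averages along the trajectory. The sketch below is only one possible approach and is not prescribed by this draft; the function name `block_means` and the choice of ten blocks are assumptions made for illustration.

```python
import numpy as np

def block_means(volumes, n_blocks=10):
    """Average a volume trace over n_blocks consecutive, equal-length blocks.

    Trailing samples that do not fill a whole block are dropped.
    """
    volumes = np.asarray(volumes, dtype=float)
    block_length = len(volumes) // n_blocks
    if block_length == 0:
        raise ValueError("trace is shorter than the number of blocks")
    usable = block_length * n_blocks
    return volumes[:usable].reshape(n_blocks, block_length).mean(axis=1)
```

Early block means that differ systematically from the later ones suggest that the box volume was still relaxing at the start of the run.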

Replica-exchange simulations are more difficult to inspect: exchanges between replicas complicate per-state traces, while per-replica traces hop among many alchemical states with different typical volumes. It is therefore recommended that the average volume across all replicas be examined as a function of simulation time for initial transients.
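The replica-averaged volume can be sketched as below, assuming the volumes are stored as an array with one row per iteration and one column per replica (a hypothetical layout; actual storage formats depend on the simulation code). Because every alchemical state is occupied by exactly one replica at each iteration, the average over replicas is unchanged by how replicas are permuted among states, so exchanges do not affect this trace.

```python
import numpy as np

def replica_average_volume(volumes):
    """Return the volume averaged over replicas at each iteration.

    volumes : array of shape (n_iterations, n_replicas), assumed layout
    Returns an array of shape (n_iterations,).
    """
    volumes = np.asarray(volumes, dtype=float)
    if volumes.ndim != 2:
        raise ValueError("expected an array of shape (n_iterations, n_replicas)")
    return volumes.mean(axis=1)
```

The resulting one-dimensional trace can then be plotted or inspected in the same way as the volume trace of an individual simulation.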

TODO: Show examples

Automated equilibration determination

Computation of statistical inefficiencies

Verifying acceptable alchemical state overlap

Computation of statistical error

Determining whether the calculation has converged

Detection of infrequent events coupled to the alchemical free energy

Detection of artifacts in the simulation parameters

Cycle closure in relative free energy calculations