GROMACS 4.6 example: Ethanol solvation with expanded ensemble
Free Energy Fundamentals |
---|
Methods of Free Energy Simulations
|
Free Energy How-to's |
---|
Setting up the calculation with GROMACS
This tutorial assumes knowledge of [GROMACS]. Make sure you actually know how to use GROMACS first.
Setting up the calculations are very similar to standard free energy calculations; you can review a tutorial here for the standard way to perform the simulation. The topology and gro files are exactly the same; only the mdp files and the calls to the programs change.
In this particular case, we again set up nine intermediate states defined to perform the calculation of the absolute solvation free energy of ethanol (also known as the chemical potential at infinite dilution). However, you will only need to run one simulations, as the simulation will go back and forth between the different states.
So, start with the same (ethanol.gro) and (ethanol.top) and a new mdp file (expanded.mdp) files.
Running the calculation with GROMACS
Run grompp and mdrun as normal. Specifically:
grompp -f expanded.mdp -c ethanol.gro -p ethanol.top -o ethanol.tpr -maxwarn 4
There may be some warnings, and you'll need to override, hence -maxwarn.
For mdrun, we simply run:
mdrun -deffnm ethanol -dhdl ethanol.dhdl.xvg
The number of threads can be set as with any other simulation. If you don't set the dhdl file independently, it will be saved to ethanol.xvg,
Strictly, you will only need the dhdl files, but looking at the other output files can be useful to determine what is going on in the system if something goes wrong.
Analyze the calculation results
There are two ways to use the outputs to perform free energy calculations.
Expanded ensemble calculations build up the free energies as the simulations are performed. It does this by building up simulation weights as the simulation progresses, so each different thermodynamic state (lambda state) has a different weight. If it didn't have this weight, then the simulation would spend all the time in the lowest free energy states. When it visits each state equally, then the weights will be exactly equal to the free energies. We keep track of this in the log files. For example, after 50 ps, we have:
Step Time Lambda 25000 50.00000 0.00000 MC-lambda information Wang-Landau incrementor is: 1 N CoulL VdwL Count G(in kT) dG(in kT) 1 0.000 0.000 38 0.00000 4.00000 2 0.200 0.000 34 4.00000 4.00000 3 0.500 0.000 30 8.00000 0.00000 4 1.000 0.000 30 8.00000 3.00000 5 1.000 0.200 27 11.00000 3.00000 << 6 1.000 0.400 24 14.00000 1.00000 7 1.000 0.600 23 15.00000 2.00000 8 1.000 0.800 21 17.00000 -2.00000 9 1.000 1.000 23 15.00000 0.00000
the << indicate the current state, with the free energies indicated.
at 250 ps, we have:
Step Time Lambda 125000 250.00000 0.00000 MC-lambda information Wang-Landau incrementor is: 0.0625 N CoulL VdwL Count G(in kT) dG(in kT) 1 0.000 0.000 13 0.00000 4.75000 2 0.200 0.000 25 4.75000 3.87500 3 0.500 0.000 41 8.62500 2.56250 << 4 1.000 0.000 30 11.18750 2.12500 5 1.000 0.200 26 13.31250 1.62500 6 1.000 0.400 24 14.93750 0.93750 7 1.000 0.600 21 15.87500 -1.81250 8 1.000 0.800 6 14.06250 -6.37500 9 1.000 1.000 6 7.68750 0.00000
Note that the Wang-Landau incrementor (the amount added to the free energies of the current state) is now 0.0625
Much later, we reach:
Step Time Lambda 997000 1994.00000 0.00000 MC-lambda information Wang-Landau incrementor is: 0.0078125 N CoulL VdwL Count G(in kT) dG(in kT) 1 0.000 0.000 299 0.00000 4.53125 2 0.200 0.000 311 4.53125 3.93750 3 0.500 0.000 335 8.46875 1.96094 4 1.000 0.000 340 10.42969 1.84375 << 5 1.000 0.200 352 12.27344 1.64844 6 1.000 0.400 347 13.92188 0.64062 7 1.000 0.600 383 14.56250 -2.22656 8 1.000 0.800 450 12.33594 -5.54688 9 1.000 1.000 456 6.78906 0.00000
And a few steps after that, we have:
Step 998200: Weights have equilibrated, using criteria: wl-delta
So the weights now STOP equilibrating. From this point on, the simulation is an equilibrium simulation (it is not history-dependent). This is the point which we should START analyzing the dhdl file.
We can tell how close these free energies are by looking at how close the visits are to flat at the end of the simulation:
Step Time Lambda 20000000 40000.00000 0.00000 MC-lambda information N CoulL VdwL Count G(in kT) dG(in kT) 1 0.000 0.000 24232 0.00000 4.50000 2 0.200 0.000 21616 4.50000 3.98438 3 0.500 0.000 20533 8.48438 1.96875 4 1.000 0.000 21404 10.45312 1.84375 5 1.000 0.200 20978 12.29688 1.64844 6 1.000 0.400 24308 13.94531 0.64062 << 7 1.000 0.600 23005 14.58594 -2.22656 8 1.000 0.800 17166 12.35938 -5.54688 9 1.000 1.000 16776 6.81250 0.00000
We can see that the counts are within 2.42/1.68 = 1.44, and kBT*log(1.44) = 0.8 kJ/mol, which is not so bad since the weights were fixed after 2 ns of total simulation at all eight states.
We can now analyze the dhdl file, using pymbar with the dhdl output. Unlike last time, there is only one dhdl file, which has outputs of the form:
@ s0 legend "Thermodynamic state" @ s1 legend "Energy (kJ/mol)" @ s2 legend "dH/d\xl\f{} (coul -lambdas)" @ s3 legend "dH/d\xl\f{} (vdw-lambdas)" @ s4 legend "\xD\f{}H \xl\f{} (0,0)" @ s5 legend "\xD\f{}H \xl\f{} (0.2,0)" @ s6 legend "\xD\f{}H \xl\f{} (0.5,0)" @ s7 legend "\xD\f{}H \xl\f{} (1,0)" @ s8 legend "\xD\f{}H \xl\f{} (1,0.2)" @ s9 legend "\xD\f{}H \xl\f{} (1,0.4)" @ s10 legend "\xD\f{}H \xl\f{} (1,0.6)" @ s11 legend "\xD\f{}H \xl\f{} (1,0.8)" @ s12 legend "\xD\f{}H \xl\f{} (1,1)" @ s13 legend "pV (kJ/mol)" 0.0000 0 -29136.337 63.503958 19.491314 0.0000000 12.700792 31.751979 63.503958 67.421648 71.375522 75.362115 79.378551 83.422410 1.6650634 0.2000 0 -29341.744 95.496064 -258.31621 0.0000000 19.099213 47.748032 95.496064 85.312750 85.791887 87.838454 90.571444 93.705941 1.6496685 0.4000 0 -28839.430 81.996287 8.7445649 0.0000000 16.399257 40.998143 81.996287 84.283621 87.281402 90.697714 94.393505 98.291373 1.6952030
The first line after the time is the thermodynamic state. We can then distinguish which state each sample is from.
Again you will need to install [[1]]. alchemical-gromacs.py will be in examples directory.
The correct invocation is:
python alchemical-gromacs.py -f directory/prefix -t 300 -p 1 -n 2500 -v > outpufile
'-t' is temperature, '-p' is pressure, '-f' is (prefix of the files, including directory), and '-v' is verbose output (not required, but helpful to understand!)
It will use all files it finds with the given prefix, and it will assume they are numbered in order. In this case, there should only be one file. If there are multiple expanded ensemble files, it will analyze all the data together. We should omit the first 2 ns, since we know that the weights were still equilibrating. Read over the usage for standard calculations.
Understanding the analysis
Let's look at the output file:
This time it's only reading a single file:
The number of files read in for processing is: 1 output is verbose Reading metadata from solvation_direct/outputs/dhdls/ethanol_expanded.dhdl.xvg... Done reading metadata from solvation_direct/outputs/dhdls/ethanol_expanded.dhdl.xvg... Reading solvation_direct/outputs/dhdls/ethanol_expanded.dhdl.xvg...
All standard stuff saying what it's doing. The next part is important. pymbar determines how many of the samples are statistically uncorrelated, and only uses every [math]\displaystyle{ 1/2\tau }[/math] samples. Although it says correlation time, it's really the twice the correlation time, which is the length of time required between uncorrelated samples.
Now computing correlation times Correlation times: [ 5.00888473 5.00888473 5.00888473 5.00888473 5.00888473 5.00888473 5.00888473 5.00888473 5.00888473] number of uncorrelated samples: [3417 3049 2913 3000 2940 3365 3211 2468 2407]
The convergence behaves just the same. We get the same format of outputs, because we are still sampling from 8 states.
TI (kJ/mol) TI-CUBIC (kJ/mol) DEXP (kJ/mol) IEXP (kJ/mol) BAR (kJ/mol) MBAR (kJ/mol) 0: 11.478 +- 0.041 11.441 +- 0.043 11.405 +- 0.089 11.467 +- 0.091 11.408 +- 0.042 11.410 +- 0.041 1: 10.274 +- 0.056 10.008 +- 0.065 10.042 +- 0.132 10.007 +- 0.174 10.032 +- 0.060 10.048 +- 0.057 2: 5.463 +- 0.072 4.811 +- 0.104 5.049 +- 0.161 4.402 +- 0.226 4.736 +- 0.073 4.714 +- 0.059 3: 4.682 +- 0.031 4.705 +- 0.033 4.948 +- 0.064 4.613 +- 0.029 4.609 +- 0.026 4.643 +- 0.017 4: 3.687 +- 0.028 3.691 +- 0.032 3.720 +- 0.104 3.794 +- 0.033 3.762 +- 0.026 3.755 +- 0.021 5: 1.503 +- 0.042 2.059 +- 0.048 1.947 +- 0.150 1.794 +- 0.054 1.792 +- 0.039 1.787 +- 0.037 6: -5.330 +- 0.068 -5.232 +- 0.075 -4.464 +- 0.409 -4.670 +- 0.187 -4.760 +- 0.090 -4.735 +- 0.089 7: -13.103 +- 0.068 -13.580 +- 0.082 -13.769 +- 0.127 -12.219 +- 1.090 -13.774 +- 0.066 -13.770 +- 0.066 ------------------- ------------------- ------------------- ------------------- ------------------- ------------------- TOTAL: 18.653 +- 0.501 17.901 +- 0.539 18.878 +- 0.522 19.187 +- 1.148 17.803 +- 0.162 17.853 +- 0.191
We have almost identical results to the fixed lamda simulations, carried out with simulations at fixed states:
TI (kJ/mol) TI-CUBIC (kJ/mol) DEXP (kJ/mol) IEXP (kJ/mol) BAR (kJ/mol) MBAR (kJ/mol) ------------------- ------------------- ------------------- ------------------- ------------------- ------------------- TOTAL: 18.505 +- 0.186 17.721 +- 0.198 17.559 +- 0.305 18.104 +- 0.361 17.688 +- 0.059 17.684 +- 0.070
We can ALSO compare to the weights at the point they were fixed. The free energies will be -1 times the weights time [math]\displaystyle{ k_B T }[/math] = 2.494 ( at T = 300 giving
0: 11.22 1: 9.94 2: 4.91 3: 4.60 4: 4.11 5: 1.60 6: -5.55 7: -13.83 TOTAL: 16.99
Which are about 0.7 kJ/mol from the MBAR estimate, about the 0.8 kJ/mol we estimated earlier.
The uncertainties are lower with the fixed lambda states, which in this case is mostly because of different correlation times. For expanded ensembles, we take the correlation time in the total u_k(x), which behaves a bit differently that the correlation time in the derivative of the potential.
Hamilton replica exchange
There is no point in doing expanded ensemble with Hamiltonian replica exchange -- it's ALREADY visiting all of the states.