Implementation of adaptive integration method for free energy calculations in molecular systems

Christopher A. Mirabzadeh; F. Marty Ytreberg

doi:10.7717/peerj-cs.264

Implementation of adaptive integration method for free energy calculations in molecular systems

Christopher A. Mirabzadeh¹, F. Marty Ytreberg ^1,2,3

1Department of Physics, University of Idaho, Moscow, ID, United States of America

2Institute for Modeling Collaboration and Innovation, University of Idaho, Moscow, ID, United States of America

3Institute for Bioinformatics and Evolutionary Studies, University of Idaho, Moscow, ID, United States of America

DOI: 10.7717/peerj-cs.264

Published: 2020-03-16
Accepted: 2020-02-10
Received: 2019-09-04

Academic Editor: James Procter

Subject Areas: Computational Biology, Scientific Computing and Simulation
Keywords: Adaptive integration, Monte Carlo, Free energy, Solvation, Protein, Biomolecule

Copyright: © 2020 Mirabzadeh and Ytreberg
Licence: This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Computer Science) and either DOI or URL of the article must be cited.

Cite this article: Mirabzadeh CA, Ytreberg FM. 2020. Implementation of adaptive integration method for free energy calculations in molecular systems. PeerJ Computer Science 6:e264 https://doi.org/10.7717/peerj-cs.264

Abstract

Estimating free energy differences by computer simulation is useful for a wide variety of applications such as virtual screening for drug design and for understanding how amino acid mutations modify protein interactions. However, calculating free energy differences remains challenging and often requires extensive trial and error and very long simulation times in order to achieve converged results. Here, we present an implementation of the adaptive integration method (AIM). We tested our implementation on two molecular systems and compared results from AIM to those from a suite of other methods. The model systems tested here include calculating the solvation free energy of methane, and the free energy of mutating the peptide GAG to GVG. We show that AIM is more efficient than other tested methods for these systems, that is, AIM results converge to a higher level of accuracy and precision for a given simulation time.

Introduction

Measuring free energy differences using computer simulations can be computationally expensive, yet is useful for many different applications (see e.g., Steinbrecher & Labahn, 2010; Chodera et al., 2011; Mobley et al., 2012; Zhan et al., 2013; Miller et al., 2014; Petukh, Li & Alexov, 2015; Zhan & Ytreberg, 2015; Miller et al., 2016; Cournia, Allen & Sherman, 2017; Hossain et al., 2019; Aminpour, Montemagno & Tuszynski, 2019). Specific examples include determining protein conformational preferences, virtual screening for drug design or drug discovery (Steinbrecher & Labahn, 2010; Chodera et al., 2011; Zhan & Ytreberg, 2015; Śledź & Caflisch, 2018; Aminpour, Montemagno & Tuszynski, 2019; Zhang et al., 2019). Of specific relevance to the current study is that free energy calculations allow prediction of how amino acid mutations may modify protein-protein binding (Zhan et al., 2013; Miller et al., 2014; Petukh, Li & Alexov, 2015; Miller et al., 2016; Geng et al., 2019). We are particularly interested in developing and implementing efficient methods for calculating free energy differences and using them to understand how amino acid mutations modify protein–protein and protein-substrate interactions.

For this study, we have implemented the adaptive integration method (AIM) introduced by Fasnacht, Swendsen & Rosenberg (2004) for use in the GROMACS (Berendsen, Van der Spoel & Van Drunen, 1995) molecular dynamics simulation package, and have compared results to a suite of other methods. In previous studies, AIM was shown to provide high quality, precise and efficient estimates of binding free energies (Ytreberg, Swendsen & Zuckerman, 2006; Kaus, Arrar & McCammon, 2014; Kaus & Mccammon, 2015). We focus on alchemical free energy calculation where a system is transformed from one state to another via an unphysical pathway. The progress along the pathway that connects the two states is defined by the parameter λ. For this study, we are interested in two ways to explore λ space—both of which have the goal of obtaining equilibrium sampling of system configurations at discrete λ values along the pathway. The first way is to treat λ as a system variable that can be biased and sampled. A class of such methods, termed generalized ensemble, use an extended Hamiltonian to sample λ (Bitetti-Putzer, Yang & Karplus, 2003). For example, λ-dynamics (Kong & Brooks, 1996; Knight & Brooks, 2009; Knight & Brooks, 2011) treats λ as a dynamic particle in the system with fictitious mass. By contrast, AIM uses Metropolis Monte Carlo to sample λ space (Fasnacht, Swendsen & Rosenberg, 2004). Monte Carlo moves between values of λ are based on running estimates of free energy differences; this is a key distinction from other methods and allows AIM to continuously improve the estimate for the free energy during the simulation. The second way to explore λ space is to perform standard molecular dynamics or Monte Carlo simulations at fixed values of λ, typically discarding some simulation time for equilibration. The configurational ensembles at each value of λ can then be used to estimate free energy differences (see e.g., Lyubartsev, Førrisdahl & Laaksonen, 1996; Gonçalves & Stassen, 2004; Kofke, 2005; Shirts, Mobley & Chodera, 2007; Chodera & Shirts, 2011; Klimovich, Shirts & Mobley, 2015). In order to provide comparisons for AIM to fixed λ methods, we used the Python tool alchemical-analysis.py (Klimovich, Shirts & Mobley, 2015), part of the Pymbar package (Shirts & Chodera, 2008). This tool estimates the free energy using a suite of methods such as the Bennett acceptance ratio, multistate Bennett acceptance ratio, thermodynamic integration and exponential averaging.

For the current study, we chose two molecular systems that have well-documented results and are important starting points for biomolecular free energy studies. First, we calculated the solvation free energy of methane. Simulations were performed and the free energies were calculated using the fixed λ methods provided by alchemical-analysis. Simulations were also performed using AIM and results compared to fixed λ simulations. Using the lessons learned from the methane system, we then calculated the free energy of mutating the peptide GAG to GVG in water. For both systems, we found that AIM produces free energy estimates that are within statistical uncertainty of fixed λ methods but with greater efficiency (i.e., more accurate for a given simulation time).

Methods

All methods, code and simulation input files are available in the supplemental materials. For this study, we performed alchemical free energy simulations where the system is changed from a reference state to an end state by constructing a reaction pathway that modifies, adds or removes atoms. Such alchemical simulations are non-physical, i.e., the simulation does not represent what could occur naturally. Since the free energy is a state variable, it is independent of the path taken, and we may provide any path we wish. To perform these simulations the reaction pathway is divided into many separate, non-physical, λ states between a reference state and an end state. The λ states represent the progress along the reaction pathway as the reference state transforms into the end state.

Like most methods used to calculate free energies we start from the identity, (1) $F = U - T S,$ where U is the potential energy, T is the temperature and S is the entropy of the system. For free energy differences we generalize the formulation of the change in free energy by separating calculations into two, non-overlapping, thermodynamic end states, A and B, at constant system temperature T, (2) $Δ F \equiv Δ F_{A \to B} = F_{B} - F_{A} = Δ U - T Δ S .$ ΔF is the change in free energy, ΔU is the change in potential energy and ΔS is the change in entropy of the system. According to statistical mechanics, the free energy difference between the two end states, A and B, of the system is the log of the ratio of the configurational partition functions (see discussion in Chipot & Pohorille (2007)), (3) $Δ F = - k_{B} T ln \frac{Z [U_{B} (\vec{x})]}{Z [U_{A} (\vec{x})]} .$ Here, k_B is the Boltzmann constant and $Z [U (\vec{x})]$ is the configurational partition function for the energy states $U_{A} (\vec{x})$ and $U_{B} (\vec{x})$ , where $\vec{x}$ is the vector of configuration coordinates. The configurational partition function is given by (4) $Z [U (\vec{x})] = \int exp (- β U (\vec{x})) d \vec{x},$ and $β = \frac{1}{k_{B} T}$ .

Computationally, we calculate free energy differences between end states by performing molecular dynamics simulations along a reaction pathway of intermediate states, defined by λ, such that, 0 ≤ λ ≤ 1.

This pathway connects the two end states of the system. In the case of poor overlap, where the end states may be separated by a high energy barrier, |U_B − U_A| ≫ k_BT, this pathway mitigates the otherwise very slow convergence of free energy estimates (Shirts, Mobley & Chodera, 2007). Care should be taken when choosing intermediate states such that there is adequate overlap in the conformation space between the end states (Shirts, Mobley & Chodera, 2007; Klimovich, Shirts & Mobley, 2015). For our simulations the number of λ values and time per λ were chosen through extensive trial and error (more on this below).

The method of exponential averaging (DEXP, IEXP) (Zwanzig, 1954) starts from Eq. (3) above and then adding and subtracting $exp (- β U (\vec{x}))$ from the integral in the configurational partition function of the numerator we end up with the final relationship, (5) $Δ F_{i j} = - k_{B} T ln {〈 exp (- β Δ U_{i j} (\vec{x})) 〉}_{λ_{i}} .$ where ΔF_ij is the free energy between λ_i and λ_j and 〈⋅〉_{λ_i} represents an average of the equilibrium configuration for λ_i. Unlike some other methods, exponential averaging has an exact solution since it is only used to evaluate the difference between two states. However, it is the least efficient method and should not be used if difference in potential energies are much larger than k_BT (Shirts & Pande, 2005). In addition, exponential averaging can be noisy, biased and dependent on the tails of the distribution of λ states (Bruckner & Boresch, 2011; Shirts & Pande, 2005).

For thermodynamic integration (TI) we estimate the free energy by first looking at the derivative of Eq. (1) with respect to λ, (6) $\frac{\partial F}{\partial λ} = {〈\frac{\partial U}{\partial λ}〉}_{λ} .$ This differential equation, Eq. (6), can then be integrated to give, (7) $Δ F = \int_{λ = 0}^{1} {〈\frac{\partial U_{λ} (\vec{x})}{\partial λ}〉}_{λ} d λ$ where the 〈⋅〉_λ notation represents the ensemble average at a given intermediate state, λ. The free energy is estimated by numerically integrating Eq. (7) after running equilibrium simulations at each intermediate λ state. Since numerical integration is required, TI can be biased by the chosen method of integration. Some of that bias can be removed by using cubic-spline interpolation or more complex integration estimators(Shirts & Pande, 2005; Shyu & Ytreberg, 2009).

The Bennett (Bennett, 1976) and multistate (Shirts & Chodera, 2008) Bennett acceptance ratio (BAR and MBAR) methods are far more efficient than exponential averaging and are commonly used to avoid the shortcomings of other methods (Shirts & Pande, 2005; Ytreberg, Swendsen & Zuckerman, 2006). BAR and MBAR typically achieve the same statistical precision as TI with fewer λ states unless the integrand for TI is very smooth (Shirts & Mobley, 2013; Ytreberg, Swendsen & Zuckerman, 2006). The complete derivation can be found in Bennett’s paper (Bennett, 1976) but the premise is; for sufficiently large samples n_i of U_i and n_j of U_j, (8) $Δ F (i \to j) = k_{B} T ln \frac{{〈 f (Δ U_{i j} + C) 〉}_{j}}{{〈 f (Δ U_{j i} - C) 〉}_{i}} + C .$ C is a shift constant, (9) $C = k_{B} T ln \frac{n_{j}}{n_{i}},$ and f(x) is the Fermi function, (10) $f (x) = \frac{1}{1 + exp (β x)} .$ Equation (8) is the ratio of canonical averages of two different potentials U_i and U_j acting on the same configuration space meaning it requires information from two neighboring states. However, this limitation is not too much of a concern with a trivial coordinate transformation or when using dummy coordinates in alchemical simulations. MBAR, an extension of BAR, differs in that it takes data from more than two states hence the name “multistate”.

AIM is similar to TI in that numerical integration of Eq. (7) is performed; the key difference is how the averages ${〈\partial U ∕ \partial λ〉}_{λ}$ are obtained. AIM uses Metropolis Monte Carlo to move in λ space and ordinary running averages are calculated at each λ value. In AIM, a random move from λ_old to λ_new is accepted with probability (11) $min \{1, exp (- β (U_{n e w}) - U_{o l d}) + β (F_{n e w} - F_{o l d})\}$ where U_new − U_old is the difference in the potential energy for the old and new λ values. F_new − F_old is the estimated free energy difference based on the current running averages of ∂U∕∂λ.

Implementation

AIM was implemented in GROMACS as an expanded ensemble calculation. That is, the Hamiltonian must be calculated along with its derivative, and an expanded ensemble step must be performed for every dynamics step. In GROMACS, nstexpanded is the number of integration steps between attempted λ moves changing the system Hamiltonian in expanded ensemble simulations. This value must be a multiple of nstcalcenergy, the number of steps before calculating the system energy, but can be greater or less than nstdhdl, the number of steps before calculating ∂U∕∂λ (referred to as dHdλ in GROMACS documentation). For a detailed explanation of all technical terms see reference Abraham et al. (2016). The GROMACS package was further altered to print out the ∂U∕∂λ averages computed by AIM to the log file when AIM is used as the lmc-mover.

AIM requires the ∂U∕∂λ value from every dynamics step to be stored regardless of whether a move in λ space is attempted. Since ∂U∕∂λ is only calculated at each step where free energies are calculated, every nstdhdl step, we set nstexpanded = nstdhdl = nstcalcenergy = 1 for AIM simulations. This further implies that lmc-stats functions were not used during AIM simulations because those functions modify the Hamiltonian which is not needed for AIM.

For the implementation of AIM with GROMACS we follow the outline given in our previous study Ytreberg, Swendsen & Zuckerman (2006).

Start the simulation from an equilibrated configuration at λ=0 and perform one molecular dynamics step.
Randomly choose a trial move in λ space. For example, if our λ spacing is 0.05, a move from λ = 0.35 to 0.4 or 0.3 may be attempted but not to 0.45.
Calculate the difference in potential energy between the trial and current λ values.
Estimate the free energy difference between the trial and current λ values using the running averages of ∂U∕∂λ and the trapezoidal rule.
Accept λ trial with probability given in Eq. (11).
If the move is accepted then λ is updated to the trial value, otherwise the simulations stays at the current λ.
The running average of ∂U∕∂λ is updated.

Simulation details

The first system used here, methane in water, is detailed in systematic studies of force fields and the free energies of hydration of amino acid side chain analogs (Sun et al., 1992; Lyubartsev, Førrisdahl & Laaksonen, 1996; Chodera & Shirts, 2011; Paliwal & Shirts, 2011). For the GAG to GVG mutation the PMX (Gapsys et al., 2015) software package was used to construct the tri-peptide mutation. Using PMX, we generated the hybrid protein structure and topology for simulations of the chosen mutation, alanine to valine.

All simulations described in this paper were performed using the molecular dynamics package GROMACS 5.1.4. The simulations were carried out at 300 K and solvated in a dodecahedron box with TIP3P waters. The methane molecule was parameterized using the OPLS (Optimized Potential for Liquid Simulations) force field (Jorgensen, Maxwell & Tirado-Rives, 1996). The OPLS force field was chosen for this study because it is known to perform well on small molecules (Shirts et al., 2003). In future studies, we anticipate using AIM on protein systems where other force fields are more appropriate such as AMBER (Salomon-Ferrer, Case & Walker, 2013) and CHARMM (Mackerell, Banavali & Foloppe, 2001). Since all molecular dynamics force fields have similar form and number of parameters, it is expected that the performance of AIM would not depend on the force field chosen. For the GAG to GVG mutations, Na+ and Cl- ions were added to keep the simulation box neutral and reach a physiologically relevant 150 mM salt concentration.

For both systems, energy minimization was performed using steepest descent for 1,000 steps. The system was then equilibrated using simulated annealing for 1,000 ps to heat the system from 100 K to 300 K. For production simulations, electrostatic interactions were handled by Reaction field with a cut-off of 0.9 nm, Potential-shift-Verlet modifier and Verlet cutoff scheme. Van der Waals interactions were handled by twin range cutoffs with neighbor list cutoff of 1.15 nm and van der Waals cutoff of 0.9 nm. The bonds involving hydrogens were constrained with the Shake algorithm, allowing for a 2 fs time step. Long range dispersion corrections for energy and pressure were applied. For the free energy calculations, softcore scaling was used with parameters sc-power =1, sc-r-power =6 and sc-alpha =0.5. In addition, the van der Waals and Coulomb interactions were separately turned on or off as a function of λ. That is, one is held fixed as a function of λ while the other changes. For the methods processed with alchemical-analysis.py we ran fixed λ simulations. That is, an equal amount of simulation time was spent at each λ value. For AIM we ran expanded ensemble simulations where we alternate between taking molecular dynamics steps and attempting trial moves in λ space. That is, for AIM the amount of time spent at each λ value is determined by the algorithm.

In order to determine the best distribution of intermediate λ states we followed a simple strategy: (i) Conduct short simulations with a small set of intermediates. (ii) Generate a plot comparing slope values between AIM and fixed λ (iii) Determine the locations of curvature in the estimate of the free energy. (iv) Increase the density of intermediate states in locations of high curvature. (v) Repeat until all areas of high curvature have been well explored. We note that these steps should be performed for any method where the slope of the free energy is used to calculate free energies, including both TI and AIM. Similar steps would be performed for methods such as DEXP, IEXP, BAR and MBAR to ensure that energy differences are not too large between λ values.

Results

Methane

After conducting short simulations, generating plots to determine locations of high curvature and increasing λ density in those regions, we averaged eight trial simulations of 100 ps per λ for separate λ distributions (see Fig. 1). We found, by progressively increasing the λ density between λ = 0.5 and λ = 1.0, that a distribution of 31 λ values gave us a dense enough distribution to properly compare AIM to fixed λ methods for the methane simulations.

Figure 2 is a violin plot to visualize the distribution and probability densities over the eight trials for each method as a function of simulation time per value of λ. A violin plot combines a box plot and a density plot to show the shape of the distribution around the mean. The thick black bar in the center represents the interquartile range, the white dot is the median and the thin black line going vertically through the middle represents the upper and lower adjacent values. Reading a violin plot is similar to reading a density plot. The thicker parts represent high frequency values and the thinner parts represent low frequency values. The advantage of a violin plot over a box plot is that we are able to view the underlying distribution of the data.

In Fig. 2 at 100 ps per λ most methods have similar standard deviations of around 0.05 kcal/mol (visualized by the height of the violin shape in the figure), but the slower convergence of MBAR in this case leads to a larger standard deviation of around 0.16 kcal/mol. By 750 ps AIM has converged to a smaller standard deviation of around 0.02 kcal/mol compared to the other methods at around 0.05 kcal/mol.

GAG to GVG Mutation

For the GAG to GVG mutation we first tested a distribution of 41 λ values averaged over 8 trial simulations of 1 ps and 100 ps per λ; see Fig. 3. By reviewing the smoothness of the function we concluded that 41 λ values was sufficient. Figure 4 shows the convergence of the free energy for GAG to GVG over time for each method. At just 1 ps per λ value the AIM estimates are within less than 1.0 kcal/mol of the converged (1 ns per λ) result with a standard deviation of around 0.5 kcal/mol. All other methods are more than 2.0 kcal/mol from this converged result with larger standard deviations of around 1.0 kcal/mol. At 100 ps all estimates are less than 0.1 kcal/mol from the converged result, but AIM estimates have a standard deviation of around 0.1 kcal/mol compared to other methods at 0.4 kcal/mol. All methods have similarly converged at 1 ns per λ.

Figure 2: Violin plot showing methane solvation results for 31 λ values averaged over eight trials.
A violin plot combines a box plot and a density plot to visualize the distribution and probability density. The graphic shows all methods have similarly converged at 1 ns per λ. AIM and AIM-CUBIC converge earlier than other methods at 750 ps per λ.

Download full-size image

DOI: 10.7717/peerjcs.264/fig-2

Figure 3: Different simulation times for alanine to valine mutation free energy calculations.
Eight trial simulations of 41 λ values at 1 ps, 100 ps and 1 ns per λ. Note the smoothness of AIM versus fixed λ simulations. AIM requires less samples than fixed λ simulations to smooth the free energy function.

Download full-size image

DOI: 10.7717/peerjcs.264/fig-3

Figure 4: Violin plot showing alanine to valine mutation results for 41 λ values averaged over eight trials.
The graphic shows all methods have similarly converged at 1 ns per λ. AIM and AIM-CUBIC converge more rapidly than other methods and are mostly converged at 100 ps per λ.

Download full-size image

DOI: 10.7717/peerjcs.264/fig-4

Discussion

In the limit of infinite sampling, all rigorous methods (i.e., statistical mechanics-based methods), performed properly with the same force-field and parameters, will yield the same result within uncertainty. Often, it is of interest to define accuracy by comparing results to experimental data. However, given that the purpose of this study is to compare various computational methods, we define accuracy by comparing results to the value upon which all methods converge. For fixed λ simulations the sampling time is typically the same for each λ state. Sampling time must be increased whenever convergence has not been achieved. However, if bias is introduced by using an insufficient number of λ values in regions of high curvature, increased sampling leads to radical convergence problems (Shyu & Ytreberg, 2009; Steinbrecher & Labahn, 2010). If the curvature of the underlying free energy slope values is large, averaging over a state space that is not dense enough to fully describe the state function propagates this bias requiring significantly increased sampling time to achieve convergence. For TI, the bias will persist even for infinite sampling. In addition, increasing sampling time may not be realistic when dealing with limited computational resources. Paliwal & Shirts (2011) make a detailed argument to why convergence may not be possible for all systems due to hard limitations in computational resources.

In particular, both TI and AIM are calculating the same slope averages and should agree very well for simple systems and reasonably long simulation times. However, before convergence is reached, due to the fact that AIM spends more time in some regions, we should not expect the approximation of AIM to exactly match TI with similar sampling time until the number of λ values has been sufficiently increased in high curvature regions. Once we have properly chosen the λ values then reasonably long simulations will lead to highly similar results between these two methods.

AIM is able to estimate the free energy for the amino acid mutation within 1.0 kcal/mol at a total simulation time of only 41 ps. This is quite remarkable since ±1.0 kcal/mol is typically the range that is desired for mutation studies. Of course, further studies using a broader range of amino acid change are needed, but it suggests that AIM may be suitable for quick estimation. We believe the reason that AIM performs so well in such cases is due to the Monte Carlo sampling that allows AIM to more efficiently sample λ space compared to fixed λ simulations.

The reader may note that AIM violates detailed balance since the acceptance criterion contains the free energy estimates that are updated continuously. AIM does however obey detailed balance asymptotically. As simulation time increases, the average free energy differences between λ values reach an equilibrium and detailed balance is satisfied. Once this equilibrium is attained the algorithm will sample all λ values equally, that is, the histogram of the number of configurations will become flat as a function of λ.

Conclusion

In this report we have implemented the adaptive integration method (AIM) for calculating free energy differences in GROMACS and applied it to two molecular systems. We have shown agreement within statistical uncertainty between AIM and a suite of fixed λ methods for methane solvation and an GAG to GVG mutation. We have also shown that AIM is more efficient than the other tested methods. That is, for a given amount of simulation time, AIM has a higher level of accuracy and precision. We anticipate these findings will extend to larger, more complex systems. Future studies will be performed to test whether this is the case.

Further, we found that running longer simulations with too few intermediate λ states generated results that were inconsistent between methods. The density and sampling convergence of the λ states directly influences the agreement between all the tested methods. Since some states will contribute disproportionately to the variance of the estimate, we found that generating short test simulations of different λ densities before attempting longer simulations is advisable.

Supplemental Information

Source code, input file, and Jupyter notebook

Three directories: (1) gmxProgFiles contains source code to build AIM into GROMACS 5.1.4; (2) gmxInputFiles contains input files for GROMACS to reproduce our results; (3) AnalysisFiles contains Jupyter notebook files for analyzing results producing graphs.

DOI: 10.7717/peerj-cs.264/supp-1

Download

[1] Abraham M, Van der Spoel D, Lindahl E, Hess B. 2016. GROMACS user manual version 5.1.4. software

[2] Aminpour M, Montemagno C, Tuszynski JA. 2019. An overview of molecular modeling for drug discovery with specific illustrative examples of applications. Molecules 24(9) Article 1693

[3] Bennett CH. 1976. Efficient estimation of free energy differences from Monte Carlo Data. Journal of Computational Physics 22:245-268

[4] Berendsen HJ, Van der Spoel D, Van Drunen R. 1995. GROMACS: a message-passing parallel molecular dynamics implementation. Computer Physics Communications 91(1–3):43-56

[5] Bitetti-Putzer R, Yang W, Karplus M. 2003. Generalized ensembles serve to improve the convergence of free energy simulations. Chemical Physics Letters 377(5–6):633-641

[6] Bruckner S, Boresch S. 2011. Efficiency of alchemical free energy simulations. I. A practical comparison of the exponential formula, thermodynamic integration, and Bennett’s acceptance ratio method. Journal of Computational Chemistry 32(7):1303-1319

[7] Chipot C, Pohorille A. 2007. Free energy calculations. New York: Springer.

[8] Chodera JD, Mobley DL, Shirts MR, Dixon RW, Branson K, Pande VS. 2011. Alchemical free energy methods for drug discovery: progress and challenges. Current Opinion in Structural Biology 21(2):150-160

[9] Chodera JD, Shirts MR. 2011. Replica exchange and expanded ensemble simulations as Gibbs sampling: Simple improvements for enhanced mixing. Journal of Chemical Physics 135(19):0-15

[10] Cournia Z, Allen B, Sherman W. 2017. Relative binding free energy calculations in drug discovery: recent advances and practical considerations. Journal of Chemical Information and Modeling 57(12):2911-2937

[11] Fasnacht M, Swendsen RH, Rosenberg JM. 2004. Adaptive integration method for Monte Carlo simulations. Physical Review E 69(5):056704-1-056704-15

[12] Gapsys V, Michielssens S, Seeliger D, De Groot BL. 2015. pmx: automated protein structure and topology generation for alchemical perturbations. Journal of Computational Chemistry 36(5):348-354

[13] Geng C, Xue LC, Roel-Touris J, Bonvin AM. 2019. Finding the ΔΔG spot: are predictors of binding affinity changes upon mutations in protein–protein interactions ready for it? Wiley Interdisciplinary Reviews: Computational Molecular Science 9(5):e1410

[14] Gonçalves PFB, Stassen H. 2004. Calculation of the free energy of solvation from molecular dynamics simulations. Pure and Applied Chemistry 76(1):231-240

[15] Hossain S, Kabedev A, Parrow A, Bergström CA, Larsson P. 2019. Molecular simulation as a computational pharmaceutics tool to predict drug solubility, solubilization processes and partitioning. European Journal of Pharmaceutics and Biopharmaceutics 137(February):46-55

[16] Jorgensen WL, Maxwell DS, Tirado-Rives J. 1996. Development and testing of the OPLS all-atom force field on conformational energetics and properties of organic liquids. Journal of the American Chemical Society 118(45):11225-11236

[17] Kaus JW, Arrar M, McCammon JA. 2014. Accelerated adaptive integration method. Journal of Physical Chemistry B 118(19):5109-5118

[18] Kaus JW, Mccammon JA. 2015. Enhanced ligand sampling for relative protein-ligand binding free energy calculations. Journal of Physical Chemistry B 119(20):6190-6197

[19] Klimovich PV, Shirts MR, Mobley DL. 2015. Guidelines for the analysis of free energy calculations. Journal of Computer-Aided Molecular Design 29:397-411

[20] Knight JL, Brooks CL. 2009. λ—dynamics free energy simulation methods. Journal of Computational Chemistry 30(11):1692-1700

[21] Knight JL, Brooks CL. 2011. Multisite λ dynamics for simulated structure—activity relationship studies. Journal of Chemical Theory and Computation 7(9):2728-2739

[22] Kofke DA. 2005. Free energy methods in molecular simulation. Fluid Phase Equilibria 228–229:41-48

[23] Kong X, Brooks CL. 1996. Λ—dynamics: a new approach to free energy calculations. The Journal of Chemical Physics 105(6):2414-2423

[24] Lyubartsev AP, Førrisdahl OK, Laaksonen A. 1996. Calculations of solvation free energies by expanded ensemble method. In: 2nd international conference on natural gas hydrates. Toulouse (France), June 2–6, 1996, (6). 311-318

[25] Mackerell AD, Banavali N, Foloppe N. 2001. Development and current status of the CHARMM force field for nucleic acids. Biopolymers (Nucleic Acid Sciences 56:257-265

[26] Miller CR, Johnson EL, Burke AZ, Martin KP, Miura TA, Wichman HA, Brown CJ, Ytreberg FM. 2016. Initiating a watch list for Ebola virus antibody escape mutations. PeerJ 4:e1674

[27] Miller CR, Lee KH, Wichman HA, Ytreberg FM. 2014. Changing folding and binding stability in a viral coat protein: a comparison between substitutions accessible through mutation and those fixed by natural selection. PLOS ONE 9(11):e112988

[28] Mobley DL, Liu S, Cerutti DS, Swope WC, Rice JE. 2012. Alchemical prediction of hydration free energies for SAMPL. Journal of Computer-Aided Molecular Design 26(5):551-562

[29] Paliwal H, Shirts MR. 2011. A benchmark test set for alchemical free energy transformations and its use to quantify error in common free energy methods. Journal of Chemical Theory and Computation 7(12):4115-4134

[30] Petukh M, Li M, Alexov E. 2015. Predicting binding free energy change caused by point mutations with knowledge-modified MM/PBSA method. PLOS Computational Biology 11(7):1-23

[31] Salomon-Ferrer R, Case DA, Walker RC. 2013. An overview of the Amber biomolecular simulation package. Wiley Interdisciplinary Reviews: Computational Molecular Science 3(2):198-210

[32] Shirts MR, Chodera JD. 2008. Statistically optimal analysis of samples from multiple equilibrium states. Journal of Chemical Physics 129(12):124105-1-124105-10

[33] Shirts MR, Mobley DL. 2013. An introduction to best practices in free energy calculations. In: Monticelli L, Salonen E, eds. Biomolecular simulations, Methods in molecular biology (Methods and Protocols). Totowa: Humana Press. vol. 924:271-311

[34] Shirts MR, Mobley DL, Chodera JD. 2007. Chapter 4 alchemical free energy calculations: ready for prime time? Annual Reports in Computational Chemistry 3(February 2016):41-59

[35] Shirts MR, Pande VS. 2005. Comparison of efficiency and bias of free energies computed by exponential averaging, the Bennett acceptance ratio, and thermodynamic integration. Journal of Chemical Physics 122(14):144107-1-144107-16

[36] Shirts MR, Pitera JW, Swope WC, Pande VS. 2003. Extremely precise free energy calculations of amino acid side chain analogs: comparison of common molecular mechanics force fields for proteins. Journal of Chemical Physics 119(11):5740-5761

[37] Shyu C, Ytreberg FM. 2009. Reducing the bias and uncertainty of free energy estimates by using regression to fit thermodynamic integration data. Journal of Computational Chemistry 30(14):2297-2304

[38] Śledź P, Caflisch A. 2018. Protein structure-based drug design: from docking to molecular dynamics. Current Opinion in Structural Biology 48:93-102

[39] Steinbrecher T, Labahn A. 2010. Towards accurate free energy calculations in ligand protein-binding studies. Current Medicinal Chemistry 17(8):767-785

[40] Sun Y, Spellmeyer D, Pearlman DA, Kollman P. 1992. Simulation of the solvation free energies for methane, ethane, and propane and corresponding amino acid dipeptides: a critical test of the bond-PMF correction, a new set of hydrocarbon parameters, and the gas phase-water hydrophobicity scale. Journal of the American Chemical Society 114(17):6798-6801

[41] Ytreberg FM, Swendsen RH, Zuckerman DM. 2006. Comparison of free energy methods for molecular systems. Journal of Chemical Physics 125(18):1-11