Visitors   Views   Downloads

Effects of number of parallel runs and frequency of bias-strength replacement in generalized ensemble molecular dynamics simulations

View article
Effects of number of parallel runs and frequency of bias-strength replacement in generalized ensemble molecular dynamics simulations https://t.co/VvNjmWJy31 View all the @PeerJPhysChem peer-reviewed articles here: https://t.co/Hh4AY9qjDl https://t.co/eJK7bKrNno
PEER-REVIEWED

Introduction

In the past several decades, the molecular dynamics (MD) method has been widely applied to investigate the microscopic behavior of molecular systems. Although advances in high-performance computing technology have extended the timescale that is reachable by MD simulations (Salomon-Ferrer et al., 2013; Shaw et al., 2014; Abraham et al., 2015), there is still a large gap from experimental measurements. In particular, it is not straightforward to characterize the free-energy landscape (FEL) of a complex molecular system, because the characteristics of conformational ensembles obtained via canonical MD simulations largely depend on the initial conditions. To solve this problem, the generalized ensemble (GE) approach has been extensively developed and applied to the MD method. The GE approach enhances the conformational sampling using some tricks. First, in many GE methods, the conformational sampling can be performed with many parallel runs of simulations in a coupled or independent manner. For example, the replica-exchange MD (REMD) method (Sugita & Okamoto, 1999) involves performing many simulations of the same system, i.e., replicas, with different temperatures. The replicas with adjacent temperatures are coupled by exchanging their temperatures via Monte Carlo trials. On the other hand, the multicanonical MD (McMD) method (Nakajima, Nakamura & Kidera, 1997) can be performed by multiple independent runs, and a resultant ensemble is obtained by concatenating the trajectories of these runs (Ikebe et al., 2010). Second, the GE approach generates a non-Boltzmann distribution by applying bias potential, e.g., heating/cooling in the entire system or a part of the system, scaling the potential energies, and applying spring potentials for parts of system. These biases enhance the conformational changes of molecules and avoid trapping the molecular system at local minima in the FEL. During a simulation, the strength of the bias is frequently replaced, and the system alternates between different bias conditions. After simulations, a canonical ensemble can be obtained by reweighting each snapshot in the sampled conformational ensemble (Souaille & Roux, 2001; Shirts & Chodera, 2008).

For using these two features, users must adjust some settings. First, the number of runs is an adjustable parameter. In the case of the REMD method, using a larger number of replicas allows wider overlaps of the energy distributions between adjacent replicas and results in a higher acceptance probability. However, increasing the number of runs proportionally increases the computational costs. Users must choose the optimal balance between the number of runs and the length of each run according to the available computational resources. Previously, Ikebe et al. (2010) reported that an increase in the number of independent runs of McMD yields efficient exploration of a wider area of the conformational space. However, the balance between the number of runs and the length of each run has not been discussed. Second, the frequency of the bias-strength replacement is also adjustable. In the REMD method, the frequency of replica-exchange trials must be specified by users. Other methods using a continuous bias strength, e.g., McMD and adaptive umbrella sampling (AUS), can control the frequency of bias-strength replacement by using the virtual-system coupling scheme (Higo, Umezawa & Nakamura, 2013; Higo et al., 2015), as described later. It is reported that the frequency of the bias-strength replacement affects the resultant ensembles for the REMD method (Periole & Mark, 2007; Sindhikara, Meng & Roitberg, 2008; Rosta & Hummer, 2009; Sindhikara, Emerson & Roitberg, 2010; Jani, Sonavane & Joshi, 2014; Iwai, Kasahara & Takahashi, 2018). Although higher frequencies enhance the traversals in the temperature space, they are suspected as an origin of artifacts. Although the effects of these features have been examined, these studies were mainly based on simple model peptides with helix–coil transitions. The effects of the features for more practical cases, e.g., a protein folding–unfolding transition, are not fully understood. More importantly, the relationship between these effects and the complexities of molecular systems, e.g., the degree of freedom and ruggedness of the FEL, are expected to be revealed.

In this study, we aim to elucidate the effects of the number of runs and the bias-replacing frequency for the GE method on the resultant conformational ensembles of molecular models including a foldable mini-protein and disordered model peptides. We utilized the trivial-trajectory parallelized virtual-system coupled McMD (TTP-V-McMD) method (Ikebe et al., 2010; Higo, Umezawa & Nakamura, 2013), which is a variant of the McMD method, for simulating the three molecular models with an explicit solvent: (i) Trp-cage, (ii) 8-residue poly-glutamic acid (PGA8), and (iii) 20-residue poly-glutamic acid (PGA20). We chose these models as test cases to examine the simulation conditions because they are sufficiently small for elucidating their conformational ensembles within a practical computational time in addition to the fact that their structural properties have been well studied thus far. Trp-cage, which is a mini-protein consisting of 20 amino acids, has been widely studied as a prominent model of protein folding (Ahmed et al., 2005; Hudáky et al., 2007; Hałabis et al., 2012). Poly-glutamic acids have been used as model peptides to characterize the conformational properties of polypeptides (Clarke et al., 1999; Kimura et al., 2002; Finke et al., 2007; Donten & Hamm, 2013; Ogasawara et al., 2018). We analyzed their FELs under various parameter settings to provide a guide for adjusting these parameters for the GE methods. The questions to be answered are as follows: (1) Which condition is more efficient: many short simulations or a small number of long-term simulations? (2) Which is better: frequent or less frequent replacement of the bias strength? Moreover, we discuss the relationship between the relaxation of the energy and that of the protein conformation. While the McMD method enhances the relaxation in the energy space, it is not guaranteed to enhance the relaxation in the conformational space. We analyzed these two relaxation processes using the McMD trajectories calculated with the various settings.

Materials and Methods

We calculated the FELs of the three explicitly solvated molecular models: Trp-cage, PGA8, and PGA20, by using the TTP-V-McMD method with various settings. The theory of McMD, virtual-system coupled McMD (V-McMD), and trivial-trajectory parallelization (TTP) is briefly presented in the following subsections. Then, the simulation protocol applied in this study is described.

Multicanonical MD

The McMD method efficiently explores the conformational space of a molecular system, by applying a biasing energy term. The Hamiltonian H of the system is H=K+Emc, where K and Emc denote the kinetic energy and multicanonical energy, respectively. Emc is defined as follows: Emc=E+RTlnPc(E,T), where E is the potential energy, and the second term corresponds to the bias potential. R is the gas constant, and Pc(E, T) denotes the canonical distribution at the temperature T: Pc(E,T)=n(E)exp(ERT)Zc(T), where n(E) denotes the density of states, and Zc(T) is the partition function of the canonical distribution at the temperature T. With this definition, the potential energy distribution of an ensemble obtained from the McMD, or the multicanonical distribution, Pmc(E), becomes uniform: Pmc(E)=n(E)exp(EmcRT)Zmc(T)            =n(E)exp(ERT)Pc(E,T)  Zmc(T)=Zc(T)Zmc(T)=const.

As a result, the McMD method performs a random walk in the potential energy space and generates a uniform distribution of potential energy in a resultant ensemble. After a multicanonical ensemble is obtained, a canonical ensemble at any temperature in a sampled energy range can be generated by reweighting the probability of existence of each snapshot.

Equations (3) and (4) include an analytical form of n(E), which is usually unknown a priori. Therefore, n(E) is approximated as a parametric function, e.g., the polynomial function, and its parameters are estimated by iterations of McMD simulations to make Pmc(E) near-uniform. In the ith iteration, the bias potential is calculated using Eq. (2) with the canonical distribution obtained from the (i–1)th iteration, i.e., Pci1(E,T). As the result of the ith iteration, we obtain Pmci(E). Pci(E,T) can be calculated as Pci(E,T)=Pmci(E)Pci1(E,T).

See Higo et al. (2012) for details.

Virtual-system coupled McMD

Virtual-system coupled McMD (V-McMD) introduces a virtual system, which interacts with the molecular system, and the multicanonical ensemble is calculated for the entire system consisting of these two subsystems (Higo, Umezawa & Nakamura, 2013). In practice, this method can be roughly interpreted as a combination of McMD and the simulated tempering method. The simulated tempering method replaces the system temperature with the Metropolis criterion and performs a canonical simulation until the next replacement trial. On the other hand, in V-McMD, the potential energy space is split into several regions (Fig. S1), and the molecular system is trapped in one of these regions. With a certain time interval (tVST), the molecular system replaces the region to be trapped. The state variable governing which region traps the molecular system is called the “virtual state,” and the system defined by the virtual state is called the “virtual system.” The energy range of each virtual state is defined to be overlapped with the adjacent virtual states. When the molecular system has the potential energy Ek in the overlapped region of the ith and (i + 1)th virtual states, the state transition between these two virtual states can occur. Because this transition does not change the atomic coordinates or potential energy, the Metropolis criterion of this state transition is always satisfied. The time interval of virtual-state transitions (tVST) should be determined arbitrarily by users. See Higo, Umezawa & Nakamura (2013) for details.

Trivial-trajectory parallelization

According to the theory of TTP, trajectories of multiple independent McMD runs with the same molecular system and different initial conditions can be treated as a single trajectory of an McMD simulation by concatenating the trajectories in an arbitrary order. This theory requires the condition that the initial coordinates of each run are sampled from the multicanonical distribution. Because the initial coordinates of production runs can be obtained from the near-uniform potential energy distribution generated by iterative simulations, it is expected that this condition holds. The McMD method with the TTP theory, which is called the TTP–McMD method, can be considered as a hybrid Monte Carlo sampler, by assuming that the system transitions from the last snapshot of the ith run (the microscopic state mil) to the first snapshot of the jth run (the microscopic state mjf) via a Monte Carlo step (Fig. S2). See Ikebe et al. (2010) for details.

Simulation protocol

We studied the three molecular systems, which are Trp-cage, PGA8, and PGA20 in an explicitly solvated cubic periodic boundary cell, by using the TTP-V-McMD method. Random coil structures of Trp-cage, PGA8, and PGA20 were constructed using the Modeller software (Webb & Sali, 2016) without any template. The termini of the PGAs were capped with acetyl and N-methyl groups, and the termini of the Trp-cage were not capped. Each of these molecular models was plased into a cubic box filled by water molecules; the number of water molecules were 5,097, 2,879, and 3,800 for Trp-cage, PGA8, and PGA20, respectively. In addition, a Cl ion was added to the Trp-cage model to cancel the net charge of the system. The net charge of the PGA models was zero because all the Glu residues were protonated.

The system was relaxed by using the GROMACS software (Pronk et al., 2013). Energy minimizations were successively applied using the steepest descent and conjugate gradient methods. Then, an MD simulation under a constant-pressure ensemble with the Berendsen barostat was performed for 1 ns. In the first half of the simulation, gradual heating from 10 to 300 K was applied. In the simulation, the positions of the heavy atoms of the Trp-cage were restrained, the bond lengths were not constrained, and the integration time step (Δt) was 0.5 fs. Subsequently, an additional constant-pressure relaxation was applied for 1 ns with Δt = 2.0 fs, and the covalent bonds to hydrogen atoms were constrained using the LINCS method (Hess et al., 1997; Hess, 2008). The final configuration of each model was used for the TTP-V-McMD simulations. The cell dimensions of these configurations were 54.0378, 44.6116, and 49.1174 Å for Trp-cage, PGA8, and PGA20, respectively.

For each model, the following steps were performed using our MD simulation program, which is called myPresto/omegagene and is tailored for GE simulations (Kasahara et al., 2016). The protein conformation was randomized with a constant-temperature simulation at 800 K. By using 30 snapshots taken from a trajectory with an interval of 300 ps, 30 independent runs were simulated with a gradual decrease in the temperature from 629 to 296 K to estimate the density of states. Successively, the TTP-V-McMD simulations were iteratively performed while updating the estimation of the density of states (Higo, Umezawa & Nakamura, 2013). A total of 84 production runs were performed (Nrun = 84) for each of three different interval times for the virtual-state transitions (tVST) meaning the interval times for bias-potential replacement: tVST = 0.002, tVST = 0.2, and tVST = 20 ps. The simulation length of each run (trun) was 50 ns except for the Trp-cage model with tVST = 0.2 ps, trun, of which the simulation length was 200 ns. In total, 50.4 μs of trajectories were simulated as production runs. The virtual system was divided into seven states that cover the energy range corresponding to the canonical distribution from 296 to 629 K. The velocity scaling method (Berendsen et al., 1984) was applied to maintain the system temperature.

For the potential parameters, the AMBER ff99SB-ILDN force field (Lindorff-Larsen et al., 2010), the ion parameter presented by Joung & Cheatham (2008), and the TIP3P water model (Jorgensen et al., 1983) were applied. The electrostatic potential was calculated using the zero-multipole summation method, which is a non-Ewald scheme (Fukuda, 2013; Fukuda, Kamiya & Nakamura, 2014). The zero-dipole condition with the damping factor α = 0 was used (Fukuda, Yonezawa & Nakamura, 2011; Fukuda et al., 2012).

Comparison of simulated ensembles among different settings

On the basis of the trajectories obtained from of the TTP-V-McMD production runs, the effects of the simulation conditions, i.e., the time interval for bias-strength replacement (tVST), the number of independent runs (Nrun), and the simulation time of each run (trun), were assessed.

For the Trp-cage model, we analyzed the FEL for various conformational ensembles on the basis of the two structural parameters: the root-mean-square deviation (RMSD) of Cα atoms from the native conformation (PDB ID: 1L2Y, model 1), which is denoted as RMSDnative, and the radius of gyration (Rg). The FEL is visualized as the map of the potential of mean forces (PMF) on the plane defined by these two parameters. We defined the reference ensemble as the ensemble calculated for the conditions of trun = 200 ns, Nrun = 84, and tVST = 0.2 ps, because it is expected to have the highest reliability owing to its abundance of samples (it comprises a total of 16.8 μs of simulations). The FELs analyzed in various conditions were compared with the reference FEL with regard to the Pearson correlation coefficient of the PMF (PCCPMF). To calculate the PCCPMF for a pair of FELs, bins without samples in one of the two FELs were ignored. In addition, the probability of the existence of the native conformations in each ensemble (Pnative) was measured to characterize each ensemble. The native conformations are defined as the conformations with RMSDnative ≤ 2.0 Å.

For the PGA models, the FELs were analyzed using principal component analysis (PCA) based on the Cα–Cα distances (28 and 190 dimensions for PGA8 and PGA20, respectively). The PCAs were performed using aggregations of trajectories with all the three tVST conditions for each model. For each tVST condition, the ensemble calculated from the entire trajectory (trun = 50 ns and Nrun = 84) was considered as the reference ensemble. The FELs were compared with regard to PCCPMF, similar to the Trp-cage case.

To assess the effects of Nrun and trun, PCCPMF (and Pnative for the Trp-cage model) were calculated for ensembles with subsets of the reference trajectories. Because there are many possibilities to pick Nrun runs from 84 runs and trun-length trajectories from the entire set of trajectories, we analyzed them by using the bootstrap approach. We constructed an ensemble by taking a random sampling of Nrun runs from 84 runs with replacement and repeated it 100 times. The statistics over the 100 ensembles were analyzed via simulation with this Nrun setting. This process was repeatedly performed for Nrun = 1, 2, …, 84. For the case of trun, the trajectories were split into 5-ns bins, and an ensemble was constructed by taking a random sampling of trun/5 bins with replacement. We confirmed that the results of the bootstrap analyses with 100 and 200 samples were consistent (Fig. S3).

The sampling efficiency was measured in terms of the frequency of traversals between low- and high-energy regions, which were defined as the ranges (Emin, Elow) and (Ehigh, Emax), respectively. Here, Emin and Emax denote the minimum and maximum potential energies in all the trajectories, respectively, and Elow and Ehigh are defined as follows.

Elow=Emin+X(EmaxEmin) Ehigh=EmaxX(EmaxEmin)

X is an arbitrary parameter in the range of 0–0.5. We assessed X = 0.2 and 0.3. The traversal frequency FtraversE was calculated as the number of traversals between the two energy regions during 1.0 ns. The traversal frequencies of RMSDnative and Rg (FtraversRMSD and FtraversRg, respectively) were also analyzed.

Results

In the first part of this section, the results of the Trp-cage model are described. The reference ensemble is characterized in the subsection, “FEL of folding–unfolding equilibrium of Trp-cage.” Next, the effects of the parameters trun, Nrun, and their balances are discussed in the successive subsections: “Effects of simulation time for each run,” “Effects of number of independent runs,” and “Balance between simulation time and number of runs,” respectively. Subsequently, the effects of the other parameter tVST are discussed in the subsection, “Effects of frequency of bias-strength replacement.” Additionally, the following subsection, “Effects of system complexity” describes the results of the PGA8 and PGA20 models and compares them with those of the Trp-cage model.

FEL of folding–unfolding equilibrium of Trp-cage

For the Trp-cage model, we performed 34 iterations of TTP-V-McMD simulations while updating the estimation of the density of states, n(E), and obtained a near-uniform energy distribution (Fig. S4). On the basis of this estimation, we performed production runs with Nrun = 84, trun = 200 ns, and tVST = 0.2 ps. This is called the reference setting hereinafter. The resultant canonical ensemble reweighted at 300 K is referred to as the reference ensemble.

The FEL of the reference ensemble projected on the RMSDnativeRg plane is shown in Fig. 1A. The most stable basin corresponds to the native structure consisting of an α-helix at the N-terminus, a 310-helix at the middle, and a loop region at the C-terminus (the secondary structural elements were recognized by using the DSSP software) (Kabsch & Sander, 1983). For example, the RMSDnative of one of the most probable structures in this basin was 0.994 Å (Fig. 1B). The energy barrier (approximately 3.3 kcal/mol) was observed at RMSDnative ≈ 3 Å in a low-Rg regime. Around this barrier, the 310-helix at the middle of the peptide chain was partially deformed; this deformation can be the first step of an unfolding process (Fig. 1C). The details of the unfolding pathway are not discussed in this paper. The second basin was widely spread around RMSDnative = 4–7 Å and Rg = 7–9 Å. This corresponds to the unfolded state, and examples of the unfolded structures taken from this basin are shown in Figs. 1E and 1F. The difference in the PMF between the bottoms of the first and second stable basins was 1.014 kcal/mol, and the population of the native conformations (Pnative) was 22.37%. The landscape is qualitatively similar to that calculated using the REMD method reported by another group (Day, Paschek & Garcia, 2010). Our TTP-V-McMD simulation successfully identified the native structure as the most stable basin in the energy landscape, by using the reference setting.

FEL calculated by the reference ensemble of Trp-cage.

Figure 1: FEL calculated by the reference ensemble of Trp-cage.

(A) FEL based on the RMSDnativeRg plane. The color gradation indicates the PMF. (B) Snapshot taken from the first basin (blue) superimposed on the experimentally solved structure (gray; PDB ID: 1Y2L, model 1). (C) Examples of snapshots near the first basin. The structures colored dark cyan and light cyan correspond to the positions C1 and C2 marked in (A), respectively. (D and E) Examples of unfolded structures in the second basin. The positions of each snapshot on the FEL are marked in (A).

Effects of simulation time for each run

The FELs of the Trp-cage model were drawn for a variety of trun values under the condition of Nrun = 84 and compared with the reference FEL. The FELs based on the trajectories of 0–25, 0–50, and 0–100 ns are shown in Figs. 2A2C, respectively. The overall geometries of these FELs were qualitatively similar to the reference (Fig. 1A); their PCCPMF values were 0.936, 0.936, and 0.994, respectively. The bootstrap statistics of PCCPMF for each trun value are summarized in Fig. 2D. For trun = 200 ns, the bootstrap average and the standard deviation (SD) of PCCPMF were 0.990 and 0.007, respectively. Even in the worst case among 100 randomly generated ensembles with trun = 200 ns, PCCPMF was 0.966. From this condition, a decrease in trun yielded a slow decay of PCCPMF, and PCCPMF reached 0.9 at trun ≈ 30 ns, which corresponds to 15% of the samples in the reference. Further decreasing trun resulted in a steep decrease of PCCPMF. Along with the decrease of the bootstrap average of PCCPMF, the SD was increased. This means that an insufficient simulation time causes a loss of robustness of the results.

FELs of Trp-cage for various trun values with Nrun = 84.

Figure 2: FELs of Trp-cage for various trun values with Nrun = 84.

(A–C) FELs based on the trajectories of 0–25 ns (A), 0–50 ns (B), and 0–100 ns (C). (D) Bootstrap statistics of PCCPMF. The solid line is the average, the dashed lines are the sum of the average and SD and the subtraction of the SD from average. The dotted lines indicate the maximum and minimum values among 100 randomly generated ensembles in each condition. (E) Statistics of Pnative shown in the same scheme as (D).

In contrast to the fact that the PCCPMF decays in a shorter trun than the reference, the balance between the folded and unfolded states (Pnative) was almost constant regardless of trun (Fig. 2E); the bootstrap average of Pnative for trun = 5–200 ns was in the range of 0.220 to 0.225. However, the SD of Pnative was reduced with the increase of trun; the SDs of Pnative at trun = 5, 50, and 200 ns were 0.07, 0.02, and 0.008, respectively. The loss of robustness due to the insufficiency of the simulation time is demonstrated in terms of not only the similarity of the entire FEL but also the stability of the native fold.

Effects of number of independent runs

As in the previous subsection, the effects of the reduction of Nrun on the FELs were assessed under the condition of trun = 200 ns. Examples of FELs with Nrun = 10, 21, and 42 are shown in Figs. 3A3C, respectively; the PCCPMF values were 0.637, 0.939, and 0.993, respectively. Although the positions and wideness of the basins were similar to the reference, the FELs with a smaller Nrun were smoother and lacked small bumps on the landscapes. The bootstrap statistics of PCCPMF for various Nrun values (Fig. 3D) were similar to those for trun (Fig. 2D). The quantity of the samples required for PCCPMF ≥ 0.9 was approximately one-fourth of the reference (Nrun ≈ 21). The average (and the SD) of PCCPMF at Nrun = 21 and 42 were 0.906 (0.07) and 0.956 (0.04), respectively. Larger Nrun values are needed to obtain robust results.

Characteristics of FELs of Trp-cage for smaller Nrun values with trun = 200 ns.

Figure 3: Characteristics of FELs of Trp-cage for smaller Nrun values with trun = 200 ns.

(A–C) Examples of FELs with Nrun = 10 (A), Nrun = 21 (B), and Nrun = 42 (C). (D and E) Bootstrap statistics of PCCPMF (D) and Pnative (E). See also the legend of Fig. 2.

Regarding Pnative, the influence of the reduction of Nrun (Fig. 3E) differed from that of the reduction of trun (Fig. 2E). A lower Nrun resulted in the underestimation of the population of native conformations. Pnative reached at plateau for Nrun ≥ 21. A certain number of runs was needed to obtain robust results, and trun = 200 ns was too short to reach equilibrium with a small number of trajectories for this system.

Balance between simulation time and number of runs

The evaluation for various trun values with Nrun = 84 runs (Fig. 2) and that for various Nrun values with trun = 200 ns (Fig. 3) indicate that reducing trun produced better results than reducing Nrun if the cumulative simulation time (Nrun × trun) was the same. Figure 4 shows direct comparisons of the results, indicating that high-Nrun conditions resulted in a higher PCCPMF and more similar values of Pnative to the reference, with lower SDs, than long-trun conditions. In particular, the qualitative difference between the two strategies is shown by the mean of Pnative. Reducing Nrun resulted in the significant underestimation of the fold stability, but reducing trun did not.

Direct comparison between reducing trun with the fixed-Nrun condition (blue line) and reducing Nrun with the fixed-trun condition (red line) for the Trp-cage model.

Figure 4: Direct comparison between reducing trun with the fixed-Nrun condition (blue line) and reducing Nrun with the fixed-trun condition (red line) for the Trp-cage model.

The vertical axes indicate (A) the average of PCCPMF, (B) the SD of PCCPMF, (C) the average of Pnative, and (D) the SD of Pnative. The horizontal axis indicates the accumulated simulation length (Nrun × trun).

In addition, we performed bootstrap analyses for all the combinations of 40-trun settings (5, 10, 15, …, 200 ns) and 21-Nrun settings (4, 8, 12, …, 84). The average values of PCCPMF and Pnative in all the conditions are presented in Fig. 5 and Fig. S5. The PCCPMF was proportional to log(Nrun × trun). While the trend of Pnative is ambiguous, the use of a larger number of samples resulted in a higher Pnative. In the case where only small amount of data was available, a lower ratio of trun/Nrun (purple plots in Fig. 5) yielded better results.

Distribution of (A) the average of PCCPMF and (B) Pnative along the logarithm of the accumulated simulation length for various combinations of Nrun and trun extracted from the trajectories of the Trp-cage model.

Figure 5: Distribution of (A) the average of PCCPMF and (B) Pnative along the logarithm of the accumulated simulation length for various combinations of Nrun and trun extracted from the trajectories of the Trp-cage model.

The color of each plot indicates the log-ratio of Nrun to trun compared with the reference. The definition is log[(trun/Nrun)/(200/84)]. This value becomes greater than 0 for conditions with a higher ratio of trun to Nrun than the reference.

Effects of frequency of bias-strength replacement

The parameter tVST controls the frequency of the bias-strength switching in the TTP-V-McMD method. We investigated the effects of this parameter by comparing the TTP-V-McMD simulations of the Trp-cage model under the three conditions—tVST = 0.002, 0.2, and 20 ps—with trun = 50 ns for Nrun = 84.

Table 1 summarizes the frequency of traversals between high- and low-potential energy regimes (FtrvE), as defined in Eqs. (6) and (7) with X = 0.3 and 0.2, as well as the frequency of traversals between RMSDnative (FtrvRMSD) and Rg (FtrvRg). The simulations with a shorter tVST resulted in faster traversals in the potential energy space, indicating that with a shorter tVST, a wider potential energy range can be sampled in a shorter time. However, faster traversal in the potential energy space does not ensure faster transition of the protein conformation. For both X = 0.2 and 0.3, although the setting of tVST = 0.002 ps yielded the highest FtrvE, this condition did not yield a higher FtrvRMSD and FtrvRg compared to when a longer tVST was used. This result indicates that the relaxation of the conformation requires a longer time than that of the potential energy. If a strong bias is applied and the system takes a high-potential energy state, it can return to low-energy states before conformational changes. Therefore, a moderate speed for traversals in the potential energy space is ideal for efficient conformational sampling. In the case of X = 0.2, tVST = 0.2 ps exhibited the most frequent conformational changes.

Table 1:
Average values (and the standard errors) of the traversal frequencies over 84 runs for the Trp-cage model.
tVST (ps) 0.002 0.2 20
X 0.3
FtrvE (ns−1) 1.63 (0.06) 1.45 (0.04) 1.02 (0.04)
FtrvRMSD (ns−1) 0.057 (0.006) 0.060 (0.004) 0.062 (0.006)
FtrvRg (ns−1) 0.040 (0.005) 0.044 (0.003) 0.050 (0.006)
X 0.2
FtrvE (ns−1) 0.70 (0.03) 0.62 (0.02) 0.46 (0.02)
FtrvRMSD (ns−1) 0.005 (0.001) 0.011 (0.001) 0.006 (0.002)
FtrvRg (ns−1) 0.008 (0.002) 0.012 (0.001) 0.007 (0.002)
DOI: 10.7717/peerj-pchem.4/table-1

In addition, the resultant ensembles were slightly affected by the setting of tVST. We analyzed Pnative for ensembles of various trun values with Nrun = 84 using the bootstrap method (Fig. S6). The results for all three tVST values showed similar trends, i.e., near-constant average values and the gradual decay of the SD with the increase of trun. While tVST = 0.2 ps showed a smaller Pnative than the other two tVST settings, the difference was smaller than the SD. On the other hand, higher SD values were observed in the following order: tVST = 0.2 > 20 > 0.002 ps. This is consistent with the order of FtrvRMSD and FtrvRg (Table 1). The result indicates that more frequent traversals between high- and low-RMSDnative conformations make it possible to explore a wider region of the conformational space; thus, the population of the native conformation decreases, and the SD increases.

Regarding the PCCPMF with the reference setting (tVST = 0.2 ps, Nrun = 84, and trun = 200 ns), the average PCCPMF values at trun = 50 ns differed among different settings of tVST (Fig. S6). This indicates that changing tVST yields subtle differences in the resultant ensemble. Regarding the balance between trun and Nrun, the trends were similar for all the settings of tVST (Fig. S7).

Effects of system complexity: comparison with the PGA models

We performed the same analyses for the molecular models of PGA8 and PGA20. In contrast to Trp-cage, these peptides did not exhibit a particular fold. The FELs of both PGA8 and PGA20 were unimodal distributions, the basins of which consisted of a variety of collapsed conformations (Fig. 6 for tVST = 0.2 ps). The ensembles included short secondary structural elements but they were unstable. Although the shape of the small bumps in the basins differed depending on the simulation conditions, the overall geometries of the FELs were similar (Fig. S8 for tVST = 0.002 ps and 20 ps).

FELs calculated by ensembles of (A) PGA8 and (B) PGA20 using trun = 50 ns and Nrun = 84 with tVST = 0.2 ps.

Figure 6: FELs calculated by ensembles of (A) PGA8 and (B) PGA20 using trun = 50 ns and Nrun = 84 with tVST = 0.2 ps.

(C–E) Examples of snapshots in the basins marked in (A) and (B).

Regarding the balance between trun and Nrun, Fig. 7 shows the bootstrap averages of PCCPMF between the ensemble calculated by the full-length trajectory (trun = 50 ns and Nrun = 84) and those calculated by the reduced trajectories. No clear differences were found between the PCCPMF curve with reduced trun and that with reduced Nrun for both the PGA8 and PGA20 (Fig. 7 for tVST = 0.2 ps; Fig. S9 for the other conditions). A small number of long simulations exhibited the similar efficiency as that of many short simulations. In addition, no significant differences were found between the results of PGA8 and PGA20. It is noteworthy that the conformational space of PGA20 is considerably wider than that of PGA8 and similar to that of Trp-cage, because the conformational space volume of polypeptides is determined primarily by their length. Therefore, we concluded that the effects of balance between trun and Nrun are determined by the complexity of the FEL (e.g., existence of the free-energy barrier) rather than the conformational space volume. An increase in the number of runs is more effective for a system with more complex FEL.

Direct comparisons between reducing trun with fixed-Nrun (blue line) and reducing Nrun with fixed-trun (red line) for (A and B) the PGA8 and (C and D) PGA20 systems.

Figure 7: Direct comparisons between reducing trun with fixed-Nrun (blue line) and reducing Nrun with fixed-trun (red line) for (A and B) the PGA8 and (C and D) PGA20 systems.

The vertical axes indicate (A and C) the bootstrap average of PCCPMF, and (B and D) the SD of PCCPMF. The horizontal axis indicates the accumulated simulation length (Nrun × trun). The results of tVST = 0.2 ps are presented. See also Fig. S9 for the other tVST conditions.

For the PGA models, the frequencies of traversals in the potential energy and Rg spaces (FtrvE and FtrvRg, respectively) are summarized in Table 2. Both the PGA8 and PGA20 models yielded similar trends as the Trp-cage model (Table 1). Although frequent replacements of bias-potential strength enhanced the traversals in the potential energy space, they did not enhance the conformational changes in terms of Rg. This implies that the conformational changes are much slower than the potential energy changes even if there is no free-energy barrier exists in the landscape. However, in contrast to the Trp-cage case, the drawback of the frequent replacement, that is, slow traversals in the conformational space, is unclear in the case of PGA20.

Table 2:
Average values (and standard errors) of the traversal frequencies over 84 runs for PGA models.
Model PGA8 PGA20
tVST (ps) 0.002 0.2 20 0.002 0.2 20
X 0.3 0.3
FtrvE (ns−1) 2.91 (0.03) 2.70 (0.03) 0.86 (0.02) 1.05 (0.04) 0.99 (0.03) 0.46 (0.02)
FtrvRg (ns−1) 0.44 (0.02) 0.47 (0.02) 0.47 (0.02) 0.045 (0.005) 0.044 (0.005) 0.049 (0.006)
X 0.2 0.2
FtrvE (ns−1) 1.58 (0.02) 1.53 (0.02) 0.50 (0.01) 0.4 (0.02) 0.41 (0.01) 0.22 (0.01)
FtrvRg (ns−1) 0.117 (0.007) 0.147 (0.007) 0.146 (0.007) 0.011 (0.002) 0.013 (0.002) 0.015 (0.003)
DOI: 10.7717/peerj-pchem.4/table-2

Discussion

We examined the performance of the TTP-V-McMD method with regard to two adjustable settings: (i) the balance between the number of runs (Nrun) and the simulation length in each run (trun) and (ii) the frequency of the bias-strength replacement (tVST). For (i), in the Trp-cage model including folding–unfolding transition, we found higher robustness of the conditions with a larger number of runs than with longer simulations. In particular, the probability of the existence of native conformations in a resultant ensemble (Pnative) was more sensitive to the condition than the entire similarity of the FEL. However, for the cases of PGAs without free-energy barrier in their FELs, no significant effect was shown in the balance between the number and length of simulations. Therefore, the optimal balance depended on the molecular system, and the complexity of the FELs was a key feature rather than the degree of freedom. In any case, increasing the number of simulations was recommended because it is not worse than increasing the length of each run. This result is practically useful because performing many parallel runs is easier than executing a single long simulation. While the result obtained here encourages performing many short runs, it requires the condition that the initial structures of the production runs are uniformly sampled from the multicanonical ensemble, whose energy distribution is uniform (Ikebe et al., 2010). As our protocol samples the initial structures of the production runs from the previous iteration of the McMD, it is expected that this condition holds.

For (ii), whereas higher frequencies of bias-strength replacement enhance the sampling of a wider range of potential energy, they do not ensure the enhancement of the sampling of a wider range of conformations. This means that the enhancement of the sampling along one variable (e.g., potential energy or temperature) does not ensure the enhancement of the sampling along another variable (e.g., RMSD and Rg). Rapid traversals in the energy space sometimes obtain a high energy and return to the low-energy regime before conformational change regardless of the existence of free-energy barrier in the FEL. A moderate frequency is needed to maximize the performance for any molecular system.

The findings that we obtained by applying the TTP-V-McMD method provide insight into the characteristics of many other GE methods. (i) For GE methods that involve running independent parallel simulations, e.g., simulated tempering and AUS, performing many short runs can be more effective than increasing the length of each run. For GE methods where parallel runs are coupled, e.g., the REMD method, this conclusion should not be simply applied. For example, an increase in the number of runs in the REMD method resulted in larger overlaps of the distributions of neighboring replicas, along with an increase in the acceptance probability of replica-exchange trials. Our previous evaluation for the REMD method showed that a larger number of replicas does not always yield better results (Iwai, Kasahara & Takahashi, 2018). The number of runs should be adjusted independently from the coupling condition of the parallel runs; for example, the number of runs in a REMD simulation could be increased by performing two or more independent REMD simulations with different initial conformations, and aggregating the resultant ensembles. (ii) Regarding the frequency of the bias-strength replacement, the conclusion that the interval should be long enough to relax the conformation could be transferred to other GE methods. For the REMD methods, the effects of the interval for replica-exchange trials have been reported; while some studies recommended shorter intervals (Sindhikara, Meng & Roitberg, 2008; Sindhikara, Emerson & Roitberg, 2010), the side effects of highly frequent exchange trials have also been reported and were consistent to our result (Periole & Mark, 2007; Iwai, Kasahara & Takahashi, 2018).

Conclusions

In this study, the effects of two parameters of GE methods, i.e., (i) the balance between the number of runs (Nrun) and the simulation length in each run (trun) and (ii) the frequency of the bias-strength switching (tVST) were extensively examined with using all-atom explicit-solvent models of three polypeptides that are a foldable mini-protein and disordered peptides. We suggest a guide to adjust the setting for general molecular systems and GE methods. (i) Increasing in the number of runs should be prioritized rather than increasing the simulation length. (ii) Highly frequent replacements of the bias potentials may yield side effects because conformational relaxation was slower than potential energy relaxation. The time interval for replacement should be longer than or equal to 0.2 ps.

Supplemental Information

Schematic illustration of the V-McMD method.

(A) A virtual state is defined as a range of potential energies. The neighboring virtual states are overlapped in the potential energy space. This example shows the virtual system is consisting of five virtual states from v1 to v5. (B) A time course of potential energy in a V-McMD simulation. (C) A time course of virtual state in the same trajectory as (B). The virtual-state transitions in a certain time interval, tVST.

DOI: 10.7717/peerj-pchem.4/supp-1

Schematic illustration of the TTP method.

(A) Several V-McMD simulations are performed with the same molecular system but different initial conditions. They generate different trajectories. (B) The TTP method concatenates these trajectories in an arbitrarily order. The jumps at the concatenated points (triangles) are considered to be Monte Carlo steps.

DOI: 10.7717/peerj-pchem.4/supp-2

Effects of the number of bootstrap samples.

The bootstrap averages of PCCPMF using 100 (red) and 200 (cyan) samples for various trun are shown. The red curve is the same as the average shown in Fig. 2D.

DOI: 10.7717/peerj-pchem.4/supp-3

Potential energy distribution of the multi-canonical ensemble of the Trp-cage model.

The vertical axis indicates the natural logarithmic probability of snapshots with corresponding potential energies. The horizontal axis is the potential energy. The solid line is the multicanonical distribution generated by the reference ensemble, and the dashed lines are the canonical distribution at 300 K (left) and 600 K (right).

DOI: 10.7717/peerj-pchem.4/supp-4

Results of bootstrap analyses for combinations of various Nrun and trun.

The color gradations indicate (A) the average of PCCPMF, (B) the SD of PCCPMF, (C) the average of Pnative, and (D) the SD of Pnative.

DOI: 10.7717/peerj-pchem.4/supp-5

The effects of tVST on (A) the average of PCCPMF, (B) the SD of PCCPMF, (C) the average of Pnative, and (D) the SD of Pnative for the Trp-cage model.

The horizontal axis indicates trun. The blue, red, and orange lines indicate tVST = 0.002, 0.2, and 20 ps, respectively.

DOI: 10.7717/peerj-pchem.4/supp-6

Summary of the bootstrap averages (A, B, C, G, H, I) and SDs (D, E, F, J, K, L) of PCCPMF (A, B, C, D, E, F) and Pnative (G, H, I, J, K, L) for all the conditions of the Trp-cage model.

The horizontal and vertical axes indicate Nrun and trun, respectively.

DOI: 10.7717/peerj-pchem.4/supp-7

FELs calculated by ensembles of PGAs using trun = 50 ns and Nrun = 84.

(A) The FEL of PGA8 with tVST = 0.002 ps. (B) The FEL of PGA8 with tVST = 20 ps. (C) The FEL of PGA20 with tVST = 0.002 ps. (D) The FEL of PGA20 with tVST = 20 ps. See also Fig. 6 in the main text for tVST = 0.2 ps.

DOI: 10.7717/peerj-pchem.4/supp-8

Direct comparisons between reducing trun with the fixed-Nrun condition (blue line) and reducing Nrun with the fixed-trun condition (red line) for (A, B, C, D) the PGA8 and (E, F, G, H) PGA20.

The vertical axes indicate (A, C, E, G) the average of PCCPMF, (B, D, F, H) the SD of PCCPMF. The horizontal axis indicates the accumulated simulation length (Nrun × trun). The tVST were (A, B, E, F) 0.002 ps and (C, D, G, H) 20 ps. See also Fig. 7 in the main text for the other tVST condition.

DOI: 10.7717/peerj-pchem.4/supp-9