Abstract
Molecular simulation models have provided immense, often general, insight into the complex behavior of protein systems. Even for very detailed, e.g., atomistic, models, the generation of quantitatively accurate dynamical properties remains a formidable challenge. This lack of consistent dynamics largely hinders simulation models, especially coarse-grained models, from providing structural interpretations for kinetic experiments. In this work, we investigate to what extent a simple, native-biased coarse-grained model is capable of reproducing the dynamics, or more specifically kinetic properties, of an underlying helix-coil transition. In order to accurately represent the underlying structural ensemble, this model employs near-atomistic steric interactions. We investigate structure-kinetic relationships in order to identify the structural constraints necessary to guarantee consistent kinetics, given the implicit restrictions enforced by the physics of the model. From each set of simulations, we construct a Markov state model to efficiently and systematically assess the system kinetics. We demonstrate that the accurate representation of the structural ensemble results in a rather large restriction in the topology of the resulting kinetic networks. As a consequence, relatively weak structural constraints are needed in order to nearly quantitatively reproduce many kinetic properties of the underlying system. Not surprisingly, while structural constraints determine the kinetics at a single temperature, fixing the structure over multiple temperatures determines the thermodynamics, i.e., cooperativity, of the transition. Remarkably, topological features of the kinetic networks characterizing the degree of randomness of pathways traveling between the helix and coil states at a single reference temperature dictate the relative cooperativity of the resulting transition.
I. INTRODUCTION
In recent years, a significant overlap in the timescales accessible to experiments and computer simulations probing the dynamics of individual protein molecules has been achieved.1,2 This overlap provides an array of unexplored opportunities for a closer interplay between these traditionally disparate approaches, with potential for improving methodologies as well as recovering deeper insight from specific applications. While atomically-detailed molecular dynamics simulations have emerged as the gold standard for the theoretical investigation of microscopic and chemically-specific driving forces for particular, e.g., fast-folding, processes, there remain significant challenges for utilizing these detailed simulations as a general tool for interpreting both ensemble-averaged and kinetic experimental observations. While many studies have aimed at assessing and improving the static properties of all-atom (AA) force fields,3,4 as well as extending these models to accurately describe a wider range of systems, e.g., intrinsically disordered proteins,5 investigations to systematically assess kinetic properties have only recently begun.6 Additionally, despite ever-increasing computational power, an overwhelming gap remains for exclusively applying such detailed models to investigate the large range of timescales (from ps to hours), thermodynamic or chemical conditions (e.g., denaturant concentrations), as well as system variations (e.g., sequence mutations) commonly explored in experimental studies. Indeed, much simpler models, e.g., ensemble construction methods7,8 or analytically-solvable polymer models,9,10 are routinely employed to build structural interpretations of experimental observations.
In between these simple models and AA models, coarse-grained (CG) simulation models for proteins may retain specific structural and chemical properties of the underlying system, while removing extraneous details to save on the computational cost of exploring configuration space. In particular, simple physics-based11 and native-biased12 models have largely shaped the foundation for our current interpretation of the protein folding process.13 Although these models provide a qualitative picture of the dynamical processes sampled by protein systems, it is typically not expected that such simplistic models can quantitatively reproduce the corresponding kinetic properties. Recent advancements in CG methodologies allow increased chemical detail and accuracy, while retaining the sampling efficiency necessary to address problems intractable for AA models.14–16 A number of advanced CG protein models have been developed,17–20 with varying representations, interactions, and parametrization philosophies, and have been successfully employed to investigate a range of specific folding and aggregation processes.
The beneficial speed-up of CG models, attained through a combination of reduced molecular friction and softer interaction potentials, comes at the cost of obscuring the connection to the true dynamical properties of the underlying system. Unlike many polymer systems, where a single dynamical rescaling factor is capable of recovering the correct dynamics of the underlying system,21,22 the rescaling of CG dynamical processes may generally be a complex function of the system’s configuration. This lost connection to the true system dynamics represents a severe limitation for CG models, which may not only prevent quantitative prediction of kinetic properties, but may also lead to qualitatively misleading or incorrect interpretations generated from CG simulations. For example, Habibi et al.23 recently demonstrated that three different CG models provide disparate descriptions of the forced unfolding process of a 110 residue peptide, despite the capability of all models to fold the peptide to the proper native structure. Furthermore, the fidelity of the folding process, with respect to an AA reference simulation, did not correlate with the complexity of the model. Perhaps more troubling, Rudzinski et al.24 recently demonstrated that even for a tripeptide of alanines, the hierarchy of local dynamical processes along the peptide backbone, i.e., transitions between metastable states on the Ramachandran plot, is qualitatively misrepresented by a transferable, physics-based CG model.
Although it is possible in principle to rescue the dynamics of a CG model via a generalized Langevin formalism, this approach offers a daunting computational and conceptual challenge for complex biological molecules that give rise to hierarchical dynamics, i.e., kinetic processes coupled over various timescales. As an alternative, it may be possible to rescue the kinetic properties generated from a CG simulation via an a posteriori reweighting of simulation data in order to reproduce a set of provided reference observables, e.g., experimental measurements.24,25 However, the extent to which experimental data can correct for deficiencies in simulation models through a reweighting is obviously limited. Thus, a detailed understanding of the link between given target observables, e.g., structural properties, and the accurate reproduction of relative kinetic quantities is required.
In this work we investigate relationships between structural and kinetic properties in the helix-coil transition networks generated by various simulation models. As a fundamental process in protein folding, investigation of α-helical secondary-structure formation represents a logical step in understanding the kinetic properties generated by CG protein models. As a primary tool for investigation, we employ a relatively simple, native-biased model, whose parameters may be easily tuned to elucidate the connection between specific interactions, e.g., hydrophobic attraction between side chains, and emergent properties of the resulting kinetic network. However, in order to sample a physically-realistic ensemble of structures (likely a necessary condition for the consistent reproduction of kinetic properties), this model retains a near-atomistic description of backbone steric interactions. As a complement to this model, we additionally consider a transferable, physics-based CG peptide model, which also retains near-atomistic backbone resolution but employs phenomenological interactions in order to reproduce the balance of α/β structural propensities in a number of distinct peptide systems.19,26
As a model system, we examine an uncapped heptapeptide of alanines, for which reference AA simulations display a complex, disordered ensemble of pathways between the helix and coil states, resulting in low helical content and strong end effects—a challenge for the simple native-biased model. We demonstrate, using standard Lifson-Roig models as a structure-characterization tool, that matching of the Lifson-Roig parameters guarantees the reproduction of certain relative timescales, given a weak structural constraint on the system as well as the implicit constraints provided by the underlying physics of the model. We then characterize the temperature dependence of the various CG models, further validating the structure-kinetic relationship, and demonstrate that constraining simple structural properties at a single temperature is not enough to dictate the cooperativity of the resulting transition. However, an investigation of the detailed properties of the transition network, afforded by Markov state models built directly from the CG simulations, reveals connections between the network topology and the thermodynamic properties of the model. Remarkably, the average conditional path entropy—a graph measure quantifying the degree of randomness for trajectories traveling between the helix and coil states through particular intermediate states—determined at a single reference temperature, provides sufficient information for determining the relative cooperativity of each CG model.
II. METHODS
A. Coarse-grained (CG) models
1. Hybrid Gō (Hy-Gō)
To investigate the relationship between structural and kinetic properties generated from CG simulation models, we wanted a simple model (i.e., with few, physically-motivated parameters) which sampled all the relevant conformational states. For this purpose, we proposed a flavored-Gō model with three interactions: (i) a native contact (nc) attraction, Unc, employed between pairs of Cα atoms which lie within a certain distance in the native structure, i.e., the α-helix, of the peptide; (ii) a desolvation barrier (db) interaction, Udb, also employed between native contacts; and (iii) a hydrophobic (hp) attraction, Uhp, employed between all pairs of Cβ atoms of the amino acid side chains. Unc and Uhp are necessary for sampling the correct ensemble of conformations (i.e., helix, coil, and swollen structures), while Udb assists in providing cooperativity in the resulting transitions. We employed the same functional forms as in many previous studies,27 with a tunable prefactor for each interaction as described below.
These three Gō-type interactions may be viewed as the minimum resolution necessary to roughly sample the correct conformational ensemble for short peptides. However, because we are interested in characterizing kinetic properties, it is important that we reproduce the underlying conformational ensemble more precisely. More specifically, we wanted to ensure that the model (i) does not sample conformations that are sterically forbidden in the all-atom (AA) model and (ii) samples the relevant regions of the Ramachandran plot, while retaining barriers between metastable states. Thus, in addition to the simple interactions described above, the model also partially employs a standard AA force field, AMBER99sb,28 to model both the steric interactions between all nonhydrogen atoms and also the specific local conformational preferences along the chain. More specifically, the bond, angle, dihedral, and 1-4 interactions of the AA force field are employed without adjustment. To incorporate generic steric effects, without including specific attractive interactions, we constructed Weeks-Chandler-Andersen potentials (i.e., purely repulsive potentials) directly from the Lennard-Jones parameters of each pair of atom types in the AA model. For simplicity of implementation, we then fit each of these potentials to an r⁻¹² functional form. The van der Waals attractions and all electrostatic interactions in the AA force field were not included and water molecules were not explicitly represented.
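The construction of the purely repulsive steric terms can be illustrated with a minimal sketch. It assumes the standard Weeks-Chandler-Andersen form (the Lennard-Jones potential shifted up by its well depth and truncated at its minimum) and a linear least-squares fit of a single coefficient C in C/r^12. The function names, the fit window (`r_lo_frac`), and the number of sample points are hypothetical choices for illustration; the fitting procedure actually used for the model is not specified in the text.

```python
def wca(r, epsilon, sigma):
    """Weeks-Chandler-Andersen potential: Lennard-Jones shifted up by epsilon
    and truncated at its minimum, so that it is purely repulsive."""
    r_min = 2.0 ** (1.0 / 6.0) * sigma
    if r >= r_min:
        return 0.0
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6) + epsilon


def fit_r12(epsilon, sigma, r_lo_frac=0.8, n_points=50):
    """Least-squares coefficient C such that C / r**12 approximates the WCA
    potential on [r_lo_frac * sigma, r_min).

    The fit window and the number of sample points are illustrative
    assumptions, not values taken from the paper.
    """
    r_min = 2.0 ** (1.0 / 6.0) * sigma
    r_lo = r_lo_frac * sigma
    num = 0.0
    den = 0.0
    for k in range(n_points):
        r = r_lo + (r_min - r_lo) * k / n_points
        basis = r ** -12          # single basis function of the fit
        num += wca(r, epsilon, sigma) * basis
        den += basis * basis
    return num / den              # closed-form solution of the 1-parameter fit
```

Because the fit has a single linear parameter, the closed-form ratio above is the exact least-squares solution on the chosen grid; a different window would shift C but not the functional form.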
The total interaction potential for the model may be written: Utot = ϵnc Unc + ϵdb Udb + ϵhp Uhp + ϵbb Ubb, where the backbone (bb) interaction includes both the intramolecular and steric interactions determined from the AA force field. The first three coefficients represent the only free parameters of the model, while ϵbb = 1. The relative impact of the backbone interactions may then be characterized by Etot ≡ ϵnc + ϵdb + ϵhp.
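As a small worked illustration of this parametrization, the sketch below evaluates the total potential from the four energy terms and computes Etot together with the relative weights ϵi/Etot. The function and argument names are hypothetical, and the individual energy terms are passed in as plain numbers rather than computed from coordinates.

```python
def total_potential(u_nc, u_db, u_hp, u_bb, eps_nc, eps_db, eps_hp, eps_bb=1.0):
    """U_tot = eps_nc*U_nc + eps_db*U_db + eps_hp*U_hp + eps_bb*U_bb.
    eps_bb defaults to 1, as stated for the model."""
    return eps_nc * u_nc + eps_db * u_db + eps_hp * u_hp + eps_bb * u_bb


def go_weights(eps_nc, eps_db, eps_hp):
    """Return E_tot and the relative weights (eps_i / E_tot) of the three
    Go-type interactions; the weights sum to one."""
    e_tot = eps_nc + eps_db + eps_hp
    if e_tot <= 0.0:
        raise ValueError("E_tot must be positive to define relative weights")
    return e_tot, (eps_nc / e_tot, eps_db / e_tot, eps_hp / e_tot)
```

Since ϵbb is fixed, a larger Etot means the Gō-type terms dominate the backbone terms, while the three normalized weights describe only the balance among the Gō-type terms.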
2. PLUM
As an alternative CG model, we considered the PLUM model, which also describes the protein backbone with near-atomistic resolution, while representing each amino acid side chain with a single CG site, within an implicit water environment.19 In PLUM, the parametrization of local interactions (e.g., sterics) aimed at a qualitative description of Ramachandran maps, while longer-range interactions (hydrogen bond and hydrophobic) aimed at reproducing the folding of a three-helix bundle, without explicit bias toward the native structure.19 The model is transferable in that it aims at describing the essential features of a variety of amino-acid sequences, rather than an accurate reproduction of any specific one. After parametrization, it was demonstrated that the PLUM model folds several helical peptides,19,29–32 stabilizes β-sheet structures,19,26,33–35 and is useful for probing the conformational variability of intrinsically disordered proteins.36
B. Simulation Details
1. AA
We employed AA simulations of an uncapped heptamer of alanine residues (Ala7), previously published by Stock and coworkers.37 In short, these simulations employed the GROMOS 45A3 force field38 along with the SPC water model39 to obtain an 800 ns trajectory, sampled every 1 ps, for Ala7 in the zwitterionic state at 300 K.
2. Hy-Gō
CG molecular dynamics simulations of Ala7 with the Hy-Gō model were performed with the Gromacs 4.5.3 simulation suite40 in the constant NVT ensemble, while employing the stochastic dynamics algorithm with a friction coefficient and a time step of . For each model, ten independent simulations were performed with starting conformations varying from full helix to full coil. Each simulation was performed for 100,000 , recording the system every 0.5 . The CG unit of time, , can be determined from the fundamental units of length, mass, and energy of the simulation model, but does not provide any meaningful description of the dynamical processes generated by the model. In this case, ps.
3. PLUM
CG simulations of Ala7 with the PLUM force field19,31,41 were run using the ESPResSo simulation package.42 For details of the force field, implementation, and simulation parameters, see Bereau and Deserno.19 A single canonical simulation at temperature kBT = 1.0 ε was performed for 200,000 with a time step of 0.01 , recording the system every 0.5 , where ps. Temperature control was ensured by means of a Langevin thermostat with friction coefficient .
C. Markov State Models
Markov state models (MSMs) are kinetic models that characterize the probability of transitioning between a finite set of microstates, chosen to represent the configuration space of the underlying system.43–45 The transition probabilities may be estimated directly from molecular dynamics trajectories via a Bayesian scheme to enforce relevant physical constraints. In the present work, MSMs are built from the Ala7 helix-coil trajectories generated from each simulation model. To determine the MSM microstate representation, principal component analysis was performed on the configurational space characterized by the ϕ/ψ dihedral angles of each residue along the peptide backbone. A density clustering algorithm was then applied to the five “most significant” dimensions in order to determine the number and placement of microstates, following previous investigation of the AA trajectory.46 This procedure yields 32 states in all cases, corresponding to the enumeration of all possible helix/coil state combinations for each of the 5 peptide bonds. MSM construction and analysis were performed using the PyEMMA package.47 (See Supporting Information for more details and MSM validation.) We also considered lower resolution MSMs, characterized by the number of helical residues in a conformation, Qh. These MSMs were constructed from the “full” MSM via a direct mapping of the transition probability matrix. Such a construction compromises the quantitative description of kinetics. However, we found that this reduced resolution provides a useful tool for assessing the differences between various kinetic descriptions of Ala7.
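The basic steps of MSM estimation and of the reduction to the Qh level can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the procedure used in this work: the transition matrix is obtained by simple row normalization of transition counts rather than by the Bayesian scheme, the mapping to the reduced model is assumed to be a stationary-weighted lumping, and the encoding of the 32 microstates as 5-bit integers (1 = helical) is hypothetical. All function names are illustrative.

```python
def count_matrix(dtraj, n_states, lag):
    """Count transitions i -> j separated by `lag` frames (lag >= 1)."""
    counts = [[0] * n_states for _ in range(n_states)]
    for a, b in zip(dtraj[:-lag], dtraj[lag:]):
        counts[a][b] += 1
    return counts


def row_normalize(counts):
    """Maximum-likelihood (non-reversible) transition probabilities.
    Rows of unvisited states are left as zeros."""
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


def stationary(T, n_iter=2000):
    """Stationary distribution by power iteration, starting from uniform."""
    n = len(T)
    pi = [1.0 / n] * n
    for _ in range(n_iter):
        pi = [sum(pi[i] * T[i][j] for i in range(n)) for j in range(n)]
    norm = sum(pi)
    return [p / norm for p in pi]


def lump(T, pi, mapping, n_macro):
    """Coarse-grain T onto macrostates (assumed stationary-weighted lumping):
    T_IJ = sum_{i in I} pi_i sum_{j in J} T_ij / sum_{i in I} pi_i."""
    flux = [[0.0] * n_macro for _ in range(n_macro)]
    weight = [0.0] * n_macro
    for i, row in enumerate(T):
        weight[mapping[i]] += pi[i]
        for j, t in enumerate(row):
            flux[mapping[i]][mapping[j]] += pi[i] * t
    return [[f / weight[I] if weight[I] else 0.0 for f in flux[I]]
            for I in range(n_macro)]


def helical_count(state):
    """Q_h for a microstate encoded as an integer bit pattern
    (hypothetical encoding: each set bit is one helical residue)."""
    return bin(state).count("1")
```

For the 32-state model, `mapping = [helical_count(s) for s in range(32)]` with `n_macro = 6` would give a reduced model indexed by the number of helical residues. Lumping of this kind preserves the stationary populations but generally not the timescales, which is one way to see why the reduced model compromises the quantitative kinetics.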
III. RESULTS AND DISCUSSION
In this study, we investigate the relationship between structural and kinetic properties of helix-coil transition networks generated by microscopic simulation models. The equilibrium statistical mechanics of helix-coil transitions is well-characterized by a 1D-Ising model, which represents the state of each residue as being either helical, h, or coil, c.48 These simple equilibrium models employ two parameters, w and v, according to the Lifson-Roig (LR) formulation,49 which are related to the free energy of helix propagation and nucleation, respectively. These parameters may be determined directly from simulation data using a Bayesian approach,50 and describe the overarching structural characteristics of the underlying ensemble (see Supporting Information for more details). Perhaps the most important quantity, the average fraction of helical segments, ⟨ƒh⟩, i.e., propensity of sequential triplets of h states along the peptide chain, may be measured directly from calorimetry experiments. The average number of helical residues, regardless of the particular sequence of states along the peptide chain, denoted ⟨Nh⟩, provides a complementary observable to ⟨ƒh⟩. Although residue- or sequence-specific LR parameters may be determined in order to more faithfully reproduce the helix-coil properties generated from a simulation of a particular peptide system, in this work we determine a single set of {w, v} for each simulation model. These simple models are incapable of describing certain features, e.g., end effects, of the underlying systems; however, we utilize the LR parameters only as a characterization tool and demonstrate that the sequence-independent parameters are sufficient to effectively distinguish between various characteristics of the underlying simulations.
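For a chain as short as Ala7, the LR averages can be obtained by brute-force enumeration of all h/c sequences. The sketch below assumes the standard LR weighting (w for an h residue flanked by h on both sides, v for any other h residue, 1 for c), treats the chain ends as having coil neighbours, and normalizes the segment count by the number of possible triplets. The end treatment, the normalization, and the function names are assumptions made for illustration.

```python
from itertools import product


def lr_weight(seq, w, v):
    """Lifson-Roig statistical weight of one h/c sequence.
    Chain ends are padded with coil residues (assumption)."""
    padded = "c" + seq + "c"
    weight = 1.0
    for i in range(1, len(padded) - 1):
        if padded[i] == "h":
            flanked = padded[i - 1] == "h" and padded[i + 1] == "h"
            weight *= w if flanked else v
    return weight


def lr_averages(n, w, v):
    """Partition function, average number of helical residues <N_h>, and
    average fraction of helical triplets <f_h> for n two-state units."""
    z = 0.0
    n_h = 0.0
    n_seg = 0.0
    for states in product("hc", repeat=n):
        seq = "".join(states)
        wt = lr_weight(seq, w, v)
        z += wt
        n_h += wt * seq.count("h")
        n_seg += wt * sum(seq[i:i + 3] == "hhh" for i in range(n - 2))
    return z, n_h / z, n_seg / z / (n - 2)
```

With n = 5 the loop visits the same 32 sequences that define the MSM microstates, so scanning w and v against simulated ⟨Nh⟩ and ⟨ƒh⟩ gives a crude alternative to the Bayesian estimate mentioned above.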
The kinetics of helix-coil transitions is often interpreted in terms of a kinetic extension of the Ising model.51,52 These kinetic models typically assume a simple relationship between the LR parameters and the on/off rate of helix formation and have been widely successful in describing the emergent kinetic properties observed in experiments.52,53 However, the kinetics of the helix-coil transition may demonstrate drastic divergences from this behavior, e.g., when misfolded intermediates complicate the network of transition pathways.54 Even in simpler cases, the precise impact of the model’s assumptions on the fine details of the resulting kinetic network is not well understood. Here, we construct kinetic models directly from the simulation trajectories, i.e., Markov state models, allowing both a more complex relationship between the LR parameters and the kinetic properties of the system and also providing a direct assessment of the assumptions made by the more approximate kinetic models.
As a model system, we consider an uncapped heptapeptide of alanine residues (Ala7), i.e., 5 peptide bonds. From the LR point of view, there are 32 states, determined by enumerating the various sequences of h’s and c’s. We employ an all-atom (AA) simulation, previously studied by Stock and coworkers,37,46,55 as a reference in order to assess to what extent CG models are capable of reproducing the kinetic properties of this more detailed model. We note that it is well-known that distinct AA force fields yield widely varying results, e.g., in terms of helical propensities, for short peptide systems.50 The AA simulation employed in this work serves as an ideal reference since it samples a largely disordered ensemble with a diverse collection of pathways from the coil to helix state, representing a challenge for the CG models considered. Fig. 1 presents a network representation of the AA simulation, plotted along simple reaction coordinates that characterize the helicity of each conformation on the horizontal axis and the direction (C- or N-terminus) of folding on the vertical axis. The thickness of the arrows denotes the relative probability flux passing between pairs of states for trajectories starting from the coil state and ending at the helix state. The “middle” of the network appears quite random, with more directed transitions towards the ends of the graph. Interestingly though, the residue dynamics remain highly coupled (see Fig. S4). The color of each state denotes its committor value—the probability of reaching the coil state before returning to the helix state. Clearly the landscape is largely tilted toward the unfolded region, as the committor values are very close to 1 for all but a few states.
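The committor used to color the network can be computed from an MSM transition matrix by solving a linear system with fixed boundary values. The following is a minimal sketch using Gauss-Seidel iteration; here `source` plays the role of the helix state (committor 0) and `target` the coil state (committor 1), matching the definition above. The function name, tolerance, and iteration cap are illustrative choices.

```python
def committor(T, source, target, tol=1e-12, max_iter=100000):
    """q[i] = probability of reaching `target` before `source` from state i.

    Solves q_i = sum_j T_ij q_j for all intermediate states, with
    q = 0 on the source set and q = 1 on the target set.
    """
    n = len(T)
    fixed = set(source) | set(target)
    q = [0.5] * n
    for s in source:
        q[s] = 0.0
    for t in target:
        q[t] = 1.0
    for _ in range(max_iter):
        delta = 0.0
        for i in range(n):
            if i in fixed:
                continue
            new = sum(T[i][j] * q[j] for j in range(n))
            delta = max(delta, abs(new - q[i]))
            q[i] = new            # Gauss-Seidel: reuse updated values at once
        if delta < tol:
            break
    return q
```

For a three-state chain in which the intermediate state steps toward the target three times as often as toward the source, `committor` returns 0.75 for that state, independent of how long it dwells before leaving.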
A. Identifying structure-kinetic relationships at a single temperature using the Hy-Gō model
The focus of our study is a relatively simple, native-biased CG model. The hybrid Gō (Hy-Gō) model is a flavored-Gō model, with native contact (nc) interactions (i.e., hydrogen-bonding-like interactions between i/i + 4 residues) and associated desolvation barriers (db) between Cα atoms, as well as generic hydrophobic (hp) attractions between all pairs of Cβ atoms. The relative strengths of these interactions are determined by the set of 3 model parameters: {ϵnc, ϵdb, ϵhp}. Fig. 2 presents a visualization of the model representation and also the corresponding interaction potentials. The model also employs physics-based interactions in the form of sterics (see bottom-right panel of Fig. 2) and torsional preferences along the backbone. We denote this model “hybrid” since it employs both traditional Gō-type interactions as well as detailed physics-based interactions determined from an AA model. Further details of the model and corresponding simulations are described in the Methods section.
In order to assess the capability of the Hy-Gō model to reproduce structural and kinetic properties of the AA model, we considered various combinations of ϵnc, ϵdb, and ϵhp and then adjusted the temperature to reproduce the average helicity of the AA model, . We found three classes of CG models: (i) those that reproduced at some temperature T*, (ii) same as (i), but whose native state was the coil state, and (iii) those where the helix state was native but the model could not achieve such a low value of helicity at any temperature. For each model, we determined the two LR parameters, {w,v}, and constructed a 32-state Markov state model (MSM) directly from the simulation data.
Fig. 3 presents a “parameter landscape” for the Hy-Gō model, plotted as a ternary diagram with each axis characterizing the relative importance of the three Gō-type interactions: ϵi/Etot, where Etot ≡ ϵnc + ϵdb + ϵhp. Because the relative impact of the backbone interactions depends on Etot, we discretize along Etot and plot ternary diagrams for distinct values along this coordinate. Squares indicate models of type (i), circles of type (ii), and triangles of type (iii). The color of each model denotes the error with respect to some property of the AA model (cooler colors represent lower errors). The first row characterizes the root mean square error (rmse) with respect to the LR parameters of the AA model, while the second row quantifies the rmse with respect to the slowest 2 dynamical processes of the system (as characterized by the corresponding eigenvectors of the MSM at the Qh level of resolution, see the Methods section for details) and also the ratio of timescales between these two processes. There is a clear correspondence between models that reproduce the structural and kinetic metrics. In other words, given the constraint of ⟨Nh⟩, approximately fixing the LR parameters is enough to also accurately reproduce the overarching hierarchy of kinetic processes of the underlying system at T*. In particular, 7 “AA-like” Hy-Gō models were identified as reproducing the properties of the AA transition with the highest accuracy. It is important to note that while previous studies have connected the LR parameters to emergent kinetic properties of the helix-coil transition,52,53 the generation of consistent helix-coil properties from a microscopic simulation model is non-trivial. This is perhaps best demonstrated by the difficulties of AA models to reproduce the melting curve for relatively simple peptides.5,50
Fig. 4 presents both static and kinetic properties generated from the AA model (solid, black curves), the 7 AA-like Hy-Gō models (colored curves), and for two additional Hy-Gō models that also reproduce (while still sampling all 32 states), but do so at the largest and smallest simulated temperatures (dashed and dashed-dotted, black curves, respectively). The properties of the latter 2 models represent the range of attainable values of each observable, given the weak structural constraint of and also the implicit constraints enforced by the details of the model, but allowing for distinct LR parameters and, thus, ⟨ƒh⟩ values. Panel (ai) presents ⟨Nh⟩ and ⟨ƒh⟩ (described above), as well as the average fraction of neighboring pairs of helical residues, ⟨Ns⟩, and the average fraction of isolated helical residues, ⟨Nl⟩. Panel (aii) presents the equilibrium distribution along the number of helical residues in the sequence, Qh. Panel (aiii) presents the average fraction of helical segments per residue, ⟨h(i)⟩. Panels (ai) and (aii) demonstrate that the AA-like Hy-Gō models reproduce the static properties of the AA model quite well. As expected, panel (aiii) shows that these models are incapable of perfectly reproducing the strong end effects of the underlying ensemble.
Panel (bi) presents the fractional flux passing through each state for trajectories that begin in the coil state and end in the helix state. Panel (bii) presents ratios of important timescales in the underlying simulations. Note that we always consider ratios of timescales, to account for the overall speed-up of each CG model. knuc (kr-nuc) and kel (kr-el) characterize the (reverse) rates of nucleation and elongation, respectively (see Supporting Information for details about the calculation of these rates). ti denotes the timescale of the ith slowest kinetic process according to the MSM. The Qh superscript denotes timescales calculated from a reduced MSM with microstates corresponding to the number of h residues in the sequence. Despite the detailed structural deficiencies of these models, both the fractional flux per state and the various timescale ratios are reproduced with surprisingly high accuracy. Interestingly, the approximate bounds on the timescale ratios, represented by the black dashed and dashed-dotted curves, demonstrate that there is a rather small amount of freedom in these quantities (other than for knuc/kel). We suggest that this is due to the implicit constraints on the kinetic network, enforced by the underlying physics of the model.
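For reference, the MSM timescales follow from the eigenvalues of the transition matrix through the standard implied-timescale relation. The symbols τ (the MSM lag time) and λ_i (the i-th largest non-stationary eigenvalue of the transition matrix) are notation introduced here for illustration and do not appear in the text above.

```latex
t_i = -\frac{\tau}{\ln \lambda_i},
\qquad
\frac{t_i}{t_j} = \frac{\ln \lambda_j}{\ln \lambda_i}
```

A ratio of two timescales taken from the same MSM therefore does not depend on τ, nor on any uniform rescaling of the time unit, which is consistent with comparing only timescale ratios between the AA and CG models.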
Fig. 5 presents network representations of the AA model and a representative AA-like Hy-Gō model (top panel) as well as the low- and high-temperature Hy-Gō models introduced above. The networks are presented in a different way than Fig. 1, with the horizontal axis corresponding to the value of the committor for each state and the vertical axis corresponding to the local graph entropy: , where Tij is the transition probability from state i to state j. Sloc (i) quantifies the degree of randomness for pair transitions originating in state i. For visual clarity, we discretized the networks along these two coordinates and grouped states sharing the same grid into a single node. The relative size of each node denotes the number of underlying states. The color of each node corresponds to the average fractional flux passing through the node for trajectories starting from the coil state and ending in the helix state. The networks in Fig. 5 provide an illustration of the network topologies which generate the various properties presented in Fig. 4. The AA-like model appears to mimic the AA network in many ways, although the spread of states along each axis is not quite reproduced. On the other hand, the low- and high-temperature networks display drastically different topologies, albeit while retaining the average helicity of the AA model, . Thus, ⟨Nh⟩ is indeed a weak structural constraint for this system.
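The expression for the local graph entropy did not survive in the sentence above. A form consistent with the description (randomness of transitions originating in state i, defined through the transition probabilities Tij) is the Shannon entropy of row i of the transition matrix, Sloc(i) = −Σj Tij ln Tij. The sketch below implements this assumed form; the choice of natural logarithm and the function name are likewise assumptions.

```python
import math


def local_entropy(T):
    """Assumed local graph entropy: Shannon entropy of each row of the
    transition matrix, S_loc(i) = -sum_j T_ij * ln(T_ij).
    Zero-probability transitions contribute nothing."""
    return [-sum(p * math.log(p) for p in row if p > 0.0) for row in T]
```

Under this form, a state with a single allowed transition has zero local entropy, and a state that moves to k neighbours with equal probability has entropy ln k.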
B. Simple kinetic approximations
As already mentioned above, the emergent kinetics of the helix-coil transition are well characterized by simple kinetic models constructed directly from the LR parameters.51,52 Here, we briefly assess the resulting properties of these approximate models for comparison with the CG simulation models. We consider two models built directly from the LR parameters, denoted kinzip and kIsing. The kinzip model corresponds to the kinetic zipper model,52 which assumes that the rate of transition from the c to h state for each residue is limited by the nucleation parameter, v. The kIsing model is based on the formulation of Schwarz,51 which uses the LR parameters to define the reaction rates for individual triplets of residues transitioning between conformational states. We also employed a different approach for constructing approximate models for the simulation kinetics, following the n-m approximation.56 In this procedure, we start with a “local” kinetic model determined from the simulation data, i.e., by ignoring the state of residues which lie beyond some number of peptide bonds away from a given residue, and then construct a full MSM, assuming that the residue dynamics beyond the chosen separation are decoupled. In this way, a systematic investigation into the coupling of residue dynamics along the peptide backbone can be performed. We refer to this procedure as the dynamic coupling analysis (DCA), and the corresponding models are denoted DCA-corr-x, where x is the chosen number of peptide bonds to be correlated in the model. See the Supporting Information section for details about the construction of each of these models.
Overall, the approximate kinetic models quite accurately reproduce both the static and kinetic properties of the underlying AA model (Fig. S4). A full decoupling of residue dynamics (DCA-corr-0) is required to introduce large errors, yielding a spread of flux throughout the network and the systematic oversampling of intermediate states, as previously reported for bottom-up CG models for the helix-coil transition.57 Smaller discrepancies in the other models can be understood in terms of the strong end effects in this short, uncapped peptide. Fig. 5 presents network representations of the kinzip and kIsing models, and two approximate models constructed by systematic decoupling of the residue dynamics, assuming no correlations and i/i + 3 correlations (DCA-corr-0 and DCA-corr-3, respectively). The kinzip and kIsing networks seem to reproduce the AA properties via a simplified network topology, while DCA-corr-3 retains a closer resemblance to the AA network. The DCA-corr-0 network displays drastically different topological features, as is apparent in the resulting properties of this model. Decoupling of residue dynamics leads to an increase in committor values and local entropies, especially for the largely helical states. Interestingly, many properties of the underlying network (as demonstrated in Figs. 4 and S4) may be reproduced by quite distinct network topologies. Further investigation into the properties generated from these simple kinetic approximations may provide insight into how the details of the kinetic network are affected by the particular approximations made within the model construction. This is beyond the scope of the present work.
C. Further validation of structure-kinetic relationships using a transferable CG model
We also considered the transferable CG PLUM model with three distinct parametrizations. Results for the original model are presented in the main text, while equivalent results for the two reparametrizations are presented in the Supporting Information. The static and kinetic properties of the PLUM model for Ala7 are presented in Fig. 6 as solid violet curves. The PLUM model strongly stabilizes helical structures for Ala7, and is incapable of achieving the low helical content of the AA model. The significant discrepancies between the AA (black, dashed-dotted curves) and PLUM models are not surprising since (i) neither the PLUM nor the AA model was parametrized to reproduce properties of small peptides and (ii) various AA models yield widely varying structural properties for peptide systems.50 Rather, in this study the AA and PLUM models represent two distinct reference ensembles for investigating the interplay between structure and kinetics. We extended the search in Hy-Gō parameter space to find a set of models which more closely reproduce the structural and kinetic properties of the PLUM model. Fig. 6 demonstrates that by approximately matching ⟨Nh⟩ and ⟨ƒh⟩, the “PLUM-like” Hy-Gō model nearly quantitatively reproduces both the equilibrium and kinetic properties of the PLUM model. The network topology of the PLUM-like Hy-Gō model also closely resembles that of the PLUM model (Fig. S7).
D. Thermodynamics of Ala7 helix-coil transitions
The temperature dependence of the helicity, in particular ⟨ƒh⟩(T), is a fundamental quantity for characterizing secondary structure formation in proteins. Fig. 7 presents the T-dependence of w (top panel) and ⟨ƒh⟩ (bottom panel) for the AA-like Hy-Gō models (purple, circle markers), other Hy-Gō models which reproduce at some temperature T* (blue, upward triangle markers), Hy-Gō models incapable of reproducing (green, sideways triangle markers), the PLUM-like Hy-Gō models (red, square markers), and the PLUM model (black, X markers). The energy scales of the different models were aligned by shifting the temperature such that all models achieve at T*. For models not simulated at this temperature, a linear extrapolation is employed.
From the temperature dependence of w, we fit a thermodynamic model to quantify the T-independent enthalpy and entropy of helix extension. In particular, we assume kBT ln w(T) ≈ ΔHhb − TΔShb, neglecting the heat capacity contributions to the free energy and considering a relatively small range in T. ΔHhb corresponds to the slope of the curves in Fig. 7 and is a simple measure of the cooperativity of the transition. The resulting thermodynamic values for each model are given in Table S1 and are consistent (in terms of order of magnitude) with those determined from various AA models for a longer peptide,50 although the entropic values are somewhat larger. It is straightforward to correlate these thermodynamic quantities with the parameters of the Hy-Gō model. This analysis confirms general intuition about their behavior with changes in the model parameters and provides useful insight for tuning the Hy-Gō parameters to reproduce particular features of a reference model. For example, the ϵnc and ϵhp interactions nearly exclusively determine ΔHhb, while ΔSnuc (obtained from building a thermodynamic model for v) depends strongly on both ϵdb and the effective impact of the backbone interactions (Figs. S5 and S6).
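As a minimal sketch of such a fit, assuming the sign convention written above (kBT ln w equals ΔHhb minus TΔShb), an ordinary least-squares line through kBT ln w versus T gives ΔHhb as the intercept and ΔShb as minus the slope. Equivalently, ΔHhb is the slope of ln w plotted against 1/(kBT). The function name, the choice of kcal/mol units, and the synthetic data are assumptions made for illustration and are not taken from the analysis scripts of this work.

```python
import math

KB = 0.0019872041  # Boltzmann constant in kcal/(mol K); the unit choice is an assumption

def fit_enthalpy_entropy(temps, w_values, kb=KB):
    """Least-squares fit of kb*T*ln(w) = dH - T*dS; returns (dH, dS)."""
    ys = [kb * t * math.log(w) for t, w in zip(temps, w_values)]
    n = len(temps)
    t_mean = sum(temps) / n
    y_mean = sum(ys) / n
    sxx = sum((t - t_mean) ** 2 for t in temps)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(temps, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    return intercept, -slope  # dH is the intercept, dS is minus the slope

# Synthetic example: generate w(T) from made-up parameters, then recover them.
temps = [280.0, 300.0, 320.0, 340.0]
dH_true, dS_true = 1.2, 0.003
w_values = [math.exp((dH_true - t * dS_true) / (KB * t)) for t in temps]
dH_fit, dS_fit = fit_enthalpy_entropy(temps, w_values)
```

Because heat capacity contributions are neglected, the fit is only meaningful over a narrow temperature window, as noted above.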
Although Fig. 4 demonstrates that the AA-like Hy-Gō models display very similar structural and kinetic properties at T*, Fig. 7 and Table S1 show that their cooperativities (i.e., ΔHhb) vary by more than 25% of the average value for these models. In other words, not surprisingly, reproducing simple structural properties at a single temperature is not enough to determine the thermodynamics of the model. To further validate the structure-kinetic relationships discussed above, we examined these relationships for each of the Hy-Gō models and PLUM models considered, regardless of their particular structural features. There is a one-to-one correspondence between the ratio of nucleation and elongation rates and the average propensity of helical segments, as well as the T-dependence of these quantities (Fig. S9). These particular relationships are unlikely to hold generally for much longer, heterogeneous sequences, where complex nucleation/elongation kinetics may arise.54 However, these results indicate that, when Arrhenius behavior holds, there may be simple relationships between the T-dependence of structural and kinetic properties. Moreover, these findings suggest that reproducing particular structural features over multiple temperatures may be an avenue for improving structural, thermodynamic, and kinetic consistency for CG models.
E. Relationships between thermodynamics and network topology
The Markov state model characterization of the system kinetics provides a convenient and powerful framework for investigating the details of the hierarchy of kinetic processes sampled by particular helix-coil transitions. Indeed, graph-theoretic observables have been previously employed for understanding complex phenomena, e.g., kinetic frustration,58 arising within the dynamical processes of proteins. We applied discrete transition path theory59 to investigate the ensemble of pathways between the helix and coil states and identified simple relationships between the network topology and typical physical observables, e.g., ⟨ƒh⟩. We found that ⟨ƒh⟩, at a particular temperature, is largely determined by the number of paths needed to account for half of the fractional flux of probability for trajectories traveling from the coil to the helix (Fig. S12). This somewhat surprising relation is likely due to the relatively strong constraints on the network topology enforced implicitly through the underlying physics of the models.
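The "number of paths needed to account for half of the fractional flux" can be illustrated with a short sketch. It assumes that a pathway decomposition of the reactive flux from coil to helix has already been carried out, so that each pathway comes with a flux value. The function name and the example fluxes are hypothetical.

```python
def n_paths_for_flux_fraction(path_fluxes, fraction=0.5):
    """Smallest number of pathways, taken in order of decreasing flux,
    whose combined flux reaches the requested fraction of the total."""
    total = sum(path_fluxes)
    if total <= 0.0:
        raise ValueError("total flux must be positive")
    running = 0.0
    for count, flux in enumerate(sorted(path_fluxes, reverse=True), start=1):
        running += flux
        if running >= fraction * total:
            return count
    return len(path_fluxes)

# A "directed" network: one dominant pathway carries most of the flux.
directed = [8.0, 1.0, 0.5, 0.5]
# A "diffuse" network: flux is spread evenly over many pathways.
diffuse = [1.0] * 10

n_directed = n_paths_for_flux_fraction(directed)
n_diffuse = n_paths_for_flux_fraction(diffuse)
```

A small count indicates that a few pathways dominate the transition, whereas a large count indicates flux spread over many alternative routes.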
As the temperature of each system is decreased, both the committor value and local entropy of each state tend to decrease (Fig. S11). Because of these generic changes in the network as a function of temperature, there may exist a feature of the network topology, at a particular reference temperature Tref, which determines the thermodynamics, i.e., cooperativity, of the corresponding transition. In the following, Tref corresponds to the highest temperature at which each model was simulated (see Fig. 7). We consider the path entropy, i.e., Shannon entropy of the probability distribution of pathways, which characterizes the degree of randomness of paths traveling from a starting state, s, to an ending state, d. Similarly, the conditional path entropy,60 Hsd|u, characterizes the average degree of randomness for paths from s to d passing through a particular intermediate state, u.
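For a generic Markov chain, the (unconditional) path entropy can be sketched as follows. If trajectories are stopped on first reaching d, the entropy H_i of trajectories starting in state i satisfies H_i = h_i + Σ_j P_ij H_j with H_d = 0, where h_i is the local entropy of row i of the transition matrix. The sketch below solves this relation by fixed-point iteration. It is an illustration under these assumptions only: the function names and the toy matrix are hypothetical, and the conditional quantity Hsd|u of Ref. 60 is not reproduced here.

```python
import math

def local_entropy(row):
    """Shannon entropy (in nats) of one row of a transition matrix."""
    return -sum(p * math.log(p) for p in row if p > 0.0)

def path_entropy(P, d, tol=1e-12, max_iter=100000):
    """Entropy of trajectories that start in each state and stop on first
    reaching state d. Solves H_i = h_i + sum_{j != d} P_ij H_j by iteration."""
    n = len(P)
    h = [0.0 if i == d else local_entropy(P[i]) for i in range(n)]
    H = [0.0] * n
    for _ in range(max_iter):
        H_new = [
            0.0 if i == d
            else h[i] + sum(P[i][j] * H[j] for j in range(n) if j != d)
            for i in range(n)
        ]
        if max(abs(a - b) for a, b in zip(H_new, H)) < tol:
            return H_new
        H = H_new
    raise RuntimeError("no convergence; is d reachable from every state?")

# Toy chain: from s = 0, go directly to d = 2 or via u = 1, each with probability 1/2.
P_toy = [[0.0, 0.5, 0.5],
         [0.0, 0.0, 1.0],
         [0.0, 0.0, 1.0]]
H_toy = path_entropy(P_toy, d=2)
```

In the toy chain there are exactly two equally likely paths from s to d, so the path entropy from state 0 is ln 2. A perfectly directed network would give zero.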
Naively, one may hypothesize that networks with the most directed flux from coil to helix at Tref would display a faster change of population into the helical state as the temperature is reduced. In other words, in order to maximize the cooperativity, the conditional path entropy, averaged over all intermediate states, should be minimized. To investigate the relationship between directed flux in the network and cooperativity of the underlying transition, we performed a correlation analysis by constructing a linear combination of input features, {fi}, to characterize the target observable, O ≈ Σi ci fi, where the coefficients, {ci}, are determined as a best fit over all models. In this case, O corresponds to the slope of ⟨ƒh⟩(T), denoted Δ⟨ƒh⟩/ΔT, or equivalently ΔHhb. The set of 60 conditional path entropies for each intermediate state between coil and helix and for both the folding and unfolding directions were initially considered as input features. For numerical convenience, we weighted each of the features by the fractional flux passing through each state (as introduced above). Panel (a) of Fig. 8 demonstrates the resulting correlation when all 60 features are employed. Clearly, there is more than sufficient information in these features to distinguish between the cooperativities of the different models. However, we believe that this correlation is non-trivial since (i) the network topology metric was determined at different temperatures and, therefore, for distinct structural ensembles for each model, (ii) we are considering two types of models, namely, Hy-Gō and PLUM, with various parametrizations, and (iii) the reference ensemble for each model corresponds to an unfolded ensemble. Moreover, the removal of input features corresponding to the unfolding direction results in significant deviations in the correlation, indicating the necessity of particular features in order to faithfully describe cooperativity.
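A best-fit linear combination of features of this kind can be sketched with ordinary least squares. The example assumes one row of (already flux-weighted) features per model and one target value per model, and solves the normal equations by Gaussian elimination. All names and numbers are hypothetical. With as many as 60 features the system is only well posed if enough models are available, so the sketch raises an error for singular cases rather than guessing.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-14:
            raise ValueError("singular system: too few models or redundant features")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def fit_coefficients(features, targets):
    """Coefficients c minimizing sum over models of (O_m - sum_i c_i f_mi)^2."""
    k = len(features[0])
    AtA = [[sum(f[i] * f[j] for f in features) for j in range(k)] for i in range(k)]
    Atb = [sum(f[i] * o for f, o in zip(features, targets)) for i in range(k)]
    return solve_linear(AtA, Atb)

def predict(features, coeffs):
    """Evaluate the fitted linear combination for each model."""
    return [sum(c * x for c, x in zip(coeffs, f)) for f in features]

# Made-up data: 4 models, 2 features each; targets follow O = 2*f0 - 1*f1 exactly.
features = [[1.0, 0.5], [0.8, 0.9], [0.3, 0.2], [0.6, 0.1]]
targets = [2.0 * f0 - 1.0 * f1 for f0, f1 in features]
coeffs = fit_coefficients(features, targets)
fitted = predict(features, coeffs)
```

Coefficients of opposite sign, as in this toy example, are the kind of signature that would indicate an anticorrelation between two groups of features.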
By systematically disregarding features which played a minimal role in the resulting correlation, we determined a reduced set of 16 features necessary to retain a reasonably accurate correlation (panel (b) of Fig. 8). Interestingly, the most important features corresponded to the conditional path entropies for a set of 8 states with intermediate helical content: {cchhh,hcchh,chhcc,cchhc,hchcc,chcch,hcchc,hhccc}.
This correlation analysis revealed that, contrary to our naive assumption, it is not a simple minimization of the average conditional path entropies which determines a maximally cooperative model. Rather, there is a balance between achieving small conditional path entropies (i.e., directed flux) for some intermediate states and higher conditional path entropies (i.e., nondirected flux) for others. Additionally, there is a strong anticorrelation between features corresponding to the folding and unfolding directions. Apparently, the network topology that results in the most cooperative transition upon cooling is one that maximizes directed flux in the folding direction for particular states while maintaining sufficient connections in the network to facilitate a relatively high degree of randomness in the unfolding direction.
To further demonstrate this concept, panel (c) of Fig. 8 presents the conditional path entropy in the folding direction averaged over all intermediate states (i.e., the naive picture of cooperativity described above) versus Δ⟨ƒh⟩/ΔT. Clearly, the naive interpretation has some merit, indicated by the weak correlation between these quantities, with no adjustable parameters. However, the discrepancies in the correlation, especially for the PLUM models which are significant outliers to the trend, indicate that an essential feature is missing from this simple description of cooperativity. Interestingly, if the anticorrelation between folding and unfolding directions is incorporated by subtracting the average conditional path entropy in the unfolding direction, the correlation for the Hy-Gō models is somewhat improved (see Fig. S13). Further analysis is required to better understand the connection between network topology at a single reference temperature and cooperativity of the model and to test the applicability of this relationship for distinct systems.
IV. CONCLUSIONS
In this study, we have demonstrated that a relatively simple coarse-grained model is capable of nearly quantitatively reproducing the detailed kinetic properties of a complex helix-coil transition. In particular, we verified a robust relationship between the Lifson-Roig parameters, or similarly the average fraction of helical segments ⟨ƒh⟩, and the resulting hierarchy of kinetic processes (as determined by a Markov state model of the transition). Importantly, this relationship is likely dependent upon the implicit constraints enforced by the physics of the simulation model. Here, we employed near-atomistic steric and intramolecular backbone interactions, along with simple Gō-type interactions, in order to accurately sample the underlying structural ensemble. This Hybrid Gō (Hy-Gō) model proved extremely useful for the kinetic investigation since it (i) employs few, physically-motivated parameters, allowing an extensive and easily interpretable search through parameter space, while (ii) allowing a direct, microscopic comparison with other models employing similar backbone resolution and (iii) retaining accurate sampling of metastable states along the Ramachandran plot. Similar to other Gō-type models, the Hy-Gō model provides a transparent approach for elucidating the essential interactions necessary to accurately model kinetic properties and may be particularly useful for future investigations of kinetic protein experiments.
Analysis of an all-atom simulation of a heptapeptide of alanines revealed that the seemingly disordered ensemble of pathways between the helix and coil states does not preclude significant coupling of residue dynamics. The Hy-Gō models not only produced consistent kinetics by matching certain structural features of the underlying model, but also demonstrated rather stringent topological constraints on the network, due to the underlying physics of the model. These constraints presumably resulted in simple relationships between the network topology and standard physical observables. Quite remarkably, the degree of randomness of paths traveling between the coil and helix states through particular intermediate states (as measured by the conditional path entropy) at a single reference temperature determines the relative cooperativity of the transition. Although further analysis is required to better understand this relationship, the prediction of structural changes upon environmental or chemical perturbations from the network topology of the unfolded ensemble may be an advantageous approach for understanding the behavior of intrinsically disordered proteins.
The relationships between structural and kinetic properties and between thermodynamics and network topology found in this work were tested on a large variety of Hy-Gō models with varying interaction parameters and also for three different parametrizations of the transferable CG PLUM model. While investigation into the validity of these particular relationships for distinct peptide systems remains for future work, the results presented here contain several general take-home messages for the examination of CG kinetic properties. First, the physical details of a CG model provide constraints on the allowable kinetic properties for any particular parametrization. When the essential physics is accounted for, i.e., when the structural ensemble is reproduced with sufficient accuracy, simple relationships between structure and kinetics may emerge. These relationships may then be exploited to ensure kinetic consistency through the reproduction of structural properties. Therefore, our work motivates further investigation into the structure-kinetic relationships for CG models and suggests that the matching of particular structural properties over multiple temperatures (or other thermodynamic state points) may provide a general scheme for simultaneous inclusion of structural, thermodynamic, and kinetic consistency into CG models.
DATA
An online database consisting of various analysis scripts and input files for the coarse-grained simulations can be found at https://github.com/JFRudzinski/Scripts_and_models_for_Structure-kinetic-thermodynamic_relationships.git.
ACKNOWLEDGEMENTS
We thank Fabian Knoch and Raffaello Postetio for critical reading of the manuscript. We are also very grateful to Gerhard Stock for the use of the Ala7 AA trajectory. J.F.R. thanks Florian Sittel for many fruitful discussions and assistance with the density clustering software. J.F.R. also thanks Amedeo Caflisch and Andreas Vitalis for useful discussions concerning the Lifson-Roig models. J.F.R. is grateful to the organizers of the Hünfeld workshop for providing a stimulating atmosphere for scientific discussions, which had a positive impact on this work. This work was funded through a postdoctoral fellowship from the Alexander von Humboldt Foundation (J.F.R.) and an Emmy Noether fellowship from the German Research Foundation (T.B.).
Footnotes
a) Electronic mail: rudzinski{at}mpip-mainz.mpg.de