ABSTRACT
A model which treats the denatured and native conformers of spontaneously-folding fixed two-state systems as being confined to harmonic Gibbs energy-wells has been developed. Within the assumptions of this model the Gibbs energy functions of the denatured (DSE) and the native state (NSE) ensembles are described by parabolas, with the mean length of the reaction coordinate (RC) being given by the temperature-invariant denaturant m value. Consequently, the ensemble-averaged position of the transition state ensemble (TSE) along the RC, and the ensemble-averaged Gibbs energy of the TSE are determined by the intersection of the DSE and the NSE-parabolas. The equations derived enable equilibrium stability and the rate constants to be rationalized in terms of the mean and the variance of the Gaussian distribution of the solvent accessible surface area of the conformers in the DSE and the NSE. The implications of this model for protein folding are discussed.
INTRODUCTION
Understanding the mechanism(s) by which denatured or nascent polypeptides under folding conditions spontaneously fold to their unique three-dimensional structures is one of the fundamental problems in biology. Although there has been tremendous progress since the ground-breaking discovery of Anfinsen, and various theories and models have been proposed for what has come to be known as the “Protein Folding Problem,” our understanding of the same is far from complete.1 The purpose of this paper is to address issues that are pertinent to the folding problem using a treatment that is analogous to that given by Marcus for electron transfer.2
FORMULATION OF THE HYPOTHESIS
Parabolic approximation
Consider the denatured state ensemble (DSE) of a spontaneously-folding fixed two-state folder at equilibrium under folding conditions wherein the variables such as temperature, pressure, pH, ionic strength etc. are defined and constant.3, 4 The solvent accessible surface area (SASA) and the Gibbs energy of each one of the conformers that comprise the DSE, and consequently, the mean SASA and Gibbs energy of the ensemble will be determined by a complex interplay of intra-protein and protein-solvent interactions (hydrogen bonds, van der Waals and electrostatic interactions, salt bridges etc.).5-8 At finite but constant temperature, the incessant transfer of momentum from the thermal motion of water causes the polypeptide to constantly drift from its mean SASA.9 As the chain expands, there is a favourable gain in chain entropy due to the increased backbone and side-chain conformational freedom, and a favourable gain in solvation enthalpy due to the increased solvation of the backbone and the side-chains; however, this is offset by the loss of favourable chain enthalpy that stems from the intra-protein backbone and the side-chain interactions, and the unfavourable decrease in solvent entropy, since more water molecules are now tied down by the relatively more exposed hydrophobic residues, hydrogen-bond donors and acceptors, and charged residues in the polypeptide. 
Conversely, as the chain attempts to become increasingly compact, there is a favourable gain in chain enthalpy due to an increase in the number of residual interactions, and a favourable increase in the solvent entropy due to the release of bound water molecules; however, this is opposed by the unfavourable decrease in both the backbone and the side-chain entropy (excluded volume entropy) and the enthalpy of desolvation.10, 11 Therefore, it is postulated that the restoring force experienced by each one of the conformers in the DSE would be proportional to its displacement from the mean SASA of the DSE along the SASA-reaction coordinate (SASA-RC), or $F(x_i) \propto (x_i - \bar{x}_{\mathrm{DSE}})$, where $x_i$ is the SASA of the ith conformer in the DSE, $\bar{x}_{\mathrm{DSE}}$ is the mean SASA of the DSE, and $F(x_i)$ is the restoring force experienced by it. Consequently, the Gibbs energy of the conformer, $G(x_i)$, is proportional to the square of this displacement, or $G(x_i) \propto (x_i - \bar{x}_{\mathrm{DSE}})^2$. If the totality of forces that resist expansion and compaction of the polypeptide chain are assumed to be equal, then to a first approximation the conformers in the DSE may be treated as being confined to a harmonic Gibbs energy-well with a defined force constant (Figure 1A). Once the Gibbs energies of the conformers are known, the probabilities of their occurrence within the ensemble at equilibrium can be readily ascertained using the Boltzmann distribution law (Figure 1B). We will come back to this later.
The native state ensemble (NSE) in solution may be treated in an analogous manner: Although the NSE is far more structurally homogeneous than the DSE, and is sometimes treated as being equivalent to a single state (i.e., the conformational entropy of the NSE is set to zero) for the purpose of estimating the difference in conformational entropy between the DSE and the NSE, the NSE by definition is an ensemble of structures.12-14 In fact, this thermal-noise-induced tendency to oscillate is so strong that native-folded proteins even when constrained by a crystal lattice can perform this motion.15 Thus, at finite temperature the NSE is defined by its mean SASA ($\bar{y}_{\mathrm{NSE}}$) and its ensemble-averaged Gibbs energy. As the native conformer attempts to become increasingly compact, its excluded volume entropy rises tremendously since most of the space in the protein core has already been occupied by the polypeptide backbone and the side-chains of the constituent amino acids.16 In contrast, any attempt by the polypeptide chain to expand and consequently expose more SASA is met with resistance by the multitude of interactions that keep the folded structure intact. Therefore, it is postulated that the restoring force would be proportional to the displacement of the native conformer from $\bar{y}_{\mathrm{NSE}}$ along the SASA-RC, or $F(y_i) \propto (y_i - \bar{y}_{\mathrm{NSE}}) \Rightarrow G(y_i) \propto (y_i - \bar{y}_{\mathrm{NSE}})^2$, where $y_i$ is the SASA of the ith conformer in the NSE, $\bar{y}_{\mathrm{NSE}}$ is the mean SASA of the NSE, $F(y_i)$ is the restoring force experienced by it, and $G(y_i)$ is the Gibbs energy of the native conformer. If the sum total of the forces that resist compaction and expansion of the native conformer, respectively, are assumed to be equal in magnitude, then the conformers in the NSE may be treated as being confined to a harmonic Gibbs energy-well with a defined force constant.
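The harmonic-well picture for either ensemble can be sketched numerically. The snippet below is a minimal illustration and not the procedure used in this work: it assumes a hypothetical force constant `k` and a hypothetical grid of displacements from the ensemble-mean SASA, weights each displacement by its Boltzmann factor, and compares the variance of the resulting distribution with RT/(2k), the value expected if the distribution is Gaussian.

```python
import numpy as np

R = 1.987e-3  # gas constant in kcal mol^-1 K^-1

def boltzmann_probabilities(displacement, k, T):
    """Normalized Boltzmann weights for a harmonic well G = k * displacement**2."""
    G = k * displacement ** 2            # Gibbs energy relative to the well bottom
    weights = np.exp(-G / (R * T))       # unnormalized Boltzmann factors
    return weights / weights.sum()       # normalize over the discrete grid

# Hypothetical inputs chosen only for illustration
T = 298.15                               # absolute temperature, K
k = 1.0                                  # force constant (assumed value)
d = np.linspace(-4.0, 4.0, 8001)         # displacement from the ensemble mean

p = boltzmann_probabilities(d, k, T)
mean = float(np.sum(p * d))
variance = float(np.sum(p * (d - mean) ** 2))

print(f"mean displacement = {mean:.2e}")
print(f"variance = {variance:.4f}, RT/(2k) = {R * T / (2 * k):.4f}")
```

Because the well is symmetric about the mean, the weighted mean displacement stays at zero, and stiffer wells (larger `k`) give narrower distributions.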
On the use of the denaturant mD-N value as a global reaction coordinate
The description of protein folding reactions in terms of reaction coordinates (RCs) and transition states is based on concepts borrowed from the covalent chemistry of small molecules. Because protein folding reactions are profoundly different from reactions in covalent chemistry owing to their non-covalent and multi-dimensional nature, it is often argued that their full complexity cannot be captured in sufficient detail by any single RC.17 Nevertheless, it is not uncommon to analyse the same using one-dimensional RCs, such as the native-likeness in the backbone configuration, the fraction of native pair-wise contacts (Qi) relative to the ground states DSE and NSE, the radius of gyration (Rg), SASA, Pfold etc.18, 19 The use of SASA as a global RC in the proposed hypothesis poses a problem since it is very difficult, if not impossible, to accurately and precisely determine the ensemble-averaged length of the RC (ΔSASAD-N) using structural and/or biophysical methods. Although the mean SASA of the NSE and its fluctuations may be obtained by applying computational methods to the available crystal or solution structures of proteins,20 such approaches are not readily applicable to the DSE.3 Although there has been considerable progress in modelling the SASA of the DSEs using simulations,21, 22 these methods have not been used here for one predominant reason: Unlike the NSE, the residual structure in the DSEs of most proteins can be very sensitive to minor changes in the primary sequence and solvent conditions, which may not be captured effectively by these theoretical methods. Therefore, the experimentally accessible mD-N has been used as a proxy for the true ΔSASAD-N.19
Postulates of the model
The Gibbs energy functions of the DSE and the NSE, denoted by GDSE(r)(T) and GNSE(r)(T) respectively, have a square-law dependence on the RC, r, and are described by parabolas (Figure 2). The curvature of the parabolas is given by their respective force constants, α and ω. As long as the primary sequence is not perturbed (via mutation, chemical or post-translational modification), and pressure and solvent conditions are constant, and the properties of the solvent are temperature-invariant (for example, no change in the pH due to the temperature-dependence of the pKa of the constituent buffer), the force constants α and ω are temperature-invariant (Figure 3), i.e., the conformers in the DSE and the NSE behave like linear-elastic springs. A corollary is that changes to the primary sequence, or a change in solvent conditions (a change in pH, ionic strength, or addition of co-solvents) can bring about a change in either α or ω or both.
The vertices of the DSE and NSE-parabolas, denoted by GD(T) and GN(T), respectively, represent their ensemble-averaged Gibbs energies. Consequently, in a parabolic representation, the difference in Gibbs energy between the DSE and NSE at equilibrium is given by the separation between GD(T) and GN(T) along the ordinate (ΔGD-N(T) = GD(T) – GN(T)). A decrease or an increase in ΔGD-N(T) relative to the standard state/wild type upon perturbation is synonymous with the net movement of the vertices of the parabolas towards each other or away from each other, respectively, along the ordinate (Figure 3). Thus, a decrease in ΔGD-N(T) can be due to a stabilized DSE or a destabilized NSE or both. Conversely, an increase in ΔGD-N(T) can be due to a destabilized DSE or a stabilized NSE or both.
The mean length of the RC is given by the separation between GD(T) and GN(T) along the abscissa, and is identical to the experimentally accessible mD-N (Figure 2C). For the folding reaction D ⇌ N, since the RC increases linearly from 0 → mD-N in the left-to-right direction, the vertex of the DSE-parabola is always at zero along the abscissa while that of the NSE-parabola is always at mD-N. An increase or decrease in ΔSASAD-N, relative to a reference state or the wild type, in accordance with the standard paradigm, will manifest as an increase or a decrease in mD-N, respectively.19 In a parabolic representation, an increase in mD-N is synonymous with the net movement of the vertices of the DSE and NSE-parabolas away from each other along the abscissa. Conversely, a decrease in mD-N is synonymous with the net movement of the parabolas towards each other along the abscissa (Figure 4). As long as the primary sequence is not perturbed, and pressure and solvent conditions are constant, and the properties of the solvent are temperature-invariant, the mean SASAs of the DSE and the NSE are invariant with temperature, leading to ΔSASAD-N being temperature-independent; consequently, the mean length of the RC, mD-N, for a fixed two-state folder is also invariant with temperature. A corollary is that perturbations such as changes to the primary sequence via mutation, chemical or post-translational modification, change in pressure, pH, ionic strength, or addition of co-solvents can bring about a change in the mean SASA of the DSE, or that of the NSE, or both, leading to a change in ΔSASAD-N, and consequently, a change in mD-N. Because by postulate mD-N is invariant with temperature, a logical extension is that for a fixed two-state folder, the ensemble-averaged difference in heat capacity between the DSE and the NSE (ΔCpD-N = CpD(T) − CpN(T)) must also be temperature-invariant since these two parameters are directly proportional to each other (see discussion on the temperature-invariance of ΔSASAD-N, mD-N and ΔCpD-N).23, 24
The mean position of the transition state ensemble (TSE) along the RC and the ensemble-averaged Gibbs energy of the TSE (GTS(T)) are determined by the intersection of the GDSE(r)(T) and GNSE(r)(T) functions. In a parabolic representation, the difference in SASA between the DSE and the TSE is given by the separation between GD(T) and the curve-crossing along the abscissa and is identical to mTS-D(T). Thus, mTS-D(T) is a true proxy for the difference between the mean SASA of the DSE and the mean SASA of the TSE, ΔSASAD-TS(T), and is always greater than zero no matter what the temperature. Similarly, the difference in SASA between the TSE and the NSE is given by the separation between GN(T) and the curve-crossing along the abscissa and is identical to mTS-N(T), i.e., mTS-N(T) is a true proxy for the difference between the mean SASA of the TSE and the mean SASA of the NSE, ΔSASATS-N(T). However, unlike mTS-D(T) which is always greater than zero, mTS-N(T) can approach zero (when the mean SASA of the TSE equals that of the NSE) and even become negative (when the mean SASA of the TSE is less than that of the NSE) at very low and high temperatures for certain proteins. The ensemble-averaged Gibbs activation energy for folding is given by the separation between GD(T) and the curve-crossing along the ordinate (ΔGTS-D(T) = GTS(T) − GD(T)), and the ensemble-averaged Gibbs activation energy for unfolding is given by the separation between GN(T) and the curve-crossing along the ordinate (ΔGTS-N(T) = GTS(T) − GN(T)). The position of the curve-crossing along the abscissa and ordinate relative to the ground states is purely a function of the primary sequence when temperature, pressure and solvent conditions are defined. A corollary of this is that for any two-state folder, any perturbation that brings about a change in the curvature of the parabolas or the mean length of the RC can lead to a change in mTS-D(T). Because mD-N = mTS-D(T) + mTS-N(T) for a two-state system, any perturbation that causes an increase in mTS-D(T) without a change in mD-N will concomitantly lead to a decrease in mTS-N(T), and vice versa.
Consequently, the normalized solvent RCs βT(fold)(T) = mTS-D(T)/mD-N and βT(unfold)(T) =mTS-N(T)/mD-N will also vary with the said perturbation.25
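Thus, from the postulates of the parabolic hypothesis we have three fundamentally important equations for fixed two-state protein folders:
\[ \Delta G_{\text{TS-D}}(T) = \alpha \left( m_{\text{TS-D}}(T) \right)^2 \tag{1} \]
\[ \Delta G_{\text{TS-N}}(T) = \omega \left( m_{\text{TS-N}}(T) \right)^2 \tag{2} \]
\[ \Delta G_{\text{D-N}}(T) = \Delta G_{\text{TS-N}}(T) - \Delta G_{\text{TS-D}}(T) = \omega \left( m_{\text{TS-N}}(T) \right)^2 - \alpha \left( m_{\text{TS-D}}(T) \right)^2 \tag{3} \]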
Consequently, for two-state proteins under folding conditions, as long as ΔGTS-N(T) > ΔGTS-D(T) (i.e., ΔGD-N(T) > 0 or ΔGN-D(T) < 0) and mTS-D(T) > mTS-N(T) we have the logical condition ω > α (Figure 2C).
Expression for the mean position of the TSE
Consider the conventional barrier-limited interconversion of the conformers in the DSE and NSE of a two-state folder at any given temperature, pressure and solvent conditions (Figure 2C). Because by postulate the Gibbs energy functions GDSE(r)(T) and GNSE(r)(T) have a square-law dependence on the RC, r, whose ensemble-averaged length is given by mD-N, and since the RC increases linearly from 0 → mD-N in the left-to-right direction, we can write
\[ G_{\text{DSE}(r)}(T) = \alpha r^2 + G_{\text{D}}(T) \]
\[ G_{\text{NSE}(r)}(T) = \omega \left( m_{\text{D-N}} - r \right)^2 + G_{\text{N}}(T) \]
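If the units of the ordinate are in kcal.mol-1 and the RC in kcal.mol-1.M-1, then by definition the force constants α and ω have the units M2.mol.kcal-1. The mean position of the TSE along the abscissa is determined by the intersection of GDSE(r)(T) and GNSE(r)(T). Therefore, at the curve-crossing, where $r = r_{\ddagger}(T)$, we have
\[ \alpha \, r_{\ddagger}(T)^2 + G_{\text{D}}(T) = \omega \left( m_{\text{D-N}} - r_{\ddagger}(T) \right)^2 + G_{\text{N}}(T) \]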
Solving for $r_{\ddagger}(T)$, the value of r at the curve-crossing, gives (see Appendix)
\[ r_{\ddagger}(T) = m_{\text{TS-D}}(T) = \frac{\omega \, m_{\text{D-N}} - \sqrt{\varphi}}{\omega - \alpha} \tag{9} \]
where the discriminant φ = λω + ΔGD-N(T)(ω – α), and the parameter λ = α(mD-N)2 is analogous to the “Marcus reorganization energy,” and by definition is the Gibbs energy required to compress the denatured polypeptide under folding conditions to a state whose SASA is identical to that of the native folded protein but without the stabilizing native interactions (Figure 5). Since α and mD-N are by postulate temperature-invariant, λ is temperature-invariant by extension and depends purely on the primary sequence for a given pressure and solvent conditions. Since mD-N = mTS-D(T) + mTS-N(T) for a two-state system, we have
\[ m_{\text{TS-N}}(T) = \frac{\sqrt{\varphi} - \alpha \, m_{\text{D-N}}}{\omega - \alpha} \tag{10} \]
If the values of the force constants α and ω, mD-N and ΔGD-N(T) of a two-state system at any given temperature, pressure and solvent conditions are known, we can readily calculate the absolute Gibbs activation energies for the folding and unfolding (Eqs. (1) and (2)).
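As a worked illustration of the curve-crossing calculation, the sketch below evaluates the discriminant φ and the parameter λ defined above and then locates the intersection of the two parabolas. The root used, (ω·mD-N − √φ)/(ω − α), is obtained here by solving α·r² + GD = ω·(mD-N − r)² + GN directly and should be read as a reconstruction under the stated postulates. The function name and all numerical inputs are hypothetical and are not fitted values for any protein.

```python
import math

def curve_crossing(alpha, omega, m_dn, dG_dn):
    """Locate the intersection of the DSE and NSE parabolas.

    DSE parabola: alpha * r**2 + G_D, vertex at r = 0.
    NSE parabola: omega * (m_dn - r)**2 + G_N, vertex at r = m_dn.
    dG_dn = G_D - G_N. Requires omega != alpha.
    """
    lam = alpha * m_dn ** 2                        # lambda = alpha * (m_D-N)^2
    phi = lam * omega + dG_dn * (omega - alpha)    # discriminant
    m_ts_d = (omega * m_dn - math.sqrt(phi)) / (omega - alpha)
    m_ts_n = m_dn - m_ts_d                         # two-state: m_D-N = m_TS-D + m_TS-N
    return {
        "m_ts_d": m_ts_d,
        "m_ts_n": m_ts_n,
        "dG_ts_d": alpha * m_ts_d ** 2,            # Gibbs activation energy, folding
        "dG_ts_n": omega * m_ts_n ** 2,            # Gibbs activation energy, unfolding
        "beta_fold": m_ts_d / m_dn,                # normalized position of the TSE
    }

# Hypothetical parameters (alpha, omega in M^2 mol kcal^-1; m_dn in kcal mol^-1 M^-1;
# dG_dn in kcal mol^-1), for illustration only
result = curve_crossing(alpha=1.0, omega=10.0, m_dn=2.0, dG_dn=6.0)
for name, value in result.items():
    print(f"{name} = {value:.3f}")
```

A quick consistency check is that the two barrier heights differ by exactly the stability supplied as input, as required for a two-state system.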
Equations for the folding and the unfolding rate constants
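The two theories that feature prominently in the analyses of protein folding kinetics are the transition state theory (TST) and Kramers’ theory in the high-friction limit.26-28 Despite their profound differences, what is common to both is the exponential term or the Boltzmann factor. Therefore, we will start with the conventional Arrhenius expression for the rate constants (the complexity of the prefactor, which here is assumed to be temperature-invariant, is addressed elsewhere). Substituting Eqs. (1) and (9), and (2) and (10) in the expressions for the rate constants for folding (kf(T)) and unfolding (ku(T)), respectively, gives
\[ k_{f}(T) = k^{0} \exp\left( -\frac{\Delta G_{\text{TS-D}}(T)}{RT} \right) = k^{0} \exp\left( -\frac{\alpha \left( \omega \, m_{\text{D-N}} - \sqrt{\varphi} \right)^2}{RT \left( \omega - \alpha \right)^2} \right) \tag{11} \]
\[ k_{u}(T) = k^{0} \exp\left( -\frac{\Delta G_{\text{TS-N}}(T)}{RT} \right) = k^{0} \exp\left( -\frac{\omega \left( \sqrt{\varphi} - \alpha \, m_{\text{D-N}} \right)^2}{RT \left( \omega - \alpha \right)^2} \right) \tag{12} \]
where k0 is the pre-exponential factor with units identical to those of the rate constants (s-1), R is the gas constant, and T is the absolute temperature. Because the principle of microscopic reversibility stipulates that for a two-state system the ratio of the folding and unfolding rate constants must be identical to the independently measured equilibrium constant, the prefactors in Eqs. (11) and (12) must be identical.29 Eqs. (11) and (12) may further be recast in terms of βT(fold)(T) and βT(unfold)(T) to give
\[ k_{f}(T) = k^{0} \exp\left( -\frac{\alpha \, \beta_{\text{T(fold)}}(T)^2 \, m_{\text{D-N}}^2}{RT} \right) \tag{13} \]
\[ k_{u}(T) = k^{0} \exp\left( -\frac{\omega \, \beta_{\text{T(unfold)}}(T)^2 \, m_{\text{D-N}}^2}{RT} \right) \tag{14} \]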
Eqs. (11) – (14) demonstrate that the relationship between the rate constants, the equilibrium stability, and the denaturant m value is complex, since the parameters in these equations can all change depending on the nature of the perturbation. This relationship will be explored in detail elsewhere.
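The link between the parabola parameters and the rate constants can be illustrated with a short calculation. The sketch below assumes Arrhenius-type expressions with a common, temperature-invariant prefactor `k0`, with barrier heights α·(mTS-D)² for folding and ω·(mTS-N)² for unfolding. The prefactor and all other inputs are hypothetical values chosen for illustration, and the function name is not from this work.

```python
import math

R = 1.987e-3  # gas constant in kcal mol^-1 K^-1

def rate_constants(alpha, omega, m_dn, dG_dn, T, k0):
    """Folding and unfolding rate constants from the parabola parameters."""
    phi = alpha * omega * m_dn ** 2 + dG_dn * (omega - alpha)   # discriminant
    m_ts_d = (omega * m_dn - math.sqrt(phi)) / (omega - alpha)  # DSE-to-TSE distance
    m_ts_n = m_dn - m_ts_d                                      # TSE-to-NSE distance
    kf = k0 * math.exp(-alpha * m_ts_d ** 2 / (R * T))          # folding
    ku = k0 * math.exp(-omega * m_ts_n ** 2 / (R * T))          # unfolding
    return kf, ku

# Hypothetical inputs, for illustration only
T = 298.15        # K
dG_dn = 6.0       # kcal mol^-1
kf, ku = rate_constants(alpha=1.0, omega=10.0, m_dn=2.0, dG_dn=dG_dn, T=T, k0=1.0e6)

print(f"kf = {kf:.3e} s^-1, ku = {ku:.3e} s^-1")
print(f"RT ln(kf/ku) = {R * T * math.log(kf / ku):.3f} kcal/mol")
```

With identical prefactors, RT·ln(kf/ku) returns the stability supplied as input, which is the microscopic-reversibility condition discussed above.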
The force constants are inversely proportional to the variances of the Gaussian distribution of the conformers
If GDi(T) and GNj(T) (i, j = 1, …, n) denote the Gibbs energies of the conformers in the DSE and the NSE, respectively, then the probability distribution of their conformers along the RC at equilibrium is given by the Boltzmann law. Because by postulate the Gibbs energies of the conformers in the DSE and the NSE have a square-law dependence on the RC, r, whose ensemble-averaged length is given by mD-N, and because the RC increases linearly from 0 → mD-N in the left-to-right direction, we can write where pDi(T) and pNj(T) denote the Boltzmann probabilities of the conformers in the DSE and the NSE, respectively, with their corresponding partition functions QDSE(T) and QNSE(T) being given by
Because the equilibrium is dynamic, there is always a constant thermal noise-driven flux of the conformers from the DSE to the NSE, and from the NSE to the DSE, via the TSE. Consequently, there is always a constant albeit incredibly small population of conformers in the TSE at equilibrium. Now consider the first-half of a protein folding reaction as shown in Scheme 1, where [D], [TS] and [N] denote the equilibrium concentrations of the DSE, the TSE, and the NSE, respectively, in molar.
From the perspective of a folding reaction, the conformers in the activated state or the TSE may be thought of as a subset of denatured conformers with very high Gibbs energies. Therefore, we may assume that the conformers in the TSE are in equilibrium with those conformers that are at the bottom of the denatured Gibbs basin. If GD(T), GN(T) and GTS(T) denote the mean Gibbs energies of the DSE, NSE, and the TSE, respectively, then the ratio of the molar concentration of the conformers at the bottom of the denatured Gibbs basin and those in the TSE is given by
\[ \frac{[D]}{[TS]} = \frac{\exp\left( -G_{\text{D}}(T)/RT \right)}{\exp\left( -G_{\text{TS}}(T)/RT \right)} = \exp\left( \frac{\Delta G_{\text{TS-D}}(T)}{RT} \right) = \exp\left( \frac{\alpha \left( m_{\text{TS-D}}(T) \right)^2}{RT} \right) \tag{19} \]
Similarly for the partial unfolding reaction (Reaction Scheme 2), the conformers in the TSE may be thought of as a subset of native conformers with very high Gibbs energies. Therefore, we may write
\[ \frac{[TS]}{[N]} = \exp\left( -\frac{\Delta G_{\text{TS-N}}(T)}{RT} \right) = \exp\left( -\frac{\omega \left( m_{\text{TS-N}}(T) \right)^2}{RT} \right) \tag{20} \]
Because the SASA of the conformers in the DSE or the NSE is determined by a multitude of intra-protein and protein-solvent interactions, we may invoke the central limit theorem and assume that the distribution of the SASA of the conformers is a Gaussian. If σ2DSE(T) and σ2NSE(T) denote the variances of the DSE and the NSE-Gaussian probability density functions (Gaussian-PDFs), respectively, along the SASA-RC which in our case is its proxy, the experimentally measurable and temperature-invariant mD-N, then the ratio of the molar concentration of the conformers whose SASA is identical to the mean SASA of the DSE to those whose SASA is identical to the mean SASA of the TSE is given by
\[ \frac{[D]}{[TS]} = \exp\left( \frac{\left( m_{\text{TS-D}}(T) \right)^2}{2 \sigma^2_{\text{DSE}}(T)} \right) \tag{21} \]
Similarly, the ratio of the molar concentration of the conformers whose SASA is identical to the mean SASA of the TSE to those whose SASA is identical to the mean SASA of the NSE is given by
\[ \frac{[TS]}{[N]} = \exp\left( -\frac{\left( m_{\text{TS-N}}(T) \right)^2}{2 \sigma^2_{\text{NSE}}(T)} \right) \tag{22} \]
Because the ratio of the conformers in the TSE to those in the ground states must be the same whether we use a Gaussian approximation or the Boltzmann distribution (compare Eqs. (19) and (21), and Eqs. (20) and (22)), we can write
\[ \sigma^2_{\text{DSE}}(T) = \frac{RT}{2\alpha} \tag{23} \]
\[ \sigma^2_{\text{NSE}}(T) = \frac{RT}{2\omega} \tag{24} \]
Thus, for any two-state folder at constant temperature, pressure, and solvent conditions, the variance of the Gaussian distribution of the conformers in the DSE or the NSE along the mD-N RC is inversely proportional to their respective force constants; and for a two-state system with given force constants, the variance is directly proportional to the absolute temperature. Naturally, in the absence of thermal energy (T = 0 K), all classical motion will cease and σ2DSE(T) = σ2NSE(T) = 0. The relationship between protein motion and function will be explored elsewhere. The area enclosed by the DSE and the NSE-Gaussians is given by
\[ I_{\text{DSE}}(T) = \int_{-\infty}^{+\infty} e^{-a x^2} \, dx = \sqrt{\frac{\pi}{a}} = \sqrt{2 \pi \sigma^2_{\text{DSE}}(T)} \tag{25} \]
\[ I_{\text{NSE}}(T) = \int_{-\infty}^{+\infty} e^{-b y^2} \, dy = \sqrt{\frac{\pi}{b}} = \sqrt{2 \pi \sigma^2_{\text{NSE}}(T)} \tag{26} \]
where $x = (x_i - \bar{x}_{\mathrm{DSE}})$, $y = (y_i - \bar{y}_{\mathrm{NSE}})$, $x_i$ and $y_i$ denote the SASAs of the ith conformers in the DSE and the NSE, respectively, $\bar{x}_{\mathrm{DSE}}$ and $\bar{y}_{\mathrm{NSE}}$ denote the mean SASAs of the DSE and the NSE, a = 1/2σ2DSE(T), b = 1/2σ2NSE(T), and IDSE(T) and INSE(T) denote the areas enclosed by the DSE and NSE-Gaussians, respectively, along the mD-N RC. The reader will note that for a polypeptide of finite length, the maximum permissible SASA is determined by the fully extended chain and the minimum by the excluded volume entropy. Thus, the use of the limits -∞ to +∞ in Eqs. (17), (18), (25) and (26) is not physically justified. However, because the populations decrease exponentially as the conformers in both the DSE and the NSE are displaced from their mean SASA, the difference in the magnitude of the partition functions calculated using actual limits versus –∞ and +∞ will be insignificant. Eqs. (23) and (24) allow kf(T) and ku(T) to be recast in terms of the variances of the DSE and the NSE-Gaussians
\[ k_{f}(T) = k^{0} \exp\left( -\frac{\left( m_{\text{TS-D}}(T) \right)^2}{2 \sigma^2_{\text{DSE}}(T)} \right) \]
\[ k_{u}(T) = k^{0} \exp\left( -\frac{\left( m_{\text{TS-N}}(T) \right)^2}{2 \sigma^2_{\text{NSE}}(T)} \right) \]
We will show elsewhere when we deal with non-Arrhenius kinetics in protein folding in detail that although the variance of the DSE and the NSE-Gaussians increases linearly with absolute temperature, the curve-crossing and the Gibbs barrier heights for folding and unfolding are non-linear functions of their respective variances.
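The remark that finite integration limits change the Gaussian areas only negligibly can be checked with a few lines of arithmetic. The sketch below assumes that the variance is related to the force constant by σ² = RT/(2 × force constant), and it uses a symmetric cut-off around the ensemble mean as a stand-in for the physical bounds on SASA. The cut-off, the force constant, and the function name are hypothetical choices for illustration.

```python
import math

R = 1.987e-3  # gas constant in kcal mol^-1 K^-1

def gaussian_area(force_constant, T, half_width=None):
    """Area under exp(-x**2 / (2*sigma**2)) with sigma**2 = RT / (2*force_constant).

    half_width=None integrates over (-inf, +inf); otherwise the integral runs
    over (-half_width, +half_width) and is evaluated with the error function.
    """
    variance = R * T / (2.0 * force_constant)
    a = 1.0 / (2.0 * variance)
    full = math.sqrt(math.pi / a)                 # standard Gaussian integral
    if half_width is None:
        return full
    return full * math.erf(half_width * math.sqrt(a))

# Hypothetical inputs, for illustration only
T = 298.15
alpha = 1.0
infinite = gaussian_area(alpha, T)
finite = gaussian_area(alpha, T, half_width=3.0)

print(f"area with infinite limits = {infinite:.6f}")
print(f"relative difference       = {(infinite - finite) / infinite:.2e}")
```

The relative difference shrinks rapidly as the cut-off moves further from the mean in units of the standard deviation, which is the sense in which the infinite limits are a harmless approximation.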
Equations for equilibrium stability
The relationship between the partition functions, the area enclosed by DSE and the NSE Gaussians, and the Gibbs energy of unfolding may be readily obtained by dividing Eq. (18) by (17) where σDSE(T) and σNSE(T) denote the standard deviations of the DSE and NSE-Gaussians, respectively, along the mD-N RC. There are many other ways of recasting the equation for equilibrium stability (not shown), but the simplest and perhaps the most useful form is
Eq. (31) demonstrates that when pressure, temperature, and solvent conditions are constant, the equilibrium and kinetic behaviour of those proteins that fold spontaneously without the need for any accessory factors is determined purely by three primary-sequence-dependent variables which are: (i) the ensemble-averaged mean and variance of the Gaussian distribution of the conformers in the DSE along the SASA-reaction-coordinate; (ii) the ensemble-averaged mean and variance of the Gaussian distribution of the conformers in the NSE along the SASA-reaction-coordinate; and (iii) the position of the curve-crossing along the abscissa. A necessary consequence of Eq. (31) is that: (i) if for spontaneously-folding fixed two-state systems at constant pressure and solvent conditions ΔSASAD-N is positive and temperature-invariant (i.e., mD-N and ΔCpD-N are temperature-invariant), and βT(fold)(T) ≥ 0.5 when T = TS (the temperature at which stability is a maximum),30 then it is impossible for such systems to be stable at equilibrium (ΔGD-N(T) > 0) unless σ2DSE(T) > σ2NSE(T) no matter what the temperature; (ii) if two related or unrelated two-state systems have identical pairs of force constants, and if their ΔSASAD-N as well as the absolute position of the DSE and the NSE along the SASA-RC are also identical, then the protein which folds through a more solvent-exposed TSE will be more stable at equilibrium; and (iii) if mD-N and ΔCpD-N are temperature-invariant, a spontaneously-folding two-state system at constant pressure and solvent conditions, irrespective of its primary sequence or 3-dimensional structure, will be maximally stable at equilibrium when its denatured conformers are displaced the least from the mean of their ensemble to reach the TSE along the SASA-RC (the principle of least displacement). Because equilibrium stability is the greatest at TS, a logical extension is that mTS-D(T) or βT(fold)(T) must be a minimum, and mTS-N(T) or βT(unfold)(T) a maximum at TS (Figure 3).
A corollary is that the Gibbs activation barriers for folding and unfolding are a minimum and a maximum, respectively, when the difference in SASA between the DSE and the TSE is the least. Mathematical formalism for why the activation entropies for folding and unfolding must both be zero at TS will be shown in the subsequent publication.
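The dependence of stability on the two variances can be made concrete with a small calculation. The sketch below is a reconstruction under stated assumptions rather than a quotation of any numbered equation: it assumes barrier heights α·(mTS-D)² and ω·(mTS-N)² and variances equal to RT/(2 × force constant), which together give ΔGD-N = (RT/2)·[(mTS-N)²/σ²NSE − (mTS-D)²/σ²DSE]. The function name and all inputs are hypothetical.

```python
R = 1.987e-3  # gas constant in kcal mol^-1 K^-1

def stability_from_variances(m_ts_d, m_ts_n, var_dse, var_nse, T):
    """dG_D-N = (RT/2) * (m_ts_n**2 / var_nse - m_ts_d**2 / var_dse)."""
    return 0.5 * R * T * (m_ts_n ** 2 / var_nse - m_ts_d ** 2 / var_dse)

# Hypothetical inputs, for illustration only
T = 298.15
alpha, omega = 1.0, 10.0                 # assumed force constants (omega > alpha)
var_dse = R * T / (2.0 * alpha)          # broad DSE Gaussian
var_nse = R * T / (2.0 * omega)          # narrow NSE Gaussian
m_ts_d, m_ts_n = 1.2, 0.8                # TSE closer to the NSE (beta_fold = 0.6)

dG = stability_from_variances(m_ts_d, m_ts_n, var_dse, var_nse, T)
# Same TSE position, but with the variances swapped (DSE narrower than NSE)
dG_swapped = stability_from_variances(m_ts_d, m_ts_n, var_nse, var_dse, T)

print(f"dG_D-N, DSE broader than NSE  = {dG:.2f} kcal/mol")
print(f"dG_D-N, DSE narrower than NSE = {dG_swapped:.2f} kcal/mol")
```

With the TSE at or beyond the midpoint of the RC, a positive stability is obtained only when the DSE distribution is the broader of the two, in line with consequence (i) above.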
The correspondence between Gibbs parabolas and Gaussian-PDFs for two well-studied two-state proteins, (i) chymotrypsin inhibitor 2 (CI2) and (ii) the B domain of staphylococcal protein A (BdpA Y15W), is shown in Figures 6 and 7, respectively. The parameters required to generate these figures are given in their legends. As mentioned earlier, the logical condition that as long as ΔGD-N(T) > 0 and mTS-D(T) > mTS-N(T) then ω > α is readily apparent from Figures 6A and 7A. Because the Gaussian variances of the DSE and the NSE are inversely proportional to the force constants, ω > α implies σ2NSE(T) < σ2DSE(T) (Figures 6B and 7B). A detailed discussion of the theory underlying the procedure required to extract the values of the force constants from the chevrons and its inherent limitations is beyond the scope of this article since it involves a radical reinterpretation of the chevron. Nevertheless, a brief description is given in Methods.
On the temperature-invariance of ΔSASAD-N, mD-N and ΔCpD-N
One of the defining postulates of the parabolic hypothesis is that for a spontaneously-folding fixed two-state folder, as long as the primary sequence is not perturbed via mutation, chemical or post-translational modification, and pressure and solvent conditions are constant, and the properties of the solvent are invariant with temperature, the ensemble-averaged SASAs of the DSE and NSE, to a first approximation, are temperature-invariant; consequently, the dependent variables mD-N and ΔCpD-N will also be temperature-invariant. Consider the DSE of a two-state folder at equilibrium under folding conditions: Within the steric and energetic constraints imposed by intra-chain and chain-solvent interactions, the SASA of the denatured conformers will be normally distributed with a defined mean and variance (σ2DSE(T)). Now, if we raise the temperature of the system by a tiny amount δT such that the new temperature is T+δT, a tiny fraction of the conformers will be displaced from the mean of the ensemble, some with SASA that is greater than the mean, and some with SASA that is less than the mean; and the magnitude of this displacement from the ensemble-mean will be determined by the force constant. Consequently, there will be a tiny increase in the variance of the Gaussian distribution, and a new equilibrium will be established. Thus, as long as the integrity of the spring (i.e., the primary sequence) is not compromised, and pressure and solvent conditions are constant, the distribution itself will not be biased in any one particular way or another, i.e., the number of conformers that have become more expanded than the mean of the ensemble, on average, will be identical to the number of conformers that have become more compact than the ensemble-mean, leading to the mean SASA of the DSE being invariant with temperature.
A similar argument may be applied to the NSE leading to the conclusion that although its variance increases linearly with temperature, its mean SASA will be temperature-invariant. However, if the molecular forces that resist expansion and compaction of the conformers in the DSE are not equal or approximately equal and change with temperature, then the assumption that the conformers in the DSE are confined to a harmonic Gibbs energy-well would be flawed. What is implied by this is that the distribution of the conformers in the DSE along the SASA-RC is no longer a Gaussian, but instead a skewed Gaussian. For example, if the change in temperature causes a shift in the balance of molecular forces such that it is relatively easier for the denatured conformer to expand rather than become compact, in a parabolic representation, the left arm of the DSE-parabola will be shallow as compared to the right arm, and the Gaussian distribution will be negatively skewed, leading to a shift in the mean of the distribution to the left. In other words, the mean SASA of the DSE will increase and, assuming that the mean SASA of the NSE is temperature-invariant, this will lead to an increase in ΔSASAD-N, and by extension, an increase in mD-N. In contrast, if the change in temperature makes it easier for the denatured conformer to become compact rather than expand, then the right arm of the DSE-parabola will become shallow as compared to the left arm, and the Gaussian distribution will become positively skewed; consequently, the mean SASA of the DSE will decrease, leading to a decrease in ΔSASAD-N and mD-N. Similar arguments apply to the NSE. Thus, as long as the Gibbs energy-wells are harmonic and their force constants are temperature-invariant, ΔSASAD-N, mD-N, and ΔCpD-N will be temperature-invariant.
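The effect of unequal restoring forces on the ensemble mean can be illustrated numerically. The sketch below is a toy model under stated assumptions and is not part of the hypothesis itself: it uses a well whose left and right arms have different, hypothetical force constants (`k_left`, `k_right`), with the left direction taken as expansion, and computes the Boltzmann-weighted mean position relative to the bottom of the well.

```python
import numpy as np

R = 1.987e-3  # gas constant in kcal mol^-1 K^-1

def mean_position(k_left, k_right, T, r):
    """Boltzmann-weighted mean of r in a well with a different curvature on each arm."""
    G = np.where(r < 0.0, k_left * r ** 2, k_right * r ** 2)  # piecewise-harmonic well
    w = np.exp(-G / (R * T))                                  # Boltzmann factors
    return float(np.sum(w * r) / np.sum(w))

# Hypothetical inputs, for illustration only
T = 298.15
r = np.linspace(-8.0, 8.0, 16001)        # displacement from the bottom of the well

symmetric = mean_position(1.0, 1.0, T, r)
easier_to_expand = mean_position(0.5, 1.0, T, r)    # shallow left arm
easier_to_compact = mean_position(1.0, 0.5, T, r)   # shallow right arm

print(f"symmetric well:    mean = {symmetric:+.3f}")
print(f"shallow left arm:  mean = {easier_to_expand:+.3f}")
print(f"shallow right arm: mean = {easier_to_compact:+.3f}")
```

Only the symmetric well keeps its mean at the vertex; a shallow left arm moves the mean to the left and a shallow right arm moves it to the right, which is the skew described above.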
The approximation that the mean length of the RC is invariant with temperature is supported by both theory and experiment: (i) The Rg of the DSE and the NSE (after 200 picoseconds of simulation) of the truncated CI2 generated from all-atom molecular dynamics (MD) simulations varies little between 300 – 350 K (see Table 1 and the explanation on page 214 in Lazaridis and Karplus, 1999);31 (ii) Studies on the thermal expansion of native metmyoglobin by Petsko and colleagues demonstrate that the increase in the SASA and the volume of the folded protein on heating from 80 – 300 K is not more than 2 – 3%;32 (iii) In chemical denaturation experiments as a function of temperature, mD-N is, in general, temperature-invariant within experimental error.33-36 In addition, it is logically inconsistent to argue about possible temperature-induced changes in mD-N when its counterpart, ΔCpD-N, is assumed to be temperature-invariant in the analyses of thermal denaturation data.30
The widely accepted explanation for the large and positive ΔCpD-N of proteins is based on Kauzmann’s “liquid-liquid transfer” model (LLTM) which likens the hydrophobic core of the native folded protein to a liquid alkane, and the greater heat capacity of the DSE as compared to the NSE is attributed primarily to the anomalously high heat capacity and low entropy of the “clathrates” or “microscopic icebergs” of water that form around the exposed non-polar residues in the DSE (see Baldwin, 2014, and references therein).37, 38 Because the size of the solvation shell depends on the SASA of the non-polar solute, it naturally follows that the change in the heat capacity must be proportional to the change in the non-polar SASA that accompanies a reaction. Consequently, protein unfolding reactions which are accompanied by large changes in non-polar SASA also lead to large and positive changes in the heat capacity.39, 40 Because the denaturant m values are also directly proportional to the change in SASA that accompanies protein (un)folding reactions, the expectation is that mD-N and ΔCpD-N values must also be proportional to each other: The greater the mD-N value, the greater is the ΔCpD-N value and vice versa.23, 24 However, since the residual structure in the DSEs of proteins under folding conditions is both sequence and solvent-dependent (i.e., the SASAs of the DSEs of two proteins of identical chain lengths but dissimilar primary sequences need not necessarily be the same even under identical solvent conditions),3, 4 and because we do not yet have reliable theoretical or experimental methods to accurately and precisely quantify the SASA of the DSEs of proteins under folding conditions (the values are model-dependent),21, 22 the data scatter in plots that show correlation between the experimentally determined mD-N or ΔCpD-N values (which reflect the true ΔSASAD-N) and the theoretical model-dependent values of ΔSASAD-N can be significant (see Fig. 2 in Myers et al., 1995, and Fig. 3 in Robertson and Murphy, 1997). Now, since the solvation shell around the DSEs of large proteins is relatively greater than that of small proteins even when the residual structure in the DSEs under folding conditions is taken into consideration, large proteins on average expose a relatively greater amount of non-polar SASA upon unfolding than do small proteins; consequently, both mD-N and ΔCpD-N values also correlate linearly with chain-length, albeit with considerable scatter since chain length, owing to the residual structure in the DSEs, is unlikely to be a true descriptor of the SASA of the DSEs of proteins under folding conditions (note that the scatter can also be due to certain proteins having an anomalously high or low number of non-polar residues). The point we are trying to make is the following: Because the native structures of proteins are relatively insensitive to small variations in pH and co-solvents,41-43 and since the number of ways in which foldable polypeptides can be packed into their native structures is relatively limited (as inferred from the limited number of protein folds, see SCOP: www.mrc-lmb.cam.ac.uk and CATH: www.cathdb.info databases), one might find a reasonably good correlation between chain lengths and the SASAs of the NSEs for a large dataset of proteins of differing primary sequences under varying solvents (see Fig. 1 in Miller et al., 1987).16, 44 However, since the SASAs of the DSEs under folding conditions, owing to residual structure, are variable, until and unless we find a way to accurately simulate the DSEs of proteins, and if and only if these theoretical methods are sensitive to point mutations, changes in pH, co-solvents, neutral crowding agents, temperature and pressure, it is almost impossible to arrive at a universal equation that will describe how the ΔSASAD-N under folding conditions will vary with chain length, and by logical extension, how mD-N and ΔCpD-N will vary with SASA or chain length. Analyses of ΔCpD-N values for a large dataset of proteins show that they generally vary between 10-20 cal.mol-1.K-1.residue-1.23, 24
Now that we have summarised the inter-relationships between ΔSASAD-N, mD-N, and ΔCpD-N, it is easy to see that when ΔSASAD-N is temperature-invariant, so too must ΔCpD-N, i.e., the absolute heat capacities of the DSE and the NSE may vary with temperature, but their difference, to a first approximation, can be assumed to be temperature-invariant. The reasons for this approximation are as follows: (i) the variation in ΔCpD-N(T) over a substantial temperature range is comparable to experimental noise;39 and (ii) the variation in equilibrium stability that stems from small variation in ΔCpD-N(T) is once again comparable to experimental noise.30 Consequently, the use of modified Gibbs-Helmholtz relationships with a temperature-invariant ΔCpD-N term is a common practice in the field of protein folding, and is used to ascertain the temperature-dependence of the enthalpies, the entropies, and the Gibbs energies of unfolding/folding at equilibrium (Eqs. (32) – (34)). However, what is not justified is the use of the experimentally determined ΔCpD-N of the “wild type/reference protein” for all its mutants for the purpose of calculating the change in enthalpies, entropies and the Gibbs energies of unfolding upon mutation (i.e., ΔΔHD-N(wt-mut)(T), ΔΔSD-N(wt-mut)(T), and ΔΔGD-N(wt-mut)(T); the subscripts ‘wt’ and ‘mut’ denote the wild type and the mutant protein, respectively). This is especially true if the mD-N values of the mutants are significantly different from that of the wild type, since those mutants with increased mD-N values will be expected to have increased ΔCpD-N values, and vice versa, for identical solvent conditions and pressure, as compared to the wild type or the reference protein. These considerations are implicit in the Schellman approximation: ΔΔGD-N(wt-mut)(Tm) ≈ ΔHD-N(wt)(Tm) ΔTm(wt-mut)/Tm(wt-mut) (see Fig. 8 in Becktel and Schellman, 1987, and discussion therein).
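For reference, the modified Gibbs-Helmholtz relationships with a temperature-invariant ΔCpD-N are commonly written in the form sketched below. This is the standard textbook form expressed in the notation of this article; its term-by-term correspondence to Eqs. (32) – (34) is an assumption, and the identity ΔSD-N(Tm) = ΔHD-N(Tm)/Tm is used because ΔGD-N(Tm) = 0.

```latex
\begin{align}
\Delta H_{D\text{-}N}(T) &= \Delta H_{D\text{-}N}(T_m) + \int_{T_m}^{T} \Delta C_{pD\text{-}N}(T)\,dT
  \;\cong\; \Delta H_{D\text{-}N}(T_m) + \Delta C_{pD\text{-}N}\,(T - T_m) \\
\Delta S_{D\text{-}N}(T) &= \Delta S_{D\text{-}N}(T_m) + \int_{T_m}^{T} \frac{\Delta C_{pD\text{-}N}(T)}{T}\,dT
  \;\cong\; \frac{\Delta H_{D\text{-}N}(T_m)}{T_m} + \Delta C_{pD\text{-}N}\,\ln\!\left(\frac{T}{T_m}\right) \\
\Delta G_{D\text{-}N}(T) &= \Delta H_{D\text{-}N}(T) - T\,\Delta S_{D\text{-}N}(T)
  \;\cong\; \Delta H_{D\text{-}N}(T_m)\left(1 - \frac{T}{T_m}\right) + \Delta C_{pD\text{-}N}\,(T - T_m) + T\,\Delta C_{pD\text{-}N}\,\ln\!\left(\frac{T_m}{T}\right)
\end{align}
```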
where ΔHD-N(T), ΔHD-N(Tm) and ΔSD-N(T), ΔSD-N(Tm) denote the equilibrium enthalpies and the entropies of unfolding, respectively, at any given temperature, and at the midpoint of thermal denaturation (Tm), respectively, for a given two-state folder under defined solvent conditions. The temperature-invariant and the temperature-dependent differences in heat capacity between the DSE and the NSE are denoted by ΔCpD-N and ΔCpD-N(T), respectively.
TESTS OF HYPOTHESIS
A logical way of testing hypotheses in empirical sciences is to make quantitative predictions and verify them via experiment.45 The greater the number of predictions, and the more risky they are, the more testable is the hypothesis and vice versa; and the greater is the agreement between theoretical prediction and experiment in such tests of hypothesis, the more certain are we of its veracity. Naturally, any hypothesis that insulates itself from “falsifiability, or refutability, or testability,” is either pseudoscience or pathological science.45, 46 The theory described here readily lends itself to falsifiability because it makes certain quantitative predictions which can be immediately verified via experiment.
The variation in mTS-D(T) and mTS-N(T) with mD-N
A general observation in two-state protein folding is that whenever mutations or a change in solvent conditions cause statistically significant changes in the mD-N value, a large fraction or almost all of this change is manifest as a variation in mTS-D(T), with little or almost no change in mTS-N(T) (see Figs. 7 and 9 in Sanchez and Kiefhaber, 2003).19 Although these effects were analysed using self-interaction and cross-interaction parameters,19 the question is “Why must perturbation-induced changes in mD-N predominantly manifest as changes in mTS-D(T)?” Is there any theoretical basis for this empirical observation? Importantly, can we predict how mTS-D(T) varies as a function of mD-N for any given two-state folder of a given equilibrium stability when temperature, pressure and solvent are constant? To simulate the behaviour, two hypothetical two-state systems, one with force constants α = 1 and ω = 10 M2.mol.kcal-1 (Figure 8A) and the other with α = 1 and ω = 100 M2.mol.kcal-1 (Figure 8B), were chosen. Within each of these two parent two-state systems are six sub-systems with the same pair of force constants as the parent system but each with a unique and constant ΔGD-N(T). We now ask how the curve-crossings for each of these systems change when the separation between the vertices of the DSE and NSE-parabolas along their abscissae is allowed to vary (i.e., a change in mD-N as in Figure 4). Simply put, what we are doing is taking a pair of intersecting parabolas of differing curvature such that ω > α and systematically varying the separation between their vertices along the abscissa and ordinate, and calculating the position of the curve-crossing along the abscissa for each case according to Eqs. (9) and (10).
Despite the model being very simplistic (because the curvature of the parabolas can change with structural or solvent perturbation), the simulated behaviour is strikingly similar to that of 1064 proteins from 31 two-state systems: A perturbation-induced change in mD-N is predominantly manifest as a change in mTS-D(T) with little or no change in mTS-N(T) (Figure 9 and Figure 9-figure supplement 1). Although the apparent position of the TSE along the RC as measured by βT(fold)(T) changes, the absolute position of the TSE along the RC may not change significantly, and this effect can be particularly pronounced for systems with high βT(fold)(T) or late TSEs (Figure 8B). This ability to simulate the behaviour of real systems serves as the first test of the hypothesis.
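The simulation described above can be sketched in a few lines of Python. This is a minimal illustration and not the author's code. It assumes that the DSE-parabola is G_D(r) = α r² with its vertex at r = 0, that the NSE-parabola is G_N(r) = ω (mD-N − r)² − ΔGD-N(T) with its vertex at r = mD-N, and that the curve-crossing of interest is the root lying between the two vertices. The function names and the numerical inputs are hypothetical, and the closed-form root is derived from these assumed parabolas rather than copied from Eqs. (9) and (10).

```python
import math


def curve_crossing(alpha, omega, m_dn, dg_dn):
    """Return (m_ts_d, m_ts_n) for a pair of intersecting parabolas.

    Assumed convention (an illustration, not taken from the source equations):
      DSE: G_D(r) = alpha * r**2                     (vertex at r = 0)
      NSE: G_N(r) = omega * (m_dn - r)**2 - dg_dn    (vertex at r = m_dn)
    dg_dn is the Gibbs energy of unfolding (positive when the NSE is favoured).
    Force constants are in M2.mol.kcal-1, m values in kcal.mol-1.M-1.
    """
    if omega <= alpha:
        raise ValueError("this sketch assumes omega > alpha")
    # Setting G_D(r) = G_N(r) gives a quadratic in r; phi is its reduced discriminant.
    phi = alpha * omega * m_dn ** 2 + dg_dn * (omega - alpha)
    if phi < 0:
        raise ValueError("the parabolas do not intersect for these inputs")
    m_ts_d = (omega * m_dn - math.sqrt(phi)) / (omega - alpha)  # DSE vertex to crossing
    m_ts_n = m_dn - m_ts_d                                      # crossing to NSE vertex
    return m_ts_d, m_ts_n


def scan_m_values(alpha, omega, dg_dn, m_values):
    """Vary the vertex separation (m_dn) at fixed stability and force constants."""
    rows = []
    for m_dn in m_values:
        m_ts_d, m_ts_n = curve_crossing(alpha, omega, m_dn, dg_dn)
        rows.append((m_dn, m_ts_d, m_ts_n, m_ts_d / m_dn))
    return rows


if __name__ == "__main__":
    # Hypothetical inputs: alpha = 1, omega = 10, stability of 5 kcal.mol-1.
    for m_dn, m_ts_d, m_ts_n, beta in scan_m_values(1.0, 10.0, 5.0, [1.0, 1.5, 2.0, 2.5, 3.0]):
        print(f"m_dn={m_dn:.2f}  m_ts_d={m_ts_d:.3f}  m_ts_n={m_ts_n:.3f}  beta={beta:.3f}")
```

With these toy inputs, most of the change in the vertex separation appears in the DSE-to-crossing distance and comparatively little in the crossing-to-NSE distance, which is the qualitative pattern discussed above.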
Non-Arrhenius kinetics
Unlike the temperature-dependence of the rate coefficients of most chemical reactions of small molecules, protein folding reactions are characterised by non-Arrhenius kinetics, i.e., at constant pressure and solvent conditions, kf(T) initially increases with an increase in temperature and reaches a plateau; and any further increase in temperature beyond this point causes kf(T) to decrease. This anomalous non-linear temperature-dependence of kf(T) has been observed in both experiment and computer simulations.17, 36, 47-56 Two predominant explanations have been given for this behaviour: (i) non-linear temperature-dependence of the prefactor on rugged energy landscapes;17 and (ii) the heat capacities of activation, ΔCpD-TS(T) and ΔCpTS-N(T) which in turn lead to temperature-dependent enthalpies and entropies of activation for folding and unfolding.36, 48, 50, 51 Arguably one of the most important and experimentally verifiable predictions of the parabolic hypothesis is that “as long as the enthalpies and the entropies of unfolding/folding at equilibrium display a large variation with temperature, and as a consequence, equilibrium stability is a non-linear function of temperature, both kf(T) and ku(T) will have a non-linear dependence on temperature.” The equations that describe the temperature-dependence of kf(T) and ku(T) of two-state systems under constant pressure and solvent conditions may be readily derived by substituting Eq. (34) in (11) and (12).
Thus, if the parameters ΔHD-N(Tm), Tm, ΔCpD-N, mD-N, the force constants α and ω, and k0 (assumed to be temperature-invariant) are known for any given two-state system, the temperature-dependence of kf(T) and ku(T) may be readily ascertained. Why does the prediction of non-Arrhenius kinetics constitute a rigorous test of the parabolic hypothesis (see confirming evidence, Popper, 1953)? As is readily apparent, the values of the constants and variables in Eqs. (35) and (36) come from two different sources: While the values of α and ω, k0, and mD-N are extracted from the chevron, i.e., from the variation in kf(T) and ku(T) with denaturant at constant temperature, pressure and solvent conditions (i.e., all solvent variables excluding the denaturant are constant), ΔHD-N(Tm) and Tm are determined from thermal denaturation at constant pressure and identical buffer conditions as above but without the denaturant, using either calorimetry or van’t Hoff analysis of a sigmoidal thermal denaturation curve, obtained for instance by monitoring the change in a suitable spectroscopic signal with temperature (typically CD 217 nm for β-sheet proteins, CD 222 nm for α-helical proteins or CD 280 nm to monitor tertiary structure).30, 57 The final parameter, ΔCpD-N, is once again determined independently (i.e., the slope of a plot of model-independent calorimetric ΔHD-N(Tm)(cal) versus Tm, see Fig. 4 in Privalov, 1989). What this essentially implies is that if Eqs. (35) and (36) predict a non-linear temperature-dependence of kf(T) and ku(T), and importantly, if their absolute values agree reasonably well with experimental data, then the success of such a prediction cannot be fortuitous since it is statistically improbable for these parameters obtained from fundamentally different kinds of experiments to collude and yield the right values. We are then left with the alternative that at least to a first approximation, the hypothesis is valid.
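The way these independently determined parameters combine can be illustrated with a short Python sketch. It is not the author's implementation. It assumes the standard temperature-invariant-ΔCpD-N form of the Gibbs-Helmholtz relationship for ΔGD-N(T), the parabola convention G_D(r) = α r² and G_N(r) = ω (mD-N − r)² − ΔGD-N(T), and rate expressions of the form k = k0 exp(−ΔG‡/RT), with the activation energies taken as the heights of the curve-crossing above the DSE and NSE vertices. All parameter values are hypothetical and do not correspond to any protein discussed in this article.

```python
import math

R = 1.987e-3  # gas constant in kcal.mol-1.K-1


def delta_g_dn(T, dH_m, Tm, dCp):
    """Gibbs energy of unfolding at T, assuming a temperature-invariant dCp."""
    return dH_m * (1.0 - T / Tm) + dCp * (T - Tm) + T * dCp * math.log(Tm / T)


def rate_constants(T, dH_m, Tm, dCp, m_dn, alpha, omega, k0):
    """Return (kf, ku) at temperature T under the assumptions stated in the lead-in."""
    dG = delta_g_dn(T, dH_m, Tm, dCp)
    phi = alpha * omega * m_dn ** 2 + dG * (omega - alpha)
    if phi < 0:
        raise ValueError("the parabolas do not intersect at this temperature")
    m_ts_d = (omega * m_dn - math.sqrt(phi)) / (omega - alpha)
    m_ts_n = m_dn - m_ts_d
    dG_ts_d = alpha * m_ts_d ** 2   # barrier to folding (DSE vertex to crossing)
    dG_ts_n = omega * m_ts_n ** 2   # barrier to unfolding (NSE vertex to crossing)
    kf = k0 * math.exp(-dG_ts_d / (R * T))
    ku = k0 * math.exp(-dG_ts_n / (R * T))
    return kf, ku


# Hypothetical parameter set (kcal.mol-1, K, kcal.mol-1.K-1, kcal.mol-1.M-1, s-1).
PARAMS = dict(dH_m=50.0, Tm=340.0, dCp=0.8, m_dn=2.0, alpha=1.0, omega=10.0, k0=1.0e6)

if __name__ == "__main__":
    for T in range(280, 351, 10):
        kf, ku = rate_constants(float(T), **PARAMS)
        print(f"T={T} K  ln kf={math.log(kf):8.3f}  ln ku={math.log(ku):8.3f}")
```

By construction, the difference between the two barriers equals ΔGD-N(T), so RT ln(kf(T)/ku(T)) reproduces the equilibrium stability at every temperature and the two rate constants coincide at Tm.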
The predictions of Eqs. (35) and (36) are shown for three well-studied two-state folders: (i) BdpA Y15W, the 60-residue three-helix B domain of Staphylococcal protein A (Figure 10 and its figure supplements);58 (ii) BBL H142W, the 47-residue all-helical member of the Peripheral-subunit-binding-domain family (Figure 11 and its figure supplements);59 and (iii) FBP28 WW, the 37-residue Formin-binding three-stranded β-sheet WW domain (Figure 12 and its figure supplements).60 Inspection of Figures 10A, 11A and 12A (see also Figure supplement 2A for each of these figures) shows that Eq. (35) makes a remarkable prediction that kf(T) has a non-linear dependence on temperature. Starting from a low temperature, kf(T) initially increases with an increase in temperature and reaches a maximal value at T = TH(TS-D) where ∂ ln kf(T)/∂T = ΔHTS-D(T)/RT2 = 0 ⇒ ΔHTS-D(T) = 0; and any further increase in temperature beyond this point will cause a decrease in kf(T). The reader will note that the partial derivatives are purely to indicate that these relationships hold if and only if the pressure, solvent variables and the prefactor are constant.
In contrast, inspection of Figures 10B, 11B and 12B (see also Figure supplement 2B for each of these figures) shows that ku(T), starting from a low temperature, decreases with a rise in temperature and reaches a minimum at T = TH(TS-N) where ∂ ln ku(T)/∂T = ΔHTS-N(T)/RT2 = 0 ⇒ ΔHTS-N(T) = 0; and any further increase in temperature beyond this point will cause an increase in ku(T). This behaviour, which is dictated by Eq. (36), at once provides an explanation for the origin of a misconception: It is sometimes stated that non-Arrhenius kinetics in protein folding is limited to kf(T) while ku(T) usually follows Arrhenius-like kinetics.36, 49, 53, 61 It is readily apparent from these figures that if the experimental range of temperature over which the variation in ku(T) is investigated is small, Arrhenius plots can appear to be linear (see Fig. 5A in Tan et al., 1996, Fig. 3 in Schindler and Schmid, 1996, and Fig. 6c in Jacob et al., 1999).49, 53, 62 In fact, even if the temperature range is substantial, but owing to technical difficulties associated with measuring the unfolding rate constants below the freezing point of water, the range is restricted to temperatures above 273.16 K, ku(T) can still appear to have a linear dependence on temperature in an Arrhenius plot since the curvature of the limbs in Figures 10B, 11B and 12B is rather small. This can especially be the case if the experimental data points that define the Arrhenius plot are sparse. Consequently, the temperature-dependence of ku(T) can be fit equally well within statistical error to a linear function, as is apparent from inspection of the temperature-dependence of ku(T) of CI2 protein (see Fig. 4 in Tan et al., 1996).62 Because TH(TS-N) ≪ 273.16 K for psychrophilic and mesophilic proteins, it is technically demanding, if not impossible, to experimentally demonstrate the increase in ku(T) for T < TH(TS-N) for the same. Nevertheless, the levelling-off of ku(T) at lower temperatures (see Fig. 3 in Schindler and Schmid, 1996), and extrapolation of data using non-linear fits (see Fig. 6B in Main et al., 1999) indicate this trend.49, 63 In principle, it may be possible to experimentally demonstrate this behaviour for those proteins whose TH(TS-N) is significantly above the freezing point of water. It is interesting to note that lattice models consisting of hydrophobic and polar residues (HP+ model) also capture this behaviour (see Fig. 22B in Chan and Dill, 1998).50 As mentioned earlier, the cause of non-Arrhenius behaviour is a matter of some debate. However, because we have assumed a temperature-invariant prefactor and yet find that the kinetics are non-Arrhenius, it essentially implies that one does not need to invoke a super-Arrhenius temperature-dependence of the configurational diffusion constant to explain the non-Arrhenius behaviour of proteins.17, 36, 48, 50, 55
Once the temperature-dependence of kf(T) and ku(T) across a wide temperature range is known, the variation in the observed or the relaxation rate constant (kobs(T)) with temperature may be readily ascertained using the two-state relation kobs(T) = kf(T) + ku(T) (see Appendix).64
Inspection of Figure supplement-1 of Figures 10, 11 and 12 demonstrates that ln(kobs(T)) vs temperature is a smooth “W-shaped” curve, with kobs(T) being dominated by kf(T) around TH(TS-N), and by ku(T) for T < Tc and T > Tm, which is precisely why the kinks in ln(kobs(T)) occur around these temperatures. It is easy to see that at Tc or Tm, kf(T) = ku(T) ⇒ kobs(T) = 2kf(T) = 2ku(T), and ΔGD-N(T) = RT ln (kf(T)/ku(T)) = 0. In other words, for a two-state system, Tc and Tm measured at equilibrium must be identical to the temperatures at which kf(T) and ku(T) intersect.
This is a consequence of the principle of microscopic reversibility, i.e., the equilibrium and kinetic stabilities must be identical for a two-state system at all temperatures.29
Although a detailed discussion is beyond the scope of this article, the phenomenal increase in ku(T) and kobs(T) for T < Tc and T > Tm is due to ΔGTS-N(T) approaching zero (barrierless unfolding) at very low and high temperatures. Consequently, the unfolding rate constants approach their physical limit which is k0; and any further decrease or increase in temperature in the very low and high temperature regimes, respectively, must lead to a decrease in ku(T) (Marcus-inversion). This is readily apparent for FBP28 WW (Figures 12B, Figure 12-figure supplement 1B and 2B). To summarise, for any fixed two-state folder, unfolding is conventionally barrier-limited around T = TS and the curve-crossing occurs in between the vertices of the DSE and the NSE parabolas. As the temperature deviates from TS, ΔGTS-N(T) decreases and eventually becomes zero at which point the curve-crossing occurs at the vertex of the NSE-parabola (i.e., the right arm of the DSE-parabola intersects the vertex of the NSE-parabola); and any further decrease or increase in temperature in the very low and high temperature regimes, respectively, will cause unfolding to once again become barrier-limited with the curve-crossing occurring to the right of the vertex of the NSE-parabola (i.e., the right arm of the DSE-parabola intersects the right arm of the NSE-parabola). Interestingly, in contrast to unfolding which can become barrierless at certain high and low temperatures, folding is always barrier-limited with the absolute minimum of ΔGTS-D(T) occurring when T = TS; and any deviation in the temperature from TS will only lead to an increase in ΔGTS-D(T).
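The conditions for barrierless unfolding and folding follow directly from the geometry of the two parabolas. The short derivation below is a sketch under an assumed convention, namely that the DSE-parabola has its vertex at r = 0 with zero Gibbs energy and the NSE-parabola has its vertex at r = mD-N with a Gibbs energy of −ΔGD-N(T); the symbol r for the position along the RC is introduced here purely for illustration.

```latex
\begin{align}
G_{D}(r) &= \alpha\, r^{2}, \qquad
G_{N}(r) = \omega\,\bigl(m_{D\text{-}N} - r\bigr)^{2} - \Delta G_{D\text{-}N}(T) \\[4pt]
\Delta G_{TS\text{-}N}(T) = 0 \;&\Leftrightarrow\; G_{D}(m_{D\text{-}N}) = G_{N}(m_{D\text{-}N})
  \;\Leftrightarrow\; \Delta G_{D\text{-}N}(T) = -\,\alpha\, m_{D\text{-}N}^{2} \\[4pt]
\Delta G_{TS\text{-}D}(T) = 0 \;&\Leftrightarrow\; G_{D}(0) = G_{N}(0)
  \;\Leftrightarrow\; \Delta G_{D\text{-}N}(T) = \omega\, m_{D\text{-}N}^{2}
\end{align}
```

Under this convention, unfolding becomes barrierless only when the protein is sufficiently destabilised (ΔGD-N(T) negative and equal in magnitude to α mD-N²), whereas folding remains barrier-limited at all temperatures as long as the maximum stability, ΔGD-N(TS), stays below ω mD-N².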
Thus, from the perspective of the parabolic hypothesis, “if folding is barrier-limited at TS, then a two-state system at constant pressure and solvent conditions cannot spontaneously fold in a downhill manner, no matter what the temperature, and irrespective of whether or not it is an ultrafast folder.” A corollary is that if there exists a chevron with a well-defined linear folding arm at TS, then the prohibitive rule is that a two-state system at constant pressure and solvent conditions cannot spontaneously (i.e., unaided by co-solvents, ligands, metal ions etc.) fold by a downhill mechanism no matter what the temperature (see Popper, 1953, on why “prohibition” is as important as “confirming evidence” to any scientific method of inquiry).
In other words, although the parabolic hypothesis predicts that barrierless and Marcus-inverted regimes for folding can occur, especially when mD-N is very small (Figure 13), the existence of a chevron with a well-defined linear folding arm at TS is sufficient to conclusively rule out such a scenario. It is imperative for the reader to take note of two aspects: First, the downhill folding scenario that is being referred to here is not the one wherein the denatured conformers fold to their native states via a first-order process with kf(T) ≅ k0 (manifest when ΔGTS-D(T) is approximately equal to ambient thermal noise, i.e., ΔGTS-D(T) ≅ 3RT), but the controversial Type 0 scenario according to the Energy Landscape Theory (see Fig. 6 in Onuchic et al., 1997), wherein the conformers in the DSE ostensibly reach the NSE without encountering any barrier (ΔGTS-D(T) = 0).65-69 Second, the theoretical impossibility of a Type 0 scenario as claimed by the parabolic hypothesis comes with a condition and applies only to proteins that have linear folding chevron arms at TS, and whose folding proceeds without the need for accessory factors (metal ions for example) that are extrinsic with respect to the polypeptide chain. In other words, we are not outright ruling out a Type 0 scenario, since this could occur under certain conditions. However, what is being ruled out is that the proteins BBL and lambda repressor (λ6-85), which have been touted to be paradigms for the Type 0 scenario, are such paradigms; they are most certainly not.66, 70, 71 Further discussion on downhill scenarios is beyond the scope of this article and will be addressed elsewhere.
Comparison of the data shown in Figures 10-12 and their figure supplements leads to an important conclusion: Just as sigmoidal changes in spectroscopic signals upon equilibrium chemical or thermal denaturation, Gaussian-like thermograms in differential scanning calorimetry (i.e., plots of change in partial heat capacity vs temperature), and the classic V-shaped chevrons (plots of kobs(T) vs chemical denaturant) are a characteristic feature of fixed two-state folders, so too must be the features of the temperature-dependences of kobs(T), kf(T) and ku(T). Although it might appear farfetched to arrive at this general conclusion merely from data on three proteins, the irrevocable requirement that kf(T) and ku(T) must approach each other as T → Tc or Tm, and that ku(T) must dominate kf(T) for T < Tc and T > Tm stems from the principle of microscopic reversibility, which unlike empirical laws, is grounded in statistical mechanics.29 Consequently, the expectation is that ln(ku(T)) will have an approximate “V- or U-shape” and ln(kobs(T)) will have an approximate “W-shape” with respect to temperature (see Fig. 3 in Mayor et al., 2000, and Fig. 2 in Ghosh et al., 2007).55, 72 Further, since the large variation in equilibrium enthalpies and entropies of unfolding, including the pronounced curvature in ΔGD-N(T) of proteins with temperature, is due to the large and positive ΔCpD-N, a corollary is that “non-Arrhenius kinetics can be particularly acute for reactions that are accompanied by large changes in the heat capacity.” Because the change in heat capacity is proportional to the change in SASA, and since the change in SASA upon unfolding/folding increases with chain-length, “non-Arrhenius kinetics, in general, can be particularly pronounced for large proteins, as compared to very small proteins and peptides.” Now, Fersht and co-workers, by comparing the non-Arrhenius behaviour of the two-state-folding CI2 and the three-state-folding barnase, argued that the pronounced curvature of ln(kf(T)/T) of barnase as compared to CI2 in Eyring plots is a consequence of barnase folding via three-state kinetics (see Figs. 1 and 2 in Oliveberg et al., 1995).48 Although there is no denying that barnase is not a fixed two-state system,7, 73 and their conclusion that “the non-Arrhenius behaviour of proteins is a consequence of the ensemble-averaged difference in heat capacity between the various reaction states” is rather remarkable, the pronounced curvature of barnase is highly unlikely to be a signature of three-state kinetics, but instead could be predominantly due to its larger size: While barnase is a 110-residue protein with ΔCpD-N = 1.7 kcal.mol-1.K-1,74 CI2 is significantly smaller in size (64 or 83 residues depending on the construct) with a significantly lower ΔCpD-N value (0.72 kcal.mol-1.K-1 for the short form and 0.79 kcal.mol-1.K-1 for the long form).75 Although beyond the scope of this article and addressed elsewhere, it is important to recognize at this point that the non-linear temperature-dependence of equilibrium stability (see Fig. 1 in Becktel and Schellman, 1987) is not the cause of non-Arrhenius kinetics, but instead is the consequence or the equilibrium manifestation of the underlying non-linear temperature-dependence of kf(T) and ku(T).
IMPLICATIONS FOR PROTEIN FOLDING
The demonstration that the equilibrium stability and the rate constants are related to the mean and variance of the Gaussian distribution of the SASA of conformers in the DSE and NSE and the curve-crossing has certain implications for protein folding.12
First, analysing the effect of perturbations such as mutations on equilibrium stability purely in structural and native-centric terms, such as the removal or addition of certain interactions can be flawed because any perturbation that causes a change in the distribution of the conformers in either the DSE, or the NSE, or both, or the curve-crossing can cause a change in equilibrium stability. A corollary is that “mutations need not be restricted to the structured elements of the native fold such as α-helices, β-sheets to cause a change in the rate constants or equilibrium stability, as compared to the wild type or the reference protein.” Nagi and Regan’s work offers a striking example: Increasing the loop-length using unstructured glycine linkers in the four-helix Rop1 leads to a dramatic change in the equilibrium stability, the rate constants, and the denaturant m value (i.e., the mean length of the RC), despite little or no effect on its native structure as determined by NMR and other spectroscopic probes, or its function as indicated by the ability of the mutants to form highly helical dimers and bind RNA (see Table 1 and Figs. 5 and 6 in Nagi and Regan, 1997).76
An important conclusion that we may draw from this Gaussian view of equilibrium stability is that if the DSEs and the NSEs of two related or unrelated spontaneously folding fixed two-state systems have identical mean SASAs and variances under identical environmental conditions (pressure, pH, temperature, ionic strength, co-solvents etc.), and if the position of the TSE along the SASA-RC is also identical for both, then irrespective of the: (i) primary sequence, including its length; (ii) amount of residual structure and the kinds of residual interactions in the DSE; (iii) topology of the native fold and the kinds of interactions that stabilize the native fold; and (iv) folding and unfolding rate constants, the said two-state systems must have identical equilibrium stabilities.
A further consequence of stability being a function of the variance of the distribution of the conformers in the ground states and the curve-crossing is that “the contribution of a noncovalent interaction to equilibrium stability is not per se equal to the intrinsic Gibbs energy of the bond if the removal of the said interaction perturbs the variances of both the DSE and the NSE.” What is implied by this is that, for example, if the removal of a salt-bridge in the native hydrophobic core of a hypothetical protein decreased the equilibrium stability by say 3 kcal.mol-1, it is not logically incorrect to state that the removal of the salt-bridge destabilized the protein by the said amount; however, what need not be true is the conclusion that the Gibbs energy of the said interaction is identical to the change in equilibrium stability brought forth by its removal.77 It thus provides a rational explanation for why the Gibbs energies of molecular interactions such as hydrogen bonds and salt bridges as inferred from structural-perturbation-induced changes in equilibrium stability vary significantly across proteins.78, 77
Second, since atoms are incompressible under conditions where the primary sequence exists as an entity, the maximum compaction (i.e., the reduction in SASA) that a given primary sequence can achieve is dictated by the excluded volume effect. Thus, the expectation is that as the chain length increases, the SASAs of both the DSE and the NSE increase, albeit at different rates (otherwise mD-N and ΔCpD-N will not increase with chain length); and on a decreasing absolute SASA scale, the positions of the DSE and the NSE shift to the left with a concomitant increase in the relative separation between them. Conversely, a decrease in chain length will cause a shift in the position of both the DSE and the NSE to the right with a concomitant decrease in the relative separation between them. What this implies is that the variance of the NSE is highly unlikely to be several orders of magnitude greater than that of the DSE (Figures 6 and 7 and the values of the force constants given in the legends for Figures 10, 11 and 12). Thus, as long as the ratio of the variances of the DSE and the NSE is not a large number, and as long as there is a need to bury a minimum SASA by the denatured conformers for the microdomains (formed en route to the TSE or pre-existing in the DSE) to collide, coalesce and cross the Gibbs barrier to reach the native Gibbs basin, spontaneously-folding two-state systems can only be marginally stable.80 A corollary of this is that “the marginal stability of spontaneously-folding proteins is a consequence of physical chemistry.”81, 82 In other words, intelligent life is built around marginal stability.
This takes us back to Anfinsen’s thermodynamic hypothesis:83 Random mutations can lead to a repertoire of primary sequences via the central dogma; but whether or not these will fold into regular structures (spontaneously or not), and which when folded are stable enough to withstand thermal noise by virtue of their numerous intra-protein and protein-solvent interactions, and consequently, reside long enough in the native basin, thus giving rise to what we term “equilibrium stability” is ultimately governed by the laws of physical chemistry.80, 82, 83 A detailed discussion on the physical basis for why the denatured conformers, in general, must diffuse a minimum distance along the SASA-RC for them to fold, and how this is related to the marginal stability of proteins is beyond the scope of this article and will be addressed in subsequent publications.
Third, it tells us that the equilibrium and kinetic behaviour of proteins in vivo can be significantly different from what we observe in vitro, not because the laws of physical chemistry do not apply to cellular conditions, but because the mean and variance of the distribution of the conformers in the ground states, including the curve-crossing, owing to macromolecular crowding, can be significantly different in vivo.86, 87 This is apparent from the dramatic effect of metabolites such as glucose on the rate constants, the equilibrium stability, and mD-N (see Supplementary Fig. 4 in Wensley et al., 2010).88 Thus, from the perspective of the parabolic hypothesis, isozymes are a consequence of primary sequence optimization for function in a precisely defined environment. A further natural extension of folding and stability being functions of the Gaussian variances of the ground states and the curve-crossing is that the disulfide bonds in those proteins that fold in the highly crowded cytosol but must function in less crowded environments (for instance, cell-surface, soluble-secreted and extracellular matrix proteins) could be an evolutionary adaptation to fine tune the variances of the ground states for less crowded environments.89 This will be dealt with in greater detail in subsequent publications where we will show from analysis of experimental data that the variances of the DSE and the NSE are crucially dependent on their ensemble-averaged SASA, and the more expanded or solvent-exposed they are, the greater is their Gaussian variance, and vice versa.
Fourth, any experimental procedure that significantly perturbs the Gaussian mean and variance of the distribution of the SASA of the conformers in either the DSE or the NSE, or both, can significantly influence the outcome of the experiments, even if the final readout such as equilibrium stability is relatively unperturbed. These include treatments such as tethering the protein under investigation to a surface, or the covalent attachment of large donor and acceptor fluorophores such as those of the Alexa Fluor family (~1200 Da). Consequently, the conclusions based on data obtained from such measurements, although they may be applicable to the system being studied, may not be readily extrapolated to the unperturbed system that is either free in solution or devoid of extrinsic fluorophores.90, 91 This can especially be true if one places large donor and acceptor labels on a very small protein or a peptide. A detailed comparison of the chemically-denatured and force-denatured DSEs of ubiquitin demonstrates that, consistent with the large amount of data on the DSEs of proteins, while the chemically-denatured DSE comprises a significant population of α-helices, the force-denatured DSE is devoid of such secondary structural elements except under the lowest applied force.92 Tethering a polypeptide to a surface has also been shown to greatly reduce the attempt frequency with which the protein samples its free energy.93
In summary, any perturbation, whether intrinsic (cis-acting) or extrinsic (trans-acting) from the viewpoint of the primary sequence, that changes the mean and the variance of the Gaussian distribution of the SASA of the conformers in the DSE, or the NSE, or both, or that changes the curve-crossing, can affect the equilibrium and kinetic behaviour of proteins. The cis-acting perturbations can be: (i) a change in the primary sequence via a change in the gene sequence; (ii) any post-translational modification (phosphorylation, glycosylation, methylation, nitrosylation, acetylation, ubiquitylation, etc.), including the covalent linking of fluorophores for the purpose of monitoring the dynamics of various protein conformational states; (iii) the introduction of disulfide bonds; and (iv) a change in the isotope composition of the primary sequence, i.e., homonuclear vs heteronuclear. The trans-acting perturbations can be: (i) a change in the temperature; (ii) a change in the pressure; (iii) a change in solvent properties such as pH, ionic strength, or solvent isotope (i.e., H2O vs D2O); (iv) macromolecular crowding; (v) selective non-covalent binding, to the conformers in either the DSE or the NSE, of any entity, whether it be a small molecule (ligand-gating of ion channels, metal ions as in calcium signalling, metabolites, nucleotides and nucleotide-derivatives, the binding of substrates to enzymes, hormones, pharmaceutical drugs etc.), a peptide or protein (for example, a chaperone-client interaction, a chaperone-co-chaperone interaction, the cognate partner of an intrinsically denatured protein, the interaction between a G-protein coupled receptor and the G-protein, nucleotide exchange factors, protein and peptide therapeutics etc.), or DNA or RNA; (vi) covalent tethering of the polypeptide to a surface, whether synthetic, such as a glass slide (typically employed in single-molecule experiments), or biological, such as the cell walls of prokaryotes, the plasma membrane, and the membranes of organelles such as the nucleus, the endoplasmic reticulum, the Golgi complex, mitochondria, chloroplasts, peroxisomes, lysosomes etc. (this includes extrinsic, intrinsic, and membrane-spanning proteins); (vii) voltage, as in the case of voltage-gated ion channels; and (viii) molecular confinement, as in the case of chaperonin-assisted folding and proteasome-assisted degradation of proteins.
METHODS
Standard chevron-equation for two-state folding
The denaturant-dependence of the observed rate constant of any given two-state folder at constant temperature, pressure and solvent conditions (pH, buffer concentration, co-solvents other than the denaturant are constant) is given by the standard chevron-equation:64 where kobs(Den)(T) denotes the denaturant-dependence of the observed rate constant, kf(H2O)(T) and ku(H2O)(T) are the first-order rate constants for folding and unfolding, respectively, in water, [Den] is the denaturant concentration in molar, mkf(T) and mku(T) are the denaturant dependencies of the natural logarithm of kf(T) and ku(T), respectively, with dimensions M-1, mTS-D(T) (=RT|mkf(T)|) and mTS-N(T) (=RTmku(T)) are parameters that are proportional to the ensemble-averaged difference in SASA between the DSE and TSE, and between the NSE and TSE, respectively, R is the gas constant and T is the absolute temperature (Figure 1).25 Fitting kobs(Den)(T) versus [Den] data using non-linear regression to Eqs. (38) and (39) at constant temperature yields the said parameters. Conversely, if the values of mTS-D(T), mTS-N(T), kf(H2O)(T) and ku(H2O)(T) are known for any given two-state folder, one can readily simulate its chevron, albeit without the experimental noise.
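To illustrate the simulation of a noise-free chevron, the following minimal sketch assumes the commonly used form kobs = kf(H2O) exp(−mTS-D[Den]/RT) + ku(H2O) exp(mTS-N[Den]/RT), which is consistent with the definitions mTS-D(T) = RT|mkf(T)| and mTS-N(T) = RTmku(T) given above. The function name, the choice of units (kJ mol-1 M-1 for the m values) and the parameter values are hypothetical and serve only as an example.

```python
import math

R = 8.314462618e-3  # gas constant in kJ mol^-1 K^-1 (unit choice is an assumption)

def chevron_rates(den, kf_w, ku_w, m_ts_d, m_ts_n, T=298.15):
    """Return (kf, ku, kobs) at denaturant concentration `den` (molar).

    kf_w, ku_w     : rate constants in water (s^-1)
    m_ts_d, m_ts_n : kinetic m values (kJ mol^-1 M^-1), both taken as positive
    """
    RT = R * T
    kf = kf_w * math.exp(-m_ts_d * den / RT)  # folding arm decreases with [Den]
    ku = ku_w * math.exp(m_ts_n * den / RT)   # unfolding arm increases with [Den]
    return kf, ku, kf + ku

if __name__ == "__main__":
    # Hypothetical parameters, for illustration only
    for i in range(0, 17):
        den = 0.5 * i
        kf, ku, kobs = chevron_rates(den, 100.0, 0.01, 3.0, 1.5)
        print(f"{den:4.1f} M   ln(kobs) = {math.log(kobs):8.3f}")
```

Plotting ln(kobs) against [Den] from this output gives the familiar V-shaped curve; the two arms cross where kf = ku, i.e. at [Den] = RT ln(kf(H2O)/ku(H2O)) / (mTS-D + mTS-N).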
Modified chevron-equation for two-state folding
The derivation of the modified chevron-equation is straightforward: mTS-D(T), mTS-N(T), kf(H2O)(T) and ku(H2O)(T) in Eq. (40) are replaced with Eqs. (9), (10), (11) and (12), respectively. The expanded equation is too long to reproduce, but the concise form is given by where x = mTS-D(T) and y = mTS-N(T). Fitting kobs(Den)(T) versus [Den] data to this equation using non-linear regression yields the Gibbs energy of unfolding in water, ΔGD-N(H2O)(T), mD-N, the force constants α and ω, and k0. In the fitting procedure the statistical program starts the iterations with a pair of parabolas of arbitrary force constants and simultaneously: (i) adjusts the separation between their vertices along the abscissa such that it is exactly equal to mD-N; (ii) adjusts the separation between their vertices along the ordinate such that it is exactly equal to ΔGD-N(T); and (iii) adjusts their curvature such that the separation between the curve-crossing and the vertex of the DSE-parabola along the abscissa is exactly equal to mTS-D(T), all the while looking for a suitable value of the prefactor such that: (a) kf(T) and ku(T) satisfy the Arrhenius equation at each one of the denaturant concentrations, and their sum is identical to the experimentally measured kobs(T) at that particular denaturant concentration; and (b) the principle of microscopic reversibility is satisfied at each one of the denaturant concentrations. The theory underlying the fitting procedure and its inherent limitations are addressed elsewhere.
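The forward model behind such a fit can be sketched as follows. This is an illustration under stated assumptions rather than the fitting procedure itself: it takes the curve-crossing from the root of the quadratic derived in the Appendix, assumes Arrhenius rate constants with a common prefactor k0 and barrier heights αx² and ωy² in water, and assumes the barriers change linearly with [Den] with slopes x and y. The function names and all numerical values are hypothetical.

```python
import math

R = 8.314462618e-3  # gas constant in kJ mol^-1 K^-1 (unit choice is an assumption)

def curve_crossing(alpha, omega, m_dn, dG_dn):
    """Return (m_ts_d, m_ts_n) for the crossing between the two vertices.

    Uses the closed-form root of the quadratic in the Appendix; requires alpha != omega.
    """
    phi = alpha * omega * m_dn**2 + dG_dn * (omega - alpha)
    m_ts_d = (omega * m_dn - math.sqrt(phi)) / (omega - alpha)
    return m_ts_d, m_dn - m_ts_d

def parabolic_rates(den, dG_w, m_dn, alpha, omega, k0, T=298.15):
    """Return (kf, ku) at denaturant concentration `den` (molar)."""
    RT = R * T
    x, y = curve_crossing(alpha, omega, m_dn, dG_w)       # x = mTS-D, y = mTS-N in water
    kf = k0 * math.exp(-(alpha * x**2 + x * den) / RT)    # folding barrier rises with [Den]
    ku = k0 * math.exp(-(omega * y**2 - y * den) / RT)    # unfolding barrier falls with [Den]
    return kf, ku

def parabolic_kobs(den, dG_w, m_dn, alpha, omega, k0, T=298.15):
    """Observed rate constant, kobs = kf + ku."""
    kf, ku = parabolic_rates(den, dG_w, m_dn, alpha, omega, k0, T)
    return kf + ku

if __name__ == "__main__":
    # Hypothetical values: dG_w in kJ mol^-1, m_dn in kJ mol^-1 M^-1, k0 in s^-1
    for den in (0.0, 2.0, 4.0, 6.0, 8.0):
        print(den, math.log(parabolic_kobs(den, 20.0, 5.0, 1.0, 10.0, 1.0e6)))
```

By construction, RT ln(kf/ku) in water returns ΔGD-N(H2O)(T), and kf = ku at [Den] = ΔGD-N(H2O)(T)/mD-N, so microscopic reversibility holds at every denaturant concentration in this sketch.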
COMPETING FINANCIAL INTERESTS
The author declares no competing financial interests.
APPENDIX
Expression for the curve-crossing along the abscissa relative to the vertex of the DSE-Gibbs basin
At the curve-crossing we have
Expanding the term in the brackets and recasting gives
The roots of this quadratic equation are given by
Substituting the coefficients $a = (\omega - \alpha)$, $b = -2\omega\, m_{\mathrm{D-N}}$ and $c = \omega\, (m_{\mathrm{D-N}})^{2} - \Delta G_{\mathrm{D-N}(T)}$ (A5) and simplifying gives two options
The point where the right arm of the DSE-parabola intersects the left arm of the NSE-parabola along the RC is given by
The point where the right arm of the DSE-parabola intersects the right arm of the NSE-parabola along the RC is given by
Because the TSE occurs in between the vertices of the DSE and the NSE Gibbs basins along the abscissa (this is not always true and is addressed in subsequent publications), we ignore Eq. (A8). Substituting $\lambda = \alpha\, (m_{\mathrm{D-N}})^{2}$ in Eq. (A7) gives
Substituting $\varphi = \lambda\omega + \Delta G_{\mathrm{D-N}(T)}(\omega - \alpha)$ in Eq. (A9) yields the final form
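The displayed equations of this derivation are not reproduced in the present text. The following is a reconstruction from the coefficients listed in Eq. (A5) and the substitutions for λ and φ, assuming a DSE-parabola of force constant α with its vertex at the origin and an NSE-parabola of force constant ω with its vertex at mD-N; only the equation numbers that the surrounding text refers to explicitly are tagged, and the printed forms may differ in layout.

```latex
\begin{align*}
\alpha\, m_{\mathrm{TS-D}(T)}^{2}
  &= \omega \left( m_{\mathrm{D-N}} - m_{\mathrm{TS-D}(T)} \right)^{2} - \Delta G_{\mathrm{D-N}(T)} \\
0 &= (\omega-\alpha)\, m_{\mathrm{TS-D}(T)}^{2} - 2\omega\, m_{\mathrm{D-N}}\, m_{\mathrm{TS-D}(T)}
     + \omega\, m_{\mathrm{D-N}}^{2} - \Delta G_{\mathrm{D-N}(T)} \\
m_{\mathrm{TS-D}(T)} &= \frac{-b \pm \sqrt{b^{2} - 4ac}}{2a} \\
m_{\mathrm{TS-D}(T)} &= \frac{\omega\, m_{\mathrm{D-N}} - \sqrt{\alpha\omega\, m_{\mathrm{D-N}}^{2}
     + \Delta G_{\mathrm{D-N}(T)}(\omega-\alpha)}}{\omega-\alpha} \tag{A7} \\
m_{\mathrm{TS-D}(T)} &= \frac{\omega\, m_{\mathrm{D-N}} + \sqrt{\alpha\omega\, m_{\mathrm{D-N}}^{2}
     + \Delta G_{\mathrm{D-N}(T)}(\omega-\alpha)}}{\omega-\alpha} \tag{A8} \\
m_{\mathrm{TS-D}(T)} &= \frac{\omega\, m_{\mathrm{D-N}} - \sqrt{\lambda\omega
     + \Delta G_{\mathrm{D-N}(T)}(\omega-\alpha)}}{\omega-\alpha} \tag{A9} \\
m_{\mathrm{TS-D}(T)} &= \frac{\omega\, m_{\mathrm{D-N}} - \sqrt{\varphi}}{\omega-\alpha} \tag{A10}
\end{align*}
```

The discriminant simplifies because $b^{2} - 4ac = 4\left[\alpha\omega\, m_{\mathrm{D-N}}^{2} + \Delta G_{\mathrm{D-N}(T)}(\omega-\alpha)\right]$, the $\omega^{2} m_{\mathrm{D-N}}^{2}$ terms cancelling.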
Expression for the curve-crossing along the abscissa relative to the vertex of the NSE-Gibbs basin
For a two-state folder we have
Substituting Eq. (A10) in (A11) gives
Simplifying Eq. (A12) gives
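As with the preceding derivation, the displayed equations are absent here. A reconstruction, assuming that the RC length is the sum of the two kinetic m values and using the final form for mTS-D(T) with φ as defined above, reads:

```latex
\begin{align*}
m_{\mathrm{TS-N}(T)} &= m_{\mathrm{D-N}} - m_{\mathrm{TS-D}(T)} \tag{A11} \\
m_{\mathrm{TS-N}(T)} &= m_{\mathrm{D-N}} - \frac{\omega\, m_{\mathrm{D-N}} - \sqrt{\varphi}}{\omega-\alpha} \tag{A12} \\
m_{\mathrm{TS-N}(T)} &= \frac{\sqrt{\varphi} - \alpha\, m_{\mathrm{D-N}}}{\omega-\alpha} \tag{A13}
\end{align*}
```

The last step follows from placing both terms over the common denominator (ω − α), whereupon the ωmD-N terms cancel.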
Expression for ku(T) using the principle of microscopic reversibility
For a two-state folder, we have from the principle of microscopic reversibility29
Substituting Eq. (11) in (A14) and simplifying gives
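The equations for this step are likewise missing from the text. A reconstruction is given below, assuming that Eq. (11) is the Arrhenius expression for kf(T) with prefactor k0 and a barrier height of α(mTS-D(T))², and that ΔGD-N(T) denotes the Gibbs energy of unfolding; the printed form of Eq. (A15) may correspond to either of the two equivalent expressions on its right-hand side.

```latex
\begin{align*}
\Delta G_{\mathrm{D-N}(T)} &= RT \ln\!\left( \frac{k_{f(T)}}{k_{u(T)}} \right) \tag{A14} \\
k_{u(T)} &= k_{f(T)} \exp\!\left( -\frac{\Delta G_{\mathrm{D-N}(T)}}{RT} \right) \\
k_{u(T)} &= k^{0} \exp\!\left( -\frac{\alpha\, m_{\mathrm{TS-D}(T)}^{2} + \Delta G_{\mathrm{D-N}(T)}}{RT} \right)
          = k^{0} \exp\!\left( -\frac{\omega\, m_{\mathrm{TS-N}(T)}^{2}}{RT} \right) \tag{A15}
\end{align*}
```

The second equality in the last line uses the curve-crossing condition, $\omega\, m_{\mathrm{TS-N}(T)}^{2} = \alpha\, m_{\mathrm{TS-D}(T)}^{2} + \Delta G_{\mathrm{D-N}(T)}$.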
Expressions for the Gibbs barrier heights and the rate constants in terms of βT(fold)(T) and βT(unfold)(T)
We have from Tanford’s adaptation of the Brønsted framework to solvent denaturation of proteins25
Substituting Eq. (A16) in (1) and Eq. (A17) in (2) yields
Substituting Eq. (A18) in (11) and (A19) in (12) yields
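The equations referred to in this subsection are not reproduced in the text. The following reconstruction assumes the usual Tanford definitions of the β values as fractions of the total RC, parabolic barrier heights for Eqs. (1) and (2), and Arrhenius rate constants with prefactor k0 for Eqs. (11) and (12); λ = α(mD-N)² as defined earlier in the Appendix.

```latex
\begin{align*}
\beta_{\mathrm{T(fold)}(T)} &= \frac{m_{\mathrm{TS-D}(T)}}{m_{\mathrm{D-N}}} \tag{A16} \\
\beta_{\mathrm{T(unfold)}(T)} &= \frac{m_{\mathrm{TS-N}(T)}}{m_{\mathrm{D-N}}} \tag{A17} \\
\Delta G_{\mathrm{TS-D}(T)} &= \alpha\, m_{\mathrm{D-N}}^{2}\, \beta_{\mathrm{T(fold)}(T)}^{2}
                             = \lambda\, \beta_{\mathrm{T(fold)}(T)}^{2} \tag{A18} \\
\Delta G_{\mathrm{TS-N}(T)} &= \omega\, m_{\mathrm{D-N}}^{2}\, \beta_{\mathrm{T(unfold)}(T)}^{2} \tag{A19} \\
k_{f(T)} &= k^{0} \exp\!\left( -\frac{\lambda\, \beta_{\mathrm{T(fold)}(T)}^{2}}{RT} \right) \tag{A20} \\
k_{u(T)} &= k^{0} \exp\!\left( -\frac{\omega\, m_{\mathrm{D-N}}^{2}\, \beta_{\mathrm{T(unfold)}(T)}^{2}}{RT} \right) \tag{A21}
\end{align*}
```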
Expressions for the curve-crossing, kf(T) and ku(T) at the midpoint of cold and heat denaturation
At the midpoint of thermal (Tm) or cold denaturation (Tc), ΔGD-N(Tm/Tc) = 0. Consequently, Eqs. (A10) and (A13) become
Eqs. (A22) and (A23) may be recast in terms of βT(fold)(T) and βT(unfold)(T) to give
Substituting Eqs. (A24) and (A25) in (13) and (14), respectively, and simplifying gives expressions for the rate constants for folding and unfolding at Tm or Tc
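The displayed results for this subsection are missing from the text. They can be reconstructed by setting ΔGD-N(T) = 0 in the curve-crossing expressions, so that φ = αω(mD-N)², and factorising ω − α = (√ω − √α)(√ω + √α); the forms below are a reconstruction under the same assumptions as the earlier Appendix equations, and T stands for Tm or Tc.

```latex
\begin{align*}
m_{\mathrm{TS-D}(T)} &= \frac{m_{\mathrm{D-N}} \sqrt{\omega}}{\sqrt{\alpha} + \sqrt{\omega}} \tag{A22} \\
m_{\mathrm{TS-N}(T)} &= \frac{m_{\mathrm{D-N}} \sqrt{\alpha}}{\sqrt{\alpha} + \sqrt{\omega}} \tag{A23} \\
\beta_{\mathrm{T(fold)}(T)} &= \frac{\sqrt{\omega}}{\sqrt{\alpha} + \sqrt{\omega}} \tag{A24} \\
\beta_{\mathrm{T(unfold)}(T)} &= \frac{\sqrt{\alpha}}{\sqrt{\alpha} + \sqrt{\omega}} \tag{A25} \\
k_{f(T)} = k_{u(T)} &= k^{0} \exp\!\left( -\frac{\alpha\omega\, m_{\mathrm{D-N}}^{2}}
      {RT \left( \sqrt{\alpha} + \sqrt{\omega} \right)^{2}} \right) \tag{A26}
\end{align*}
```

Both barrier heights reduce to the same value at the midpoint, since $\alpha\, m_{\mathrm{TS-D}(T)}^{2} = \omega\, m_{\mathrm{TS-N}(T)}^{2} = \alpha\omega\, m_{\mathrm{D-N}}^{2} / (\sqrt{\alpha} + \sqrt{\omega})^{2}$.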
Eqs. (A22) - (A26) demonstrate that the curve-crossing, kf(T) and ku(T), βT(fold)(T) and βT(unfold)(T) at Tm or Tc for any given two-state system are defining constants when solvent and pressure are defined since they depend only on the length of the RC and the force constants, all of which are invariant with temperature. Consequently, these are properties that are dependent purely on the primary sequence when pressure and solvent are defined. There are other defining relationships at Tm or Tc. Substituting ΔGD-N(Tm/Tc) = 0 in Eq. (3) gives
Recasting Eq. (A27) in terms of Eqs. (23) and (24) gives
Eqs. (A27) and (A28) demonstrate that at Tm or Tc the ratio of the slopes of the folding and unfolding arms of the chevron, or the ratio of the distances by which the conformers in the DSE and NSE travel from the mean of their respective ensembles to reach the TSE along the mD-N RC for a given two-state system is identical to: (i) the square root of the ratio of the force constants of the NSE and the DSE; or (ii) the ratio of the standard deviations of the DSE and NSE Gaussians. A corollary is that irrespective of the primary sequence, or the topology of the native state, or the residual structure in the DSE, if for a spontaneously folding two-state system at constant pressure and solvent conditions it is found that at a certain temperature the ratio of the distances by which the denatured and the native conformers must travel from the mean of their ensemble to reach the TSE along the SASA RC is identical to the ratio of the standard deviations of the Gaussian distribution of the SASA of the conformers in the DSE and the NSE, then at this temperature the Gibbs energy of unfolding or folding must be zero.
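The relationship stated above, that mTS-D(T)/mTS-N(T) equals the square root of ω/α when ΔGD-N(T) = 0, can be checked numerically without relying on the closed-form root. The sketch below locates the intersection of the two parabolas by bisection, assuming a DSE-parabola with its vertex at the origin and an NSE-parabola with its vertex at mD-N; the function name and the numerical values are hypothetical.

```python
import math

def crossing_by_bisection(alpha, omega, m_dn, dG_dn, iters=200):
    """Locate the DSE/NSE parabola crossing between the two vertices by bisection."""
    def gap(x):
        # DSE barrier minus NSE barrier corrected for the Gibbs energy of unfolding
        return alpha * x**2 - (omega * (m_dn - x)**2 - dG_dn)

    lo, hi = 0.0, m_dn
    if not (gap(lo) < 0.0 < gap(hi)):
        raise ValueError("crossing does not lie between the vertices")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if gap(mid) < 0.0:
            lo = mid   # crossing lies to the right of mid
        else:
            hi = mid   # crossing lies to the left of mid
    return 0.5 * (lo + hi)

if __name__ == "__main__":
    alpha, omega, m_dn = 1.0, 10.0, 5.0                    # hypothetical values
    x = crossing_by_bisection(alpha, omega, m_dn, 0.0)     # dG = 0 at Tm or Tc
    y = m_dn - x
    print("mTS-D / mTS-N  =", x / y)
    print("sqrt(omega/alpha) =", math.sqrt(omega / alpha))
```

Repeating the calculation with a non-zero ΔGD-N(T) shifts the crossing and breaks the equality, which is the content of the corollary.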
Footnotes
Vinkensteynstraat 128, 2562 TV, Den Haag, Netherlands
robert.sade{at}gmail.com