Protein Collapse is Encoded in the Folded State Architecture

Himadri S. Samanta; Pavel I. Zhuravlev; Michael Hinczewski; Naoto Hori; Shaon Chakrabarti; D. Thirumalai

doi:10.1101/070920

Abstract

Folded states of single domain globular proteins, the workhorses in cells, are compact with high packing density. It is known that the radius of gyration, R_g, of both the folded and unfolded (created by adding denaturants) states increases as N^ν, where N is the number of amino acids in the protein. The values of the celebrated Flory exponent ν are, respectively, ≈ 1/3 and ≈ 0.6 in the folded and unfolded states, which coincide with those found in homopolymers in poor and good solvents, respectively. However, the extent of compaction of the unfolded state of a protein under low denaturant concentration, conditions favoring the formation of the folded state, is unknown. This problem, which goes to the heart of how proteins fold, with implications for the evolution of foldable sequences, is unsolved. We develop a theory based on polymer physics concepts that uses the contact map of proteins as input to quantitatively assess collapsibility of proteins. The model, which includes only two-body excluded volume interactions and attractive interactions reflecting the contact map, has only expanded and compact states. Surprisingly, we find that although protein collapsibility is universal, the propensity to be compact depends on the protein architecture. Application of the theory to over two thousand proteins shows that the extent of collapsibility depends not only on N but also on the contact map reflecting the native fold structure. A major prediction of the theory is that β-sheet proteins are far more collapsible than structures dominated by α-helices. The theory and the accompanying simulations, validating the theoretical predictions, fully resolve the apparent controversy between conclusions reached using different experimental probes assessing the extent of compaction of a couple of proteins. As a by-product, we show that the theory correctly predicts the scaling of the collapse temperature of homopolymers as a function of the number of monomers.
By calculating the criterion for collapsibility as a function of protein length we provide quantitative insights into the reasons why single domain proteins are small and the physical reasons for the origin of multi-domain proteins. We also show that non-coding RNA molecules, whose collapsibility is similar to proteins with β-sheet structures, must undergo collapse prior to folding, adding support to “Compactness Selection Hypothesis” proposed in the context of RNA compaction.

1. INTRODUCTION

Folded states of globular proteins, which are evolved (slightly) branched heteropolymers made from twenty amino acids, are roughly spherical and are nearly maximally compact with high packing densities [1–3]. Despite achieving high packing densities in the folded states, globular proteins tolerate large volume substitutions while retaining the native fold [4]. This is explained in a couple of interesting theoretical studies [5, 6], which demonstrated that there is sufficient free volume in the folded state to accommodate mutations. Collectively these and related studies show that folded proteins are compact. When they unfold, which can be achieved upon addition of high concentrations of denaturants (or applying a mechanical force), they swell, adopting expanded conformations. The radius of gyration (R_g) of a folded globular protein is well described by the Flory law with R_g ∝ N^{1/3} [7], whereas in the swollen state R_g ≈ a_D N^ν, where a_D is an effective monomer size and the Flory exponent ν ≈ 0.6 [8]. Thus, viewed from this perspective we could surmise that proteins must undergo a coil-to-globule transition [9, 10], a process that is reminiscent of the well characterized equilibrium collapse transition in homopolymers [11, 12]. The latter is driven by the balance between conformational entropy and intra-polymer interaction energy resulting in the collapsed globular state. The swollen state is realized in good solvents (interaction between monomer and solvents is favorable) whereas in the collapsed state monomer-monomer interactions are preferred. The coil-to-globule transition in large homopolymers is akin to a phase transition. The temperature at which the interactions between the monomers roughly balance monomer-solvent energetics is the θ temperature. By analogy, we may identify high (low) denaturant concentrations with good (poor) solvent for proteins.

Despite the expected similarities between the equilibrium collapse transition in homopolymers and the compaction of proteins, it is still debated whether the unfolded states of proteins under folding conditions are more compact compared to the states created at high denaturant concentrations. If polypeptide chain compaction is universal, is collapse in proteins essentially the same phenomenon as in homopolymer collapse or is it driven by a different mechanism [13–17]? Surprisingly, this fundamental question in the protein folding field has not been answered satisfactorily [10, 18]. In order to explain the plausible difficulties in quantifying the extent of compaction, let us consider a protein, which undergoes an apparent two-state transition from an unfolded (swollen) to a folded (compact) state as the denaturant concentration (C) is decreased. At the concentration, C_m, the populations of the folded and unfolded states are equal. A vexing question, which has been difficult to unambiguously answer in experiments, is: what is the size, R_g, of the unfolded state under folding conditions (C < C_m)? Small Angle X-ray Scattering (SAXS) experiments on some proteins show practically no change in the unfolded R_g as C is changed [19]. On the other hand, from experiments based on single molecule Fluorescence Resonance Energy Transfer (smFRET) it has been concluded that the size of the unfolded state is more compact below C_m compared to its value at high C [20, 21]. The so-called smFRET-SAXS controversy is unresolved. Resolving this apparent controversy is not only important in our understanding of the physics of protein folding but also has implications for the physical basis of the evolution of natural sequences.

The difficulties in describing the collapse of unfolded states as C is lowered could be attributed to the following reasons. (1) Following de Gennes [22], homopolymer collapse can be pictured as the formation of a large number of blobs driven by local interactions between monomers on the scale of the blob size. Coarsening of blobs results in the formation of the equilibrium globule, with the number of maximally compact conformations scaling exponentially with the number of monomers. Other scenarios resulting in fractal globules, en route to the formation of equilibrium maximally collapsed structures, have also been proposed [23]. The globule formation is driven by non-specific interactions between the monomers or the blobs. Regardless of how the equilibrium globule is reached, it is clear that it is largely stabilized by local interactions, because contacts between monomers that are distant along the sequence are entropically unfavorable. In contrast, even in high denaturant concentrations proteins could have residual structure, which likely becomes prominent at C < C_m. At low C there are specific favorable interactions between residues separated by a few or several residues along the sequence. As their strength grows, with respect to the entropic forces, the specific interactions may favor compaction in a manner different from the way non-specific local interactions induce homopolymer collapse. In other words, the dominant native-like contacts also drive compaction of unfolded states of proteins. (2) A consequence of the impact of the native-like contacts (local and non-local) on collapse of unfolded states is that specific energetic considerations dictate protein compaction resulting in the formation of minimum energy compact structures (MECS) [24]. The number of MECS, which are not fully native, is small, scaling as ln N with N being the number of amino acid residues.
Therefore, below C_m their contributions to R_g have to be carefully dissected, which is more easily done in single molecule experiments than in ensemble measurements such as SAXS. (3) Single domain proteins are finite-sized with N rarely exceeding ~ 200. Most of those studied experimentally have N < 100. Thus, the extent of change in R_g of the unfolded states is predicted to be small, requiring high precision experiments to quantify the changes in R_g as C is changed. For example, in a recent study [25], we showed that in PDZ2 domain the change in R_g of the unfolded states as the denaturant concentration changes from 6 M guanidine chloride to 0 M is only about 8%. Recent experiments have also established that changes in R_g in helical proteins are small [20].

In homopolymers there are only two possible states, coil and globule, with a transition between the two occurring at T_θ. On the other hand, even in proteins that fold in a two-state manner one can conceive of at least three states (we ignore intermediates here): (i) the unfolded state U_D at high C; (ii) the compact but unfolded state U_C, which could possibly exist below C_m; (iii) the native state. Do the sizes of U_D and U_C differ? This question requires a clear answer as it impacts our understanding of how proteins fold, because the characteristics of the unfolded states of proteins play a key role in determining protein foldability [26–28].

Given the flexibility of proteins (persistence length on the order of 0.5 – 0.6 nm), we expect that the size of the extended polypeptide chain must gradually decrease as the solvent quality is altered. Experiments on a number of proteins show that this is the case [29–31]. However, in some SAXS experiments the theoretical expectation that the unfolded state is more compact at low C than at high C was not borne out for one protein [10, 19], precipitating a more general question: are chemically denatured proteins compact at low C? The absence of collapse is not compatible with inferences based on smFRET [21] and theory [26]. Here, we develop a theory that not only resolves the smFRET-SAXS controversy but also provides a quantitative description of how the propensity to be compact is encoded in the native topology. The theory, based on polymer physics concepts, includes specific attractive interactions (mimicking interactions accounting for native contacts in the Protein Data Bank (PDB)) and a two-body excluded volume repulsion. By construction the model does not have a native state. In order to validate the theoretical predictions, we performed simulations using a completely different model often used in protein folding simulations. In both models, there are only two states (analogues of U_D and U_C). The formation of U_C is driven by the contact map of the folded state. Thus, chain compaction is driven in much the same way as in homopolymers, altered only by specific interactions that differentiate proteins from homopolymers.

Theory and simulations predict how the extent of compaction (collapsibility) is determined by the strength and the number of the native contacts and their locations along the chain. We use a large representative selection of proteins from the PDB to establish that collapsibility is an inherent characteristic of evolved protein sequences. A major outcome of this work is that β-sheet proteins are far more collapsible than structures dominated by α-helices. Our theory suggests that there is an evolutionary pressure on proteins for being compact as a pre-requisite for kinetic foldability, as we predicted over twenty years ago [26]. We come to the inevitable conclusion that the unfolded state of proteins must be compact under native conditions, and the mechanism of polypeptide chain compaction has similarities as well as differences to collapse in homopolymers. As a by-product of this work, we also establish that certain non-coding RNA molecules must undergo compaction prior to folding as their folded structures are stabilized predominantly by long-range tertiary contacts.

2. THEORY

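We start with an Edwards Hamiltonian for a polymer chain [32]:

$$ \frac{\mathcal{H}}{k_B T} = \frac{3}{2a_0^2} \int_0^N \left( \frac{\partial \mathbf{r}(s)}{\partial s} \right)^2 ds + \frac{\mathcal{V}(\mathbf{r}(s))}{k_B T}, \qquad (1) $$

where r(s) is the position of the monomer s, a₀ the monomer size, and N is the number of monomers. The first term in Eq. (1) accounts for chain connectivity, and the second term represents volume interactions and favorable interactions between select monomers given by 𝒱(r(s)),

$$ \frac{\mathcal{V}(\mathbf{r}(s))}{k_B T} = \frac{\upsilon}{(2\pi a_0^2)^{3/2}} \int_0^N \int_0^N ds\, ds'\, \exp\left[ -\frac{(\mathbf{r}(s) - \mathbf{r}(s'))^2}{2a_0^2} \right] - \frac{k}{(2\pi\sigma^2)^{3/2}} \sum_{\{s_i, s_j\}} \exp\left[ -\frac{(\mathbf{r}(s_i) - \mathbf{r}(s_j))^2}{2\sigma^2} \right]. \qquad (2) $$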

The first term in Eq. (2) accounts for the homopolymer (non-specific) two-body interactions. It is well established in the theory of homopolymers that in good solvents with υ > 0 the polymer swells with R_g ~ aN^ν (ν ≈ 0.6). In poor solvents (υ < 0) the polymer undergoes a coil-globule transition with R_g ~ aN^ν (ν ≈ 1/3). These are the celebrated Flory laws. Here, we consider only the excluded volume repulsion case (υ > 0).

The second term in Eq. (2) requires an explanation. The generic scenario for homopolymer collapse is based on an observation by de Gennes, who pictured the collapse process as being driven by the initial formation of blobs that arrange to form a sausage-like structure. At later stages the globule forms to maximize favorable intra-molecular contacts while simultaneously minimizing surface tension. Compaction in proteins, although it shares many features with homopolymer collapse, could be different. A key difference is that the folded states of almost all proteins are stabilized by a mixture of local contacts (interactions between residues separated by less than, say, ~ 8 but greater than 3 residues) as well as non-local (> 8 residues) contacts. Note that the demarcation using 8 between local and non-local contacts is arbitrary, and is not germane to the present argument. These specific interactions also dominate the enthalpy of formation of the compact, non-native state U_C, playing an important role in its stability. Previous studies using lattice models of proteins in two [33] and three [34] dimensions showed that formation of compact but unfolded states is predominantly driven by native interactions, with non-native interactions playing a sub-dominant role. A more recent study [35], analyzing atomically detailed folding trajectories, has arrived at the same conclusion. Therefore, our assumption is that the topology of the folded state could dictate collapsibility (the extent to which the U_D state becomes compact as the denaturant concentration is lowered) of a given protein. In combination with the finite size of single domain proteins (N ~ 200), the extent of protein collapse could be small. In order to assess chain compaction under native conditions we should consider the second term in Eq. (2).

It is worth mentioning that several studies investigated the consequences of optimal packing of polymer-like representations of proteins [36–42]. These studies primarily explain the emergence of secondary structural elements by considering only hard core interactions, attractive interactions due to crowding effects [40, 43], or formation of compact states induced by anisotropic attractive patchy interactions [42]. However, the absence of tertiary interactions in these models, which give rise to compact states of varying topologies, prevents them from addressing the coil-to-globule transition. This requires creating a microscopic model along the lines described here.

We note in passing (with discussion to follow) that a number of studies have considered the effect of crosslinks on the shape of polymer chains [44–50]. Polymers with crosslinks have served as models for polymer gels and rubber elasticity [51–53]. In these studies the contacts were either random, leading to the random loop model [45], or explicit averages over the probability of realizing such contacts were made [44, 54], as may be appropriate in modeling gels. These studies inevitably predict a coil-to-globule phase transition as the number of crosslinks increases.

In contrast to models with random crosslinks, in our theory attraction exists only between specific residues, described by the second term in Eq. (2), where the sum is over the set of interactions (native contacts) involving pairs {s_i, s_j}. We use the contact map of the protein (extracted from the PDB structure) in order to assign the specific interactions (their total number being N_nc). A contact is assigned to any two residues s_i and s_j if the distance between their C_α atoms in the PDB entry is less than R_c = 0.8 nm and |s_i − s_j| > 2. We use Gaussian potentials in order to have short (but finite) range attractive interactions. For the excluded volume repulsion, this range is on the order of the size of the monomer, a₀ = 0.38 nm. For the specific attraction, the range is the average distance in the PDB entry between C_α atoms forming a contact (averaged across a selection of proteins from the PDB). We obtain σ = 0.63 nm.
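The contact criterion stated above (C_α distance below R_c = 0.8 nm and sequence separation |s_i − s_j| > 2) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `native_contacts` and its arguments are hypothetical, and the input is assumed to be an array of C_α coordinates in nm that has already been read from a PDB entry.

```python
import numpy as np

def native_contacts(ca_coords_nm, r_c=0.8, min_sep=3):
    """Return residue pairs (i, j), i < j, that satisfy the contact criterion.

    ca_coords_nm : (N, 3) array of C-alpha positions in nm (assumed input).
    r_c          : distance cutoff in nm (0.8 nm in the text).
    min_sep      : smallest allowed |i - j|; 3 corresponds to |i - j| > 2.
    """
    x = np.asarray(ca_coords_nm, dtype=float)
    # All pairwise C-alpha distances.
    dist = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    # Index pairs with j - i >= min_sep (upper triangle, offset by min_sep).
    i, j = np.triu_indices(len(x), k=min_sep)
    close = dist[i, j] < r_c
    return list(zip(i[close].tolist(), j[close].tolist()))
```

The length of the returned list plays the role of N_nc for that structure.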

By changing the value of k, and hence the strength of attraction, there is a transition between the extended and compact states. Decreasing k is analogous to chemically denaturing proteins, although the connection is not precise. At high denaturant concentrations (k ≈ 0, good solvent) the excluded volume repulsion (first term in Eq. (2)) dominates the attraction, while at low C (high k, poor solvent) the attractive interactions are important. The point where attraction balances repulsion is the θ-point, and the value of k = k_θ. Although reserved for the coil-to-globule transition in the limit of N ≫ 1 in homopolymers, we will use the same notation (θ-point) here. In our model, at the θ-point, the chain behaves like an ideal chain. To describe the globular state, a three-body repulsion needs to be added to the Hamiltonian (Eq. (2)), but we focus on the region between the extended coil and the θ-point because our interest is to assess only the collapsibility of proteins. If k_θ is very large then significant chain compaction would only occur at very low (C ≪ C_m) denaturant concentrations, implying low propensity to collapse. Conversely, small k_θ implies ease of collapsibility. Note that the ground state (k ≫ 1) of the Hamiltonian in Eq. (2) is a collapsed chain whose R_g is on the order of the monomer size. In other words, a stable native state does not exist for the model described in Eq. (2). Thus, we define protein collapse as the propensity of the polypeptide chain to reach the θ-point as measured by the k_θ value, and use the changes in the radius of gyration R_g as a measure of the extent of compaction.

Assessing collapsibility: For our model, which encodes protein topology without favoring the folded state, we calculate R_g using the Edwards-Singh (ES) method [55]. Although from a technical viewpoint the ES method has pros as well as cons, numerous applications show that in practice it yields physically sensible results on a number of systems. First, ES showed that the method does give the correct dependence of R_g on N for homopolymers. Second, even when attractive interactions are included, the ES method leads to predictions which have been subsequently verified by more sophisticated theories. An example of particular relevance here is the problem of the size of a polymer in the presence of obstacles (crowding particles). The results of the ES method [56] and those obtained using renormalization group calculations [57] are qualitatively similar. Here, we adopt the ES method, allowing us to deduce more far-reaching conclusions for protein collapsibility than is possible solely based on simulations. We use simulations on a limited set of proteins to further justify the conclusions reached using the analytic theory.

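The ES method is a variational type calculation that represents the exact Hamiltonian by a Gaussian chain, whose effective monomer size is determined as follows. Consider a virtual chain without excluded volume interactions, with the radius of gyration [55],

$$ \langle R_g^2 \rangle = \frac{N a^2}{6}, $$

described by the Hamiltonian,

$$ \frac{\mathcal{H}_\upsilon}{k_B T} = \frac{3}{2a^2} \int_0^N \left( \frac{\partial \mathbf{r}(s)}{\partial s} \right)^2 ds, $$

where the monomer size in the virtual Hamiltonian is a. We split the deviation 𝒲 between the virtual chain Hamiltonian and the real Hamiltonian as,

$$ \mathcal{W} = \mathcal{H} - \mathcal{H}_\upsilon = \mathcal{W}_1 + \mathcal{W}_2, $$

where

$$ \frac{\mathcal{W}_1}{k_B T} = \frac{3}{2} \left( \frac{1}{a_0^2} - \frac{1}{a^2} \right) \int_0^N \left( \frac{\partial \mathbf{r}(s)}{\partial s} \right)^2 ds, \qquad \mathcal{W}_2 = \mathcal{V}(\mathbf{r}(s)). $$

The radius of gyration is $\langle R_g^2 \rangle = \frac{1}{N} \int_0^N \langle \mathbf{r}^2(s) \rangle \, ds$, with the average being,

$$ \langle \mathbf{r}^2(s) \rangle = \frac{\langle \mathbf{r}^2(s)\, e^{-\mathcal{W}/k_B T} \rangle_\upsilon}{\langle e^{-\mathcal{W}/k_B T} \rangle_\upsilon}, $$

where 〈· · · 〉_υ denotes the average over ℋ_υ.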
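As a numerical illustration of the virtual (ideal) chain that serves as the reference state in the ES method, the sketch below samples Gaussian chains with monomer size a and compares the mean squared radius of gyration with the standard large-N ideal-chain value N a²/6. The function name, the number of samples, and the random seed are illustrative choices and are not taken from the original work.

```python
import numpy as np

def mean_rg2_gaussian_chain(n_bonds, a, n_samples=2000, seed=0):
    """Monte Carlo estimate of <R_g^2> for an ideal Gaussian chain.

    Each bond vector is drawn from an isotropic Gaussian with <b^2> = a^2,
    i.e. variance a^2 / 3 per Cartesian component.
    """
    rng = np.random.default_rng(seed)
    bonds = rng.normal(scale=a / np.sqrt(3.0), size=(n_samples, n_bonds, 3))
    start = np.zeros((n_samples, 1, 3))
    # Bead positions: origin followed by cumulative sums of the bond vectors.
    pos = np.concatenate([start, np.cumsum(bonds, axis=1)], axis=1)
    centered = pos - pos.mean(axis=1, keepdims=True)
    rg2 = (centered ** 2).sum(axis=2).mean(axis=1)  # R_g^2 per sampled chain
    return float(rg2.mean())

if __name__ == "__main__":
    n, a = 200, 0.38  # a = 0.38 nm is the monomer size quoted in the text
    print(mean_rg2_gaussian_chain(n, a), n * a**2 / 6)
```

The two printed numbers should agree to within sampling error and a small finite-N correction.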

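Assuming that the deviation 𝒲 is small, we calculate the average to first order in 𝒲. The result is,

$$ \langle \mathbf{r}^2(s) \rangle \approx \langle \mathbf{r}^2(s) \rangle_\upsilon + \frac{1}{k_B T} \left[ \langle \mathbf{r}^2(s) \rangle_\upsilon \langle \mathcal{W} \rangle_\upsilon - \langle \mathbf{r}^2(s)\, \mathcal{W} \rangle_\upsilon \right], $$

and the radius of gyration is

$$ \langle R_g^2 \rangle = \frac{1}{N} \int_0^N ds \left[ \langle \mathbf{r}^2(s) \rangle_\upsilon + \frac{1}{k_B T} \langle \mathbf{r}^2(s) \rangle_\upsilon \langle \mathcal{W} \rangle_\upsilon - \frac{1}{k_B T} \langle \mathbf{r}^2(s)\, \mathcal{W} \rangle_\upsilon \right]. \qquad (A5) $$

If we choose the effective monomer size a in ℋ_υ such that the first order correction (second and third terms on the right hand side of Eq. (A5)) vanishes, then the size of the chain is $\langle R_g^2 \rangle = N a^2 / 6$. This is an estimate to the exact $\langle R_g^2 \rangle$, and is an approximation as we have neglected 𝒲² and higher powers of 𝒲. Thus, in the ES theory, the optimal value of a from Eq. (A5) satisfies,

$$ \frac{1}{N} \int_0^N ds \left[ \langle \mathbf{r}^2(s) \rangle_\upsilon \langle \mathcal{W} \rangle_\upsilon - \langle \mathbf{r}^2(s)\, \mathcal{W} \rangle_\upsilon \right] = 0. $$

Since 𝒲 = 𝒲₁ + 𝒲₂, the above equation can be written as

$$ \int_0^N ds \left[ \langle \mathbf{r}^2(s)\, \mathcal{W}_1 \rangle_\upsilon - \langle \mathbf{r}^2(s) \rangle_\upsilon \langle \mathcal{W}_1 \rangle_\upsilon \right] = \int_0^N ds \left[ \langle \mathbf{r}^2(s) \rangle_\upsilon \langle \mathcal{W}_2 \rangle_\upsilon - \langle \mathbf{r}^2(s)\, \mathcal{W}_2 \rangle_\upsilon \right]. $$

Evaluation of the 〈r²(s)𝒲₁〉_υ term yields,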

With the help of Eq. (11) and Eq. (9) we obtain the following self-consistent expression for a, Calculating the averages in Fourier space, where , we obtain

The best estimate of the effective monomer size a can be obtained by numerically solving Eq. (13) provided the contact map is known. A bound for the actual size of the chain is . Because we are interested only in the collapsibility of proteins we use the definition of the θ-point to assess the condition for protein compaction instead of solving the complicated Eq. (13) numerically. The volume interactions are on the right hand side of Eq. (13). At the θ-point, the υ-term should exactly balance the k-term. Since at the θ-point the chain is ideal with a = a₀, we can substitute this value for a in the sums in the denominators of the υ- and k-terms. By equating the two, we obtain an expression for k_θ. Thus, from Eq. (13), the specific interaction strength at which two-body repulsion (υ-term) equals two-body attraction (k-term) is: The numerator in Eq. (14) is a consequence of chain connectivity and the denominator encodes protein topology through the contact map, determining the extent to which the sizes in U_D and U_C states change as C becomes less than C_m. The numerical value of k_θ is a measure of collapsibility.

A comment about the solution of Eq. (13) for a is worth making. For k = 0, corresponding to the good solvent condition, we expect that a ≫ a₀. In this case, analysis of Eq. (13), in manner described in Appendix A, shows that there is only one solution with . Similarly, at k_θ Eq. (13) also admits only one solution. Thus, from the structure of Eq. (13) we surmise there are no multiple solutions, at least in the extreme limits υ = 0 and k = 0.

The expression for k_θ (Eq. (14)) is equally applicable to homopolymers in which contacts between all monomers are allowed, provided the self-avoidance condition is not violated. In Appendix A, we derive an expression for k_θ ∝ T_θ ~ υ(1 – (υN^−0.5)/2). Thus, our model correctly reproduces the known N dependence of T_θ obtained long ago by Flory [58] using insightful mean field arguments.

3. RESULTS

Native topology determines collapsibility: The central result in Eq. (14) can be used to quantitatively predict the extent to which a given protein has a propensity to collapse. We used a list of proteins with low mutual sequence identity selected from the Protein Data Bank, PDBselect [59], and calculated k_θ using Eq. (14) for these proteins. In all we considered 2306 proteins. For each contact (i,j), the energetic contribution due to interaction between i and j is k̃ = (2πσ²)^−3/2 k according to Eq. (2). Thus, k̃_θ = (2πσ²)^−3/2 k_θ is the average strength (in units of k_BT) of a contact at the θ-point. If k_θ, calculated using Eq. (14), is too large then the extent of polypeptide chain collapse is expected to be small. It is worth reiterating that the theory cannot be used to determine the stability of the folded state, because in the Hamiltonian there are only two states, U_D (k = 0 in Eq. (2)) and U_C (k > k_θ).
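The conversion between the coupling k in Eq. (2) and the per-contact strength is a single prefactor, (2πσ²)^−3/2, which can be evaluated for σ = 0.63 nm as in the sketch below. The function name is hypothetical, and the sketch assumes that k is expressed in units of k_BT nm³ so that the result is in units of k_BT.

```python
import math

SIGMA_NM = 0.63  # range of the specific attraction quoted in the text (nm)

def contact_strength(k, sigma=SIGMA_NM):
    """Per-contact strength (2*pi*sigma^2)^(-3/2) * k.

    Assumes k is given in k_BT * nm^3, so the return value is in k_BT.
    """
    return k * (2.0 * math.pi * sigma**2) ** -1.5

if __name__ == "__main__":
    # Prefactor for a unit coupling.
    print(contact_strength(1.0))
```

For σ = 0.63 nm the prefactor is roughly 0.25 nm⁻³, so under these assumptions a contact strength of a few k_BT corresponds to a coupling k of order ten in these units.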

The strength of contacts in real proteins (excluding possibly salt bridges) is typically on the order of a few k_BT in the absence of denaturants. This is the upper bound for the contact strength any theory should predict, as adding denaturant only decreases the strength. If k_θ is unrealistically high (tens of k_BT) then the attractive interactions of the protein would be too weak to counteract the excluded volume repulsion even at zero denaturant concentration, resulting in negligible difference in R_g between the U_D and U_C states.

Fig.(1a) shows a two-dimensional histogram of the PDBselect proteins in the (N,k_θ) plane. For the majority of small proteins (less than 150 residues) the value of k_θ is less than 3 k_BT, indicating that the unfolded states of all of these proteins should become compact at C < C_m. That collapse must occur, as predicted by our theory and established previously in lattice [26], and off-lattice models of proteins [60], does not necessarily imply that it can be easily detected in standard scattering experiments, because the changes could be small requiring high precision experiments (see below).

FIG. 1:

(a) Collapsibility quantified using k_θ (in units of k_BT) for a set of 2,306 PDB structures as a function of the length N of the proteins. White lines show the k_θ at the boundaries for maximally and minimally collapsible proteins (lower and upper lines respectively). Colors give a rough estimate of the number of proteins, which decreases from red to violet. A dynamic visualization of the data is available at the authors' website [61]. (b) Weight function W (Eq. (15)) of a contact, showing how much a contact between residues i and j contributes to the compaction of a protein. The colors are for different N values (shown in the inset). Interestingly, the location of the maximum is roughly independent of N.

Weight function of a contact: For a given N, the criterion for collapsibility in Eq. (14) depends on the architecture of the proteins explicitly represented in the denominator through the contact map. Analysis of the weight function of a contact, defined below, provides a quantitative measure of how a specific contact influences protein compaction. Some contacts may facilitate collapse to a greater extent than others, depending on the location of the pair of residues in the polypeptide chain. In this case, the same number of native contacts N_nc in the protein of the same length N might yield a lower (easier collapse) or higher (harder collapse) value of k_θ. In order to determine the relative importance of the contacts with respect to collapse, we consider the contribution of the contact between residues i and j in the denominator of Eq. (14), A plot of W(i − j) in Fig.(1b) for different values of the chain length N shows that the weight depends on the distance between the residues along the chain. Contacts between neighboring residues have negligible weight, and there is a maximum in W(i − j) at i − j ≈ 30 (for a₀ /σ = 0.6), almost independent of the protein length. The maximum is at a higher value for proteins with N > 100 residues. The figure further shows that longer range contacts make greater contribution to chain compaction than short range contacts. The results in Fig. (1b) imply that proteins with a large fraction of non-local contacts are more easily collapsible than those dominated by short range contacts, which we elaborate further below.

Maximum and minimum collapsibility boundaries: Using W(i − j) in Eq. (15), we can design protein sequences to optimize for “collapsibility”. To design a “maximally collapsible” protein, for fixed N and number of native contacts N_nc, we assign each of the N_nc contacts one by one to the pair i,j with a maximal W(i,j) among the available pairs with the criterion that |i−j| > 2. Such an assignment necessarily implies that the artificially designed contact map will not correspond to any known protein. Similarly, we can design an artificial contact map by selecting i,j pairs with minimal W(i,j) until all N_nc contacts are assigned. Such a map, which will be dominated by local contacts, corresponds to a minimally collapsible structure.
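The greedy assignment described above can be sketched as follows. The weight function is passed in as a callable because the sketch does not reproduce Eq. (15); `toy_weight` is a purely illustrative stand-in (peaked at a sequence separation of 30) and is not the weight function of the theory. All names are hypothetical.

```python
import math
from itertools import combinations

def design_contact_map(n_residues, n_contacts, weight, maximize=True, min_sep=3):
    """Pick n_contacts residue pairs with the largest (or smallest) weight.

    Only pairs with j - i >= min_sep (i.e. |i - j| > 2) are eligible.
    Assigning contacts one by one to the best remaining pair is equivalent
    to sorting all eligible pairs by weight and taking the first n_contacts.
    """
    pairs = [(i, j) for i, j in combinations(range(n_residues), 2)
             if j - i >= min_sep]
    pairs.sort(key=lambda p: weight(p[0], p[1]), reverse=maximize)
    return pairs[:n_contacts]

def toy_weight(i, j):
    """Illustrative weight with a maximum at |i - j| = 30 (an assumption)."""
    s = abs(i - j)
    return s * math.exp(-s / 30.0)

if __name__ == "__main__":
    print(design_contact_map(100, 5, toy_weight, maximize=True))
    print(design_contact_map(100, 5, toy_weight, maximize=False))
```

With this toy weight the maximally collapsible map selects pairs separated by 30 residues, while the minimally collapsible map selects the most local allowed pairs.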

The white lines in Fig. (1a) show k_θ of chains of length N with N_nc(N) contacts distributed in ways to maximize or minimize collapsibility. We estimated N_nc(N) ≈ 0.6N^γ, with γ ≈ 1.3, from the fit of the proteins selected from the PDBselect set (a fuller discussion is presented in Appendix A). Since the lines are calculated for N_nc from the fit over the entire set, and not from N_nc for every protein, there are proteins below the minimal and above the maximal curve in Fig. (1a). For a given protein, with N and N_nc defined by its PDB structure, k_θ for all possible arrangements of native contacts is largely in between the maximally and minimally collapsible lines in Fig. (1a). The majority of proteins in our set are closer to the maximally collapsible curve, suggesting that the unfolded proteins have evolved to be compact under native folding conditions. This theoretical prediction is in accord with our earlier studies which suggested that foldability is determined by both collapse and folding transitions [26], and more recently supported by experiments [20].
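The fitted relation N_nc(N) ≈ 0.6N^γ with γ ≈ 1.3 gives a quick estimate of how many native contacts a chain of a given length is expected to have. A minimal sketch, with a hypothetical function name:

```python
def estimated_native_contacts(n_residues, prefactor=0.6, gamma=1.3):
    """Estimate N_nc from chain length using N_nc ~ prefactor * N**gamma.

    The default prefactor and exponent are the fit values quoted in the text.
    """
    return prefactor * n_residues ** gamma

if __name__ == "__main__":
    for n in (50, 100, 200):
        print(n, round(estimated_native_contacts(n)))
```

For N = 100 this gives roughly 239 contacts, which is the N_nc one would feed into a maximal or minimal collapsibility design for a chain of that length.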

β-sheet rather than α-helical proteins undergo larger compaction: The weight function W (Eq. (15) and Fig.(1b)) suggests that contacts in α-helices (|i − j| = 4) only make a small contribution to collapse. Contacts corresponding to the maximum of W at i − j ≈ 30 are typically found in loops and long antiparallel β-sheets. Fig.(2) shows a set of proteins with high α-helix (> 90%) and a set with high content of β-sheets (> 70%) [61]. The values of k_θ for the two sets are very distinct, so they barely overlap. We find that many of the α-helical proteins lie on or above the curve of minimal collapsibility while the rest are closer to the maximal collapsibility. The smaller β-rich proteins lie on the curve of maximal collapsibility slightly diverging from it as the chain length grows. These results show that the extent of collapse of proteins that are mostly α-helical is much less than those with predominantly β-sheet structures.

FIG. 2:

Dependence of k_θ on the secondary structure content of proteins. We display k_θ for α-rich (> 90%) and β-rich (> 70%) proteins. Proteins that are predominantly α-helical tend to be close to minimally collapsible (upper line), while β-rich proteins are close to the maximally collapsible curve (lower curve). The green stars are for RNA, with the left one at small N corresponding to the Mouse Mammary Tumor Virus (MMTV) pseudoknot (N=34) and the other being the Azoarcus ribozyme (N=196).

A note of caution is in order. The minimal collapsibility of most α-helical proteins in the set may be a consequence of some of them being transmembrane proteins, which do not fold in the same manner as globular proteins. Instead, the transmembrane α-helices are inserted into the membrane by the translocon, one by one, as they are synthesized. Such proteins would not have the evolutionary pressure to be compact.

Comparison between theory and simulations: The major conclusions, summarized in Figs.(1-2), are based on an approximate theory. In order to validate the theoretical predictions, we performed simulations for 21 proteins using realistic models (see Appendix B for details) that capture the known characteristics of the unfolded states of proteins and the coil to globule transition.

In accord with our theoretical predictions, R_g decreases as k increases. For k = 0, corresponding to the maximally expanded state (high denaturant concentration) we expect that R_g ≈ a_DN^0.588. A plot of R_g versus N^0.588 is linear with a value of a_D = 0.25 nm (Fig.3a). Remarkably, this finding is in accord with the experimental fit showing R_g ≈ a_DN^0.588 with a_D = 0.2 nm [8]. The modest increase in the a_D, compared to the experimental fit, predicted here can be explained by noting that in real proteins there is residual structure even at high denaturant concentrations whereas in our model this is less probable. The scaling shown in Fig. (3a) shows that the model used in the simulations provides a realistic picture of the unfolded states. We emphasize that the parameters in the simulations were not adjusted to obtain the correct R_g scaling or a_D.
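The prefactor a_D is obtained from a straight-line fit of R_g against N^0.588. The sketch below shows one way such a fit could be done with an ordinary least-squares line; it is not the authors' fitting procedure. The data in the usage example are synthetic points generated from the fit line reported in the caption of Fig. 3 (R_g = 0.25N^0.588 − 0.15 nm), not simulation output, and the function name is hypothetical.

```python
import numpy as np

def fit_flory_prefactor(n_values, rg_values, nu=0.588):
    """Least-squares fit of R_g = a_D * N**nu + c; returns (a_D, c)."""
    x = np.asarray(n_values, dtype=float) ** nu
    y = np.asarray(rg_values, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)  # degree-1 polynomial: slope, intercept
    return float(slope), float(intercept)

if __name__ == "__main__":
    # Synthetic points lying exactly on the reported fit line.
    n = np.array([50, 100, 150, 200, 250])
    rg = 0.25 * n**0.588 - 0.15
    print(fit_flory_prefactor(n, rg))
```

With real simulation data the residuals of this fit would indicate how well the Flory scaling holds over the sampled range of N.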

FIG. 3:

(a) Average R_g at k = 0 is plotted as a function of N^0.588 for the 21 proteins. The line is a fit, R_g = 0.25N^0.588 − 0.15 (nm). (b) The probability distribution of the radius of gyration, P(R_g), for different values of interaction strength k for protein-L. As k increases, the distribution becomes narrower. (c) Same as (b) except this panel shows the end-to-end distance distribution P(R_ee) for different values of attractive strength k for protein-L. The similarity between P(R_ee) and P(R_g) shows that R_ee also is a reasonable measure of compaction.

In Fig. (4) we show the dependence of R_g as a function of k for three representative proteins along with their native and unfolded structures and contact maps. Myoglobin, an α-helical protein, and β-lactoglobulin, which has β-sheet architecture, have nearly the same number of amino acids, N ~ 150. The sizes of the two proteins are similar (Fig.4b) when k is small (k < 0.5), implying that the values of R_g in the unfolded states are determined solely by N (see Fig.3a). For each protein, we identified k_θ from simulations with the k value at which is a minimum. Using this method, we find that the k_θ value for β-lactoglobulin is less than for myoglobin. This result is consistent with the theoretical prediction, demonstrating that generically α proteins are less collapsible than β proteins. Interestingly, TIM barrel, an α/β protein with larger chain length (N = 246), collapses at k_θ = 1.6, which is larger than for β-lactoglobulin but smaller than for myoglobin (purple line in Fig.4b). These results are qualitatively consistent with theoretical predictions.

FIG. 4:

Collapse transitions revealed by simulations for three representative proteins. (a) Contact maps and ribbon-diagram structures of three proteins, β-lactoglobulin (top), TIM barrel (middle), and myoglobin (bottom). Representative structures in the simulations are also shown for three values of k. (b) Average radius of gyration, 〈R_g〉, monotonically decreases as k increases. The three proteins with different native topology have different k_θ values with myoglobin being less collapsible (larger k_θ) than β-lactoglobulin. (c) Average end-end distance, 〈R_ee〉, also monotonically decreases as k increases although the changes in R_ee are larger than in R_g. The middle panel shows snapshots from simulations at different k. The predicted conformation at k ≈ k_θ is not random, supporting experiments showing persistent structures in the collapsed state of proteins.

In Fig. (5), we compare the predicted k_θ (Eq. (14)) and the values from simulations. The absolute values of k_θ are different between simulations and theory because we used entirely different models to describe the coil to globule transition. The potential used in the theory, convenient for deriving an analytic expression for k_θ, is far too soft to describe the structures of polypeptide chains. As a result, the polypeptide chains explore small R_g values without significant energetic penalty. Such unphysical conformations are prohibited in the realistic model used in the simulations. Consequently, we expect that the theoretical values of k_θ should differ from the values obtained in simulations. Despite the differences in the potentials used in theory and simulations, the trends in k_θ predicted using theory are the same as in simulations. The Pearson correlation coefficient is ρ = 0.79. Since we examined only 21 proteins in simulations, far fewer than the 2306 proteins for which theoretical predictions were made, we analyzed the correlation data by the bootstrap method to ascertain the statistical significance of ρ. The estimated probability distribution of ρ is shown in Fig. (5b). The mean of the correlation coefficient is 0.78 and ρ_90% > 0.61 with 90% confidence. The distribution is bimodal, indicating that there is at least one outlier in the data set, which is likely to be the three helix bundle B domain of Protein A (labeled 5 in Fig. (5)). For the 20 proteins excluding Protein A, the distribution has a single peak (green broken line) with the mean 0.88 and ρ_90% > 0.82 (green dotted line in Fig. (5)). From these results, we surmise that both theory and simulations qualitatively lead to the conclusion that proteins with β-sheet architecture are more collapsible than α-helical structures, which is one of the major predictions of this work.
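The bootstrap analysis described above can be sketched as follows. This is a minimal illustration, not the procedure actually used for Fig. (5b): the paired values in `theory` and `simulation` are hypothetical numbers invented for the example, the helper names are ours, and reading ρ_90% as the value exceeded by 90% of the bootstrap replicates is an assumption.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient; NaN if either variable has zero variance."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0.0 or syy == 0.0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def bootstrap_pearson(xs, ys, n_resamples=2000, seed=0):
    """Resample (x, y) pairs with replacement; return the sorted correlations."""
    if math.isnan(pearson(xs, ys)):
        raise ValueError("input data have zero variance")
    rng = random.Random(seed)
    n = len(xs)
    values = []
    while len(values) < n_resamples:
        idx = [rng.randrange(n) for _ in range(n)]
        r = pearson([xs[i] for i in idx], [ys[i] for i in idx])
        if not math.isnan(r):  # skip degenerate resamples
            values.append(r)
    values.sort()
    return values


def lower_bound(sorted_values, confidence=0.90):
    """Value exceeded by a fraction `confidence` of the bootstrap replicates."""
    k = int((1.0 - confidence) * len(sorted_values))
    return sorted_values[k]


# Hypothetical paired k_theta values (theory, simulation), for illustration only.
theory = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
simulation = [0.8, 1.1, 1.3, 1.9, 2.0, 2.6, 2.4]

replicates = bootstrap_pearson(theory, simulation)
print("rho =", round(pearson(theory, simulation), 2))
print("bootstrap mean =", round(sum(replicates) / len(replicates), 2))
print("rho_90% >", round(lower_bound(replicates), 2))
```

A single strongly deviating data point is either included in or absent from each resample, so it can produce a second peak in the histogram of `replicates`, which is the kind of bimodality discussed in the text.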

FIG. 5:

(a) Correlation between simulation results and theoretical predictions for k_θ. The trends observed in simulations are consistent with theoretical predictions. The horizontal axis is theoretical k_θ, and the vertical axis is k_θ value from simulations. In general, the theoretical k_θ values are larger than what is obtained in simulations (see the main text for an explanation), with the exception of protein labeled 5, a small all α-helical B domain of protein A. In both theory and simulations, all-α proteins (blue crosses) have greater k_θ, and all-β proteins (red circles) have smaller k_θ. The purple triangles are for proteins with α/β and α + β architecture. The linear-regression line for all data points is shown in black and the Pearson correlation coefficient is 0.79. Following is the complete list of 21 proteins with their PDB code and number of residues in parentheses. 1: Myoglobin (1mbo, 153); 2: Spectrin (3uun, 116); 3: Endonuclease III (2abk, 211); 4: BRD2 Bromodomain (5ibn, 111); 5: B domain of Protein A (1bdd, 51); 6: Villin headpiece (1vii, 36); 7: Homeodomain (1enh, 49); 8: GFP (1gfl, 230); 9: β-lactoglobulin (1beb, 156); 10: PDZ2 (1gm1, 94); 11: src SH3 (1srl, 56); 12: CspTm (1g6p, 66); 13: TIM Barrel (1r2r, 246); 14: Lysozyme (2lyz, 129); 15: CheY (3chy, 128); 16: Protein L (1K53, 64); 17: Barstar (1bta, 89); 18: RNase H (2rn2, 155); 19: Proteinase K (2id8, 279); 20: Ubiquitin (1ubq, 76); 21: Monellin (1iv9, 96). (b) Population distribution of the correlation coefficient estimated by the bootstrap analysis. The blue curve is generated for the data set of all 21 proteins examined in the simulations. The mean of correlation coefficient is ρ = 0.78 with ρ_90% > 0.61 (vertical dotted line) with 90% confidence. The distribution has two peaks indicating that there is at least one outlier in the data set, which is the B domain of Protein A. 
For the remaining 20 proteins, the distribution has a single peak (green broken line) with the mean 0.88 and ρ_90% > 0.82 (green dotted line).

Given that the simulations describe the characteristics of the unfolded states, we show in Fig.(3b) the variations in the probability distribution of R_g, P(R_g), for protein-L as a function of k. The broadest distribution, with k = 0, corresponds to the extended chain. We find that P(R_g) becomes narrower as the attractive strength (k) increases. The continuous shift to the compact state with gradual increase in the attractive strength is consistent with experiments showing that unfolded proteins collapse as the denaturant concentration decreases. Thus, generally R_g of the U_C state is less than that of the U_D state. The end-to-end distance distribution, P(R_ee), for different values of k in Fig.(3c) is broad at k = 0, corresponding to the unfolded protein. The average R_ee decreases as the attractive strength increases, and the distribution becomes narrower. The results in Fig.(3) show that both R_ee, which can be inferred using smFRET, and R_g (measurable using SAXS) are smaller in the U_C state than in the U_D state. However, the extent of decrease is greater in R_ee than in R_g, an observation that has contributed to the smFRET-SAXS controversy.
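For concreteness, the two size measures compared above can be computed from the bead coordinates of a single conformation as in the sketch below. It assumes a coarse-grained chain with one bead of equal mass per residue; the function names and the straight-rod test conformation (beads spaced by 0.38 nm) are illustrative only.

```python
import math


def radius_of_gyration(coords):
    """R_g: root-mean-square distance of the beads from their centroid."""
    n = len(coords)
    center = [sum(c[d] for c in coords) / n for d in range(3)]
    sq_sum = sum(
        sum((c[d] - center[d]) ** 2 for d in range(3)) for c in coords
    )
    return math.sqrt(sq_sum / n)


def end_to_end_distance(coords):
    """R_ee: distance between the first and the last bead."""
    return math.dist(coords[0], coords[-1])


# Illustrative conformation: five beads on a straight line, 0.38 nm apart.
rod = [(0.38 * i, 0.0, 0.0) for i in range(5)]
print("R_g  =", round(radius_of_gyration(rod), 3), "nm")
print("R_ee =", round(end_to_end_distance(rod), 3), "nm")
```

Averaging these quantities over the sampled conformations at each k gives 〈R_g〉 and 〈R_ee〉, and histogramming them gives P(R_g) and P(R_ee).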

RNAs are compact: There are major differences between how RNA and proteins fold [62]. In contrast to the apparent controversy in proteins, it is well established that RNA molecules are compact [63–65] at high ion concentrations or at low temperatures. Because our theory relies only on knowledge of the contact map, we used it to assess collapsibility in the Azoarcus ribozyme and the MMTV pseudoknot, merely to illustrate the collapsibility of RNA (Fig. (6)). The k_θ values (green stars in Fig. (2)) are close to the lower β-sheet line, indicating that these molecules must undergo compaction as they fold. This prediction from the theory is fully supported by both equilibrium and time-resolved SAXS experiments [66] on Azoarcus ribozyme. In this case (N = 196) the changes are so large that collapse is readily observed even in low resolution experiments [67]. We should emphasize that the sizes of different RNAs (for example viral, coding, non-coding) vary greatly. For a fixed length, single-stranded viral RNAs have evolved to be maximally compact, which is rationalized in terms of the density of branching. Although the viral RNAs considered in [68] are much longer than the Azoarcus ribozyme, the notion that compaction is determined by the density of branching might be valid even when N ~ 200.

FIG. 6:

Native topologies of two RNA molecules, MMTV pseudoknot (a–c) and Azoarcus ribozyme (d-f). Three-dimensional structures (a and d), secondary structures (b and e), and contact maps (c and f) are shown for each RNA. Colors are used to distinguish secondary structures. Contact pairs in RNA are defined as any nucleotide pair i and j (|i–j| > 2) satisfying R_ij < 14, where R_ij is the distance between centers of mass of the nucleotides [76]. MMTV has two stem basepairs (cyan and green in a-c), which contribute to non-local contacts (cyan and green in c). Azoarcus ribozyme has several hairpin basepairs (P2, P5, P6, P8 and P9) which can be seen in the vicinity of the diagonal in the contact map (f). There are also basepairs between nucleotides far along the sequence such as P3, P4 and P7, as well as tertiary interactions such as TL2-TR8 and TL9-TR5. These non-local contacts contribute to the collapsibility of the ribozyme.

Dependence of k_θ on the values of the cut-off: In order to ensure that the theoretical predictions do not change qualitatively if the cutoff values are changed, we varied them over a reasonable range. The reason for our choice of R_c is that in the majority of folding simulations using the C_α representation of proteins, R_c = 0.8 nm is typically used. Consider the variation of k_θ with R_c, the cut-off used to define contacts, at a fixed σ = 0.63 nm. As R_c increases, the number of contacts also increases. From Eq. (14) it follows that k_θ should decrease, which is borne out in the results in Fig.(8a). Reassuringly, the trends are preserved. In particular, the prediction that β-sheet proteins are most collapsible is independent of R_c. The trend that β-rich proteins are more collapsible than α-rich proteins remains the same irrespective of the R_c values.
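The statement that the number of contacts grows with R_c can be illustrated with a minimal contact-map sketch. The minimum sequence separation used here (`min_separation=3`) is an assumption, as is the toy helix geometry (radius 0.23 nm, rise 0.15 nm and 100° per residue), which merely stands in for C_α coordinates read from a PDB file.

```python
import math


def contact_pairs(coords, cutoff, min_separation=3):
    """Return pairs (i, j), j - i >= min_separation, closer than `cutoff`."""
    pairs = []
    n = len(coords)
    for i in range(n):
        for j in range(i + min_separation, n):
            if math.dist(coords[i], coords[j]) < cutoff:
                pairs.append((i, j))
    return pairs


def toy_helix(n, radius=0.23, rise=0.15, turn_deg=100.0):
    """Hypothetical helix-like trace (nm), used only as stand-in coordinates."""
    coords = []
    for i in range(n):
        angle = math.radians(turn_deg * i)
        coords.append(
            (radius * math.cos(angle), radius * math.sin(angle), rise * i)
        )
    return coords


helix = toy_helix(30)
for r_c in (0.6, 0.8, 1.0):
    print(f"R_c = {r_c} nm: {len(contact_pairs(helix, r_c))} contacts")
```

Every pair counted at a smaller R_c is also counted at a larger one, so the number of contacts can only stay the same or increase with the cut-off.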

Fig.(8b) shows the changes in k_θ for proteins as a function of σ (contact distance) for fixed R_c = 0.8 nm. The k_θ values decrease with increasing σ. The predicted trend is independent of the precise value. It is worth emphasizing that the prediction based on simulations, that the size of the proteins at k_θ is about (5-8)% of the folded state, was obtained using σ = 0.63 nm. This range is consistent with estimates based on experiments on a few proteins (see for example [69]). Higher values of σ would give values of compact states of proteins that are less than the native state R_g.

4. DISCUSSION

We have shown that polymer chains with specific interactions, like proteins (but ones without a unique native state), become compact as the strength of the specific interaction changes. A clear implication is that the size of the U_D state should decrease continuously as C decreases. In other words, the unfolded state under folding conditions is more compact than it is at high denaturant concentrations. Compaction is driven roughly by the same mechanism as the collapse transition in homopolymers in the sense that when the solvent quality is poor (below C_m) the size of the unfolded state decreases continuously. When the set of specific interactions is taken from protein native contacts in the PDB, our theory shows that the values of k_θ are in the range expected for interaction between amino acids in proteins. This implies that collapsibility should be a universal feature of foldable proteins but the extent of compaction varies greatly depending on the architecture in the folded state. This is manifested in our finding that proteins dominated by β-sheets are more collapsible compared to those with α-helical structures.

Magnitude of k_θ and plausible route to multi-domain formation: The scaling of k_θ with N allows us to provide arguments for the emergence of multi-domain proteins. In Eqs. (13) or (14) attractive (k-) and repulsive (v-) terms have the same structure. The only difference in their scaling with N is due to the difference in the sums (over all the monomers in the repulsive term and over native contacts in the attractive term). Double summation over all the monomers gives a factor of N² to the repulsive term. The summation over native contacts in the attractive term scales as N_nc. Therefore, to compensate for the repulsion, N_nc should scale as N². However, for a given protein with a certain length N and a certain number of contacts, it is not clear how the denominator in Eq. (14) scales with N. Empirically, we find that N_nc across a representative set of sequences scales as N^γ with γ at most ≈ 1.3 (Appendix A). Thus, it follows from Eq. (14) that k_θ increases without bound as N continues to increase. Because this is unphysical, it would imply that proteins whose length exceeds a threshold value N_C cannot become maximally compact even at C = 0. An instability must ensue when N exceeds N_C. This argument in part explains why single domain proteins are relatively small [70].

Scaling of N_nc as a power law in N^γ means that as the protein size grows, the value of k_θ will deviate more and more from those found in globular proteins, implying such proteins cannot be globally compact under physiologically relevant conditions. However, such an instability is not a problem because larger proteins typically consist of multiple domains. Thus, if the protein does not show collapse as a whole, the individual domains could fold independently, having lower values of k_θ for each domain of the multi-domain protein. It would be interesting to know if the predicted onset of instability at N_C provides a quantitative way to assess the mechanism of formation of multi-domain proteins. Extension of the theory might yield interesting patterns in the assembly of multi-domain proteins. For instance, one can quantitatively ascertain if the N-terminal domains of large proteins, which emerge from the ribosome first, have higher collapsibility (lower k_θ) than C-terminal domains.

SAXS-smFRET controversy resolved: Our theory resolves, at least theoretically, the contradictory results using SAXS and FRET experiments on compaction of small globular proteins. It has been argued, based predominantly on SAXS experiments on protein-L (N = 72), that R_g of the U_D and U_C states are virtually the same at denaturant concentrations that are less than C_m [19]. This conclusion is not only at variance with SAXS experiments on other proteins but also with the interpretation of smFRET data on a number of proteins. The present work, surveying over 2300 proteins, shows that the compact state has to exist, engendered by mechanisms that have much in common with homopolymer collapse. For protein-L, k_θ = 1.7k_BT, a very typical value, is right on the peak of the heat map in Fig.(1). We have previously argued that because the change in R_g between the U_D and U_C states for small proteins is not large, high precision experiments are needed to measure the predicted changes in R_g between U_C and U_D. For protein-L the change is less than 10% [71], making its detection in ensemble experiments very difficult. Similar conclusions were reached in recent experiments [20]. A clear message from our theory is that, tempting as it may be, one cannot draw universal conclusions about polypeptide compaction by performing experiments on just a few proteins. One has to survey a large number of proteins with varying N and native topology to quantitatively assess the extent of compaction. Our theory provides a framework for interpreting the results of such experiments.

Random contact maps, local and non-local contacts: In order to differentiate collapsibility between evolved and random proteins, we created twelve random contact maps keeping the total number of contacts the same as in protein-L (see Fig.(7) for examples). For each of these pseudo-proteins we calculated k_θ using Eq. (14). We find that for all the random contact maps the k_θ values are less than for protein-L, implying that the propensity of the pseudo-proteins to become compact is greater than for the wild type. This finding is in accord with studies based on homopolymer and heteropolymer collapse with random crosslinks. These studies showed that the polymer undergoes a collapse transition as the density of crosslinks is increased [45, 47, 48]. Of particular note is the demonstration by Camacho and Schanke [50], who showed using exact enumeration of random heteropolymers and scaling arguments that the collapse can be either a first or second order transition depending on the fraction of hydrophobic residues.

FIG. 7:

Collapsibility for synthetic contact maps. Two representative contact maps for each category are shown in the upper left and lower right of each square. Given the number of residues N = 72 and total number of contacts N_nc = 185 (same as protein-L), residue pairs (i,j) are randomly chosen to satisfy the following conditions: (a) uniformly distributed, |i − j| > 3; (b) local contacts only, |i − j| ≥ 3 and |i − j| ≤ 8; and (c) non-local contacts only, |i − j| > 8. The calculated values of k_θ are explicitly shown. The k_θ value for protein-L is 1.7k_BT. (d) Distribution of the fraction of non-local contacts in the 2306 proteins. For each protein, the fraction is calculated as the number of non-local contacts divided by the total number of contacts (N_nc). A contact between residues i and j is “non-local (NL)” if |i − j| ≥ 8. There is a clear separation in this distribution for proteins rich in α-helices compared to those that are rich in β-sheets, implying that the latter are more collapsible than the former.

Some time ago Abkevich et al. [72] showed, using Monte Carlo simulations of protein-like lattice polymers, that the folding transition in proteins with predominantly non-local contacts was first order like, which is not the case for proteins in which local contacts dominate. In light of this finding, it is interesting to examine how compaction is affected by local and non-local contacts. We created for N = 72 (protein-L) contact maps with 185 (same number as in WT protein-L) predominantly local contacts (Fig.(7b)). The values of k_θ for these pseudo-proteins are considerably larger than for the WT, implying that proteins dominated by local contacts are minimally collapsible. We repeated the exercise by creating contact maps with predominantly non-local contacts (Fig.(7c)). Interestingly, k_θ values in this case are significantly less than for the WT. This finding explains why in proteins with varied α/β topology there is a balance between the number of local and non-local contacts. Such a balance is needed to achieve native state stability and speed of folding [72], with polypeptide compaction playing an integral part [26].
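One way to build synthetic contact maps of the kind described above is sketched below. It draws N_nc = 185 distinct residue pairs for N = 72 at random from all pairs whose sequence separation lies in a chosen window. The function name is ours, the uniform sampling scheme is an assumption about how such maps could be generated, and the local window 3 ≤ |i − j| ≤ 8 is our reading of the conditions in the caption of Fig. (7).

```python
import random


def random_contact_map(n_residues, n_contacts, min_sep, max_sep=None, seed=None):
    """Draw distinct pairs (i, j) with min_sep <= j - i <= max_sep at random."""
    if max_sep is None:
        max_sep = n_residues - 1
    candidates = [
        (i, j)
        for i in range(n_residues)
        for j in range(i + min_sep, min(i + max_sep, n_residues - 1) + 1)
    ]
    if n_contacts > len(candidates):
        raise ValueError("not enough residue pairs in the requested window")
    rng = random.Random(seed)
    return sorted(rng.sample(candidates, n_contacts))


# The three categories, with N and N_nc taken from protein-L.
uniform_map = random_contact_map(72, 185, min_sep=4, seed=1)            # |i-j| > 3
local_map = random_contact_map(72, 185, min_sep=3, max_sep=8, seed=1)   # 3..8
nonlocal_map = random_contact_map(72, 185, min_sep=9, seed=1)           # |i-j| > 8

for name, cmap in (("uniform", uniform_map), ("local", local_map),
                   ("non-local", nonlocal_map)):
    mean_sep = sum(j - i for i, j in cmap) / len(cmap)
    print(f"{name}: {len(cmap)} contacts, mean |i-j| = {mean_sep:.1f}")
```

Each generated map would then be used in place of the native contact map when evaluating k_θ from Eq. (14).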

Based on these findings we conclude that R_g of the unfolded states of proteins dominated by non-local contacts must undergo greater compaction compared to those that have mostly local contacts. The results in Fig. (2) also show that proteins rich in β-sheet are more collapsible than predominantly α-helical proteins. It follows that β-sheet proteins must have a larger fraction of non-local contacts than proteins rich in α-helices. In Fig. (7d) we plot the distribution of the fraction of non-local contacts for the 2306 proteins. Interestingly, there is a clear separation in the distribution of non-local contacts between α-helical rich and β-sheet rich proteins. The latter have a substantial fraction of non-local contacts, which readily explains the findings in Fig. (7c) and the predictions in Fig. (2).
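The fraction of non-local contacts plotted in Fig. (7d) can be computed from a list of contact pairs as in the following sketch, which uses the threshold |i − j| ≥ 8 stated in the caption. The two toy contact lists (a helix-like pattern of i, i+4 contacts and a hairpin-like pattern pairing residue i with residue N − 1 − i) are hypothetical and serve only to show the two limits.

```python
def nonlocal_fraction(pairs, threshold=8):
    """Fraction of contacts (i, j) with |i - j| >= threshold."""
    if not pairs:
        raise ValueError("contact list is empty")
    n_nonlocal = sum(1 for i, j in pairs if abs(i - j) >= threshold)
    return n_nonlocal / len(pairs)


n = 20
helix_like = [(i, i + 4) for i in range(n - 4)]        # local i, i+4 contacts
hairpin_like = [(i, n - 1 - i) for i in range(8)]      # strand pairing

print("helix-like  :", nonlocal_fraction(helix_like))
print("hairpin-like:", nonlocal_fraction(hairpin_like))
```

In the helix-like pattern every contact is local, whereas in the hairpin-like pattern only the pairs nearest the turn fall below the threshold.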

5. CONCLUSIONS

We have created a theory to assess collapsibility of proteins using a combination of analytical modeling and simulations. The major implications of the theory are the following. (i) Because single domain proteins are small, the changes in the radius of gyration of the unfolded states as the denaturant concentration is lowered are often small. Thus, it has been difficult to detect the R_g changes using SAXS experiments in a couple of proteins, raising the question of whether unfolded polypeptide chains become compact below C_m. Here, we have solved this long-standing problem by showing that the unfolded states of single-domain proteins do become compact as the denaturant concentration decreases, sharing much in common with the physical mechanisms governing homopolymer collapse. By adopting concepts from polymer physics, and using the contact maps that reflect the topology of the native states, we established that proteins are collapsible. Simulations using models that describe the unfolded states of proteins reasonably well further confirm the conclusions based on theory. (ii) Based on a survey of over two thousand proteins, we surmise that there is evolutionary pressure for collapsibility, which is universal although the extent of collapse can vary greatly, because this ensures that the propensity to aggregate is minimized even if environmental fluctuations under cellular conditions transiently populate unfolded states. Two factors contribute to aggregation. First, the rate of dimer formation by a diffusion controlled reaction would be enhanced if a pair of U_D rather than U_C molecules collided due to cellular stress, because the contact radius in the former would be greater than in the latter. Second, the fraction of exposed hydrophobic residues in U_D is much greater than in U_C, thus greatly increasing the probability of aggregation. The second factor is likely to be more important than the first. Consequently, transient population of U_C due to cellular stress minimizes the probability of aggregation. (iii) We have also shown that the position of the residues forming the native contacts greatly influences collapsibility, with β-sheet proteins (containing a number of non-local contacts) showing greater compaction than α-helical proteins, which are typically stabilized by local contacts.

Our theory also shows that most RNAs may have evolved to be compact in their natural environments. Although the evolutionary pressure to be compact is likely to be substantial for viral RNAs [64, 65, 68, 73], it is apparent that even non-coding RNAs are likely to be almost maximally compact in their natural environments. Our theory suggests that, to a large extent, collapsibility of RNA is similar to that of proteins with β-sheet structures. Both classes of biological macromolecules are stabilized by non-local contacts. Interestingly, it has been argued that the need to be compact (“Compaction selection hypothesis” [73]) could be a major determinant for evolved biopolymers to have minimum energy compact structures as their ground states.

Acknowledgements:

This work was supported by a grant from the National Science Foundation (CHE 16-36424). We acknowledge the Texas Advanced Computing Center (TACC) at The University of Texas at Austin for providing resources for the simulations.

Appendix A: Collapse of homopolymers:

The theory described for protein collapse resulting in Eq. (14) is general and applicable to the collapse of homopolymers as well. We show in this Appendix that the ES formalism can be used to derive the scaling of k_θ with N, the number of monomers.

Consider a homopolymer with the following Hamiltonian: where r(s) is the position of the monomer s, and a₀ is the monomer size. The first term in Eq. (A1) accounts for chain connectivity, and the second term represents volume interactions and favorable interactions between monomers, given by V_H(r(s)),

The form of V_H(r(s)) is exactly the same as in Eq. (2), except that in the above equation all monomers interact favorably as long as self-avoidance is not violated, whereas in Eq. (2) attractive interactions depend on the topology of the protein. The first (second) term in Eq. (A2) describes non-specific excluded volume (attractive) interactions. Thus, the model in Eq. (A1) describes the behavior in good solvents (k = 0) as well as the transition point at which there is a transition to the collapsed state. For the excluded volume repulsion, the range of interactions is on the order of the size of the monomer a₀, and for attractive interactions, the range is σ. In good solvents, with υ > 0, the polymer swells with R_g ~ aN^ν (ν ≈ 0.6). In poor solvents (υ < 0), the polymer undergoes a coil-globule transition with R_g ~ aN^(1/3). These are the well-known Flory laws.

Following the ES method described in the main text, we arrive at the self-consistent equation for a for the homopolymer chain,

To obtain an expression for the θ-point we derive the condition for homopolymer collapse instead of solving the complicated Eq. (A3) numerically. The volume interactions are on the right hand side of Eq. (A3). At the θ-point, the υ-term should exactly balance the k-term arising from attractive interaction between the monomers. Since at the θ-point the chain is ideal with a = a₀, we can substitute this value for a in the sums in the denominators of the υ- and k-terms, to obtain an expression for k_θ. Thus, from Eq. (A3), the specific interaction strength at which two-body repulsion (υ-term) equals two-body attraction (k-term) is: The expression for k_θ in Eq. (A4) for homopolymers differs from k_θ (Eq. (14)) for proteins only by the term in the denominator. The sum over specific interactions for proteins is replaced by the non-specific interaction in Eq. (A4). It can be shown that the N dependence is the same in both the numerator and denominator in Eq. (A4). Therefore, to leading order in 𝒲, k_θ is independent of N for a homopolymer.

In order to derive the scaling of k_θ with N, we need to analyze the corrections arising from second order in 𝒲. To second order in 𝒲, the radius of gyration is, In the expression only the contribute to k_θ. Here, 𝒲₁ is the same as Eq. (5), and 𝒲₂ is given by Eq. A2. The terms associated with 𝒲₁ are zero at the θ-transition point. By counting the powers of N it follows that scales as and scales as . Hence, at the θ-point, we find that k_θ satisfies the following quadratic equation, in the large N limit. The scaling law for k_θ (∝ T_θ) obtained first by Flory [58], was confirmed using simulations much later [74]. To our knowledge this is the first microscopic derivation of the result. Thus, our general formalism can be applied to describe collapse of homopolymers as well as proteins and RNA.

Proteins: The results for homopolymers given above may be extended to obtain the N dependence of k_θ for proteins. By considering the second order correction to the radius of gyration, we obtain the following quadratic equation for k_θ, In deriving the above equation we assume that total number of contacts N_nc ~ N^γ. A plot of N_nc as a function of N (Fig. (8e)) for the PDBselect proteins confirms that this is indeed the case. For γ = 1.3, k_θ ~ N^0.9, which shows that larger proteins are less collapsible than smaller ones, implying that when N exceeds a critical value they are likely to form multi-domain structures. Comparison of Eqs. (A6) and (A7) shows that collapsibility in proteins and homopolymers differs dramatically. For homopolymers the coil-to-globule transition occurs at a finite temperature. The sharpness of the transition increases as N increases. In sharp contrast, the growth of k_θ with N for proteins (Eq. (A7)) implies that larger proteins must organize themselves into domains with individual domains forming compact structures.
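The empirical relation N_nc ~ N^γ used above can be checked with a simple power-law fit. The sketch below performs an ordinary least-squares fit in log-log space, which is one common choice and not necessarily the procedure used for the green line in Fig. (8e). The input data are synthetic values generated from the quoted fit N_nc = 0.6N^1.3, so the example only demonstrates that the fit recovers known parameters.

```python
import math


def fit_power_law(ns, counts):
    """Fit counts = c * N**gamma by linear least squares on the logarithms."""
    xs = [math.log(n) for n in ns]
    ys = [math.log(c) for c in counts]
    m = len(xs)
    mean_x = sum(xs) / m
    mean_y = sum(ys) / m
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    gamma = sxy / sxx
    prefactor = math.exp(mean_y - gamma * mean_x)
    return prefactor, gamma


# Synthetic (not measured) contact counts that follow the quoted fit exactly.
lengths = [50, 100, 150, 200, 300]
synthetic_counts = [0.6 * n ** 1.3 for n in lengths]

prefactor, gamma = fit_power_law(lengths, synthetic_counts)
print(f"N_nc = {prefactor:.2f} * N^{gamma:.2f}")
```

With real data, `lengths` and `synthetic_counts` would be replaced by the chain lengths and contact counts of the proteins in the survey.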

FIG. 8:

(a) k_θ values for list of proteins with varying R_c with fixed σ = 0.63 nm. (b) k_θ values for list of proteins with varying σ with fixed R_c = 0.8 nm. (c and d) Comparison of potentials used in the theory and simulations with (c) k=0 and (d) k=3. In the theory Gaussian potentials are used for both non-specific repulsion (black broken line) and specific attraction (thick red-broken line). In the simulations, potentials have hard core repulsion (black line) and WCA-type attraction (thick red line). In both the theory and simulations, the depth of the attraction potential changes depending on values of k, whereas the repulsive part does not. The use of soft potentials in the theory results in larger k_θ than in simulations (Fig. 5). (e) The dependence of the number of contacts, N_nc, as a function of N for the PDBselect proteins. α-rich and β-rich proteins are colored in blue and red, respectively. The green line is a fit using N_nc = 0.6N^γ with γ = 1.3. A plot for 21 proteins used in the simulations is shown in the inset.

Appendix B: Simulations

The theoretical results were obtained using a set of approximations, whose validity needs to be confirmed using simulations. The purpose of these simulations is to show that the predicted theoretical values of k_θ correlate well with simulation results. We performed Langevin dynamics simulations for 21 globular proteins (Fig. (5)). The set includes both all-α and all-β proteins as well as α + β and α/β proteins according to the Structural Classification of Proteins (SCOP).

The simple form (sum of Gaussians) of the interaction energy in Eq. (2) was devised in order to obtain analytic expression for k_θ so that collapsibility of two thousand or more proteins could be easily analyzed. The potential in Eq. (2) has no hard core, which is physically not realistic. Because of the soft interactions it is clear that the theoretical values of k_θ have to be an upper bound. In order to firmly establish the qualitative predictions obtained using theory we use a realistic interaction energy in the simulations. The potential function in the simulations is, where

The first term, describing chain connectivity, is the discrete version of the first term in Eq. (1) with a₀ = 0.38 nm. The second term accounts for excluded volume interactions used for any pair of residues not included in the contact map. We chose ε_υ = 1.0 kcal/mol so that monomer particles do not overlap with each other. In this crucial respect, the potential function is drastically different from the interaction potential used in the theory, in which the Gaussian-type soft core potential was used in order to solve the problem analytically.

The summation in the last term in Eq. (B1) runs over all pairs in the contact map. The potential, Φ_WCA, is the Weeks-Chandler-Andersen potential [75], a variant of the Lennard-Jones potential, consisting of well-separated repulsive and attractive terms (Fig. 8(c), (d)). This is necessary in order to vary the strength of the attraction potential without affecting the repulsive interactions. The coefficient of the attractive term is ε_k = k · k_BT. We varied k between 0.0 and 5.0 to find the collapse-transition point, k = k_θ. The contact distance is the same as in the theory, σ = 0.63 nm.
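The idea of separating repulsion from attraction so that only the latter is scaled by k can be sketched with the textbook WCA decomposition of the 12-6 Lennard-Jones potential. This is an illustration under stated assumptions and not a reproduction of Eq. (B1): σ is treated here as the Lennard-Jones zero crossing, energies are in reduced units (`kbt=1.0` by default), and the function names are ours.

```python
def lennard_jones_unit(r, sigma):
    """12-6 Lennard-Jones potential with unit well depth."""
    sr6 = (sigma / r) ** 6
    return 4.0 * (sr6 * sr6 - sr6)


def wca_split(r, sigma):
    """Split the unit-depth LJ potential into (repulsive, attractive) parts.

    Inside the minimum r_min = 2**(1/6) * sigma the repulsive part is LJ + 1
    and the attractive part is the constant -1; outside, the repulsive part
    vanishes and the attractive part is the LJ tail.
    """
    r_min = 2.0 ** (1.0 / 6.0) * sigma
    u = lennard_jones_unit(r, sigma)
    if r < r_min:
        return u + 1.0, -1.0
    return 0.0, u


def pair_energy(r, sigma, eps_repulsive, k, kbt=1.0):
    """Repulsion with fixed strength plus attraction scaled by k * k_BT."""
    repulsive, attractive = wca_split(r, sigma)
    return eps_repulsive * repulsive + k * kbt * attractive


sigma = 0.63  # nm, contact distance quoted in the text
for k in (0.0, 1.0, 3.0):
    energies = [round(pair_energy(r, sigma, 1.0, k), 3) for r in (0.65, 0.71, 0.9)]
    print(f"k = {k}: U(0.65), U(0.71), U(0.9) = {energies}")
```

With this split, changing k deepens the well without altering the repulsive wall, and at k = 0 the pair interaction is purely repulsive.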

For each protein and k value, we generated 100 independent simulation trajectories. Initial conformations were generated in a preliminary simulation at high temperature T = 400 K with k = 0. Each production run at T = 300 K lasts for 10⁸ steps. We discarded the first 2 × 10⁷ steps in analyzing the data. Conformations are sampled every 10⁴ steps. In total, 8 × 10⁵ conformations were sampled to calculate the average radius of gyration, 〈R_g〉 for each k.
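The number of sampled conformations quoted above follows directly from the run length, the discarded equilibration steps, the sampling stride, and the number of trajectories. The short check below reproduces that arithmetic; the function name is illustrative.

```python
def sampled_conformations(total_steps, discarded_steps, stride, n_trajectories):
    """Number of conformations kept over all trajectories."""
    if stride <= 0 or discarded_steps > total_steps:
        raise ValueError("invalid sampling parameters")
    per_trajectory = (total_steps - discarded_steps) // stride
    return per_trajectory * n_trajectories


# 10^8 steps per run, first 2 x 10^7 discarded, one frame every 10^4 steps,
# 100 independent trajectories.
print(sampled_conformations(10**8, 2 * 10**7, 10**4, 100))  # 800000
```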

References

[1] F. M. Richards. The interpretation of protein structures: Total volume, group volume distributions and packing density. J. Mol. Biol., 82:1–14, 1974.
[2] F. M. Richards and W. A. Lim. An analysis of packing in the protein-folding problem. Q. Rev. Biophys., 26:423–498, 1994.
[3] J. L. Finney. Volume occupation, environment and accessibility in proteins. The problem of the protein surface. J. Mol. Biol., 96:721–732, 1975.
[4] W. A. Lim and R. T. Sauer. Alternative packing arrangements in the hydrophobic core of λ-repressor. Nature, 339:31–36, 1989.
[5] J. Liang and K. A. Dill. Are proteins well-packed? Biophys. J., 81:751–756, 2001.
[6] S. Bromberg and K. A. Dill. Side-chain entropy and packing in proteins. Protein Sci., 3(7):997–1009, 1994.
[7] R. I. Dima and D. Thirumalai. Asymmetry in the shapes of folded and denatured states of proteins. J. Phys. Chem. B, 108:6564–6570, 2004.
[8] J. E. Kohn, I. S. Millett, J. Jacob, B. Zagrovic, T. M. Dillon, N. Cingel, R. S. Dothager, S. Seifert, P. Thiyagarajan, T. R. Sosnick, M. Z. Hasan, V. S. Pande, I. Ruczinski, S. Doniach, and K. W. Plaxco. Random-coil behavior and the dimensions of chemically unfolded proteins. Proc. Natl. Acad. Sci. USA, 101:12491–12496, 2004.
[9] E. Sherman and G. Haran. Coil-globule transition in the denatured state of a small protein. Proc. Natl. Acad. Sci. USA, 103:11539–11543, 2006.
[10] G. Haran. How, when and why proteins collapse: the relation to folding. Curr. Opin. Struct. Biol., 22:14–20, 2012.
[11] I. M. Lifshitz, A. Yu. Grosberg, and A. R. Khokhlov. Some problems of statistical physics of polymer-chains with volume interaction. Rev. Mod. Phys., 50:683–713, 1978.
[12] A. Yu. Grosberg and A. R. Khokhlov. Statistical Physics of Macromolecules. AIP Press, 1994.
[13] O. B. Ptitsyn. How molten is the molten globule? Nat. Struct. Biol., 3:488–490, 1996.
[14] D. Thirumalai. From minimal models to real proteins: Time scales for protein folding kinetics. J. Phys. I (Fr.), 5:1457–1467, 1995.
[15] O. B. Ptitsyn and V. N. Uversky. The molten globule is a third thermodynamical state of protein molecules. FEBS Lett., 341(1):15–18, 1994.
[16] F. Ding, W. Guo, N. V. Dokholyan, E. I. Shakhnovich, and J. E. Shea. Reconstruction of the src-SH3 protein domain transition state ensemble using multiscale molecular dynamics simulations. J. Mol. Biol., 350:1035–1050, 2005.
[17] H. T. Tran, X. Wang, and R. V. Pappu. Reconciling observations of sequence-specific conformational propensities with the generic polymeric behavior of denatured proteins. Biochemistry, 44, 2005.
[18] G. Ziv, D. Thirumalai, and G. Haran. Collapse transition in proteins. Phys. Chem. Chem. Phys., 11(1):83–93, 2009.
[19] T. Y. Yoo, S. P. Meisburger, J. Hinshaw, L. Pollack, G. Haran, T. R. Sosnick, and K. Plaxco. Small-angle X-ray scattering and single-molecule FRET spectroscopy produce highly divergent views of the low-denaturant unfolded state. J. Mol. Biol., 418:226–236, 2012.
[20] A. Borgia, W. Zheng, K. Buholzer, M. B. Borgia, A. Schüler, H. Hofmann, A. Soranno, D. Nettels, K. Gast, A. Grishaev, R. B. Best, and B. Schuler. Consistent view of polypeptide chain expansion in chemical denaturants from multiple experimental methods. Submitted to J. Am. Chem. Soc., 2016.
[21] B. Schuler and W. A. Eaton. Protein folding studied by single-molecule FRET. Curr. Opin. Struct. Biol., 18(1):16–26, 2008.
[22] P. G. De Gennes. Kinetics of collapse for a flexible coil. J. Phys. Lett. (Fr.), 46(14):639–642, 1985.
[23] A. Y. Grosberg, S. K. Nechaev, and E. I. Shakhnovich. The role of topological constraints in the kinetics of collapse of macromolecules. J. de Physique, 49:2095–2100, 1988.
[24] C. J. Camacho and D. Thirumalai. Minimum energy compact structures of random sequences of heteropolymers. Phys. Rev. Lett., 71:2505, 1993.
[25] Z. Liu, G. Reddy, and D. Thirumalai. Folding PDZ2 domain using the molecular transfer model. J. Phys. Chem. B, 2016. In press.
[26] C. J. Camacho and D. Thirumalai. Kinetics and thermodynamics of folding in model proteins. Proc. Natl. Acad. Sci. USA, 90:6369–6372, 1993.
[27] M. S. Li, D. K. Klimov, and D. Thirumalai. Finite size effects on thermal denaturation of globular proteins. Phys. Rev. Lett., 93:268107, 2004.
[28] K. A. Dill and D. Shortle. Denatured states of proteins. Ann. Rev. Biochem., 60(1):795–825, 1991.
[29] S. Akiyama, S. Takahashi, T. Kimura, K. Ishimori, I. Morishima, Y. Nishikawa, and T. Fujisawa. Conformational landscape of cytochrome c folding studied by microsecond-resolved small-angle X-ray scattering. Proc. Natl. Acad. Sci. USA, 99(3):1329–1334, 2002.
[30] T. Kimura, S. Akiyama, T. Uzawa, K. Ishimori, I. Morishima, T. Fujisawa, and S. Takahashi. Specifically collapsed intermediate in the early stage of the folding of ribonuclease A. J. Mol. Biol., 350(2):349–362, 2005.
[31] R. R. Goluguri and J. B. Udgaonkar. Microsecond rearrangements of hydrophobic clusters in an initially collapsed globule prime structure formation during the folding of a small protein. J. Mol. Biol., 428(15):3102–3117, 2016.
[32] S. F. Edwards. The statistical mechanics of polymers with excluded volume. Proc. Phys. Soc., 85(4):613–624, 1965.
[33] C. J. Camacho and D. Thirumalai. Modeling the role of disulfide bonds in protein folding: Entropic barriers and pathways. Proteins, 22:27–40, 1995.
[34] D. K. Klimov and D. Thirumalai. Multiple protein folding nuclei and the transition state ensemble in two-state proteins. Proteins, 43:465–475, 2001.
[35] R. B. Best, G. Hummer, and W. A. Eaton. Native contacts determine protein folding mechanisms in atomistic simulations. Proc. Natl. Acad. Sci. USA, 110:17874–17879, 2013.
[36] S. Yasuda, T. Yoshidome, H. Oshima, R. Kodama, Y. Harano, and M. Kinoshita. Effects of side-chain packing on the formation of secondary structures in protein. J. Chem. Phys., 132:065105, 2010.
[37] A. Maritan, C. Micheletti, A. Trovato, and J. R. Banavar. Optimal shapes of compact strings. Nature, 406:287, 2000.
[38] J. E. Magee, V. R. Vasquez, and L. Lue. Helical structures from an isotropic homopolymer model. Phys. Rev. Lett., 96:207802, 2006.
[39] T. Skrbic, T. X. Hoang, and A. Giacometti. Effective stiffness and formation of secondary structures in a protein-like model. J. Chem. Phys., 145:084905, 2016.
[40] Y. Snir and R. D. Kamien. Entropically driven helix formation. Science, 307:1067, 2005.
[41] A. Craig and E. M. Terentjev. Auxiliary field theory of polymers with intrinsic curvature. Macromolecules, 39:4557–4565, 2006.
[42] C. Cardelli, V. Bianco, L. Rovigatti, F. Nerattini, L. Tubiana, C. Dellago, and I. Coluzza. Universal criterion for designability of heteropolymers. arXiv:1606.05253v1, 2016.
[43] A. Kudlay, M. S. Cheung, and D. Thirumalai. Crowding effects on the structural transitions in a flexible helical homopolymer. Phys. Rev. Lett., 102:118101, 2009.
[44] A. M. Gutin and E. I. Shakhnovich. Statistical mechanics of polymers with distance constraints. J. Chem. Phys., 100:5920, 1994.
[45] J. D. Bryngelson and D. Thirumalai. Internal constraints induce localization in an isolated polymer molecule. Phys. Rev. Lett., 76:542, 1996.
[46] D. Thirumalai, V. Ashwin, and J. K. Bhattacharjee. Dynamics of random hydrophobic-hydrophilic copolymers with implications for protein folding. Phys. Rev. Lett., 77:5385, 1996.
[47] Y. Kantor and M. Kardar. Collapse of randomly linked polymers. Phys. Rev. Lett., 77:4275, 1996.
[48] R. Zwanzig. Effect of close contacts on the radius of gyration of a polymer. J. Chem. Phys., 106:2824, 1997.
[49] C. J. Camacho and D. Thirumalai. A criterion that determines fast folding of proteins: A model study. Europhys. Lett., 35:627–632, 1996.
[50] C. J. Camacho and T. Schanke. From collapse to freezing in random heteropolymers. Europhys. Lett., 37(9):603, 1997.
[51] R. T. Deam and S. F. Edwards. The theory of rubber elasticity. Phil. Trans. R. Soc. A, 280(1296):317–353, 1976.
[52] P. Goldbart and N. Goldenfeld. Rigidity and ergodicity of randomly cross-linked macromolecules. Phys. Rev. Lett., 58(25):2676, 1987.
[53] P. Goldbart and N. Goldenfeld. Microscopic theory for cross-linked macromolecules. I. Broken symmetry, rigidity, and topology. Phys. Rev. A, 39(3):1402, 1989.
[54] H. E. Castillo, P. M. Goldbart, and A. Zippelius. Distribution of localisation lengths in randomly crosslinked macromolecular networks. Europhys. Lett., 28:519, 1994.
[55] S. F. Edwards and P. Singh. Size of a polymer molecule in solution. Part 1. Excluded volume problem. J. Chem. Soc. Faraday Trans., 75:1001–1019, 1979.
[56] D. Thirumalai. Isolated polymer molecule in a random environment. Phys. Rev. A, 37:269, 1988.
[57] B. Duplantier. Tricritical disorder transition of polymers in a cloudy solvent: Annealed randomness. Phys. Rev. A, 38:3647, 1988.
[58] P. L. Flory. Principles of polymer chemistry. Cornell University Press, 1986.
[59] S. Griep and U. Hobohm. PDBselect 1992–2009 and PDBfilter-select. Nucleic Acids Res., 38:D318–D319, 2009.
[60] J. D. Honeycutt and D. Thirumalai. The nature of folded states of globular proteins. Biopolymers, 32:695–709, 1992.
[61] Dynamic visualization of the collapsibility of proteins in PDB is publicly available at https://sites.cns.utexas.edu/thirumalai/supplements. Pointing to each dot gives all the characteristics of a given protein.
[62] D. Thirumalai and C. Hyeon. RNA and protein folding: Common themes and variations. Biochemistry, 44(13):4957–4970, 2005.
[63] C. Hyeon, R. I. Dima, and D. Thirumalai. Size, shape and flexibility of RNA structures. J. Chem. Phys., 125:194905, 2006.
[64] A. M. Yoffe, P. Prinsen, A. Gopal, C. M. Knobler, W. M. Gelbart, and A. Ben-Shaul. Predicting the sizes of large RNA molecules. Proc. Natl. Acad. Sci. USA, 105(42):16153, 2008.
[65] L. T. Fang, W. M. Gelbart, and A. Ben-Shaul. The size of RNA as an ideal branched polymer. J. Chem. Phys., 135(15):155105, 2011.
[66] J. H. Roh, L. Guo, J. D. Kilburn, R. M. Briber, T. Irving, and S. A. Woodson. Multistage collapse of a bacterial ribozyme observed by time-resolved small-angle X-ray scattering. J. Am. Chem. Soc., 132(29):10148–10154, 2010.
[67] R. Behrouzi, J. H. Roh, D. Kilburn, R. M. Briber, and S. A. Woodson. Cooperative tertiary interaction network guides RNA folding. Cell, 149:348–357, 2012.
[68] A. Gopal, D. E. Egecioglu, A. M. Yoffe, A. Ben-Shaul, A. L. N. Rao, C. M. Knobler, and W. M. Gelbart. Viral RNAs are unusually compact. PLoS One, 9(9):e105875, 2014.
[69] V. P. Denisov, B. H. Jonsson, and B. Halle. Hydration of denatured and molten globule proteins. Nature Structural & Mol. Biol., 6(3):253–260, 1999.
[70] H. J. Bussemaker, D. Thirumalai, and J. K. Bhattacharjee. Thermodynamic stability of folded proteins against mutations. Phys. Rev. Lett., 79:3530–3533, 1997.
[71] H. Maity and G. Reddy. Folding of protein L with implications for collapse in the denatured state ensemble. J. Am. Chem. Soc., 2016.
[72] V. I. Abkevich, A. M. Gutin, and E. I. Shakhnovich. Impact of local and non-local interactions on thermodynamics and kinetics of protein folding. J. Mol. Biol., 252(4):460–471, 1995.
[73] L. Tubiana, A. L. Božič, C. Micheletti, and R. Podgornik. Synonymous mutations reduce genome compactness in icosahedral ssRNA viruses. Biophys. J., 108(1):194–202, 2015.
[74] N. B. Wilding, M. Muller, and K. Binder. Chain length dependence of the polymer-solvent critical point parameters. J. Chem. Phys., 105:2, 1996.
[75] J. D. Weeks, D. Chandler, and H. C. Andersen. Role of repulsive forces in determining the equilibrium structure of simple liquids. J. Chem. Phys., 1971.
[76] C. Hyeon and D. Thirumalai. Mechanical unfolding of RNA: From hairpins to structures with internal multiloops. Biophys. J., 92(3):731–743, 2007.


Posted December 12, 2016.


Subject Area

Biophysics


[1] [1].↵
F. M. Richards. The interpretation of protein structures: Total volume, group volume distributions and packing density. J. Mol. Biol., 82:1–14, 1974.
OpenUrl CrossRef PubMed Web of Science

[2] [2].
F. M. Richards and W. A. Lim. An anlysis of packing in the protein-folding problem. Q. Rev. Biophys., 26:423–498, 1994.
OpenUrl Web of Science

[3] [3].↵
J. L. Finney. Volume occupation, environment and accessibility in proteins. the problem of the protein surface. J. Mol. Biol., 96:721–732, 1975.
OpenUrl CrossRef PubMed Web of Science

[4] [4].↵
W. A. Lim and R. T. Sauer. Alternative packing arrangements in the hydrophobic core of λ-repressor. Nature, 339:31–36, 1989.
OpenUrl CrossRef PubMed Web of Science

[5] [5].↵
J. Liang and K. A. Dill. Are proteins well-packed? Biophys. J., 81:751–756, 2001.
OpenUrl CrossRef PubMed Web of Science

[6] [6].↵
S. Bromberg and K. A. Dill. Side-chain entropy and packing in proteins. Protein Sci., 3(7):997⌃1009, 1994.
OpenUrl CrossRef PubMed Web of Science

[7] [7].↵
R. I. Dima and D. Thirumalai. Asymmetry in the shapes of folded and denatured states of proteins. J. Phys. Chem. B, 108:6564–6570, 2004.
OpenUrl CrossRef

[8] [8].↵
J. E. Kohn, I. S. Millett, J. Jacob, B. Azgrovic, T. M. Dillon, N. Cingel, R. S. Dothager, S. Seifert, P. Thiyagarajan, T. R. Sosnick, M. Z. Hasan an V. S. Pande, I. Ruczinski, S. Doniach, and K. W. Plaxco. Random-coil behavior and the dimensions of chemically unfolded proteins. Proc. Natl. Acad. Sci. USA, 101:12491–12496, 2004.
OpenUrl Abstract/FREE Full Text

[9] [9].↵
E. Sherman and G. Haran. Coil-globule transition in the denatured state of a small protein. Proc. Natl. Acad. Sci., 103:11539–11543, 2006.
OpenUrl Abstract/FREE Full Text

[10] [10].↵
G. Haran. How, when and why proteins collapse: the relation to folding. Curr. Opin. Struct. Biol, 22:14–20, 2012.
OpenUrl CrossRef PubMed

[11] [11].↵
I. M. Lifshitz, A. Yu. Grosberg, and A. R. Khokhlov. Some problems of statistical physics of polymer-chains with volume interaction. Rev. Mod. Phys., 50:683–713, 1978.
OpenUrl CrossRef Web of Science

[12] [12].↵
A. Yu. Grosberg and A. R. Khokhlov. Statistical Physics of Macromolecules. AIP Press, 1994.

[13] [13].↵
O. B. Ptitsyn. How molten is the molten globule? Nat. Struct. Biol., 3:488–490, 1996.
OpenUrl CrossRef PubMed Web of Science

[14] [14].
D. Thirumalai. From minimal models to real proteins: Time scales for protein folding kinetics. J. Phys. I (Fr.), 5:1457–1467, 1995.
OpenUrl

[15] [15].
O. B. Ptitsyn and V. N. Uversky. The molten globule is a third thermodynamical state of protein molecules. FEBS Lett., 341(1):15–18, 1994.
OpenUrl CrossRef PubMed Web of Science

[16] [16].
F. Ding, W. Guo, N. V. Dokholyan, E. I. Shakhnovich, and J. E. Shea. Reconstruction of the src-SH3 protein domain transition state ensemble using multiscale molecular dynamics simulations. J. Mol. Biol., 350:1035–1050, 2005.
OpenUrl CrossRef PubMed Web of Science

[17] [17].↵
H. T. Tran, X. Wang, and R. V. Pappu. Reconciling observations of sequence-specific confor-mational propensities with the generic polymeric behavior of denatured proteins. Biochemistry, 44, 2005.

[18] [18].↵
G. Ziv, D. Thirumalai, and G. Haran. Collapse transition in proteins. Phys. Chem. Chem. Phys., 11(1):83–93, 2009.
OpenUrl CrossRef PubMed Web of Science

[19] [19].↵
T. Y. Yoo, S. P. Meisburger, J. Hinshaw, L. Pollack, G. Haran, T. R. Sosnick, and K. Plaxco. Small-angle X-ray scattering and single-molecule FRET spectroscopy produce highly divergent views of the low-denaturant unfolded state. J. Mol. Biol., 418:226–236, 2012.
OpenUrl CrossRef PubMed

[20] [20].↵
A. Borgis, W. Zheng, K. Buholzer, M. B. Borgia, A. Schüler, H. Hofmann, A. Soranno, D. Net-tels, K. Gast, A. Grishaev, R. B. Best, and B. Schuler. Consistent view of polypeptide chain expansion in chemical denaturants from multiple experimental methods. Submitted to J. Am. Chem. Soc., 2016.

[21] [21].↵
B. Schuler and W. A. Eaton. Protein folding studied by single-molecule FRET. Curr. Opin. Struct. Biol., 18(1):16–26, 2008.
OpenUrl CrossRef PubMed Web of Science

[22] [22].↵
P. G. De Gennes. Kinetics of collapse for a flexible coil. J. Phys. Lett. (Fr.), 46(14):639–642, 1985.
OpenUrl

[23] [23].↵
A. Y. Grosberg, S. K. Nechaev, and E. I. Shakhnovich. The role of topological constraints in the kinetics of collapse of macromolecules. J. de Physique, 49:2095–2100, 1988.
OpenUrl

[24] [24].↵
C. J. Camacho and D. Thirumalai. Minimum energy compact structures of random sequences of heteropolymers. Phys. Rev. Lett., 71:2505, 1993.
OpenUrl CrossRef PubMed Web of Science

[25] [25].↵
Z. Liu, G. Reddy, and D. Thirumalai. Folding PDZ2 domain using the molecular transfer model. J. Phys. Chem. B, 2016. in press.

[26] [26].↵
C. J. Camacho and D. Thirumalai. Kinetics and thermodynamics of folding in model proteins. Proc. Natl. Acad. Sci. USA, 90:6369–6372, 1993
OpenUrl Abstract/FREE Full Text

[27] [27].
M. S. Li, D. K. Klimov, and D. Thirumalai. Finite size effects on thermal denaturation of globular proteins. Phys. Rev. Lett., 93:268107, 2004.
OpenUrl CrossRef PubMed

[28] [28].↵
K. A. Dill and D. Shortle. Denatured states of proteins. Ann. Rev. Biochem., 60(1):795–825, 1991.
OpenUrl CrossRef PubMed Web of Science

[29] [29].↵
S. Akiyama, S. Takahashi, T. Kimura, K. Ishimori, I. Morishima, Y. Nishikawa, and T. Fu-jisawa. Conformational landscape of cytochrome c folding studied by microsecond-resolved small-angle X-ray scattering. Proc. Natl. Acad. Sci. USA, 99(3):1329–1334, 2002.
OpenUrl Abstract/FREE Full Text

[30] [30].
T. Kimura S. Akiyama T. Uzawa K. Ishimori I. Morishima T. Fujisawa and S. Takahashi. Specifically collapsed intermediate in the early stage of the folding of ribonuclease A. J. Mol. Biol., 350(2):349–362, 2005.
OpenUrl CrossRef PubMed

[31] [31].↵
R. R. Goluguri and J. B. Udgaonkar. Microsecond rearrangements of hydrophobic clusters in an initially collapsed globule prime structure formation during the folding of a small protein. J. Mol. Biol., 428(15):3102–3117, 2016.
OpenUrl

[32] [32].↵
S. F. Edwards. The statistical mechanics of polymers with excluded volume. Proc. Phys. Soc., 85(4):613–624, 1965.
OpenUrl CrossRef

[33] [33].↵
C. J. Camacho and D. Thirumalai. Modeling the role of disulfide bonds in protein folding: Entropic barriers and pathways. Proteins, 22:27–40, 1995.
OpenUrl CrossRef PubMed Web of Science

[34] [34].↵
Klimov D. K. and D. Thirumalai. Multiple protein folding nuclei and the transition state ensemble in two-state proteins. Proteins, 43:465–475, 2001.
OpenUrl CrossRef PubMed Web of Science

[35] [35].↵
R. B. Best, Hummer G., and W. A. Eaton. Native contacts determine protein folding mechanisms in atomistic simulations. Proc. Natl. Acad. Sci., 110:17874–17879, 2013.
OpenUrl Abstract/FREE Full Text

[36] [36].↵
S. Yasuda T. Yoshidome H. Oshima R. Kodama Y. Harano and M. Kinoshita. Effects of side-chain packing on the formation of secondary structures in protein. J. Chem. Phys., 132:065105, 2010.
OpenUrl PubMed

[37] [37].
A. Maritan C. Micheletti A. Trovato and J. R. Banavar. Optimal shapes of compact strings. Nature, 406:287, 2000.
OpenUrl CrossRef PubMed Web of Science

[38] [38].
J. E. Magee, V. R. Vasquez, and L. Lue. Helical structures from an isotropic homoplymer model. Phys. Rev. Lett., 96:207802, 2006.
OpenUrl CrossRef PubMed

[39] [39].
T. Skrbic T. X. hoang, and A. giacometti. Effective stiffness and formation of secondary structures in a protein-like model. J. Chem. Phys., 145:084905, 2016.
OpenUrl

[40] [40].↵
Y. Snir and R. D. Kamien. Entropically driven helix formation. Science 307:1067, 2005.
OpenUrl Abstract/FREE Full Text

[41] [41].
A. Craig and E. M. Terentjev. Auxiliary field theory of polymers with intrinsic carvature. Macromolecules, 39:4557–4565, 2006.
OpenUrl

[42] [42].↵
C. Cardelli V. Binaco L. Rovigatti F. Nerattini L. tubiana C. Dellago and I. Coluzza. Universal criterion for designability of heteropolymers. arXiv:1606.05253v1, 2016.

[43] [43].↵
Kudlay A., Cheung M. S., and D. Thirumalai. Crowding effects on the structural transitions in a flexible helical homopolymer. Phys. Rev. Lett., 102:118101, 2009.
OpenUrl CrossRef PubMed

[44] [44].↵
A. M. Gutin and E. I. Shakhnovich. Statistical mechanics of polymers with distance constraints. J. Chem. Phys., 100:5920, 1994.
OpenUrl

[45] [45].↵
J. D. Bryngelson and D. Thirumalai. Internal constraints induce localization in an isolated polymer molecule. Phys. Rev. Lett., 76:542, 1996.
OpenUrl PubMed

[46] [46].
D. Thirumalai V. Ashwin and J. K. Bhattacharjee. Dynamics of random hydrophobic-hydrophilic copolymers with implications for protein folding. Phys. Rev. Lett., 77:5385, 1996.

[47] [47].↵
Y. Kantor and M. Kardar. Collapse of randomly linked polymers. Phys. Rev. Lett., 77:4275, 1996.
OpenUrl PubMed

[48] [48].↵
R. Zwanzig. Effect of close contacts on the radius of gyration of a polymer. J. Chem. Phys., 106:2824, 1997.
OpenUrl

[49] [49].
C. J. Camacho and D. Thirumalai. A criterion that determines fast folding of proteins: A model study. Europhys. Lett., 35:627–632, 1996.
OpenUrl

[50] [50].↵
C. J. Camacho and T. Schanke. From collapse to freezing in random heteropolymers. Europhys. Lett., 37(9):603, 1997.
OpenUrl

[51] [51].↵
R. T. Deam and S. F. Edwards. The theory of rubber elasticity. Phil. Trans. R. Soc. A, 280(1296):317–353, 1976.
OpenUrl CrossRef

[52] [52].
P. Goldbart and N. Goldenfeld. Rigidity and ergodicity of randomly cross-linked macro-molecules. Phys. Rev. Lett., 58(25):2676, 1987.
OpenUrl CrossRef PubMed Web of Science

[53] [53].↵
P. Goldbart and N. Goldenfeld. Microscopic theory for cross-linked macromolecules. I. Broken symmetry, rigidity, and topology. Phys. Rev. A, 39(3):1402, 1989.
OpenUrl PubMed

[54] [54].↵
H. E. Castillo, P. M. Goldbart, and A. Zippelius. Distribution of localisation lengths in ran-domly crosslinked macromolecular networks. Europhys. Lett., 28:519, 1994.
OpenUrl

[55] [55].↵
S. F. Edwards and P. Singh. Size of a polymer molecule in solution. Part 1. Excluded volume problem. J. Chem. Soc. Faraday Trans., 75:1001–1019, 1979.
OpenUrl

[56] [56].↵
D. Thirumalai. Isolated polymer molecule in a random environment. Phys. Rev. A, 37:269, 1988.
OpenUrl CrossRef PubMed

[57] [57].↵
Duplantier B. Tricritical disorder transition of polymers in a cloudy solvent: Annealed randomness. Phys. Rev. A, 38:3647, 1988.
OpenUrl PubMed

[58] [58].↵
P. L. Flory. Principles of polymer chemistry. Cornell University Press, 1986.

[59] [59].↵
S. Griep and U. Hobohm. PDBselect [1992–2009 and PDBfilter-select. Nuclic Acids Res., 38:D318–D319, 2009.
OpenUrl CrossRef PubMed Web of Science

[60] [60].↵
J. D. Honeycutt and D. Thirumalai. The nature of folded states of globular proteins. Biopoly-mers, 32:695–709, 1992.
OpenUrl CrossRef PubMed Web of Science

[61] [61].↵
Dynamic visualization of the collapsibility of proteins in PDB is publicly available at https://sites.cns.utexas.edu/thirumalai/supplements. Pointing to each dot gives all the characteristics of a given protein.

[62] [62].↵
D. Thirumalai and C. Hyeon. RNA and protein folding: Common themes and variations. Biochemistry, 44(13):4957–4970, 2005.
OpenUrl CrossRef PubMed Web of Science

[63] [63].↵
C. Hyeon R. I. Dima, and D. Thirumalai. Size, shape and flexibility of RNA structures. J. Chem. Phys., 125:194905, 2006.
OpenUrl CrossRef PubMed

[64] [64].↵
A. M. Yoffe, P. Prinsen A. Gopal C. M. Knobler, W. M. Gelbart, and A. Ben-Shaul. Predicting the sizes of large RNA molecules. Proc. Natl. Acad. Sci. USA, 105(42):16153, 2008.
OpenUrl Abstract/FREE Full Text

[65] [65].↵
L. T. Fang, W. M. Gelbart, and A. Ben-Shaul. The size of RNA as an ideal branched polymer. J. Chem. Phys., 135(15):155105, 2011.
OpenUrl CrossRef PubMed

[66] [66].↵
J. H. Roh, L. Guo J. D. Kilburn, R. M. Briber, T. Irving and S. A. Woodson. Multistage Collapse of a Bacterial Ribozyme Observed by Time-Resolved Small-Angle X-ray Scattering. J. Am. Chem. Soc., 132(29):10148–10154, 2010.
OpenUrl CrossRef PubMed Web of Science

[67] [67].↵
R. Behrouzi J. H. Roh, D. Kilburn R. M. Briber, and S. A. Woodson. Cooperative tertiary interaction network guides RNA folding. Cell, 149:348–357, 2012.
OpenUrl CrossRef PubMed Web of Science

[68] [68].↵
A. Gopal D. E. Egecioglu, A. M. Yoffe, A. Ben-Shaul, A. L. N. Rao, C. M. Knobler, and W. M. Gelbart. Viral RNAs are unusually compact. PLoS One, 9(9):e105875, 2014.
OpenUrl CrossRef PubMed

[69] [69].↵
VP Denisov, BH Jonsson, and B Halle. Hydration of denatured and molten globule proteins. Nature Strutural & Mol. Biol., 6(3):253–260, 1999.
OpenUrl

[70] [70].↵
H. J. Bussemaker, D. Thirumalai and J. K. Bhattacharjee. Thermodynamic stability of folded proteins against mutations. Phys. Rev. Lett., 79:3530–3533, 1997.
OpenUrl CrossRef Web of Science

[71] [71].↵
H. Maity and G. Reddy. Folding of protein l with implications for collapse in the denatured state ensemble. J. Am. Chem. Soc., 2016.

[72] [72].↵
V. I. Abkevich, A. M. Gutin, and E. I. Shakhnovich. Impact of local and non-local interactions on thermodynamics and kinetics of protein folding. J. Mol. Biol., 252(4):460–471, 1995.
OpenUrl CrossRef PubMed Web of Science

[73] [73].↵
L. Tubiana A. L. Božič, C. Micheletti and R. Podgornik. Synonymous mutations reduce genome compactness in icosahedral ssRNA viruses. Biophys. J., 108(1):194–202, 2015.
OpenUrl CrossRef

[74] [74].↵
N. B. Wilding, M. Muller and K. Binder. Chain length dependence of the polymer-solvent critical point parameters. J. Chem. Phys., 105:2, 1996.
OpenUrl

[75] [75].↵
John D Weeks, David Chandler, and Hans C Andersen. Role of repulsive forces in determining the equilibrium structure of simple liquids. J. Chem. Phys., 1971.

[76] [76].↵
C. Hyeon and D. Thirumalai. Mechanical unfolding of RNA: From hairpins to structures with internal multiloops. Biophys. J., 92(3):731–743, 2007.
OpenUrl CrossRef PubMed Web of Science