ABSTRACT
The dimensions that unfolded and intrinsically disordered proteins (IDPs) adopt at low or no denaturant remain controversial. We recently developed an innovative analysis procedure for small-angle X-ray scattering (SAXS) profiles and found that even relatively hydrophobic IDPs remain nearly as expanded as the chemically denatured ensemble, rendering them significantly more expanded than is inferred from many fluorescence resonance energy transfer (FRET) studies. Here we show that fluorophores typical of those added to IDPs for FRET studies contribute to this discrepancy. Specifically, we find that labeling a highly expanded IDP with Alexa488 causes its ensemble to contract significantly.
We also tested the recent suggestion that FRET and SAXS results can be reconciled if, for unfolded proteins (and as opposed to the case for ideal random flight homopolymers), the radius of gyration (Rg) can vary independently from the chain’s end-to-end distance (Ree). Our analysis indicates, however, that SAXS is able to accurately extract Rg, ν and Ree even for heteropolymeric, protein-like sequences. From these studies we conclude that mild chain contraction and fluorophore-based interactions at lower denaturant concentrations, along with improved analysis procedures for both SAXS and FRET, can explain the preponderance of existing data regarding the nature of polypeptide chains unfolded in the absence of denaturant.
Significance Statement
Proteins can adopt a disordered ensemble, either prior to folding or as a part of their function. Simulations and fluorescence resonance energy transfer (FRET) studies often describe these disordered conformations as more compact than the fully random-coil state, whereas small-angle X-ray scattering (SAXS) studies indicate an expanded ensemble closely approximating the dimensions expected for the random coil. Resolving this discrepancy will enable more accurate predictions of protein folding and function. Here we reconcile these views by showing that the addition of common FRET fluorophores reduces the apparent dimensions of a disordered protein. Detailed analysis of both techniques, along with accounting for a moderate amount of fluorophore-induced contraction, demonstrates that disordered and unfolded proteins often remain well solvated and largely expanded in the absence of denaturant, properties that presumably minimize misfolding and aggregation.
Protein disorder is an essential component of diverse cellular processes (1–4). Unlike well-folded proteins, which populate a well-defined functional state, intrinsically disordered proteins (IDPs) and regions (IDRs) sample a broad ensemble of rapidly interconverting conformations (3–9) with biases that are poorly understood and difficult to measure. Of particular interest is the extent to which IDPs undergo compaction under physiological conditions (i.e., in the absence of denaturants). Such compaction would have broad implications for our understanding of protein folding, interactions and stability as well as the action of denaturants. Moreover, understanding the extent of collapse in disordered ensembles has profound implications for the development of realistic simulations of protein folding and interpretation of SAXS and FRET measurements (10, 11).
Our current understanding of the physicochemical principles that underlie whether a given polypeptide chain will fold, adopt a collapsed but disordered ensemble, or behave as an expanded, fully-solvated self-avoiding random walk (SARW) is insufficient to explain the existing data. Most of this understanding is derived from studies of proteins unfolded by high concentrations of chemical denaturants such as urea and guanidine hydrochloride (Gdn). Under these conditions the consensus is that proteins behave as SARWs, corresponding to a Flory exponent (ν) of 0.60 in the relationship Rg ∝ N^ν (where Rg = radius of gyration and N = polypeptide chain length). Consensus is lacking, in contrast, regarding the behavior of disordered polypeptide chains at lower or no denaturant. Specifically, while numerous FRET (18–21, 25–35) and computational studies (18–24, 36) have argued that the expanded, disordered ensemble seen at high denaturant collapses at low or no denaturant (ν < 0.5) (12–24), almost equally numerous SAXS studies report no or only minor collapse under these same conditions (11, 37–42).
A variety of recent studies have attempted to reconcile this widely-recognized and seemingly important discrepancy (Fig. 1A). For example, the application of more realistic simulations and analytical models has reduced the denaturant dependence of FRET-derived distances (Fig. 1A, bottom) (41, 43–45). Likewise, improved analysis of SAXS data provides evidence for minor contraction when the guanidine hydrochloride (Gdn) concentration falls below 2 M (Fig. 1A, bottom), further reducing the apparent disagreement between the two methods (46, 47). Significant discrepancies nevertheless persist between the two techniques, even when the same procedures are used to analyze a single protein by both SAXS and FRET (Figs. 1A-B, S1–S2; Movie S1).
To comprehensively compare results from recent SAXS and FRET analyses of IDPs, we collected published datasets for a variety of disordered proteins (Fig. 1C-D; Table S3). When analyzed using our simulations and molecular form factor (MFF), SAXS studies consistently find ν > 0.53 (mean = 0.55) whereas ν derived from FRET studies typically falls below the random walk value of 0.50 (mean = 0.46). This discrepancy of 0.09 is substantial given that the entire range of ν varies only from 0.6 (for a SARW) through 0.5 (for a non-self-avoiding random walk) to 0.33 (for a perfect sphere; it is somewhat higher for non-spherical compact states). In general, SAXS results suggest that the conformational ensembles of a majority of unfolded proteins and IDPs (of protein-like composition) are more expanded than a non-self-avoiding random walk, whereas FRET suggests otherwise (Fig 1D).
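To give a sense of scale for this difference in ν, the sketch below evaluates the scaling relation Rg ∝ N^ν for two exponents. It assumes, purely for illustration, that the two ensembles share the same prefactor, which is not stated in the text; the function name is hypothetical.

```python
def rg_ratio(n_residues: int, nu_a: float, nu_b: float) -> float:
    """Ratio Rg(nu_a) / Rg(nu_b) for Rg = R0 * N**nu.

    Assumption: both ensembles share the same prefactor R0, so it cancels.
    """
    return float(n_residues) ** (nu_a - nu_b)


if __name__ == "__main__":
    # Mean SAXS-derived (0.55) versus mean FRET-derived (0.46) exponents
    for n in (100, 334, 650):
        print(n, round(rg_ratio(n, 0.55, 0.46), 2))
```

Under this shared-prefactor assumption, a 334-residue chain would differ in Rg by a factor of about 1.7 between the two mean exponents, which illustrates why a difference of 0.09 in ν is not a small effect.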
The above and other, similar results have led us and others to consider the factors that might contribute to the persistent discrepancy between SAXS- and FRET-based views of IDPs (36, 40, 43, 48). One alternative, herein denoted the “heteropolymer-decoupling hypothesis,” posits that the heteropolymeric nature of proteins leads to variation of the normally fixed relationship between Rg and the polypeptide chain end-to-end distance (Ree). Specifically, for a homopolymer adopting a SARW the ratio G = (Ree/Rg)^2 is expected to be fixed at a value of 6.3 irrespective of the polymer’s length. Recent simulations, however, suggest that, unlike the case for homopolymers, this ratio can vary significantly for heteropolymers (36, 40, 43, 44), with this “decoupling” offering a possible explanation for the discrepancy between SAXS (which measures Rg) and FRET (which measures Ree). In contrast, a second hypothesis, herein denoted the “fluorophore-interaction hypothesis,” suggests that, in the absence of denaturant, the FRET fluorophores interact with each other and/or with the polypeptide chain, causing the conformational ensemble of dye-modified constructs to contract artifactually (10, 46, 48, 49).
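As an illustration of how the ratio G is obtained from an ensemble, the sketch below generates ideal random-flight chains (no excluded volume) and computes G from ensemble-averaged squared distances. This is not the simulation method used in this work: the chain model, function names, and parameters are assumptions chosen for simplicity, and G is taken here as the ratio of mean-square values. For such ideal chains the textbook long-chain limit is G = 6, somewhat below the SARW value of 6.3 quoted above.

```python
import math
import random


def random_flight_chain(n_bonds, rng, bond=1.0):
    """Bead coordinates of a freely jointed chain without excluded volume."""
    x = y = z = 0.0
    beads = [(x, y, z)]
    for _ in range(n_bonds):
        # Draw a direction uniformly distributed on the unit sphere
        cos_t = rng.uniform(-1.0, 1.0)
        sin_t = math.sqrt(1.0 - cos_t * cos_t)
        phi = rng.uniform(0.0, 2.0 * math.pi)
        x += bond * sin_t * math.cos(phi)
        y += bond * sin_t * math.sin(phi)
        z += bond * cos_t
        beads.append((x, y, z))
    return beads


def ree_squared(beads):
    """Squared distance between the first and last beads."""
    return sum((a - b) ** 2 for a, b in zip(beads[0], beads[-1]))


def rg_squared(beads):
    """Mean squared distance of the beads from their centroid."""
    n = len(beads)
    centroid = [sum(b[k] for b in beads) / n for k in range(3)]
    return sum(sum((b[k] - centroid[k]) ** 2 for k in range(3)) for b in beads) / n


def g_ratio(ensemble):
    """G = <Ree^2> / <Rg^2>, averaged over all chains in the ensemble."""
    mean_ree2 = sum(ree_squared(c) for c in ensemble) / len(ensemble)
    mean_rg2 = sum(rg_squared(c) for c in ensemble) / len(ensemble)
    return mean_ree2 / mean_rg2


def simulate_g(n_chains=2000, n_bonds=100, seed=0):
    """Estimate G for an ensemble of random-flight chains."""
    rng = random.Random(seed)
    return g_ratio([random_flight_chain(n_bonds, rng) for _ in range(n_chains)])
```

Calling `simulate_g()` should return a value near 6 for these ideal chains; reproducing the SARW value would require a model with excluded volume, which this sketch deliberately omits.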
To address these hypotheses we have used SAXS to characterize the radius of gyration of IDPs in the presence and absence of fluorophore modification. Doing so we find that labeling with fluorophores commonly used for FRET studies alters the conformational ensemble populated in the absence of denaturant, decreasing its SAXS-measured dimensions by 10-20%. When coupled with improved analysis procedures employing realistic ensembles for both SAXS and FRET, this contraction is sufficient to bring results from SAXS and FRET studies into agreement. In parallel, we present SAXS measurements on polyethylene glycol (PEG), confirming prior reports that the addition of fluorophores likewise compacts this otherwise SARW polymer (10), a finding that was recently questioned (43). We also show that SAXS can extract Rg, ν and Ree with an accuracy of better than 3% when analyzed using a new MFF developed for heteropolymers. These simulations are accurate enough to reproduce scattering data without the need to select only a sub-ensemble of conformations, as commonly used in other data fitting procedures. Finally, we demonstrate the extent to which one can use small deviations from ideality in SAXS data to infer biases within the heteropolymer conformational ensemble.
Results
Fluorophore-labeling induces collapse of a highly expanded ensemble
To directly test the fluorophore-interaction hypothesis we have measured SAXS profiles of an unmodified IDP and the same IDP when site-specifically modified with one or two copies of the commonly employed FRET dye Alexa488. We chose this dye because it is relatively hydrophilic (44), and thus considered less likely to form interactions that would alter the conformational ensemble than any of the other commonly used FRET fluorophores. As our test protein we used PNt, a well-behaved IDP comprising the amino terminal 334 residues of pertactin (50). To produce the mono-dye-modified protein we introduced a cysteine residue at position 117 and modified it with the appropriate thiol-reactive Alexa488 variant (PNtC-Alexa488). To produce a dye-pair-modified protein we introduced cysteines at positions 29 and 117 (PNtCC-Alexa488). Alkylation was used to produce dye-free constructs (PNtC-Alkd and PNtCC-Alkd) that we used as controls, in addition to the unmodified parent protein (PNt).
The addition of Alexa488 reduces the SAXS-measured dimensions of the IDP under both aqueous and intermediate denaturant conditions (Fig. 2A, Table S1). Specifically, Rg and ν decrease nearly twice as much for PNtCC-Alexa488 as for either PNtCC-Alkd or PNt (Fig. 2B; Table S1). These data indicate that labeling with Alexa488 leads to increased contraction of the PNt conformational ensemble, implying that fluorophore-mediated interactions are stabilized within the IDP conformational ensemble. Of note, whereas the unlabeled protein in 2 M Gdn is in a good solvent condition (ν > 0.5), with protein-protein interactions weaker than protein-solvent interactions, fluorophore-labeling leads to measurable intramolecular interactions even at this relatively high denaturant concentration. The magnitude of this denaturant-dependent chain expansion is qualitatively similar to that observed by FRET for a variety of other proteins, consistent with a common origin (Fig. 1B) (43, 44). We also observed a fluorophore-dependent decrease in average Rg and ν for the single-labeled construct PNtC-Alexa488 (Fig. 2). This result indicates that the PNt conformational ensemble is affected by fluorophore-protein interactions, not only fluorophore-fluorophore interactions. Of note, this contraction occurs despite the fact that the steady-state fluorescence anisotropy of PNtCC-Alexa488 is, at 0.11 and 0.08 in 0 and 2 M denaturant, respectively (Table S2), below the threshold usually used to indicate free rotation of protein-attached dyes (43, 51).
It thus appears that addition of even relatively hydrophilic fluorophores commonly employed for FRET measurements can significantly influence the dimensions of disordered polypeptide chains (43, 44, 51).
PEG dimensions are independent of polymer concentration
In an earlier study we reported that addition of Alexa488/594 to PEG resulted in a denaturant-dependent change in FRET (10), similar to that seen in FRET measurements of unfolded proteins. No contraction was observed, however, when the equivalent unlabeled polymer was studied using small-angle neutron scattering. It has been proposed that the high (3 mM) concentrations of PEG used in this scattering study masked what would otherwise be denaturant-dependent changes in Rg (43). To test this, we measured SAXS profiles over a range of PEG and denaturant concentrations and found no evidence for a significant change in polymer dimensions (Fig. 3). Likewise, under all conditions we observe a Flory exponent of 0.60, further confirming that PEG behaves as a SARW independent of denaturant concentration. The fluorophore-interaction hypothesis thus remains the simplest interpretation of the denaturant-dependent changes in FRET observed for fluorophore-labeled PEG (10).
Testing the heteropolymer-decoupling hypothesis
Taken together, the above observations indicate that fluorophores added to an IDP lead to significant contraction of its conformational ensemble, contributing to the different conclusions drawn from prior SAXS and FRET studies. These observations, however, do not rule out the possibility that heteropolymer-decoupling also contributes to the SAXS-FRET discrepancy. To investigate how amino acid sequence composition and pattern can alter the SAXS profile, and to test our ability to extract information from such deviations, we used our Cβ-level polypeptide chain simulations (46) to simulate the scattering for unfolded ensembles of 50 protein sequences of 250-650 residues randomly chosen from the PDB. We treated each sequence as a binary hydrophobic/polar (HP) chain, where the only favorable interactions are between Cβ atoms of aliphatic and/or aromatic residues. For each of the 50 sequences, 30 different Cβ interaction strengths were used. These simulations yielded a range of deviations from G(ν) obtained from homopolymer simulations (Fig. S4A). From the 1500 resulting ensembles, we determined the Rg, ν, and Ree both directly from the atomic coordinates of the simulated ensemble and by fitting the hydrated SAXS profile (with added realistic random errors) of each ensemble using our MFF. The inferred end-to-end distance (Ree_inf) was determined using the relationship Ree_inf(Rg, ν) = G(ν)^(1/2) × Rg, where G(ν) was obtained from homopolymer simulations.
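The relationship between the fitted parameters and the inferred end-to-end distance can be sketched as follows. The actual G(ν) used in this work comes from homopolymer simulations and is not reproduced here; the placeholder below is a hypothetical straight line through the ideal random-walk value (G = 6 at ν = 0.5) and the SARW value (G = 6.3 at ν = 0.6), and should be replaced by a simulation-derived function before any quantitative use.

```python
import math
from typing import Callable


def g_linear_placeholder(nu: float) -> float:
    """Hypothetical stand-in for G(nu), NOT the simulation-derived curve.

    Straight line through (0.5, 6.0) and (0.6, 6.3).
    """
    return 6.0 + (nu - 0.5) * (6.3 - 6.0) / (0.6 - 0.5)


def inferred_ree(
    rg: float,
    nu: float,
    g_of_nu: Callable[[float], float] = g_linear_placeholder,
) -> float:
    """Inferred end-to-end distance: sqrt(G(nu)) * Rg, in the units of Rg."""
    return math.sqrt(g_of_nu(nu)) * rg
```

Passing G(ν) as an argument keeps the conversion separate from the choice of polymer model, so a tabulated or fitted curve can be substituted without changing the calling code.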
Compared to the true values calculated from the atomic coordinates, fits obtained using our MFF yield values of Rg, ν, and Ree with a mean absolute deviation of only 1.3 Å, 0.011 and 4.2 Å, respectively, representing a 3%, 2% and 4% mean absolute error (Fig. S3). The largest deviations are observed for more compact structures; for more extended conformations with ν > 0.54, there is ~2% error.
To further reduce the small error associated with the application of our MFF derived from homopolymers to the scattering of heteropolymers, we generated a new molecular form factor, MFFhet, using the heteropolymer simulations described above. Application of this slightly modified MFF lowers errors in fitted Rg, ν, and Ree to 0.5 Å, 0.005 and 2.7 Å, respectively, representing 1%, 1% and 2% mean absolute error (Fig. 4A-C). These results demonstrate that SAXS analysis returns accurate values of Ree, in addition to Rg and ν, even for disordered heteropolymers.
Measuring deviations from ideality in heteropolymers
MFFhet accurately captures the overall dimensions of disordered heteropolymers. Nevertheless, small but measurable deviations are observed for proteins in our test set with less well-mixed HP patterns (Fig. S4). These differences can be seen in the intra-molecular distance distribution plot, where the slope at separation distances |i–j| > N/2 can differ from the average slope, which defines the global ν value (Fig. 4D). We define this change in slope as Δνend (Fig. 4D). Negative values of Δνend correlate with more hydrophobic residues at the ends of the polypeptide sequences (Figs. 4D, S4C) and with deviations in G(ν) (Fig. S4A) (R^2 ~ 0.84). The SAXS profile is most sensitive to Δνend at low qRg (Fig. 4E).
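One way to picture Δνend is as a difference between two log-log slopes of the intra-chain distance versus sequence separation. The sketch below is a minimal illustration under stated assumptions: the slopes are obtained by ordinary least squares, the global slope is fit over all supplied separations, and the sign convention is (slope for |i–j| > N/2) minus (global slope). The fitting range and procedure actually used for Fig. 4D are not specified in the text, and all function names are hypothetical.

```python
import math


def loglog_slope(separations, distances):
    """Least-squares slope of log(distance) against log(separation)."""
    xs = [math.log(s) for s in separations]
    ys = [math.log(d) for d in distances]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def delta_nu_end(separations, distances, n_residues):
    """Return (global slope, tail slope minus global slope).

    The tail is every point with separation > n_residues / 2 (assumed
    convention); a negative second value means the chain ends are closer
    together than the global scaling would predict.
    """
    nu_global = loglog_slope(separations, distances)
    tail = [(s, d) for s, d in zip(separations, distances) if s > n_residues / 2]
    nu_tail = loglog_slope([s for s, _ in tail], [d for _, d in tail])
    return nu_global, nu_tail - nu_global
```

For a pure power law the second returned value is zero; a profile whose slope flattens at large separations gives a negative value, matching the sign described above for sequences with hydrophobic ends.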
To extract this extra information from the SAXS data we generated a more general three-parameter form factor, MFF(Rg, ν, Δνend) (Fig. 4E-F; Movie S2). To demonstrate its ability to yield useful information, we fit data from PNt, PNtCC-Alexa488 and a circularized (disulfide-bonded) PNtCC at 2 M Gdn (Fig. 4F). Δνend decreases from ~0 for PNt to ~ −0.1 for PNtCC-Alexa488 and PNtCC, consistent with the increase in interactions at the N-terminus of the chain. These data demonstrate that for disordered polymers, SAXS is sensitive to heteropolymer deviations (Fig. 4E-F) while still able to measure Rg and ν accurately (Fig. 4A-C).
Discussion
Labeling with Alexa488, a fluorophore commonly used to support FRET measurements, can alter the conformational ensemble of a disordered protein, decreasing Rg and ν even when the fluorescence anisotropy is low relative to accepted limits for free dye rotation (43, 51). In combination with prior studies (10), similar conclusions can be inferred for PEG, a known SARW. These findings, along with our prior result that disordered chains undergo a mild expansion in denaturants (46) and improved methods for extracting Rg values from FRET data, provide a sufficient framework for resolving discrepancies between SAXS and FRET on the dimensions of disordered proteins.
Consistent with our findings of fluorophore-induced effects, others have found that molecular dimensions inferred by FRET can be dependent on the fluorophore pair used, with more hydrophobic fluorophores suggesting a more collapsed state (44). All-atom molecular dynamics simulations with an Alexa488/594 fluorophore pair, for example, resulted in a 10% contraction of an IDP even in 1 M urea (68). Likewise, a recent study found that smFRET signals from both DNA and PEG, which are often referred to as “spectroscopic rulers,” are dependent on solvent conditions under which the dimensions of the chains were expected to be invariant (49). In apparent disagreement with our data, however, Fuertes et al. (43) conducted SAXS measurements on five IDPs with and without Alexa488/594 and concluded that, on average, the alterations seen upon the addition of fluorophores were minimal. When considered for each protein separately, however, the differences appear significant relative to the narrow range of all possible values of ν. Specifically, for the five proteins characterized in that study, ν_unlabel − ν_label = 0.08, 0.03, 0.03, −0.02, −0.04 (or 0.09, 0.06, 0.03, −0.02, −0.08 when analyzed using our procedures). That is, more than half of these IDPs exhibit a fluorophore-induced contraction similar in magnitude to the contraction we observe for fluorophore-labeled PNt in water. Together, these data suggest a consistent picture of fluorophore-induced contraction contributing to differences in the magnitude and denaturant dependence of Rg inferred from SAXS and FRET.
The other factor that has been suggested to contribute to the discrepancy between SAXS and FRET results is deviations from the proportional relationship between Rg and Ree that arise when moving from homopolymers to heteropolymers (43). Underlying this view is the observation that, if one reweights the ensemble (i.e., calculates Rg using only a subset of conformations), many possible values of Ree are consistent with any given Rg (and vice versa). Rather than selecting a sub-ensemble to fit the data, however, we have instead taken an alternative approach (46). We generate physically plausible ensembles at the outset and examine whether they fit the data in their entirety. We find that the MFF derived from our ensembles accurately matches the entire scattering profile (rather than just the Rg), which provides strong support for our procedure. Since we can determine the values of Rg and Ree directly from the underlying ensembles, we have a procedure to obtain these two parameters by fitting the SAXS data with our MFF.
This MFF is imperfect in the sense that slightly different ensembles can be fit using the same Rg and ν parameters. But the error is very low for these two parameters relative to their true values (Fig. 4A-C). Inclusion of heteropolymer effects does not change this conclusion. From this result, we conclude that SAXS is well suited to extract both Rg and Ree for disordered heteropolymers, while circumventing potential artifacts due to fluorophore interactions with polypeptide chains. This conclusion does not negate the potential of FRET to measure dynamics, binding and conformational changes; it does, however, reinforce that caution must be exercised when employing FRET to infer quantitative distances in the original, unlabeled biomolecule.
Nearly a dozen IDP SAXS datasets reported here and previously (46) have been shown to fit well to our general MFF (Tables S1, S3). This finding suggests that the interactions that drive chain contraction are spread along protein sequences. Water-soluble, well-folded protein sequences tend to be well-mixed heteropolymers, with relatively small stretches of consecutive hydrophobic residues (69). These well-mixed sequences tend to behave as homopolymers when measured by global, low resolution methods such as SAXS. Indeed, we have demonstrated that, with sufficient data quality, poorly mixed sequences can be identified by their deviation from our MFF (Fig. 4D-F). Larger deviations can occur for some IDPs, especially those with partial folding, unusual sequence patterning (e.g., block copolymers) and/or under crowded conditions that may serve specific functions (70, 71).
The results and analysis presented here demonstrate that water is a good solvent for many foldable protein sequences, rendering them expanded with ν ≥ 0.54. This property is likely beneficial to the cell as it helps to reduce misfolding and unwanted associations, while simultaneously facilitating protein synthesis and transport. Moreover, many proteins fold (in both a thermodynamic and kinetic sense) even in the presence of modest amounts of denaturants, and some even in 6 M Gdn (42, 72), where ν ~ 0.6. Hence, folding is a robust process that does not critically depend on solvent quality as long as the native protein structure is stable relative to the unfolded ensemble.
Materials and Methods
Protein purification
PNtCC and PNtC were expressed in E. coli BL21(DE3)pLysS and purified from inclusion bodies as described previously (46, 50, 73), with the following modifications. After inclusion body solubilization, PNt constructs were refolded in 50 mM Tris pH 7.2 with 50 mM β-mercaptoethanol (βME). Prior to the final size exclusion chromatography step, 20 mM βME was added to the protein stock solution.
Alkylation
Purified PNtCC or PNtC (70 µM) in 50 mM Tris pH 8, 5 mM EDTA was reduced with 5 mM TCEP for 30 min at room temperature while stirring. Alkylation was initiated by addition of 10 mM iodoacetamide in water. PNt constructs were incubated for 30 min at room temperature while stirring in the dark. The alkylation reaction was quenched with the addition of 20 mM fresh DTT. Excess reagents were removed by size exclusion chromatography (Superdex 16/60 S200). Alkylation efficiency (100%) was determined by mass spectrometry (MS).
Alexa488 labeling
PNt constructs were reduced as above. Alexa488 C5 maleimide (ThermoFisher Scientific) was resuspended in DMSO to 5 mg/ml and added to the reduced protein dropwise while stirring, to a final ratio of 5:1 fluorophore:cysteine. The reaction proceeded overnight at 4°C while stirring in the dark. Free fluorophore was separated from fluorophore-labeled protein by size-exclusion chromatography (Superdex 16/60 S200), protected from light at all times. Labeling efficiency (100% for PNtCC; >50% for PNtC) was determined by mass spectrometry.
Steady-state anisotropy
Steady-state anisotropy measurements were performed on a QM-6 T-format fluorometer (Horiba) at room temperature in 50 mM Tris pH 7.5 and 2 M Gdn where indicated. Samples containing 1 µM Alexa-488 were excited with 494 nm vertically polarized light. Anisotropy (r) was calculated using the emission at 516 nm by the equation
r = (I_VV − G × I_VH) / (I_VV + 2G × I_VH)
where G is the instrument correction factor and the subscripts denote the polarization (V, vertical; H, horizontal) of the excitation and emission light, respectively. G was calculated from the emission of free Alexa488 at 516 nm after excitation with 494 nm horizontally polarized light by the equation
G = I_HV / I_HH
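As an illustration of this calculation, the sketch below uses the conventional textbook expressions for steady-state anisotropy and the instrument correction factor. The function and argument names are hypothetical; in each intensity name the first letter gives the excitation polarization and the second the emission polarization (v, vertical; h, horizontal).

```python
def g_factor(i_hv: float, i_hh: float) -> float:
    """Instrument correction factor from horizontally polarized excitation.

    Conventional definition (assumed here): G = I_HV / I_HH.
    """
    return i_hv / i_hh


def anisotropy(i_vv: float, i_vh: float, g: float) -> float:
    """Steady-state anisotropy from vertically polarized excitation.

    Conventional definition (assumed here):
    r = (I_VV - G * I_VH) / (I_VV + 2 * G * I_VH).
    """
    return (i_vv - g * i_vh) / (i_vv + 2.0 * g * i_vh)
```

In use, G would be computed once from the free-dye measurement and then passed to `anisotropy` for each labeled sample recorded on the same instrument settings.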
SAXS
Data were collected at the BioCAT beamline at the Advanced Photon Source (Argonne National Lab) using a GE Lifesciences Superdex 200 SEC column with the scattering presented as q=4π sinθ/λ, where 2θ is the scattering angle and λ is the X-ray wavelength (1 Å).
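The definition of q given above can be evaluated directly, as in this minimal sketch. The function name and the choice of degrees for the input angle are assumptions made for illustration; the default wavelength is the 1 Å value stated above.

```python
import math


def scattering_vector(two_theta_deg: float, wavelength_angstrom: float = 1.0) -> float:
    """Momentum transfer q = 4*pi*sin(theta)/lambda, in inverse angstroms.

    two_theta_deg is the full scattering angle (2*theta) in degrees.
    """
    theta = math.radians(two_theta_deg) / 2.0
    return 4.0 * math.pi * math.sin(theta) / wavelength_angstrom
```

Because q scales inversely with wavelength, profiles expressed in q can be compared across beamlines that operate at different X-ray energies.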
Code for simulations and fitting scattering data
Code and associated files necessary to produce simulations and the analysis can be accessed at www. (to be added prior to publication). Additionally, our webserver http://sosnick.uchicago.edu/SAXSonIDPs is available for fitting SAXS data with our MFF(ν,Rg).
Simulations and generation of MFF
Calculations were performed at the University of Chicago Research Computing Center using a version of our Upside molecular dynamics program (arXiv:1610.07277) modified for Cβ-level interactions, as done previously (46).
Acknowledgements
We thank Srinivas Chakravarthy for assistance in the SAXS measurements and Matthew Champion for assistance with the mass spectrometry experiments. This work was supported by National Institutes of Health (NIH) Research Grant R01 GM055694 (TRS), the W. M. Keck Foundation (PLC) and NSF grants GRF DGE-1144082 (JAR) and MCB 1516959 (CR Matthews). Use of the Advanced Photon Source, an Office of Science User Facility, operated for the Department of Energy (DOE) Office of Science by Argonne National Laboratory, was supported by the DOE under Contract No. DE-AC02-06CH11357. This project was supported by the NIH (2P41RR008630-18 and 9 P41 GM103622-18).