Abstract
Polyglutamine (polyQ) tracts are regions of low sequence complexity of variable length found in more than one hundred human proteins. In transcription factors, where they are frequent, tract length can correlate with transcriptional activity. In addition, in nine proteins, their elongation beyond specific thresholds is the cause of polyQ disorders. To investigate the structural basis of the association between tract length, biological function and disease we studied how the conformation of the polyQ tract of the androgen receptor, a transcription factor associated with a polyQ disease, depends on its length. We found that the tract folds into a helical structure stabilized by unconventional hydrogen bonds between glutamine side chains and main chain carbonyl groups that are bifurcate with the conventional main chain to main chain hydrogen bonds stabilizing α-helices. In addition, since tract elongation provides additional interactions, the helicity of the polyglutamine tract directly correlates with its length. These findings provide a structural basis for the association between polyglutamine tract length, transcriptional activity, and the onset of polyglutamine disorders.
Introduction
Polyglutamine (polyQ) tracts are low complexity regions containing almost exclusively Gln residues. They are frequent in the human proteome, particularly in the intrinsically disordered domains of proteins involved in the regulation of transcription such as the activation domains of transcription factors1. The functions of polyQ tracts are not well-understood but it has been suggested that they regulate the activity of the proteins that harbor them by modulating the stability of the complexes that they form2. The lengths of polyQ tracts are variable because their coding DNA sequences tend to adopt secondary structures that hamper replication and repair3. Contractions and expansions in polyQ tracts can have functional consequences and the lengths of polyQ tracts may have been subject to natural selection4. As an example it has been proposed that the length of the polyQ tract present in huntingtin correlates with the intellectual coefficient5, presumably because this protein plays important although still not well-defined roles in neural plasticity6.
For nine specific proteins, including huntingtin and the androgen receptor (AR), the variability in the lengths of polyQ tracts has pathogenic implications. Expansion beyond specific thresholds is associated with nine hereditary rare neurodegenerative diseases known as polyQ diseases7. The mechanistic basis of this phenomenon is a matter of debate: some have suggested that the actual expanded transcripts are the neurotoxic species8 due to their propensity to phase separate9, while others have suggested that expanded polyQ proteins are inherently neurotoxic10. It is generally thought, however, that polyQ expansions decrease protein solubility, leading to the formation of cytosolic or nuclear aggregates that interfere with proteasomal protein degradation11 and sequester the transcriptional machinery12. This is supported by experiments carried out in vitro and in cells, which showed that polyQ expansion decreases protein solubility13 and causes cell death14, as well as by experiments in vivo, which showed that promoting the clearance of polyQ aggregates led to improvements in polyQ expansion phenotypes15.
Since the disease-specific thresholds of polyQ diseases are similar16 it has been hypothesized that polyQ tracts have a generic propensity to undergo a tract-length-dependent conformational change producing a highly insoluble structure. A substantial number of theoretical, computational and experimental studies have investigated how the conformational properties of polyQ tracts change with their length. Some of these studies have suggested that expansions of the polyQ tract of huntingtin confer the ability to adopt extended conformations with β secondary structure17. By contrast, most experimental studies report that polyQ tracts are collapsed disordered coils that barely change conformation upon expansion18. This led to alternative hypotheses proposing that expansion leads to toxicity by increasing the affinity that polyQ tracts have for their interactors, regardless of conformation19.
AR is the nuclear receptor that regulates the development of the male phenotype. It harbors a polyQ tract associated with the neuromuscular disease spinobulbar muscular atrophy (SBMA)20, that affects men with AR genetic variants coding for tracts with more than 37 residues, that form fibrillar cytotoxic aggregates21. The length of this tract also anti-correlates with the risk of suffering prostate cancer 22 due to its influence on AR transcriptional activity23. It seems, therefore, that the length of the polyQ tract of AR must be in a specific range to prevent the over-activation of the receptor and simultaneously minimize its propensity to form cytotoxic aggregates. This trade-off is reflected in the distribution of AR polyQ tract lengths in the population, despite some variations between ethnic groups24.
Despite their relevance for understanding the causes of two diseases, the structural basis of these sequence-activity relationships has not been established, in part due to the difficulty of obtaining atomic-resolution structures of these poorly soluble repetitive sequences. By establishing robust assays we characterized the conformation of the polyQ tract of AR25 as a function of tract length by using circular dichroism (CD) and solution nuclear magnetic resonance (NMR) spectroscopy, as well as molecular dynamics (MD) and QM/MM (quantum mechanics/molecular mechanics) calculations. We found that the stability of this conformation directly depends on tract length due to the accumulation of unconventional interactions where Gln side chains donate a hydrogen bond to the main chain COs of residues at relative position i-4. By coupling the conformation of the polyQ tract to that of its flanking region these interactions provide a plausible explanation of how changes in tract length cause changes in gene expression and solubility, thus providing a rationale for the range of tract lengths observed in men.
Results
The polyQ tract of AR folds into a helix that gains stability upon elongation
We used CD to analyze the secondary structure of synthetic peptides uQ25, uL4Q25 and L4Q25 (Fig. 1a). Peptide uQ25, where the letter u stands for uncapped, represents a polyQ tract of length 25 flanked by Lys residues, used to enhance solubility at physiological pH26. uL4Q25 possesses four Leu residues found N-terminally to the polyQ region in AR and peptide L4Q25 contains four additional AR residues (Pro-Gly-Ala-Ser) predicted to act as N-capping motif27 (Fig. S1). As shown in Figure S2, the CD spectra of both uL4Q25 and L4Q25, measured at pH 7.4 and 277K, have well-defined minima at ca. 205-208 and 222 nm, especially for L4Q25, indicating that they are 40 and 55% helical, respectively, in contrast to peptide uQ25, which is 20% helical. These results indicate that the helicity of this polyQ tract stems from interactions involving eight residues flanking it at the N-terminus, including a predicted N-capping motif and four Leu residues25.
To quantify how helicity depends on tract length we studied polyQ peptides equivalent to L4Q25 but with tract lengths 4, 8, 12, 16 and 20 (L4Qn, Fig. 1a) by CD and observed that helicity and tract length are strongly correlated (Fig. 1b). Helicity increased abruptly from L4Q4 to L4Q8 and L4Q12, from ca 5% to ca 40%, and then increased slightly upon further elongation. Since the CD signal depends both on the amount and length of helical structures, and to determine the residue-specific distribution of helicity, we measured the backbone chemical shifts of the peptides by solution NMR (Figs. S3 and S4) and analyzed them with the algorithm δ2D28. We found an increase in helical propensity upon polyQ tract elongation, in agreement with the results obtained by CD (Fig. 1c), concomitant with a change in the identity of the residue with the highest helicity: whereas for peptide L4Q4 this is L3, with ca 20% helicity, it shifts to L4, with 50% helicity, for peptides L4Q8 and L4Q12 and to Q1, with ca 80% helicity, for peptides L4Q16 and L4Q20 (Fig. 1c). We conclude that the stability of the conformation of this polyQ tract depends on its length and that for physiological tract lengths24 the residue of highest helicity can be part of the tract.
The side chains of the first residues of the polyQ tract have a distinct rotameric state
To rationalize the stability of the polyQ helix we extended our NMR analysis to the side chains and initially focused on the carboxamide groups of the Gln residues. We found that the 15N side chain resonances of the homopolymeric polyQ sequences are surprisingly well-dispersed and that the associated chemical shifts correlate with their position in the sequence i.e. that the resonances of the first residue of the tract appear upfield (111.75 ppm for Q1 in L4Q20) (Fig. 2a) and shift to lower fields towards the C-terminus of the tract (113.15 ppm for Q20 in L4Q20). Remarkably the first four residues (Q1 to Q4) have chemical shifts that are markedly lower; e.g. in L4Q20 the difference in side chain 15N chemical shift between Q4 and Q5 is 0.22 ppm whereas the resonances of Q5 and Q6 overlap. This indicates that the chemical environment of the Gln side chains varies along the polyQ tract, especially for the first residues.
We then analyzed the 1H resonances of the Gln side chains. Especially in the first residues of the tract the resonances of the γ protons, adjacent to the carboxamide group (Fig. 2b), overlap in the peptide with the shortest tract but gradually split as the length of the tract increases to 20. The behavior of the β protons, that are instead adjacent to the peptide backbone (Fig. 2c and S6), is more complex: in L4Q4 they are split, upon tract elongation to L4Q12 they collapse in one peak but they split again in L4Q16 and, especially, in L4Q20. These effects, caused by redistributions of side chain rotameric states, correlate with the increases in helicity that occur upon tract elongation reported in Figure 1c, indicating that the conformations of the main chain and side chain of these residues are coupled. Although these effects are particularly marked for the first three or four residues of the tract they can also be observed in the residues following them in the sequence, particularly in L4Q16 and L4Q20 (Fig. S6); this indicates that, in a given peptide, the population of the side chain conformation causing the effects gradually decreases along the sequence. In summary, we find that the side chains of residues with high helicity have a conformation that is different to those that are less ordered.
Hydrogen bonds between Gln side chain NH2 groups and main chain COs in helical conformers
To rationalize these observations we carried out molecular dynamics (MD) simulations. For this, since these peptides have fractional helicity, we generated fully helical conformations for peptides L4Q4 to L4Q20 and produced MD trajectories at 300K. We observed that the helical starting structures had a lifetime that depended on the length of the polyQ tract and that partially helical conformations were re-populated after unfolding (Fig. S7). Although this is not evidence for convergence it indicates that both the helical and unfolded states of the peptide have been sampled. An analysis of the helicity in the trajectory as a function of residue number showed that the residues with highest helicity were the four Leu residues flanking the polyQ tract and that the helicity of the Gln residues decreased along the tract (Fig. 3a). However, in contrast to the experiments (Fig. 1b), the overall helicity did not increase upon tract elongation (Fig. 3a).
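The per-residue helicity analysis described above can be illustrated with a minimal sketch. It assumes that each trajectory frame has already been reduced to a DSSP-style string with one letter per residue and that only the code H is counted as helical; the function name, the toy frames and the optional weights are hypothetical and are not taken from the analysis scripts used in this work.

```python
# Per-residue helicity from DSSP-style secondary structure strings.
# Assumption: one string per frame, one letter per residue, and only
# 'H' (alpha-helix) is counted as helical.

def per_residue_helicity(frames, weights=None):
    """Return the (weighted) fraction of frames in which each residue is 'H'."""
    if not frames:
        raise ValueError("no frames given")
    n_res = len(frames[0])
    if any(len(f) != n_res for f in frames):
        raise ValueError("all frames must have the same length")
    if weights is None:
        weights = [1.0] * len(frames)  # unweighted trajectory
    total = sum(weights)
    helicity = [0.0] * n_res
    for frame, w in zip(frames, weights):
        for i, code in enumerate(frame):
            if code == "H":
                helicity[i] += w
    return [h / total for h in helicity]

# Toy example: four frames of a hypothetical 8-residue peptide.
toy_frames = ["-HHHHHH-", "-HHHH---", "--------", "-HHHHHH-"]
profile = per_residue_helicity(toy_frames)
overall = sum(profile) / len(profile)  # mean helicity over all residues
```

Passing a list of frame weights to the same function gives the helicity profile of a reweighted ensemble.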
To obtain representations of the structural properties of peptides L4Q4 to L4Q20 that are in quantitative agreement with experiment we used the Cα and CO backbone chemical shifts to reweight the trajectories with a Bayesian/Maximum Entropy (BME) algorithm29. In this procedure the degree of re-weighting and, therefore, the extent to which the back-calculated chemical shifts agree with those measured experimentally, is controlled by the parameter θ, which determines the balance between the prior information, encoded in the MD trajectory, and the experimental data (Figs. S8 and S9 and, for θ = 4, Fig. 3b). We analyzed the secondary structure of the reweighted trajectories and obtained that their overall helicity increased with the length of the polyQ tract (Fig. 3a), as observed by CD (Fig. 1b) and that the effect of elongation on the helicity of the various residues of the peptide was equivalent to that observed by NMR, indicating that the reweighted trajectories are useful models of the conformational properties of polyQ peptides (Fig. 3c).
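The role of θ in this kind of reweighting can be illustrated with a minimal sketch for a single observable. It is not the BME code used in this work: it assumes one back-calculated value per frame, one experimental value with uncertainty sigma, strictly positive prior weights and θ > 0, and it finds the Lagrange multiplier by bisection rather than with a general optimizer. All function and variable names are hypothetical.

```python
import math

def maxent_reweight(calc, exp_value, sigma, theta, prior=None):
    """Maximum-entropy weights for one observable (illustrative sketch).

    The weights have the form w_i ~ w0_i * exp(-lam * calc_i), and lam is
    the root of  g(lam) = exp_value - <calc>_lam + theta * sigma**2 * lam,
    which increases monotonically with lam when theta > 0.
    """
    n = len(calc)
    prior = prior or [1.0 / n] * n

    def weights(lam):
        logs = [math.log(p) - lam * f for p, f in zip(prior, calc)]
        m = max(logs)  # subtract the maximum for numerical stability
        w = [math.exp(x - m) for x in logs]
        s = sum(w)
        return [x / s for x in w]

    def gradient(lam):
        w = weights(lam)
        avg = sum(wi * f for wi, f in zip(w, calc))
        return exp_value - avg + theta * sigma ** 2 * lam

    lo, hi = -1.0, 1.0
    while gradient(lo) > 0:  # expand the bracket until the sign changes
        lo *= 2
    while gradient(hi) < 0:
        hi *= 2
    for _ in range(200):  # bisection on the monotonic gradient
        mid = 0.5 * (lo + hi)
        if gradient(mid) > 0:
            hi = mid
        else:
            lo = mid
    return weights(0.5 * (lo + hi))

# Toy data: four frames with hypothetical back-calculated values.
calc = [0.0, 1.0, 2.0, 3.0]
w_small = maxent_reweight(calc, exp_value=2.0, sigma=0.1, theta=4.0)
w_large = maxent_reweight(calc, exp_value=2.0, sigma=0.1, theta=1e6)
avg_small = sum(w * f for w, f in zip(w_small, calc))
avg_large = sum(w * f for w, f in zip(w_large, calc))
```

In this sketch a small θ pulls the ensemble average towards the experimental value, whereas a very large θ leaves the weights close to the prior.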
The 15N chemical shifts of backbone amides depend on the hydrogen bonding status of both the HN group and the adjacent CO30. We thus hypothesized that the high dispersion of 15N Gln side chain resonances (Fig. 2) is due to hydrogen bonding interactions of the carboxamide group of the Gln side chains. The primary amide (NH2) groups of Gln and Asn side chains are good donors31 and, in surveys of hydrogen bonds involving them in protein structures, Gln residues can donate hydrogens to the backbone COs preceding them in the sequence32. To investigate this possibility we analyzed the hydrogen bonds formed by Gln side chains in the reweighted trajectories and found that the most common hydrogen bond is one where the side chain of a Gln residue donates a hydrogen to the main chain CO group of the residue at relative position i-4 in the sequence (Fig. 4a). This specific interaction, that we term i→i-4 side chain to main chain (sci→mci-4) hydrogen bond has been observed in protein structures deposited in the protein data bank (PDB); interestingly, it occurs almost exclusively in α-helices both in the PDB32 and in the trajectories (Fig. S10), suggesting that it plays a role in stabilizing this structure. In addition, for the reweighted MD ensembles of all peptides we observed that the population of this specific hydrogen bond progressively decreased along the polyQ tract (Fig. 4b).
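A geometric test of the kind commonly used to count hydrogen bonds in a trajectory can be sketched as follows. The cutoffs (donor to acceptor distance of at most 3.5 Å and donor-hydrogen-acceptor angle of at least 120°) are assumptions chosen for illustration, not the criteria used in this work, and the function names are hypothetical.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _norm(v):
    return math.sqrt(sum(x * x for x in v))

def angle_deg(a, b, c):
    """Angle a-b-c in degrees, measured at point b."""
    u, v = _sub(a, b), _sub(c, b)
    cos = sum(x * y for x, y in zip(u, v)) / (_norm(u) * _norm(v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))

def is_hbond(donor, hydrogen, acceptor, max_dist=3.5, min_angle=120.0):
    """Geometric hydrogen bond test (cutoffs are illustrative assumptions)."""
    return (_norm(_sub(donor, acceptor)) <= max_dist
            and angle_deg(donor, hydrogen, acceptor) >= min_angle)

def is_sc_to_mc_i_minus_4(donor_res, acceptor_res, donor, hydrogen, acceptor):
    """True if the side chain N-H of residue i is hydrogen bonded to the
    main chain O of residue i-4 (residue numbers are sequence positions)."""
    return donor_res - acceptor_res == 4 and is_hbond(donor, hydrogen, acceptor)

# Toy coordinates in Å: a linear N-H...O arrangement 2.9 Å long.
linear = is_hbond((0, 0, 0), (1.0, 0, 0), (2.9, 0, 0))
bent = is_hbond((0, 0, 0), (1.0, 0, 0), (0, 2.9, 0))
far = is_hbond((0, 0, 0), (1.0, 0, 0), (5.0, 0, 0))
```

Counting the frames for which `is_sc_to_mc_i_minus_4` is true, using the frame weights, would give the population of the bond for each Gln residue.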
We analyzed the rotamers populated by Gln residues involved in these hydrogen bonds in the reweighted trajectories and observed that they constrain the range of values of χ1 and χ3 that they can adopt (Fig. 4c). Note that while the distribution of χ1 in Gln residues in α-helices is generally bimodal33 only χ1 values around −60° are compatible with the suggested H-bonding motif, which also results in an enrichment of χ3 values around 90°. This is in agreement with the NMR results, which point towards the adoption of a specific conformation state by these side chains (Fig. 2c). As an example, we show a frame of the trajectory obtained for peptide L4Q16 in which two such hydrogen bonds occur simultaneously (involving residues Q1 and Q4 but not Q2 and Q3; Fig. 4d). The NMR-derived structural ensembles thus suggest that sci→mci-4 hydrogen bonds can be part of a hydrogen bonding motif where the CO group accepts two hydrogen bonds donated by the Gln side (purple) and main (yellow) chains.
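The rotamer analysis relies on side chain dihedral angles, which can be computed from four atomic positions as in the sketch below. The dihedral formula is the standard one and follows the IUPAC sign convention; the ±30° window around χ1 = −60° and χ3 = +90° used in `compatible_with_motif` is an assumption made for illustration, and the function names are hypothetical.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral_deg(p0, p1, p2, p3):
    """Dihedral angle p0-p1-p2-p3 in degrees, in the range (-180, 180]."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    n = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / n for x in b1)  # unit vector along the central bond
    # Components of b0 and b2 perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))

def angular_diff(a, b):
    """Signed difference a - b wrapped to [-180, 180)."""
    return (a - b + 180.0) % 360.0 - 180.0

def compatible_with_motif(chi1, chi3, tol=30.0):
    """Assumed window: chi1 near -60 deg and chi3 near +90 deg, within tol."""
    return (abs(angular_diff(chi1, -60.0)) <= tol
            and abs(angular_diff(chi3, 90.0)) <= tol)

# Toy geometry: a +90 degree dihedral and a trans (180 degree) dihedral.
plus90 = dihedral_deg((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
trans = dihedral_deg((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))
```

For χ1 the four atoms would be N, Cα, Cβ and Cγ, and for χ3 they would be Cβ, Cγ, Cδ and Oε1, following the usual definitions for Gln.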
The sci→mci-4 hydrogen bonds stabilize the helical structure of the polyQ tract
To test the importance of sci→mci-4 hydrogen bonds we used CD to analyse the secondary structure of peptides based on the L4Q16 sequence but with Gln residues substituted with Glu (Fig. 5a,b). Gln and Glu have similar structures and helical propensities34 but the side chain of Glu is deprotonated at pH 7.4 and cannot act as hydrogen bond donor. Decreases in helicity after mutation of Gln residues are thus compatible with their involvement in helix stabilization via sci→mci-4 hydrogen bonds. Since in the NMR-derived ensembles the population of such hydrogen bonds is highest at the N-terminus of the tract (Fig. 4b) we analyzed the effect of mutating, one at a time, each of the first five Gln residues (peptides Q1E to Q5E) and found that the helicity of Q1E to Q4E was lower than that of L4Q16: we observed a shift of the minimum at ca 205-208 nm to lower wavelengths and a relative decrease in the ellipticity at 222 nm that, together, accounted for a decrease in helicity from 40 to 30%. By contrast we found that the helicity of Q5E was very similar to that of L4Q16 (Fig. 5c and S11), suggesting that the propensity of the first four Gln residues to donate a hydrogen is higher than that of the fifth one. This is in agreement with the 15N Gln side chain chemical shifts, where we observed especially low values for the first four residues, which could be caused by particularly strong hydrogen bonding interactions (Fig. 2a). We also analyzed a mutant where the first four hydrogen bonded Gln residues were simultaneously mutated to Glu (Q1-4E) and found that in this case the loss of helicity was larger, from 40 % to 20 %, similar to the value found in uQ25 (Fig. 5b,c and S11).
Since the pKa of Glu side chains is ca 4, decreasing the pH of solutions of peptide Q1-4E to 2 should lead to their protonation and re-establish their ability to form sci→mci-4 hydrogen bonds. To investigate this hypothesis, we analyzed the secondary structure of peptides L4Q16 and Q1-4E at pH 2 by CD. For peptide L4Q16 we observed, as expected, no change in secondary structure, whereas peptide Q1-4E was strongly helical at low pH, more so than L4Q16 (Fig. S14). This suggested that, when protonated, Glu side chains, due to their acidic character, have an even higher propensity than Gln residues to donate a hydrogen bond to the main chain CO of the residue at position i-4. These results validate our approach to investigate side chain to main chain hydrogen bonds by Gln to Glu mutations and in addition contribute to explaining the high helical propensity observed in host-guest experiments for protonated Glu residues, which are more helical than any other residue type except Ala34.
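The expected protonation state of the Glu side chains at the two pH values can be estimated with the Henderson-Hasselbalch relation. The sketch below assumes a pKa of exactly 4 for every Glu side chain and independent titration of the four sites, which are simplifications; the function name is hypothetical.

```python
def fraction_protonated(pH, pKa):
    """Fraction of an acid in its protonated form (Henderson-Hasselbalch)."""
    return 1.0 / (1.0 + 10.0 ** (pH - pKa))

GLU_PKA = 4.0  # approximate value quoted in the text (ca 4)

at_pH_2 = fraction_protonated(2.0, GLU_PKA)    # about 0.99
at_pH_7_4 = fraction_protonated(7.4, GLU_PKA)  # about 0.0004

# Probability that all four Glu side chains of Q1-4E are protonated at
# pH 2, assuming the four sites titrate independently with the same pKa.
all_four_at_pH_2 = at_pH_2 ** 4
```

Under these assumptions the Glu side chains are almost fully protonated at pH 2 and almost fully deprotonated at pH 7.4.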
It is remarkable that the first side chains of the polyQ tract have a particularly high propensity to form sci→mci-4 hydrogen bonds. The other Gln residues do so but with lower propensity, as suggested for example by their side chain chemical shifts. One difference between these two sets of Gln residues is that the former are at position i+4 relative to Leu residues whereas the latter are instead at position i+4 relative to Gln residues (Fig. 5b). Since the strength of hydrogen bonds depends on their degree of shielding from water35 we hypothesized that the sci→mci-4 hydrogen bonds between Gln and Leu residues are stronger, at least in part, due to shielding of water by Leu side chains. Indeed, as α-helices have 3.6 residues per turn the sci→mci-4 hydrogen bond between residues L1 and Q1 can be shielded by the side chain of residue i (L1) (Fig. 5b). To investigate this we measured the helicity of a peptide based on the sequence of L4Q16 but with all Leu residues mutated to Ala (L1-4A), an amino acid that has a smaller side chain and, presumably, a lower ability to shield this hydrogen bond. We found that, despite the higher intrinsic helical propensity of Ala compared to Leu34 and the higher predicted helicity of L1-4A compared to L4Q16 (Fig. S13), the helicity of L1-4A was only ca. 20%, as low as that of Q1-4E (Fig. 5c). This confirms that the shielding properties of the Leu side chains are indeed key for the strength of this interaction and for its ability to stabilize polyQ helices, and in addition indicates that accounting for the sci→mci-4 hydrogen bond revealed in this work will be important to reliably predict the helicity of polyQ peptides from their sequences (Fig. S13).
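The geometric argument can be made explicit with a short calculation. It assumes an ideal α-helix with 3.6 residues per turn and uses the textbook axial rise of 1.5 Å per residue, a value that is not stated in the text above.

```latex
% Angular and axial offset between residues i and i-4 in an ideal alpha-helix
\Delta\varphi_{\text{per residue}} = \frac{360^\circ}{3.6} = 100^\circ ,
\qquad
\Delta\varphi_{i,\,i-4} = 4 \times 100^\circ = 400^\circ \equiv 40^\circ \pmod{360^\circ},
\qquad
\Delta z_{i,\,i-4} = 4 \times 1.5\ \text{\AA} = 6.0\ \text{\AA}.
```

Under these assumptions residues i and i-4 lie on nearly the same face of the helix, one turn apart, so the side chain of the residue at position i-4 is close to the hydrogen bond accepted by its main chain CO.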
To confirm that the shielding provided by Leu is relevant for the ability of Gln to donate a hydrogen bond to the residue at relative position i-4, we characterized the synthetic peptide L1-4A by NMR. We compared the side chain 1H,15N resonances of peptide L1-4A with those of L4Q16 by carrying out 1H,15N-HSQC experiments at natural 15N abundance and observed that there was a complete loss of dispersion in the 15N chemical shift dimension for L1-4A: except for the last three Gln residues, all other residues in the tract have the same 15N chemical shift (Fig. 5d). We then analyzed the side chain 1H resonances of the Gln side chains and observed that, in contrast to L4Q16, the signals of Q1 to Q4 in L1-4A display collapsed γ and split β resonances, indicating that these side chains do not have the same conformation as in L4Q16.
The hydrogen bonds between Gln side chain NH2 groups and main chain COs are bifurcate
Our results suggest that the side and main chain of Gln can simultaneously donate a hydrogen to the CO of the residue at relative position i-4 (Fig. 4d). This can generate a type of bifurcate hydrogen bonding, shown to occur experimentally36 and in QM calculations37, that takes advantage of the directionality of the lone pairs of the acceptor group. This type of interaction is not accurately represented in the atom-centric representation of electrostatic interactions used in molecular simulation force fields, which may explain the difficulty we had in reproducing the experimental helicity in the classical MD simulations (Fig. 3a). To more accurately model the sci→mci-4 hydrogen bond we performed MD simulations by making use of the hybrid QM/MM methodology, which can account for a series of effects ignored in classical force fields such as lone pair directionality and electronic polarization. Specifically, given our results (Figs. 5b,c), the side chain carboxamide of the Gln residue at position i and the main chain CO group of Leu at position i-4 in peptide L4Q16 were included in the QM subsystem that was described at the DFT level of theory (see Fig. 6a). We performed a simulation of 150 ps at 300 K for the L4Q16 peptide started from a specific frame of the classical MD trajectory where the bifurcate bond is formed (Fig. 6) and focused our analysis on the interaction between Q1 and L1 (Fig. 5b).
Our analysis showed that the main chain to main chain hydrogen bond between Q1 and L1 (mcQ1→mcL1) is stable, that the scQ1→mcL1 bond can form reversibly and that its breakage is caused by deviations of χ3 from the value required for the donor and acceptor to interact (+90 ± 30°, Fig. 4c,d, 6b,c). To analyze how the scQ1→mcL1 bond affects the mcQ1→mcL1 interaction we compared the distribution of donor to acceptor distances in the latter in the presence and absence of the former. We found that the scQ1→mcL1 bond caused the distribution to shift to longer distances, by 0.17 Å, thus weakening the mcQ1→mcL1 hydrogen bond, indicating that the main and side chains of Q1 compete for the main chain CO group of L1 (Fig. S15). We then evaluated the strength of these interactions in terms of electron density at the interaction’s natural bond critical point, ρ(r)38,39. We found that in the absence of the scQ1→mcL1 bond the mcQ1→mcL1 bond has an average density of 0.014 au and, in its presence, of 0.008 au. By contrast, even in the presence of the mcQ1→mcL1 bond, the value for the scQ1→mcL1 interaction is, on average, 0.017 au, in agreement with the notion that the Gln side chain can be a better donor than the main chain31. Importantly, the total density of the bifurcate hydrogen bond is on average 0.025 au (Fig. 6c), indicating that the interaction between Q1 and L1 is strong. These results show that the unconventional sci→mci-4 hydrogen bonding interactions revealed in this work are bifurcate with the conventional mcQ1→mcL1 interactions and strong, thus enhancing the stability of polyQ helices.
Discussion
By combining experiments and simulations we have found that unconventional sci→mci-4 hydrogen bonds donated by Gln side chains can stabilize the α-helices formed by polyQ tracts. We also found that their strength depends on the residue type of the acceptor: Leu residues are good acceptors while Ala residues are not. These results help rationalize the structural properties of polyQ tracts reported in the recent literature25,40,41. In the AR we found that the four Leu residues flanking the polyQ tract at its N-terminus are key for helicity25, which we attribute to their high propensity to accept sci→mci-4 hydrogen bonds. The tract of huntingtin, associated with Huntington’s disease, also displays some helicity at low pH40,41, although lower than that observed in the AR. Even though the ability of each particular natural residue type to act as a sci→mci-4 hydrogen bond acceptor remains to be determined, the fact that only the first position in the four residue stretch preceding the polyQ tract in huntingtin is a Leu could explain its lower secondary structure content.
Both in the AR and in huntingtin the helical character of the polyQ tract is not homogeneously distributed and is instead found to gradually decrease from the N to the C-terminus of the tract25,40,41. Our results indicate that this can be explained by a low propensity of Gln residues, relative to that of residues flanking the tracts at their N-terminus, to accept sci→mci-4 hydrogen bonds: unless interrupted by residues, such as Leu, with a high propensity to accept such bonds, helicity will decay towards the C-terminus of the tract. In addition our results provide a mechanistic interpretation of the results obtained by Kandel, Hendrikson and co-workers in their investigation of the effect of increasing the coiled coil character of polyQ tracts by interrupting them with Leu residues2. These authors found that the peptides were fully helical and remained so after dissociation of the coiled coil upon heating to temperatures as high as 348 K due, we propose, to the presence of sci→mci-4 hydrogen bonds with Leu acting as acceptor.
We attribute the high propensity of Leu residues to accept sci→mci-4 hydrogen bonds to the close proximity between the hydrogen bond and the Leu side chain. This can prevent water molecules from hydrogen bonding the interacting moieties and strengthen the sci→mci-4 interaction due to the energetic costs associated with unpaired hydrogen bonding partners35. Dry environments where this can occur include the core of globular proteins42, the interior of cell membranes43 as well as amyloid fibrils, where equivalent interactions, parallel to the fibril axis, contribute to the stability of the quaternary structure44. In addition it has been shown that both exon 1 of huntingtin45 and the transactivation domain of AR46 can form condensates that define environments of low dielectric constant, where electrostatic interactions may be strongly favored47. It will be interesting to investigate whether interactions such as those described here play a role in the phase separation process of these and similar proteins.
PolyQ tracts are frequently found in transcriptional regulators, particularly in transcription factors1. In several cases their transcriptional activity has been found to depend on the length of the polyQ tracts that they harbor but the physical basis of this phenomenon has not yet been firmly established1,48. Our results provide a possible rationale as they suggest that variations in the length of polyQ tracts would result in changes in the secondary structure of the transactivation domain of transcription factors. Indeed, these can affect the strength of the protein-protein interactions that regulate transcription49, that include interactions with transcriptional co-regulators and with general transcription factors. Whether a certain change in tract length causes a decrease or an increase in activity might depend on whether the polyQ tract and its flanking regions are involved in interactions with transcriptional co-activators or co-repressors and should therefore be context-dependent, as found experimentally48.
A number of highly detailed in vitro experiments have established that the formation of fibrillar aggregates by proteins bearing polyQ tracts can proceed via oligomers50, potentially liquid-like51, stabilized by intermolecular interactions between flanking regions of polyQ tracts that are equivalent to those stabilizing coiled coils2,52. Since extending the length of the tract increases the helicity of both the tract and its N-terminal flanking region it is conceivable that this will change the secondary structure and, therefore, the strength of the interactions that stabilize these oligomers as well as, potentially, the rate at which they convert into fibrils. Our data, therefore, suggest that tract elongation can alter the structure and the stability of the oligomers populated on the fibrillization pathway and, as a consequence, modify the rate at which toxic fibrillar species build up14.
In summary we have shown that side chain to main chain hydrogen bonds donated by Gln side chains can cause polyQ tracts to form helices and that the stability of these helices directly correlates with the tract length. This unconventional interaction, due to the high propensity of the carboxamide group of the Gln side chain to donate hydrogens, is so energetically favoured that it can offset the entropic cost of constraining the range of conformations available to the side chain. In addition we have shown that the strength of these interactions depends on the degree to which the Gln side chains are exposed to water, implying that the secondary structure of polyQ tracts may vary depending on solution conditions, oligomerization state and interactions with other molecules. Our findings provide a mechanistic basis for the link that exists between polyQ tract length and transcriptional activity in transcription factors such as the AR and, more generally, between tract length and aggregation via helical oligomeric intermediates in polyQ diseases.
Author contributions
A.E., B.T., J.G., J.A., G.C., D.M. and X.S. performed experiments and simulations, analyzed and interpreted the results. M.B.A.K., G.B., B.E., M.G., R.P., I.F., T.D., O.M., M.O. and R.C. contributed to performing, analyzing and interpreting the results. M.B.A.K., K.L.L., J.A. and M.O. contributed tools. A.E., B.T., J.G., R.C., K.L.L. and X.S. established the hypothesis, designed the experiments and led their analysis and interpretation. X.S. conceived and led the project and wrote the first draft of the manuscript. All authors contributed to the final version.
Online methods
CD experiments
All synthetic peptides were obtained as lyophilized powders with > 95% purity from Genscript (Piscataway, NJ) with free N and C-termini. They were dissolved in 6 M guanidine thiocyanate (Merck KGaA, Darmstadt, Germany) and incubated under these conditions overnight at 298 K to ensure that the resulting solutions were monomeric. The denaturant was removed by size exclusion chromatography (SEC) in an Äkta Purifier system (GE Healthcare, Chicago, IL) equipped with a Superdex Peptide 10/300 GL column equilibrated in milliQ water with 0.1% trifluoroacetic acid (TFA). The fractions corresponding to the monomeric peptides were collected, pooled and centrifuged at 104000 rpm for 3 h in an Optima TLX tabletop ultracentrifuge equipped with a TLA 120.1 rotor (Beckman Coulter, Atlanta, GA). Sodium phosphate buffer was added to a final concentration of 20 mM and the samples were adjusted to pH 7.4 prior to quantification and analysis by CD. Quantification was performed by reversed-phase chromatography (RPC), in an Agilent 1200 HPLC system (Agilent Technologies, Santa Clara, CA) equipped with a Phenomenex Jupiter 5 μm C18 300 Å column (Torrance, CA) or, for the peptides with a Tyr residue, by measuring the absorbance at 280 nm; the value of the Tyr molar extinction coefficient was 1490 cm−1 M−1. The CD spectra were acquired on 30 μM samples in a Jasco 815 UV spectropolarimeter at 277 K with a 1 mm optical path cuvette and their deconvolution to determine secondary structure propensities was performed with the analysis programme CONTIN together with reference set 7 hosted at DichroWeb1 (dichroweb.cryst.bbk.ac.uk). To estimate the uncertainty in the helicity values obtained in this deconvolution, which relies on an accurate quantification of the peptide concentration, in Figure 5d we plot, in addition to the value obtained without scaling the experimental spectrum, those obtained after scaling it by factors 0.9 and 1.
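Because deconvolution operates on concentration-normalized spectra, an error in the peptide concentration rescales the whole spectrum. The sketch below shows the standard conversion of a raw CD signal to mean residue ellipticity for the sample conditions given above (30 μM, 1 mm path length). The raw signal of −10 mdeg and the normalization by 30 residues are hypothetical values chosen for illustration, and this is not the CONTIN deconvolution itself.

```python
def mean_residue_ellipticity(theta_mdeg, conc_molar, path_cm, n_residues):
    """Convert a raw CD signal (mdeg) to mean residue ellipticity
    in deg cm^2 dmol^-1:  MRE = theta / (10 * c * l * n)."""
    return theta_mdeg / (10.0 * conc_molar * path_cm * n_residues)

CONC = 30e-6   # 30 uM sample, as stated in the text
PATH = 0.1     # 1 mm cuvette expressed in cm
N_RES = 30     # hypothetical number of residues used for normalization
RAW = -10.0    # hypothetical raw signal at 222 nm, in mdeg

mre = mean_residue_ellipticity(RAW, CONC, PATH, N_RES)

# Effect of a concentration error: scaling the spectrum by a factor s is
# equivalent to dividing the assumed concentration by s.
scaled = {s: s * mre for s in (0.9, 1.0)}
```

A 10% error in the measured concentration therefore changes every point of the normalized spectrum by 10%, which is why the estimated helicity depends on accurate quantification.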
NMR experiments
Synthetic genes coding for peptides L4Q4 to L4Q20 (Fig. 1A) fused to His6-SUMO and codon-optimized for expression in E. coli were obtained, cloned in a pDEST-17 expression vector, from GeneArt (Thermo Fisher Scientific, Waltham, MA). The corresponding constructs were expressed in Rosetta E. coli cells in M9 medium containing 15NH4Cl and 13C-glucose as sole nitrogen and carbon sources, obtained from Cambridge Isotope Laboratories, Inc. (Tewksbury, MA). After cell lysis, the soluble fractions were purified by IMAC in an Äkta Purifier system (GE Healthcare, Chicago, IL) equipped with a HisTrap HP 5 mL column. The eluted fractions containing the His6-SUMO-tagged peptides were pooled and dialyzed to remove imidazole before digesting them with SUMO protease (0.05 mg/mL). Cleaved peptides were further purified by a second IMAC step and dialyzed against pure milliQ water before lyophilization. The lyophilized recombinant 15N-13C-enriched peptides were treated in the same way as the synthetic ones to prepare 100 μM samples for the NMR experiments, which were in all cases carried out in a 600 MHz Bruker Avance spectrometer equipped with a cryoprobe. The samples contained 10 μM DSS for chemical shift referencing. The backbone resonances of peptides L4Q4 to L4Q20 were assigned by using 3D triple resonance experiments (HNCO, HN(CA)CO, HN(CO)CA, HN(CO)CACB) acquired with NUS at 278 K. The side chain resonances were assigned with 3D H(CC)(CO)NH and (H)CC(CO)NH experiments. NMR experimental data were processed using qMDD2 for non-uniformly sampled data and NMRPipe3 for all uniformly collected experiments. Synthetic peptide L1-4A was prepared as detailed above to a final concentration of 250 μM and characterized by two-dimensional homonuclear (TOCSY and NOESY) and heteronuclear (1H-15N HSQC, at natural 15N abundance) experiments. The TOCSY and NOESY mixing times were 70 and 200 ms, respectively.
Molecular dynamics, analysis and trajectory reweighting by maximum entropy
Input coordinates were generated using MacPyMOL in fully helical conformations. All simulations were performed with the MD simulation software ACEMD4 by using the CHARMM22* force field5, which was designed to have an accurate helix-coil balance. Each system was explicitly solvated with the TIP3P water model inside cubic boxes extending from 25 Å to 40 Å around the peptides, depending on their length, and neutralized with Cl− and Na+ ions. Initial conformations were minimized and equilibrated under NPT conditions at 1 atm and 300 K for 1 ns. Production simulations were performed at 300 K in the NVT ensemble using a 4 fs time-step for 5 μs. The analysis of the secondary structure of individual frames was carried out with DSSP6 and the chemical shifts were back-calculated with the predictor PPM7. The reweighting of the trajectories to match the experimental chemical shifts was carried out by using a Bayesian/Maximum Entropy (BME) method8 (code available at: github.com/sbottaro/BME). The BME approach contains a single free parameter (θ) that determines the balance between fitting the experimental data and not deviating too much from the prior information encoded in the force field. We chose θ=4 for the analysis shown in the main text because this value provides a good balance between the two terms (Fig. S9), and show results for other values of θ in Fig. S8.
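To illustrate how θ trades off agreement with experiment against fidelity to the prior, the following is a simplified sketch of maximum-entropy reweighting in its commonly used dual form: weights w_j ∝ w0_j·exp(−Σ_i λ_i F_i(x_j)), with the Lagrange multipliers λ found by minimizing Γ(λ) = ln Z(λ) + Σ_i λ_i F_i^exp + (θ/2) Σ_i λ_i² σ_i². It is not the code at github.com/sbottaro/BME; the function names, the plain gradient-descent optimizer, the step size and the toy data are all assumptions made for illustration.

```python
import math


def _weights(calc, lam, prior):
    """Weights w_j proportional to prior_j * exp(-sum_i lam_i * calc[j][i])."""
    logw = [math.log(p) - sum(l * f for l, f in zip(lam, row))
            for p, row in zip(prior, calc)]
    shift = max(logw)  # subtract the maximum for numerical stability
    unnorm = [math.exp(x - shift) for x in logw]
    z = sum(unnorm)
    return [u / z for u in unnorm]


def reweight(calc, exp, sigma, theta, prior=None, lr=0.5, n_iter=5000):
    """Maximum-entropy reweighting by gradient descent on the dual function.

    calc[j][i]: observable i back-calculated for frame j.
    exp[i], sigma[i]: experimental value and uncertainty of observable i.
    theta: larger values keep the weights closer to the prior.
    lr, n_iter: assumed optimizer settings; lr must be small enough for
    the spread of the data, otherwise the iteration can diverge.
    """
    n_frames, n_obs = len(calc), len(exp)
    prior = prior or [1.0 / n_frames] * n_frames
    lam = [0.0] * n_obs
    for _ in range(n_iter):
        w = _weights(calc, lam, prior)
        avg = [sum(w[j] * calc[j][i] for j in range(n_frames))
               for i in range(n_obs)]
        # Gradient of the dual function with respect to each multiplier.
        grad = [exp[i] - avg[i] + theta * sigma[i] ** 2 * lam[i]
                for i in range(n_obs)]
        lam = [l - lr * g for l, g in zip(lam, grad)]
    return _weights(calc, lam, prior)


# Toy example: two frames, one observable (e.g. a helicity-like quantity).
calc = [[0.0], [1.0]]
w_theta4 = reweight(calc, exp=[0.8], sigma=[0.1], theta=4.0)
w_theta100 = reweight(calc, exp=[0.8], sigma=[0.1], theta=100.0)
print(w_theta4, w_theta100)
```

In this toy case the prior average is 0.5 and the target is 0.8: the smaller θ moves the weight of the second frame further toward the experimental value, while the larger θ leaves it closer to the uniform prior.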
Hydrogen Bond Criteria
To classify whether two atoms are hydrogen bonded we used angle and distance criteria. Specifically, we define hydrogen bonds as those where the distance between the donor and the acceptor is shorter than 3.4 Å (2.4 Å between H and heavy atom) and the donor-hydrogen-acceptor angle is greater than 120°.
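The geometric criteria above can be sketched as a small function operating on Cartesian coordinates in Å. This is an illustrative sketch rather than the analysis code used in this work: the function names are hypothetical, and it assumes that the donor-acceptor and hydrogen-acceptor distance cutoffs are both applied together with the angle cutoff.

```python
import math


def _distance(a, b):
    """Euclidean distance between two 3D points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dha_angle(donor, hydrogen, acceptor):
    """Donor-hydrogen-acceptor angle in degrees, measured at the hydrogen."""
    v1 = [d - h for d, h in zip(donor, hydrogen)]
    v2 = [a - h for a, h in zip(acceptor, hydrogen)]
    dot = sum(x * y for x, y in zip(v1, v2))
    cos_angle = dot / (_distance(donor, hydrogen) * _distance(acceptor, hydrogen))
    cos_angle = max(-1.0, min(1.0, cos_angle))  # guard against rounding errors
    return math.degrees(math.acos(cos_angle))


def is_hydrogen_bond(donor, hydrogen, acceptor,
                     max_da=3.4, max_ha=2.4, min_angle=120.0):
    """Apply the distance and angle cutoffs stated in the text.

    Assumption: both distance cutoffs must be satisfied simultaneously.
    """
    return (_distance(donor, acceptor) < max_da
            and _distance(hydrogen, acceptor) < max_ha
            and dha_angle(donor, hydrogen, acceptor) > min_angle)


# Hypothetical coordinates: a linear N-H...O arrangement.
print(is_hydrogen_bond((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.9, 0.0, 0.0)))
```

Applying such a test to every donor and acceptor pair in each frame gives per-frame hydrogen bond lists, from which populations can be computed with the trajectory weights.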
Model Structures
After reweighting, we calculated the residue-specific helicity for all of the peptides by using the algorithm DSSP6. For model structure selection, residues that are in the helical conformation for more than 50% of the simulation are defined as helical and the rest as random coil. The structures from the simulation that fit this definition are selected and colored by their average helicity from Figure 3c. The color scale goes from dark blue (0% helicity) to dark red (78% helicity).
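The selection procedure above can be sketched as follows, starting from one DSSP string per frame and one weight per frame. This is a minimal sketch under stated assumptions: the function names are hypothetical, only the DSSP code "H" is counted as helical, and the toy strings and uniform weights stand in for real trajectory data.

```python
def residue_helicity(dssp_frames, weights=None, helical_codes=("H",)):
    """Weighted fraction of frames in which each residue is helical.

    dssp_frames: one DSSP string per frame, all of the same length.
    weights: per-frame weights (e.g. from reweighting); uniform if omitted.
    """
    n_frames = len(dssp_frames)
    if weights is None:
        weights = [1.0] * n_frames
    total = sum(weights)
    helicity = [0.0] * len(dssp_frames[0])
    for frame, w in zip(dssp_frames, weights):
        for i, code in enumerate(frame):
            if code in helical_codes:
                helicity[i] += w / total
    return helicity


def consensus_pattern(helicity, threshold=0.5):
    """Residues helical for more than the threshold fraction are helical."""
    return [h > threshold for h in helicity]


def matching_frames(dssp_frames, pattern, helical_codes=("H",)):
    """Indices of frames whose helical/coil pattern equals the consensus."""
    return [k for k, frame in enumerate(dssp_frames)
            if [code in helical_codes for code in frame] == pattern]


# Toy data: four frames of a four-residue peptide ("C" marks non-helical).
frames = ["HHHC", "HHCC", "CHHC", "HHCC"]
helicity = residue_helicity(frames)
pattern = consensus_pattern(helicity)
print(helicity, pattern, matching_frames(frames, pattern))
```

The frames returned by `matching_frames` are candidates for the model structure, and the per-residue helicity values are the quantities that would be mapped onto the color scale.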
QM/MM calculations
The starting structure was selected from the classical MD simulations of L4Q16, preserving the previously defined box of water and ions. The AMBER 16 program9 interfaced to the Terachem 1.9 program (www.petachem.com, accessed June 1, 2017) was used for the QM/MM simulation. QM atoms were described at the BLYP/6-31G* level including a dispersion correction10. The classical subsystem was described with the CHARMM22*5 force field by making use of the Chamber keyword of the Parmed program included in AMBERTOOLS 169. The link atom procedure as implemented in the AMBER program was used to saturate the valence of the frontier atoms. Periodic boundary conditions were employed with an electrostatic cutoff of 12 Å. A time step of 1 fs was employed. The structure was first minimized and then equilibrated for 10 ps in a QM/MM-MD run. Then, a production run was performed with a total simulation time of 150 ps. The Natural Bond Critical Point analysis11,12 was performed with the NBO 6.0 program13.
Acknowledgements
The authors wish to thank Sandro Bottaro, Ernest Giralt, Gerhard Hummer, Víctor Muñoz and Huan-Xiang Zhou for helpful discussions and the ICTS NMR facility, managed by the scientific and technological centers of the University of Barcelona (CCiT UB), for their help with NMR. K.L.-L. and M.B.A.K. acknowledge funding from the Lundbeck Foundation and the BRAINSTRUC initiative. B.T. and J.A. acknowledge, respectively, FPI and Juan de la Cierva fellowships from MINECO. R.C. acknowledges funding from MINECO (CTQ2016-78636-P). X.S. acknowledges funding from AGAUR (2017 SGR 324), Marató TV3 (102030), MINECO (BIO2012-31043 and BIO2015-70092-R) and the European Research Council (CONCERT, contract number 648201). IRB Barcelona is the recipient of a Severo Ochoa Award of Excellence from MINECO (Government of Spain).
References
- 1.
- 2.
- 3.
- 4.
- 5.
- 6.
- 7.
- 8.
- 9.
- 10.
- 11.
- 12.
- 13.