Abstract
PolyProline-II (PPII) helices are defined as a continuous stretch of a protein chain in which the constituent residues have backbone torsion angle (φ,ψ) values close to (-75°, 145°) and adopt an extended left-handed conformation lacking any intra-helical hydrogen bonds. They occur very frequently in protein structures: their number exceeds that of π-helices, though it is considerably lower than that of α-helices and β-strands. A relatively new procedure, ASSP, for the identification of regular secondary structures using the Cα trace identifies 3597 PPII-helices in 3582 protein chains solved at resolution ≤ 2.5 Å. Taking advantage of this significantly expanded database of PPII-helices, we have analyzed the functional and structural roles of PPII-helices and determined the amino acid propensity within and around them. Though Pro residues are highly preferred, their presence is not mandatory for the formation of PPII-helices, since ~40% of PPII-helices were found to contain no Proline residues. Aromatic amino acids are avoided within this helix, while Gly, Asn and Asp residues are preferred in the proximal flanking regions. These helices range from 3 to 13 residues in length, with the average twist and rise being -121.2°±9.2° and 3.0 Å±0.1 Å respectively. A majority (~72%) of PPII-helices were found to occur in conjunction with α-helices and β-strands, and also serve as linkers. The analysis of various intra-helical non-bonded interactions revealed the frequent presence of C-H…O H-bonds. PPII-helices participate in maintaining the three-dimensional structure of proteins and are important constituents of binding motifs involved in various biological functions.
Abbreviations
- MM: main chain-main chain
- SM: side chain-main chain
- SS: side chain-side chain
- SSE: secondary structural elements
- H-bond: hydrogen bond
- ASSP: assignment of secondary structures in proteins
- STRIDE: structural identification
- std. dev.: standard deviation
- MBA: maximum bending angle
- PPII-helix: PolyProline-II helix
Introduction
Regular secondary structure elements (SSEs) play a vital role in the analysis and understanding of the structure and function of proteins. Among these SSEs, the most abundant, α-helices and β-strands, were first predicted from theoretical studies (Pauling and Corey, 1951; Pauling et al., 1951) and subsequently confirmed by X-ray diffraction analysis (Blake et al., 1965; Perutz, 1951). It is well known that a majority of the helices occurring in globular proteins are right-handed α-helices, with 3₁₀-helices being a distant second. Though other types of helices such as π-helices and 2.2₇-helices are also found in protein structures, they are comparatively rare (Kumar and Bansal, 2015a). The left-handed counterparts of α- and 3₁₀-helices are rarely found (Hung et al., 1998; Novotny and Kleywegt, 2005; Zawadzke and Berg, 1993). However, the PPII-helix is an important class of left-handed helix, with about three residues per turn (n=−3) and a rise per residue of ~3.2 Å. PPII-helices are extended structures in which the backbone amino and carbonyl groups of the constituent residues point away from the helical axis, so that no intra-helical mainchain-mainchain (MM) N-H…O H-bonds are formed. The residues adopt average backbone dihedral angles (φ,ψ) of (-75°,+145°), which overlap well with the allowed conformational space for Pro residues. Hence, Proline is an obvious choice as a constituent residue, and Pro-rich regions are often observed to adopt the PPII-helix conformation in protein structures. However, it has been shown that PPII-helices are also found in regions of proteins or polypeptides that contain few or no Proline residues (Adzhubei et al., 2013).
Soman and Ramakrishnan (1983) reported the presence of a PPII-helix in bacteriochlorophyll a-protein, and PPII-helices have since been found to occur very frequently in peptide (Makarov et al., 1992) and protein structures (Adzhubei et al., 1987; Stapley and Creamer, 1999). Analyses of the backbone conformations of individual residues in protein structures deposited in the Protein Data Bank (PDB) (Berman et al., 2000) have shown that the PPII and β-strand conformations have comparable occurrences (Adzhubei et al., 1987). Spectroscopic methods have also shown that the PPII-helix conformation is a major backbone conformation in unstructured or unordered proteins (Krimm and Tiffany, 1974; Shi et al., 2002). Moreover, it has been shown that the switch from a PPII-helix to either a right-handed α-helix or a β-strand can be achieved at the residue level by changing only one dihedral angle (Adzhubei and Sternberg, 1993).
Structural (Beck and Brodsky, 1998; Kieliszewski and Lamport, 1994; Wu et al., 2001) and functional (Chellgren and Creamer, 2004; Kay et al., 2000; Siligardi and Drake, 1995; Stapley and Creamer, 1999) roles of PPII-helices have been discussed in various reports (Williamson, 1994). For example, PPII-helices have been shown to play an important role in protein-protein (Kay et al., 2000; Siligardi and Drake, 1995) as well as protein-nucleic acid (Hicks and Hsu, 2004) interactions. The growing number of protein structures with functional and structural roles mediated by PPII-helices has prompted the development of different algorithms such as PROSS (Srinivasan and Rose, 1999), XTLSSTR (King and Johnson, 1999), SEGNO (Cubellis et al., 2005), DSSP-PPII (Mansiaux et al., 2011) and ASSP (Kumar and Bansal, 2015b). A dedicated database, PolyprOline, for PPII-helix assignment and analysis has also been developed (Chebrek et al., 2014). Theoretical surveys of PPII-helices in proteins have highlighted their importance (Adzhubei et al., 2013; Brown and Zondlo, 2012; Stapley and Creamer, 1999; Vila et al., 2004). The propensities of each amino acid to occur in a PPII-helix have also been analyzed and compared with those in other SSEs such as α-helices, 3₁₀-helices and β-strands (Berisio et al., 2006). Host-guest propensity values at the residue level have been analyzed (Kelly et al., 2001), but such studies have a high probability of capturing a single residue in the PPII conformation rather than a PPII-helix. However, amino acid propensities in a coil library built using very restricted values of the dihedral angles (φ,ψ) for each SSE showed a good correlation with experimentally derived values (Jha et al., 2005).
Though the importance of PPII-helices in different proteins has been highlighted, a detailed and systematic sequence and structural analysis is essential for understanding the principles governing their functions and structural importance. In the present study, we have addressed the following questions: i) What are the structural and functional roles played by PPII-helices? ii) What are the factors stabilizing these PPII-helices? iii) What is the position-wise residue propensity within and around PPII-helices? iv) How dissimilar are the position-wise residue propensities within and around PPII-helices and isolated strands? v) Do flanking major secondary structures like α-helices and β-strands affect the position-wise amino acid propensity within and around PPII-helices? vi) How different are the PPII-helices that contain Proline from those that lack Proline residues? We believe that analyses of these features will provide a better understanding of PPII-helices and their biological roles.
Methods
Composition of Dataset
A dataset of 7957 high-resolution, quality-filtered protein chains, termed top8000, was taken from the Richardson lab (http://kinemage.biochem.duke.edu/databases/top8000.php). Protein chains with length <70 residues and sequence identity >25% (Wang and Dunbrack, 2003) were removed, reducing the number to 3582 chains with 837027 residues.
Calculation of various local geometric parameters
The computation of various local parameters viz. twist, vtor, rise and radius for each step of four Cα atoms has been described in earlier publications (Kumar and Bansal, 2012; Kumar and Bansal, 2015b). A window of four Cα atoms slides along the length of the protein, one Cα atom at a time. The bending angle at a residue ‘i’ is defined as an angle between the two local helix axes of helical turns encompassing residues from (i-3 to i) and (i to i+3). These bending angles help in identifying the geometry of the overall helix.
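The window-based calculation described above can be illustrated with a minimal sketch. It assumes the widely used bisector construction for a local helix axis from four consecutive Cα atoms; the function names, the sign convention for twist (negative for a left-handed turn) and the synthetic test helix are assumptions made for illustration and are not taken from the ASSP source code.

```python
import math

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def norm(a):
    return math.sqrt(dot(a, a))

def window_parameters(p0, p1, p2, p3):
    """Twist (deg), rise (Å), radius (Å) and unit axis for four Cα atoms."""
    s0, s1, s2 = sub(p1, p0), sub(p2, p1), sub(p3, p2)  # virtual Cα-Cα bonds
    b1, b2 = sub(s0, s1), sub(s1, s2)                   # bisectors at p1 and p2
    n1, n2 = norm(b1), norm(b2)
    cos_t = max(-1.0, min(1.0, dot(b1, b2) / (n1 * n2)))
    ax = cross(b1, b2)                                  # local helix axis direction
    ax_len = norm(ax)                                   # zero for collinear atoms (not handled)
    axis = tuple(x / ax_len for x in ax)
    advance = dot(s1, axis)                             # signed advance along the axis
    # Assumed convention: a negative advance marks a left-handed turn.
    twist = math.copysign(math.degrees(math.acos(cos_t)), advance)
    radius = math.sqrt(n1 * n2) / (2.0 * (1.0 - cos_t))
    return twist, abs(advance), radius, axis

def bending_angle(ca, i):
    """Angle (deg) between the local axes of turns (i-3..i) and (i..i+3)."""
    a = window_parameters(*ca[i - 3:i + 1])[3]
    b = window_parameters(*ca[i:i + 4])[3]
    return math.degrees(math.acos(max(-1.0, min(1.0, dot(a, b)))))

def ideal_trace(n, twist_deg, rise, radius):
    """Synthetic ideal helix, used only to exercise the functions above."""
    t = math.radians(twist_deg)
    return [(radius * math.cos(k * t), radius * math.sin(k * t), rise * k)
            for k in range(n)]

# Hypothetical PPII-like trace: twist -120°, rise 3.0 Å, arbitrary radius 1.4 Å.
trace = ideal_trace(8, -120.0, 3.0, 1.4)
twist, rise, radius, _ = window_parameters(*trace[0:4])
bend = bending_angle(trace, 3)
print(round(twist, 1), round(rise, 2), round(radius, 2), round(bend, 3))
```

For an ideal, straight helix the bending angle is essentially zero, so departures from zero along a real Cα trace indicate kinks or curvature.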
Software used
Various helices and β-strands were identified using ASSP and STRIDE. The non-bonded interactions were calculated using MolBridge (Kumar et al., 2014) with default cut-off values. Interactions were further divided into three categories, namely i) main-chain…main-chain (MM), ii) side-chain…main-chain (SM) and iii) side-chain…side-chain (SS). Figures were generated using PyMOL (Schrodinger, 2010) and MATLAB (MATLAB, 2010). Solvent accessibility was calculated using NACCESS (Hubbard and Thornton, 1993).
PPII-Helix assignment and position nomenclature
Various helices were identified using ASSP and the corresponding numbers are listed in Table 1 along with the median and mean lengths. A total of 3597 PPII-helices were identified. The algorithm for the identification of PPII-helices is discussed in the supplementary file. Since the majority of PPII-helices were three residues long, only seven positions (N', Ncap, N1, Mid, C1, Ccap and C') within and around them were considered (Figure 1). Three positions (N1, Mid and C1) constitute a helix, while the remaining four flanking positions are non-PPII-helical. Unless otherwise stated, N2, N3, N4, C2, C3 and C4 should be considered as a part of ‘Mid’. Unlike other positions, ‘Mid’ may contain one or more residues per helix.
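As an illustration of this nomenclature, the following sketch maps residue indices to position labels for a helix given by its first and last residue. The function name and the integer indexing are hypothetical; only the labels themselves come from the text.

```python
def label_positions(start, end):
    """Map residue indices to position labels for a PPII-helix spanning
    residues start..end (inclusive); flanking residues are labelled too."""
    if end - start + 1 < 3:
        raise ValueError("a PPII-helix has at least three residues")
    labels = {
        start - 2: "N'",
        start - 1: "Ncap",
        start: "N1",
        end: "C1",
        end + 1: "Ccap",
        end + 2: "C'",
    }
    # Every residue between N1 and C1 is pooled under 'Mid'.
    for i in range(start + 1, end):
        labels[i] = "Mid"
    return labels

three_residue = label_positions(10, 12)
five_residue = label_positions(10, 14)
print([three_residue[i] for i in sorted(three_residue)])
```

A three-residue helix has exactly one 'Mid' residue, whereas a five-residue helix contributes three residues to 'Mid'.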
Refinement of the dataset consisting of PPII-helices
Comparative analyses have shown a small overlap between residues in PPII-helices and those in β-strands assigned by STRIDE or DSSP (Kumar and Bansal, 2015b; Mansiaux et al., 2011). To remove any ambiguity in the dataset, we compared the residues in PPII-helices with their corresponding STRIDE assignments. A PPII-helix of length ‘N’ was considered for further analysis only when more than ‘N/2’ of its residues were not identified as part of β-strand(s) by STRIDE. Representative examples are given in Supplementary Figure SF1. The final dataset consisted of 2879 PPII-helices. Position-wise residue propensities within and around 3516 PPII-helices in the remaining 4375 protein chains were also calculated and compared with those of the 2879 PPII-helices in the 3582 protein chains. Both were found to be similar.
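The retention rule can be written as a short predicate. In this sketch the input is a string of one-letter STRIDE states for the residues of a single PPII-helix; treating only the state 'E' as β-strand is an assumption, and the function name is hypothetical.

```python
def keep_ppii_helix(stride_states, strand_codes=("E",)):
    """Return True when more than N/2 of the N helix residues are NOT
    assigned to a β-strand by STRIDE."""
    n = len(stride_states)
    non_strand = sum(1 for s in stride_states if s not in strand_codes)
    return non_strand > n / 2

# Hypothetical STRIDE state strings for three- and four-residue helices.
print(keep_ppii_helix("CCE"))   # two of three residues are non-strand
print(keep_ppii_helix("CCEE"))  # exactly half: not "more than N/2"
```

Note that the strict inequality rejects a helix in which exactly half of the residues overlap with a STRIDE strand.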
Different categories of PPII-helices
Since the majority of PPII-helices were 3 or 4 residues long, they were divided into two sets, namely i) PPII3-4 (PPII-helices of length 3 and 4 residues) and ii) PPII>4 (PPII-helices of length > 4 residues). A majority of the PPII-helices were found in conjunction with major SSEs like α-helices or β-strands. Hence, the PPII-helices were divided into four categories based on their occurrence with respect to these major SSEs in a protein chain (Figure 2). If the residue at position C1 of a PPII-helix is the Ncap/N' of the succeeding α-helix or β-strand, the PPII-helix is classified as ‘Nter’, whereas PPII-helices with N1 being the C'/Ccap of the preceding α-helix or β-strand are termed ‘Cter’. A PPII-helix sandwiched between two α-helices/β-strands was categorized as ‘Inter’ (Interspersed), while the remaining helices were grouped under the ‘Ind’ (Independent) category. The total number of PPII-helices in each category is listed in Table 2. PPII-helices were also categorized based on the presence of Proline residues. PPII-helices with Pro residues were termed Pro+, while those not containing any Proline were named Pro-. A total of 1798 (62.5%) PPII-helices were part of the Pro+ dataset, while the other 1081 (37.5%) formed the Pro- dataset.
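The three groupings described above (by length, by SSE context and by Proline content) can be sketched as simple functions. The function names and the two boolean flags are hypothetical, and treating ‘Inter’ as the case where both flags are true is an assumption of this sketch.

```python
def length_class(sequence):
    """PPII3-4 for helices of 3 or 4 residues, PPII>4 otherwise."""
    return "PPII3-4" if len(sequence) <= 4 else "PPII>4"

def proline_class(sequence):
    """Pro+ if the helix contains at least one Pro (one-letter code P)."""
    return "Pro+" if "P" in sequence else "Pro-"

def context_class(c1_caps_next_sse, n1_caps_prev_sse):
    """c1_caps_next_sse: C1 is the Ncap/N' of a succeeding α-helix/β-strand.
    n1_caps_prev_sse: N1 is the C'/Ccap of a preceding α-helix/β-strand."""
    if c1_caps_next_sse and n1_caps_prev_sse:
        return "Inter"
    if c1_caps_next_sse:
        return "Nter"
    if n1_caps_prev_sse:
        return "Cter"
    return "Ind"

# Hypothetical helix sequence, used only for illustration.
helix = "APPEK"
print(length_class(helix), proline_class(helix), context_class(True, False))

# Shares of the two Proline-based sets reported in the text.
pro_plus, pro_minus = 1798, 1081
total = pro_plus + pro_minus
print(total, round(100 * pro_plus / total, 1), round(100 * pro_minus / total, 1))
```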
Identification of isolated strands
Isolated strands are not part of any β-sheet and, like PPII-helices, they lack any MM N-H…O H-bond pattern, as the backbone N-H and C=O groups of the constituent residues point away from the helix axis. ASSP-identified strands of length > 4 residues (βASSP) were compared with those determined by STRIDE. The βASSP that were neither partially nor entirely identified by STRIDE were retained and subsequently checked for the presence of MM H-bonds. A total of 393 βASSP were found to occur in isolation in protein structures.
Position-wise propensity of residues
The distribution of residues was computed for the 2879 PPII-helices at all seven positions. Position-wise propensities (Pij) for these positions (N' to C') were calculated using the formula of Kumar and Bansal (1998). At a particular position, the preference of residues was determined by ranking them in decreasing order of their propensity values. In order to identify significant changes in the proportion of residues occurring at a particular position, we followed the methodology described previously (Kumar and Bansal, 1998).
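The propensity formula itself is not reproduced in this section. The sketch below therefore assumes a commonly used definition, in which the propensity of a residue type at a position is its fraction at that position divided by its fraction in the whole dataset; this may differ in detail from the formula of Kumar and Bansal (1998). The function name and the toy counts are hypothetical.

```python
from collections import Counter

def position_propensity(residues_at_position, background_counts):
    """Assumed definition: P = (n_pos[aa] / n_pos) / (n_all[aa] / n_all)."""
    pos_counts = Counter(residues_at_position)
    n_pos = sum(pos_counts.values())
    n_all = sum(background_counts.values())
    return {
        aa: (pos_counts.get(aa, 0) / n_pos) / (count / n_all)
        for aa, count in background_counts.items()
        if count > 0
    }

# Toy data: residues seen at one position, and counts in the whole dataset.
observed = "PPPG"
background = {"P": 50, "G": 50, "A": 100}
propensity = position_propensity(observed, background)

# Rank residues in decreasing order of propensity, as described in the text.
ranking = sorted(propensity, key=propensity.get, reverse=True)
print(propensity, ranking)
```

Under this definition a value above 1 indicates that a residue is over-represented at the position relative to its overall abundance, and a value below 1 indicates that it is under-represented.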
Results and Discussion
PPII-helices commonly occur in globular proteins
ASSP identifies 2879 PPII-helices in 1678 protein chains out of a dataset of 3582 protein chains, with the lactoperoxidase protein from Bos taurus (PDB ID: 3NYH: A) containing the maximum number of PPII-helices, nine. Almost every second protein chain was found to have at least one PPII-helix. A total of 9801 (1.2%) residues were involved in the 2879 PPII-helices, which is more than the 3946 (0.5%) residues occurring in 659 π-helices in the same dataset. The length of PPII-helices identified by ASSP ranges from 3 to 13 amino acids, with three-residue-long PPII-helices constituting almost 72% (2080/2879) of the total. The length and (φ, ψ) distribution of the residues from N1 to C1 positions, along with the plots of twist vs. rise and twist vs. radius for the 2879 PPII-helices, are shown in Figure 3. The mean (μ) and std. dev. (σ) of various parameters are listed in Table 3. The mean twist and rise values were found to be -121.8° and 3.0 Å, with std. dev. of 8.6° and 0.1 Å respectively.
PPII-helices were also checked for their location with respect to the two major SSEs, namely α-helix and β-strand. A total of 531 and 330 PPII-helices were found at the N- and C-terminus of an α-helix respectively, while 556 and 650 PPII-helices occurred at the N- and C-terminus of a β-strand respectively. Approximately 26% (750) of PPII-helices were found to occur independently in protein chains. The presence of PPII-helices interspersed between two major SSEs (Figure 2) suggests that they can often serve as a linker between two SSEs.
Stabilization of helix termini of PPII-helices
Due to their short length as well as both backbone N-H and C=O groups pointing away from the helical axis, residues in PPII-helices are not involved in intra-helical MM N-H…O H-bonds. These free N-H and C=O groups are often found to form non-bonded interactions with the side chains of flanking non-helical residues, giving rise to characteristic motifs with specific H-bonds and/or (φ-ψ) patterns. Apart from this, the exposed carbonyl or amino groups often extend into the solvent and hence get stabilized by forming a regular network of H-bonds (Kelly et al., 2001; Sreerama and Woody, 1999). The presence of PPII-helix conformation in a seven residue long oligopeptide in a solvent suggests that apart from the electrostatic forces, interactions with the solvent also play a significant role in their stabilization (Rucker and Creamer, 2002). In this article we have discussed C-H…N, C-H…O and N-H…O interactions involving the constituent and flanking residues of the PPII-helices.
The importance of C-H…N non-bonded interactions in determining the crystal structures (Mazik et al., 2000) or molecular packing and conformation (Berkovitch-Yellin and Leiserowitz, 1984; Pickering et al., 2005) of small molecules has been reported, but their role in macromolecules has not been studied. We find that a total of 2244 Cβ atoms of the (i-1)th residue form C-H…N non-bonded interactions with the backbone amino group of the ith residue (Figure 4A and B), whereas 1335 Cγ atoms of the (i+2)th residue were found to be involved in C-H…O H-bonds with the ith residue. The average donor (D)-acceptor (A) distances for C-H…N and C-H…O interactions were found to be 3.2 Å and 3.7 Å respectively, whereas the D-H…A angles were found to be 95.6° and 161.3° for C-H…N and C-H…O interactions respectively.
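The two geometric quantities quoted above, the donor-acceptor distance and the D-H…A angle, can be computed from atomic coordinates as in the following sketch. The interactions in this study were identified with MolBridge; the function below is only an illustration of the geometric definitions, and the coordinates are hypothetical.

```python
import math

def hbond_geometry(donor, hydrogen, acceptor):
    """Return the D...A distance (Å) and the D-H...A angle (degrees),
    i.e. the angle at the hydrogen between donor and acceptor."""
    d_a = math.dist(donor, acceptor)
    v1 = [d - h for d, h in zip(donor, hydrogen)]      # H -> D
    v2 = [a - h for a, h in zip(acceptor, hydrogen)]   # H -> A
    cos_angle = sum(x * y for x, y in zip(v1, v2)) / (
        math.hypot(*v1) * math.hypot(*v2))
    cos_angle = max(-1.0, min(1.0, cos_angle))
    return d_a, math.degrees(math.acos(cos_angle))

# Hypothetical coordinates: a perfectly linear C-H...O arrangement.
linear = hbond_geometry((0.0, 0.0, 0.0), (1.09, 0.0, 0.0), (3.5, 0.0, 0.0))
# Hypothetical coordinates: acceptor placed at a right angle to the C-H bond.
bent = hbond_geometry((0.0, 0.0, 0.0), (1.09, 0.0, 0.0), (1.09, 2.0, 0.0))
print(linear, bent)
```

An angle near 180° corresponds to a nearly linear D-H…A arrangement, while values close to 90° indicate a strongly bent contact.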
The importance of capping motifs in stabilizing the C-terminus of various right-handed helices has been studied in detail (Kumar and Bansal, 2015a; Kumar and Bansal, 1998). However, to the best of our knowledge, the capping of PPII-helices has not been analyzed. Quite often, the C=O group of the C2 residue is capped by the neighboring residues, primarily at the Ccap and C' positions. At the same time, C1 is shielded by the residues at C' and C''. A total of 230 residues at C2 and 683 residues at C1 were found to form an MM NH(i+3)→O(i) N-H…O H-bond with the flanking non-helical C' and C'' positions respectively (Table 4), suggesting the possibility of β-turns. To validate this, residues from C2-C' (first case) and C1-C'' (second case) showing the NH(i+3)→O(i) MM N-H…O H-bond were considered. Turns are characterized by a specific combination of (φ,ψ) values for the 2nd and 3rd constituent residues. Hence, C1 and Ccap were taken into account in the first case, and Ccap and C' in the second case (Supplementary Figure SF2). The average (φ,ψ) values of non-Pro and non-Gly residues at Ccap and C' in the second case were found to be (-56.6°,-18.2°) and (-64.2°,-24.7°) respectively. The mean values for both positions are similar to the (φ,ψ) values required to form a type-III β-turn. This supports the role of these turns in capping the C-terminus of PPII-helices (Figure 4C-D). Additionally, the involvement of C2 and C1 residues in SM N-H…O H-bonds was also investigated (Table 4). C' and C'' residues were found to form the largest number of SM N-H…O H-bonds with C2 and C1 respectively. We conclude that these non-bonded interactions at the C-terminus shield the exposed carbonyl groups of the residues and hence stabilize the PPII-helix. However, at the N-terminus of the PPII-helix, no such motif was observed and residues were found to form H-bonds with spatially nearby residues or solvent.
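The comparison with a type-III β-turn can be made explicit with a small check. The sketch assumes the commonly quoted ideal values of (-60°,-30°) for both central residues of a type-III turn and an arbitrary tolerance of ±30°; the tolerance and the function names are assumptions of this example, not criteria used in the study.

```python
def angle_difference(a, b):
    """Smallest absolute difference between two angles, in degrees."""
    return abs((a - b + 180.0) % 360.0 - 180.0)

def is_type_iii_turn(phi2, psi2, phi3, psi3, tol=30.0):
    """Compare the 2nd and 3rd turn residues with ideal type-III values."""
    ideal = (-60.0, -30.0, -60.0, -30.0)
    observed = (phi2, psi2, phi3, psi3)
    return all(angle_difference(o, i) <= tol for o, i in zip(observed, ideal))

# Mean (φ,ψ) values reported in the text for Ccap and C' (second case).
reported = is_type_iii_turn(-56.6, -18.2, -64.2, -24.7)
# For contrast: two residues in the canonical PPII conformation.
ppii_like = is_type_iii_turn(-75.0, 145.0, -75.0, 145.0)
print(reported, ppii_like)
```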
Functional and structural roles played by PPII-helices in proteins
PPII-helices, either directly or indirectly, play a major role in facilitating biological functions and help in maintaining the three-dimensional structure of proteins. The free backbone N-H and C=O groups and the extended structure of these helices are essential in facilitating their specific roles. The structural and functional roles of PPII-helices, especially Pro+, have been highlighted in various reports (Hicks and Hsu, 2004; Siligardi and Drake, 1995) and reviews (Adzhubei et al., 2013; Williamson, 1994). Here, we emphasize various roles played by the ASSP-identified PPII-helices in a few representative protein structures.
Functional importance of PPII-helices
PPII-helices are often found at the surface of the protein and frequently assist protein-protein interactions. For example, a 36-residue polypeptide, avian pancreatic polypeptide from Meleagris gallopavo (PDB ID: 1PPT), shows hormonal properties and consists of an α-helix as well as a PPII-helix (Blundell et al., 1981). The α-helix and PPII-helix run anti-parallel to each other and mainly consist of hydrophobic residues. Two monomers of the protein interact with each other to give a dimer with a hydrophobic core. The formation of the hydrophobic core in this molecule is facilitated by the packing together of nonpolar groups belonging to the constituent residues of the PPII-helix and α-helix (Figure 5A).
The PPII-helix also facilitates the formation of the tetramerization domain of acetylcholinesterase (AChE) by allowing the four-fold interaction of a WWW motif (PDB ID: 1VZJ) (Dvir et al., 2004). AChE enzymes hydrolyze acetylcholine and hence terminate signal transmission at cholinergic synapses. Functional localization of AChE in vertebrate muscle and brain is facilitated by the interaction of the tryptophan amphiphilic tetramerization (WAT) sequence, at the C‐terminus of its major splice variant (T), with a Pro‐rich attachment domain (PRAD; Chain: D) of the anchoring proteins, collagenous (ColQ) and Pro‐rich membrane anchor (Dvir et al., 2004). ASSP assigns a PPII-helical conformation to the PRAD (Thr3-Pro13) (Figure 5B), as suggested earlier (Dvir et al., 2004). The main-chain C=O groups of the residues from the PPII-helix were found to form H-bond interactions with the side chains of the constituent residues of the WAT coiled coils and hence contribute to the overall stability of the WAT/PRAD complex. It has also been reported that mutation in the PRAD domain disrupts the crucial WAT–WAT and WAT–PRAD interactions (Dvir et al., 2004). Programs like SEGNO and PROSS identified part of the PRAD as a PPII-helix, while XTLSSTR failed to do so.
PPII-helices are even observed to be a part of the structural motif that constitutes DNA-binding regions. The type-II restriction endonuclease HincII identifies the specific short DNA sequence GTYRAC (Y and R indicate any pyrimidine and purine respectively) and performs endonucleolytic cleavage to give specific double-stranded fragments with terminal 5'-phosphates. The side chain of Gln138 intercalates into the bound DNA just outside of the 6 bp recognition sequence, which induces distortions like bending, unwinding, and a shifting of the base planes into the minor groove (Joshi et al., 2006). HincII uses both direct and indirect readout for its activity. The structure of the Q138F mutant (PDB ID: 3E45) elucidates the mechanism of indirect readout (Babic et al., 2008). ASSP identifies a four-residue-long PPII-helix from Phe138 to Asn141, with π…π stacking between the aromatic ring of Phe138 and Cyt10. Ala139 and Asn141 were also found to form specific non-bonded interactions with the bases Cyt10 (N-H…O) and Ade9 (N-H…N) respectively (Figure 5C). The extended conformation of the PPII-helix facilitates specific protein-DNA interactions.
The Pro-rich gliadin peptides naturally adopt a PPII-helical conformation and bind to MHC class II molecules; deamidation affects this binding and increases their immunogenicity (Kim et al., 2004). The structure of the complex of HLA-DQ2 with an immunogenic epitope from gluten (PDB ID: 1S9V) revealed the immunopathogenic basis of celiac disease through the interactions between them (Kim et al., 2004). A nine-residue-long PPII-helix (Gln2-Pro10) binds to the peptide-binding groove at the N-terminal domains of DQ2 (Figure 5D). Residues Gln6, Glu8, and Leu9 were found to occupy the P4, P6 and P7 pockets of DQ2 respectively (Kim et al., 2004). The constituent residues, especially Glu8, were found to be involved in various non-bonded interactions. The conformation of the peptide, along with the polar nature of its component residues, facilitates the interaction and hence makes this epitope an excellent ligand for HLA-DQ2.
The structure of the TREX1 enzyme (PDB ID: 2OA8), which has 3'→5' exonuclease activity, reveals an 8-amino acid PPII-helix, suggesting a mechanism for interactions with other protein complexes (De Silva et al., 2007). ASSP assigns a PPII-helix to a six-residue Pro-rich segment (Pro55-Pro60) with a 3-fold symmetry (Figure 5E). PPII-helices in the TREX1 dimer have been described as interaction motifs for other proteins containing SH3, WW or EVH1 domains (Zarrinpar et al., 2003). The PPII-helices of the two monomers were found to be 20 Å apart and positioned on the same face of the dimer. Hence, the positioning of the Pro-rich PPII-helix plays a key role in protein-protein interactions of the TREX1 protein (De Silva et al., 2007).
Structural importance of PPII-helices
PPII-helices can act as a linker between two structural or functional domains. For example, the parvalbumin molecule belongs to the EF-hand_7 (PF13499) family (Punta et al., 2012) and has a helix-loop-helix topology. In a crystal structure of rat α-parvalbumin (PDB ID: 1RWY: B) (Bottoms et al., 2004), ASSP identified a PPII-helix (Ala74-Ser78) that is a part of the linker connecting the two functional domains, namely CD and EF (Figure 6A). These domains show Ca²⁺-binding activity and have been reported to be related by a twofold symmetry axis (Kretsinger and Nockolds, 1973). The extended conformation of the PPII-helix enables the residue Arg75 to form a salt-bridge with Glu81 that plays a vital role in stabilizing the loop region joining two helical segments.
Formylmethanofuran dehydrogenases are multi-subunit enzymes that catalyze the first step in the formation of CH₄ from CO₂ in methanogenic and sulfate-reducing microorganisms (Thauer et al., 2008). They also contain tungsten or molybdenum as well as iron–sulfur clusters. The crystal structure of a tungsten formylmethanofuran dehydrogenase subunit-e (PDB ID: 3D00:A; fmde; Pfam ID: PF02663)-like protein from Syntrophus aciditrophicus has been reported to contain a PPII-helix (Axelrod et al., 2010). The C-terminal domain is anchored to the N-terminal domain with the help of an eleven-residue linker, and ASSP assigns a PPII-helix (Gln155-Lys161) to a part of it (Figure 6B). The extended structure of the PPII-helix allows the two domains to be separated optimally to form a dimer with swapped domains. To our surprise, in both cases the PPII-helices were found to contain only non-Pro residues.
PPII-helices have also been shown to provide local order, flexibility or chain hydration. The PPII conformation is also one of the predominant conformational states. Spectroscopic data analyses (Shi et al., 2006), CD (Whittington et al., 2005) and solid-state NMR analyses (Hu et al., 2009) have indicated that PPII-helices maintain local order in largely unfolded proteins or peptides. Additionally, PPII-helices also play a vital role in maintaining the 3D structure of proteins associated with conformational diseases (Blanch et al., 2000; Blanch et al., 2004; Syme et al., 2002) such as Alzheimer's disease.
PPII-helices are also shown to be an integral part of snow flea antifreeze protein (PDB ID: 3BOI) (Pentelute et al., 2008). The structure consists of 6 PPII-helices that are stacked in two sets of three and form a compact brick-like structure. In another example, the Ala-rich domain and Pro-rich segment in A3VP1 of AgI/II of Streptococcus mutans (PDB ID: 3IPK) adopt an extended α-helix and PPII-helix respectively (Larson et al., 2010). Both α-helix and PPII-helix interlock with each other to form a highly extended stalk-like structure (Larson et al., 2010) that can extend over 50 nm in length.
Residue preferences in PPII-helices depend on the presence of Pro
The position-wise propensities of the 20 residues to occur at each of the seven positions (N' to C') of PPII-helices of different lengths (PPII3-4 and PPII>4; defined in Methods) were calculated and analyzed (Figure 7).
PPII>4 showed a preference for Cys and Asn at N'. At the Ncap position, however, Gly was found to be the most preferred residue for both length-dependent categories of PPII-helices. Additionally, PPII>4 showed a preference for His. As expected, our analyses showed a remarkably high preference for Pro residues at all the helical positions (N1-C1) of PPII-helices, as suggested earlier (Berisio et al., 2006; Jha et al., 2005; Kelly et al., 2001; Stapley and Creamer, 1999). However, in the case of PPII>4, Arg is also preferred at N1, while Ala and Lys are preferred at Mid positions. At the C1 position of PPII>4, Asn, Ser and Thr were also preferred. Surprisingly, both PPII3-4 and PPII>4 showed a preference for Gly at the Ccap position. Additionally, PPII>4 were found to prefer Gln and Val. Polar residues like Asn and Glu are preferred by both PPII3-4 and PPII>4 at the C' position. In addition, PPII3-4 showed a preference for Asp and Ser. Contrary to the previously reported significant correlation between the amino acid propensities for PPII-helices and α-helices (Berisio et al., 2006), results from our analyses suggest that the position-wise amino acid propensity at various helical positions in PPII-helices differs from that of α-helices (Kumar and Bansal, 2015a; Kumar and Bansal, 1998) or π-helices (Kumar and Bansal, 2015a). α-helices have distinct amino acid preferences at various positions, while PPII-helices have an overwhelming preference for Proline at almost all the positions. π-helices, in contrast, prefer hydrophobic or aromatic residues.
Almost 40% of the total PPII-helices were found to contain no Pro residues (Pro-). The longest Pro- helix in our dataset was found to be 11 residues long (Gly15-Gly25), in the antifreeze protein (PDB ID: 3BOI). The structural and functional roles of such Pro- helices have been discussed in different proteins (Adzhubei et al., 2013). The position-wise propensity of the 19 remaining residues in such helices was calculated, analyzed and compared with Pro+ and α-helices of length < 9 residues (α4-8). It was observed that the higher preference for Pro residues in Pro+ was compensated by the abundance of polar residues in Pro- (Figure 8). The Ncap position for both Pro+ and Pro- showed a preference for Gly, whereas mainly polar residues were preferred in α4-8. However, Asn and Lys are also preferred at Ncap of Pro-. At N1, Pro- showed a higher preference for Arg, Glu, Lys and Met residues, while α4-8 preferred Pro and Trp. At C1, the elevated preference for Pro in Pro+ was compensated in Pro- by a higher propensity for Asn, Ser and Thr. However, Leu, Asn and Gln were favored in α4-8. Lys, Glu and Ala have higher position-wise amino acid propensity values at the Mid position of Pro-, while α4-8 preferred Leu, Glu and Ala. Both Pro- and Pro+ preferred Pro at the Ccap position, while Gly has a significantly high preference in α4-8. Surprisingly, at C', the two categories of PPII-helix did not show any significant difference. However, Pro, with position-wise propensity Pij=1.9, was found to be the most preferred residue at C' of α4-8.
The preference for polar residues in PPII-helices led us to investigate their solvent accessibility. A residue was considered solvent exposed when the relative accessibility value is ≥ 7 for all atoms. It was observed that in ~86% (2489/2879) of PPII-helices, more than 50% of the constituent residues are solvent exposed. A total of 390 PPII-helices were found to be solvent occluded. Even these 390 PPII-helices have a preference for Pro at different helical positions. Additionally, Met and Cys at N1 and Mid respectively have substantial position-wise amino acid propensity values. Surprisingly, at Ncap, the propensity for Pro was not found to be significant. It was compensated by a variety of residues like Gly, Thr, Phe, Asn, Arg and Gln. However, Ccap showed high position-wise propensity values for Gly and Pro.
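The exposure criterion can be expressed as a short function. In this sketch the input is a list of relative all-atom accessibility values, one per helix residue, such as those reported by NACCESS; the function name and the example values are hypothetical.

```python
def is_exposed_helix(relative_accessibility, cutoff=7.0):
    """A residue counts as exposed when its relative accessibility is
    >= cutoff; a helix counts as exposed when more than 50% of its
    residues are exposed."""
    exposed = sum(1 for value in relative_accessibility if value >= cutoff)
    return exposed > len(relative_accessibility) / 2

# Hypothetical relative accessibility values for two three-residue helices.
surface_helix = is_exposed_helix([42.0, 3.5, 18.2])   # two of three exposed
buried_helix = is_exposed_helix([1.0, 6.9, 12.0])     # one of three exposed
print(surface_helix, buried_helix)

# Share of exposed helices from the counts given in the text.
exposed_count, occluded_count = 2489, 390
share = 100 * exposed_count / (exposed_count + occluded_count)
print(round(share, 1))
```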
Comparison of PPII helices and isolated extended strands
Isolated strands (βASSP) are not part of any β-sheet and lack MM N-H…O H-bonds. Hence, βASSP are similar to PPII-helices in terms of the absence of any MM N-H…O H-bond pattern. This motivated us to compare the position-wise amino acid propensity within and around βASSP and PPII-helices. A total of 225 PPII-helices and 393 βASSP of length > 4 residues were considered (Figure 9). Since both PPII>4 and βASSP are at least five residues long, five helical (N1, N2, Mid, C2 and C1) as well as four non-helical (N', Ncap, Ccap and C') positions were considered for the comparison of amino acid propensity values. Interestingly, both SSEs show a high preference for Proline residues at all the helical positions, but it was invariably found to be higher for PPII>4.
Lys and Pro showed significantly higher propensity at N' for βASSP, compared to Cys in PPII>4. However, Gly and Asn are preferred in both. At Ncap, both showed a preference for Gly. The N1 position in βASSP preferred Asn and His residues. N2 showed a higher preference for hydrophobic residues (Ile, Leu and Val), Thr and the aromatic residues Tyr and Trp in βASSP, compared to His in PPII>4. At Mid, β-branched residues such as Ile, Val and Thr are preferred in βASSP, along with the aromatic residue Tyr. Val and Phe are preferred at C2 in βASSP, while Lys is preferred in PPII>4. Unlike other intra-βASSP positions, C1 and Ccap showed a preference for polar residues. Asp is preferred at C1, while Asn, Ser and Thr are favored at Ccap. At C', Asp and His were found to be preferred. The preference for polar residues at helical positions N1 and C1 of βASSP can be attributed to the formation of SM H-bonds leading to their stability.
Conclusions
A total of 2879 PPII-helices were identified by ASSP in a dataset of 3582 protein chains, suggesting that PPII-helices occur quite frequently in proteins. Though Pro residues are preferred in PPII-helices, almost 40% of the helices do not contain any Proline. Our analysis suggests that the type III β-turn is a common capping motif at the C-terminus of PPII-helices. PPII-helices in proteins mediate various structural and functional roles. PPII-helices show a characteristic preference for specific residues at various helical and flanking non-helical positions, which differs from that of right-handed helices such as α- or π-helices. Gly was found to be preferred at both the N- and C-terminus of the PPII-helix. The higher preference for Pro in Pro+ helices was compensated by the preference for polar residues in Pro-. Though isolated strands and PPII-helices both lack any intra-helical MM N-H…O H-bonds, they differ from each other in their amino acid preferences at various positions.