Abstract
K13 is an essential Plasmodium falciparum protein that plays a key role in malaria resistance to artemisinins. Although K13 resembles BTB- and Kelch/propeller-containing proteins involved in ubiquitin ligase complexes, its functional sites remain uncharacterized. Using evolutionary and structural information, we searched for the most conserved K13 sites across Apicomplexa species evolution to identify sub-regions of K13 that are likely functional. An electropositive ‘patch’ of amino acids in the K13 propeller domain has a dense concentration of extraordinarily conserved positions located at a shallow pocket, suggesting a role as a binding surface. When applied to experimentally-characterized BTB-Kelch proteins, our strategy successfully identifies the validated substrate-binding residues within their own propeller shallow pocket. Another patch of slowly evolving sites is identified in the K13 BTB domain which partially overlaps the surface that binds to Cullin proteins in BTB-Cullin complexes. We provide candidate binding sites in K13 propeller and BTB domains for functional follow-up studies.
Introduction
Current efforts to control malaria are threatened by the spread in Southeast Asia (SEA) of Plasmodium falciparum parasites that are resistant to artemisinin derivatives (ARTs)1. Treatment failures are now reported in some geographic areas of SEA for the current front-line ART-based combination therapies1–3. In Africa, ART resistance (ART-R) is not yet established4,5, although some P. falciparum parasites from Uganda and Equatorial Guinea exhibiting high survival rate have been described6,7.
In parasites from SEA, ART-R is primarily conferred by single non-synonymous mutations in the P. falciparum k13 (pfk13) gene8,9. Multiple pfk13 ART-R mutations emerged concomitantly in the early 2000s, until a specific, multidrug-resistant lineage carrying the C580Y mutation became the most common, especially in the East Thailand-Cambodia-Lao PDR-Vietnam region4,8,10–12. The ART-R phenotype is defined in vivo as a delayed parasite clearance time following an ART-based treatment1 and in vitro as an increased survival rate following a brief exposure to a high dose of ART13.
The pfk13 gene encodes a 726 amino acid protein (PfK13) which is essential at least during the intraerythrocytic parasite blood stage14,15. Both the gene ontology annotation of pfk13 and the study of ART-resistant parasites carrying a pfk13 mutation suggest that this protein has some regulatory functions at the protein level16,17. At the cell level, pfk13 mutant parasites decelerate their development during the early intraerythrocytic stage13,18. At the molecular level, they exhibit an increased expression of unfolded protein response pathways19, lower levels of ubiquitinated proteins17,18, and phosphorylation of the parasite eukaryotic initiation factor-2α (eIF2α) which correlates with ART-induced latency20. A few PfK13 interactors have been reported and partially clarify its function. For example, PfK13 was immunoprecipitated with the phosphatidylinositol 3-kinase (PI3K)17. Ubiquitination of PI3K is also decreased in pfk13 C580Y mutants, resulting in wide phosphatidylinositol 3-phosphate-related cellular alterations17,21 whose roles in ART-R remain to be characterized.
PfK13 is related to the BTB-Kelch structural subgroup of Kelch-repeat proteins8,22,23. It possesses a BTB domain (Broad-complex, tramtrack and bric-à-brac; also known as BTB/POZ; amino acids 350-437) and a C-terminal propeller domain (also known as Kelch domain; amino acids 443-726) composed of six repeated Kelch motifs (PDB code 4YY8, unpublished). However, PfK13 exhibits specific features when compared to typical BTB-Kelch proteins, such as a poorly conserved Apicomplexa-specific N-terminal region, a coiled-coil-containing (CCC) domain (amino acids 212-341) located upstream of BTB, and an absence of BACK domain (for BTB And C-terminal Kelch) often found between BTB and Kelch domains23–25. Both BTB and propeller are protein domains known to carry binding functions. Nearly all pfk13 mutations associated with ART-R – including the C580Y mutation – cluster in the propeller domain8,11, suggesting a major functional role of this domain.
Many proteins harboring a BTB domain are found in multi-subunit Cullin-RING E3 ligase complexes in which a substrate protein is ubiquitinated and then degraded by the proteasome24,26–28. In those complexes, BTB mediates varying oligomerization architectures and also contributes to Cullin recruitment. The propeller domain often serves as the substrate protein receptor in BTB-Kelch and other proteins23,24. It usually comprises four to six repeated β-stranded Kelch motifs – also named blades – arranged in a circular symmetry (the six blades in PfK13 are named I to VI). The loops protruding at the bottom face of propeller form a shallow pocket involved in the binding of the substrate protein subsequently ubiquitinated and targeted for degradation24. For example, the propeller shallow pocket of BTB-Kelch KEAP1 and KLHL3 directly binds to the transcription factor Nrf2 and the kinase WNK, respectively, and controls their ubiquitination29,30. PfK13 may exhibit similar functions; however, its binding regions and functionally important sites remain poorly characterized.
Here, we hypothesize that functionally important sites of K13 have evolved under stronger purifying selection and could be identified by the analysis of k13 molecular evolution across 43 Apicomplexa species. To examine this, we inferred substitution rates for each amino acid site of the K13 sequence, taking into account the k13 phylogeny and the spatial correlation of site-specific substitution rates in the protein tertiary structure when possible. We identified a major functional patch of slowly evolving sites located at the bottom face of K13 propeller that form part of the shallow pocket. To show the relevance of our approach, we applied it to the propeller domain of four well-characterized BTB-Kelch proteins (KEAP1, KLHL2, KLHL3 and KLHL12) and successfully identified the functionally and structurally validated substrate-binding residues located in the pocket of these propeller domains. Another patch of slowly evolving sites was also identified in the BTB domain of K13 which partially overlaps with the surface that binds to Cullin proteins in known BTB-Cullin complexes27. Altogether, these findings support a crucial role of K13 in binding partner molecules and predict specific binding sites.
Results
K13 sequence sampling, multiple alignment and phylogenetic reconstruction
Forty-three complete amino acid sequences from distinct Apicomplexa species were unambiguously identified as orthologous to PfK13 in sequence databases, encompassing 21 Plasmodium and 22 other Apicomplexa K13 sequences (Cryptosporidia, n = 7; Piroplasmida, n = 7; and Coccidia, n = 8). The length of K13 protein sequences ranged from 506 (Eimeria brunetti) to 820 (Hammondia hammondi) amino acids (Supplementary Table 1). By visual inspection, the three annotated domains of K13 (CCC, BTB and propeller) were conserved (see the K13 sequence alignment in Supplementary Fig. 1), whereas the N-terminal region preceding the CCC domain appeared much more variable among sequences, even being absent in some of them. Since K13 sequences aligned poorly in that region, the first 234 amino acid positions of the K13 multiple alignment were removed, along with other positions showing a high divergence level and/or a gap enrichment among sequences (32 amino acid positions; Supplementary Fig. 1). The final K13 multiple sequence alignment contained 514 amino acid positions which covered the whole CCC, BTB and propeller domains. The average pairwise sequence identity in that cleaned K13 sequence alignment was 64.6%, ranging from 48.6% for the Babesia bigemina-Cryptosporidium ubiquitum comparison to 99.2% for the P. chabaudi spp. pair.
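The pairwise identity summary reported above (mean, minimum and maximum over all sequence pairs) can be sketched as follows. This is a minimal illustration, not the authors' script: the function names are hypothetical, the toy alignment is invented, and the choice to ignore columns in which either sequence carries a gap is an assumption, since the text does not state how gaps were handled.

```python
from itertools import combinations

def pairwise_identity(seq_a, seq_b, gap="-"):
    """Percent identity over columns where neither aligned sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue  # assumption: gapped columns are ignored
        compared += 1
        matches += (a == b)
    return 100.0 * matches / compared if compared else 0.0

def identity_summary(alignment):
    """Return (mean, min, max) percent identity over all sequence pairs."""
    values = [pairwise_identity(alignment[x], alignment[y])
              for x, y in combinations(sorted(alignment), 2)]
    return sum(values) / len(values), min(values), max(values)

# Toy alignment (invented, not real K13 data)
toy = {"sp1": "MKVLA-GG", "sp2": "MKVLATGG", "sp3": "MRVIA-GG"}
mean_id, min_id, max_id = identity_summary(toy)
```

With a real alignment, the dictionary would hold one aligned string per species, all of the same length.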
The maximum-likelihood phylogenetic tree built from the curated alignment of the corresponding Apicomplexa k13 cDNA sequences revealed four monophyletic groups: Cryptosporidia, Plasmodium, Piroplasmida and Coccidia, all supported by high bootstrap values (≥ 98%; Supplementary Fig. 2). The group of Hematozoa k13 sequences appeared as paraphyletic (bootstrap value = 100%), with Piroplasmida unexpectedly clustering with Coccidia (Supplementary Fig. 2). The phylogenetic relationships of Plasmodium k13 sequences were largely consistent with the acknowledged phylogeny of Plasmodium species, except for bird-infecting Plasmodium k13 sequences (P. gallinaceum and P. relictum), which appeared related to human-infecting P. ovale spp. sequences, although this grouping was poorly supported (bootstrap value = 47%; Supplementary Fig. 2).
The k13 sequence has evolved under strong purifying selection
To evaluate the selective pressure acting on k13, we used codon substitution models to estimate the ratio of non-synonymous to synonymous substitution rates, ω = dN/dS, across codon sites of the k13 sequence (site models) and branches of the k13 phylogeny (branch models). A series of nested likelihood ratio tests (LRTs) using different sets of site and branch models were carried out using the codeml tool from the PAML package31. When applied to the k13 codon alignment, LRTs of site and branch models indicated varying substitution rate ratios ω both among codon sites of the k13 sequence (M0-M3 comparison, p = 3.3 × 10−225) and among branches of the k13 phylogeny (M0-FR, p = 1.9 × 10−53; Table 1 and Supplementary Table 2). This suggests that k13 has evolved under a variable selective regime both across codon sites and lineages. No evidence of positive selection was found in any of the tree branches (Supplementary Fig. 3). Similarly, site models incorporating positive selection (M2a and M8) provided no better fit to the data than those including only purifying selection and neutral evolution (M1a and M7: 2ΔlnL = 0 in both cases; Table 1), thus supporting an absence of detectable adaptive selection events at any k13 codon site over the long time scale of Apicomplexa evolution. Altogether, the data indicate that much of the K13 protein, except the N-terminal region, has been strongly conserved over evolutionary time.
When considering the one-ratio PAML values of 3,256 protein-coding genes previously estimated by Jeffares and colleagues using six Plasmodium species32, k13 ranked among the 5% most conserved protein-coding genes of the Plasmodium proteome (rank: 140/3,256; Supplementary Fig. 4a). Since a significant correlation between protein length and one-ratio PAML value was evidenced in the whole dataset (Spearman’s rank correlation: p = 9.2 × 10−82, r = 0.33; Supplementary Fig. 4b), we repeated the analysis by considering only those protein sequences whose length fell within ± 100 amino acids of the PfK13 protein length, a subset in which length and one-ratio PAML value were no longer correlated (Spearman’s rank correlation: p = 0.83, r = 0.01). Again, k13 ranked among the most conserved protein-coding genes of the Plasmodium proteome (sized-rank: 6/393), whereas four other five- or six-bladed Kelch protein-coding sequences were much less conserved than K13 (Fig. 1).
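The length-restricted ranking described above can be sketched as follows. This is an illustrative sketch under stated assumptions: the `sized_rank` function, the dictionary layout and all gene names and values are hypothetical, and rank 1 is taken to mean the lowest one-ratio ω (the most conserved gene).

```python
def sized_rank(genes, target, window=100):
    """Rank `target` by omega among genes whose protein length lies within
    +/- `window` amino acids of the target's length (rank 1 = lowest omega)."""
    target_len = genes[target]["length"]
    subset = {name: g for name, g in genes.items()
              if abs(g["length"] - target_len) <= window}
    ordered = sorted(subset, key=lambda name: subset[name]["omega"])
    return ordered.index(target) + 1, len(ordered)

# Toy values (invented for illustration; not the published estimates)
toy_genes = {
    "k13":   {"length": 726,  "omega": 0.02},
    "geneA": {"length": 700,  "omega": 0.01},
    "geneB": {"length": 800,  "omega": 0.05},
    "geneC": {"length": 1500, "omega": 0.001},  # outside the length window
    "geneD": {"length": 650,  "omega": 0.30},
}
rank, n_compared = sized_rank(toy_genes, "k13")
```

Restricting the comparison to proteins of similar length removes the length effect from the ranking without having to model it explicitly.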
Variable levels of amino acid conservation between the annotated domains of K13
We next compared the conservation level between the annotated domains of K13 (CCC, BTB and propeller) using ω estimates obtained under the best-fitting PAML model that indicates a variable selective regime among sites (model M3; Supplementary Table 2). First, we noted that the three domains have evolved under strong purifying selection with most sites being highly conserved during evolution (Fig. 2). BTB was however found to evolve under more intense purifying selection than either CCC (p = 1.6 × 10−4, Mann-Whitney U test) or propeller (p = 1.0 × 10−3, Mann-Whitney U test), but no difference in ω estimates was detected between CCC and propeller (p = 0.75, Mann-Whitney U test; Fig. 2 and Supplementary Table 3). To confirm these results, we inferred the site-specific substitution rate at the protein level using the FuncPatch server which takes into account the spatial correlation of site-specific substitution rates in the protein tertiary structure33 (hereafter called λ substitution rate). λ could not be inferred for the CCC domain (because of the lack of a resolved 3D structure), but the analysis confirmed that BTB was more conserved than propeller over evolutionary time (p = 5.4 × 10−5, Mann-Whitney U test; Supplementary Table 4).
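The domain comparisons above rely on the Mann-Whitney U test applied to per-site rate estimates. A minimal standard-library sketch of a two-sided test is given below. It uses the normal approximation with tie and continuity corrections, which is an assumption: the text does not say which software or variant of the test was used, and the function name and toy values are hypothetical.

```python
import math

def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected).
    Returns (U statistic for sample x, p-value)."""
    n1, n2 = len(x), len(y)
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x, tie_term, i = 0.0, 0.0, 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1                              # pooled[i:j] is one group of ties
        avg_rank = (i + 1 + j) / 2.0            # mean of ranks i+1 .. j
        t = j - i
        tie_term += t ** 3 - t
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u1 = rank_sum_x - n1 * (n1 + 1) / 2.0
    n = n1 + n2
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u1, 1.0                          # all values identical
    z = max((abs(u1 - mean_u) - 0.5) / math.sqrt(var_u), 0.0)
    return u1, math.erfc(z / math.sqrt(2.0))    # two-sided normal tail

# Toy per-site rates for two domains (invented values)
domain_a = [0.01, 0.02, 0.03, 0.04, 0.05]
domain_b = [0.06, 0.07, 0.08, 0.09, 0.10]
u_stat, p_value = mann_whitney_u(domain_a, domain_b)
```

For small samples an exact test would be preferable to the normal approximation; the domains compared in the text contain enough sites for the approximation to be reasonable.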
The BTB domain of K13 resembles that of KCTD proteins and exhibits a predicted functional patch
We then performed a more extensive study of the BTB and propeller domains of K13 because of their likely role in mediating K13 functions and the availability of their tertiary structures. To detect patches of slowly evolving amino acid sites in the BTB-propeller structure, we focused on the site-specific, spatially correlated, substitution rate λ at the amino acid level. This rate has been shown to provide a more reliable estimation of the conservation level at amino acid sites compared to standard substitution estimates, especially in the case of highly conserved proteins33,34.
Although K13 is related to the BTB-Kelch structural subgroup of proteins8, the BTB domain of K13 exhibits atypical features compared to BTB-Kelch proteins. First, the K13 BTB fold appeared shortened, lacking the A6 helix and the N- and C-terminal extensions27, similar to Elongin C (Fig. 3a). Second, the primary sequence of K13 BTB grouped with those of the KCTD protein family rather than of other BTB-containing protein families (Fig. 3b). Finally, K13 BTB exhibited a higher similarity in tertiary structure with the BTB domain of KCTD17 compared to those of Elongin C and KEAP1: the root-mean-square deviations (RMSDs) of atomic positions for BTB domains of K13-KCTD17, K13-Elongin C and K13-KEAP1 were 1.13 Ångström (Å), 2.33 Å and 2.17 Å, respectively.
To identify putative functional sites in K13 BTB, we examined whether a spatial correlation of the site-specific substitution rates λ is present in the K13 BTB-propeller tertiary structure. Despite a low standard deviation of substitution rates across amino acid sites, a significant spatial correlation was found, with a log Bayes factor well above 8 and a 5 Å characteristic length (Table 2). The 10% most conserved sites predicted by FuncPatch formed one clearly bounded patch located at the surface of BTB (Fig. 3c). The BTB patch contained sites located at both B2-B3 and B4-A4 loops and at the A4 helix (positions/residues 355-358/NVGG, 397-399/DRD and 402-403/LF using the PfK13 sequence numbering; Fig. 3c). To test whether a similarly located, conserved patch was also found in the BTB domain of KCTD proteins, to which K13 BTB is the most similar, we inferred site-specific substitution rates λ from 124 and 139 orthologous sequences of SHKBP1 and KCTD17, respectively. For both proteins, the 10% most conserved BTB positions formed a patch that partially overlapped with that of K13, with positions 357 (B2-B3 loop), 397 and 398 (B4-A4 loop) being shared between the three patches (PfK13 sequence numbering; Fig. 3d and Table 2). These positions are usually involved in BTB-BTB interactions in KCTDs28 or in BTB-Cullin interactions in some other BTB-containing protein families27, as in the X-ray structure of the Elongin C-Cullin2 complex (Supplementary Fig. 5). By functional annotation transfer, this indicates that these K13 BTB sites may be involved in some protein-protein interactions.
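For reference, the RMSD quoted above is the square root of the mean squared distance between paired atoms. The sketch below assumes the two structures have already been superposed and that atoms are listed in matching order; the superposition step itself, and the tool used by the authors to perform it, are outside this sketch. The function name and coordinates are hypothetical.

```python
import math

def rmsd(coords_a, coords_b):
    """RMSD between two equally long lists of (x, y, z) coordinates,
    assuming the structures are already superposed and atoms are paired."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum((xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
                  for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))

# Toy coordinates in Angstrom (invented): every atom shifted by 1 A along z
structure_1 = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
structure_2 = [(0.0, 0.0, 1.0), (1.5, 0.0, 1.0), (3.0, 0.0, 1.0)]
toy_rmsd = rmsd(structure_1, structure_2)
```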
The propeller domain of K13 exhibits a conserved shallow pocket
In Kelch-containing proteins, the propeller domain usually serves as the receptor for a substrate that is subsequently ubiquitinated23,24. Before examining its conservation level, we first re-evaluated the architecture of the K13 propeller domain using its resolved 3D structure (PDB code 4YY8, chain A, unpublished). The PfK13 propeller structure is composed of six repeats of the Kelch motif (or blade)8,22. As expected, each blade is a β-sheet secondary structure involving four twisted antiparallel β-strands (numbered A to D). The innermost A strands line the central channel of the propeller fold whereas the outermost D strands are part of the propeller outside surface. The top face of the domain, containing the central channel opening, is formed by residues from several strands and AB and CD loops. The bottom face of propeller is composed of residues from the DA and BC loops and contains a shallow pocket, similar to other propeller structures24. Since there is no conventional definition for the shallow pocket delineation in the propeller fold, we defined it as the amino acids forming the surface plane of the pocket and protruding out of that plane (n = 19 positions; Fig. 4a and Supplementary Fig. 6).
To characterize the pattern of conservation within the domain, we superimposed the site-specific substitution rates λ inferred by FuncPatch onto an amino acid sequence alignment of the six blades (Fig. 4b), custom-produced from a structural alignment of the blades using PyMOL (Fig. 4a and Supplementary Fig. 7). We found that: i) the conservation level significantly differed between the six blades of propeller (p = 3.0 × 10−3, Kruskal-Wallis H test), with blade VI exhibiting the lowest conservation (Fig. 4b, Supplementary Fig. 8 and Supplementary Table 5); ii) loops were more conserved than strands (p = 5.0 × 10−3, Mann-Whitney U test; Fig. 4b and Supplementary Table 5); iii) the solvent-exposed A and D strands were less conserved than the buried B and C strands (p = 1.3 × 10−6, Kruskal-Wallis H test; Fig. 4b and Supplementary Table 5); and iv) the conservation level was the strongest at the blade positions 7-10 (DA loops) and 23-25 (BC loops; Fig. 4b), which altogether formed the surface and underlying layer of the shallow pocket in the PfK13 propeller tertiary structure (Fig. 4a). Of note, similar results were obtained with the ω estimates inferred under the best fitted PAML model (Supplementary Table 5).
With one exception (position PfK13 514), the 10% most conserved sites in K13 propeller (n = 29 out of 284) were all located at the bottom side of the propeller fold (Fig. 5a). Among them, the sites exposed at the surface (n = 9) formed a statistically significant patch located within the shallow pocket (Table 2 and Fig. 5a). The nineteen positions forming the shallow pocket of K13 propeller were significantly enriched in the 10% most conserved propeller amino acid sites (p = 1.6 × 10−5, chi-squared test; Fig. 5a, Tables 3 and 4). A similar trend was observed using the ω PAML values (Supplementary Fig. 9).
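The selection of the 10% most conserved sites and the test for their enrichment in the shallow pocket can be sketched as below. Several choices here are assumptions rather than the authors' procedure: rounding the 10% cutoff up (which yields 29 of 284 sites, the count given above), using a Pearson chi-squared test without continuity correction, and all function names and toy data. The test is undefined if any row or column of the table sums to zero.

```python
import math

def most_conserved(rates, percent=10):
    """Positions with the lowest substitution rates (top `percent`, rounded up)."""
    n_keep = (len(rates) * percent + 99) // 100   # integer ceiling
    return set(sorted(rates, key=rates.get)[:n_keep])

def chi2_2x2(a, b, c, d):
    """Pearson chi-squared test (1 df, no continuity correction) on [[a, b], [c, d]]."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return stat, math.erfc(math.sqrt(stat / 2.0))   # chi-squared tail for 1 df

def pocket_enrichment(rates, pocket):
    """Test whether `pocket` positions are over-represented among conserved sites.
    `pocket` must be a subset of the positions in `rates`."""
    top = most_conserved(rates)
    a = len(top & pocket)          # pocket and conserved
    b = len(pocket - top)          # pocket, not conserved
    c = len(top - pocket)          # conserved, outside the pocket
    d = len(rates) - a - b - c     # neither
    return chi2_2x2(a, b, c, d)

# Toy data (invented): 20 positions, rate increasing with position number
toy_rates = {pos: pos / 20.0 for pos in range(1, 21)}
toy_pocket = {1, 2, 3}
stat, p_value = pocket_enrichment(toy_rates, toy_pocket)
```

With tables as small as the toy one, expected counts fall below the usual thresholds for a chi-squared test, so an exact test would be the safer choice in practice.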
Using the PfK13 propeller structure as reference, we also identified two remarkable features of this conserved patch. First, it overlapped with a region of the shallow pocket harboring an electropositive surface potential energy, in contrast to the overall electronegative one of the propeller bottom surface (Fig. 5b). Second, it contained several arginine and serine residues strictly conserved in Apicomplexa (PfK13 R529, S576, S577, R597 and S623; Fig. 5a and Table 3), which are known to mediate protein-protein interactions in the pocket of other propeller domains35,36. Altogether, our evolutionary analysis of K13 propeller revealed that the shallow pocket is extremely conserved and may bind a substrate molecule.
The conserved propeller patch predicted by FuncPatch is related to propeller binding activities in well-characterized BTB-Kelch proteins
To evaluate the reliability of FuncPatch to infer conserved functional sites in the context of propeller domains, we studied four other BTB-Kelch proteins found in mammals and which are functionally and structurally well-characterized (KEAP1, KLHL2, KLHL3 and KLHL12; Supplementary Fig. 10). All these proteins are known to bind a substrate protein through validated binding residues located in their propeller shallow pocket (substrates: Nrf2 for KEAP137, WNK for KLHL2 and KLHL335, and Dishevelled for KLHL1238). Using large sets of orthologous amino acid sequences (ranging from 129 sequences for KLHL12 to 162 sequences for KLHL2), a statistically significant spatial correlation of site-specific substitution rates λ was detected for each propeller fold (Table 2). In each case, the 10% most conserved propeller positions clustered in the shallow pocket (highest p = 1.4 × 10−7 for KEAP1, chi-squared test; Fig. 5c and Table 4). The shallow pocket of KEAP1, KLHL2 and KLHL3 propeller structures also showed a markedly electropositive surface potential energy while the one of KLHL12 was much more variable (Fig. 5c).
Therefore, the conserved functional sites predicted by FuncPatch in propeller domains from several BTB-Kelch proteins are consistent with the findings of previous experimental studies, demonstrating the reliability of our approach.
Lack of association of ART-R mutations with some long-term evolutionary or structural facets of K13 propeller
Numerous non-synonymous mutations in PfK13 propeller segregating at variable frequencies have been reported to confer ART-R to P. falciparum parasites from SEA4,8–11. Here, we hypothesized that they may be associated with specific patterns of evolutionary- and/or structure-based parameters despite their wide distribution across the propeller fold (Supplementary Fig. 11). K13 propeller positions were classified as associated (n = 27) or not (n = 257) with an ART-R mutation, on the basis of the latest World Health Organization (WHO) status report on ART-R5. No difference in the inter-species site-specific substitution rates λ was observed between the two groups (p = 0.96, Mann-Whitney U test; Supplementary Fig. 12). Importantly, we noted that no ART-R mutation has been reported at the positions located at the surface of the shallow pocket, although this trend was not statistically significant (p = 0.23, Fisher’s exact test; 0/19 ART-R mutations for the shallow pocket positions, and 27/265 ART-R mutations for the remaining K13 propeller positions). Two structure-based parameters associated with site evolutionary rate were also estimated: the relative solvent accessibility or RSA, and the side-chain weighted contact number or WCNsc39. Again, no association was found between these structural parameters and the propeller positions associated with ART-R mutations (p = 0.44 and p = 0.46, respectively, Mann-Whitney U test; Supplementary Fig. 12). In conclusion, none of the evolutionary- and structure-based parameters tested here were associated with ART-R K13 propeller positions.
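The Fisher's exact test on the counts given above (0 of 19 shallow pocket positions versus 27 of 265 remaining propeller positions carrying an ART-R mutation) can be reproduced with the standard library. The sketch below uses the common two-sided definition that sums the probabilities of all tables no more likely than the observed one; that this matches the authors' software is an assumption, and the function name is hypothetical.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test on the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x):
        # hypergeometric probability of observing x in the top-left cell
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    p_obs = prob(a)
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1))
                if p <= p_obs * (1 + 1e-9))   # small tolerance for float ties
    return min(1.0, total)

# Rows: shallow pocket / other propeller positions
# Columns: with / without a reported ART-R mutation (counts from the text)
p_value = fisher_exact_two_sided(0, 19, 27, 238)
```

The resulting p-value should be close to the reported p = 0.23, which illustrates how little power the test has with only 19 pocket positions.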
Discussion
Combining evolutionary and tertiary structure information provides a powerful and efficient way to gain insight into the functionality of protein sites39. The favored strategy usually screens sites that have evolved more rapidly than expected under a neutral model and interprets them as a signature of adaptive evolution corresponding to a gain of new function(s)40,41. Here, we focused on the most slowly evolving, 3D-bounded sites to identify highly conserved sub-regions of K13 across species evolution that are likely to play a functional role. Because of the extreme conservation of the k13 gene10 and its essential function14,15,42,43, we made inter- rather than intra-species comparisons. In the context of a sustained and intense purifying selection operating in all annotated domains of K13, our analysis of sequence evolution coupled to BTB-propeller tertiary structure identified two patches of particularly slowly evolving sites.
The most striking conserved patch is found in the K13 propeller domain. It includes specific positions from the inter-blade DA and intra-blade BC loops, within a shallow pocket on the bottom face of the propeller structure. Several lines of evidence suggest that this patch contains functional sites. First, the shallow pocket is highly enriched in conserved positions, whereas solvent-exposed sites in proteins are usually less conserved than buried ones39. Second, the shallow pocket of well-characterized BTB-Kelch proteins (KEAP1, KLHL2, KLHL3, KLHL12) directly mediates the binding activities of the propeller24,35,36 and is also predicted as a conserved patch in our analyses. Finally, the conserved patches at the shallow pocket of BTB-Kelch proteins and K13 share interesting properties: they display a markedly electropositive surface potential energy and they are enriched in arginine and serine residues (PfK13 R529, R597, S576, S577 and S623, all strictly conserved in Apicomplexa; Table 3). In KEAP1, KLHL2 and KLHL3, the corresponding residues bind to acidic peptides derived from their substrates, Nrf2 (KEAP1) and WNK (KLHL2 and KLHL3)35,36 (Fig. 5c). Given the electropositive region surrounding the K13 propeller conserved patch, the K13 substrate molecule(s) may harbor an acidic binding motif. Altogether, these results indicate that the shallow pocket of K13 propeller exhibits several properties of a binding surface, and we speculate that it may be critical for the recognition of substrate molecule(s).
In P. falciparum, PI3K is a likely candidate as it is immunoprecipitated with full-length PfK13, and its ubiquitination and proteasomal degradation are altered by the pfk13 propeller C580Y mutation17. Another candidate may be the PK4 kinase which phosphorylates eIF2α, a key mediator of translation-mediated latency involved in ART-R20.
However, whether the pfk13-mediated ART-R mechanism in P. falciparum is related to the predicted binding activity of the propeller shallow pocket remains elusive. Based on the last WHO status report on ART-R5, none of the 27 validated or candidate ART-R mutations is located at the surface of the propeller shallow pocket. Several candidates are found in its underlayer (Supplementary Fig. 11) and some others have a preferential localization at positions proximal to the A and B strands10. Polymorphisms which have an uncharacterized phenotype were reported at 9 out of the 19 positions forming the shallow pocket, but found in only one or two parasite samples from population surveys (Table 3). Therefore, we speculate that amino acid changes at the positions contributing directly to the pocket surface are too damaging for the K13 native function to provide a long-term competitive advantage. In this context, rather than directly altering the binding residues, ART-R mutations may either induce long-range conformational changes propagating to the pocket surface or decrease the propeller domain stability, as reported for pathogenic mutations in KEAP1 W544C and KLHL3 S410L, respectively44,45. Furthermore, putative propeller conformational changes associated with ART-R mutations might be induced by specific cellular conditions, for example increased oxidative stress46.
BTB appeared here as the most conserved domain of K13 during Apicomplexa evolution, and therefore likely carries critical activities. It most resembles the BTB of the KCTD protein family in primary sequence, tertiary structure and short domain size. The shortened BTB of KCTDs could still mediate protein oligomerization28,47, consistent with the dimer observed in the solved PfK13 BTB-propeller crystal structures (PDB codes 4YY8 and 4ZGC, unpublished). K13 BTB harbors a predicted functional patch – located at the B2-B3 and B4-A4 loops and at the A4 helix – that overlaps with that of KCTD proteins, suggesting that they share similar functional sites. In KCTDs, these sites make BTB-BTB contacts in tetrameric or pentameric assemblies when BTB is solved as an isolated domain28. However, the PfK13 BTB-propeller structure forms a dimer and none of the highly conserved positions make BTB-BTB contacts (PDB codes 4YY8 and 4ZGC, unpublished). Finally, amino acids of the B4-A4 loop (corresponding to positions 397-399 in PfK13) are exposed at the BTB-Cullin binding interface in several solved complexes27. These discrepancies in the role of the predicted BTB patch could be due to the fact that additional domains (such as propeller) or partner proteins might constrain the folding of the BTB domain into oligomers or complexes. Altogether, data from the literature nevertheless support the view that the predicted BTB patch of K13 mediates protein-protein interactions, possibly with a Cullin protein.
Interestingly, K13 also contains a highly conserved CCC domain, located before BTB, which therefore likely carries critical activities. Consistent with this hypothesis, three pfk13 mutations conferring a moderate ART-R are located in CCC (PfK13 R239Q, E252Q and D281V)3,11,48. Coiled-coils are ubiquitous protein-protein interaction domains composed of two or more α-helices coiling together49. A CCC domain was reported in a few Kelch-containing proteins (including some KCTDs) involved in cell morphogenesis but these CCC have a different domain organization than the one of K1322. The CCC of K13 may participate in K13 oligomerization and/or serve as a binding interface with other proteins50.
In conclusion, through evolutionary and structural analyses, we identified the shallow pocket of the K13 propeller domain as a candidate surface for binding a substrate molecule. We also detected in the BTB domain of K13 a conserved patch of sites that are involved in protein-protein interactions in known BTB-Cullin and BTB-BTB complexes. Efforts should now focus on the identification of molecule(s) that bind(s) to the pocket of K13 propeller, which may help clarify the link between K13 function and ART-R.
Materials and Methods
Collection of k13 orthologous sequences from genomic databases
The amino acid sequence of PfK13 (PlasmoDB code PF3D7_1343700) was queried against the specialized eukaryotic pathogen database (EuPathDB release 33)51 and the NCBI non-redundant protein database using blastp and tblastn searches52 (BLOSUM62 matrix). A protein was considered as a likely orthologous sequence if the sequence identity was ≥ 30% and the e-value was below the 10−3 cutoff. Forty-three K13 sequences – and corresponding k13 cDNA sequences – were retrieved from distinct Apicomplexa species including 21 Plasmodium species. A detailed bioinformatics analysis was performed on each protein sequence to confirm the presence of the three annotated domains of K13 (CCC, BTB and propeller) using InterPro53.
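The two retention criteria stated above (sequence identity of at least 30% and e-value below 10−3) amount to a simple filter over BLAST hits. The sketch below assumes the hits have already been parsed into dictionaries; the field names, the function name and the toy hits are hypothetical and do not correspond to any particular BLAST output format.

```python
def is_candidate_ortholog(hit, min_identity=30.0, max_evalue=1e-3):
    """Apply the two thresholds from the text to one parsed BLAST hit."""
    return hit["identity"] >= min_identity and hit["evalue"] < max_evalue

# Toy hits (invented identifiers and values)
toy_hits = [
    {"subject": "seq_1", "identity": 64.2, "evalue": 1e-120},
    {"subject": "seq_2", "identity": 28.9, "evalue": 1e-15},   # identity too low
    {"subject": "seq_3", "identity": 35.0, "evalue": 0.5},     # e-value too high
    {"subject": "seq_4", "identity": 30.0, "evalue": 9e-4},
]
candidates = [h["subject"] for h in toy_hits if is_candidate_ortholog(h)]
```

Hits passing this filter are only candidates; the text describes a further check of the domain composition of each retained protein.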
k13 sequence alignment
Considering the greater divergence of coding nucleotide sequences as compared to protein sequences due to the genetic code redundancy, a K13 protein sequence alignment was first generated using Mafft version 754 (E-INS-I strategy with BLOSUM62 scoring matrix, gap opening penalty 2.0 and offset 0.1). The output alignment was visually inspected and manually edited with BioEdit v7.2.555. The positions containing gaps in at least 30% of all sequences were removed, as suggested by the authors of PAML31. Then, the k13 nucleotide sequence alignment was generated with PAL2NAL56 using the cleaned K13 amino acid alignment as template.
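The gap-based column filter described above (removing positions with gaps in at least 30% of sequences) can be sketched as follows. This is a minimal illustration with a hypothetical function name and an invented toy alignment; it covers only the automatic gap criterion, not the manual editing step mentioned in the text.

```python
def drop_gappy_columns(alignment, max_gap_percent=30, gap="-"):
    """Remove alignment columns in which at least `max_gap_percent` percent
    of the sequences carry a gap. Returns (filtered alignment, kept columns)."""
    names = list(alignment)
    seqs = [alignment[name] for name in names]
    n_seq = len(seqs)
    keep = []
    for col in range(len(seqs[0])):
        gaps = sum(seq[col] == gap for seq in seqs)
        if gaps * 100 < max_gap_percent * n_seq:   # integer comparison, no rounding
            keep.append(col)
    filtered = {name: "".join(alignment[name][col] for col in keep)
                for name in names}
    return filtered, keep

# Toy alignment (invented): column 2 has gaps in 2 of 4 sequences
toy_alignment = {"sp1": "AC-G", "sp2": "A--G", "sp3": "ACTG", "sp4": "ACTG"}
cleaned, kept_columns = drop_gappy_columns(toy_alignment)
```

Returning the kept column indices makes it possible to apply the same mask to the corresponding codon alignment, three nucleotides per retained amino acid column.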
Phylogenetic analysis of k13
The phylogenetic relationships of k13 nucleotide sequences were inferred using the maximum-likelihood method implemented in PhyML v3.057, after determining the best-fitting nucleotide substitution model using the Smart Model Selection (SMS) package58. A general time-reversible model with optimized equilibrium frequencies, gamma distributed among-site rate variation and estimated proportion of invariable sites (GTR + G + I) was used, as selected by the Akaike Information Criterion. The nearest neighbor interchange approach was chosen for tree improvement, and branch supports were estimated using the approximate likelihood ratio aLRT SH-like method59. The k13 phylogeny was rooted using Cryptosporidia species as outgroup.
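Model selection by the Akaike Information Criterion, as mentioned above, compares AIC = 2k − 2lnL across candidate models and retains the lowest value. The sketch below only illustrates that comparison; it is not how SMS is implemented, and the log-likelihoods and parameter counts are invented for the example.

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion: lower values indicate a better trade-off
    between fit and number of free parameters."""
    return 2.0 * n_params - 2.0 * log_likelihood

def best_model_by_aic(fits):
    """`fits` maps model name -> (log-likelihood, number of free parameters)."""
    scores = {name: aic(lnl, k) for name, (lnl, k) in fits.items()}
    return min(scores, key=scores.get), scores

# Hypothetical fits (values invented for illustration only)
toy_fits = {
    "JC69":    (-25500.0, 0),
    "HKY85+G": (-24800.0, 5),
    "GTR+G+I": (-24650.0, 10),
}
best, scores = best_model_by_aic(toy_fits)
```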
Molecular evolutionary analysis of k13
To investigate the evolutionary regime that has shaped the k13 protein-coding DNA sequence during species evolution, we analyzed the non-synonymous (dN) to synonymous (dS) substitution rate ratio ω (= dN/dS), estimated by maximum-likelihood using the codeml tool from PAML v.4.831,61. ω provides a sensitive measure of selective pressure at the amino acid level, estimated within a statistical framework that takes the phylogenetic tree topology into account. Typically, ω < 1 indicates purifying selection, while ω = 1 and ω > 1 indicate neutral evolution and positive selection, respectively.
The heterogeneity of ω among lineages of the k13 phylogenetic tree (branch models) was tested by comparing the free-ratio (FR) model, which assumes as many ω parameters as the number of branches in the tree, to the one-ratio (M0) model which supposes only one ω value for all branches62,63.
The variation of ω among codon sites was then evaluated using codon models M1a, M2a, M3, M7 and M862,63. M1a allows codon sites to fall into two site classes, either with ω < 1 (purifying selection) or ω = 1 (neutral evolution), whereas model M2a extends model M1a with a further site class with ω > 1 (positive selection). Model M3 includes a discrete distribution of independent ω with k classes of sites (k = [3, 4, 5] in this study), with ω values and corresponding proportions estimated from the dataset. Model M7 assumes a β-distribution of ten ω ratios limited to the interval [0, 1] with two shape parameters p and q, whereas model M8 adds an additional site class with ω possibly > 1, as M2a does. The heterogeneity of ω across codon sites was tested by comparing models M0 and M3, while the paired comparisons M1a-M2a and M7-M8 were used to detect positive selection62.
Model comparisons were made using likelihood ratio tests (LRTs)64. For each of the LRTs, twice the log-likelihood difference between alternative and null models (2ΔlnL) was compared to critical values from a chi-squared distribution with degrees of freedom equal to the difference in the number of estimated parameters between both models40. Candidate sites for positive selection were pinpointed using the Bayes empirical Bayes (BEB) inference which calculates the posterior probability that each codon site falls into a site class affected by positive selection (in models M2a and M8), as described by Yang and colleagues65. For model M3, in which no BEB approach is implemented yet, the naive empirical Bayes (NEB) approach was used to identify those sites evolving under positive selection.
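As an illustration of the LRT arithmetic described above, here is a minimal, standard-library-only Python sketch. The function names and the log-likelihood values are hypothetical and are not taken from the codeml output of this study; the chi-squared upper-tail probability is computed with the standard recurrence for the regularized incomplete gamma function, valid for integer degrees of freedom.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-squared variable, integer df >= 1."""
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    y = x / 2.0
    # Starting values: Q(1, y) = exp(-y) for even df, Q(1/2, y) = erfc(sqrt(y)) for odd df.
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))
    # Recurrence: Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1).
    while a < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q


def likelihood_ratio_test(lnl_null, lnl_alt, n_params_null, n_params_alt):
    """Return (2*delta_lnL, degrees of freedom, p-value) for two nested models."""
    stat = 2.0 * (lnl_alt - lnl_null)
    df = n_params_alt - n_params_null
    return stat, df, chi2_sf(max(stat, 0.0), df)


# Hypothetical log-likelihoods for a null model and an alternative model
# that has two additional free parameters (illustrative numbers only).
stat, df, p_value = likelihood_ratio_test(-1000.0, -995.0, 2, 4)
```

With these made-up values, 2ΔlnL = 10 on 2 degrees of freedom, and the null model would be rejected at the 0.05 level.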
Three codon frequency models were used and compared for all models: F1×4 and F3×4, which derive codon frequencies from nucleotide frequencies that are either shared across the three codon positions or estimated separately for each codon position, respectively, and the parameter-rich model F61, which estimates the frequency of each codon separately61,66. Since the three codon frequency models yielded similar results (Supplementary Fig. 13), we present only those obtained with the most widely used F3×4 model. The analyses were run multiple times with different ω starting values to check the consistency of the results.
For PAML model M3 with k site classes of ω ratios, the posterior mean of ω value at each codon site was calculated as the average of the ω ratios across the k ω site classes weighted by their posterior probabilities31.
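The weighted average described above can be written as a short Python sketch. The function name and the example ω classes and posterior probabilities are hypothetical; real values would come from the codeml output for model M3.

```python
def posterior_mean_omega(class_omegas, posterior_probs, tol=1e-6):
    """Posterior mean of omega at one codon site under a k-class model.

    class_omegas    -- the k omega ratios estimated for the site classes
    posterior_probs -- posterior probability that the site belongs to each class
    """
    if len(class_omegas) != len(posterior_probs):
        raise ValueError("one posterior probability is needed per site class")
    if abs(sum(posterior_probs) - 1.0) > tol:
        raise ValueError("posterior probabilities must sum to 1")
    # Average of the class omegas, weighted by the posterior probabilities.
    return sum(w * p for w, p in zip(class_omegas, posterior_probs))


# Illustrative values for k = 3 site classes (not results from this study).
omegas = [0.01, 0.10, 0.50]
site_posteriors = [0.70, 0.25, 0.05]
site_mean_omega = posterior_mean_omega(omegas, site_posteriors)
```

Here the site mostly belongs to the most constrained class, so its posterior mean (about 0.057) stays close to the lowest ω ratio.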
In addition to k13, four other Kelch protein-coding sequences were considered to compare their ω with those estimated for the whole Plasmodium proteome. We used the ω values previously estimated by Jeffares and colleagues32 with PAML under the one-ratio model for each of the 3,256 orthologous protein-coding genes among six Plasmodium species: P. falciparum, P. berghei, P. chabaudi, P. vivax, P. yoelii and P. knowlesi. A full description of the procedure is presented in the original paper32.
Inferring site-specific substitution rates considering their spatial correlation in the K13 BTB-propeller tertiary structure
Most methods, including PAML, assume that site-specific substitution rates are independently distributed across sites67. However, it is widely acknowledged that amino acids located close to each other in protein tertiary structures are more likely to carry out similar functions, suggesting a site interdependence in amino acid sequence evolution attributed to tertiary structure67,68. Consequently, the substitution rate at the protein level (named λ in this study) was inferred using the FuncPatch server33. FuncPatch requires an amino acid sequence alignment, a phylogenetic tree and a protein tertiary structure to estimate the conservation level during species evolution and the characteristic scale (in Å) of spatially correlated site-specific substitution rates λ. We used the X-ray structure of PfK13 BTB-propeller at 1.81 Å resolution as the reference structure (PDB code 4YY8, unpublished); it does not contain the conserved CCC domain. Beforehand, a Ramachandran analysis was performed to validate the quality of the structure using MolProbity69: 96.9% and 3.1% of the amino acids were in favored and allowed regions, respectively, and there were no outliers. FuncPatch only accepts monomeric proteins as input, whereas the BTB-propeller of PfK13 dimerizes in the crystal structure. To take into account the dimeric organization of PfK13, its tertiary structure was edited using customized Python scripts (Python v2.7.13) in order to merge the two monomers (chains A and B), and the K13 sequence was duplicated in the K13 protein sequence alignment. The analysis was also repeated using each monomeric BTB-propeller tertiary structure on its own, and using a disulfide-bonded version of PfK13 BTB-propeller (PDB code 4ZGC, unpublished). All these control analyses yielded similar results (data not shown).
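The customized scripts used to merge the two monomers are not reproduced in the text. The sketch below shows one possible way of doing this in Python 3, under stated assumptions: chain B is relabeled as chain A and its residue numbers are shifted by a fixed offset so that they do not clash with those of chain A. The function name, the offset of 1000 and the toy records are all hypothetical; only the fixed-column layout of PDB ATOM/HETATM records (chain identifier in column 22, residue number in columns 23-26) is standard.

```python
def merge_chains(pdb_lines, keep="A", merge="B", offset=1000):
    """Relabel chain `merge` as chain `keep`, shifting its residue numbers.

    Assumption: a plain renumbering is enough for the downstream tool; other
    chains and intermediate TER records are dropped.
    """
    merged = []
    for line in pdb_lines:
        record = line[:6].strip()
        if record in ("ATOM", "HETATM") and len(line) >= 26:
            chain = line[21]                      # column 22: chain identifier
            if chain == merge:
                new_number = int(line[22:26]) + offset   # columns 23-26: resSeq
                if new_number > 9999:
                    raise ValueError("residue number no longer fits in 4 columns")
                line = line[:21] + keep + "%4d" % new_number + line[26:]
            elif chain != keep:
                continue                          # ignore any other chain
            merged.append(line)
        elif record == "TER":
            continue                              # the merged model is one chain
        else:
            merged.append(line)
    return merged


def _toy_atom(serial, chain, resseq):
    """Build a minimal, correctly aligned CA record (illustrative coordinates)."""
    return "ATOM  %5d  CA  ALA %s%4d    %8.3f%8.3f%8.3f" % (
        serial, chain, resseq, 0.0, 0.0, 0.0)


toy = [_toy_atom(1, "A", 350), "TER", _toy_atom(2, "B", 350), "END"]
merged_toy = merge_chains(toy)
```

With this convention, each position of the duplicated K13 sequence in the alignment has to be mapped to the corresponding renumbered residue of the merged structure.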
The spatial correlation of the site-specific substitution rates λ in the K13 tertiary structure was tested using a Bayesian model comparison, where a null model (model 0), in which no spatial correlation of site-specific substitution rates λ is present, is compared to the alternative model (model 1). As suggested by the FuncPatch authors, the spatial correlation was considered significant if the estimated log Bayes factor (model 1 versus model 0) was larger than 8 in the dataset (conservative cutoff)33.
Delineation of K13 propeller blades and secondary structures
The propeller domain of PfK13 is composed of six blades having slightly different amino acid lengths. To get an accurate blade alignment at the primary amino acid sequence level, we first sought to align the six blade structures. The PDB propeller structure was obtained from the PfK13 BTB-propeller structure (PDB code 4YY8, chain A, unpublished) and was then divided into six parts, each one containing the atomic coordinates of one blade. The six blade structures were then aligned by minimizing the RMSD of atomic positions using the align function in PyMOL Molecular Graphics System70 so as to identify the amino acids from the six blades that are located at exactly the same blade position. This structure alignment was then used to align the six blades at the primary amino acid sequence level. The delineation of the strands and loops was obtained directly from the PDB file (PDB code 4YY8, chain A, unpublished).
Definition of ART-R mutations
We used the last status report on ART-R provided by the WHO to classify the positions of the PfK13 propeller domain as associated or not with an ART-R mutation5.
Evolutionary analysis of the BTB and propeller domains in other BTB-and Kelch-containing proteins
To characterize the BTB domain of K13, we arbitrarily retrieved some members belonging to the main BTB-containing protein families (BTB-ZF, BTB-Kelch, RhoBTB, BTB-NPH3, MATH-BTB, KCTD, KCNA and SKP1 and Elongin C proteins; full list provided in Supplementary Table 6). A multiple protein alignment was generated using Mafft version 754 (default parameters) and was then manually edited with BioEdit v7.2.555 to retain only the region referring to the BTB core fold. The phylogenetic relationships were inferred with the PhyML57 procedure using the best-fitting protein substitution model as determined by the SMS package58.
For further comparisons with the K13 BTB and propeller domains, site-specific substitution rates λ were inferred with FuncPatch for the BTB and propeller domains of several mammalian KCTD and BTB-Kelch proteins, respectively. The proteins were selected on the basis of their sequence homology with K13, the availability of a solved 3D structure, and their known implication in a Cullin-RING E3 ligase complex, as suspected for K13. In addition, for BTB-Kelch proteins, only those with a well-characterized ligand-binding function and a six-bladed propeller structure similar to that of K13 were considered. After a careful review of the literature, we selected two KCTD proteins: SHKBP1 (UniProt code Q8TBC3), which regulates the epidermal growth factor receptor (EGFR) signaling pathway71; and KCTD17 (Q8N5Z5), which mediates the ubiquitination and proteasomal degradation of the ciliogenesis down-regulation TCHP protein72. Among BTB-Kelch proteins, we focused on: KEAP1 (Q14145), which interacts with its client protein Nrf2 for the induction of cytoprotective responses to oxidative stress37; KLHL2 (O95198) and KLHL3 (Q9UH77), which both participate in the ubiquitination and degradation of WNK substrates regulating blood pressure35; and KLHL12 (Q53G59), which negatively regulates the WNT-beta-catenin pathway through the degradation of Dishevelled proteins38.
First, each Homo sapiens KCTD and BTB-Kelch sequence was submitted in turn as the query sequence for a blastp search52 (BLOSUM62 scoring matrix, max target sequences fixed at 1,000) against the NCBI non-redundant protein database to retrieve orthologous sequences from a large number of species. The output lists were then filtered so as to keep only sequences that had an unambiguous description (i.e. a description that includes the name of the queried KCTD or BTB-Kelch protein), aligned with ≥ 80% sequence coverage and had ≥ 60% sequence identity with the query sequence. The multiple protein alignment of each set of orthologous sequences was then generated using Mafft version 754 (E-INS-I strategy with BLOSUM62 scoring matrix, gap opening penalty 2.0 and offset 0.1). A second filtering step was performed to remove incomplete or mis-annotated sequences, i.e. the sequences that did not contain all the annotated domains (using the domain annotation automatically generated by the UniProt Knowledgebase) and/or that included a gapped position located in one of the annotated domains. The final multiple protein alignments included: i) 124 sequences × 103 aligned positions for SHKBP1; ii) 139 sequences × 102 aligned positions for KCTD17; iii) 135 sequences × 285 aligned positions for KEAP1; iv) 162 sequences × 286 aligned positions for KLHL2; v) 158 sequences × 286 aligned positions for KLHL3; and vi) 129 sequences × 289 aligned positions for KLHL12. The full list of orthologous sequences used for each mammalian KCTD and BTB-Kelch protein is provided in Supplementary Table 7. Then, the phylogenetic relationships were inferred using PhyML57 after determining the best-fitting protein substitution model with the SMS package58.
The 3D structures of KCTD BTB and BTB-Kelch propeller domains were retrieved from the PDBsum database under the following accession numbers: 4CRH for SHKBP1 (resolution: 1.72 Å)28, 5A6R for KCTD17 (resolution: 2.85 Å)28, 2FLU for KEAP1 (resolution: 1.50 Å, in complex with a Nrf2 peptide)73, 4CHB for KLHL2 (resolution: 1.56 Å, in complex with a WNK4 peptide)35, 4CH9 for KLHL3 (resolution: 1.84 Å, in complex with a WNK4 peptide)35, and 2VPJ for KLHL12 (resolution: 1.85 Å)24. Beforehand, the quality of each structure was validated using MolProbity69: none of the structures had amino acids identified as outliers, and approximately 98% of the amino acids of each structure were in favored regions.
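The first filtering step applied to the blastp output (description, coverage and identity criteria) can be sketched in Python as follows. The record layout (dictionaries with `description`, `coverage` and `identity` keys), the whole-word matching rule and the example hits are assumptions made for illustration; they do not reproduce the actual BLAST output format or the hits retained in this study.

```python
import re


def filter_hits(hits, protein_name, min_coverage=80.0, min_identity=60.0):
    """Keep hits whose description names the queried protein and that meet
    the coverage (>= 80%) and identity (>= 60%) thresholds."""
    # Whole-word match, so that e.g. "KLHL2" does not also accept "KLHL20".
    pattern = re.compile(r"\b%s\b" % re.escape(protein_name), re.IGNORECASE)
    return [
        hit for hit in hits
        if pattern.search(hit["description"])
        and hit["coverage"] >= min_coverage
        and hit["identity"] >= min_identity
    ]


# Made-up hits for a hypothetical KLHL2 query.
example_hits = [
    {"description": "kelch-like protein 2 KLHL2 [species 1]", "coverage": 99.0, "identity": 92.5},
    {"description": "KLHL20 protein [species 2]", "coverage": 95.0, "identity": 70.0},
    {"description": "KLHL2 isoform X1 [species 3]", "coverage": 75.0, "identity": 88.0},
    {"description": "klhl2 [species 4]", "coverage": 85.0, "identity": 58.0},
]
kept_hits = filter_hits(example_hits, "KLHL2")
```

Only the first record passes all three criteria in this toy example; the second is a different protein, the third has insufficient coverage and the fourth insufficient identity.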
Evaluation of structural properties
The electrostatic potential energy of each propeller structure was calculated using the Adaptive Poisson-Boltzmann Solver (APBS) method74. Beforehand, the required pqr input files were prepared using PDB2PQR v.2.1.175. The missing charges were added using the Add Charge function implemented in UCSF Chimera76. A grid-based method was used to solve the linearized Poisson-Boltzmann equation at 298 K, with solute (protein) and solvent dielectric constant values fixed at 2 and 78.5, respectively. The contact surface selection was mapped using a radius of 1.4 Å on a scale of −8 kT/e to +8 kT/e.
The relative solvent accessibility (RSA) was estimated as the accessible surface area of amino acids using DSSP77, then normalized with the maximal accessible surface area of each amino acid78. The side-chain weighted contact number (WCNsc) of each amino acid was calculated using a customized Python script provided by Sydykova and colleagues79. All structural properties were assessed using the aforementioned PDB files.
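Both quantities can be illustrated with a minimal Python sketch. It does not reproduce DSSP or the script of Sydykova and colleagues: the RSA function only performs the normalization step, and the contact-number function uses the usual inverse-squared-distance definition of a weighted contact number, applied to one representative point per residue (assumed here to be a side-chain centre). Function names, coordinates and surface-area values are hypothetical.

```python
def relative_solvent_accessibility(asa, max_asa):
    """Normalize an accessible surface area by the maximal value for that residue type."""
    if max_asa <= 0:
        raise ValueError("max_asa must be positive")
    return asa / max_asa


def weighted_contact_numbers(points):
    """Weighted contact number of each residue: sum over all other residues
    of 1 / r**2, with r the distance between representative points."""
    wcn = []
    for i, p in enumerate(points):
        total = 0.0
        for j, q in enumerate(points):
            if i == j:
                continue
            squared_distance = sum((a - b) ** 2 for a, b in zip(p, q))
            total += 1.0 / squared_distance
        wcn.append(total)
    return wcn


# Illustrative inputs only: an ASA of 64.5 A^2 against a maximum of 129.0 A^2,
# and three points standing in for side-chain centres (coordinates in A).
rsa_example = relative_solvent_accessibility(64.5, 129.0)
wcn_example = weighted_contact_numbers([(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 2.0, 0.0)])
```

A buried residue tends to have a low RSA and a high weighted contact number, which is why the two measures are often examined together.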
Structure visualization
All molecular drawings were generated using the UCSF Chimera software76.
Statistical analyses
Substitution rates among partitions were compared using non-parametric Mann-Whitney U or Kruskal-Wallis H tests. When focusing on the propeller shallow pocket, contingency tables were produced and statistically tested with the chi-squared test. We used p < 0.05 as the cutoff for significance in all statistical tests.
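For a 2 × 2 contingency table, the chi-squared test can be sketched in Python as follows. This is an illustration only: the counts are invented, the function name is hypothetical, and no continuity correction is applied (the text does not state whether one was used).

```python
import math


def chi_squared_2x2(table):
    """Pearson chi-squared statistic and p-value (1 degree of freedom)
    for a 2 x 2 table of counts, without continuity correction."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column must contain at least one count")
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    # For 1 degree of freedom, P(X >= stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Invented counts: rows = inside / outside a pocket, columns = conserved / other sites.
chi2_stat, chi2_p = chi_squared_2x2([[10, 20], [20, 10]])
significant = chi2_p < 0.05
```

With these invented counts the statistic is 20/3 and the association would be declared significant at the 0.05 cutoff used in this study.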
Author contributions
R.C., A.S. and J.C. designed the study. D.J. provided data. R.C. performed most of the analyses. R.C., A.S., and J.C. contributed to interpretations. All authors have participated in paper writing and approved it prior to submission.
Competing interests
The authors declare no competing financial interests.
Acknowledgements
We thank YF. Huang and GB. Golding for help with FuncPatch analysis. We thank O. Mercereau-Puijalon, M. Miteva, and R. Duval for helpful discussions. We thank A. Sissoko and F. Palstra for proofreading the manuscript. R. Coppée is supported by PhD grant funding from the École Doctorale MTCI.