Abstract
K13 is an essential Plasmodium falciparum protein that plays a key role in malaria resistance to artemisinins. Although K13 resembles BTB- and Kelch/propeller-containing proteins involved in ubiquitin ligase complexes, its functional sites remain uncharacterized. Using bioinformatics analyses combining evolutionary and protein structural information, we find evidence of strong purifying selection acting on the Apicomplexa k13 gene. An electropositive amino acid “patch” in the propeller domain bears a dense concentration of extraordinarily conserved positions located at a shallow pocket, suggesting a role in mediating protein-protein interactions. When applied to experimentally characterized BTB-Kelch proteins, our strategy successfully identifies the validated substrate-binding residues within their own propeller shallow pocket. Another patch of slowly evolving sites is identified in the K13 BTB domain, which partially overlaps the surface that binds to Cullin proteins in BTB-Cullin complexes. We provide candidate binding sites in K13 propeller and BTB domains for functional follow-up studies.
Introduction
Current efforts to control malaria are threatened by the spread in Southeast Asia (SEA) of Plasmodium falciparum parasites that are resistant to artemisinin derivatives (ARTs)1. Treatment failures are now reported in some geographic areas of SEA for the current front-line ART-based combination therapies1–3. In Africa, ART resistance (ART-R) is not yet established4,5, although some P. falciparum parasites from Uganda and Equatorial Guinea exhibiting a high survival rate have been described6,7.
In parasites from SEA, ART-R is primarily conferred by single non-synonymous mutations in the P. falciparum k13 (pfk13) gene8,9. Multiple pfk13 ART-R mutations emerged concomitantly in the early 2000s, until a specific, multidrug-resistant lineage carrying the C580Y mutation became the most common, especially in the East Thailand-Cambodia-Lao PDR-Vietnam region4,8,10,11. The ART-R phenotype is defined as parasites exhibiting in vivo a delayed clearance time following an ART-based treatment1 and in vitro an increased survival rate following a brief exposure to a high dose of ART12.
The pfk13 gene encodes a 726 amino acid protein (PfK13) which is essential at least during the intraerythrocytic parasite blood stage13,14. Both the gene ontology annotation of pfk13 and the study of ART-resistant parasites carrying a pfk13 mutation suggest that this protein has some regulatory functions at the protein level15,16. At the cell level, pfk13 mutant parasites decelerate their development during the early intraerythrocytic stage12,17. At the molecular level, they exhibit an increased expression of unfolded protein response pathways18, lower levels of ubiquitinated proteins16,17, and phosphorylation of the parasite eukaryotic initiation factor-2α (eIF2α) which correlates with ART-induced latency19. Some reported interactors of PfK13 partially clarify its function. For example, PfK13 was immunoprecipitated with the phosphatidylinositol 3-kinase (PI3K)17. Ubiquitination of PI3K is also decreased in pfk13 C580Y mutants, resulting in wide phosphatidylinositol 3-phosphate (PI3P)-related cellular alterations16,20 whose roles in ART-R are still to be characterized.
PfK13 is reported to fall within the BTB-Kelch structural subgroup of Kelch-repeat proteins8,21,22 because it possesses a BTB domain (Broad-complex, tramtrack and bric-à-brac; also known as BTB/POZ; amino acids 350-437) and a C-terminal propeller domain (also known as Kelch domain; amino acids 443-726) composed of repeated Kelch motifs (PDB code 4YY8, unpublished). Of note, PfK13 exhibits specific features when compared to typical BTB-Kelch proteins, such as a poorly conserved Apicomplexa-specific N-terminal region, a coiled-coil-containing (CCC) domain (amino acids 212-341) located upstream of BTB, and the absence of a BACK domain (for BTB And C-terminal Kelch), which is often found between BTB and Kelch domains22,23. Both BTB and propeller domains are known to carry important functions. Most proteins harboring a BTB domain, such as members of the BTB-Kelch, BTB-Zinc Finger (ZF) and potassium (K+) channel tetramerization (KCTD) families, are found in multi-subunit Cullin-RING E3 ligase complexes in which a substrate protein is ubiquitinated and then degraded by the proteasome23–26. In those complexes, BTB mediates varying oligomerization architectures and also contributes to Cullin recruitment. The propeller domain serves as the substrate receptor in BTB-Kelch and other proteins22,23, and usually comprises four to six repeated β-stranded Kelch motifs – also named blades – arranged in a circular symmetry (the six blades in PfK13 are named I to VI). In some well-characterized Kelch-containing proteins, the loops protruding at the bottom face of the propeller form a shallow pocket often involved in the binding of a substrate protein further targeted for degradation23. For example, the propeller shallow pocket of BTB-Kelch KEAP1 and KLHL3 directly binds to the transcription factor Nrf2 and the kinase WNK, respectively, and controls their cellular levels27,28. Consequently, PfK13 may exhibit similar functions mediated by its BTB and propeller domains.
However, the PfK13 binding regions and functionally important sites remain poorly characterized. It is noteworthy that nearly all pfk13 mutations associated with ART-R – including the C580Y mutation – cluster in the propeller domain8,11, suggesting a major functional role of this domain.
Here, we hypothesize that functionally important sites of K13 have evolved under stronger purifying selection and could be identified by the analysis of k13 molecular evolution across Apicomplexa species. To test this, we inferred substitution rates for each amino acid site of the K13 sequence, taking into account the k13 phylogeny and – whenever possible – the spatial correlation of site-specific substitution rates in the protein tertiary structure. We identified a major functional patch of slowly evolving sites located at the bottom face of K13 propeller that form part of the shallow pocket. To show the relevance of our approach, we applied it to the propeller domain of four well-characterized BTB-Kelch proteins (KEAP1, KLHL2, KLHL3 and KLHL12) and successfully identified the functionally and structurally validated substrate-binding residues located in the pocket of these propeller domains. Another patch of slowly evolving sites was also identified in the BTB domain of K13, which partially overlaps with the surface that binds to Cullin proteins in known BTB-Cullin complexes25. Altogether, these findings support a crucial role of K13 as a core element of a ubiquitin ligase complex, and suggest that its propeller domain likely mediates protein-protein interactions through its shallow pocket.
Results
K13 sequence sampling, multiple alignment and phylogenetic reconstruction
Forty-three complete amino acid sequences from distinct Apicomplexa species were unambiguously identified as orthologous to PfK13 in sequence databases, encompassing 21 Plasmodium and 22 other Apicomplexa K13 sequences (Cryptosporidia, n = 7; Piroplasmida, n = 7; and Coccidia, n = 8). The length of K13 protein sequences ranged from 506 (Eimeria brunetti) to 820 (Hammondia hammondi) amino acids (Supplementary Table 1). By visual inspection, the three annotated domains of K13 (CCC, BTB and propeller) were conserved, whereas the N-terminal region preceding the CCC domain appeared much more variable among sequences, even being absent in some of them (Supplementary Fig. 1). Since K13 sequences aligned poorly in that region, the first 234 amino acid positions of the K13 multiple alignment were removed, along with other positions showing a high divergence level and/or a gap enrichment among sequences (32 amino acid positions; Supplementary Fig. 1). The final K13 multiple sequence alignment contained 514 amino acid positions which covered the whole CCC, BTB and propeller domains. The average pairwise sequence identity in that cleaned K13 sequence alignment was 64.6%, ranging from 48.6% for the Babesia bigemina–Cryptosporidium ubiquitum comparison to 99.2% for the P. chabaudi spp. pair.
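The pairwise identity figures above can be illustrated with a minimal sketch. It assumes that identity is computed over alignment columns in which neither sequence carries a gap; the source does not state the exact definition used, and the function name and toy sequences below are hypothetical.

```python
from itertools import combinations


def pairwise_identity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Percent identity over columns where neither aligned sequence has a gap.

    Assumption: gap-containing columns are ignored; other definitions exist.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for res_a, res_b in zip(seq_a, seq_b):
        if res_a == gap or res_b == gap:
            continue  # skip columns with a gap in either sequence
        compared += 1
        if res_a == res_b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def identity_table(alignment: dict[str, str]) -> dict[tuple[str, str], float]:
    """Percent identity for every unordered pair of sequences in the alignment."""
    return {
        (name_a, name_b): pairwise_identity(alignment[name_a], alignment[name_b])
        for name_a, name_b in combinations(alignment, 2)
    }


# Toy alignment (hypothetical sequences, not real K13 data).
toy_alignment = {"sp1": "MKLVA-GT", "sp2": "MKLVAQGT", "sp3": "MRLIA-GS"}
table = identity_table(toy_alignment)
mean_identity = sum(table.values()) / len(table)
```

The mean, minimum and maximum of such a table correspond to the average and range of pairwise identities reported for the cleaned alignment.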
The maximum-likelihood phylogenetic tree built from the curated alignment of the corresponding Apicomplexa k13 cDNA sequences revealed four monophyletic groups: Cryptosporidia, Plasmodium, Piroplasmida and Coccidia, all supported by high bootstrap values (≥ 98%; Supplementary Fig. 2).
The group of Hematozoa k13 sequences appeared as paraphyletic (bootstrap value = 100%), with Piroplasmida unexpectedly clustering with Coccidia (Supplementary Fig. 2). The phylogenetic relationships of Plasmodium k13 sequences were largely consistent with the acknowledged phylogeny of Plasmodium species, except for bird-infecting Plasmodium k13 sequences (P. gallinaceum and P. relictum), which appeared related to human-infecting P. ovale spp. sequences, although this grouping was poorly supported (bootstrap value = 47%; Supplementary Fig. 2).
The k13 sequence has evolved under strong purifying selection
To evaluate the selective pressure acting on k13, we used codon substitution models to estimate the ratio of non-synonymous to synonymous substitution rates, ω = dN/dS, across codon sites of the k13 sequence (site models) and branches of the k13 phylogeny (branch models). A series of nested likelihood ratio tests (LRTs) using different sets of site and branch models were carried out using the codeml tool from the PAML package29. When applied to the k13 codon alignment, LRTs of site and branch models indicated that ω varies both among codon sites of the k13 sequence (M0:M3 model comparison, p = 3.3 × 10−225) and among branches of the k13 phylogeny (M0:FR, p = 1.9 × 10−53; Table 1 and Supplementary Table 2). This suggests that k13 has evolved under a variable selective regime both across codon sites and lineages. No evidence of positive selection was found in any of the tree branches (Supplementary Fig. 3). Similarly, site models incorporating positive selection (M2a and M8) provided no better fit to the data than those including only purifying selection and neutral evolution (M1a and M7: 2ΔlnL = 0 in both cases; Table 1), thus supporting an absence of detectable adaptive selection events during Apicomplexa evolution at any k13 codon site. Altogether, the data indicate that much of the K13 protein, except the N-terminal region, has been strongly conserved over evolutionary time.
When considering the PAML ωM0 values of 3,256 protein-coding genes previously estimated by Jeffares and colleagues using six Plasmodium species30, k13 ranked among the 5% most conserved protein-coding genes of the Plasmodium proteome (rank: 140/3,256; Supplementary Fig. 4a). Since a significant correlation between protein length and ωM0 was evidenced in the whole dataset (Spearman’s rank correlation: p = 9.2 × 10−82, r = 0.33; Supplementary Fig. 4b), we repeated the analysis by considering only those protein sequences whose length was included in an interval of ± 100 amino acids centered on the PfK13 protein length (Spearman’s rank correlation: p = 0.83, r = 0.01). Again, k13 ranked among the most conserved protein-coding genes of the Plasmodium proteome (sized-rank: 6/393), whereas four other five- or six-bladed Kelch protein-coding sequences showed much less intense levels of conservation than K13 (Fig. 1).
Variable levels of amino acid conservation between the annotated domains of K13
We next compared the conservation level between the annotated domains of K13 (CCC, BTB and propeller) using ω estimates obtained under the best-fitting PAML model M3 (Supplementary Table 2). First, we noted that all three domains have evolved under strong purifying selection, with most sites being highly conserved during evolution (Fig. 2). BTB was however found to evolve under more intense purifying selection than either CCC (p = 1.6 × 10−4, Mann-Whitney U test) or propeller (p = 1.0 × 10−3, Mann-Whitney U test), but no difference in ω estimates was detected between CCC and propeller (p = 0.75, Mann-Whitney U test; Fig. 2 and Supplementary Table 3). To confirm these results, we inferred the site-specific substitution rate at the protein level using the FuncPatch server, which takes into account the spatial correlation of site-specific substitution rates in the protein tertiary structure31 (hereafter called λ substitution rate). λ could not be inferred for the CCC domain (because of the lack of a resolved 3D structure), but the analysis confirmed that BTB was more conserved than propeller over evolutionary time (p = 5.4 × 10−5, Mann-Whitney U test; Supplementary Table 4).
We then performed a more extensive study of the BTB and propeller domains of K13 because of their likely role in mediating K13 functions and the availability of their tertiary structures. To detect patches of slowly evolving amino acid sites in the BTB-propeller structure, we focused on the site-specific substitution rate λ at the amino acid level because it has been shown to provide a more reliable estimation of the conservation level at amino acid sites compared to standard substitution estimates, especially in the case of highly conserved proteins31,32.
The BTB domain of K13 resembles that of KCTD proteins and exhibits a predicted functional patch
Although K13 was reported to fall within the BTB-Kelch structural subgroup of proteins8, the BTB domain of K13 exhibits atypical features compared to that of BTB-Kelch proteins. First, the K13 BTB fold appeared shortened, lacking the A6 helix and the N- and C-terminal extensions25, similar to Elongin C (Fig. 3a). Second, the primary sequence of K13 BTB grouped with those of the KCTD protein family rather than with those of other BTB-containing protein families (Fig. 3b). Finally, K13 BTB exhibited a higher similarity in tertiary structure with the BTB domain of KCTD17 than with those of Elongin C and KEAP1: the root-mean-square deviations (RMSDs) of atomic positions for BTB domains of K13-KCTD17, K13-Elongin C and K13-KEAP1 were 1.13 Ångström (Å), 2.33 Å and 2.17 Å, respectively.
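For readers unfamiliar with the RMSD metric quoted above, the following is a minimal sketch of the formula. It assumes the two structures have already been superposed and that atoms are paired one-to-one; the superposition step itself, the function name and the toy coordinates are not taken from the source.

```python
import math

Coordinate = tuple[float, float, float]


def rmsd(coords_a: list[Coordinate], coords_b: list[Coordinate]) -> float:
    """Root-mean-square deviation between paired atomic positions, in the input units.

    Assumption: both coordinate sets are already optimally superposed and
    listed in the same atom order; no fitting is performed here.
    """
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared_sum = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared_sum += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared_sum / len(coords_a))


# Toy example: every atom is displaced by 1 Å along z, so the RMSD is 1 Å.
structure_a = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
structure_b = [(0.0, 0.0, 1.0), (1.5, 0.0, 1.0), (3.0, 0.0, 1.0)]
toy_rmsd = rmsd(structure_a, structure_b)
```

A lower value indicates closer structural agreement, which is the sense in which the K13-KCTD17 comparison (1.13 Å) is described as more similar than the other two.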
To identify putative functional sites in K13 BTB, we first examined whether a spatial correlation of the site-specific substitution rates λ is present in the K13 BTB-propeller tertiary structure. Despite a low standard deviation of substitution rates across amino acid sites, a significant spatial correlation was found, with a log Bayes factor far above 8 and a 5 Å characteristic length (Table 2). The 10% most conserved sites predicted by FuncPatch formed one clearly bounded patch located at the surface of BTB (Fig. 3c). The BTB patch contained sites located at both B2-B3 and B4-A4 loops and at the A4 helix (positions/residues 355-358/NVGG, 397-399/DRD and 402-403/LF using the PfK13 sequence numbering; Fig. 3c). To test whether a similarly located, conserved patch was also found in the BTB domain of KCTD proteins, to which K13 BTB is the most similar, we inferred site-specific substitution rates λ from 124 and 139 orthologous sequences of SHKBP1 and KCTD17, respectively. For both proteins, the 10% most conserved BTB positions formed a patch that partially overlapped with that of K13, with positions 357 (B2-B3 loop), 397 and 398 (B4-A4 loop) being shared between the three patches (PfK13 sequence numbering; Fig. 3d and Table 2). These positions are usually involved in BTB-BTB interactions in KCTDs26 or in BTB-Cullin interactions in some other BTB-containing protein families25, as in the X-ray structure of the Elongin C-Cullin2 complex (Supplementary Fig. 5).
The propeller domain of K13 exhibits a conserved shallow pocket, similar to other well-characterized, substrate-binding propeller domains
In Kelch-containing proteins, the propeller domain usually serves as the receptor for a substrate further targeted for degradation22,23. Before examining its conservation level, we first re-evaluated the architecture of the K13 propeller domain using its resolved 3D structure (PDB code 4YY8, chain A, unpublished). The PfK13 propeller structure is composed of six repeats of the Kelch motif (or blade)8,21. As expected, each blade is a β-sheet secondary structure involving four twisted antiparallel β-strands (numbered A to D). The innermost A strands line the central channel of the propeller fold whereas the outermost D strands are part of the propeller outside surface. The top face of the domain, containing the central channel opening, is formed by residues from several strands and AB and CD loops. The bottom face of the propeller is composed of residues from the DA and BC loops and contains a shallow pocket, similar to other propeller structures23. Since there is no conventional definition for the shallow pocket delineation in the propeller fold, we defined it as the amino acids forming the surface plane of the pocket and protruding out of the plane (n = 19 positions; Supplementary Fig. 6). To characterize the pattern of conservation within the domain, we superimposed the site-specific substitution rates λ inferred by FuncPatch onto an amino acid sequence alignment of the six blades (Fig. 4b), custom-produced from a structural alignment of the blades using PyMOL (Fig. 4a and Supplementary Fig. 7). We found that: i) the conservation level significantly differed between the six blades of propeller (p = 3.0 × 10−3, Kruskal-Wallis H test), with blade VI exhibiting the lowest conservation (Fig. 4b, Supplementary Fig. 8 and Supplementary Table 5); ii) loops were more conserved than strands (p = 5.0 × 10−3, Mann-Whitney U test; Fig. 4b and Supplementary Table 5); iii) the solvent-exposed A and D strands were less conserved than the buried B and C strands (p = 1.3 × 10−6, Kruskal-Wallis H test; Fig. 4b and Supplementary Table 5); and iv) the conservation level was the strongest at the blade positions 7-10 (DA loops) and 23-25 (BC loops; Fig. 4b), which altogether formed the surface and underlying layer of the shallow pocket in the PfK13 propeller tertiary structure (Fig. 4a). Of note, similar results were obtained with the ω estimates inferred with the best-fitting PAML model M3 (Supplementary Table 5). With one exception (position 514 using the PfK13 sequence numbering), the 10% most conserved sites in K13 propeller (n = 29 out of 284) were all located at the bottom side of the propeller fold (Fig. 5a). Among them, the sites exposed at the surface (n = 9) formed a statistically significant patch located within the shallow pocket (Table 2, Fig. 5a and Supplementary Table 3). The nineteen positions forming the shallow pocket of K13 propeller were significantly enriched in the 10% most conserved propeller amino acid sites (p = 1.6 × 10−5, chi-squared test; Fig. 5a, Tables 3 and 4). A similar trend was observed with the ω estimates from the best-fitting PAML model M3 (Supplementary Fig. 9). Using the PfK13 propeller structure as reference, we also identified two remarkable features of this conserved patch: it overlapped with a region of the shallow pocket harboring an electropositive surface potential energy, in contrast to the overall electronegative one of the propeller bottom surface (Fig. 5b); and it contained several arginine and serine residues, strictly conserved in Apicomplexa, which are known to mediate protein-protein interactions in other propeller domains (R529, S576, S577, R597 and S623 using the PfK13 sequence numbering; Fig. 5a and Table 3)33,34.
Altogether, our evolutionary analysis of K13 propeller revealed that the shallow pocket is extremely conserved and may mediate protein-protein interactions.
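The selection of the "10% most conserved sites" can be sketched as follows. This is only an illustration: it assumes that the cutoff is rounded up (which gives 29 sites for the 284 propeller positions, matching the count reported above) and that ties are broken by position; neither rule is stated in the source, and the rate values below are synthetic.

```python
def most_conserved_sites(rates: dict[int, float], percent: int = 10) -> list[int]:
    """Return the positions with the lowest substitution rates.

    Assumptions: the number of sites kept is ceil(percent% of all sites),
    computed with integer arithmetic, and ties are broken by position.
    """
    n_keep = (len(rates) * percent + 99) // 100  # integer ceiling
    ranked = sorted(rates, key=lambda pos: (rates[pos], pos))
    return sorted(ranked[:n_keep])


# Synthetic rates for positions 443-726 (the PfK13 propeller numbering);
# the values themselves are placeholders, not FuncPatch output.
synthetic_rates = {pos: ((pos * 37) % 101) / 100.0 for pos in range(443, 727)}
conserved = most_conserved_sites(synthetic_rates)
```

With real per-site λ estimates as input, the returned positions could then be mapped onto the structure to check whether they cluster at the pocket.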
The conserved propeller patch predicted by FuncPatch is related to propeller activities in well-characterized BTB-Kelch proteins
To evaluate the reliability of FuncPatch to infer conserved functional sites in the context of propeller domains, we studied four other BTB-Kelch proteins that are found in mammals and are functionally and structurally well-characterized (KEAP1, KLHL2, KLHL3 and KLHL12; Supplementary Fig. 10). All these proteins are known to bind a substrate protein through validated substrate-binding residues located in their propeller shallow pocket (substrates: Nrf2 for KEAP135, WNK for KLHL2 and KLHL333, and Dishevelled for KLHL1236). Using large sets of orthologous amino acid sequences (ranging from 129 sequences for KLHL12 to 162 sequences for KLHL2), a statistically significant spatial correlation of site-specific substitution rates λ was detected for each propeller fold (Table 2), with the 10% most conserved propeller positions clustering in the shallow pocket (highest p = 1.4 × 10−7 for KEAP1, chi-squared test; Fig. 5c and Table 4). The shallow pocket of KEAP1, KLHL2 and KLHL3 propeller structures also showed a markedly electropositive surface potential energy while that of KLHL12 was much more variable (Fig. 5c).
Therefore, the conserved functional sites predicted by FuncPatch in propeller domains from several BTB-Kelch proteins are consistent with the findings of previous experimental studies, demonstrating the reliability of our approach.
Lack of association of ART-R mutations with some long-term evolutionary or structural facets of K13 propeller
Several mutations in PfK13 propeller segregating at variable frequencies have been reported to confer ART-R to P. falciparum parasites from SEA4,8–11. Here, we hypothesized that they may be associated with specific patterns of evolutionary- and/or structure-based parameters despite their distribution in the whole propeller fold (Supplementary Fig. 11). K13 propeller positions were thus classified as associated (n = 27) or not (n = 257) with a reported ART-R mutation, on the basis of the latest World Health Organization (WHO) status report on ART-R5. No difference in the “inter-species” site-specific substitution rates λ was observed between the two groups (p = 0.96, Mann-Whitney U test; Supplementary Fig. 12). Importantly, we noted that no ART-R mutation has been reported at the positions located at the surface of the shallow pocket, although this trend was not statistically significant (p = 0.23, Fisher’s exact test; 0/19 ART-R mutations for the shallow pocket positions, and 27/265 ART-R mutations for the remaining K13 propeller positions). Two structure-based parameters associated with site evolutionary rate were also estimated: the relative solvent accessibility or RSA, and the side-chain weighted contact number or WCNsc37. Again, no association was found between these structural parameters and the propeller positions associated with ART-R mutations (p = 0.44 and p = 0.46, respectively, Mann-Whitney U test; Supplementary Fig. 12). In conclusion, none of the evolutionary- and structure-based parameters tested here were associated with ART-R K13 propeller positions.
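The Fisher's exact test on the counts given above (0 of 19 pocket positions versus 27 of 265 remaining positions carrying an ART-R mutation) can be reproduced with a short sketch. It assumes the common two-sided definition that sums the probabilities of all tables with the same margins that are no more likely than the observed one; the source does not specify which variant or software was used.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Assumption: the p-value sums hypergeometric probabilities of all tables
    with the observed margins that are no more probable than the observed table.
    """
    row1 = a + b
    col1 = a + c
    total = a + b + c + d

    def weight(k: int) -> int:
        # Unnormalised hypergeometric probability, kept as an exact integer.
        return comb(col1, k) * comb(total - col1, row1 - k)

    lowest = max(0, row1 + col1 - total)
    highest = min(row1, col1)
    observed = weight(a)
    extreme = sum(
        weight(k) for k in range(lowest, highest + 1) if weight(k) <= observed
    )
    return extreme / comb(total, row1)


# Rows: shallow-pocket positions vs. remaining propeller positions.
# Columns: positions with vs. without a reported ART-R mutation.
p_value = fisher_exact_two_sided(0, 19, 27, 238)
```

Under this definition the p-value comes out at about 0.23, in line with the reported value, which illustrates why the absence of ART-R mutations among only 19 pocket positions is not statistically conclusive.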
Discussion
Combining evolutionary and tertiary structure information provides a powerful and efficient way to gain insight into the functionality of protein sites37. The favored strategy usually screens sites that have evolved more rapidly than expected under a neutral model and interprets them as a signature of adaptive evolution corresponding to a gain of new function(s)38,39. Here, we focused on the most slowly evolving, 3D-bounded sites as a means to identify highly conserved sub-regions of K13 across species evolution that are likely to play a functional role. Because of the extreme conservation of the k13 gene10 and its essential function13,14,40,41, we made inter- rather than intra-species comparisons. In the context of a sustained and intense purifying selection operating in all annotated domains of K13, our analysis of sequence evolution coupled to tertiary structure information identified two patches of particularly slowly evolving sites located at the BTB and propeller domains, respectively.
The K13 propeller conserved patch includes specific positions from the inter-blade DA and intra-blade BC loops, within a shallow pocket on the bottom face of the propeller structure. Several lines of evidence suggest that this patch contains functional sites. First, the shallow pocket of K13 propeller is highly enriched in conserved positions, whereas solvent-exposed sites in proteins are usually less conserved than buried ones37. Second, the propeller shallow pocket of well-characterized BTB-Kelch proteins (KEAP1, KLHL2, KLHL3, KLHL12) directly mediates the propeller’s binding activities23,33,34 and is also predicted as a conserved patch in our analyses. Finally, the conserved patches at the propeller shallow pocket of K13 and BTB-Kelch proteins share interesting properties: they display a markedly electropositive surface potential energy and they are enriched in arginine and serine residues. In KEAP1, KLHL2 and KLHL3, these arginine and serine residues bind to acidic peptides derived from their substrates, Nrf2 and WNK, respectively33,34 (Fig. 5c). The K13 propeller patch contains two arginine (R529 and R597) and three serine (S576, S577 and S623) residues that are strictly conserved in Apicomplexa species. Altogether, these results indicate that the shallow pocket of K13 propeller exhibits several properties of a binding surface, and we speculate that it may be critical for the recognition of substrate protein(s) further targeted for degradation. Given the electropositive region surrounding the K13 propeller conserved patch, the putative K13 substrate may harbor an acidic binding motif.
The identity of proteins that bind to the K13 propeller through its shallow pocket is currently unknown. In P. falciparum, PfPI3K is a likely candidate as it is immunoprecipitated with full-length PfK13, and its ubiquitination and proteasomal degradation are altered by the pfk13 propeller C580Y mutation17. Another candidate may be the PK4 kinase which phosphorylates eIF2α, a key mediator of translation-mediated latency involved in ART-R19.
However, whether the pfk13-mediated ART-R mechanism in P. falciparum is related to the predicted activity of the propeller shallow pocket remains elusive. Remarkably, pfk13 ART-R mutations are distributed across the whole propeller fold and none has been specifically located at the surface of the propeller shallow pocket (Supplementary Fig. 11). Mutations that are not associated with ART-R or which have an uncharacterized phenotype were reported at 9 out of the 19 positions forming the shallow pocket. All of these mutations seem to be rare, as they are carried by only one or two parasite samples (Table 3). Therefore, mutations at the positions contributing to the pocket surface may be either not involved in ART-R, or too damaging for the K13 native function to provide a long-term competitive advantage. One possibility is that some ART-R mutations may alter the overall amount of bound substrate at the shallow pocket through long-range conformational changes propagating to the pocket or by altering the propeller domain stability, as reported for pathogenic mutations in KEAP1 W544C and KLHL3 S410L, respectively42,43. Furthermore, propeller conformational changes associated with ART-R might be associated with specific cellular conditions, for example increased oxidative stress as suggested by others44.
BTB appeared here as the most conserved domain of K13 during Apicomplexa evolution, and therefore likely carries critical activities. Remarkably, the BTB domain of K13 most closely resembles that of the KCTD protein family regarding its primary sequence, tertiary structure and shortened domain size. The shortened BTB domains found in the KCTD protein family could still mediate protein oligomerization26,45. This is consistent with the dimer observed in the solved PfK13 BTB-propeller crystal structures (PDB codes 4YY8 and 4ZGC, unpublished). K13 BTB harbors a predicted, functional patch – located at the B2-B3 and B4-A4 loops and at the A4 helix – that overlaps with that of KCTD proteins, suggesting that they may share similar functional sites. In KCTD proteins, these sites make some BTB-BTB contacts in tetrameric or pentameric assemblies when BTB is solved as an isolated domain26. However, the PfK13 BTB-propeller structure forms a dimer and none of the highly conserved positions make BTB-BTB contacts (PDB codes 4YY8 and 4ZGC, unpublished). Finally, amino acids of the B4-A4 loop (corresponding to positions 397-399 in PfK13) are exposed at the BTB-Cullin binding interface in several solved complexes25. These discrepancies in the role of the predicted BTB patch could be due to the fact that additional domains (such as propeller) or partner proteins might constrain the folding of the BTB domain into oligomers or complexes. Altogether, although the precise role of the predicted patch from K13 BTB remains elusive, data from the literature are consistent with the hypothesis that it may mediate protein-protein interactions, possibly with a Cullin protein. Of note, similarly to KCTD proteins, K13 lacks a 3-box motif, which is usually located between BTB and propeller in BTB-Kelch proteins23,34 and contributes to Cullin binding together with the BTB domain34.
We hypothesize that the absence of the 3-box motif in K13 is compensated by additional interactions with Cullin provided by a specific assembly of oligomerized K13 BTB fold and/or by K13 domains flanking BTB.
Interestingly, K13 also contains a CCC domain located before BTB. Based on the PAML data indicating a high level of conservation similar to the propeller domain, the K13 CCC domain likely carries critical activities. Coiled-coils are ubiquitous protein-protein interaction domains composed of two or more α-helices coiling together46, and their role in Kelch- and BTB-containing proteins remains elusive. A CCC domain was reported in a few Kelch-containing and KCTD proteins involved in cell morphogenesis but these CCC have a different domain organization than the one of K1321. Rather, CCC may participate in K13 oligomerization and/or serve as a binding interface with other proteins47. Of note, pfk13 R239Q, E252Q and D281V mutations located in CCC confer a moderate ART-R level3,11,48, consistent with our hypothesis that this very conserved domain displays some critical function.
In conclusion, through evolutionary and structural analyses, we identified the shallow pocket of the K13 propeller domain as a likely candidate surface for binding a substrate further targeted for degradation. We also detected in the BTB domain of K13 a conserved patch of sites that are involved in protein-protein interactions in known BTB-Cullin and BTB-BTB complexes. Altogether, these results support the view that K13 is a core element of a ubiquitin ligase complex and provide candidate BTB and propeller sites for functional follow-up studies. Efforts should now focus on the identification of protein substrate(s) that bind(s) to the pocket of K13 propeller, which may help clarify the link between K13 function and ART-R phenotype.
Materials and Methods
Collection of k13 orthologous sequences from genomic databases
The amino acid sequence of PfK13 (PlasmoDB code PF3D7_1343700) was queried against the specialized eukaryotic pathogen database (EuPathDB release 33)49 and the NCBI non-redundant protein database using blastp and tblastn searches50 (BLOSUM62 matrix). A protein was considered as a likely orthologous sequence if the sequence identity was ≥ 30% and the e-value below the 10−3 cutoff. Forty-three K13 sequences – and corresponding k13 cDNA sequences – were retrieved from distinct Apicomplexa species including 21 Plasmodium species. A detailed bioinformatics analysis was performed on each protein sequence to confirm the presence of the three annotated domains of K13 (CCC, BTB and propeller) using InterPro51.
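The orthology screening thresholds described above (sequence identity ≥ 30% and e-value below 10−3) can be expressed as a simple filter. The sketch below is illustrative only: the `BlastHit` record, its field names and the example hits are hypothetical, and parsing of actual BLAST output is not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlastHit:
    """Hypothetical minimal record for one BLAST hit (not a real BLAST format)."""
    subject_id: str
    percent_identity: float
    evalue: float


def likely_orthologs(
    hits: list[BlastHit],
    min_identity: float = 30.0,
    max_evalue: float = 1e-3,
) -> list[BlastHit]:
    """Keep hits with identity >= min_identity and e-value strictly below max_evalue."""
    return [
        hit
        for hit in hits
        if hit.percent_identity >= min_identity and hit.evalue < max_evalue
    ]


# Made-up hits for illustration; identifiers and scores are placeholders.
example_hits = [
    BlastHit("candidate_A", 45.0, 1e-50),  # passes both thresholds
    BlastHit("candidate_B", 28.0, 1e-10),  # identity too low
    BlastHit("candidate_C", 35.0, 0.5),    # e-value too high
    BlastHit("candidate_D", 30.0, 1e-4),   # passes at the identity boundary
]
kept = likely_orthologs(example_hits)
```

In the workflow described here, hits passing this filter were then checked for the three annotated domains before being accepted as K13 orthologs.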
k13 sequence alignment
Considering the greater divergence of coding nucleotide sequences as compared to protein sequences due to the genetic code redundancy, a K13 protein sequence alignment was first generated using MAFFT version 752 (E-INS-I strategy with BLOSUM62 scoring matrix, gap opening penalty 2.0 and offset 0.1). The output alignment was visually inspected and manually edited with BioEdit v7.2.553. The positions containing gaps in at least 30% of all sequences were removed, as suggested by the PAML authors29. Then, the k13 nucleotide sequence alignment was generated with PAL2NAL54 using the cleaned K13 amino acid alignment as template.
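The gap-based column filter described above can be sketched as follows. This is a simplified illustration that handles only the 30% gap criterion (not the manual editing or the divergence-based trimming), with a hypothetical function name and toy sequences.

```python
def drop_gappy_columns(
    alignment: dict[str, str],
    max_gap_percent: int = 30,
    gap: str = "-",
) -> dict[str, str]:
    """Remove alignment columns with gaps in at least max_gap_percent % of sequences.

    Integer arithmetic is used for the threshold to avoid floating-point
    surprises at the boundary.
    """
    sequences = list(alignment.values())
    if not sequences:
        return {}
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("all aligned sequences must have the same length")
    n_seqs = len(sequences)
    kept_columns = [
        col
        for col in range(length)
        if sum(seq[col] == gap for seq in sequences) * 100 < max_gap_percent * n_seqs
    ]
    return {
        name: "".join(seq[col] for col in kept_columns)
        for name, seq in alignment.items()
    }


# Toy alignment: the third column has gaps in 2 of 4 sequences (50%) and is dropped;
# columns with a single gap (25%) are kept.
toy = {"s1": "MK-LA", "s2": "MK-LA", "s3": "M-QLA", "s4": "MKQL-"}
cleaned = drop_gappy_columns(toy)
```

Because the codon alignment is derived from the cleaned protein alignment, the same columns are implicitly removed at the nucleotide level.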
Phylogenetic analysis of k13
The phylogenetic relationships of k13 nucleotide sequences were inferred using the maximum-likelihood method implemented in PhyML (v3.0)55, after determining the best-fitting nucleotide substitution model using the Smart Model Selection (SMS) package56. A general time-reversible model with optimized equilibrium frequencies, gamma-distributed among-site rate variation and an estimated proportion of invariable sites (GTR + G + I) was used, as selected by the Akaike Information Criterion. The nearest neighbor interchange approach was chosen for tree improvement, and branch supports were estimated using the approximate likelihood-ratio test (aLRT) SH-like method57. The k13 phylogeny was rooted using Cryptosporidia species as the outgroup.
Molecular evolutionary analysis of k13
To investigate the evolutionary regime that has shaped the k13 protein-coding DNA sequence during species evolution, we analyzed the non-synonymous (dN) to synonymous (dS) substitution rate ratio ω (= dN/dS), estimated by maximum likelihood using the codeml tool from PAML (v4.8)29,59. ω provides a sensitive measure of selective pressure at the amino acid level: it compares the two substitution rates within a statistical framework that takes the phylogenetic tree topology into account. Typically, ω < 1 indicates purifying selection, while ω = 1 and ω > 1 indicate neutral evolution and positive selection, respectively.
The heterogeneity of ω among lineages of the k13 phylogenetic tree (branch models) was tested by comparing the free-ratio (FR) model, which assumes as many ω parameters as there are branches in the tree, to the one-ratio null model M0, which assumes a single ω value for all branches60,61.
The variation of ω among codon sites was then evaluated using the codon site models60,61 M1a, M2a, M3, M7 and M8. M1a allows codon sites to fall into two site classes, either with ω < 1 (purifying selection) or ω = 1 (neutral evolution), whereas model M2a extends model M1a with a third site class with ω > 1 (positive selection). Model M3 includes a discrete distribution of independent ω with k classes of sites (k = [3, 4, 5] in this study), with ω values and corresponding proportions estimated from the dataset. Model M7 assumes a β-distribution of ten ω ratios limited to the interval [0, 1] with two shape parameters p and q, whereas model M8 adds a further site class with ω possibly > 1, as M2a does. The heterogeneity of ω across codon sites was tested by comparing models M0 and M3, while the paired comparisons M1a:M2a and M7:M8 allowed the detection of positive selection60.
Model comparisons were made using likelihood ratio tests (LRTs)62. For each of the LRTs, twice the log-likelihood difference between the alternative and null models (2ΔlnL) was compared to critical values from a chi-squared distribution with degrees of freedom equal to the difference in the number of estimated parameters between the two models38. Candidate sites for positive selection were pinpointed using the Bayes empirical Bayes (BEB) inference, which calculates the posterior probability that each codon site falls into a site class affected by positive selection (in models M2a and M8), as described by Yang and colleagues63. For model M3, in which no BEB approach is implemented yet, the naïve empirical Bayes (NEB) approach was used to identify the sites evolving under positive selection.
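The LRT computation described above can be sketched as follows. The log-likelihoods are meant to be read from codeml output, and any values passed in are placeholders. The closed-form survival function used here holds only for an even number of degrees of freedom, which covers comparisons such as M1a:M2a and M7:M8 (two extra parameters each); that restriction, like the function names, is an assumption of this sketch.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-squared variable, valid for even df.

    For df = 2m: P(X > x) = exp(-x/2) * sum_{i=0}^{m-1} (x/2)^i / i!
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only holds for even, positive df")
    if x <= 0:
        return 1.0
    half = x / 2.0
    term = 1.0   # (x/2)^0 / 0!
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def likelihood_ratio_test(lnl_null, lnl_alt, df):
    """Return (2*delta_lnL, p-value) for a pair of nested models."""
    # Negative differences can arise from optimization noise; clamp to zero.
    statistic = max(2.0 * (lnl_alt - lnl_null), 0.0)
    return statistic, chi2_sf_even_df(statistic, df)
```

For instance, `likelihood_ratio_test(lnl_m7, lnl_m8, df=2)` would return the 2ΔlnL statistic and its p-value for the M7:M8 comparison, where `lnl_m7` and `lnl_m8` stand for the two fitted log-likelihoods.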
Three codon frequency models were used and compared for all models: F1×4, which derives codon frequencies from a single set of nucleotide frequencies shared by the three codon positions; F3×4, which uses nucleotide frequencies specific to each codon position; and the parameter-rich model F61, which estimates the frequency of each codon separately59,64. Since the three codon frequency models yielded similar results (Supplementary Fig. 13), we present only those obtained with the most widely used F3×4 model. The analyses were run multiple times with different ω starting values to check the consistency of the results.
For PAML model M3 with k site classes of ω ratios, the posterior mean ω at each codon site was calculated as the average of the ω ratios across the k site classes, weighted by their posterior probabilities29.
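The weighted average described above is straightforward to compute once the class ω values and the per-site posterior probabilities have been read from the codeml output. A minimal sketch, with a hypothetical function name and no assumption about the file format:

```python
def posterior_mean_omega(class_omegas, posteriors, tol=1e-6):
    """Posterior mean omega at one codon site.

    class_omegas: the k omega values estimated for the site classes.
    posteriors:   the k posterior probabilities that the site belongs
                  to each class (expected to sum to 1).
    """
    if len(class_omegas) != len(posteriors):
        raise ValueError("one posterior probability is needed per site class")
    if abs(sum(posteriors) - 1.0) > tol:
        raise ValueError("posterior probabilities should sum to 1")
    return sum(w * p for w, p in zip(class_omegas, posteriors))
```

As an arithmetic illustration with made-up numbers, class values (0.0, 0.1, 0.5) with posteriors (0.7, 0.2, 0.1) give 0.0 × 0.7 + 0.1 × 0.2 + 0.5 × 0.1 = 0.07.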
In addition to k13, four other Kelch protein-coding sequences were considered to compare their ω with those estimated for the whole Plasmodium proteome. We used the ω values previously estimated by Jeffares and colleagues30 with PAML under the null model M0 (ωM0 values) for each of the 3,256 orthologous protein-coding genes among six Plasmodium species: P. falciparum, P. berghei, P. chabaudi, P. vivax, P. yoelii and P. knowlesi. A full description of the procedure is presented in the original paper30.
Inferring site-specific substitution rates considering their spatial correlation in the K13 BTB-propeller tertiary structure
Most methods, including PAML, assume that site-specific substitution rates are independently distributed across sites65. However, it is widely acknowledged that amino acids located close to each other in protein tertiary structures are more likely to carry out similar functions, suggesting a site interdependence in amino acid sequence evolution attributed to tertiary structure65,66. Consequently, the substitution rate at the protein level (named λ in this study) was inferred using the FuncPatch server31. FuncPatch requires an amino acid sequence alignment, a phylogenetic tree and a protein tertiary structure to estimate the conservation level during species evolution and the strength of the spatial correlation of site-specific substitution rates λ (i.e. the characteristic length scale, in Å). We used the X-ray structure of PfK13 BTB-propeller at 1.81 Å resolution as the reference structure; it does not contain the conserved CCC domain (PDB code 4YY8, unpublished). Beforehand, a Ramachandran analysis was performed with MolProbity67 to validate the quality of the structure: 96.9% and 3.1% of the amino acids were in favored and allowed regions, respectively, and there were no outliers. FuncPatch only accepts monomeric proteins as input, whereas the BTB-propeller of PfK13 dimerizes in the crystal structure. To take into account the dimeric organization of PfK13, its tertiary structure was edited using custom Python scripts (Python v2.7.13) in order to merge the two monomers (chains A and B), and the K13 sequence was duplicated in the K13 protein sequence alignment. The analysis was also repeated using either monomer of the BTB-propeller tertiary structure alone, and using a disulfide-bonded version of PfK13 BTB-propeller (PDB code 4ZGC, unpublished). All these control analyses yielded similar results (data not shown).
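The authors' scripts are not shown, so the following is only one plausible way to present a dimer to a tool that expects a single chain: relabel chain B as chain A and shift its residue numbers so they do not collide with those of chain A. The function name, the offset of 1000 and the handling of TER records are all assumptions of this sketch; it relies only on the fixed-column layout of PDB ATOM/HETATM records (chain identifier in column 22, residue number in columns 23 to 26).

```python
def merge_chains(pdb_lines, keep="A", merge="B", offset=1000):
    """Relabel chain `merge` as chain `keep`, shifting residue numbers.

    Hypothetical helper: `offset` must be large enough that the shifted
    residue numbers do not overlap those of the kept chain.
    """
    merged = []
    for line in pdb_lines:
        if line.startswith("TER"):
            continue  # drop chain terminators so the result reads as one chain
        is_atom = line.startswith(("ATOM", "HETATM")) and len(line) >= 26
        if is_atom and line[21] == merge:
            new_number = int(line[22:26]) + offset
            if new_number > 9999:
                raise ValueError("shifted residue number exceeds four digits")
            line = line[:21] + keep + f"{new_number:4d}" + line[26:]
        merged.append(line)
    return merged
```

The sequence alignment then has to mirror the merged structure, which is consistent with the duplication of the K13 sequence described above.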
The spatial correlation of the site-specific substitution rates λ in the K13 tertiary structure was tested using a Bayesian model comparison, in which a null model (model 0), where no spatial correlation of site-specific substitution rates λ is present, is compared to the alternative model (model 1). As suggested by the FuncPatch authors, the spatial correlation was considered significant if the estimated log Bayes factor (model 1 versus model 0) was larger than 8 in the dataset (conservative cutoff)31.
Delineation of K13 propeller blades and secondary structures
The propeller domain of PfK13 is composed of six blades having slightly different amino acid lengths. To get an accurate blade alignment at the primary amino acid sequence level, we first sought to align the six blade structures. The PDB propeller structure was obtained from the PfK13 BTB-propeller structure (PDB code 4YY8, chain A, unpublished) and was then divided into six parts, each one containing the atomic coordinates of one blade. The six blade structures were then aligned by minimizing the RMSD of atomic positions using the align function in PyMOL Molecular Graphics System68 so as to identify the amino acids from the six blades that are located at exactly the same blade position. This structure alignment was then used to align the six blades at the primary amino acid sequence level. The delineation of the strands and loops was obtained directly from the PDB file (PDB code 4YY8, chain A, unpublished).
Definition of ART-R mutations
We used the latest status report on ART-R provided by the WHO to classify the positions of the PfK13 propeller domain as associated or not with an ART-R mutation5.
Evolutionary analysis of the BTB and propeller domains in other BTB- and Kelch-containing proteins
To better characterize the BTB domain of K13, we arbitrarily retrieved some members belonging to the main BTB-containing protein families (BTB-ZF, BTB-Kelch, RhoBTB, BTB-NPH3, MATH-BTB, KCTD, KCNA and SKP1 and Elongin C proteins; full list provided in Supplementary Table 6). A multiple protein alignment was generated using Mafft (version 7)52 (default parameters) and was then manually edited with BioEdit (v7.2.5)53 to retain only the region corresponding to the BTB core fold. The phylogenetic relationships were inferred with the aforementioned PhyML55 procedure using the best-fitting protein substitution model as determined by the SMS package56.
For further comparisons with the K13 BTB and propeller domains, site-specific substitution rates λ were inferred with FuncPatch for the BTB and propeller domains of several mammalian KCTD and BTB-Kelch proteins, respectively. In the present study, the proteins were selected on the basis of their sequence homology with K13, the availability of a solved 3D structure, and their known implication in a Cullin-RING E3 ligase complex as suspected for K13. In addition, for BTB-Kelch proteins, only those with a well-characterized ligand-binding function and a six-bladed propeller structure similar to that of K13 were considered. After a careful review of the literature, we selected two KCTD proteins: SHKBP1 (UniProt code Q8TBC3) which regulates the epidermal growth factor receptor (EGFR) signaling pathway69; and KCTD17 (Q8N5Z5) which mediates the ubiquitination and proteasomal degradation of the ciliogenesis down-regulation TCHP protein70. Considering BTB-Kelch proteins, we focused on: KEAP1 (Q14145) which interacts with its client protein Nrf2 for the induction of cytoprotective responses to oxidative stress35; KLHL2 (O95198) and KLHL3 (Q9UH77) which both participate in the ubiquitination and degradation of WNK substrates regulating blood pressure33; and KLHL12 (Q53G59) which negatively regulates the WNT-beta-catenin pathway through the degradation of Dishevelled proteins36.
First, each Homo sapiens KCTD and BTB-Kelch sequence was successively submitted as query sequence for a blastp search50 (BLOSUM62 scoring matrix, max target sequences fixed at 1,000) against the NCBI non-redundant protein database to retrieve orthologous sequences from a large number of species. The output lists were then filtered so as to keep only sequences that had an unambiguous description (i.e. a description that includes the name of the queried KCTD or BTB-Kelch protein), aligned with ≥ 80% sequence coverage and had ≥ 60% sequence identity with the query sequence. The multiple protein alignment of each set of orthologous sequences was then generated using Mafft (version 7)52 (E-INS-i strategy with BLOSUM62 scoring matrix, gap opening penalty 2.0 and offset 0.1). A second filtering step was performed to remove incomplete or misannotated sequences, i.e. the sequences that did not contain all the annotated domains (using the domain annotation automatically generated by the UniProt Knowledgebase) and/or that included a gapped position located in one of the annotated domains. The final multiple protein alignments included: i) 124 sequences × 103 aligned positions for SHKBP1; ii) 139 sequences × 102 aligned positions for KCTD17; iii) 135 sequences × 285 aligned positions for KEAP1; iv) 162 sequences × 286 aligned positions for KLHL2; v) 158 sequences × 286 aligned positions for KLHL3; and vi) 129 sequences × 289 aligned positions for KLHL12. The full list of orthologous sequences used for each mammalian KCTD and BTB-Kelch protein is provided in Supplementary Table 7. Then, the phylogenetic relationships were inferred using PhyML55 after determining the best-fitting protein substitution model with the SMS package56. The 3D structures of KCTD BTB and BTB-Kelch propeller domains were retrieved from the PDBsum database under the following accession numbers: 4CRH for SHKBP1 (resolution: 1.72 Å)26, 5A6R for KCTD17 (resolution: 2.85 Å)26, 2FLU for KEAP1 (resolution: 1.50 Å, in complex with a Nrf2 peptide)71, 4CHB for KLHL2 (resolution: 1.56 Å, in complex with a WNK4 peptide)33, 4CH9 for KLHL3 (resolution: 1.84 Å, in complex with a WNK4 peptide)33, and 2VPJ for KLHL12 (resolution: 1.85 Å)23. Beforehand, the quality of each structure was validated using MolProbity67: none of the structures had amino acids identified as outliers, and approximately 98% of the amino acids of each structure were in favored regions.
Evaluation of structural properties
The electrostatic potential energy of each propeller structure was calculated using the Adaptive Poisson-Boltzmann Solver (APBS) method72. Beforehand, the required pqr input files were prepared using PDB2PQR (v2.1.1)73. The missing charges were added using the Add Charge function implemented in UCSF Chimera74. A grid-based method was used to solve the linearized Poisson-Boltzmann equation at 298 K, with solute (protein) and solvent dielectric constant values fixed at 2 and 78.5, respectively. The contact surface selection was mapped using a radius of 1.4 Å on a scale from −8 kT/e to +8 kT/e.
The relative solvent accessibility (RSA) was estimated as the accessible surface area of each amino acid, computed using DSSP75 and then normalized by the maximal accessible surface area of that amino acid76. The side-chain weighted contact number (WCNsc) of each amino acid was calculated using a custom Python script provided by Sydykova and colleagues77. All structural properties were assessed using the aforementioned PDB files.
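Both structural measures reduce to short formulas, sketched below. The script of Sydykova and colleagues is not reproduced here; this illustration assumes the usual definition of the weighted contact number, i.e. the sum of inverse squared distances from one residue to all others, with one representative point per residue (for WCNsc, a side-chain point such as its centroid). The function names are hypothetical, and the maximal accessible surface areas must be supplied by the caller from the chosen reference scale.

```python
def relative_solvent_accessibility(asa, max_asa):
    """RSA = accessible surface area / maximal accessible surface area."""
    if max_asa <= 0:
        raise ValueError("max_asa must be positive")
    return asa / max_asa


def weighted_contact_numbers(points):
    """WCN_i = sum over j != i of 1 / r_ij**2.

    `points` holds one (x, y, z) coordinate per residue, in Å.
    """
    wcn = []
    for i, p in enumerate(points):
        total = 0.0
        for j, q in enumerate(points):
            if i == j:
                continue
            dist_sq = sum((a - b) ** 2 for a, b in zip(p, q))
            if dist_sq == 0.0:
                raise ValueError("two residues share the same coordinates")
            total += 1.0 / dist_sq
        wcn.append(total)
    return wcn
```

Densely packed, buried residues receive high WCN values and low RSA values, so the two measures are expected to be inversely related.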
Structure visualization
All molecular drawings were generated using the UCSF Chimera software74.
Statistical analyses
Substitution rates among partitions were compared using non-parametric Mann-Whitney U or Kruskal-Wallis H tests. When focusing on the propeller shallow pocket, contingency tables were produced and statistically tested with the chi-squared test. We used p < 0.05 as the cutoff for significance in all statistical tests.
Author contributions
R.C., A.S. and J.C. designed the study. D.J. provided data. R.C. performed most of the analyses. R.C., A.S., and J.C. contributed to interpretations. All authors have participated in paper writing and approved it prior to submission.
Competing interests
The authors declare no competing financial interests.
Acknowledgements
We thank YF. Huang and GB. Golding for help with FuncPatch analysis. We thank O. Mercereau-Puijalon, M. Miteva, and R. Duval for helpful discussions. We thank A. Sissoko and F. Palstra for proofreading the manuscript. We thank the École Doctorale MTCI for PhD grant funding for R. Coppée.