Abstract
Plasmodium vivax is responsible for the majority of malaria infections outside Africa. Its closest genetic relative, Plasmodium vivax-like, was discovered in African great apes and has been suggested to have given rise to P. vivax in humans. To unravel the evolutionary history of P. vivax, we generated two new P. vivax-like reference genomes and nine additional P. vivax-like genotypes. We show a clear separation between the two clades, a higher genetic diversity of P. vivax-like parasites than of P. vivax, and the potential existence of two sub-clades of P. vivax-like. In relative terms, we dated the split between P. vivax and P. vivax-like as three times more recent than the split between P. ovale wallikeri and P. ovale curtisi and 1.5 times earlier than the split between Plasmodium malariae and Plasmodium malariae-like. The sequencing of the P. vivax-like genomes is an undeniable advance in the understanding of P. vivax biology, evolution and emergence in human populations.
Main text
Plasmodium vivax, the most prevalent human malaria parasite outside Africa, is responsible for severe and incapacitating clinical symptoms in humans1. Traditionally, P. vivax has been neglected because of its lower mortality in comparison to Plasmodium falciparum2,3. Its ability to produce a dormant liver-stage form (hypnozoite), responsible for relapsing infections, makes it a challenging public health issue for malaria elimination. The recent emergence of antimalarial drug resistance4 as well as the discovery of severe and even fatal human cases2,5,6 has renewed interest in this enigmatic species, including its evolutionary history and its origin in humans.
Earlier studies placed the origin of P. vivax in humans in Southeast Asia (“Out of Asia” hypothesis) based on its phylogenetic position in a clade of parasites infecting Asian monkeys7. At that time, the closest known relative of P. vivax was considered to be Plasmodium cynomolgi, an Asian monkey parasite8. However, this hypothesis was recently challenged by the discovery of another Plasmodium species, genetically closer to P. vivax than P. cynomolgi, circulating in African great apes (chimpanzees and gorillas)9,10. This new lineage (hereafter named Plasmodium vivax-like) was considered by certain authors to have given rise to P. vivax in humans following a transfer of parasites from African apes10, but this “Out of Africa” hypothesis remains debated. Moreover, a transfer of P. vivax-like parasites to humans has been documented9, showing that new strains can be released into new host species, specifically into human populations. In this context, it now seems fundamental to characterize the genome of the closest ape relative of the human P. vivax parasite in order to better understand the evolution of this parasite and to identify the key genetic changes explaining the emergence of P. vivax in human populations.
Here we report the analysis of two reference genomes and nine further genotypes of P. vivax-like parasites. We compare these genome sequences to those of several worldwide P. vivax isolates, and to the P. cynomolgi and Plasmodium knowlesi reference genomes. Our analyses show that the genomes of P. vivax and P. vivax-like are highly similar and co-linear within the core regions. Phylogenetic analyses clearly show that P. vivax-like parasites form a genetically distinct clade from P. vivax. In relative terms, we estimate that the two species diverged from each other three times more recently than Plasmodium ovale wallikeri and Plasmodium ovale curtisi did, and 1.5 times earlier than Plasmodium malariae, a human Plasmodium, and Plasmodium malariae-like, its ape relative. Similar to other ape-infective Plasmodium species, P. vivax-like exhibits far higher levels of genetic diversity than its human-infective relative, P. vivax. Finally, our genome-wide analyses provide new insights into the adaptive evolution of P. vivax. Indeed, we identify several key genes that exhibit signatures of positive selection exclusively in P. vivax, and show that some gene families important for red blood cell invasion have undergone species-specific evolution in the human parasite.
Genome Assemblies
Eleven P. vivax-like genotypes were obtained from two different kinds of samples: ten infected chimpanzee blood samples collected during successive routine sanitary controls of chimpanzees living in the Park of La Lékédi (a sanctuary in Gabon) and one infected Anopheles mosquito collected during an entomological survey carried out in the same park11 (Supplementary Table 1). For blood samples, white blood cells were depleted using the CF11 method12 to reduce the amount of host DNA. After DNA extraction, samples were subjected to whole genome amplification (WGA) in order to obtain sufficient parasite DNA for library preparation. Sequencing was performed using short-read Illumina technology. For one sample (Pvl06), long-read sequencing (PacBio technology) was performed in order to obtain better coverage of regions containing subtelomeric gene families.
Among the eleven samples, ten presented mixed infections with other Plasmodium species (Supplementary Table 1). Four samples containing P. gaboni or P. malariae-like co-infections were used in other studies (see Supplementary Table 1)13,14. In order to obtain the P. vivax-like genotypes, sequencing reads were extracted based on their similarity to the reference genome sequence of P. vivax, PvP0115. Sequencing reads from two samples, one obtained using Illumina sequencing (Pvl01) and another using PacBio technology (Pvl06), were used to perform de novo genome assemblies, which were then annotated to produce reference genomes for P. vivax-like (Supplementary Table 2). Of the two assemblies, Pvl01 is of considerably higher quality, with 4,570 orthologues to the PvP01 reference genome compared to 2,925 for Pvl06 (Table 1). Both assemblies consist of 14 supercontigs (corresponding to the 14 P. vivax chromosomes) plus 226 (Pvl01) and 370 (Pvl06) unassigned contigs, for total sizes of 27.5 Mb and 18.8 Mb respectively. After annotation with Companion16, these two genomes contained 5,532 and 4,953 annotated genes (Table 1). The genome sequences obtained from the other samples were used for SNP calling, and for population genetic and phylogenetic analyses.
Gene synteny and gene composition
Comparing the P. vivax-like reference genomes with those of P. vivax (PvP01 and SalI)2,15, P. cynomolgi (B strain)8 and P. knowlesi (H strain)17 reveals several similarities, including a similar GC content, extensive collinearity and conservation of gene content/organization (Table 1). The P. vivax-like core genome sequences are completely syntenic to the P. vivax PvP01 reference genome sequence (Supplementary Figures 1 and 2).
For Plasmodium parasites, most species-specific genes are part of large gene families, such as var genes in P. falciparum or pir genes, which are present in all Plasmodium genomes studied so far18,19. Table 2 provides a summarized view of gene content and copy number of the main multigene families in P. vivax-like in comparison to P. vivax, P. knowlesi and P. cynomolgi. Despite only partial coverage of the subtelomeric regions of our reference genomes (Supplementary Figures 1 and 2), at least one copy of each major gene family was detected (Table 2). In comparison to P. vivax, and as expected given the partial subtelomeric sequencing coverage, the number of copies in each family was generally lower or equal in P. vivax-like, except for the cyto-adherence linked asexual (CLAG) gene family. The CLAG family is an essential gene family in host-parasite interactions, playing a role in merozoite invasion, parasitophorous vacuole formation, cyto-adherence and the uptake of ions and nutrients from the host plasma20,21. In P. vivax-like, the CLAG gene family has one extra copy (n=4: two CLAG genes on chromosome 14, one on chromosome 8 and one on chromosome 7) compared to the P. vivax references PvP01 and Sal I (n=3: one CLAG gene on each of chromosomes 7, 8 and 14). Although this needs to be confirmed by full-genome sequencing of more P. vivax-like samples, it could suggest that P. vivax lost one CLAG gene (one of the two situated on chromosome 14) during its adaptation to the human host. Another interesting observation is the expansion of most gene families involved in red blood cell invasion or cyto-adherence (such as PHIST, ETRAMP, PST-A, Pv-fam-e, Pv-fam-a, RBP) in the human P. vivax lineage in comparison to P. vivax-like, P. cynomolgi and P. knowlesi, which suggests that the gene duplications in the ancestral lineage of P. vivax could be an adaptation to humans (Table 2).
During the Plasmodium life cycle, host erythrocyte invasion is mediated by specific interactions between parasite ligands and host erythrocyte receptors. Two major gene families are involved in erythrocyte invasion: the Duffy-binding proteins (DBP) and the reticulocyte-binding proteins (RBP)22. In P. vivax, two DBP genes exist (DBP1 on chromosome 6 and DBP2 on chromosome 1) and seem to be essential to red blood cell invasion, as demonstrated by the inability of the parasite to infect individuals who do not express the Duffy receptor on the surface of their red blood cells (i.e. Duffy-negative individuals)23–25. In the reference genome of P. vivax-like (Pvl01), we observe the DBP1 gene, as in the P. vivax genome PvP01 (Table 2) and in the other Plasmodium species; however, we did not observe DBP2.
Knowing that gorillas and chimpanzees are today all described as Duffy positive10, we could hypothesize that P. vivax-like parasites infect only Duffy positive hosts. This would be in accordance with the fact that the only described transfer of P. vivax-like to humans was in a Caucasian Duffy positive individual9 and that no transfers of P. vivax-like were recorded in Central African Duffy negative populations despite the fact that they live in close proximity with infected ape populations26.
RBP genes encode a merozoite surface protein family present across all Plasmodium species and known to be involved in erythrocyte invasion and host specificity22. Comparison of the organization and characteristics of this gene family between P. vivax, P. vivax-like, P. knowlesi and P. cynomolgi (Table 2) reveals that: 1) all gene classes (RBP1, RBP2 and RBP3) are ancestral to the divergence of these species; 2) the expansion of RBP2, a class of genes that confer the ability to infect reticulocytes (immature red blood cells), was likely ancestral to the P. vivax/P. vivax-like lineage; and 3) RBP3, a class of genes thought to confer the ability to infect specifically normocytes (mature red blood cells), is functional in all species except P. vivax (where the gene is pseudogenised), suggesting that P. vivax specifically lost the ability to infect this category of erythrocytes during its adaptation to humans (Figure 1).
Phylogenetic relationships to other Plasmodium species and divergence time
Conservation of the gene content between P. vivax-like and the other primate-infective Plasmodium species enabled us to reconstruct with confidence the relationships between the different species and to estimate the relative age of the different speciation events. This analysis confirmed the position of P. vivax-like as the closest sister lineage of P. vivax (Figure 2).
Regarding the divergence time, relative split times between Plasmodium species were estimated using the method of Silva et al.27, taking as fixed points the relative distances of the split between the two P. ovale genomes and of the split between P. malariae-like and P. malariae14. Assuming consistent mutation rates and generation times across the branches, we observed that the split between P. vivax and P. vivax-like is about three times more recent than the split between P. ovale wallikeri and P. ovale curtisi and 1.5 times earlier than the split between Plasmodium malariae and Plasmodium malariae-like. It should be noted that the split between these last two species was estimated to be about five times more recent than the one between P. ovale wallikeri and P. ovale curtisi, which is consistent with the study of Rutledge et al.14. However, these estimates have to be confirmed with other dating methods, because GC content could bias them. Indeed, the models used in our study assume a strict molecular clock, which would not apply to all Plasmodium species, specifically to P. falciparum because of its extreme GC content in comparison to other Plasmodium species.
Relationships to worldwide human P. vivax isolates
To analyse the relationship between our 11 P. vivax-like isolates and human P. vivax, we completed our dataset with 19 published human P. vivax genomes28 (Table 1). All sequencing reads were aligned against the PvP01 reference genome15 and SNPs were called and filtered as described in the Materials and Methods section. Maximum-likelihood phylogenetic trees were then produced based on 100,616 SNPs. Our results clearly show two distinct, well-supported clades (bootstrap value of 100), one composed of P. vivax-like strains and the other of human P. vivax isolates (Figure 3). This is in disagreement with previous results suggesting that human strains formed a monophyletic clade within the radiation of ape P. vivax-like parasites10.
One explanation for this difference with previous results could be a phenomenon called Incomplete Lineage Sorting (ILS). ILS is the discordance observed between some gene trees and the species or population tree due to the coalescence of gene copies in an ancestral species or population29. Such a phenomenon is often observed when species or population divergence is recent, which is the case for P. vivax/P. vivax-like30,31. ILS may thus lead to the wrong conclusion that P. vivax and P. vivax-like populations are intermixed and that P. vivax diversity is included within the diversity of P. vivax-like. In our study, the use of much more genetic information distributed throughout the genome, in both genic and intergenic regions, allows us to reduce this effect of ILS and gives a more accurate picture of the genetic relationship between the different parasite species. Another explanation for these contradictory results could be that the P. vivax-like samples analysed were collected only in Gabon or Cameroon, which limits access to the full genetic diversity of these parasites. For now, the origin, the direction of the transfer and the evolutionary history of these parasites remain unclear, and more P. vivax-like samples from other locations are needed to elucidate them.
Our results also show that P. vivax-like is composed of two distinct lineages: one including the two reference genomes (Pvl01 and Pvl06) and seven other isolates, hereafter referred to as P. vivax-like 1, and another including two isolates (Pvl09 and Pvl10), referred to as P. vivax-like 2 (Figure 3). These two lineages may reflect an ancient split within P. vivax-like or be the consequence of a recent introgression or hybridization event between P. vivax-like and P. vivax in Africa. Further analyses, including sequencing of more African P. vivax populations from different geographic areas, should be done to disentangle these two hypotheses.
Previous studies highlighted the high genetic diversity of P. vivax-like populations in comparison to P. vivax worldwide15. In this genome-wide analysis of the nucleotide diversity π32, we confirm that P. vivax-like populations are significantly more diverse than P. vivax populations (P<0.001, Wilcoxon test), with P. vivax-like samples showing eight times higher nucleotide diversity (πP.vivax = 0.0012; πP.vivax-like = 0.0096). This suggests that the P. vivax-like parasites of African great apes are probably more ancient than the human P. vivax strains and that human P. vivax, like other human Plasmodium species, went through a bottleneck and only recently underwent population expansion.
P. vivax specific adaptive evolution
Comparison of the P. vivax genome to its closest sister lineage (P. vivax-like) and to the other primate Plasmodium provides a unique opportunity to identify P. vivax specific adaptations to humans. We applied a branch-site test of positive selection to detect events of positive selection that occurred exclusively in the P. vivax lineage. Within the P. vivax-like reference genome (Pvl01), 418 genes exhibited significant signals of positive selection (Supplementary Table 3). In the human P. vivax genome PvP01, the test identified 255 genes showing significant signals of positive selection (Supplementary Table 4). Among these genes presenting a significant dN/dS ratio, 71 were shared between P. vivax and P. vivax-like, including 56 encoding proteins of unknown function and 15 encoding proteins involved in energy metabolism regulation (n = 9), chromatid segregation (n = 2) or cellular-based movement (n = 4).
We then took into consideration the genes detected under positive selection in P. falciparum13 and compared them to those obtained in P. vivax. We identified a subset of 10 genes under positive selection in both the human P. vivax and P. falciparum parasites (P-value<0.05). Among these 10 genes, five code for conserved Plasmodium proteins of unknown function and three for proteins involved in either transcription or transduction. Interestingly, the two remaining genes under positive selection in these two human Plasmodium parasites code for the oocyst capsule protein, which is essential for malaria parasite survival in the Anopheles midgut, and for the rhoptry protein ROP14, involved in protein maturation and host cell invasion. These results suggest that these proteins could be essential for infection of humans or their vectors, and future studies should focus on the involvement of these proteins in human parasite transmission and infection.
Conclusion
In summary, we assembled the first reference genomes of P. vivax-like, the closest sister clade to human P. vivax, which is an indispensable step in the development of a model system for a better understanding of this enigmatic species. We validated that P. vivax-like parasites form a genetically distinct clade from P. vivax. In relative terms, we estimated that the divergence between the two species occurred three times more recently than the split between P. ovale wallikeri and P. ovale curtisi, and 1.5 times earlier than the split between Plasmodium malariae and Plasmodium malariae-like. Our genome-wide analyses provided new insights into the adaptive evolution of P. vivax. Indeed, we identified several key genes that exhibit signatures of positive selection exclusively in the human P. vivax parasites, and showed that some gene families important for red blood cell invasion, such as the RBPs, have undergone species-specific evolution in the human parasite. Are these genes the key to the emergence of P. vivax in human populations? This open question will need to be answered through functional studies combined with deeper whole genome analyses. To conclude, this study provides the foundation for further investigations into traits of Plasmodium vivax that are of public health importance, such as features involved in host-parasite interactions, host specificity, and species-specific adaptations.
Material and methods
P. vivax-like sample collection and preparation
P. vivax-like samples were identified by molecular diagnostic testing during a continuous survey of great ape Plasmodium infections carried out in the Park of La Lékédi, in Gabon, by the Centre International de Recherches Médicales de Franceville (CIRMF)9. In parallel, a survey of Anopheles mosquitoes circulating in the same area was conducted in order to identify potential vectors of ape Plasmodium11. Specifically, mosquitoes were trapped with CDC light traps in the forest of the Park of La Lékédi. Anopheles specimens were retrieved and identified using a taxonomic key33 before their abdomens were dissected. Samples were then stored at −20 °C until transportation to the CIRMF, Gabon, where they were stored at −80 °C until processed. Blood samples of great apes were treated using leukocyte depletion by CF11 cellulose column filtration34. P. vivax-like samples were identified either by amplifying and sequencing the Plasmodium Cytochrome b (Cytb) gene as described in Ollomo et al. or directly from samples already studied for other Plasmodium species29,38. This allowed the detection of 11 P. vivax-like samples, 10 from chimpanzees and 1 from an Anopheles moucheti mosquito. Most of them were co-infected with other Plasmodium species, and/or probably with multiple P. vivax-like isolates (see below and Supplementary Table 1). Intraspecific P. vivax-like co-infections were identified by analyzing the distribution of the reference allele frequency35.
Ethical approval
These investigations were approved by the Government of the Republic of Gabon and by the Animal Life Administration of Libreville, Gabon (no. CITES 00956). All animal work was conducted according to relevant national and international guidelines.
Genome sequencing
DNA was extracted using Qiagen Midi extraction kits (Qiagen) following the manufacturer’s recommendations, and then enriched through a whole genome amplification step (WGA36). For the Illumina-sequenced isolates, standard Illumina libraries of 200-300 bp fragments and amplification-free libraries of 400-600 bp fragments were prepared and sequenced on the Illumina HiSeq 2500 and the MiSeq v2 according to the manufacturer's standard protocol (Supplementary Table 1). The Pvl06 isolate was sequenced using Pacific Biosciences technology with the C3/P5 chemistry after a size selection of 8 kb fragments. Raw sequence data are deposited in the European Nucleotide Archive. The accession numbers can be found in Supplementary Table 1.
Assembly of P. vivax-like genomes
Two P. vivax-like genomes (Pvl01 and Pvl06) were assembled from a co-infection with a P. malariae-like and a P. reichenowi (PmlGA01 sample in Rutledge et al. 2017)14 for Pvl01 and from a co-infection with P. gaboni for Pvl06 (PGABG01 sample in Otto et al.)37. Briefly, the genome assembly of the Illumina sequenced sample Pvl01 was performed using MaSuRCA38 and the assembled contigs belonging to P. vivax-like were extracted using a BLAST search against the P. vivax P01 reference genome (PvP01 genome; Auburn et al. 2016; http://www.genedb.org/Homepage/PvivaxP01)15. The draft assembly was further improved by iterative uses of SSPACE39, GapFiller40 and IMAGE41. The 3,540 contigs resulting from these analyses were then ordered against PvP01 genome and the P. gaboni and P. reichenowi reference genomes13 to separate possible co-infections with a parasite species of chimpanzees from the Laverania subgenus using ABACAS242. The genome assembly was further improved and annotated using the Companion web server16. BLAST searches of the unassembled contigs against the two reference genomes were performed before running Companion to keep the contigs with the best BLAST hits against PvP01 only. The PacBio assembly of Pvl06 was performed using Hierarchical Genome Assembly Process HGAP43.
Read mapping and alignment
Nine additional P. vivax-like samples were sequenced for population genomics and polymorphism analyses (see Supplementary Table 1). The dataset was completed with 19 globally sampled P. vivax isolates28 for human vs. great ape parasite comparisons, and the Asian parasite P. cynomolgi strain B was used as the root for phylogenetic inferences8. Illumina reads from the 11 newly generated P. vivax-like samples, the 19 already published P. vivax samples and the reference strain P. cynomolgi8 were mapped against the PvP01 reference genome using BWA44 with default parameters. We then used Samtools to keep only properly paired reads and to remove PCR duplicates45.
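The read filtering step can be illustrated with a minimal Python sketch of the underlying logic. It is not the Samtools procedure used in this study: it assumes that duplicates have already been flagged in the alignment records, and the function names (keep_read, filter_sam_lines) are hypothetical. The two flag bits come from the SAM format specification.

```python
# SAM flag bits as defined in the SAM format specification.
FLAG_PROPER_PAIR = 0x2   # read mapped in a proper pair
FLAG_DUPLICATE = 0x400   # read marked as a PCR or optical duplicate


def keep_read(flag: int) -> bool:
    """Return True for properly paired reads that are not marked as duplicates."""
    return bool(flag & FLAG_PROPER_PAIR) and not (flag & FLAG_DUPLICATE)


def filter_sam_lines(lines):
    """Yield header lines unchanged, plus alignment lines that pass keep_read.

    Assumption of this sketch: input is tab-separated SAM text in which
    duplicates were flagged by an upstream tool.
    """
    for line in lines:
        if line.startswith("@"):          # header line
            yield line
            continue
        flag = int(line.split("\t")[1])   # FLAG is the second SAM column
        if keep_read(flag):
            yield line
```

For example, a read with flag 99 (paired, proper pair, mate reverse strand, first in pair) is kept, whereas the same read with the duplicate bit set (flag 1123) is discarded.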
Gene family search
For each of the genomes considered (P. vivax-like Pvl01 and Pvl06, P. vivax PvP01 and SalI, P. cynomolgi B strain and P. knowlesi H strain), gene variants were detected and counted using Geneious software46.
Orthologous group determination and alignment
Orthologous groups across (1) the P. vivax PvP01, P. vivax-like Pvl01, P. cynomolgi B strain8 and P. knowlesi H strain17 reference genomes and (2) the 13 Plasmodium reference genomes used for the phylogeny (the seven Laverania genomes P. falciparum47, P. praefalciparum, P. reichenowi, P. billcollinsi, P. blacklocki, P. gaboni and P. adleri13, P. cynomolgi B strain and P. knowlesi H, P. vivax PvP01, P. vivax-like Pvl01 (this study), and P. malariae and P. malariae-like14) were identified using OrthoMCL v2.0948,49. From those, we extracted different sets of one-to-one orthologues for the subsequent analyses: a set of 4,056 genes that included the one-to-one orthologues among the four restricted species, P. vivax, P. vivax-like, P. cynomolgi and P. knowlesi, and a set of 2,352 among the 13 Plasmodium species considered here for the interspecies phylogenetic analysis.
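The extraction of one-to-one orthologues from a set of orthologous groups can be sketched as follows. This is a minimal illustration, not the script used in this study: the input layout (a dictionary mapping a group name to a list of (species, gene) pairs), the function name and the gene identifiers are all assumptions made for the example.

```python
from collections import Counter


def one_to_one_groups(groups, species):
    """Keep groups that contain exactly one gene from each requested species.

    groups  : dict mapping group name -> list of (species, gene_id) tuples
    species : iterable of species labels that must each appear exactly once
    Groups with a missing species, an extra species or a duplicated species
    are discarded.
    """
    wanted = set(species)
    kept = {}
    for name, members in groups.items():
        counts = Counter(sp for sp, _gene in members)
        if set(counts) == wanted and all(n == 1 for n in counts.values()):
            kept[name] = dict(members)   # species -> gene_id
    return kept


# Toy example with hypothetical gene identifiers.
toy_groups = {
    "OG1": [("Pv", "g1"), ("Pvl", "g2"), ("Pcy", "g3"), ("Pk", "g4")],
    "OG2": [("Pv", "g5"), ("Pv", "g6"), ("Pvl", "g7"), ("Pcy", "g8"), ("Pk", "g9")],
    "OG3": [("Pv", "g10"), ("Pvl", "g11")],
}
toy_result = one_to_one_groups(toy_groups, ["Pv", "Pvl", "Pcy", "Pk"])
```

In this toy input only OG1 is retained: OG2 contains two P. vivax copies and OG3 lacks two of the four species.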
Amino acid sequences of the one-to-one orthologues were aligned using MUSCLE50. Prior to aligning codon sequences, we removed the low complexity regions identified on a nucleotide level using dustmasker51 and then in amino acid sequences using segmasker 52 from ncbi-blast. After MUSCLE alignments50, we finally excluded poorly aligned codon regions using Gblocks default parameters53.
SNP discovery and annotation
SNPs were called independently for all 11 P. vivax-like and 19 P. vivax samples by first mapping the samples against the P. vivax PvP01 reference genome using SMALT and then calling SNPs using Samtools mpileup v. 0.1.9 (parameters -q 20 -Q 20 -C 50) followed by bcftools (call -c -V indels). SNPs were filtered using VCFTools (--minDP 5 --max-missing 1).
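The effect of the two filtering options can be illustrated with a short Python sketch. In VCFtools, --minDP treats genotypes with a read depth below the threshold as missing, and --max-missing 1 then removes every site that has any missing genotype. The data layout and the function name below are assumptions made for the example; the sketch does not parse VCF files and is not a replacement for VCFtools.

```python
def filter_sites(sites, min_dp=5):
    """Apply a depth mask followed by a no-missing-data site filter.

    sites : list of dicts mapping sample name -> (genotype or None, read depth)
    Genotypes supported by fewer than min_dp reads are masked as missing,
    then any site with at least one missing genotype is dropped.
    Returns a list of dicts mapping sample name -> genotype.
    """
    kept = []
    for site in sites:
        masked = {
            sample: (gt if gt is not None and depth >= min_dp else None)
            for sample, (gt, depth) in site.items()
        }
        if all(gt is not None for gt in masked.values()):
            kept.append(masked)
    return kept


# Toy example: two hypothetical samples, three sites.
toy_sites = [
    {"A": ("0", 10), "B": ("1", 7)},   # kept
    {"A": ("1", 3), "B": ("1", 9)},    # sample A below depth 5 -> site dropped
    {"A": (None, 0), "B": ("0", 12)},  # sample A uncalled -> site dropped
]
toy_kept = filter_sites(toy_sites)
```

Requiring complete data is a strict choice: a single low-coverage sample is enough to remove a site from the whole dataset.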
Divergence dating
To estimate the dates of speciation, we used 12 Plasmodium genomes: the P. vivax-like Pvl01 genome generated here, P. vivax PvP0115, P. cynomolgi M Version 254, P. coatneyi PcyM54, P. knowlesi H strain17, P. falciparum 3D747, P. reichenowi PrCDC37, P. gallinaceum55, and P. ovale wallikeri, P. ovale curtisi, P. malariae and P. malariae-like14. From the proteins of the 12 genomes, low complexity regions were excluded with the SEG filter, using default parameters56. After an all-against-all BLASTp (parameter Evalue 1e-6), OrthoMCL v.1.449 (using default parameters) was run. For each of the 2,943 one-to-one orthologues, an alignment was generated with MUSCLE50 and the alignment was finally cleaned with Gblocks (parameters: -t=p -b5=h -p=n -b4=2)57.
To build the phylogenetic tree, RAxML v.8.2.858 was run on the concatenated alignments of 1,000 randomly picked orthologues. The PROTGAMMALG substitution model was used, as proposed in Rutledge et al.14, and 100 bootstrap replicates confirmed the tree.
To date the speciation events, the method of Silva et al.27 was applied. The dAA was obtained through pairwise comparisons using PAML v.4.759. An R script from the authors of the method27 allowed the estimation of alpha, with its error bound, for each pair, based on a Total Least Squares regression. Results are reported in Figure 2. As fixed points, we used the relative distances of the split between the two P. ovale genomes and of the split between P. malariae-like and P. malariae14. The split of P. reichenowi and P. falciparum was also dated based on the P. malariae and P. malariae-like split estimation14. However, these estimates will need to be confirmed with other methods, because GC content could bias them. Indeed, the models used in our study assume a strict molecular clock, which would not apply to all Plasmodium species, specifically to P. falciparum because of its extreme GC content in comparison to other Plasmodium species.
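To make the regression step more concrete, the following Python sketch fits a Total Least Squares line to paired distances and returns its slope. It is only an illustration of the fitting technique and not the R script of Silva et al.27: forcing the line through the origin is an assumption of this sketch, error bounds are not computed, and the function name and the toy distances are hypothetical.

```python
import math


def tls_slope_through_origin(x, y):
    """Slope of the line y = a * x that minimises orthogonal distances.

    x, y : equal-length sequences of paired distances, for example the
           per-gene dAA of two species pairs (toy values in this sketch).
    Uses the closed-form solution
        a = (Syy - Sxx + sqrt((Syy - Sxx)^2 + 4 * Sxy^2)) / (2 * Sxy)
    with uncentred sums, because the line is forced through the origin.
    """
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("x and y must be non-empty and of equal length")
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    sxy = sum(a * b for a, b in zip(x, y))
    if sxy == 0:
        raise ValueError("slope is undefined when the cross-product sum is zero")
    diff = syy - sxx
    return (diff + math.sqrt(diff * diff + 4.0 * sxy * sxy)) / (2.0 * sxy)


# Toy distances in which the second pair is exactly twice as divergent.
toy_x = [0.1, 0.2, 0.3]
toy_y = [0.2, 0.4, 0.6]
toy_alpha = tls_slope_through_origin(toy_x, toy_y)
```

Unlike ordinary least squares, this fit treats both axes symmetrically: swapping x and y returns the reciprocal slope, which is a desirable property when both sets of distances carry estimation error.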
Phylogenetic tree of P. vivax and P. vivax-like strains
We constructed for Figure 3 a maximum-likelihood tree using the filtered variant call set of SNPs limited to the higher allelic frequency genotypes identified within each sample using RAxML and PhyML (using general-time reversible GTR models)58,60. Trees were visualized using Geneious software46. All approaches showed the same final phylogenetic tree described in the results section.
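Restricting each sample to its highest-frequency genotype can be sketched as follows. The sketch assumes that per-site read counts for each allele are available for every sample. The function names and the handling of ties and uncovered sites (returned as missing) are choices made for this example, not details given above.

```python
def major_allele(allele_counts):
    """Return the allele supported by the most reads at one site.

    allele_counts : dict mapping allele -> number of supporting reads
    Returns None when there are no reads or when the top two alleles tie
    (an assumption of this sketch).
    """
    if not allele_counts:
        return None
    ranked = sorted(allele_counts.items(), key=lambda kv: kv[1], reverse=True)
    if ranked[0][1] == 0:
        return None
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]


def major_genotypes(sample_sites):
    """Apply major_allele to every site of one sample."""
    return [major_allele(counts) for counts in sample_sites]


# Toy sample with three sites: a clear majority, a tie, and no coverage.
toy_sample = [{"A": 18, "G": 5}, {"C": 4, "T": 4}, {}]
toy_calls = major_genotypes(toy_sample)
```

In a sample carrying several co-infecting isolates, this keeps the signal of the dominant isolate at each site and leaves ambiguous sites uncalled.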
Genome wide nucleotide diversity
For the P. vivax and P. vivax-like populations, we calculated the genome-wide nucleotide diversity (π)32 using VCFTools61. The nucleotide diversity was compared between P. vivax and P. vivax-like species based on the Wilcoxon-Mann-Whitney non-parametric test.
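As an illustration of the quantities involved, the sketch below computes π as the average number of pairwise differences per site from a set of aligned sequences, and the Mann-Whitney U statistic for two groups of values. It is a toy implementation under stated assumptions (equal-length sequences, no missing data, hypothetical function names) and does not reproduce the VCFtools computation or provide a P-value.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Average number of pairwise differences per site among aligned sequences."""
    if len(seqs) < 2:
        raise ValueError("at least two sequences are required")
    length = len(seqs[0])
    if length == 0 or any(len(s) != length for s in seqs):
        raise ValueError("sequences must be non-empty and of equal length")
    pairs = list(combinations(seqs, 2))
    total = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return total / (len(pairs) * length)


def mann_whitney_u(x, y):
    """U statistic for group x versus group y (ties count one half)."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)


# Toy alignment: pairwise differences are 1, 2 and 1 over 4 sites.
toy_pi = nucleotide_diversity(["AAAA", "AAAT", "AATT"])

# Toy per-window diversity values for two hypothetical populations.
toy_u = mann_whitney_u([0.010, 0.020, 0.030], [0.001, 0.002])
```

With the genome-wide estimates reported in the Results (0.0096 and 0.0012), the ratio of the two diversities is 8.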
Detection of genes under selection
To identify genomic regions involved in the adaptation of the parasite to the human host, that is, regions under positive selection, we performed branch-site tests. To search for genes that have been subjected to positive selection in the P. vivax lineage alone, after the divergence from P. vivax-like, we used the updated branch-site test of positive selection62 implemented in the package PAML v4.4c59. This test detects sites that have undergone positive selection in a specific branch of the phylogenetic tree (foreground branch). All coding sequences in the core genome were used for the test, that is, the set of 4,056 orthologous groups between P. vivax, P. vivax-like, P. knowlesi and P. cynomolgi. dN/dS ratio estimates per branch and gene were obtained using Codeml (PAML v4.4c) with a free-ratio model of evolution59.
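The branch-site test is a likelihood ratio test: for each gene, the log-likelihood of a null model without positive selection on the foreground branch is compared with that of an alternative model allowing it. The Python sketch below shows how a P-value can be derived from two such log-likelihoods. It assumes a chi-square distribution with one degree of freedom as the reference distribution, and the function name and log-likelihood values are hypothetical. It is not the pipeline used in this study.

```python
import math


def lrt_pvalue(lnl_null, lnl_alt):
    """Likelihood ratio test with one degree of freedom.

    lnl_null : log-likelihood of the null model
    lnl_alt  : log-likelihood of the alternative model
    Returns (statistic, p_value). The statistic is 2 * (lnl_alt - lnl_null),
    floored at 0 to absorb small optimisation errors. For one degree of
    freedom, the chi-square survival function equals erfc(sqrt(stat / 2)).
    """
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical log-likelihoods for one gene.
toy_stat, toy_p = lrt_pvalue(-1000.0, -995.0)
```

When thousands of genes are tested, as here, the resulting P-values are usually corrected for multiple testing before genes are reported as significant.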
Data availability
All sequences are being submitted to the European Nucleotide Archive. The accession numbers of the raw reads and assembly data will be found in Supplementary Table 2. As the assemblies are private, they will be available on request.
Author contributions
DTO, FR, FP and VR designed the study. CA, PD, BO, NDM, APO, BN, BM, LB, CP, FP and VR collected and assessed samples. CA performed the WGA. TDO managed the sequencing. AG, BF and TDO did assembly and annotation. TDO, VR and FP performed the evolutionary analyses on core genomes. TDO and GGR performed the dating analyses. AG, TDO, FR and VR wrote the manuscript. All authors read and approved the paper.
Competing financial interest
None.
Acknowledgements
Authors thank ANR ORIGIN JCJC 2012, LMI ZOFAC, CNRS-INEE, CIRMF, IRD, Sanger Institute for financial support and Société d’Exploitation du Parc de la Lékédi, Bakoumba, GABON. GGR is supported by the Medical Research Council (MR/J004111/1) and the Wellcome Trust (098051).