Abstract
Malaria vectors are exposed to intense selective pressures due to large-scale intervention programs that are underway in most African countries. One of the current priorities is therefore to clearly assess the adaptive potential of Anopheline populations, which is critical to understand and anticipate the response mosquitoes can mount against such adaptive challenges. The development of genomic resources that will empower robust examinations of evolutionary changes in all vectors, including currently understudied species, is an inevitable step toward this goal. Here we constructed double-digest Restriction-site Associated DNA (ddRAD) libraries and generated 6461 Single Nucleotide Polymorphisms (SNPs) that we used to explore the population structure and demographic history of wild-caught Anopheles moucheti from Cameroon. The genome-wide distribution of allelic frequencies among samples best fitted that of an old population at equilibrium, characterized by a weak genetic structure and extensive genetic diversity, presumably due to a large long-term effective population size. Estimates of FST and Linkage Disequilibrium (LD) across SNPs reveal a very low genetic differentiation throughout the genome and the absence of segregating LD blocks among populations, suggesting an overall lack of local adaptation. Our study provides the first investigation of the genetic structure and diversity in An. moucheti at the genomic scale. We conclude that, despite a weak genetic structure, this species has the potential to challenge current vector control measures and other rapid anthropogenic and environmental changes thanks to its great genetic diversity.
1. Introduction
Despite having a widely acknowledged epidemiological significance, most African malaria mosquitoes are so-called “neglected vectors” because the efforts devoted to their study and control are clearly insufficient. Anopheles moucheti sensu lato is one of the best examples. This mosquito vector is a group of three related species (An. moucheti moucheti, An. moucheti nigeriensis, and An. moucheti bervoetsi) distributed across the equatorial forest and distinguishable from each other by slight morphological differences (Kengne et al., 2007). The nominal species of the group, An. moucheti moucheti (hereafter An. moucheti), is a very efficient and anthropophilic vector, especially in rural areas where the highest malaria burden due to Plasmodium falciparum infections is recorded (Antonio-Nkondjio et al., 2009, 2008, 2002). In such settings, abundant populations of An. moucheti breed year-round in slow-moving streams and rivers and often outcompete other main malaria mosquitoes. Despite this epidemiological significance, the evolutionary history and the adaptive potential of this vector remain understudied. Early investigations of the genetic structure based on allozymes and microsatellites showed a significant genetic differentiation among samples from three different countries (Antonio-Nkondjio et al., 2008), but detected little divergence within populations from the same country (Antonio-Nkondjio et al., 2007, 2002). Specifically, very low levels of genetic differentiation were found between populations from Cameroon across eight microsatellite loci, suggesting extensive gene flow at such geographic scales, but detailed studies in other countries are still lacking to fully support this hypothesis. On the other hand, African anopheline populations are increasingly exposed to strong selective pressures associated with insecticide-based malaria control campaigns that have been recently intensified (World Health Organization, 2013).
Such pressures represent particularly efficient driving forces that often contribute to the rapid diversification of vector populations in a few decades (Clarkson et al., 2014; Kamdem et al., Unpublished data a; Norris et al., 2015). As a result, a detailed characterization of the genomic architecture of all vectors is important for a critical appraisal of the impacts of malaria control efforts. In this framework, we set out to perform the first genome-wide investigation of natural polymorphism in An. moucheti. One of our main goals was to know to what extent assessing the genetic diversity could provide clues about the spatial distribution and help predict the environmental resilience of this species. In principle, evolutionary responses of species to human-induced or natural changes rely largely on available heritable variation, which reflects the evolutionary potential and adaptability to novel environments (Orr and Unckless, 2008). Therefore, the screening of genome-wide variation is supposed to be a sensible approach that may provide a generalized measure of evolutionary potential in species like An. moucheti for which direct ecological, evolutionary or functional tests are impossible (Harrisson et al., 2014).
Thanks to recent progress in sequencing technology, high-resolution sequence information can be generated for virtually any living organism. These technological advances are extraordinarily helpful for non-model species with limited genomic resources like mosquitoes (Ellegren, 2014). However, with the exception of Anopheles gambiae, for which significant genomic studies have been carried out using high-quality sequencing data (Fontaine et al., 2015; Kamdem et al., Unpublished data a; O’Loughlin et al., 2014), the other African malaria vectors have yet to fully benefit from the explosive growth of methods for assessing genetic variation at a fine scale. These neglected vectors face a vicious cycle whereby the lack of basic genomic resources that are critical to generate high-quality sequencing information and to enable robust interpretations of natural polymorphisms greatly contributes to their marginalization. One typical example is An. moucheti, which lacks all the vital resources, including a laboratory strain, a reference genome assembly, and a physical or linkage map.
To start filling this gap and to shed some light on the evolutionary history and adaptive potential of this vector, we performed high-throughput sequencing of reduced representation libraries from 98 wild-caught individuals from Cameroon and identified thousands of RAD loci scattered throughout the genome. Using high-quality Single Nucleotide Polymorphisms (SNPs) identified within these loci, we investigated the genetic structure of populations and scanned the genomes of our samples to detect footprints of local adaptation and natural selection. We found that, in our study zone, populations of An. moucheti are characterized by a great genetic diversity and extensive gene flow. We argue that this vector is particularly well equipped to withstand the selective pressures imposed by vector control and rapid environmental modifications.
2. Material and methods
2.1 Mosquito sampling and sequencing
This study included two An. moucheti populations from the Cameroonian equatorial forest. A total of 98 mosquitoes (97 adults and 1 larva) were collected in August and November 2013 from Olama and Nyabessan, respectively (Table 1). The two locations are separated by ~200 km (Fig. 1A) and are crossed respectively by the Nyong and the Ntem rivers, which provide the breeding sites for An. moucheti larvae. Specimens were identified as An. moucheti moucheti using morphological identification keys (Gillies and Coetzee, 1987; Gillies and De Meillon, 1968) and a diagnostic PCR, which targets mutations on the ribosomal DNA (Kengne et al., 2007). We extracted genomic DNA using the DNeasy Blood and Tissue kit (Qiagen) for larvae and the Zymo Research MinPrep kit for adult mosquitoes. We used 10 µl (~50 ng) of genomic DNA to prepare double-digest Restriction-site Associated DNA libraries following a modified protocol of Peterson et al. (2012). MluCI and NlaIII restriction enzymes were used to digest DNA of individual mosquitoes, yielding RAD-tags of different sizes to which short unique DNA sequences (barcodes and adaptors) were ligated to enable the identification of reads belonging to each specimen. The digestion products were purified and pooled. DNA fragments of around 400 bp were selected and amplified via PCR. The distribution of fragment sizes was checked on a BioAnalyzer (Agilent Technologies, Inc., USA) before sequencing. The sequencing was performed on an Illumina HiSeq2000 platform (Illumina Inc., USA) (Genomic Core Facility, University of California, Riverside) to yield single-end reads of 101 bp.
2.2 SNP discovery and genotyping
We used the bioinformatics pipeline Stacks v1.35 (Catchen et al., 2013) to process Illumina short reads. The program process_radtags was first used to sort the reads according to the barcodes and to trim all reads to 96bp in length by removing index and barcode sequences from the ends of the reads. Reads with ambiguous barcodes, those that did not contain the NlaIII recognition site and those with low-quality scores (average Phred score < 33) were excluded. The program ustacks was then utilized to perform a de novo assembly (i.e., the assembly of reads in “stacks” enabling the creation of consensus RAD loci without prior alignment onto a reference genome sequence) (Catchen et al., 2013, 2011) in each individual in our populations. We allowed a maximum of 2 nucleotide mismatches between stacks (M parameter in ustacks) and we required a minimum of three reads to create a stack (m parameter in ustacks). Using the cstacks program, a catalogue of loci was built to synchronize variations across all individuals in our populations. Finally, we utilized the populations program to calculate population genetic parameters and output SNPs in different formats. To avoid bias associated with less informative SNPs or possible false positive SNPs (due to sequencing or pipeline errors), only RAD loci scored in at least 70-75% of individuals were retained for further analyses.
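The final filtering step can be illustrated with a minimal Python sketch. It is not the Stacks implementation: the input layout (a dictionary of per-individual genotype calls with `None` for missing data), the function name `filter_loci_by_call_rate` and the locus names are hypothetical, and for simplicity the threshold is applied across all individuals rather than within each population.

```python
from typing import Dict, List, Optional

# Hypothetical input layout: locus ID -> one genotype call per individual,
# with None marking an individual in which the locus was not scored.
Genotypes = Dict[str, List[Optional[int]]]


def filter_loci_by_call_rate(genotypes: Genotypes,
                             min_call_rate: float = 0.75) -> Genotypes:
    """Keep only loci scored in at least `min_call_rate` of individuals."""
    kept: Genotypes = {}
    for locus, calls in genotypes.items():
        if not calls:
            continue  # no individuals recorded for this locus
        call_rate = sum(c is not None for c in calls) / len(calls)
        if call_rate >= min_call_rate:
            kept[locus] = calls
    return kept


# Toy data (illustrative only, not taken from the study).
example = {
    "locus_1": [0, 1, 2, None],     # scored in 3 of 4 individuals
    "locus_2": [0, None, None, 1],  # scored in 2 of 4 individuals
}
filtered = filter_loci_by_call_rate(example, min_call_rate=0.75)
```

With the toy data above, only `locus_1` passes the 75% threshold.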
2.3 Population genomic analyses
SNP files output by the populations program were used to assess the population genetic structure with a Principal Component Analysis (PCA) and a Neighbor-Joining (NJ) tree analysis using the R packages adegenet and ape, respectively (Jombart, 2008; Paradis et al., 2004; R Development Core Team, 2008). We also explored patterns of ancestry and admixture among individuals in ADMIXTURE v1.23 (Alexander et al., 2009) with 10-fold cross-validation for k assumed ancestral populations (k = 1 through 6). The optimal number of clusters was confirmed using the Discriminant Analysis of Principal Components (DAPC) method, which explores the number of genetically distinct groups by running a k-means clustering sequentially with increasing numbers of clusters and by comparing different clustering solutions using the Bayesian Information Criterion (BIC) (Jombart, 2008). We examined the population genetic diversity, conformity to Hardy-Weinberg equilibrium and demographic background using several statistics calculated with the populations program. Specifically, to assess the global genetic diversity per population, we calculated the overall nucleotide diversity (π) and the frequency of polymorphic sites within each population. To make inferences on the demographic history and to test for departures from Hardy-Weinberg equilibrium, we used the allele frequency spectrum and Wright’s inbreeding coefficient (FIS). To quantify the geographic and genetic differentiation between allopatric populations, we estimated the genome-wide average FST (Weir and Cockerham, 1984) on 2000 randomly selected SNPs in Genodive v1.06 (Meirmans and Van Tienderen, 2004). We also conducted a hierarchical Analysis of Molecular Variance (AMOVA) (Excoffier et al., 1992) on the same SNP set to quantify the effects of the geographic origin on the genetic variance among individuals. The statistical significance of FST and AMOVA was assessed with 10000 permutations.
Finally, to have a detailed picture of the genomic architecture of divergence, we inspected the genome-wide distribution of locus-specific estimates of FST.
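To make the notion of a locus-specific FST concrete, the following Python sketch computes a basic heterozygosity-based FST, (HT - HS) / HT, for one biallelic SNP from the allele frequencies in two populations. This is a deliberately simplified stand-in and not the Weir and Cockerham (1984) estimator used in the study: it weights both populations equally and applies no sample-size correction. The function name `fst_two_pops` is hypothetical.

```python
def fst_two_pops(p1: float, p2: float) -> float:
    """Heterozygosity-based FST for one biallelic SNP in two populations.

    p1 and p2 are the frequencies of the same allele in each population.
    Simplified illustration: equal population weights, no correction for
    sample size (unlike the Weir and Cockerham estimator).
    """
    for p in (p1, p2):
        if not 0.0 <= p <= 1.0:
            raise ValueError("allele frequencies must lie in [0, 1]")
    p_bar = (p1 + p2) / 2.0
    h_total = 2.0 * p_bar * (1.0 - p_bar)                            # HT
    h_within = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0  # HS
    if h_total == 0.0:
        return 0.0  # SNP is monomorphic overall: no differentiation to measure
    return (h_total - h_within) / h_total


# Toy frequencies (illustrative only).
fst_identical = fst_two_pops(0.5, 0.5)   # same frequency in both populations
fst_fixed = fst_two_pops(0.0, 1.0)       # alternative alleles fixed
fst_mild = fst_two_pops(0.4, 0.6)        # modest frequency difference
```

Identical frequencies give an FST of zero, fixed alternative alleles give one, and a modest frequency difference such as 0.4 versus 0.6 gives a value of about 0.04.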
2.4 Identification of segregating polymorphic chromosomal inversions
In structured Anopheles populations whose ecological/genetic divergence is due to polymorphic chromosomal inversions, high values of FST are expected between divergent populations within inversion loci, a pattern consistent with local adaptation of alternative karyotypes (Ayala and Coluzzi, 2005). This is the case for most populations of the main African malaria vectors An. funestus and An. gambiae, which depict multiple inversion clines in nature (Ayala et al., 2011; Fouet et al., Unpublished data; Kamdem et al., Unpublished data a; O’Loughlin et al., 2014). In addition to scanning genomes of our individuals to identify outlier values of FST that are indicators of selection and local adaptation, we also used Linkage Disequilibrium (LD) analysis to search for the presence of LD blocks corresponding to putative inversion polymorphisms. LD (the nonrandom association of alleles at different loci) provides information about past events and is affected by local adaptation and geographical structure, the demographic history, or the magnitude of selection and recombination across the genome (Lewontin and Kojima, 1960). Notably, high LD is expected in regions bearing inversions relative to the rest of the genome because the neutral recombination rate is notoriously reduced within inversions (Kirkpatrick and Barton, 2006). Thus, assessing genome-wide patterns of LD can reveal clusters of strongly correlated SNPs (LD blocks) corresponding potentially to chromosomal inversions. The R package LDna (Kemppainen et al., 2015) allows the examination of the distinct LD network clusters within the genome of non-model species without the need of a linkage map or reference genome. We have calculated LD, estimated as the r2 correlation coefficient between all pairs of SNPs, in PLINK v1.09 (Purcell et al., 2007). 
To avoid spurious LD due to the strong correlation between SNPs located on the same RAD locus, we randomly selected only one SNP within each RAD locus resulting in a dataset of 2569 variants containing less than 15% missing data. LDna was then used to identify LD blocks whose population genetic structure was examined with a PCA.
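The two steps described above, thinning to one SNP per RAD locus and computing pairwise r2, can be sketched in Python as follows. The sketch treats r2 as the squared Pearson correlation between genotype allele counts (0, 1 or 2 copies of an allele per individual); whether this matches the exact PLINK settings used in the study is an assumption. The function names `r_squared` and `one_snp_per_locus`, and the locus and SNP identifiers, are hypothetical.

```python
import random
from math import fsum
from typing import Dict, List, Sequence


def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """Squared Pearson correlation between two vectors of allele counts."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two vectors of equal length >= 2")
    mean_x = fsum(x) / n
    mean_y = fsum(y) / n
    sxx = fsum((a - mean_x) ** 2 for a in x)
    syy = fsum((b - mean_y) ** 2 for b in y)
    sxy = fsum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0.0 or syy == 0.0:
        return 0.0  # a monomorphic SNP carries no LD information
    return (sxy * sxy) / (sxx * syy)


def one_snp_per_locus(snps_by_locus: Dict[str, List[str]],
                      seed: int = 0) -> List[str]:
    """Randomly pick a single SNP ID from each RAD locus (seeded for repeatability)."""
    rng = random.Random(seed)
    return [rng.choice(snps)
            for _, snps in sorted(snps_by_locus.items()) if snps]


# Toy data (illustrative only).
perfect_ld = r_squared([0, 1, 2, 1], [0, 1, 2, 1])
no_ld = r_squared([0, 2, 0, 2], [0, 0, 2, 2])
picked = one_snp_per_locus({"locA": ["a1", "a2"], "locB": ["b1"]})
```

Two SNPs with identical genotype vectors give r2 = 1, whereas the uncorrelated toy vectors give r2 = 0.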
3. Results
3.1 De novo assembly
In total, 518,218 unique 96-bp RAD loci were identified from de novo assembly of reads in 98 individuals. We retained 946 loci that were present in all sampled populations and in at least 75% of individuals in every population, and we identified 3027 high-quality biallelic SNPs from these loci.
3.2 Population genetic structure
First, we tested for the presence of cryptic genetic subdivision within An. moucheti with PCA, NJ trees and the ADMIXTURE ancestry model. An NJ tree constructed from a matrix of Euclidean distances using allele frequencies at 3027 genome-wide SNPs showed a putative subdivision of An. moucheti populations into two genetic clusters (Fig. S1A). The first three axes of PCA also revealed a number of outlier individuals separated from a main cluster (Fig. S1B). However, when we ranked our sequenced individuals based on the number of sequencing reads, we noticed that one of the putative genetic clusters corresponded to a group of individuals having the lowest sequencing coverage (Fig. S1 and Table S1). We excluded all these individuals and reduced our dataset to 78 individuals. We conducted a new de novo assembly and analyzed the relationship between the 78 remaining individuals at 6461 SNPs present in at least 70% of individuals using PCA, NJ trees and ADMIXTURE. Both the k-means clustering (DAPC) and the variation of the cross-validation error as a function of the number of ancestral populations in ADMIXTURE revealed that the polymorphism of An. moucheti resulted from only one ancestral population (k = 1) (Fig. 1B and 1C). PCA and NJ depicted a homogeneous cluster comprising all 78 individuals, providing additional evidence of the lack of genetic or geographic structuring among populations (Fig. 1D and 1E). Unsurprisingly, the overall FST was remarkably low between populations from the two sampling locations Olama and Nyabessan (FST = 0.008, p < 0.005). Similarly, the distribution of FST values across 6461 SNPs showed a large dominance of very low FST values throughout the genome (Fig. 2). The highest per-locus FST was only 0.126, while 5006 of the 6461 loci revealed FST near zero. The modest geographic differentiation was also well illustrated by a hierarchical AMOVA, which showed that the genetic variance was explained essentially by within-individual variations (99.7%).
Finally, we found very low overall Wright’s inbreeding coefficient (FIS= 0.0014, p < 0.005 in Nyabessan and FIS = 0.0025, p < 0.005 in Olama) (Table 2) suggesting that allelic frequencies within both populations were in accordance with proportions expected under the Hardy-Weinberg equilibrium.
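For readers unfamiliar with FIS, the quantity can be sketched for a single biallelic SNP as 1 - Ho/He, where Ho is the observed proportion of heterozygotes and He = 2pq is the proportion expected under Hardy-Weinberg equilibrium. The Python sketch below uses this textbook form; the estimator implemented in the populations program may include corrections that are not reproduced here, and the function name `fis_from_counts` is hypothetical.

```python
def fis_from_counts(n_hom_ref: int, n_het: int, n_hom_alt: int) -> float:
    """Inbreeding coefficient FIS = 1 - Ho/He for one biallelic SNP.

    Arguments are genotype counts in one population. Textbook form with
    no sample-size correction (an assumption of this sketch).
    """
    n = n_hom_ref + n_het + n_hom_alt
    if n == 0:
        raise ValueError("no genotyped individuals")
    p = (2 * n_hom_ref + n_het) / (2 * n)   # frequency of the reference allele
    h_exp = 2.0 * p * (1.0 - p)             # He under Hardy-Weinberg
    if h_exp == 0.0:
        return 0.0                          # monomorphic SNP
    h_obs = n_het / n                       # Ho
    return 1.0 - h_obs / h_exp


# Toy genotype counts (illustrative only).
fis_hwe = fis_from_counts(25, 50, 25)       # exact Hardy-Weinberg proportions
fis_no_het = fis_from_counts(50, 0, 50)     # complete heterozygote deficit
fis_all_het = fis_from_counts(0, 100, 0)    # heterozygote excess
```

Genotype counts in exact Hardy-Weinberg proportions give FIS = 0, a complete lack of heterozygotes gives FIS = 1, and an excess of heterozygotes gives negative values.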
3.3 Genetic diversity and demographic history
The estimates of the overall nucleotide diversity (π = 0.0020 and π = 0.0016, respectively, in Olama and Nyabessan) (Table 2) were within the range of average values found in other African Anopheles species using RADseq approaches (Fouet et al., Unpublished data; Kamdem et al., Unpublished data (a, b); O’Loughlin et al., 2014). Notable demographic expansions have been described in natural populations of this insect clade (Donnelly et al., 2001), and the values of π observed in An. moucheti likely reflect the level of genetic diversity of a population with large effective size. The great genetic diversity of An. moucheti was also illustrated by the percentage of polymorphic sites. Of the 6461 variant sites, 89.60% were polymorphic in Olama and 34.82% in Nyabessan (Table 2). The difference observed between the two locations can be related to the sample size (n = 19 in Nyabessan and n = 59 in Olama) or to demographic particularities that persist between the two geographic sites despite a massive gene flow. To infer the demographic history of An. moucheti, we examined the Allele Frequency Spectrum (AFS), summarized as the distribution of the major allele in one population. This approach was a surrogate for model-based methods that provide powerful examinations of the history of genetic diversity by modeling the AFS at genome-wide SNP variants, but that could not be implemented here due to the lack of a reference genome assembly. The frequency distribution of the major allele p (Fig. 3) indicates that the majority of polymorphic loci are highly frequent in Olama and Nyabessan, as shown by the predominance of SNPs at frequencies equal to 1. Ranges of allele frequencies are also similar in both locations (between 0.47 and 1 in Olama and between 0.34 and 1 in Nyabessan). These frequency ranges are expected for old populations at equilibrium capable of accumulating a high amount of genetic diversity.
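The two per-site summaries used in this section can be sketched in Python as follows. Nucleotide diversity at a biallelic site is computed as the probability that two distinct sampled chromosomes carry different alleles, and the major allele frequency is the frequency of the more common allele. Both functions take allele counts, that is, numbers of sampled chromosomes carrying each allele; the function names are hypothetical, and the sketch is not claimed to reproduce the exact output of the populations program.

```python
def pi_site(count_a: int, count_b: int) -> float:
    """Per-site nucleotide diversity for a biallelic site.

    Probability that two distinct chromosomes drawn without replacement
    carry different alleles: 1 - sum_i C(n_i, 2) / C(n, 2).
    """
    n = count_a + count_b
    if n < 2:
        raise ValueError("need at least two sampled chromosomes")
    same = count_a * (count_a - 1) + count_b * (count_b - 1)
    return 1.0 - same / (n * (n - 1))


def major_allele_frequency(count_a: int, count_b: int) -> float:
    """Frequency of the more common of the two alleles at a site."""
    n = count_a + count_b
    if n == 0:
        raise ValueError("no sampled chromosomes")
    return max(count_a, count_b) / n


# Toy allele counts (illustrative only).
pi_balanced = pi_site(2, 2)                # two copies of each allele
pi_fixed = pi_site(4, 0)                   # monomorphic site
maf_example = major_allele_frequency(2, 6)
```

A site that is fixed within a population has π = 0 and a major allele frequency of 1, which is how SNPs that are variable in the pooled dataset but monomorphic in one location appear in the major allele frequency distribution.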
3.4 Polymorphic chromosomal inversions and local adaptation
When paracentric inversions are involved in local adaptation, high values of genetic divergence are often observed within inversion loci in natural populations. Cytogenetic analyses of the polytene chromosomes of An. moucheti have identified three polymorphic chromosomal inversions within samples collected from the sites we have studied (Sharakhova et al., 2014). However, the weak overall population structure and the very low FST values we have detected throughout the genome are clear indicators of the absence of local adaptation. Interestingly, this finding also suggests that none of the polymorphic inversions described previously is actually segregating among our samples, as high values of FST are absent even within inversion loci. We provided further support for this hypothesis by performing LD analyses. First, we found a globally low LD in the An. moucheti genome (average genome-wide r2 = 0.0149), as expected in highly polymorphic populations with large effective size. We next used LDna to cluster the LD values and to identify Single Outlier Clusters (SOCs) that can be associated with distinct or multiple evolutionary phenomena in the An. moucheti history. We set the parameters to collect and screen a high number of SOCs using 2569 highly filtered SNPs, which allowed us to identify 20 independent LD blocks in our samples (Fig. 4). In principle, when these blocks are associated with important events in the evolutionary history of a species, downstream analyses can reveal clear patterns reflecting the underlying process (Kemppainen et al., 2015). This has been illustrated, for example, by studies demonstrating that SNPs within SOCs generated by polymorphic inversions in Anopheles baimaii clearly separate the three expected karyotypes (inverted homozygotes, heterozygotes and standard homozygotes) (Kemppainen et al., 2015). We conducted downstream analyses with a PCA using SNPs identified within the SOCs.
As shown in Fig. S2, although individuals were occasionally spread along three PCA axes, no distinct cluster could be identified from any of the 20 SOCs. These results were consistent with the absence of segregating inversions and local adaptation in our samples and corroborated the low FST values observed throughout the genome. Specifically, in our data, we could not identify polymorphic inversions whose karyotype frequencies change between Olama and Nyabessan due to differential adaptation between the two sites. Some of the different SOCs identified can be associated with other processes that were not captured by our analytical approach; others are probably methodological artifacts associated with the LDna pipeline (Kemppainen et al., 2015).
4. Discussion
We have analyzed the genome-wide polymorphism and characterized some of the baseline population genomic parameters in An. moucheti, an important malaria vector in rural areas across the African rainforest. We found very little differentiation among our samples, with most of the genetic variation distributed within individuals. Although a more substantial sampling will be necessary to fully dissect the population genetic structure of this species, our finding likely reflects the current dynamics of An. moucheti populations in Cameroon. It is worth mentioning that we have surveyed a total of 28 locations across the country (Fig. 1A), some of which were known from several past surveys to harbor An. moucheti populations (Antonio-Nkondjio et al., 2013, 2009, 2008, 2006, 2002; Kengne et al., 2007), but we confirmed the presence of the species in only 2 villages. Extant populations of An. moucheti are distributed in patches of favorable habitats along river networks where larval populations breed. Our results indicate that despite this apparent fragmentation, connectivity and gene flow are high among population aggregates. The weak population genetic structure of An. moucheti observed with genome-wide markers corroborated results obtained with microsatellites and allozymes (Antonio-Nkondjio et al., 2008, 2002). A survey of eight microsatellite loci revealed that the highest FST among Cameroonian populations was as low as 0.003. Nevertheless, a substantial differentiation was found between samples from different countries, consistent with an isolation-by-distance model (Antonio-Nkondjio et al., 2008). It is clear that a deep sequencing of continental populations is necessary to further clarify the status of these putative subpopulations. However, samples collected at lower spatial scales like ours are also very relevant as they can allow robust inferences about ongoing selective processes that cannot be captured at continental scale.
Although RADseq samples only a small fraction of the genome and certain signatures of selection are likely missing when reduced representation sequencing approaches are used, it has been shown that such approaches can effectively capture strong footprints of selection across genomes of Anopheles mosquitoes (Fouet et al., Unpublished data; Kamdem et al., Unpublished data a). We have found that signatures of selection are rare in the genome of An. moucheti populations from the Cameroonian rainforest. Populations remain largely undifferentiated throughout the genome, with FST values near zero across the vast majority of variants, suggesting that no local adaptation is ongoing. This perception is further supported by the absence of segregating linkage disequilibrium blocks between geographic locations. The characterization of chromosomal inversions with cytogenetic methods can be laborious and challenging (Kirkpatrick, 2010; Sharakhova et al., 2014). So far, three paracentric polymorphic inversions have been discovered in An. moucheti in Cameroon (Sharakhova et al., 2014). The ecological, behavioral or functional roles of these inversion polymorphisms remain unknown. We have implemented a recently designed method that uses Next Generation Sequencing and LD estimates to indirectly identify paracentric inversions whose karyotype frequencies vary among populations due to local adaptation (Kemppainen et al., 2015). Our LD analyses revealed the presence of a few LD clusters that are, however, not associated with inversions. On the other hand, the low overall LD observed across the genome reflected the significant genetic polymorphism that seems to prevail within An. moucheti populations.
This polymorphism translates into exceptional levels of overall genetic diversity and a very high percentage of polymorphic sites that are in the range of values observed in other mosquito species undergoing significant demographic expansions (Donnelly et al., 2001; Fouet et al., Unpublished data; Kamdem et al., Unpublished data a). The amount of neutral genetic diversity is often viewed as a correlate of the adaptive potential of a species (Orr and Unckless, 2008). Although the relationship is more complex in reality, estimates of neutral genetic diversity are commonly used in conservation biology as an intuitive conceptual and management framework to assess the genetic resilience of endangered species (Bonin et al., 2007; Latta IV et al., 2010). Our population genomic analyses have depicted An. moucheti as a species with a great genetic diversity and hence a sustainable long-term adaptive resilience. The implications of our findings for malaria epidemiology and control can be very significant. First, An. moucheti is essentially endophilic and is particularly sensitive to the principal measures currently employed to control malaria in Sub-Saharan Africa, such as the massive use of Insecticide Treated Nets (ITNs) and Indoor Residual insecticide Spraying (IRS). For example, estimates of effective population size in one village in Equatorial Guinea indicated that both mass distribution of ITNs and IRS campaigns resulted in a decline of approximately 55% of An. moucheti (Athrey et al., 2012). However, the great genetic diversity and the massive gene flow we observed within populations could easily enable this vector to withstand population declines and recover from shallow bottlenecks. Moreover, most insecticide resistance mechanisms found in insects exploit standing genetic variation to rapidly respond to the evolutionary challenge by increasing the frequency of existing variants rather than relying on infrequent de novo mutations (Messer and Petrov, 2013).
As a result, despite the current sensitivity of An. moucheti to common insecticides, the significant amount of standing genetic variation provides the species with a great potential to challenge insecticide-based interventions and other types of human-induced stress.
5. Conclusions
Recent advances in sequencing allow sensitive genomic data to be generated for virtually any species (Ellegren, 2014). However, the most important information we can obtain from population resequencing approaches often depends on the availability and the quality of genomic resources such as a well-annotated reference genome. The reduced genome sequencing strategy (RADseq) offers a cost-effective strategy that can be used to effectively study the genetic variation in a broad range of species from yeast to plants, insects, etc., in the absence of a reference genome. We have extended this approach to the study of the genetic structure of an understudied mosquito species with a great epidemiological significance. We have provided both significant baseline population genomic data and the methodological validation of one approach that should motivate further studies on this species and other understudied anopheline mosquitoes lacking genomic resources.
Author contributions
Conceived and designed the experiments: CF CK BJW. Performed the experiments: CF CK SG BJW. Analyzed the data: CF CK BJW. Wrote the paper: CF CK BJW.
Competing interests
The authors declare that they have no competing interests.
Acknowledgements
Funding for this project was provided by the University of California Riverside and NIH grants 1R01AI113248 and 1R21AI115271 to BJW. We thank populations and authorities of the locations surveyed for their kind collaboration. We thank the anonymous reviewers for their careful reading of our manuscript and their many insightful comments and suggestions.
Footnotes
Email address: caroline.fouet{at}ucr.edu; bwhite{at}ucr.edu