Abstract
Urbanization presents unique environmental challenges to human commensal species. The Afrotropical Anopheles gambiae complex contains a number of synanthropic mosquito species that are major vectors of malaria. To examine ongoing cryptic diversification within the complex, we performed reduced representation sequencing on 941 mosquitoes collected across four ecogeographic zones in Cameroon. We find evidence for clear subdivision within An. coluzzii and An. gambiae s.s. – the two most significant malaria vectors in the region. Importantly, in both species rural and urban populations of mosquitoes were genetically differentiated. Genome scans of cryptic subgroups reveal pervasive signatures of selection centered on genes involved in xenobiotic resistance. Notably, a selective sweep containing eight detoxification enzymes is unique to urban mosquitoes that exploit polluted breeding sites. Overall, our study reveals that anthropogenic environmental modification is driving population differentiation and local adaptation in African malaria mosquitoes with potentially significant consequences for malaria epidemiology.
INTRODUCTION
Natural selection can drive local adaptation, increasing mean individual fitness and promoting biological diversification (Nosil, et al. 2005; Hereford 2009). Contemporary anthropogenic alteration of the landscape may be increasing pressure for local adaptation in diverse taxa (Gaston 2010). For example, the rise of urban centers over the past two centuries has presented unique challenges to human-commensal species, often necessitating the rapid evolution of resistance to pollutants and pesticides (Pelz, et al. 2005; Song, et al. 2011; Davies, et al. 2012). Successful local adaptation requires that selection overcome the homogenizing effect of gene flow from nearby populations (Kawecki and Ebert 2004). Theoretical simulations suggest that such divergence with gene flow can occur under a range of conditions (Berdahl, et al. 2015), although the most likely genomic distribution of the underlying adaptive variants remains unclear (Le Corre and Kremer 2012; Tiffin and Ross-Ibarra 2014). Studies of populations in the earliest stages of ecological divergence should help elucidate the conditions needed for local adaptation and relevant targets of natural selection (Feder, et al. 2013).
The Afrotropical Anopheles gambiae complex is a group of at least nine isomorphic mosquito species exhibiting varying degrees of geographic and reproductive isolation (White, et al. 2011; Coetzee, et al. 2013; Crawford, et al. 2014). Owing to the critical role its members play in sustaining malaria transmission, a wealth of genomic data exists on the complex, including whole genome assemblies for most of the species (Holt, et al. 2002; Fontaine, et al. 2014). Because the radiation of the complex began ~1.85 mya, interspecific genetic comparisons are likely to yield little insight into the establishment of divergence with gene flow. However, both ecological and genetic evidence suggest that contemporary local adaptation and diversification are occurring within Anopheles gambiae s.s. (hereafter An. gambiae) and Anopheles coluzzii, two of the most widespread and important vectors of malaria within the complex (Slotman, et al. 2007; Wang-Sattler, et al. 2007; Lee, et al. 2013b; Caputo, et al. 2014). Until now, shallow sampling and/or the use of low-resolution genetic markers have limited the ability to delineate new cryptic subgroups within either species.
We genotyped 941 mosquitoes collected from diverse environments in Cameroon at >8,000 SNPs and find strong evidence for ongoing diversification within both An. gambiae and An. coluzzii. In total, the two species harbor seven cryptic subgroups distributed along a continuum of genomic differentiation. While An. gambiae exhibits a high degree of panmixia, we identified an ecotype associated with intense suburban agriculture and a second subgroup that appears partially reproductively isolated but exhibits no obvious ecological or geographical distinctions. In contrast, An. coluzzii is separated into multiple ecotypes exploiting different regional-scale habitats, including highly urbanized landscapes. In most cryptic subgroups, selective sweeps contain an excess of detoxification enzymes and insecticide resistance genes, suggesting that human activity mediates spatially varying natural selection in both species. The extensive population structure within both species represents an additional challenge to vector control strategies. Moreover, ongoing local adaptation and cryptic diversification of Anopheles species in human-dominated environments may contribute to increased malaria transmission.
RESULTS
Identification of An. gambiae s.l. sibling species
We performed extensive sampling of human-associated Anopheles across the main ecological zones of the central African country of Cameroon (Table S1), with the objective of collecting diverse populations belonging to the four species of the An. gambiae complex present in the country (Simard et al. 2009). As recently shown (Riehle et al. 2011), certain cryptic subgroups can be overlooked when sampling focuses on only one type of population. Therefore, to maximize the chances that our samples represent the genetic diversity within each species and to identify cryptic groups, we used several sampling methods (Service 1993) to collect both larvae and adults. In addition, populations of An. gambiae and An. coluzzii segregate along urbanization gradients, which have been suggested to be the most important driver of ecological divergence in the forest zone (Kamdem et al. 2012). To test this hypothesis and to investigate the genomic targets of local adaptation in urban environments, we surveyed several neighborhoods representing the urban and suburban ecotypes in the two largest cities of the forest area: Douala and Yaoundé (Figure S1).
To investigate the genetic relatedness among individuals and to detect any cryptic populations, we subjected all 941 mosquitoes that were morphologically identified as An. gambiae s.l. to population genomic analysis. Individual mosquitoes were genotyped in parallel at a dense panel of markers using double-digest restriction site-associated DNA sequencing (ddRADseq), which enriches for a representative and reproducible fraction of the genome that can be sequenced on the Illumina platform (Peterson, et al. 2012).
After aligning ddRADseq reads to the An. gambiae reference genome, we used STACKS to remove loci present in <80% of individuals, leaving 8,476 SNPs (~1 SNP every 30 kb across the genome) for population structure inference (Catchen, et al. 2011; Catchen, et al. 2013). First, we performed principal component analysis (PCA) on genetic diversity across all 941 individuals (Figure 1A). The top three components explain 28.4% of the total variance and group individuals into five main clusters. Likewise, a neighbor-joining (NJ) tree based on the Euclidean distance of allele frequencies shows five distinct clades of mosquitoes (Figure 1B). We hypothesized that these groups at least partially correspond to the four sibling species known to occur in Cameroon: An. gambiae, An. coluzzii, An. arabiensis and An. melas. To confirm this, we typed a subset of 288 specimens using validated species-diagnostic PCRs (Scott, et al. 1993; Santolamazza, et al. 2004) and found that each cluster comprised a single species. In agreement with previous surveys (Wondji, et al. 2005; Simard, et al. 2009), our collections indicate that the brackish-water-breeding An. melas is limited to coastal regions, while the arid-adapted An. arabiensis is restricted to the savannah. In contrast, An. gambiae and An. coluzzii are distributed across the four eco-geographic zones of Cameroon (Figure 1D). Lee and colleagues (Lee, et al. 2013a) recently reported frequent bouts of hybridization between An. gambiae and An. coluzzii in Cameroon. While both the PCA and the NJ tree clearly separate the two species, the PCA does show intermixing of a few rare individuals, consistent with semi-permeable species boundaries.
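The PCA step can be illustrated with a minimal sketch. It assumes genotypes coded as 0/1/2 alternate-allele counts with no missing calls, and it standardizes each SNP by its binomial standard deviation, a common convention that is not necessarily the exact normalization behind Figure 1A; the function name `genotype_pca` and the simulated allele frequencies are hypothetical.

```python
import numpy as np


def genotype_pca(genotypes, n_components=3):
    """PCA of an (individuals x SNPs) matrix of 0/1/2 alternate-allele counts.

    Hypothetical helper. Assumes no missing genotypes. Each SNP is centred on its
    mean and scaled by sqrt(p * (1 - p)), where p is its alternate allele frequency.
    Returns (PC scores, fraction of total variance explained by each component).
    """
    g = np.asarray(genotypes, dtype=float)
    p = g.mean(axis=0) / 2.0                    # alternate allele frequency per SNP
    keep = (p > 0) & (p < 1)                    # monomorphic SNPs carry no information
    g, p = g[:, keep], p[keep]
    z = (g - 2.0 * p) / np.sqrt(p * (1.0 - p))  # centred, standardized genotypes
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    explained = s**2 / np.sum(s**2)             # variance share of every component
    scores = u[:, :n_components] * s[:n_components]
    return scores, explained[:n_components]


# Usage: two simulated populations with contrasting allele frequencies
rng = np.random.default_rng(0)
pop_a = rng.binomial(2, 0.1, size=(20, 200))
pop_b = rng.binomial(2, 0.9, size=(20, 200))
scores, explained = genotype_pca(np.vstack([pop_a, pop_b]))
```

In this toy example the two simulated populations fall on opposite ends of PC1, mirroring how species clusters separate in a real PCA; the share of variance on the top components plays the role of the 28.4% reported above.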
In support of population structuring below the species level, Bayesian clustering analysis with fastSTRUCTURE (Raj, et al. 2014) finds that seven population clusters (K = 7) best explain the genetic variance present in our sample set (Figure 1C, Figure S2). Indeed, grouping of samples within the An. gambiae and An. coluzzii clades suggests that additional subdivision may exist within each species (Figure 1A, 1B). Ancestry plots further support inferences from the PCA and NJ tree: at least two subgroups compose An. coluzzii and admixture is present within An. gambiae, while An. arabiensis and An. melas are panmictic (Figure 1C, Figure S2). Riehle et al. (2011) recently discovered a cryptic subgroup by comparing indoor and outdoor fauna from the same village in Burkina Faso. Visual inspection of our PCA, NJ and fastSTRUCTURE clustering results does not indicate any genetic subdivision based on collection method or developmental stage. To explicitly test for the effects of sampling method and geographic origin on the genetic variance among individuals, we applied a hierarchical Analysis of Molecular Variance (AMOVA) (Excoffier et al. 1992). This method partitions the genetic variance among individuals in order to quantify the contribution of variables defined at different hierarchical levels. The large majority of the genetic variation was attributable to differences among individuals (86.3% in An. coluzzii and 90.4% in An. gambiae s.s.; p < 0.001 in both species). However, geographic origin retains a significant component of the genetic variation in both species: 13.2% (p < 0.001) and 9.4% (p < 0.001) of the genetic variation was partitioned across collection regions nested in sample type in An. coluzzii and An. gambiae s.s., respectively. The variance attributable to sample type was very low and not significant (less than 1%; p = 0.29 for An. gambiae s.s. and p = 0.20 for An. coluzzii), implying that, as suggested by the PCA, NJ and fastSTRUCTURE analyses, no genetic structuring based on microhabitat or temporal segregation is apparent in either species in Cameroon.
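The logic of variance partitioning can be sketched with a deliberately simplified, one-level AMOVA: individuals grouped into populations, squared Euclidean distance between 0/1/2 genotype vectors as the distance metric, and the usual ANOVA-style variance components. This is an assumption-laden illustration, not the hierarchical design used above (regions nested in sample type, plus an individual level); the function name `amova_one_level` is hypothetical.

```python
import numpy as np


def amova_one_level(genotypes, groups):
    """Simplified one-level AMOVA (after Excoffier et al. 1992).

    Hypothetical helper: genotypes is an (individuals x SNPs) array of 0/1/2 counts,
    groups gives a population label per individual, and squared Euclidean distance
    between genotype vectors is the distance metric. Returns
    (among-group variance component, within-group component, Phi_ST).
    """
    g = np.asarray(genotypes, dtype=float)
    labels = np.asarray(groups)
    pops = np.unique(labels)
    n, k = len(g), len(pops)

    def ssd(x):
        # Sum of squared deviations from the centroid; equals sum_{i<j} d_ij^2 / len(x)
        return float(((x - x.mean(axis=0)) ** 2).sum())

    ss_within = sum(ssd(g[labels == p]) for p in pops)
    ss_among = ssd(g) - ss_within
    ms_among = ss_among / (k - 1)
    ms_within = ss_within / (n - k)
    sizes = np.array([(labels == p).sum() for p in pops])
    n0 = (n - (sizes**2).sum() / n) / (k - 1)   # effective group size (unbalanced design)
    sigma2_among = (ms_among - ms_within) / n0
    phi_st = sigma2_among / (sigma2_among + ms_within)
    return sigma2_among, ms_within, phi_st
```

When groups are fixed for different alleles, all variance falls among groups (Phi_ST = 1); when groups are genetically identical, the among-group component can come out negative, and such estimates are customarily reported as zero.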
Cryptic Population Structure within An. gambiae s.s.
To further resolve the population structure within 357 An. gambiae specimens, we performed population genetic analysis with a set of 9,345 filtered SNPs. Using a combination of PCA, NJ trees, and ancestry assignment, we consistently identify three distinct subgroups within An. gambiae (Figure 2, Figure S2). The first and largest subgroup (termed GAM1) comprises the vast majority of all An. gambiae specimens including individuals collected in all four ecogeographic regions (Table S1). A total of 17 individuals make up a second small subgroup (termed GAM2). Interestingly, individuals assigned to this cluster include both larvae and adults collected in 3 different villages spread across 2 eco-geographic regions. In the absence of any obvious evidence of niche differentiation between GAM1 and GAM2, it is unclear what is driving and/or maintaining divergence between the two sympatric subgroups. Specimens collected from Nkolondom, a suburban neighborhood of Yaoundé where larval sites associated with small-scale agricultural irrigation are common (Nwane, et al. 2013; Tene, et al. 2013), form a genetically distinct third subgroup (termed Nkolondom) that appears to be a locally-adapted ecotype.
Cryptic Population Structure within An. coluzzii
To examine population structure within 521 An. coluzzii specimens, we utilized 9,822 SNPs that passed stringent filtration. All analyses show a clear split between individuals from the northern savannah region and the southern three forested regions of Cameroon (Coastal, Forest, Forest-Savannah) (Figure 3A-C). In principle, the north-south structuring could be caused solely by differences in chromosome 2 inversion frequencies, which form a cline from near absence in the humid south to fixation in the arid north. However, we find SNPs from all five chromosomal arms consistently separate northern and southern mosquitoes, indicating a substantial genome-wide divergence between the two populations (Figure S3).
Southern populations of An. coluzzii were collected from three different areas: Douala (the largest city of Cameroon), Yaoundé (the second largest city) and the rural coastal region. PCA, NJ trees and fastSTRUCTURE show clear clustering of southern samples by collection site (Figure 3D-F). Mosquitoes from Douala, situated on the coast, carry a mixture of urban and coastal polymorphisms, as illustrated by their intermediate position along PC3 (Figure 3D). Despite considerable geographic segregation, clusters are not fully discrete, likely owing to substantial migration between the three sites. Taken together, the data suggest a dynamic and ongoing process of local adaptation within southern An. coluzzii. In contrast, no similar geographic clustering is observed in northern populations (Figure S2, S4). Because clustering patterns can differ slightly between methods, we only considered subdivisions that were consistent across all three methods we employed. For example, PCA showed that a few individuals from Yaoundé were relatively detached from the main cluster, but this putative subdivision was not supported by the NJ tree or fastSTRUCTURE analyses (Figure 3D-F); we therefore treated the Yaoundé samples as a single population. All populations described as “urban” were collected from the most urbanized areas of each city, which are characterized by a high density of built environments as described in Kamdem et al. 2012.
Relationships Between Species and Subgroups
Population genomic analysis identified four different An. gambiae s.l. species within our samples. Within An. gambiae and An. coluzzii, we identified seven potential subgroups with apparently varying levels of isolation. To further explore the relationships between populations, we built an unrooted NJ tree using pairwise genomic divergence (FST) between all species and subgroups (Figure 4, Table S2). As previously observed in a phylogeny based on whole genome sequencing (Fontaine, et al. 2014), we find that An. melas is highly divergent from all other species (FST ~ 0.8), while An. arabiensis shows intermediate levels of divergence (FST ~ 0.4) from An. gambiae and An. coluzzii. As expected, the sister species An. gambiae and An. coluzzii are more closely related to each other (FST ~ 0.2) than to any other species. Within An. coluzzii, the southern and northern subgroups are highly divergent (FST > 0.1), while differentiation between local ecotypes within the south is much lower (FST < 0.04). The An. gambiae subgroups GAM1 and GAM2 are highly diverged from each other (FST ~ 0.1), suggesting genuine barriers to gene flow despite sympatry, while the suburban ecotype from Nkolondom shows a low level of divergence from GAM1 (FST ~ 0.05), consistent with ongoing local adaptation. In sum, we find a gradient of differentiation between species and subgroups, ranging from complete (or nearly complete) reproductive isolation down to the initial stages of divergence with gene flow. To further examine the degree of isolation of subgroups within species, we assessed the reduction in observed heterozygosity relative to that expected under Hardy–Weinberg Equilibrium among An. coluzzii and An. gambiae s.s. populations by computing the average Wright’s inbreeding coefficient, FIS, across genome-wide SNPs.
Positive FIS values indicate a deficit of heterozygotes relative to Hardy–Weinberg Equilibrium, as expected when a sample pools individuals from cryptic subdivisions; values approaching 1 signal strong barriers to gene flow, whereas FIS close to 0 suggests that there are no such barriers. In spite of the strong population genetic structure observed within An. coluzzii and An. gambiae, we found surprisingly low genome-wide FIS values (less than 0.0003, p < 0.005) in both species, suggesting a lack of assortative mating. Overall, in An. gambiae and An. coluzzii, ongoing local adaptation and genetic differentiation co-occur with high levels of admixture and extensive shared polymorphism among individuals.
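The heterozygote-deficit reasoning behind FIS can be made concrete with a short sketch. It assumes called 0/1/2 genotypes without missing data, uses the simple expectation He = 2p(1 - p) without a sample-size correction, and averages per-SNP values; these choices and the function name `mean_fis` are assumptions for illustration, not the exact estimator used for the values reported above.

```python
import numpy as np


def mean_fis(genotypes):
    """Mean per-SNP F_IS = 1 - Ho / He over polymorphic SNPs.

    Hypothetical helper: genotypes is an (individuals x SNPs) array of 0/1/2
    alternate-allele counts with no missing data. He = 2p(1 - p) without a
    small-sample correction.
    """
    g = np.asarray(genotypes)
    p = g.mean(axis=0) / 2.0
    he = 2.0 * p * (1.0 - p)      # heterozygosity expected under Hardy-Weinberg
    ho = (g == 1).mean(axis=0)    # observed heterozygosity
    poly = he > 0                 # F_IS is undefined at monomorphic SNPs
    return float(np.mean(1.0 - ho[poly] / he[poly]))


# Usage: one SNP at Hardy-Weinberg proportions, one pooled from two fixed groups
hwe_snp = [0, 1, 1, 2]
wahlund_snp = [0, 0, 2, 2]
print(mean_fis(np.array([hwe_snp]).T), mean_fis(np.array([wahlund_snp]).T))  # prints 0.0 1.0
```

The second SNP shows the Wahlund effect: pooling two groups fixed for different alleles removes all heterozygotes and drives FIS to 1, whereas a randomly mating sample gives FIS near 0.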
Using genome scans to identify selective sweeps
To find potential targets of selection within subgroups we performed scans of nucleotide diversity (θw, θπ) and allele frequency spectrum (Tajima’s D) using non-overlapping 150-kb windows across the genome. Scans of θw, θπ and Tajima’s D were conducted by importing aligned, but otherwise unfiltered, reads directly into ANGSD, which uses genotype likelihoods to calculate summary statistics (Korneliussen, et al. 2014). Natural selection can increase the frequency of an adaptive variant within a population, leading to localized reductions in genetic diversity as the haplotype containing the adaptive variant(s) sweeps towards fixation (Maynard Smith and Haigh 1974; Tajima 1989). Selective processes can also promote the coexistence of multiple alleles in the gene pool of a population (balancing selection). Thus, genomic regions harboring targets of recent selection should exhibit extreme values of diversity and allele frequency spectra relative to genome-wide averages (Storz 2005).
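The window statistics can be sketched directly from Tajima's (1989) formulas. This illustration assumes known derived-allele counts at each segregating site among n sampled chromosomes, whereas ANGSD works from genotype likelihoods; θw and θπ are returned as window totals rather than per-bp values, and the function name `tajimas_d` is hypothetical.

```python
import math


def tajimas_d(derived_counts, n):
    """Watterson's theta, pi and Tajima's D for one genomic window.

    derived_counts: derived-allele count k (0 < k < n) at each segregating site,
    among n sampled chromosomes. theta and pi are window totals, not per-bp values.
    Uses called alleles, unlike the genotype-likelihood approach of ANGSD.
    """
    if n < 4:
        raise ValueError("Tajima's D needs at least 4 sampled chromosomes")
    s = len(derived_counts)
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i**2 for i in range(1, n))
    theta_w = s / a1
    # Mean pairwise differences: a site with k derived alleles differs in k(n-k) of C(n,2) pairs
    pi = sum(2.0 * k * (n - k) / (n * (n - 1)) for k in derived_counts)
    if s == 0:
        return theta_w, pi, float("nan")    # D is undefined without segregating sites
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n**2 + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1, e2 = c1 / a1, c2 / (a1**2 + a2)
    d = (pi - theta_w) / math.sqrt(e1 * s + e2 * s * (s - 1))
    return theta_w, pi, d
```

An excess of singletons (as after population expansion or a sweep) pulls θπ below θw and makes D negative, while an excess of intermediate-frequency variants (as under balancing selection) makes D positive.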
We also performed genome scans using both a relative (FST) and an absolute (dxy) measure of divergence, calculated with STACKS and ngsTools, respectively. If positive selection is acting on alternative haplotypes of the same locus in two populations, values of both FST and dxy should increase at the target of selection. In contrast, spatially varying selection that acts on one population but not the other should produce a spike in FST between populations with no change in dxy. Finally, parallel selection on the same haplotype in two populations should lead to a decrease in both metrics (Cruickshank and Hahn 2014). For both diversity and divergence scans we used a maximum of 40 mosquitoes per population, prioritizing individuals with the highest coverage in populations where sample size exceeded 40. In contrast to Tajima’s D and FST, the genome-wide distributions of dxy and nucleotide diversity in 150-kb windows yielded relatively noisy patterns (Figures 5 and 6). We therefore based the identification of signatures of selection primarily on outliers of Tajima’s D and FST, and used estimates of dxy and nucleotide diversity only to confirm genomic locations that were pinpointed as candidate selective sweeps on the basis of Tajima’s D and FST. Specifically, genomic regions were considered targets of selection if they mapped to significant peaks or depressions of diversity and dxy, and their values of FST (upper tail) and Tajima’s D (upper or lower tail) fell within the most extreme 1% of the empirical distribution in at least one population. Significantly negative values of Tajima’s D relative to the genome-wide average suggest an excess of low-frequency mutations due to negative or positive selection, whereas significantly positive values are consistent with balancing selection.
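The divergence metrics and the 1% outlier rule can be sketched as follows. The study computed FST with STACKS and dxy with ngsTools; this sketch instead swaps in Hudson's FST estimator in its ratio-of-averages form, computed from allele frequencies, and averages dxy over SNPs rather than over all sequenced bases. Both function names are hypothetical.

```python
import numpy as np


def window_fst_dxy(p1, p2, n1, n2):
    """Hudson-type F_ST (ratio of averages) and mean d_xy for one window.

    p1, p2: alternate allele frequencies at the window's SNPs in populations 1 and 2;
    n1, n2: numbers of sampled chromosomes. d_xy is averaged over SNPs, not over all
    sequenced bases, so it is not a per-bp divergence.
    """
    p1, p2 = np.asarray(p1, dtype=float), np.asarray(p2, dtype=float)
    # Probability that one allele drawn from each population differs (per-site d_xy)
    dxy_sites = p1 * (1 - p2) + p2 * (1 - p1)
    # Squared frequency difference minus the sampling-noise terms of each population
    num = (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
    return float(num.sum() / dxy_sites.sum()), float(dxy_sites.mean())


def extreme_windows(values, tail=0.01, upper=True):
    """Flag windows in the most extreme `tail` fraction of the empirical distribution."""
    v = np.asarray(values, dtype=float)
    if upper:
        return v >= np.quantile(v, 1.0 - tail)
    return v <= np.quantile(v, tail)
```

Applied across windows, `extreme_windows(fst_values)` would mark FST peaks, while Tajima’s D would be checked in both tails (`upper=True` and `upper=False`). Fixed differences give FST = 1 and dxy = 1; identical frequencies give FST near or slightly below 0.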
Targets of Selection within An. gambiae subgroups
Our estimates of genome-wide diversity levels (Table S3) within An. gambiae subgroups are comparable to previous estimates based on RAD sequencing of East African An. gambiae s.l. populations (O’Loughlin, et al. 2014). As expected, the large GAM1 population harbors more genetic diversity than the apparently rare GAM2 population or the geographically restricted Nkolondom ecotype (Table S3, Figure 5A-C). Tajima’s D is consistently negative across the entire genome of all three subgroups, indicating an excess of low-frequency variants that are likely the result of recent population expansion (Figure 5A-C) (Tajima 1989). Indeed, demographic models infer relatively recent bouts of population expansion in all three subgroups (Table S4).
Genome scans of each subgroup reveal genomic regions that show concordant dips in diversity and Tajima’s D consistent with recent positive selection (highlighted in Figure 5A-C). Based on the 1% cutoff of Tajima’s D (upper and lower bounds) and FST, we identified four candidate regions exhibiting strong signatures of selection. It should be noted that, because of the reduced representation sequencing approach we used, our analysis is necessarily conservative, highlighting only clear instances of selection, which are likely both recent and strong (Tiffin and Ross-Ibarra 2014). An apparent selective event on the left arm of chromosome 2 near the centromere is found in all populations. This selective sweep is characterized by a prominent depression of Tajima’s D (~1.5 Mb in width) and contains ~80 genes, including the voltage-gated sodium channel gene para, which carries the pyrethroid knockdown resistance (kdr) mutations. Although it is difficult to identify precisely which gene in this region has been influenced by selection, para is a pervasive target of selection in Anopheles mosquitoes (Clarkson et al. 2014; Norris et al. 2015) and represents the strongest candidate in our populations. Interestingly, the drop in Tajima’s D in the putative kdr sweep is sharpest in the Nkolondom population, which suggests that insecticide resistance is shaping genomic differentiation and local adaptation in An. gambiae. Indeed, Nkolondom is a suburban neighborhood of Yaoundé where larvae can be readily collected from irrigated garden plots that likely contain elevated levels of pesticides directed at agricultural pests (Nwane, et al. 2013; Tene, et al. 2013). The kdr allele of para confers knockdown resistance to pyrethroids, and selective sweeps in the same genomic location have previously been identified in many An. gambiae s.l. populations (Donnelly, et al. 2009; Lynd, et al. 2010; Jones, Liyanapathirana, et al. 2012; Clarkson, et al. 2014; Norris, et al. 2015). At the kdr sweep, we observe contrasting patterns in local values of dxy and FST (Figure 5D-E). In both the GAM1-GAM2 and GAM1-Nkolondom comparisons, dxy dips while local values of FST actually increase. While not definitive, the significant drop in dxy suggests that the same resistant haplotype is sweeping through each population. Localized increases in FST could reflect differences in kdr allele frequencies between populations: despite parallel selection, the sweep may be closer to fixation in some populations than in others, perhaps owing to differences in selection intensity.
Another region exhibiting consistent signatures of selection in all populations is found around the centromere of the X chromosome. Functional analyses of gene ontology (GO) terms (Table S5) revealed a significant overrepresentation of chitin-binding proteins in this region (p = 1.91e-4). Strong genetic divergence among the three subgroups of An. gambiae, characterized by significant FST and dxy peaks, is also observed at ~30 Mb and ~40 Mb on chromosome 3R. Positive outliers of both Tajima’s D and nucleotide diversity suggest that this genetic divergence is due to balancing selection on multiple alleles among populations. Functional analyses of GO terms indicate that the region at ~40 Mb on 3R is enriched in cell membrane proteins, genes involved in olfaction and epidermal growth factor (EGF) genes (Table S5). Finally, a striking depression in Tajima’s D, supported by a marked dip in nucleotide diversity, occurs on chromosome 2L from ~33-35 Mb exclusively in the Nkolondom population (Figure 5A-C). Despite the lack of genetic differentiation, this region, which is enriched in six EGF genes (Table S5), probably represents a recent selective sweep that could facilitate larval development of this subgroup in pesticide-laced agricultural water.
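The text does not name the tool behind the GO enrichment p-values; a one-sided hypergeometric test is one standard way to ask whether a term is overrepresented among the genes of a candidate region. The sketch below assumes that test, uses exact integer arithmetic via `math.comb`, applies no multiple-testing correction, and uses a hypothetical function name and argument names.

```python
from math import comb


def go_enrichment_p(k, n, big_k, big_n):
    """One-sided hypergeometric P(X >= k) for GO-term overrepresentation.

    k: genes carrying the term in the candidate region; n: genes in the region;
    big_k: genes carrying the term genome-wide; big_n: annotated genes genome-wide.
    """
    upper = min(big_k, n)
    hits = sum(comb(big_k, i) * comb(big_n - big_k, n - i) for i in range(k, upper + 1))
    return hits / comb(big_n, n)   # exact integer counts, converted to a probability at the end
```

The p-value is the probability of drawing at least k annotated genes when n genes are sampled at random from the genome; a small value means the region carries more genes with that term than chance would predict.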
Targets of Selection within An. coluzzii subpopulations
As above, we used diversity, allele frequency spectra and genetic differentiation metrics to scan for targets of selection in the four subgroups of An. coluzzii. Overall, genetic diversity is higher in the northern savannah population than in any of the three southern populations, which all exhibit similar levels of diversity (Table S3). Just as in An. gambiae, all subgroups have consistently negative Tajima’s D values, consistent with demographic models of population expansion (Table S4). Based on the 1% threshold of Tajima’s D and FST, we found five putative selective sweeps in An. coluzzii populations, including the kdr region, the sweep on the X chromosome, and the two hot spots of balancing selection on chromosome 3R also detected in An. gambiae (Figure 6A-D). The fifth putative selective sweep, characterized by a sharp drop in both diversity and Tajima’s D, occurs on 3R from ~28.5-29.0 Mb, with the decline being most pronounced in urban populations. The concentration of this sweep in urban mosquitoes strongly suggests that it contains variant(s) conferring adaptation to extreme levels of anthropogenic disturbance. Indeed, this genomic region harbors a cluster of both glutathione S-transferase (GSTE1-GSTE7) and cytochrome P450 (CYP4C27, CYP4C35, CYP4C36) genes, and functional analyses of GO terms reveal an overrepresentation of terms containing “Glutathione S-transferase” (Table S5, p = 5.14e-10). Both the GSTE and cytochrome P450 gene families are known to confer metabolic resistance to insecticides and pollutants in mosquitoes (Enayati, et al. 2005; David, et al. 2013; Nkya, et al. 2013). In particular, GSTE5 and GSTE6 are intriguing candidate targets of selection, as each is up-regulated in highly insecticide-resistant An. arabiensis populations that recently colonized urban areas of Bobo-Dioulasso, Burkina Faso (Jones et al. 2012).
As in An. gambiae, we also detected multiple regions that could be targets of selection but were less well supported, because only one of the metrics (FST or Tajima’s D) fell within the extreme 1% of its empirical distribution. We found ~30 regions where significant FST peaks were not associated with exceptional values of Tajima’s D, and vice versa. These included at least 10 hotspots of FST clustered within the 2La inversion, which segregates between forest and savannah populations (Figure 6H). Most notably, at ~25 Mb on 2L, in a region centered on the resistance to dieldrin (rdl) locus, large dips in Tajima’s D are evident in all southern groups. In the northern savannah population, a pronounced dip in diversity occurs at this putative sweep, but Tajima’s D remains unchanged. This region contains ~40 genes, but, just as with kdr, the rdl locus is arguably the prime candidate target of selection. This gene plays a key role in target-site resistance to dieldrin and other cyclodiene insecticides (Ffrench-Constant et al. 2004), and footprints of selection around this locus have been reported in An. gambiae s.l. populations (Lawniczak et al. 2010; Crawford et al. 2015).
The increased use of pesticides/insecticides in agriculture and vector control imposes an unprecedented adaptive challenge on mosquito populations (Bøgh et al. 1998; Moiroux et al. 2012; Mwangangi et al. 2013; Clarkson et al. 2014; Norris et al. 2015). As a result, both selection and adaptive introgression are acting on timescales of a few decades around loci that provide a selective advantage against pesticides (Clarkson et al. 2014; Norris et al. 2015). Our findings indicate that the genetic response to this challenge is also spreading across multiple loci, leaving sharp signatures of selection around clusters of detoxification enzymes and major insecticide resistance genes in An. gambiae and An. coluzzii populations. As expected, An. coluzzii populations that are exposed to particularly high levels of insecticides/pollutants in human-dominated environments (Kamdem et al. 2012; Fossog Tene et al. 2013; Tene Fossog et al. 2013) harbor more genomic regions bearing signatures of human-driven selection. Both relative and absolute divergence between populations at the three selective sweeps involved in xenobiotic resistance reflect the spatial variation of selection along gradients of anthropogenic disturbance. In particular, the kdr locus exhibits minimal divergence in all pairwise comparisons, suggesting that the same resistance haplotype is under selection in each population (Figure 6E-H). Similarly, the region surrounding the rdl gene shows low FST and a pronounced dip in dxy between all southern populations, suggesting that the same haplotype is sweeping through these three populations. However, differentiation between southern and northern populations at rdl may be obscured by the high divergence between alternative arrangements of the 2La inversion. Finally, the urban-centric GSTE/CYP450 sweep on 3R shows a peak in FST between Yaoundé and Coastal mosquitoes and minimal change in dxy, a pattern consistent with local adaptation.
Comparisons between Douala and Coastal populations show a more moderate increase in FST, presumably due to high rates of mosquito migration between these nearby sites. This slight increase in FST is coupled with a large dip in dxy, indicative of an ongoing selective sweep shared between the two populations.
To further explore the 3R GSTE/CYP450 sweep, we reconstructed haplotypes for all 240 southern An. coluzzii chromosomes across the 28 SNPs found within the sweep. In the Yaoundé population, a single haplotype is present on 44 of 80 (55%) chromosomes (all grey SNPs), while an additional 11 haplotypes are within one mutational step of this common haplotype (Figure 7A). In Douala, the same haplotype is the most common but is present at a lower frequency (31%) than in Yaoundé (Figure 7B). Strikingly, this haplotype is found on only 6 of 80 (7.5%) coastal chromosomes (Figure 7C). The overall low nucleotide variation and high frequency of a single haplotype in Yaoundé are consistent with positive selection acting on de novo variant(s) to generate the 3R GSTE/CYP450 sweep. Less intense selection pressure in Douala, and particularly on the Coast, would explain the markedly higher haplotype diversity in these two populations relative to Yaoundé. It is also possible that Douala mosquitoes experience selection pressures similar to those acting on Yaoundé mosquitoes, but that frequent migrant haplotypes from the nearby rural coastal populations decrease the efficiency of local adaptation. Importantly, multiple population genomic analyses of the same 28 SNPs (Figure 7D-F) mirror the results of the haplotype analysis, confirming that haplotype inference did not bias the results. In sum, we hypothesize that divergence in xenobiotic levels between urban and rural larval habitats is the main ecological force driving spatially variable selection at this locus.
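The haplotype summary used above (frequency of the most common haplotype and the number of distinct haplotypes one mutational step away from it) can be sketched as follows. It assumes haplotypes have already been phased and encoded as equal-length allele strings; the function name and the toy data are hypothetical and do not reproduce the study's 28-SNP haplotypes.

```python
from collections import Counter


def haplotype_summary(haplotypes):
    """Most common haplotype, its frequency, and the number of other distinct
    haplotypes one mutational step (Hamming distance 1) away from it.

    haplotypes: equal-length allele strings, one per phased chromosome.
    """
    counts = Counter(haplotypes)
    top, top_count = counts.most_common(1)[0]
    one_step = [h for h in counts if sum(a != b for a, b in zip(h, top)) == 1]
    return top, top_count / len(haplotypes), len(one_step)


# Usage with made-up 3-SNP haplotypes for 80 chromosomes (not the study's data)
chromosomes = ["000"] * 44 + ["001"] * 10 + ["010"] * 5 + ["111"] * 21
print(haplotype_summary(chromosomes))  # ('000', 0.55, 2)
```

A sweep in progress shows up as one high-frequency haplotype surrounded by a halo of rare one-step variants, whereas weaker selection or frequent immigration leaves the common haplotype at lower frequency.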
DISCUSSION
Population genetic structure and cryptic subdivisions within the An. gambiae complex
The Anopheles gambiae complex, as a model of adaptive radiation with a puzzling evolutionary history, has been recognized as a unique window into the genetic architecture of ecological speciation (Coluzzi et al. 2002; Ayala and Coluzzi 2005). This system has, however, been refractory to traditional genetic mapping methods, mainly owing to the lack of observable phenotypes that segregate between populations. Recently, patterns of genomic divergence have begun to be dissected thanks to the application of high-throughput sequencing and genotyping methods, and significant insights have been gained into the genomic targets of selection among populations at a continental scale (Lawniczak et al. 2010; Neafsey et al. 2010; White et al. 2011). Here we have applied a population genomic approach to investigate the genomic architecture of selection at the scale of a single country. We showed that reduced representation sequencing of 941 An. gambiae s.l. collected in or near human settlements at 33 sites across Cameroon enabled rapid identification of known sibling species and revealed multiple instances of novel cryptic diversification within An. gambiae and An. coluzzii. This result contrasts with findings from East Africa, where RADseq markers revealed no population genetic structure within species of the An. gambiae complex (O’Loughlin et al. 2014). Historically, West African populations have shown a greater tendency toward differentiation (Coluzzi et al. 1985; Coluzzi et al. 2002), and the presence of cryptic subdivisions within An. coluzzii had already been suspected in Cameroon (Wondji et al. 2005; Slotman et al. 2007). As revealed by genome-wide SNPs, these subdivisions result in at least three genetically distinct clusters: a strongly differentiated population confined to the arid savannah, a coastal subgroup presumably adapted to tolerate high concentrations of salt (Tene Fossog et al. 2015), and a cluster encompassing urban populations that are known to thrive in breeding sites containing high levels of organic waste and that present a more complex insecticide resistance profile (Antonio-Nkondjio et al. 2011; Fossog Tene et al. 2013; Tene Fossog et al. 2013; Antonio-Nkondjio et al. 2015). Ecological niche modelling of An. gambiae s.l. in Cameroon predicts that the favourable habitats of An. coluzzii are fragmented and located in more marginal landscapes, whereas An. gambiae occupies a broader environmental niche across the country (Simard et al. 2009). In line with this prediction, genome-wide SNPs indicate weak genetic differentiation and high genetic diversity in An. gambiae over large geographic areas. Nevertheless, despite this broad geographic connectivity and extensive gene flow among populations, ongoing local adaptation results in the emergence of geographic clusters in suburban areas.
It has been hypothesized that the number and diversity of malaria vector species across the African continent are largely underestimated because of the limited resolving power of the morphological and genetic markers employed so far (Stevenson et al. 2012). Genome-wide studies with extensive sampling would therefore be expected to uncover previously unknown species. Among the 941 An. gambiae s.l. we sequenced, however, we found a clear match between the species identified by morphology and PCR of the ribosomal DNA and those inferred from thousands of SNPs scattered throughout the genome. The intensive use of insecticide-treated bed nets has also triggered complex behavioural adaptations and shifts in species distribution that may ultimately split species and create cryptic populations within them (Bøgh et al. 1998; Derua et al. 2012; Moiroux et al. 2012; Mwangangi et al. 2013; Sokhna et al. 2013). For example, populations evolving to bite outdoors or earlier in the night to escape bed nets are more likely to differentiate into distinct gene pools (Bøgh et al. 1998; Riehle et al. 2011; Moiroux et al. 2012). Based on comprehensive sampling of larvae and adults at different time points during diurnal and nocturnal activity, in or near human settlements, we conclude that no genetic clustering beyond the regional-scale subdivisions described above is apparent at the scale of our study.
Genomic signatures of selection
Populations at increasing levels of genetic differentiation along a speciation continuum are ideal for investigating the targets of selection at early stages of ecological divergence (Savolainen et al. 2013). We scanned the genomes of An. gambiae and An. coluzzii populations with varying degrees of divergence, using several divergence and diversity metrics to identify outlier regions, which likely contain factors involved in ecological divergence and/or reproductive isolation. In weakly differentiated populations such as the subgroups described here, where neutral and selective processes have not yet reshaped most of the genome, signatures of selection are expected to cluster within a few genomic regions (Nosil and Feder 2012; Andrew and Rieseberg 2013). Indeed, we found that footprints of natural selection in structured populations of An. gambiae and An. coluzzii occur at a few loci enriched in genes whose functions include insecticide resistance and detoxification, epidermal growth, cuticle formation, and olfaction. Although our experimental approach likely misses some targets of selection (Arnold et al. 2013; Tiffin and Ross-Ibarra 2014), previous studies of ecological and phenotypic divergence in An. gambiae s.l. suggest that the footprints of selection we identified are biologically relevant. First, the cuticle sits at the interface of several biological functions in insects, and cuticular proteins are extremely diverse within and among species (Vannini et al. 2014). Importantly, certain cuticular proteins are associated with insecticide resistance because they contribute to a thicker cuticle in Anopheles (Wood et al. 2010; Vannini et al. 2014). Second, olfaction mediates a wide range of adult and larval behaviors in blood-feeding mosquitoes (Bowen 1991; Takken and Verhulst 2013). A large family of olfactory receptors has been characterized, including candidate chemosensory genes directly involved in responses to cues required for feeding, host preference, and mate selection in An. gambiae s.l. (Carey et al. 2010; Liu et al. 2010; Rinker et al. 2013). Finally, malaria vectors of the An. gambiae complex are among the most synanthropic insects in the world, which has led to the hypothesis that human-driven selection is one of the main modulators of ecological divergence between and within species (Coluzzi et al. 2002; Kamdem et al. 2012). Consistent with this hypothesis, we found that genomic regions harbouring genes involved in resistance to insecticides and pollutants are the dominant targets of selection in emerging subgroups adapted to urban areas. Although direct functional evidence implicating genes within these selective sweeps will ultimately be needed to validate the role of human-mediated selection in local adaptation, our data provide a genomic perspective on the interactions between human activity and the contemporary evolution of a mosquito species.
Anthropogenically Mediated Selection
Human activity has altered the evolutionary trajectory of diverse taxa. In insects, spatially varying intensity of insecticide application can drive divergence between populations, potentially leading to reproductive isolation. Although this scenario is plausible, little empirical evidence supports it (Chen, et al. 2012). Previous studies have documented significant reductions in the population size of An. gambiae s.l. after the introduction of long-lasting insecticide-treated nets (LLINs) but did not determine the influence of exposure on population structure (Bayoh, et al. 2010; Athrey, et al. 2012). Among Cameroonian populations of both An. gambiae and An. coluzzii, we find a pervasive signature of selection at the para sodium channel gene. We infer that this sweep confers globally beneficial resistance to LLINs, which are ubiquitous in Cameroon and are treated with pyrethroids that target para (Bowen 2013). In contrast, selective sweeps centered on other insecticide resistance genes are restricted to specific geographic locations or populations. For example, a sweep at the rdl locus is limited to southern populations of An. coluzzii. Initial selection for dieldrin resistance likely occurred during the massive indoor residual spraying campaigns conducted by the WHO in southern Cameroon in the 1950s. Indeed, the spraying was so intense that it temporarily eliminated An. gambiae s.l. from Yaoundé (and likely other locations in the forest region) (Livadas, et al. 1958). However, because of its high toxicity to humans, dieldrin has been banned for mosquito control since the mid-1970s. In the absence of insecticide exposure, resistant rdl mosquitoes are significantly less fit than wild-type mosquitoes (Rowland 1991a, b; Platt, et al. 2015), which makes the continued persistence of resistant alleles in southern An. coluzzii populations puzzling. One plausible explanation is that other insecticides targeting rdl, such as fipronil and lindane, are still commonly used in agriculture and may frequently run off into An. coluzzii larval habitats, imposing strong selection for resistant mosquitoes. A similar mechanism was recently proposed to explain the maintenance of resistant rdl alleles in both Culex and Aedes mosquitoes (Tantely, et al. 2010).
Mosquitoes inhabiting Cameroon’s two major cities, Yaoundé and Douala, provide a clearer example of how xenobiotic exposure can directly influence population structure. Both cities have seen exponential human population growth over the past 50 years, creating a high concentration of hosts for anthropophilic mosquitoes. Despite elevated levels of organic pollutants and insecticides in urban relative to rural larval sites, surveys show substantial year-round populations of An. gambiae and An. coluzzii in both cities (Antonio-Nkondjio, et al. 2011; Kamdem, et al. 2012; Antonio-Nkondjio, et al. 2014). Bioassays demonstrate that urban mosquitoes have significantly higher levels of resistance to multiple insecticides than rural mosquitoes (Nwane, et al. 2009; Antonio-Nkondjio, et al. 2011; Nwane, et al. 2013; Tene, et al. 2013; Antonio-Nkondjio, et al. 2015). In support of human-mediated local adaptation, we find a selective sweep in urban An. coluzzii centered on a cluster of GSTE/CYP450 detoxification genes. While the specific ecological driver of this sweep is unknown, GSTE and P450 enzymes detoxify both organic pollutants and insecticides (Suwanchaichinda and Brattsten 2001; David, et al. 2010; Poupardin, et al. 2012). Indeed, synergistic effects of the two types of xenobiotics could be exerting intense selection for pleiotropic resistance in urban mosquitoes (Mueller, et al. 2008; David, et al. 2013; Nkya, et al. 2013). Regardless of the underlying targets of selection, mosquitoes inhabiting highly disturbed urban and suburban landscapes are clearly genetically differentiated from rural populations. Further analysis of specific sweeps using a combination of whole-genome resequencing and emerging functional genetic approaches (e.g. CRISPR/Cas9) should help resolve the specific targets of local adaptation in urban mosquitoes, while also shedding light on the evolutionary history of the enigmatic subgroup GAM2.
Impacts on Vector Control
Just five decades ago, there was not a single city in Sub-Saharan Africa with a population over 1 million; today there are more than 40. Urbanization will only continue, with the United Nations estimating that 60% of Africans will live in large cities by 2050 (United Nations 2014). When urbanization began, it was widely assumed that malaria transmission in cities would be minimal because rural Anopheles vectors would be unable to complete development in polluted urban larval habitats (Donnelly, et al. 2005). However, increasingly common reports of endemic malaria transmission in urban areas across Sub-Saharan Africa unequivocally demonstrate that anophelines are exploiting the urban niche (Robert, et al. 2003; Keiser, et al. 2004; De Silva and Marshall 2012). Our study shows that An. gambiae s.l. from the urban and suburban centers of southern Cameroon form genetically distinct subgroups relative to rural populations. Local adaptation to urban environments is accompanied by strong selective sweeps centered on putative xenobiotic resistance genes, which are likely driven by a combination of exposure to organic pollutants and insecticides in larval habitats. The rapid adaptation of Anopheles to the urban landscape poses a growing health risk, as resistance in these populations negates the effectiveness of almost all commonly used insecticides. Moreover, repeated instances of beneficial alleles introgressing between An. gambiae s.l. species make the emergence of highly resistant subgroups even more troubling (Weill, et al. 2000; Clarkson, et al. 2014; Crawford, et al. 2014; Fontaine, et al. 2014; Norris, et al. 2015). In essence, urban populations can serve as a reservoir of resistance alleles, which can move rapidly between species and populations whenever selection favours them. Clearly, sustainable malaria vector control, urban or otherwise, requires not only more judicious use of insecticides but also novel strategies that do not rely on chemicals. Toward this goal, various vector control methods that aim to replace or suppress wild mosquito populations using gene drive are currently under development (e.g. Windbichler, et al. 2011). While promising, such approaches must explicitly account for the complexities of ongoing cryptic diversification within African Anopheles before transgenic mosquitoes are released.
MATERIALS AND METHODS
Mosquito collections
In 2013, we collected Anopheles from 33 locations spread across the four major ecogeographic regions of Cameroon (Table S1). Indoor resting adult mosquitoes were collected by pyrethrum spray catch, while host-seeking adults were obtained via indoor/outdoor human-baited landing catch. Larvae were collected using standard dipping procedures (Service 1993). All researchers were provided with malaria chemoprophylaxis throughout the collection period. Individual mosquitoes belonging to the An. gambiae complex were identified morphologically (Gillies and De Meillon 1968; Gillies and Coetzee 1987).
ddRADseq Library Construction
Genomic DNA was extracted from adults using the ZR-96 Quick-gDNA kit (Zymo Research) and from larvae using the DNeasy Extraction kit (Qiagen). A subset of individuals was assigned to sibling species using PCR-RFLP assays that type fixed SNP differences in the rDNA (Fanello, et al. 2002). Preparation of ddRAD libraries largely followed Turissini, et al. (2014). Briefly, ~1/3 of the DNA extracted from an individual mosquito (10 μl) was digested with MluCI and NlaIII (New England Biolabs). One of 48 barcoded adapters was ligated to the overhangs, and 400-bp fragments were selected on 1.5% gels using a BluePippin (Sage Science). One of six indices was added during PCR amplification. Each library contained 288 individuals and was subjected to single-end, 100-bp sequencing on one or two flow cell lanes of an Illumina HiSeq 2500.
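The multiplexing scheme implies simple arithmetic: 48 inline barcodes combined with 6 PCR indices give 48 × 6 = 288 unique tag combinations, matching the 288 individuals per library, so at least four libraries of this size are needed to hold 941 mosquitoes (the text does not state how many libraries were actually built). The Python sketch below is illustrative only: it makes the arithmetic explicit and assigns each sample a hypothetical (library, index, barcode) triple, and the assignment order is an assumption rather than the authors' plate layout.

```python
import math
from itertools import product

N_BARCODES = 48   # inline adapter barcodes (from the text)
N_INDICES = 6     # PCR indices (from the text)
N_SAMPLES = 941   # mosquitoes sequenced (from the text)


def library_capacity(n_barcodes: int, n_indices: int) -> int:
    """Each barcode-index pair labels one individual within a library."""
    return n_barcodes * n_indices


def assign_tags(n_samples: int, n_barcodes: int = N_BARCODES, n_indices: int = N_INDICES):
    """Assign each sample a hypothetical (library, index, barcode) triple, 1-based.

    Samples fill one library completely (index-major, then barcode) before
    the next library is started; this ordering is an assumption.
    """
    capacity = library_capacity(n_barcodes, n_indices)
    combos = list(product(range(1, n_indices + 1), range(1, n_barcodes + 1)))
    tags = []
    for i in range(n_samples):
        library, slot = divmod(i, capacity)
        index, barcode = combos[slot]
        tags.append((library + 1, index, barcode))
    return tags


capacity = library_capacity(N_BARCODES, N_INDICES)
n_libraries = math.ceil(N_SAMPLES / capacity)
print(capacity, n_libraries)  # prints: 288 4
```

Because every (index, barcode) pair is reused only across libraries, a read is uniquely assigned to an individual once its library (sequencing lane/run) and both tags are known.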
Raw sequence reads were demultiplexed and quality filtered using the process_radtags pipeline of STACKS v 1.29 (Catchen, et al. 2011; Catchen, et al. 2013). After removing reads with ambiguous barcodes, incorrect restriction sites, or low sequencing quality (mean Phred < 33), we aligned reads to the An. gambiae PEST reference genome (AgamP4.2) with GSNAP, allowing up to five mismatches per read. After discarding reads that aligned perfectly to more than one genomic position, we used STACKS to identify unique RAD tags and construct a consensus assembly for each. Individual SNP genotypes were called with the default settings of the maximum-likelihood model implemented in the STACKS genotypes pipeline.
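The three read filters listed above (barcode, restriction site, mean quality) can be sketched as a single function. This is a minimal illustration, not process_radtags itself. It assumes fixed-length inline barcodes at the start of each read, a cut-site remnant supplied by the caller (which enzyme's remnant follows the barcode on the single-end read is left as a parameter), Phred+33 quality encoding, and a whole-read mean quality threshold taken literally from the text; the real program's filtering options may differ.

```python
from typing import Dict, Optional, Tuple


def mean_phred(quality: str, offset: int = 33) -> float:
    """Mean Phred score of a FASTQ quality string (Phred+33 encoding assumed)."""
    return sum(ord(c) - offset for c in quality) / len(quality)


def filter_read(
    seq: str,
    qual: str,
    barcodes: Dict[str, str],
    cut_remnant: str,
    min_mean_q: float = 33.0,
) -> Optional[Tuple[str, str, str]]:
    """Return (sample, trimmed_seq, trimmed_qual), or None if the read fails a filter.

    `barcodes` maps a fixed-length inline barcode to a sample name
    (hypothetical layout; all barcodes are assumed to share one length).
    """
    bc_len = len(next(iter(barcodes)))
    sample = barcodes.get(seq[:bc_len])
    if sample is None:
        return None  # unknown or ambiguous barcode
    insert, insert_q = seq[bc_len:], qual[bc_len:]
    if not insert.startswith(cut_remnant):
        return None  # incorrect restriction site
    if mean_phred(insert_q) < min_mean_q:
        return None  # low mean quality over the barcode-trimmed read
    return sample, insert, insert_q
```

For example, with `barcodes = {"ACGTA": "s1"}` and an assumed remnant of `"CATG"`, a read `"ACGTACATGGGGG"` with uniformly high quality is kept and trimmed to `"CATGGGGG"` for sample `s1`.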
Population Genomic Analysis
Population genetic structure was assessed using the SNP dataset output by the populations program of STACKS. We used PLINK v 1.19 to retrieve subsets of genome-wide SNPs as needed (Purcell, et al. 2007). PCA, neighbor-joining tree, and Bayesian information criterion (BIC) analyses were implemented using the R packages adegenet and ape (Paradis et al. 2004; Jombart 2008; R Development Core Team 2014). Ancestry analyses were conducted in fastSTRUCTURE v 1.0 (Raj, et al. 2014) using the logistic prior. The chooseK.py script was used to find the appropriate number of populations (k); where a range of k was suggested, we chose the number of clusters inferred by BIC. CLUMPP v 1.1.2 (Jakobsson & Rosenberg 2007) was used to summarize assignment results across independent runs, and DISTRUCT v 1.1 (Rosenberg 2004) was used to visualize the ancestry assignment of individual mosquitoes. We used a subset of 1,000 randomly chosen SNPs to calculate average pairwise FST between populations in GENODIVE v 2.0, using up to 40 individuals per population, prioritized by coverage (Meirmans and Van Tienderen 2004). Using the same subset of 1,000 SNPs, we conducted an AMOVA in GENODIVE to quantify the effects of sampling method and geographic origin on the genetic variance among individuals. The significance of FST values and of the AMOVA was assessed with 10,000 permutations. Pairwise FST values were input into the program Fitch from the PHYLIP suite (Plotree and Plotgram 1989) to create the population-level NJ tree. FIS values were computed with the populations program in STACKS.
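GENODIVE estimates FST within an AMOVA framework. As a simpler stand-in that conveys the same idea, the sketch below computes Hudson's FST in its ratio-of-averages form across SNPs from 0/1/2 genotype matrices, and assesses significance by permuting individuals between the two populations, analogous to the permutation test described above. This is a swapped-in estimator, not GENODIVE's; it assumes no missing genotypes and at least one SNP polymorphic in the pooled sample.

```python
import random
from typing import List

# Rows are individuals, columns are SNPs, values are alternate-allele counts (0, 1, 2).
Genotypes = List[List[int]]


def _freqs(pop: Genotypes) -> List[float]:
    """Alternate-allele frequency at each SNP (diploids, no missing data)."""
    n_ind, n_snp = len(pop), len(pop[0])
    return [sum(ind[j] for ind in pop) / (2 * n_ind) for j in range(n_snp)]


def hudson_fst(pop1: Genotypes, pop2: Genotypes) -> float:
    """Hudson's FST as a ratio of sums across SNPs (sample-size corrected numerator)."""
    n1, n2 = 2 * len(pop1), 2 * len(pop2)  # sampled chromosomes per population
    num = den = 0.0
    for p1, p2 in zip(_freqs(pop1), _freqs(pop2)):
        num += (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
        den += p1 * (1 - p2) + p2 * (1 - p1)
    return num / den


def permutation_p(pop1: Genotypes, pop2: Genotypes, n_perm: int = 1000, seed: int = 1) -> float:
    """One-sided p-value: share of label permutations with FST >= the observed value."""
    observed = hudson_fst(pop1, pop2)
    pooled = pop1 + pop2  # new list; the input populations are not modified
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if hudson_fst(pooled[: len(pop1)], pooled[len(pop1):]) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

Because the numerator subtracts the expected sampling variance within each population, identical samples yield a slightly negative FST rather than exactly zero, which is why small negative pairwise values can appear for weakly differentiated subgroups.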
Genome Scans for Selection
We used ANGSD v 0.612 (Korneliussen, et al. 2014) to calculate nucleotide diversity (θw and θπ) and Tajima’s D in 150-kb non-overlapping windows. Unlike most genotyping algorithms, ANGSD does not make hard SNP calls; instead, it takes genotyping uncertainty into account when calculating summary statistics. Similarly, absolute divergence (dxy) was calculated using ngsTools (Fumagalli, et al. 2014) based on genotype likelihoods generated by ANGSD. Kernel-smoothed values across 150-kb windows for all four metrics (θw, θπ, D, dxy) were obtained with the R package KernSmooth. FST (based on AMOVA) was calculated with the populations program in STACKS using only loci present in 80% of individuals, and a kernel-smoothing procedure implemented in STACKS was used to obtain FST values across 150-kb windows. Because regions with unusually high or low read depth can yield unreliable estimates of diversity and divergence owing to repeats and local misassembly, we checked that the average per-locus sequencing coverage was consistent throughout the genome (Figure S5). To determine whether selective sweeps were enriched for specific functional annotation classes, we used DAVID 6.7 with default settings (Huang, et al. 2008). We delimited the physical extent of each selective sweep as the region spanning the base of the peak or, for Tajima’s D, the base of the depression. Haplotypes across the GSTE/CYP450 sweep were reconstructed with PHASE v 2.1.1 using the default recombination model (Stephens, et al. 2001; Stephens and Scheet 2005).
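For intuition about the window statistics, the sketch below computes θπ, θw, and Tajima's D from hard-called derived-allele counts and groups sites into 150-kb non-overlapping windows. Unlike ANGSD, it ignores genotyping uncertainty and assumes complete data from n sampled chromosomes at every site. The values are per-window totals rather than per-site rates, and no kernel smoothing is applied.

```python
import math
from collections import defaultdict


def tajimas_d(derived_counts, n):
    """Return (D, theta_pi, theta_w) from per-site derived-allele counts among n chromosomes.

    Monomorphic sites (k == 0 or k == n) are ignored; D is NaN if no site segregates.
    """
    seg = [k for k in derived_counts if 0 < k < n]
    S = len(seg)
    if S == 0:
        return float("nan"), 0.0, 0.0
    pairs = n * (n - 1) / 2
    theta_pi = sum(k * (n - k) for k in seg) / pairs  # mean pairwise differences
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    theta_w = S / a1  # Watterson's estimator
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1, e2 = c1 / a1, c2 / (a1**2 + a2)
    d = (theta_pi - theta_w) / math.sqrt(e1 * S + e2 * S * (S - 1))
    return d, theta_pi, theta_w


def windowed(sites, n, window=150_000):
    """Group (position, derived_count) pairs into non-overlapping windows keyed by start."""
    bins = defaultdict(list)
    for pos, k in sites:
        bins[pos // window].append(k)
    return {w * window: tajimas_d(ks, n) for w, ks in sorted(bins.items())}
```

An excess of rare variants (for example, many singletons) makes θπ smaller than θw and pushes D below zero, the signature expected after population expansion or a recent selective sweep; an excess of intermediate-frequency variants pushes D above zero.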
AUTHOR CONTRIBUTIONS
Conceived and designed the experiments: CK BJW. Performed the experiments: CK BJW SG. Analyzed the data: CK CF BJW. Wrote the paper: CK CF BJW.
SUPPLEMENTARY INFORMATION
Consistently negative Tajima’s D across all subgroups may reflect recent population expansions. To test this hypothesis further, we modeled the demographic history of each population using the diffusion-based approach implemented in ∂a∂i v 1.6.3 (Gutenkunst et al. 2009). We fit four alternative demographic models (neutral, growth, two-epoch, bottle-growth), without migration or recombination, to the folded allele frequency spectrum of each cryptic subgroup of An. gambiae s.s. and An. coluzzii. The best model was selected based on the highest composite log likelihood, the lowest Akaike Information Criterion (AIC), and visual inspection of residuals. Because model choice can be challenging in recently diverged populations, we favoured the simplest model when conflicting models were difficult to discriminate. To obtain uncertainty estimates for the demographic parameters, we used the built-in bootstrap function of ∂a∂i to derive 95% bootstrap confidence intervals.
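The model-selection step can be sketched as follows. The sketch folds an unfolded SFS onto minor-allele counts, computes AIC = 2k − 2 ln L for each fitted model (k parameters, composite log-likelihood ln L), and picks the lowest-AIC model. The rule for preferring the simplest model when fits are hard to distinguish (any model within ΔAIC < 2 of the best, then fewest parameters) is a hypothetical operationalization of the text, and the log-likelihoods and parameter counts would have to come from the actual ∂a∂i fits. Because composite likelihoods treat linked SNPs as independent, AIC differences computed this way are only approximate.

```python
from typing import Dict, List, Tuple


def fold_sfs(unfolded: List[float]) -> List[float]:
    """Fold an unfolded SFS (entries for derived counts 0..n) onto minor-allele counts 0..n//2."""
    n = len(unfolded) - 1
    folded = [0.0] * (n // 2 + 1)
    for k, count in enumerate(unfolded):
        folded[min(k, n - k)] += count
    return folded


def aic(log_lik: float, n_params: int) -> float:
    """Akaike Information Criterion: 2k - 2 ln L."""
    return 2 * n_params - 2 * log_lik


def select_model(fits: Dict[str, Tuple[float, int]], delta: float = 2.0) -> str:
    """Choose a model from fits mapping name -> (composite log-likelihood, n_params).

    Models within `delta` AIC units of the best are treated as indistinguishable
    (hypothetical threshold); among them the one with the fewest parameters wins,
    with ties broken by lower AIC.
    """
    scores = {name: aic(ll, k) for name, (ll, k) in fits.items()}
    best = min(scores.values())
    close = [name for name, s in scores.items() if s - best < delta]
    return min(close, key=lambda name: (fits[name][1], scores[name]))
```

The inputs to `select_model` are placeholders to be filled from real fits; the function reports only which model the stated rule would favour, not the demographic parameters themselves.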
Results indicate that the GAM1, GAM2, and Savannah populations have experienced recent size increases. However, for the southern populations of Yaoundé, Coast, Douala, and Nkolondom, the best demographic model is the bottle-growth model (Table S4). While most classical studies report that An. gambiae s.l. populations are expanding (Donnelly et al. 2001), a more recent study employing RAD markers revealed that some East African populations have more complex demographic histories, often involving several changes in effective population size (Ne), as we observed in southern forest populations of both An. coluzzii and An. gambiae. It has also been shown that Anopheles mosquitoes can experience drastic declines in Ne due to insecticide campaigns (Athrey et al. 2012). Such events alter demographic parameters and could plausibly explain the difficulty we encountered in distinguishing between bottle-growth and two-epoch models in some populations.
ACKNOWLEDGEMENTS
This work was supported by the University of California Riverside and National Institutes of Health (1R01AI113248, 1R21AI115271 to BJW). We thank Elysée Nchoutpouem and Raymond Fokom for assistance collecting mosquitoes and Sina Hananian for assisting in DNA extraction.