Abstract
The evolution of genetic sex determination in eukaryotes is often accompanied by the morphological and genetic differentiation (heteromorphy) of sex chromosomes. Sex determination systems are of particular interest in insect vectors of human pathogens like mosquitoes, for which novel control strategies aim to convert pathogen-transmitting females into non-biting males, or rely on accurate sexing for the release of sterile males. In the major arbovirus vector, the mosquito Aedes aegypti, sex determination is thought to be governed by a dominant male-determining locus (M-locus) spanning only a small portion of an otherwise homomorphic chromosome 1. Here, we provide evidence that the Ae. aegypti sex-determining chromosome is differentiated between males and females over a region considerably larger than the M-locus, showing the features of an XY chromosomal system despite the apparent homomorphy. In laboratory F2 intercrosses, we could not detect recombination events in F1 males along at least 28% of the physical length of chromosome 1, corresponding to 62% of its cytogenetic length. Sex-specific distortions from the expected genotype ratios in the F2 progeny were consistent with the XY system and were not found on distal parts of chromosome 1 or on the other two chromosomes. The same chromosomal region showed substantial genetic differentiation between males and females in unrelated wild populations from Australia and Brazil, pointing to the commonality of these chromosomal features in Ae. aegypti. Our discovery of cryptic sex-chromosome differentiation in Ae. aegypti has important implications for linkage mapping studies, for analyses of population structure, and for the crossing practices to randomize the genetic background of populations in mosquito control strategies.
Author summary
Sex is genetically determined in many species, but the genetic mechanisms underlying sex determination can evolve rapidly even among closely related species. Sex determination is of particular interest in mosquito vectors of human pathogens, for which novel control strategies aim to convert pathogen-transmitting females into non-biting males, or rely on accurate sexing for the release of sterile males. Whereas anopheline malaria vectors have fully differentiated XY sex chromosomes, culicine mosquitoes such as the major arbovirus vector Aedes aegypti have a homologous pair of sex chromosomes that are morphologically identical. Until now, Ae. aegypti sex chromosomes were also considered genetically undifferentiated with the exception of a narrow sex-determining locus. Our study demonstrates substantial genetic divergence of the sex chromosome between males and females over a much larger region than the sex-determining locus, showing the features of an XY chromosomal system despite the similar morphology. This finding has several practical implications for genetic studies of Ae. aegypti and calls for investigations of equivalent features of an XY chromosomal system in other culicine mosquito species.
Introduction
Sex is nearly universal to eukaryotic life, yet there is an enormous array of sex-determination mechanisms even between closely related species, suggesting their potential to evolve rapidly [1–3]. In many taxa where sex is genetically determined, sex-determining genes co-segregate with an entire chromosome that often evolves into morphologically and genetically distinct (heteromorphic) sex chromosomes [2,4,5]. Molecular and cytological analyses have revealed diverse evolutionary histories and stages of sex chromosomes across taxonomic groups, but our understanding of the evolutionary forces driving such a diversity of sex-determination systems remains limited [6]. Flies and mosquitoes (order Diptera) represent one of the most studied invertebrate groups, where frequent transitions in sex-chromosome structure and identity have been documented [7, 8]. The underlying mechanisms and evolutionary dynamics of sex determination are of particular interest in mosquitoes given that only females take blood meals and transmit human pathogens such as malaria parasites and dengue viruses. For instance, some novel strategies for controlling mosquito-borne diseases aim to convert pathogen-transmitting females into non-biting males [9], or rely on accurate sexing for the release of sterile males [10, 11].
Aedes and Culex mosquitoes in the Culicinae subfamily have homomorphic sex-determining chromosomes, which is considered the ancestral state of this character in the mosquito family [12]. Anopheles mosquitoes in the Anophelinae subfamily, however, have acquired fully differentiated X and Y sex chromosomes [8, 12]. Why heteromorphy of sex chromosomes evolved in some mosquito lineages but not others remains unclear. Evolutionary models suggest progression of autosomes into heteromorphic sex chromosomes after the acquisition of a sex-determining locus [13, 14]. The selective advantage of linkage between sex-determining genes and sexually antagonistic genes promotes initial suppression of recombination between homologous chromosomes, followed by expansion of the nonrecombining region [15]. An evolving pair of neo-sex chromosomes further differentiates through changes in gene content, gene decay and epigenetic modifications [4]. Yet, recent analyses of fly genomes revealed complex evolutionary trajectories where sex chromosomes have been gained, lost, replaced and rearranged multiple times over the Dipteran evolutionary history [7].
Aedes aegypti is the main vector of dengue, yellow fever and Zika viruses worldwide. It has a large genome of approximately 1,380 million base pairs (Mbp) and, like other mosquitoes of the Culicinae subfamily, does not have morphologically distinct sex chromosomes [16]. Sex determination in Ae. aegypti is controlled by a locus of the smallest chromosome (chromosome 1) called the M-locus, with a dominant (M) allele conferring the male phenotype [17]. A male-specific gene named nix and behaving as an M-factor was recently identified within the M-locus [9], along with other male-biased (M-linked) sequences that are present almost exclusively in the male genome [9, 18]. The sex-determination locus resides in a non-recombining region mapped to band 1q21 on the q arm of chromosome 1 [19], outside which recombination is thought to occur in an autosome-like fashion, maintaining the overall homomorphic structure of this chromosome [18]. Motara and Rai [20] proposed a nomenclature to define chromosome 1 with the M-locus as the M-chromosome, and the copy without the M-locus as the m-chromosome. Interestingly, they noticed some cytological differences consistent with clear differentiation between chromosomal regions where the M-locus and the m-locus are located [20].
Here, we provide genetic evidence that chromosome 1 in Ae. aegypti is differentiated over a region much larger than the M-locus, showing features of an XY chromosomal system despite the apparent homomorphy. Specifically, while carrying out linkage mapping intercrosses to locate quantitative trait loci (QTL) of dengue vector competence, we could not detect recombination events in F1 males along at least 38% of the physical length of chromosome 1. Sex-specific distortions from the expected genotype ratios in the F2 progeny were consistent with the XY system and were not found on distal parts of chromosome 1 or on the other two chromosomes. Analyses of genomic variation in unrelated wild Ae. aegypti populations provided further evidence for substantial differentiation between the M-and m-chromosomes. These findings have important implications for the mapping and population genetic analyses of Ae. aegypti, as well as for some strategies of mosquito-borne disease prevention.
Results
Intercrosses reveal low recombination along a large region of chromosome 1 in males
We initially delineated a genomic region of reduced recombination between the M-and m-chromosomes in male meiosis using a smaller F2 intercross, referred to as Cross #1 hereafter. Twenty-two F2 males and 22 F2 females from an isofemale line originating from Kamphaeng Phet, Thailand, were genotyped using double-digest RAD sequencing. Cross #1 generated a total of 347,080 single nucleotide polymorphism (SNP) markers, of which 1,010 were selected for linkage analysis based on the following criteria: (i) full informativeness (i.e., F0 founders homozygous for alternative alleles), and (ii) sequencing depth ≥12×.
We ordered a total of 102 markers unambiguously mapped to a single position on a linkage map generated de novo from Cross #1 with both sexes combined, spanning 372.2 centiMorgans (cM). The Cross #1 linkage map consisted of three linkage groups of 25, 36 and 41 markers, covering 83.4, 209.7 and 79.0 cM for chromosomes 1, 2 and 3 respectively (Table 1). The average spacing between markers on chromosomes 1, 2 and 3 was 3.5, 6 and 2 cM, respectively. Linkage group assignments of supercontigs were generally in agreement with a previously published chromosome map [19, 21] (Figure 1A). A total of ten supercontigs (9.8%) were assigned to different chromosomes between the Cross #1 linkage map and the published chromosome map, and a few discrepancies in the linear order of supercontigs within the same chromosome were detected.
We analyzed deviations of genotype frequencies from the expected Mendelian ratios (i.e., segregation distortion) in F2 individuals using a χ2 test. In Cross #1, we observed significant deviations from the expected 1:2:1 segregation ratio for sections of chromosomes 1 and 2, but only chromosome 1 contained a set of markers with a different pattern of segregation distortion in males versus females. Specifically, on chromosome 1 we detected an absence of AA genotypes in F2 females that was mirrored by an absence of BB genotypes in F2 males (Figure 1A). This sex-specific segregation distortion is expected in the vicinity of the sex-determining locus because sex-linked markers co-segregate with the M-locus during meiosis in F1 males. For such sex-linked markers, F0 paternal A alleles preferentially segregate in F2 males (Supplementary figure 1). We detected a complete absence of AA genotypes (i.e., analogous to the lack of XY pairs) in all 22 F2 females at 6 markers spanning 4.6 cM, and an absence of BB genotypes (i.e., analogous to the lack of XX pairs) in all 22 F2 males at 13 markers spanning 25.8 cM on chromosome 1. Based on the Cross #1 linkage map, markers in the region showing reduced recombination in males spanned over 12.0 Mbp, representing 28% of the cumulative length of all supercontigs assigned to chromosome 1.
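The segregation-distortion test described above can be illustrated with a minimal sketch. It assumes a plain χ2 goodness-of-fit test against the 1:2:1 expectation with two degrees of freedom; the function name and the genotype counts are hypothetical and are not taken from the crosses, and the exact test settings used in the study may differ.

```python
import math

def segregation_chi2(n_aa, n_ab, n_bb):
    """Chi-square goodness-of-fit of F2 genotype counts against a 1:2:1 ratio."""
    n = n_aa + n_ab + n_bb
    expected = (n / 4, n / 2, n / 4)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # With 2 degrees of freedom, the chi-square survival function is exp(-x/2).
    p_value = math.exp(-chi2 / 2)
    return chi2, p_value

# Hypothetical counts for 22 F2 females at a sex-linked marker (no AA class).
chi2_females, p_females = segregation_chi2(0, 11, 11)
# Hypothetical counts for 22 F2 males at the same marker (no BB class).
chi2_males, p_males = segregation_chi2(11, 11, 0)
# Hypothetical counts for an undistorted marker.
chi2_neutral, p_neutral = segregation_chi2(5, 10, 5)
```

Running the test separately for each sex, as in this sketch, is what exposes the mirrored pattern: both sexes deviate from 1:2:1, but they lack opposite homozygous classes.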
Because the probability to detect low-frequency recombinants increases with larger sample size, we further analyzed 197 F2 females from an independent F2 intercross, referred to as Cross #2 hereafter, which was initially carried out to locate QTL for dengue vector competence. There was a high degree of synteny between the linkage maps generated from Cross #1 and from Cross #2 (Supplementary figure 2). Misassemblies identified with both linkage maps are reported in Supplementary files 1 and 2. In Cross #2 again, we recorded a complete absence of AA genotypes in all 197 F2 females at 12 markers spanning 36.8 cM on chromosome 1 (Figure 1B). Cross #2 assigned 10 additional supercontigs representing 16.8 Mbp to the region of reduced recombination in males compared to Cross #1 (Supplementary file 3). Based on the synteny with the published chromosome map [19], the estimated size of the region showing reduced recombination in males in both Cross #1 and Cross #2 (highlighted in yellow in Figure 1) corresponded to 62% of the cytological length of chromosome 1.
We performed QTL mapping to confirm that the chromosome 1 region showing reduced recombination in males contained the M-locus. Using standard interval mapping based on the Cross #1 linkage map and a binary trait model, we found a major QTL associated with sex on chromosome 1 (Figure 1A). The highest logarithm of odds (LOD) score for this QTL was 7.6 at 56.4 cM with a 1.5 LOD support interval spanning from 54.1 to 69.2 cM. The genome-wide LOD threshold for statistical significance (α = 0.05) calculated from 1,000 permutation tests was 3.10. The identified sex QTL accounted for 79.7% of phenotypic variation.
We also estimated local recombination rates by comparing the Cross #1 linkage map with the physical map of the genome, using the cumulative length of supercontigs that contained uniquely mapped markers as a proxy for physical distance. Even though this method underestimates physical distances and results in overestimation of recombination rates [22], it is suitable for comparisons of local recombination rates within and between chromosomes. The estimated recombination rate varied across the genome with marked recombination cold spots on all three chromosomes (Supplementary figure 3). Notably, local recombination rates were most variable on chromosome 1, ranging from 0.04 cM/Mbp to 8.43 cM/Mbp. On this chromosome, 14.6 Mbp of genomic sequence (38% of the chromosome physical length) were mapped to a 5.7 cM region (only 7% of the chromosome genetic length) that spanned between 54 and 60 cM. Local recombination rates for chromosomes 2 and 3 varied from 1.56 to 7.71 cM/Mbp and 0.26 to 4.57 cM/Mbp, respectively, showing more consistent relationships between physical and genetic distances.
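As a rough illustration of the calculation above, the sketch below divides genetic distance by physical distance between adjacent markers. The marker coordinates are hypothetical; only the 5.7 cM, 14.6 Mbp and 83.4 cM values come from the text, and the function name is an assumption.

```python
def interval_rates(markers):
    """Recombination rate (cM/Mbp) between adjacent markers.

    markers: list of (genetic position in cM, cumulative physical position in Mbp),
    ordered along the linkage group.
    """
    rates = []
    for (cm0, mbp0), (cm1, mbp1) in zip(markers, markers[1:]):
        span_mbp = mbp1 - mbp0
        if span_mbp > 0:  # skip markers on the same physical position
            rates.append((cm1 - cm0) / span_mbp)
    return rates

# Hypothetical marker positions; the middle interval mimics the cold spot.
example_markers = [(50.0, 10.0), (54.0, 12.0), (59.7, 26.6), (70.0, 29.0)]
example_rates = interval_rates(example_markers)

# Values reported in the text for the chromosome 1 cold spot.
region_rate = 5.7 / 14.6               # average rate over the region, in cM/Mbp
region_genetic_fraction = 5.7 / 83.4   # share of the chromosome 1 genetic length
```

Averaged over the whole 14.6 Mbp region, the rate is about 0.39 cM/Mbp, well below the range reported for chromosomes 2 and 3.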
M- and m-chromosomes are genetically differentiated in wild Ae. aegypti populations
Given that rare recombination events between the M- and m-chromosomes in male meiosis cannot be detected unless mapping families are very large, sequences from natural populations can be used to infer historical recombination events and their potential consequences. We characterized a recombination-dependent metric (linkage disequilibrium [LD]), genetic divergence, and heterozygosity in field-caught mosquitoes from a large, panmictic population [23] in Rio de Janeiro, Brazil. Earlier studies have also shown that this population is highly genetically divergent from Ae. aegypti found in Southeast Asia and Australia [23, 24]. We genotyped 69 males and 33 females at double-digest RAD tags, providing genome-wide sequence data across 69 M-chromosomes, 135 m-chromosomes, and 204 copies of each autosome (chromosomes 2 and 3).
RAD tags unambiguously mapped to a single genomic position were selected if they were (i) polymorphic in at least one sex (minor allele frequency [MAF] ≥5%), (ii) shared among ≥70% of all individuals and (iii) sequenced at a depth ≥5×. To order the RAD markers, we used a previously published genetic map [25] that allowed more supercontigs to be positioned than with the maps generated from Cross #1 and Cross #2. Based on this linkage map, we ordered 713 RAD markers on chromosome 1, 703 markers on chromosome 2, and 1,330 markers on chromosome 3 (Supplementary file 4). Median distance between markers was 50 kbp on chromosomes 1 and 3, and 100 kbp on the largest chromosome 2.
We analyzed the pattern of LD between RAD markers for each sex and chromosome pair separately. LD estimates (ALD, see Materials and Methods) indicated that strong allelic association (r >0.5) does not generally extend beyond 1 Mbp. Males displayed slightly elevated LD along 5 Mbp in the approximate centromeric region of chromosome 1 (Figure 2), but it is difficult to determine the exact extent of local LD due to the scarcity of RAD markers along a chromosome. Overall, the observed patterns suggest that some degree of recombination between the M- and m-chromosomes occurs in this Ae. aegypti population.
Frequency of heterozygotes (H) was higher in males than in females on chromosome 1 (Hmales=0.291, Hfemales=0.270, one-tailed t1424=1.908, p=0.028), but not on chromosome 2 (Hmales=0.270, Hfemales=0.277, one-tailed t1408=0.692, p=0.755), or on chromosome 3 (Hmales=0.271, Hfemales=0.271, one-tailed t2655=0.083, p=0.533). The fixation index FST is a common measure of divergence in allelic frequencies between populations [26]. FST between males and females is expected to have a maximum value of 0.5 for fully sex-linked markers where M (analogous to Y) and m (analogous to X) chromosome regions have been fixed for different alleles. Again, only chromosome 1 showed elevated FST values, across a pericentric 50-Mbp region (Figure 2). Eleven supercontigs within this region have been previously physically mapped to chromosome 1 bands 1p25, 1q11, 1q13-14, 1q31-33, which are well outside the M-locus position (1q21).
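The expected ceiling of 0.5 can be checked numerically. The sketch below assumes the Weir and Cockerham (1984) estimator for a single biallelic locus; the text does not state which FST estimator was used, and other estimators can give a different ceiling. The function name and the sample of 20 males and 20 females are hypothetical.

```python
def wc_theta(samples):
    """Weir & Cockerham (1984) theta for one biallelic locus.

    samples: list of (n_AA, n_AB, n_BB) genotype counts, one tuple per group.
    """
    r = len(samples)
    sizes = [sum(s) for s in samples]
    # Frequency of allele A and observed heterozygote proportion in each group.
    freqs = [(2 * aa + ab) / (2 * n) for (aa, ab, _), n in zip(samples, sizes)]
    hets = [ab / n for (_, ab, _), n in zip(samples, sizes)]
    n_bar = sum(sizes) / r
    n_c = (r * n_bar - sum(n * n for n in sizes) / (r * n_bar)) / (r - 1)
    p_bar = sum(n * p for n, p in zip(sizes, freqs)) / (r * n_bar)
    s2 = sum(n * (p - p_bar) ** 2 for n, p in zip(sizes, freqs)) / ((r - 1) * n_bar)
    h_bar = sum(n * h for n, h in zip(sizes, hets)) / (r * n_bar)
    core = p_bar * (1 - p_bar) - (r - 1) / r * s2
    # Variance components: among groups (a), among individuals (b), within individuals (c).
    a = n_bar / n_c * (s2 - (core - h_bar / 4) / (n_bar - 1))
    b = n_bar / (n_bar - 1) * (core - (2 * n_bar - 1) / (4 * n_bar) * h_bar)
    c = h_bar / 2
    return a / (a + b + c)

# Fully sex-linked marker: all males heterozygous (XY-like), all females homozygous (XX-like).
theta_sex_linked = wc_theta([(0, 20, 0), (0, 0, 20)])
# Marker with identical genotype counts in both sexes.
theta_unlinked = wc_theta([(5, 10, 5), (5, 10, 5)])
```

With equal numbers of males and females, the fully sex-linked marker yields a value of 0.5 under this estimator, whereas the marker with identical counts in both sexes yields a value close to zero.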
To assess if genomic composition of M- and m-chromosomes is sufficiently differentiated to predict phenotypic sex in Ae. aegypti, we applied a multivariate clustering method (discriminant analysis of principal components [DAPC]) [27]. In addition to the sample from Brazil, this analysis included a smaller sample of 41 field-caught adult females and 15 adult males from Gordonvale, Australia. DAPC was performed separately for each geographic sample and chromosome. Individuals were assigned to two groups based on a discriminant analysis of five retained PCs in order to avoid model overfitting [28]. Genetic variation along chromosome 1 was sufficient to assign most individuals to their correct sex: 89% (91/102) of individuals from Brazil and 95% (53/56) of individuals from Australia were correctly identified as males or females. Conversely, genetic separation based on variation on chromosomes 2 and 3 was no better than random (33% and 55% accuracy for chromosome 2, 44% and 64% for chromosome 3 in the Brazilian and Australian samples, respectively). The frequency distribution of individual mosquitoes according to their individual DAPC scores demonstrates clear separation of females and males based only on chromosome 1 variation (Figure 3).
Given that DAPC finds linear combinations of alleles (i.e., discriminant functions) which best separate the clusters [27], sex-linked markers can be identified as those with the highest contribution to discrimination between males and females. In agreement with an XY sex-determination system where one sex is expected to be heterozygous, 30 RAD markers with the highest contribution to the discriminant function were heterozygous (analogous to XY) in (nearly) all males and homozygous (analogous to XX) in all females of both populations. Importantly, these markers are located on supercontigs that have been mapped outside of the M-locus chromosomal region (Supplementary file 4). Variants were annotated as being in the intergenic, downstream and intron sequences (with modifier effects) rather than in the coding sequences (Supplementary file 4).
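The XY-like genotype pattern described above can also be screened for directly. The sketch below is not DAPC: it is a simpler filter that flags markers heterozygous in nearly all males and homozygous in all females. The function name, the 95% threshold for "nearly all" males, and the toy genotypes are assumptions made for illustration.

```python
def xy_like_markers(genotypes, sexes, min_male_het=0.95):
    """Return markers heterozygous in most males and homozygous in all females.

    genotypes: dict mapping marker name -> list of two-letter genotype calls
               (e.g. 'AA', 'AB', 'BB'), one per individual.
    sexes: list of 'M' or 'F', in the same individual order.
    """
    selected = []
    for marker, calls in genotypes.items():
        male_calls = [g for g, s in zip(calls, sexes) if s == "M"]
        female_calls = [g for g, s in zip(calls, sexes) if s == "F"]
        if not male_calls or not female_calls:
            continue  # both sexes are needed to evaluate the pattern
        male_het = sum(g[0] != g[1] for g in male_calls) / len(male_calls)
        females_hom = all(g[0] == g[1] for g in female_calls)
        if male_het >= min_male_het and females_hom:
            selected.append(marker)
    return selected

# Toy data: three males followed by three females (hypothetical marker names).
sexes = ["M", "M", "M", "F", "F", "F"]
genotypes = {
    "tag_sexlinked": ["AB", "AB", "AB", "BB", "BB", "BB"],
    "tag_autosomal": ["AB", "AA", "BB", "AB", "BB", "AA"],
}
candidates = xy_like_markers(genotypes, sexes)
```

A filter of this kind requires the sex of each individual to be known in advance, whereas the discriminant functions also rank markers by how much they contribute to separating the sexes.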
Discussion
We provide evidence that the sex-determining chromosome in Ae. aegypti is sufficiently genetically differentiated to show features of an XY chromosomal system despite apparent homomorphy. Results from the laboratory crosses and unrelated wild populations point to the commonality of these chromosomal features in Ae. aegypti from the New World and Austral-Asia. Our findings challenge the traditional view that the sex-determining chromosome in Ae. aegypti behaves like an autosome outside a small, non-recombining sex-determining region (i.e., the M-locus).
Synteny analyses between Ae. aegypti and the malaria mosquito Anopheles gambiae revealed a complex evolutionary history of their sex-determining chromosomes [21]. Even though both species have the same number of chromosomes (2n=6), An. gambiae has clearly distinguishable heteromorphic (X and Y) sex chromosomes that contain large amounts of heterochromatin [21]. In Ae. aegypti, chromosome rearrangements have substantially reshuffled the genetic material between 1p and 1q arms, but there are still notable homologies between the 1p arm of Ae. aegypti and the X chromosome of An. gambiae [21]. The two mosquito lineages diverged around 145-200 million years ago [29] and their common ancestor most likely had homomorphic sex chromosomes [12]. This suggests that the sex-chromosome homomorphy in Ae. aegypti is old and has been maintained through ongoing recombination [18].
However, in our linkage mapping intercrosses, recombination in males was undetectable across a large region of the sex-determining chromosome. Synteny analysis of supercontigs mapped to this region indicated that they were distributed between bands 1p22 and 1q41 of the chromosome map, representing 62% of the total ideogram length (Figure 1). This lack of recombination in males was a hurdle for generating a high-resolution genetic map for QTL mapping of dengue vector competence. In a recent study, Juneja and colleagues also reported low recombination across a large fraction of the chromosome 1 in their mapping female population, which prevented identification of a causal variant for resistance to a nematode parasite [25]. From a practical perspective, reduced recombination along chromosome 1 in males may limit the ability to identify genomic regions affecting phenotypic traits of interest through QTL mapping studies. Rather than increasing the number of crossing generations in mapping families, genome-wide association studies could be more fruitful when narrowing down genomic regions of interest for a particular trait.
Mapping families provide a contemporary measure of recombination rate, but they do not necessarily reflect historical recombination events. For example, recombination between undifferentiated sex chromosomes in male tree frogs (Hyla spp.) was only evident from an analysis of molecular data from populations when coupled with simulations [30]. Our population analyses of chromosome-wide LD in Ae. aegypti suggested that haplotype blocks on chromosome 1 in males were not longer when compared to females, in agreement with data from the other two chromosomes. However, the low density of RAD markers along chromosomes meant that we could only examine statistical associations between alleles tens of kbp (or more) apart. The absence of haplotype blocks extending beyond 1-5 Mbp is indicative of some degree of ongoing recombination between M- and m-chromosomes. This is consistent with rare recombination events in the vicinity of the M-locus detected by Hall and colleagues who screened several thousands of individuals using a sensitive transgene-assisted approach [18]. A low recombination rate in males could be sufficient to prevent long-range LD on the Ae. aegypti M-chromosome. In the hylid frogs, X and Y chromosomes remain indistinguishable even though recombination is estimated to occur in only 1 in 10⁵ males [30]. Simulation work by Grossen and colleagues showed that recombination rates of 10⁻⁴ could keep sex chromosomes homomorphic [31]. If we assume this rate of recombination, a large Ae. aegypti population like the one in Rio de Janeiro [23] would likely have tens of males with recombinant sperm in every generation.
While sex chromosomes in Ae. aegypti are morphologically undifferentiated, they were nevertheless genetically differentiated between sexes in our wild population samples. High FST values between wild males and females from the same population spanned a pericentromeric 50-Mbp region from band 1p25 to band 1q33 (Figure 2), which matches the region of reduced recombination delineated in our laboratory crosses (Figure 1). Furthermore, genotypes at the most differentiated (i.e., putative sex-linked) markers were in agreement with the XY sex-determination system, with homozygous (i.e., analogous to XX) females and heterozygous (i.e., analogous to XY) males.
Our analyses give conservative estimates of sequence differentiation between sex chromosomes in Ae. aegypti because the datasets consisted of RAD tags found in both sexes; any male-specific sequences without gametologs (i.e., homologous sequences on the nonrecombining opposite sex chromosome) were not considered. Male-specific sequences were previously identified in the Liverpool strain of Ae. aegypti (used to generate the reference genome sequence) as largely missing from the current genome assembly [9, 18]. Male-specific sequences were detected in our double-digest RAD sequencing datasets from wild populations (Supplementary file 5). We created reference-free de novo RAD tags from sequences that failed to align to the female-biased genome assembly, and retained tags that were only present in males. We found 185 putative male-specific RAD tags that aligned to the Ae. aegypti sequences in the NCBI database with a BLAST E-value <10⁻⁵ (Supplementary file 5). Moreover, one RAD tag was identified as the myo-sex sequence, an extremely male-biased gene linked to the M-locus in the Ae. aegypti Liverpool strain [18]. Our results support the presence of Y-like sequences in Ae. aegypti males that are not included in the current genome assembly. A long-read sequencing technology was recently used to assemble the repeat-rich Y chromosome of Anopheles mosquitoes [32]. The same approach could be used to identify Y-like sequences and incorporate them into an improved assembly of Ae. aegypti.
Overall, our findings show that a large section of chromosome 1 in Ae. aegypti displays features of genetically but not morphologically differentiated X and Y chromosomes. These features could mean that M-and m-chromosomes represent neo-Y and neo-X chromosomes on their path to morphological differentiation, where cessation of recombination helps to physically separate sexually antagonistic alleles [15]. Alternatively, chromosomal features in Ae. aegypti could be old, and sexually antagonistic selection may have been resolved by evolving sexually-biased gene expression instead of eliminating recombination [33]. Perhaps mosquitoes, like birds [33], have found different evolutionary solutions to deal with deleterious effects of sexually antagonistic mutations. Some lineages may have maintained homomorphic sex chromosomes (e.g., Ae. aegypti and other Culicinae), while others evolved heteromorphic sex chromosomes (e.g., Anopheles mosquitoes).
The results have implications not only for genetic mapping studies but also for Ae. aegypti population genetics. Sex-specific differentiation over a large region of the genome needs to be considered when using genetic markers for assessing population genetic structure. To date, genetic analyses of Aedes populations have proven challenging due to problems associated with null alleles and other factors that cause deviations from Hardy-Weinberg equilibrium [e.g. 34], and some of these issues may stem from markers located within the XY-like region of chromosome 1. Sexes should therefore always be clearly distinguished in such studies and the chromosomal location of markers should be established. Where sex separation based on morphological characters is difficult (e.g., in immature stages or damaged material), molecular sex-specific markers can be used (e.g., Supplementary file 5). Consideration of the XY chromosomal features in Ae. aegypti is also warranted for the development of vector-control strategies such as the field deployment of Wolbachia-infected mosquitoes [35] where the release stocks undergo several generations of backcrossing. The backcrossing is done to create favorable combinations of alleles that facilitate artificial rearing and increase fitness of the release stock under natural conditions. Because Wolbachia is transmitted maternally and causes cytoplasmic incompatibility, Wolbachia-infected females are crossed to males from a target field population [36]. Low recombination in male meiosis means that males from the release colony are expected to maintain the background of the natural population along a significant portion of the M-chromosome.
In conclusion, our discovery of reduced recombination and genetic differentiation between otherwise homomorphic sex chromosomes of Ae. aegypti calls for investigations of similar features of an XY chromosomal system in other species of the Culicinae subfamily. For instance, the M-locus of Culex pipiens has a common origin with that of Ae. aegypti [37]. Elucidating the evolutionary history of sex-determining chromosomes in the Culicinae subfamily would help to determine whether the XY-like features we observed in Ae. aegypti have been maintained over long evolutionary times, or whether they indicate that M-and m-chromosomes are nascent X and Y heteromorphic chromosomes. Finally, a thorough understanding of Ae. aegypti sex determination will also require an improved assembly of the genome sequence that incorporates male-biased and male-specific sequences.
Materials and Methods
Mosquito collection and crosses
This study used two independent laboratory crosses of wild-type Ae. aegypti mosquitoes originally collected in February 2011 from Kamphaeng Phet, Thailand. Cross #2 was an F2 intercross between a pair of field-collected mosquito founders collected as eggs using ovitraps as previously described [38]. Briefly, F0 eggs were allowed to hatch in filtered tap water and pupae emerged in individual vials. Aedes aegypti adults were identified by visual inspection and maintained in an insectary under controlled conditions (28±1°C, 75±5% relative humidity and 12:12 hour light-dark cycle). The male and female of each mating pair were chosen from different collection sites to ensure that the F0 parents were not siblings from the same wild mother [39, 40]. Virgin F0 adults were allowed to mate for 2-3 days following emergence and the inseminated female was blood fed and allowed to lay eggs. Egg batches from the same female were merged to obtain a pool of F1 eggs. F0 founders were saved for later DNA extraction and genotyping. F2 progeny was produced by mass sib-mating and collective oviposition of the F1 offspring. A total of 197 female individuals of the F2 progeny were used as a mapping population to generate a linkage map.
Cross #1 was an F2 intercross between a pair of mosquitoes from two different isofemale lines derived from the same wild Ae. aegypti population in Kamphaeng Phet, Thailand. Both isofemale lines were established in February 2011 from F0 founders as described above and maintained in the laboratory by mass sib-mating and collective oviposition until the 19th generation. At each generation, females were fed on commercial sheep or rabbit blood through an artificial membrane feeding system. Institutional Animal Care and Use Committee approval was not required because blood collection took place postmortem as a by-product of a commercial enterprise. Mosquito eggs were collected and stored on dry pieces of paper towel and maintained under high relative humidity no longer than 6 months. Cross #1 resulted from mating between a single virgin male from one isofemale line with a single virgin female from another isofemale line. Maintenance of an isofemale line at a small population size in the laboratory for 19 generations is expected to lead to inbreeding and maximize homozygosity, one important criterion to generate informative markers for linkage mapping. A total of 22 males and 22 females from the F2 progeny of Cross #1 were used to generate a linkage map and subsequently map the male-determining locus (M-locus).
Independent samples were analyzed from two wild Ae. aegypti populations from Rio de Janeiro, Brazil and Queensland, Australia. Samples of 41 adult females and 15 adult males from Australia were caught using Biogents sentinel traps set up in Gordonvale, Queensland in December 2010. Adult Ae. aegypti were identified as males or females based on the sexually dimorphic antennae and external genitalia structure [41]. Mosquitoes from Rio de Janeiro, Brazil were collected from ovitraps within a single 3-week period in November-December 2011. Larvae were reared until the third instar in an insectary under controlled conditions (25±1°C, 80±10% relative humidity and 12:12 hour light-dark cycle). Only one individual per ovitrap was retained to avoid analyzing siblings that tend to co-occur within the same trap [39, 40]. Sex of each individual was determined based on the presence or absence of the myo-sex sequence [18] and confirmed with two additional male-specific sequences identified in this study (Supplementary file 5; Supplementary figure 4). The final dataset from Brazil consisted of 69 males and 33 females.
Double digest Restriction-site Associated DNA (RAD) sequencing
Mosquito genomic DNA was extracted using the NucleoSpin 96 Tissue Core Kit (Macherey-Nagel, Düren, Germany) and whole genome amplified by Multiple Displacement Amplification using the Repli-g Mini kit (Qiagen, Hilden, Germany) to obtain a sufficient amount of DNA. All DNA concentrations were measured with a Qubit fluorometer and the Quant-iT dsDNA Assay kit (Life Technologies, Paisley, UK). An adaptation of the original double digest Restriction-site Associated DNA (ddRAD) sequencing protocol [42] was used as previously described [24] with minor additional modifications. Briefly, a standardized quantity of 500 ng of genomic DNA from each mosquito was digested in a 50 μl reaction containing 50 units each of NlaIII and MluCI restriction enzymes (New England Biolabs, Herts, UK), 1× CutSmart® Buffer and water for 3 hours at 37°C, without a heat-kill step. The digestion products were cleaned with 1.5× volume of Ampure XP paramagnetic beads (Beckman Coulter, Brea, CA, USA) and ligated to the modified Illumina P1 and P2 adapters with overhangs complementary to NlaIII and MluCI cutting sites, respectively. Each mosquito was uniquely labeled with a combination of P1 and P2 barcodes of variable lengths to increase library diversity at 5’ and 3’ ends (Supplementary file 6). This method allows the multiplexing of up to 60 mosquitoes using 12 P1 and 5 P2 adapters. Ligation reactions were set up in a 45 μl volume with 2 μl of 4 μM P1 and 12 μM P2 adapters, 1,000 units of T4 ligase and 1× T4 buffer (New England Biolabs) and were incubated at 16°C overnight. Ligations were heat-inactivated at 65°C for 10 minutes and cooled down to room temperature in a thermocycler at a rate of 1.5°C per 2 minutes. Adapter-ligated DNA fragments from all individuals were then pooled and cleaned with 1.5× bead solution.
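The combinatorial labeling described above (12 P1 adapters × 5 P2 adapters = 60 unique pairs) can be illustrated with a minimal sketch. The adapter labels and the function name below are hypothetical placeholders; the actual barcode sequences are those listed in Supplementary file 6.

```python
from itertools import product

# Hypothetical adapter labels, used only for illustration.
P1_ADAPTERS = [f"P1_{i:02d}" for i in range(1, 13)]  # 12 P1 adapters
P2_ADAPTERS = [f"P2_{i:02d}" for i in range(1, 6)]   # 5 P2 adapters


def assign_barcode_pairs(sample_ids):
    """Give each sample a unique (P1, P2) adapter pair, in a fixed order."""
    pairs = list(product(P1_ADAPTERS, P2_ADAPTERS))  # 12 x 5 = 60 pairs
    if len(sample_ids) > len(pairs):
        raise ValueError(
            f"{len(sample_ids)} samples exceed the {len(pairs)} available pairs"
        )
    return dict(zip(sample_ids, pairs))


# Example: a library of 48 mosquitoes uses 48 of the 60 possible pairs.
assignment = assign_barcode_pairs([f"mosquito_{i}" for i in range(48)])
```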
Size selection of fragments between 350-440 base pairs (bp) for the crosses or 300-450 bp for the field populations was performed using a Pippin-Prep 2% gel cassette (Sage Sciences, Beverly, MA, USA). Finally, 1 μl of the size-selected DNA was used as a template in a 10 μl PCR reaction with 5 μl of Phusion High Fidelity 2× Master mix (New England Biolabs) and 1 μl of 50 μM P1 and P2 primers. To reduce bias from PCR duplicates, 8 PCR reactions were run in parallel, pooled, and cleaned with a 0.8× bead solution to make the final library. If sequence reads for particular loci are enriched due to PCR duplication in a single PCR reaction, their overrepresentation in the final library is limited by pooling 8 independent PCR reactions. At this step, final libraries were quantified by quantitative PCR using the QPCR NGS Library Quantification Kit (Agilent Technologies, Palo Alto, CA, USA). For the mapping crosses, libraries containing multiplexed DNA fragments from 48 to 50 mosquitoes were sequenced on an Illumina NextSeq platform using a NextSeq 500 High Output 300 cycles v1 kit (Illumina, San Diego, CA, USA) to obtain 150-bp paired-end reads. An optimized final library concentration of 1.1 pM, spiked with 15% PhiX, was loaded onto the flow cell. For the field populations, three ddRAD libraries each containing 52-56 mosquitoes were sequenced in three lanes of the Illumina HiSeq platform with a 100-bp paired-end chemistry (SRA# pending).
Sequence processing and SNP calling
A previously developed bash script pipeline [24] was used with minor modifications to process raw sequence reads. Briefly, the DDemux program was used for demultiplexing fastq files according to the P1 and P2 barcode combinations. Sequence quality scores were automatically converted into Sanger format. Sequences were filtered with FASTX-Toolkit. Reads were trimmed to 90 bp (HiSeq) and 140 bp (NextSeq) on both P1 and P2 ends. In addition, we trimmed the first 4 bp of P2 NextSeq reads to decrease the probability of sequencing errors in the low-diversity MluCI cutting site. A higher sequencing error rate was initially detected in this AATT sequence compared to other parts of the sequence reads. This could be due to a lower accuracy of the two-channel SBS technology in distinguishing A from T nucleotides in a region of very low diversity (a known weakness of the NextSeq 500 v1 kits). All reads with Phred scores <25 were discarded. P1 and P2 reads were then matched and unpaired reads were sorted as orphans.
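The trimming and quality-filtering steps can be sketched as follows. This is an illustration only, not the pipeline of [24] or the FASTX-Toolkit implementation. Two assumptions are made here: the Phred threshold is applied to the mean score of the trimmed read (the text does not state whether the mean or a per-base criterion was used), and the 4-bp removal is applied within the 140-bp window, leaving 136 bp. Function and parameter names are hypothetical.

```python
def phred_scores(quality_string, offset=33):
    """Decode a Sanger-format (Phred+33) quality string into integer scores."""
    return [ord(ch) - offset for ch in quality_string]


def trim_and_filter(seq, qual, keep_length, skip_start=0, min_mean_phred=25):
    """Trim a read and apply a quality filter.

    Returns the trimmed (sequence, quality) pair, or None when the read is
    shorter than keep_length or its mean Phred score is below the threshold.
    ASSUMPTION: the threshold applies to the mean score of the trimmed read.
    """
    if len(seq) < keep_length or len(seq) != len(qual):
        return None
    seq = seq[skip_start:keep_length]
    qual = qual[skip_start:keep_length]
    scores = phred_scores(qual)
    if sum(scores) / len(scores) < min_mean_phred:
        return None
    return seq, qual


# Example: a NextSeq P2 read whose first 4 bp (the AATT site) are removed.
read = "AATT" + "ACGT" * 36        # 148 bp of made-up sequence
quality = "I" * len(read)          # 'I' encodes Phred 40 in Sanger format
trimmed = trim_and_filter(read, quality, keep_length=140, skip_start=4)
```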
Paired reads were aligned to the Ae. aegypti genome (AaegL1, February 2013) [16] using Bowtie version 0.12.7 [43]. Parameters for the ungapped alignment included a maximum of three mismatches allowed in the seed, suppression of alignments if more than one reportable alignment exists, and a “try-hard” option to find valid alignments. Orphans were joined with all unaligned paired reads and single-end alignment was attempted. All aligned Bowtie output files were merged per individual and imported into the Stacks pipeline. A catalog of RAD loci used for single nucleotide polymorphism (SNP) discovery was created using the ref_map.pl pipeline in Stacks version 1.19 [44, 45]. First, sequences aligned to the same genomic location were stacked together and merged to form loci using Pstacks. Only loci with a sequencing depth >5 reads per individual were retained. Cstacks was then used to create a catalog of consensus loci, merging alleles together, and Sstacks was used to match all identified loci. For the mapping crosses, we used the “genotypes” module to export F2 mosquito genotypes for all markers homozygous for alternative alleles in the F0 parents (i.e., AA in the F0 male and BB in the F0 female) with a sequencing depth ≥12× in ≥60% of the mapping population. The automated correction option was enabled to correct false-negative heterozygote alleles.
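The two marker selection criteria for the mapping crosses (F0 parents homozygous for alternative alleles, and a depth of ≥12× in ≥60% of the F2 individuals) can be expressed as a short sketch. This does not reproduce the Stacks “genotypes” module; the data representation and function names are hypothetical.

```python
def is_informative(f0_male, f0_female):
    """True when both F0 parents are homozygous, for different alleles.

    Each genotype is a pair of alleles, e.g. ("G", "G").
    """
    male_homozygous = f0_male[0] == f0_male[1]
    female_homozygous = f0_female[0] == f0_female[1]
    return male_homozygous and female_homozygous and f0_male[0] != f0_female[0]


def passes_coverage(depths, min_depth=12, min_fraction=0.60):
    """True when enough F2 individuals reach the minimum depth at a marker."""
    if not depths:
        return False
    covered = sum(1 for d in depths if d >= min_depth)
    return covered / len(depths) >= min_fraction


# Example with made-up values: 44 F2 individuals, 30 of them at >=12x.
example_depths = [15] * 30 + [4] * 14
keep = is_informative(("G", "G"), ("T", "T")) and passes_coverage(example_depths)
```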
In samples from the wild populations, we selected polymorphic RAD tags (MAF ≥1%) shared among ≥70% of individuals. Where present, multiple SNPs per RAD tag were collated into mini-haplotypes with ≥2 “haplo-alleles” (Stacks version 1.35). This was done to retain information about the variation within a 90-bp sequence while avoiding the treatment of such nearby SNPs as independent variants in downstream analyses. Our data set therefore consisted of bi-allelic SNPs (i.e., extracted from tags that contained only 1 SNP) and multiallelic markers (i.e., “haplo-alleles” extracted from tags that contained more than 1 SNP). The final number of markers was 2,748 in the sample from Brazil (Chr1 = 713, Chr2 = 705, Chr3 = 1,330), and 1,183 in the sample from Australia (Chr1 = 295, Chr2 = 326, Chr3 = 562).
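The selection of polymorphic tags shared among individuals can be sketched as below. This is not the Stacks implementation. For multiallelic haplo-alleles, the sketch assumes that the minor allele frequency is the frequency of the second most common allele, which is one possible convention and is not stated in the text. The function name and data layout are hypothetical.

```python
from collections import Counter


def keep_marker(genotypes, min_maf=0.01, min_call_rate=0.70):
    """Decide whether a RAD tag is retained.

    genotypes: one entry per individual, either a pair of (haplo-)alleles
    or None when the tag is missing in that individual.
    ASSUMPTION: MAF is the frequency of the second most common allele.
    """
    called = [g for g in genotypes if g is not None]
    if not genotypes or len(called) / len(genotypes) < min_call_rate:
        return False
    counts = Counter(allele for g in called for allele in g)
    if len(counts) < 2:
        return False  # monomorphic tag
    second_most_common = counts.most_common(2)[1][1]
    return second_most_common / sum(counts.values()) >= min_maf


# Example with made-up data: 100 individuals, 10 missing, 3 haplo-alleles.
example = (
    [("AC", "AC")] * 60 + [("AC", "GT")] * 25 + [("GT", "AT")] * 5 + [None] * 10
)
retained = keep_marker(example)
```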
Linkage map construction
OneMap v2.0-3 [46], implemented as a package in the R environment [47], was used to construct linkage maps based on recombination fractions among RAD markers in the mapping populations. Based on the selection criteria described above, every marker is expected to segregate at a frequency of 25% for each homozygote (i.e., AA and BB genotypes) and of 50% for heterozygotes (i.e., AB genotypes) in an F2 mapping population that includes both males and females. When only females are genotyped in the F2 progeny (i.e., Cross #2), fully sex-linked markers are expected to segregate with equal frequency (50%) of AB and BB genotypes, because the F0 paternal AA genotype only occurs in F2 males (Supplementary figure 1). Reciprocally, fully sex-linked markers in F2 males are expected to lack F0 maternal BB genotypes. The chromosome 1 section encompassing markers that appear fully sex-linked based on the sample size of the study is referred to here as the region of reduced recombination (RRR) in males. As the genetic distance from the M-locus increases, the probability of detecting recombinant AA genotypes in F2 females and BB genotypes in F2 males increases on both sides of the RRR.
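These expectations follow from a simple gamete model, sketched here under stated assumptions: the F1 male carries the F0 paternal A allele on the M-bearing chromosome and the F0 maternal B allele on the other homolog, the F1 female transmits A or B with equal probability, and r is the recombination fraction between the marker and the M-locus in F1 males. The function name is hypothetical.

```python
def expected_f2_frequencies(r, sex):
    """Expected F2 genotype frequencies at a chromosome 1 marker.

    r: recombination fraction between the marker and the M-locus in F1
       males (0 = fully sex-linked, 0.5 = unlinked).
    sex: "male" or "female" F2 progeny.
    """
    if not 0.0 <= r <= 0.5:
        raise ValueError("r must be between 0 and 0.5")
    if sex == "male":
        p_paternal_a = 1.0 - r   # sons inherit the M-bearing chromosome
    elif sex == "female":
        p_paternal_a = r         # daughters inherit the other homolog
    else:
        raise ValueError("sex must be 'male' or 'female'")
    # The maternal gamete carries A or B with probability 1/2 each.
    return {
        "AA": 0.5 * p_paternal_a,
        "AB": 0.5,
        "BB": 0.5 * (1.0 - p_paternal_a),
    }


fully_linked_females = expected_f2_frequencies(0.0, "female")
unlinked_females = expected_f2_frequencies(0.5, "female")
```

With r = 0 the model gives 50% AB and 50% BB in females and no BB in males; with r = 0.5 it returns the 25:50:25 ratio in both sexes.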
In Cross #2, markers exhibiting extreme Mendelian segregation distortion in F2 females were excluded from further analysis. Markers were included if they had heterozygous (AB) genotype frequencies strictly between 20% and 65%, F0 maternal (BB) genotype frequencies strictly between 5% and 65%, and F0 paternal (AA) genotype frequencies <70%. These arbitrary limits for initial marker selection were largely permissive for markers segregating according to theoretical proportions (25% AA: 50% AB: 25% BB at autosomal loci or a maximum of 50% AB: 50% BB at sex-linked loci) and facilitated the initial assignment of markers to linkage groups.
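The initial inclusion limits amount to three inequalities on genotype frequencies, shown here as a minimal sketch with frequencies expressed as fractions. The function name is hypothetical.

```python
def passes_initial_filter(freq_aa, freq_ab, freq_bb):
    """Apply the permissive Cross #2 limits to F2 female genotype frequencies.

    All bounds are exclusive.
    """
    return (
        0.20 < freq_ab < 0.65
        and 0.05 < freq_bb < 0.65
        and freq_aa < 0.70
    )


# Theoretical autosomal and fully sex-linked proportions both pass.
autosomal_ok = passes_initial_filter(0.25, 0.50, 0.25)
sex_linked_ok = passes_initial_filter(0.00, 0.50, 0.50)
```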
The occurrence of RAD markers in multiple copies (i.e., different loci sharing the same nucleotide sequence) was examined among all RAD markers identified. Only 0.5% of all reads aligned in multiple locations after suppression of multiple copy alignments in Bowtie, and the corresponding markers were discarded. RAD markers with a sequencing depth ≤12× in F0 parents were discarded to minimize the risk of false homozygous calls.
Recombination fractions between all pairs of selected markers were estimated using the rf.2pts function with default parameters. In Crosses #1 and #2, markers linked with a minimum LOD score of 8 and 14, respectively, were assigned to the same linkage group. Unlinked markers, if any, were removed from further analysis. For each cross, the minimum LOD score threshold to declare linkage was chosen to maximize the number of markers in three distinct linkage groups. At this stage, linkage groups could be assigned to the three distinct Ae. aegypti chromosomes based on markers located on supercontigs shared between the present study and a published linkage map [25]. For each marker, a χ² test was used to detect deviations of the observed genotype frequencies in the F2 progeny from the theoretical 25% (each homozygote): 50% (heterozygote) expectations. In autosomal linkage groups, only markers whose genotype frequencies had Mendelian segregation ratios with p-values >0.1 and >0.05 were retained in the analysis of Cross #1 and Cross #2, respectively. Marker selection based on Mendelian segregation ratios was not applied to Cross #2 markers in the linkage group assigned to chromosome 1, or to Cross #1 markers in the linkage group assigned to chromosome 2 that displayed an unexplained deficit of AA genotypes in both sexes. Instead, iterative marker ordering was performed by sequentially removing markers with strong linkage discrepancies within the linkage group.
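The segregation test against the 1:2:1 expectation can be sketched in a few lines. With three genotype classes the test has 2 degrees of freedom, for which the χ² survival function has the closed form exp(−χ²/2), so no statistics library is needed. The function name and the example counts are hypothetical.

```python
import math


def chi2_mendelian_f2(n_aa, n_ab, n_bb):
    """Chi-square test of observed F2 genotype counts against 1:2:1.

    Returns (statistic, p_value). With 2 degrees of freedom the p-value
    is exactly exp(-statistic / 2).
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotyped individuals")
    observed = (n_aa, n_ab, n_bb)
    expected = (0.25 * n, 0.50 * n, 0.25 * n)
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return statistic, math.exp(-statistic / 2.0)


# Made-up counts for 44 individuals: a marker with no AA genotypes.
stat, p_value = chi2_mendelian_f2(0, 22, 22)
retain_in_cross_1 = p_value > 0.1
```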
Cross #1 could be analyzed as a classical F2 intercross design because both males and females were genotyped. Markers from each linkage group were provisionally ordered using the two-point based recombination counting and ordering (RECORD) algorithm [48]. Recombination fractions were converted into genetic distances in centiMorgans (cM) using the Kosambi mapping function [49]. Genetic distances among six equally spaced markers from this initial map were refined using multipoint likelihood of all possible orders. All remaining markers were then re-mapped to this initial frame in a stepwise fashion using the automated procedure of the order.seq function in the OneMap package. A “safe” linkage map was generated that only included markers uniquely mapped to a single position with a LOD score >3. The touchdown option was used to perform a second round of stepwise mapping for LOD scores >2. When possible, markers located on supercontigs that were previously mapped physically to the chromosomes [19] were substituted for markers located in supercontigs with unknown chromosome position. This step maximized connections between linkage and chromosome maps for synteny analysis.
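For reference, the Kosambi mapping function converts a recombination fraction r into a map distance d = ¼ ln[(1 + 2r) / (1 − 2r)] Morgans. A minimal sketch of the conversion and its inverse follows; the function names are hypothetical and this is not the OneMap code.

```python
import math


def kosambi_cm(r):
    """Convert a recombination fraction (0 <= r < 0.5) to centiMorgans."""
    if not 0.0 <= r < 0.5:
        raise ValueError("r must satisfy 0 <= r < 0.5")
    return 25.0 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))


def kosambi_rf(distance_cm):
    """Inverse conversion: centiMorgans back to a recombination fraction."""
    return 0.5 * math.tanh(distance_cm / 50.0)


# For small r the distance is close to 100 * r; it grows faster as r nears 0.5.
d_small = kosambi_cm(0.01)
d_large = kosambi_cm(0.40)
```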
Cross #2 could only be analyzed as a classical F2 intercross design for autosomal linkage groups because only females were genotyped. A “safe” linkage map was generated as described above, but chromosome 1 was further processed to account for the specific segregation patterns in the sex-linked region of F2 females. Fully sex-linked markers lack F0 paternal AA genotypes in F2 females and segregate as in a backcross design in which F1 AB heterozygotes are backcrossed to F0 BB homozygotes. No linkage analysis method is readily available to deal with a chromosome that behaves partially as in a backcross (i.e., fully sex-linked loci) and partially as in an intercross (i.e., the remainder of chromosome 1 loci). Linkage analysis of chromosome 1 in Cross #2 was therefore restricted to the fully sex-linked region, referred to as the RRR. A new OneMap input file that only contained markers lacking AA genotypes was made by setting the population type as “backcross” instead of “F2 intercross”. Markers were ordered in the RRR using the order.seq function in OneMap as described above. The final linkage map of Cross #2 combined the relative order of markers in the RRR obtained from a backcross analysis and the relative order of markers on chromosomes 2 and 3 obtained from an F2 intercross analysis.
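The backcross logic can be illustrated with a simple counting sketch. In F2 females, a fully sex-linked marker always carries the B allele from the father, so an AB or BB genotype reports which allele the F1 mother transmitted; two markers are recombinant in an individual when their genotypes differ. This counting estimate is a simplification for illustration and is not the likelihood-based estimation performed by OneMap. Function names and genotype codes are hypothetical.

```python
def lacks_paternal_homozygotes(genotypes):
    """True when no F2 female carries the F0 paternal AA genotype."""
    return "AA" not in genotypes


def backcross_rf(genotypes_1, genotypes_2):
    """Two-point recombination fraction by counting, for backcross-type data.

    Individuals with a missing or non-backcross genotype at either marker
    are skipped.
    """
    valid = {"AB", "BB"}
    pairs = [
        (g1, g2)
        for g1, g2 in zip(genotypes_1, genotypes_2)
        if g1 in valid and g2 in valid
    ]
    if not pairs:
        raise ValueError("no individuals genotyped at both markers")
    recombinants = sum(1 for g1, g2 in pairs if g1 != g2)
    return recombinants / len(pairs)


# Made-up genotypes for 8 F2 females at two markers; one recombinant.
marker_1 = ["AB", "AB", "BB", "BB", "AB", "BB", "AB", "BB"]
marker_2 = ["AB", "AB", "BB", "AB", "AB", "BB", "AB", "BB"]
rf = backcross_rf(marker_1, marker_2)
```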
The linkage maps generated from Cross #1 and Cross #2 allowed the identification of a number of supercontigs that mapped to distinct genome locations within the same map. Intra- and inter-chromosome inconsistencies in the mapping of supercontigs can arise from supercontig misassemblies. The sequence of each RAD tag was mapped to supercontigs of the AaegL1 genome assembly using BWA [50] to provide anchors for integrating supercontigs into the genetic maps. The Chromonomer software [51] was used to detect and, when possible, correct supercontig misassemblies and orientation (Supplementary files 1 and 2). Two out of the 4 supercontigs detected as misassembled in our maps had already been reported as misassembled in previous work [25, 52]. Shotgun sequence assemblies of large genomes that are rich in sequence repeats are prone to supercontig misassemblies [53]. As reported by Juneja and colleagues [25], misassembled supercontigs are larger in size than correctly assembled ones. The size of the 4 misassembled supercontigs identified in this study was above the 95th percentile of the size distribution of all supercontigs of the current genome assembly. Chromosome rearrangements across different Ae. aegypti populations or inbred lines are an alternative explanation for discrepancies in the assignment of supercontigs to chromosomes.
Linkage disequilibrium and genetic differentiation analyses
Linkage disequilibrium (LD) between marker pairs on each chromosome was calculated as the asymmetric LD coefficient (ALD) in the R package “asymLD” [54]. ALD is a multiallelic extension of the r² metric [54] and is appropriate for our haplo-allelic dataset (see above). Heatmaps of the mean LD metric within a 1-Mbp window were plotted using the R package “lattice”. The pairwise measure of genetic differentiation Fst by Weir and Cockerham [55], as well as [55] and the observed heterozygosity H were calculated for the haplo-allele dataset using the program Genepop [56]. Discriminant analysis of principal components (DAPC) [27] was performed with the R package “adegenet” [57].
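To illustrate why ALD suits multiallelic markers, the sketch below computes the two conditional coefficients from a list of two-locus haplotypes. It assumes the homozygosity-based definition W²(A|B) = (F(A|B) − F(A)) / (1 − F(A)), where F(A) = Σ pᵢ² is the homozygosity at locus A and F(A|B) = Σ fᵢⱼ² / f(Bⱼ) is its homozygosity conditional on the alleles at locus B. It also assumes phased haplotypes as input. It is not the code of the “asymLD” package, whose input format and details may differ, and the function name is hypothetical. Under this definition both coefficients reduce to |r| when the two loci are bi-allelic.

```python
import math
from collections import Counter, defaultdict


def asymmetric_ld(haplotypes):
    """Return (W_A_given_B, W_B_given_A) for a list of (allele_A, allele_B).

    ASSUMPTION: homozygosity-based ALD definition; phased haplotypes.
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("no haplotypes supplied")
    freq = {pair: count / n for pair, count in Counter(haplotypes).items()}
    freq_a = defaultdict(float)
    freq_b = defaultdict(float)
    for (a, b), f in freq.items():
        freq_a[a] += f
        freq_b[b] += f
    if len(freq_a) < 2 or len(freq_b) < 2:
        raise ValueError("both loci must be polymorphic")

    def conditional_ld(target, given, given_index):
        # Homozygosity of the target locus, overall and conditional on the
        # alleles of the other locus.
        homozygosity = sum(p * p for p in target.values())
        conditional = sum(
            f * f / given[pair[given_index]] for pair, f in freq.items()
        )
        ratio = (conditional - homozygosity) / (1.0 - homozygosity)
        return math.sqrt(max(0.0, ratio))

    return conditional_ld(freq_a, freq_b, 1), conditional_ld(freq_b, freq_a, 0)


# Made-up example: locus A has three alleles, locus B has two, and the
# allele at A fully determines the allele at B but not the reverse.
w_a_given_b, w_b_given_a = asymmetric_ld(
    [("a1", "X"), ("a2", "X"), ("a3", "Y"), ("a3", "Y")]
)
```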
Acknowledgements
We thank Alongkot Ponlawat, Jason Richardson, Richard Jarman, Butsaya Thaisomboonsuk, Robert Gibbons, Stefan Fernandez, Timothy Endy, Anthony Schuster, and members of the Lambrechts lab for their insights. We are grateful to Eric Deveaud, Nicolas Joly, Olivia Doppelt-Azeroual and Veronique Legrand for assistance with computational analysis, and to the Nectar Research Cloud for computational resources. The opinions or assertions contained herein are the private views of the authors and are not to be construed as reflecting the official views of the United States Army, Royal Thai Army, or the United States Department of Defense.