Abstract
Canonical ancient sex chromosome pairs consist of a gene-rich X (or Z) chromosome and a male- (or female-) limited Y (or W) chromosome that is gene poor. In contrast to highly differentiated sex chromosomes, nascent sex chromosome pairs are homomorphic or very similar in sequence content. Nascent sex chromosomes arise frequently over the course of evolution, as evidenced by differences in sex chromosomes between closely related species and sex chromosome polymorphisms within species. Sex chromosome turnover typically occurs when an existing sex chromosome becomes fused to an autosome or an autosome acquires a new sex-determining locus/allele. Previously documented sex chromosome transitions involve changes to both members of the sex chromosome pair (X and Y, or Z and W). The house fly has sex chromosomes that resemble the ancestral fly karyotype that originated 100 million years ago, and the house fly is therefore expected to have X and Y chromosomes with different gene content. We tested this hypothesis using whole genome sequencing and transcriptomic data, and we discovered little evidence for genetic differentiation between the X and Y in house fly. We propose that house fly has retained the ancient X chromosome, but the ancestral Y was replaced by an X chromosome carrying a male-determining gene. Our proposed hypothesis provides a mechanism for how one member of a sex chromosome pair can experience evolutionary turnover while the other member remains unaffected.
1. Introduction
In organisms where sex is determined by genetic factors, sex-determining loci can reside on sex chromosomes. Sex chromosome systems are divided into two broad categories: 1) males are the heterogametic sex (XY); or 2) females are the heterogametic sex (ZW). In long-established sex chromosomes, such as those of birds, eutherian mammals, and Drosophila, the X and Y (or Z and W) chromosomes are typically highly differentiated (Charlesworth, 1996; Charlesworth et al., 2005). The X (or Z) chromosome usually resembles an autosome in size and gene density, although there are some predicted and observed differences in gene content and evolutionary rates between the X (or Z) and autosomes (Rice, 1984; Charlesworth et al., 1987; Vicoso and Charlesworth, 2006; Sturgill et al., 2007; Ellegren, 2011; Meisel et al., 2012; Meisel and Connallon, 2013). In contrast, Y (W) chromosomes tend to contain a small number of genes with male- (female-) specific functions and are often enriched with repetitive DNA as a result of male- (female-) specific selection pressures, a low recombination rate, and a reduced effective population size (Rice, 1996; Bachtrog, 2013). This X-Y (or Z-W) differentiation results in a heterogametic sex that is effectively haploid for most or all X (or Z) chromosome genes.
Highly divergent X-Y (or Z-W) pairs trace their ancestry to a pair of undifferentiated autosomes (Bull, 1983; Charlesworth, 1991). Many species harbor undifferentiated sex chromosomes because the sex chromosomes are either of recent origin or have followed non-canonical evolutionary trajectories that prevented X-Y (or Z-W) divergence (Stöck et al., 2011; Bachtrog, 2013; Vicoso et al., 2013; Yazdi and Ellegren, 2014). Recently derived sex chromosomes often result from Robertsonian fusions between an existing sex chromosome and an autosome, or they can arise through a mutation that creates a new sex-determining locus on an autosome (Bachtrog et al., 2014; Beukeboom and Perrin, 2014). In both cases, one of the formerly autosomal homologs evolves into an X (or Z) chromosome, and the other homolog evolves into a Y (or W) chromosome. In some cases, one or both of the ancestral sex chromosomes can revert to an autosome when a new chromosome becomes sex-linked (Carvalho and Clark, 2005; Larracuente et al., 2010; Vicoso and Bachtrog, 2013). In all of the scenarios described above, the X and Y (or Z and W) chromosomes evolve in concert, with an evolutionary transition in one sex chromosome producing a corresponding change in its partner.
Sex chromosome evolution has been extensively studied in higher dipteran flies (Brachycera), where sex chromosome transitions involving X-autosome fusions are common (Patterson and Stone, 1952; Schaeffer et al., 2008; Baker and Wilkinson, 2010; Vicoso and Bachtrog, 2015). The ancestral brachyceran karyotype consists of five large autosomal pairs (known as Muller elements A-E) and a heterochromatic, gene-poor sex chromosome pair (element F is the X chromosome); this genomic arrangement has been conserved for ~100 million years (my) in some lineages (Muller, 1940; Foster et al., 1981; Weller and Foster, 1993; Vicoso and Bachtrog, 2013). In species with the ancestral karyotype, females are XX and males are XY, with a male-determining locus (M factor) on the Y chromosome (Bopp et al., 2014). Many sex chromosome transitions have occurred across Brachycera, including fusions of ancestral autosomes with the X chromosome and complete reversions of the ancestral X to an autosome (Baker and Wilkinson, 2010; Vicoso and Bachtrog, 2013, 2015).
The house fly (Musca domestica) is a classic model system for studying sex determination because it harbors a vast array of natural and laboratory genetic variation (Dübendorfer et al., 2002). The house fly karyotype resembles that of the ancestral brachyceran, with five large euchromatic elements and a heterochromatic sex chromosome pair (Boyes et al., 1964). The house fly X and Y chromosomes can be distinguished based on their length in cytological preparations (Denholm et al., 1983; Cakir and Kence, 1996; Hediger et al., 1998b), and close relatives of the house fly (Lucilia blow flies) have the ancestral karyotype with differentiated X and Y chromosomes (Linger et al., 2015; Vicoso and Bachtrog, 2015). In contrast to other flies, the house fly M factor has been mapped to each of the five autosomes in addition to the Y chromosome (Hamm et al., 2015). The autosomes harboring M are expected to be neo-Y chromosomes, but the house fly Y chromosome is often assumed to be the ancestral brachyceran Y that is differentiated from the X (Vicoso and Bachtrog, 2013; Hamm et al., 2015). However, there are multiple reasons to suspect that the house fly Y chromosome is not differentiated from the X in terms of gene content. First, no sex-linked genetic markers have been identified on the ancestral house fly sex chromosomes other than M (Hamm et al., 2015), suggesting that there are no X-specific genes or genetic variants. Second, males with an autosomal M factor that do not carry a Y chromosome are fertile (Bull, 1983; Hamm et al., 2015), suggesting that no essential male fertility genes are unique to the Y chromosome apart from the M factor. Third, house flies that carry only a single copy of either the X or Y chromosome (i.e., XO or YO flies) are viable and fertile (Bull, 1983; Hediger et al., 1998a), indicating that no essential genes are uniquely found on the X and missing from the Y chromosome, and vice versa.
We used whole genome and transcriptome sequencing to test for genetic differentiation between the house fly X and Y chromosomes. We observed minimal differentiation in sequence and gene content between X and Y chromosomes. We propose that the ancestral Y chromosome has been lost from house fly populations, and that existing Y chromosomes in natural populations arose from the recent translocation of the M factor onto an ancestral X chromosome. This represents, to the best of our knowledge, the first example of the "recycling" of a sex chromosome pair through the creation of a nascent Y from an ancient X chromosome (Graves, 2005).
2. Results
2.1. The house fly X and Y chromosomes do not have unique sequences
Our first goal was to identify house fly X chromosome sequences not found on the Y, which would be consistent with the hypothesis that house flies have an ancient, differentiated sex chromosome pair. Males of the house fly genomic reference strain (aabys) have been previously characterized as possessing the XY karyotype (Wagoner, 1967; Tomita and Wada, 1989; Scott et al., 2014). To identify X-linked genes and examine differentiation between X and Y chromosomes, we used Illumina sequencing of genomic DNA (gDNA) prepared separately from male (XY) and female (XX) aabys flies (3 replicates of each sex), and we aligned the reads to the annotated genome (read counts are available in Supplemental Data S1). If house fly males have a Y chromosome that is fully differentiated from the X, we expect females to have twice the sequencing coverage of males within genes on Muller element F (the ancestral X chromosome), corresponding to a log2 male:female (M:F) coverage ratio of −1 (Vicoso and Bachtrog, 2013). We instead find that the average sequencing coverage in males and females is almost identical for genes on all six chromosomes (Fig 1).
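A worked version of this expectation, assuming that read coverage after library-size normalization scales linearly with copy number. Here c is the per-copy coverage and n_M and n_F are the copy numbers of a sequence in male and female genomes; these symbols are introduced only for illustration.

```latex
\log_2\!\left(\frac{M}{F}\right)
  = \log_2\!\left(\frac{n_M\,c}{n_F\,c}\right)
  = \begin{cases}
      \log_2(1/2) = -1 & \text{X-linked and absent from the Y } (n_M = 1,\ n_F = 2)\\[4pt]
      \log_2(2/2) = 0  & \text{autosomal, or shared by the X and Y } (n_M = n_F = 2)
    \end{cases}
```

Element F genes with log2(M:F) near 0, as in Fig 1, therefore fit the second case rather than the first.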
To determine whether the lack of X-Y differentiation is common to other XY strains of the house fly, we sought to identify X-linked genes in two additional strains previously reported to have XY males: A3 and LPR (Scott and Georghiou, 1985; Scott et al., 1996; Liu and Yue, 2001). We sequenced gDNA from males and females of the A3 and LPR strains, and we aligned those reads to the reference genome (read counts are available in Supplemental Data S2 & S3). Consistent with the results from aabys, both the A3 and LPR strains had equal sequencing coverage within genes across all six chromosomes in males and females (Fig 1). In addition, log2(M:F) coverage within genes is positively correlated (0.2586 < r < 0.3049) across the three strains, particularly for extreme values (Supplemental Fig S1), suggesting similar sequence content across strains and a minimal effect of experimental error. Together, these results suggest that there are no genes found on the house fly X chromosome that are not also present on the Y chromosome.
To ensure that our results are not an artifact of incomplete annotation of house fly X-linked genes, we calculated log2 male:female coverage (log2(M:F)) across non-overlapping 1 kb intervals in the reference genome. The distribution of log2(M:F) across autosomes is expected to be centered at zero. If males have a single copy of the X chromosome, we should observe a second peak at log2(M:F) = −1, indicating a 2-fold enrichment of X-linked sequences in females. We do indeed observe that the distributions of log2(M:F) are centered near zero for all three house fly strains in our analysis, but there is no obvious secondary peak at log2(M:F) = −1 in any of the distributions (Fig 2). To test for a secondary peak at log2(M:F) = −1, we fit a mixture of two normal distributions to our data using an expectation-maximization algorithm with starting values of −1 and 0 for the means of the two distributions (Benaglia et al., 2009). Most of the 1 kb intervals (93–99%) are assigned to a distribution centered near zero, and the remaining intervals are assigned to a secondary distribution with a mean <0 (Fig 2). However, each of those secondary distributions has a mean > −1, suggesting that few sequences are present on the X chromosome at twice their abundance on the Y. Additionally, log2(M:F) for the 1 kb intervals is positively correlated (0.1661 < r < 0.2199) across the three strains, particularly for extreme values (Supplemental Fig S2).
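The mixture-model test can be sketched as a minimal expectation-maximization (EM) fit of two normal components. This is a stand-alone Python illustration, not the R implementation cited above (Benaglia et al., 2009). Only the starting means of −1 and 0 come from the text; the starting standard deviations (0.3), equal starting weights, convergence tolerance, function names, and the simulated log2(M:F) values are assumptions chosen for the example.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def e_step(data, mu, sigma, weight):
    """Posterior membership probabilities for each point, plus the log-likelihood."""
    resp, loglik = [], 0.0
    for x in data:
        dens = [weight[k] * normal_pdf(x, mu[k], sigma[k]) for k in range(2)]
        total = max(sum(dens), 1e-300)  # guard against log(0) for extreme points
        loglik += math.log(total)
        resp.append([d / total for d in dens])
    return resp, loglik


def em_two_normals(data, mu=(-1.0, 0.0), sigma=(0.3, 0.3), weight=(0.5, 0.5),
                   max_iter=500, tol=1e-8):
    """Fit a two-component normal mixture by EM; returns (mu, sigma, weight, resp)."""
    mu, sigma, weight = list(mu), list(sigma), list(weight)
    prev = -math.inf
    for _ in range(max_iter):
        resp, loglik = e_step(data, mu, sigma, weight)
        if loglik - prev < tol:  # log-likelihood has stopped improving
            break
        prev = loglik
        for k in range(2):  # M-step: responsibility-weighted parameter updates
            nk = sum(r[k] for r in resp)
            if nk == 0:
                continue  # empty component: keep its previous parameters
            weight[k] = nk / len(data)
            mu[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var = sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, data)) / nk
            sigma[k] = math.sqrt(max(var, 1e-12))
    resp, _ = e_step(data, mu, sigma, weight)  # memberships under the final parameters
    return mu, sigma, weight, resp


if __name__ == "__main__":
    # Simulated log2(M:F) values: mostly shared sequence plus a few X-only windows.
    rng = random.Random(1)
    values = ([rng.gauss(0.0, 0.2) for _ in range(950)]
              + [rng.gauss(-1.0, 0.2) for _ in range(50)])
    mu, sigma, weight, resp = em_two_normals(values)
    near_zero = sum(r[1] > r[0] for r in resp) / len(resp)
    print(f"means={mu}, weights={weight}, assigned to zero component={near_zero:.3f}")
```

Each interval is assigned to the component with the higher posterior probability. In this simulated case the secondary mean lands near −1; the observed data instead gave secondary means above −1.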
We next sought to identify Y-linked sequences that are absent from the X chromosome (i.e., the reciprocal of the analyses described above). To this end, we first used the male sequencing reads from the aabys strain to assemble a genome that contains a Y chromosome using SOAPdenovo2 (Luo et al., 2012). It was necessary to assemble a male genome because the genome project sequenced gDNA from female flies (Scott et al., 2014). Then we used a k-mer comparison approach to identify male-specific sequences by searching for male genomic scaffolds that are not matched by female sequencing reads (Carvalho and Clark, 2013). Most of the scaffolds in the male genome assembly were (nearly) completely matched by female sequencing reads, and none of the male scaffolds was completely unmatched by female sequencing reads (Fig 3). We obtain similar results when we use a male genome assembled with ABySS (Simpson et al., 2009) or if we assemble the male genome with SOAPdenovo2 using only male reads that do not align to the female assembly (Supplemental Fig S3). In contrast, when this approach was used to identify Y-linked scaffolds in species with differentiated sex chromosomes (Drosophila and humans), a substantial number of Y-linked scaffolds were completely unmatched by female sequencing reads (Carvalho and Clark, 2013). These results suggest that there are no large segments of the house fly Y chromosome that are distinct from the X chromosome or the rest of the genome.
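The scaffold-matching step can be sketched as follows. This is a simplified stand-in for the k-mer comparison of Carvalho and Clark (2013), not their published implementation: it scores each male scaffold by the percentage of its canonical (strand-independent) k-mers that never occur in female reads, and it omits the additional k-mer filtering a production pipeline would apply. The k-mer length, the `min_count` threshold, and all function names are assumptions.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")
VALID = frozenset("ACGT")


def canonical_kmers(seq, k):
    """Yield canonical k-mers (lexicographic minimum of k-mer and reverse complement)."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= VALID:  # skip k-mers containing N or other ambiguity codes
            rc = kmer.translate(COMPLEMENT)[::-1]
            yield min(kmer, rc)


def female_kmer_set(reads, k, min_count=1):
    """K-mers observed at least min_count times in the female reads."""
    counts = Counter()
    for read in reads:
        counts.update(canonical_kmers(read, k))
    return {kmer for kmer, n in counts.items() if n >= min_count}


def percent_unmatched(scaffold, female_kmers, k):
    """Percentage of a scaffold's k-mers that never occur in the female reads."""
    kmers = list(canonical_kmers(scaffold, k))
    if not kmers:
        return float("nan")
    missing = sum(1 for kmer in kmers if kmer not in female_kmers)
    return 100.0 * missing / len(kmers)
```

Under this scoring, a Y-specific scaffold would approach 100% unmatched, whereas a scaffold shared with the X or the autosomes would score near 0% (Fig 3).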
We examined the scaffolds from the male house fly genome with a high percentage of sequence unmatched by female reads. We performed blastx searches of the 50 scaffolds with the highest percent unmatched sequence (79.7–96.6%) against the NCBI non-redundant protein database (Altschul et al., 1997). Only 6/50 scaffolds had hits to annotated house fly genes, whereas 27 had hits to transposable element (TE) sequences, 3 hit other sequences from other species, and 14 had no hits in the database (Supplemental Data S4 & S5). The scaffold with the highest percent unmatched sequence (96.6%) is 1,143 nucleotides long and contains a 219 basepair segment that matches an annotated house fly gene on chromosome 1 that is homologous to a Drosophila melanogaster gene with a predicted membrane-associated GRAM domain (CG34392). This scaffold does not have any blastn or blastx hits to other sequences in the database. In addition, there are 22 scaffolds that are both >5 kb and >50% unmatched by female reads (Supplemental Data S4 & S6). We also performed blastx searches of those scaffolds against the NCBI database, and we found that 15/22 hit a TE, 4 hit an annotated house fly gene, and 3 hit a sequence from another species. None of the annotated house fly genes hit by these 72 scaffolds are predicted to be on element F (the house fly X chromosome). Notably, most of these scaffolds (42/72) have sequence similarity with a TE, suggesting that the Y chromosome may contain unique repetitive sequences or be enriched for particular repeat classes.
2.2. Moderate differences in sequence abundance between house fly males and females
We next examined whether house fly X and Y chromosomes exhibit differential representation of shared sequences, as might be expected from expansion or contraction of satellite repeats or other repetitive elements. We first used a principal components (PC) analysis to compare read mapping coverage of the male and female sequencing libraries across non-overlapping 1 kb intervals in the reference (XX female) genome. The first PC (PC1) explains 81.5–91.1% of the variance in coverage across libraries in the three strains, and PC1 clearly separates the male and female sequencing libraries in all three strains (Fig 4). Therefore, house fly males and females, and by association X and Y chromosomes, exhibit systematic differences in the abundance of some sequences.
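The PC analysis can be sketched without external libraries by centering each window across libraries and extracting the leading eigenvector of the library-by-library Gram matrix with power iteration. Whether coverage was normalized or log-transformed before the analysis is not stated here, so the input is assumed to be already-normalized coverage; the function name, iteration count, seed, and example matrix are illustrative choices.

```python
import math
import random


def pc1_scores(matrix, n_iter=1000, seed=0):
    """PC1 score of each row (library) and the fraction of variance PC1 explains.

    matrix: rows are sequencing libraries, columns are genomic windows.
    """
    n, p = len(matrix), len(matrix[0])
    # Center every window (column) on its mean across libraries.
    means = [sum(row[j] for row in matrix) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in matrix]
    # Library-by-library Gram matrix X X^T: same non-zero eigenvalues as X^T X.
    gram = [[sum(a * b for a, b in zip(centered[i], centered[k])) for k in range(n)]
            for i in range(n)]
    total = sum(gram[i][i] for i in range(n))  # total sum of squares
    if total == 0:
        return [0.0] * n, 0.0
    # Power iteration for the leading eigenvector u of the Gram matrix.
    rng = random.Random(seed)
    u = [rng.random() for _ in range(n)]
    for _ in range(n_iter):
        v = [sum(gram[i][k] * u[k] for k in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        if norm == 0:
            break
        u = [x / norm for x in v]
    lam = sum(u[i] * gram[i][k] * u[k] for i in range(n) for k in range(n))
    # The PC1 score of library i equals sqrt(lambda) * u_i.
    return [math.sqrt(max(lam, 0.0)) * x for x in u], lam / total


if __name__ == "__main__":
    # Hypothetical coverage: two male-like and two female-like libraries, three windows.
    libs = [[10.0, 2.0, 5.0], [10.2, 2.1, 5.0], [2.0, 10.0, 5.0], [2.1, 9.9, 5.0]]
    print(pc1_scores(libs))
```

The sign of a principal component is arbitrary, so separation is judged by whether male and female libraries fall on opposite sides of zero, not by which side each sex occupies.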
We applied two different approaches to characterize sequences enriched on the X and Y chromosomes (i.e., differentially abundant in female and male genomes). First, we searched for 1 kb windows with significantly different coverage between males and females (false discovery rate corrected P < 0.05 and |log2(M:F)| > 1). We identified 214 of these "sex-biased" windows: 63 are >2-fold enriched in females, and 151 are >2-fold enriched in males (Supplemental Data S7). The X and Y chromosomes of house fly are largely heterochromatic (Boyes et al., 1964; Hediger et al., 1998b), and differences in the abundances of particular repetitive DNA sequences (e.g., TEs and other interspersed repeats) between the X and Y chromosomes could be responsible for the differences in read coverage between females and males. Sequences from repetitive heterochromatic regions of the genome are less likely to be mapped to a genomic location (Smith et al., 2007), and we therefore expect sex-biased windows to be located on scaffolds that are not mapped to a house fly chromosome. Only 2/63 (3.2%) female-enriched windows are within a scaffold that we were able to map to a chromosome (neither was mapped to element F, the ancestral X chromosome). Similarly, only 59/151 (39.1%) male-enriched windows are within a scaffold that maps to a Muller element (only one of those scaffolds maps to element F). In contrast, 65.7% of 1 kb windows that are not differentially covered between males and females are on scaffolds that we are able to map to Muller elements (2033/3096 windows with P > 0.05 and |log2(M:F)| < 0.01). These unbiased windows are more likely to be mapped to a Muller element than the sex-biased windows (P < 10⁻¹⁵ in Fisher's exact test), providing some evidence that differential coverage between males and females could be driven by repeat content differences between the X and Y chromosomes.
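The mapping comparison can be reproduced with a two-sided Fisher's exact test on a 2×2 table of window counts. The sketch assumes that the female- and male-biased windows were pooled (2 + 59 = 61 mapped out of 214 sex-biased windows) and compared with the 2033 mapped out of 3096 unbiased windows; that pooling is our reading of the text rather than a stated detail. The function sums the hypergeometric probabilities of every table with the observed margins that is no more probable than the observed table.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, row1)

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x, margins fixed.
        return comb(col1, x) * comb(n - col1, row1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    # Small relative tolerance so tables tied with the observed one are included.
    return sum(p for p in (prob(x) for x in range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-7))


if __name__ == "__main__":
    # Rows: sex-biased vs unbiased windows; columns: mapped vs unmapped scaffolds.
    print(fisher_exact_two_sided(61, 214 - 61, 2033, 3096 - 2033))
    # Repeat-masking counts from the next paragraph: female-biased vs unbiased windows.
    print(fisher_exact_two_sided(63, 0, 3071, 3096 - 3071))
```

The second call uses the repeat-masking counts reported in the following paragraph (63/63 female-biased and 3071/3096 unbiased windows masked), for which the observed table is the most probable one and the P-value is 1.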
We next tested for an enrichment of annotated repeats within the female- and male-biased 1 kb windows, and we found that all 63 of the female-biased windows and most of the male-biased windows (149/151) contain sequences masked as repetitive during the house fly genome annotation (Supplemental Data). However, 3071/3096 (>99%) of the 1 kb windows that are not differentially covered between males and females also contain repeat-masked sequences; this fraction is not significantly different from the fraction of repeat-masked sex-biased windows (P = 1 for female-biased and P = 0.6 for male-biased windows using Fisher's exact test). In addition, the proportion of sites within male-biased and female-biased windows that are repeat masked is lower than that of unbiased windows, suggesting that the sex-biased windows are actually depauperate in annotated repeats (Supplemental Fig S4). However, these analyses are limited because a large fraction (≥ 52%) of the house fly genome is composed of interspersed repeats that are poorly annotated (Scott et al., 2014). Future improvements to repeat annotation in the house fly genome may therefore shed light on the nature of repetitive sequences that differentiate the X and Y chromosomes.
As a second approach to identify candidate X- or Y-enriched sequences, we determined the abundances of all possible 2- to 10-mers in the male and female aabys sequencing reads. This approach can identify shorter sequence motifs than the window-based analysis described above, and it does not require any a priori repeat annotations. The 100 most common k-mers are found at similar frequencies in males and females (Fig 5), with abundances highly correlated between the sexes (r = 0.999). We considered a k-mer to be over-represented in one sex if its minimum abundance across the three replicate libraries for that sex is greater than its maximum abundance in the other sex. Six k-mers are over-represented in males using this cutoff, but all of them are less than 2-fold enriched in males (Fig 5 & Supplemental Fig S5). These results suggest that short sequence repeats do not predominantly differentiate the X and Y chromosomes.
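The over-representation rule can be expressed compactly. The sketch assumes that "abundance" means a k-mer's count divided by the total number of k-mers of the same length in that library, which is one reasonable normalization but not one the text specifies; the function names and toy reads are hypothetical.

```python
from collections import Counter


def kmer_frequencies(reads, k_min=2, k_max=10):
    """Relative frequency of every k-mer (k_min..k_max) in one library, normalized per k."""
    counts, totals = Counter(), Counter()
    for read in reads:
        read = read.upper()
        for k in range(k_min, k_max + 1):
            for i in range(len(read) - k + 1):
                counts[read[i:i + k]] += 1
                totals[k] += 1
    return {kmer: n / totals[len(kmer)] for kmer, n in counts.items()}


def over_represented(libs_a, libs_b):
    """K-mers whose lowest frequency across group a exceeds the highest across group b.

    libs_a, libs_b: one frequency dict per replicate library.
    Returns {kmer: mean(a) / mean(b)}, i.e. the fold enrichment in group a.
    """
    hits = {}
    for kmer in set().union(*libs_a, *libs_b):
        a = [lib.get(kmer, 0.0) for lib in libs_a]
        b = [lib.get(kmer, 0.0) for lib in libs_b]
        if min(a) > max(b):
            mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
            hits[kmer] = mean_a / mean_b if mean_b > 0 else float("inf")
    return hits
```

A k-mer passing this replicate-based cutoff can still be only weakly enriched, which is why the fold enrichment is returned alongside it; the six male-enriched k-mers in Fig 5 were all below 2-fold.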
2.3. Relative heterozygosity in males and females suggests that the house fly Y chromosome is very young
Our data suggest that, other than the unidentified M factor, the house fly X and Y chromosomes do not contain any genes not found on the gametologous sex chromosome, and we find little evidence for unique sequences on the X or Y. We therefore hypothesize that the house fly Y chromosome is actually an ancestral brachyceran X chromosome that recently acquired an M factor. While recently derived neo-Y chromosomes may not differ in gene content from the gametologous X chromosome, modest sequence-level X-Y differentiation can result in elevated heterozygosity within sex chromosome genes in males (Vicoso and Bachtrog, 2015). We tested for elevated heterozygosity by first identifying polymorphic sites (SNPs) within genes in aabys males and females. Heterozygosity is elevated in X chromosome (element F) genes relative to autosomes in both males and females (Supplemental Fig S6). However, when we compare the proportion of heterozygous SNPs in males relative to females for genes on each chromosome (Fig 6A), genes on the X chromosome resemble autosomal genes with equivalent heterozygosity in males and females (P = 0.45 in a Mann-Whitney test comparing male:female heterozygosity on element F with the other chromosomes). This result demonstrates that the house fly Y is so young that Y chromosome genes have not yet accumulated modest sequence differences from the X chromosome.
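The chromosome-level comparison can be sketched as a Mann-Whitney U test on per-gene male:female heterozygosity ratios, comparing element F genes with genes on the other chromosomes. This stand-alone version uses the normal approximation with a tie correction and no continuity correction, so its P-values can differ slightly from statistical packages that apply a continuity correction or use exact small-sample distributions; the function names are illustrative.

```python
import math


def average_ranks(values):
    """1-based ranks with ties given their mean position; also returns sum(t^3 - t) over ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    tie_term = 0
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for t in range(i, j + 1):
            ranks[order[t]] = (i + j) / 2 + 1
        size = j - i + 1
        tie_term += size ** 3 - size
        i = j + 1
    return ranks, tie_term


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test; returns (U for sample x, normal-approximation P)."""
    n1, n2 = len(x), len(y)
    ranks, tie_term = average_ranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    n = n1 + n2
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var <= 0:
        return u1, 1.0  # every value tied: no evidence of a shift
    z = (u1 - n1 * n2 / 2) / math.sqrt(var)
    return u1, math.erfc(abs(z) / math.sqrt(2))
```

In this setting, x would hold the male:female heterozygosity ratios of element F genes and y the ratios of genes on the autosomes; a large P-value, as observed for aabys, indicates no shift in element F relative to the autosomes.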
Some house fly males carry the M factor on the third chromosome (IIIM), and they have two copies of the X chromosome without an M factor (Hamm et al., 2015). The IIIM chromosome is a recently derived neo-Y not found in close relatives, and we expect that males heterozygous for IIIM (hereafter IIIM males) will have an excess of heterozygous SNPs on the third chromosome. To test this hypothesis, we used available RNA-Seq data (Meisel et al., 2015) to calculate the proportion of heterozygous SNPs in IIIM males relative to XY males (Fig 6B). As predicted, there is an excess of heterozygous SNPs on the third chromosome in IIIM males relative to XY males (P = 10⁻¹²² in a Mann-Whitney test comparing chromosome III with the other autosomes). IIIM males also have an elevated number of strain-specific SNPs on the third chromosome (Supplemental Fig S7). Surprisingly, there is also increased heterozygosity on the X chromosome in IIIM males relative to XY males (P = 10⁻⁴) even though IIIM males have the XX genotype. We also observe elevated heterozygosity on the third chromosome and X chromosome in IIIM males relative to females of the same strain, but not on the X chromosome in XY males relative to XX females (Supplemental Fig S8). These results further support our conclusion that house fly Y chromosome genes are not differentiated from X chromosome genes. In contrast, the IIIM chromosome harbors evidence that it is partially differentiated from the non-M-bearing third chromosome.
3. Discussion
We found very little evidence for differentiation between the X and Y chromosomes in house fly, despite the fact that the house fly has a karyotype that resembles the ~100 my old ancestral brachyceran karyotype (Boyes et al., 1964; Foster et al., 1981; Weller and Foster, 1993; Vicoso and Bachtrog, 2013). There are few sequences unique to the X or Y (Figs 1, 2, & 3), little evidence for differential abundance of specific sequences on the X and Y (Fig 5, but see Fig 4), and no elevated heterozygosity within X chromosome genes in males (Fig 6). We conclude that the house fly X and Y chromosomes do not contain different genes (other than M), which is consistent with previous experiments that failed to identify sex-linked markers and found that XX, XO, and YO flies are viable and fertile (Bull, 1983; Hediger et al., 1998a; Hamm et al., 2015). Additionally, XY males have equal or greater expression of X chromosome genes when compared to XX (IIIM) males (Meisel et al., 2015), providing further evidence that XY males do not have a haploid X chromosome dose. In summary, our results suggest that the house fly Y chromosome is an ancestral brachyceran X chromosome that very recently acquired an M factor.
3.1. X-Y differentiation in house fly
We failed to identify the male-determining M factor on the house fly Y chromosome. Previous in situ hybridizations of chromosomal dissections to mitotic chromosomes detected Y-specific, but not X-specific, segments of the house fly genome (Hediger et al., 1998b). The sequences of these chromosomal segments are unknown, but they presumably include the yet-to-be-identified M factor. We hypothesize that our failure to identify M and other Y-specific regions is because they are small relative to the rest of the chromosome and/or difficult to assemble using short sequencing reads.
Despite the lack of genic differentiation between the house fly X and Y chromosomes (other than M), there are morphological differences between the X and Y that can be identified through cytological examination (Boyes et al., 1964). Our results suggest that some of the morphological differences between the X and Y chromosomes result from the differential abundance of particular sequences between X and Y rather than the extensive differentiation that characterizes ancient pairs of sex chromosomes (Fig 4). Similarly, cytological analyses previously characterized separate X chromosomes carrying M (XM) that were thought to be different from YM (Denholm et al., 1983; Cakir and Kence, 1996). Our results suggest that YM and XM chromosomes are morphological variants of the same neo-Y chromosome that arose when an ancestral X chromosome acquired an M factor. Such morphological or repeat content variation in Y chromosomes has been previously documented in Drosophila and other dipterans (Dobzhansky, 1935; Miller and Stone, 1962; Miller and Roy, 1964; Lyckegaard and Clark, 1989; Lemos et al., 2008; Hall et al., 2016).
3.2. Creation of house fly neo-Y chromosomes
We hypothesize that an M factor recently translocated to the house fly X chromosome to create a neo-Y, and the ancestral Y chromosome was lost from house fly populations (Fig 7). Alternatively, the house fly Y chromosome could have arisen through the fusion of the ancestral Y and X chromosomes. However, after an X-Y fusion, the neo-Y should retain ancestral Y-specific sequences, which we fail to detect. The IIIM chromosome is also a neo-Y that likely arose when an M factor translocated onto a standard third chromosome (Fig 7). Curiously, IIIM males have elevated heterozygosity in X chromosome genes relative to XX females and XY males (Figs 6 & S8). One possible cause of this elevated heterozygosity is that some X chromosome genes were translocated onto IIIM along with the M factor. The elevated X chromosome heterozygosity we detect in IIIM males would therefore be the result of those males being triploid for X chromosome genes. No nullo-X/Y flies carrying IIIM have been identified to our knowledge, suggesting that the X/Y chromosome contains some essential genes not translocated to IIIM. Additional work is necessary to determine the nature of the translocations that created the Y and IIIM chromosomes.
Our results are also suggestive of the order of events that created the Y and IIIM chromosomes. IIIM males have elevated heterozygosity on the third chromosome, whereas XY males do not have elevated heterozygosity on element F (Fig 6). This suggests that the Y chromosome is a younger neo-Y than IIIM, but there is an alternative explanation for the patterns of polymorphism. Differentiation between nascent X and Y chromosomes is accelerated by suppressed recombination in XY males (Charlesworth, 1991; Rice, 1996; Charlesworth et al., 2005; Bachtrog, 2013). Lack of recombination can be an inherent property of male meiosis (as in Drosophila) or arise via Y chromosome inversions that suppress crossing over between the X and Y. There is evidence for male recombination in house flies (Feldmeyer et al., 2010), suggesting that chromosomal inversions would be required for recombination suppression between the X and Y. If the IIIM chromosome carries inversions and the Y chromosome does not, then the elevated heterozygosity in IIIM males but not XY males could be a result of inversions accelerating the rate of divergence between the IIIM chromosome and the standard third chromosome (Navarro et al., 2000). Additional sequencing of XY and IIIM males is needed to test for inversions.
3.3. Translocating sex-determining loci can cause sex chromosome recycling and create cryptic neo-sex chromosomes
Our results provide the first evidence, to our knowledge, of the conversion of an existing X chromosome into a Y chromosome (or Z into W), recycling a differentiated sex chromosome pair into nascent sex chromosomes without any evidence of fusion to an autosome. In comparison, most previously documented sex chromosome transitions involved autosomes transforming into sex chromosomes through either the evolution of a novel sex-determining locus on the autosome or a fusion of the autosome with a sex chromosome (e.g., Patterson and Stone, 1952; Steinemann and Steinemann, 1998; Filatov et al., 2000; Liu et al., 2004; Veyrunes et al., 2004; Carvalho and Clark, 2005; Vallender and Lahn, 2006; Ross et al., 2009; Vicoso and Bachtrog, 2013; Bachtrog et al., 2014; Beukeboom and Perrin, 2014; Vicoso and Bachtrog, 2015). There are other examples of sex chromosome transformations involving only X, Y, Z, and W chromosomes (i.e., no autosomes) in platyfish, Rana rugosa, and Xenopus tropicalis (Kallman, 1984; Miura, 2007; Roco et al., 2015). These X/Y/Z/W transformations in fish and frogs, however, involve nascent sex chromosomes, not ancient sex chromosomes as in house fly. Moreover, the sex chromosome transitions in platyfish, R. rugosa, and X. tropicalis all involve a change in the heterogametic sex (i.e., XY males to ZW females, or vice versa), whereas the house fly X and Y chromosomes did not switch to a Z and W.
We hypothesize that the X-to-Y conversion in house fly occurred because the male-determining locus translocated from the Y to the X. Translocating sex-determining loci are rare and do not typically include long-established sex chromosomes (Traut and Willhoeft, 1990; Woram et al., 2003; Faber-Hammond et al., 2015), suggesting that X-to-Y (or Z-to-W) conversion similar to house fly may not be observed in other taxa. However, there is rampant gene traffic to and from long-established Y chromosomes (Koerich et al., 2008; Hughes et al., 2015), providing a possible mechanism for the Y-to-X (or W-to-Z) translocation of a sex-determining locus in other taxa even if the sex determiner does not exhibit a high rate of translocation on its own. The fact that the neo-Y chromosome in house fly remained undetected despite decades of work on this system (Dübendorfer et al., 2002) suggests that X-to-Y transitions may have occurred in other taxa and remain cryptic because the karyotype has remained unchanged.
4. Methods
4.1. Fly strains
We used five house fly strains to identify X- and Y-linked sequences. One strain, Cornell susceptible (CS), has been reported to have X/X; IIIM/III males (Scott et al., 1996; Hamm et al., 2005; Meisel et al., 2015). The other four strains have previously been characterized as having males with the XY karyotype: aabys, A3, LPR, and CSaY. The genome strain, aabys, has recessive phenotypic markers on each of the five autosomes (chromosomes I–V) and had been cytologically determined to have XY males (Wagoner, 1967; Tomita and Wada, 1989; Scott et al., 2014). The A3 strain was generated by crossing XY males from a pyrethroid-resistant strain (ALHF) with aabys females (Liu and Yue, 2001). The LPR strain is a pyrethroid-resistant strain that was previously determined to have XY males (Scott and Georghiou, 1985; Scott et al., 1996). Finally, the CSaY strain was created by crossing aabys males (XY) with CS females, and then backcrossing the male progeny to CS females to create a strain with the aabys Y chromosome on the CS background (Meisel et al., 2015). We validated that the M factor is not on an autosome in the A3, LPR, and CSaY strains by crossing males of each strain to aabys females, and then we backcrossed the male progeny to aabys females. We did not observe sex-linked inheritance of any of the aabys phenotypic markers, confirming that the M factor is not on chromosomes I–V in A3, LPR, or CSaY. Females of all strains were expected to be XX.
4.2. Genome sequencing, mapping, and assembly
The house fly genome consortium sequenced, assembled, and annotated the genome using DNA from female flies of the aabys strain, a line with XX females and XY males (Scott et al., 2014). The annotation includes both predicted genes and inferred homology relationships with D. melanogaster genes, and we used the orthology calls from annotation release 100 (version 2.0.2) to assign house fly genomic scaffolds to chromosome arms using a majority rule as described previously (Meisel et al., 2015). Briefly, scaffolds were assigned to a Muller element if the majority of genes on the scaffold with 1:1 D. melanogaster orthologs have orthologs on the same D. melanogaster element. In total, 62 house fly genes have 1:1 D. melanogaster orthologs on Muller element F, which amounts to 3/4 of the ~80 genes on Drosophila element F (Leung et al., 2010). We used these 1:1 orthologs to assign seven house fly scaffolds to element F (the X chromosome), and those seven scaffolds contain 51 genes. We repeated all of the analyses described in the Results using only genes with 1:1 D. melanogaster orthologs and obtained qualitatively similar results as when we used scaffold-level Muller element assignments.
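The majority rule can be written as a short function. We read "majority" as strictly more than half of the scaffold's genes that have 1:1 D. melanogaster orthologs, with ties left unassigned; the function name and input layout are assumptions.

```python
from collections import Counter


def assign_scaffold(ortholog_elements):
    """Assign a scaffold to a Muller element by majority rule.

    ortholog_elements: for each gene on the scaffold, the Muller element ('A'-'F')
    of its 1:1 D. melanogaster ortholog, or None if the gene has no 1:1 ortholog.
    Returns the element, or None when no element holds a strict majority.
    """
    elements = [e for e in ortholog_elements if e is not None]
    if not elements:
        return None
    element, count = Counter(elements).most_common(1)[0]
    return element if count > len(elements) / 2 else None
```

For example, a scaffold whose genes map to ['F', 'F', 'A', None] would be assigned to element F, because two of its three genes with 1:1 orthologs point to F.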
We sequenced genomic DNA (gDNA) from aabys male and female heads with 150 bp paired-end reads on an Illumina NextSeq 500 at the University of Houston genome sequencing core. Three replicate libraries of each sex were prepared using the Illumina TruSeq DNA PCR-free kit, and the six libraries were pooled and sequenced in a single NextSeq run. We also sequenced gDNA from three replicates of male and female heads from each of the A3 and LPR strains (12 samples total) in a single NextSeq 500 run using 75 bp paired-end reads. For each of the 18 sequencing libraries, DNA was extracted from a separate pool of fly heads using the QIAGEN DNeasy Blood & Tissue Kit. Illumina sequencing reads were mapped to the assembled house fly genome using BWA-MEM with default parameters (Li and Durbin, 2009; Li, 2013), and we retained only uniquely mapping reads for which both ends of a sequenced fragment mapped to the same scaffold in the reference genome. Reads that failed these criteria were treated as unmapped when constructing the male genome assembly described below. Mapping statistics are presented in Supplemental Table S1.
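The read-pair filter can be illustrated on SAM-formatted alignments using only the standard library. This is a sketch under stated assumptions, not the filter actually applied: BWA-MEM reports a mapping quality (MAPQ) of 0 for reads that align equally well to more than one location, so MAPQ ≥ 1 is used here as an assumed stand-in for "uniquely mapping" (the threshold actually used is not stated), and fragments with secondary or supplementary alignments are discarded. The names `record_passes` and `filter_fragments` are hypothetical.

```python
from collections import defaultdict


def record_passes(fields, min_mapq):
    """Check one primary SAM alignment (already split on tabs) against the filters."""
    flag = int(fields[1])
    if not flag & 0x1:                # not a paired read
        return False
    if flag & (0x4 | 0x8):            # read or mate unmapped
        return False
    if int(fields[4]) < min_mapq:     # MAPQ used as an assumed proxy for uniqueness
        return False
    # RNEXT (field 7) is "=" when the mate is on the same reference sequence.
    return fields[6] in ("=", fields[2])


def filter_fragments(sam_lines, min_mapq=1):
    """Return sorted names of fragments whose two primary alignments both pass."""
    primary = defaultdict(list)
    disqualified = set()
    for line in sam_lines:
        if line.startswith("@"):      # header line
            continue
        fields = line.rstrip("\n").split("\t")
        if int(fields[1]) & (0x100 | 0x800):
            # Secondary or supplementary (split) alignment: treat the whole
            # fragment as not uniquely mapped (an assumption of this sketch).
            disqualified.add(fields[0])
            continue
        primary[fields[0]].append(fields)
    return sorted(
        name for name, records in primary.items()
        if name not in disqualified
        and len(records) == 2
        and all(record_passes(r, min_mapq) for r in records)
    )
```

Fragments not returned by `filter_fragments` correspond to the reads that would be treated as unmapped in the male-only assembly.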
We additionally assembled the reads from the aabys male samples using SOAPdenovo2 (Luo et al., 2012) and ABySS (Simpson et al., 2009) to construct a reference genome that contains Y-linked sequences. Mapping our sequence data to the reference genome revealed an average insert size of 370 bp (Supplemental Fig S9), which we used as a parameter in the SOAPdenovo2 assembly, along with a pair number cutoff of 3 and a minimum alignment length of 32 bp. For the ABySS assembly, we used a k-mer pair span (k) of 64. We also used SOAPdenovo2 to assemble a genome from only those male reads that did not align to the female-derived reference assembly. For downstream analyses, we retained only scaffolds ≥1000 bp in each assembly. Assembly statistics are presented in Supplemental Table S2.
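The scaffold length cutoff applied to each assembly is straightforward; the sketch below shows one way to apply it to FASTA output with the standard library. It is illustrative only: it assumes each scaffold's name is the first word of its FASTA header, and the function names are hypothetical.

```python
def read_fasta(text):
    """Parse FASTA text into {name: sequence}; the name is the first word of the header."""
    records = {}
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)  # sequence may be wrapped over several lines
    if name is not None:
        records[name] = "".join(chunks)
    return records


def filter_scaffolds(records, min_length=1000):
    """Keep only scaffolds at least min_length bp long (1,000 bp in the text)."""
    return {name: seq for name, seq in records.items() if len(seq) >= min_length}


fasta = (">scf1 cov=10\n" + "A" * 1000 + "\n>scf2\n" + "C" * 999 +
         "\n>scf3\n" + "G" * 600 + "\n" + "T" * 600 + "\n")
kept = filter_scaffolds(read_fasta(fasta))
print(sorted(kept))  # ['scf1', 'scf3']
```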
4.3. Identifying X- and Y-linked sequences
We used four approaches to identify candidate X- and Y-linked sequences in the house fly genome. The first approach identifies X-linked genes or sequences by testing for 2-fold higher abundance in females relative to males, as expected for X-linked sequences without a Y-linked homolog, which are present in two copies in females and one copy in males (Vicoso and Bachtrog, 2013). To do this, we used DESeq2 to calculate the log2 relative coverage between males and females within individual genes and 1 kb windows, using the three male-derived and three female-derived libraries (Love et al., 2014). We also used DESeq2 to calculate P-values for differential coverage between females and males.
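The expectation behind this approach can be illustrated with a simplified calculation. The sketch does not reproduce DESeq2, which estimates library size factors with its median-of-ratios method and fits a negative binomial model. Instead, it assumes simple normalization by total mapped reads, averages the three replicates of each sex, and reports log2(male/female) per window. Under those assumptions, windows on an X chromosome with no Y-linked homolog should give values near −1 and autosomal windows values near 0. Function and window names are hypothetical.

```python
import math


def mean_normalized(count_tables, totals, window, pseudocount):
    """Average of (count + pseudocount) / total reads across replicate libraries."""
    values = [(table.get(window, 0) + pseudocount) / total
              for table, total in zip(count_tables, totals)]
    return sum(values) / len(values)


def log2_male_female(male_tables, male_totals, female_tables, female_totals,
                     pseudocount=0.5):
    """log2(male/female) normalized read depth for every window seen in any library."""
    windows = set()
    for table in male_tables + female_tables:
        windows.update(table)
    ratios = {}
    for window in sorted(windows):
        male = mean_normalized(male_tables, male_totals, window, pseudocount)
        female = mean_normalized(female_tables, female_totals, window, pseudocount)
        ratios[window] = math.log2(male / female)
    return ratios


# Hypothetical counts: one autosomal window and one X-linked window, three replicates.
male = [{"auto_1": 100, "X_1": 50}] * 3
female = [{"auto_1": 100, "X_1": 100}] * 3
totals = [1_000_000] * 3
ratios = log2_male_female(male, totals, female, totals, pseudocount=0)
print(ratios)  # X_1 close to -1.0, auto_1 close to 0.0
```

The pseudocount keeps windows with zero coverage in one sex from producing a division by zero; it is set to 0 in the example only so the expected values are easy to read.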
The second approach identifies Y-linked sequences as scaffolds in the male genome assembly that are absent from the female sequencing reads. We only considered scaffolds from the male genome assembly that were ≥1 kb. We implemented a k-mer comparison approach to identify male-specific sequences (Carvalho and Clark, 2013). In our implementation, we used a k-mer size of 15 bp, constructed a validating bit-array from the male sequencing reads, and applied the options described by Carvalho and Clark (2013) for identifying Y-linked sequences in Drosophila genomes (Supplemental Methods S1 & S2).
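The logic of the k-mer comparison can be sketched in a few lines. This is a simplified illustration loosely modeled on the approach of Carvalho and Clark (2013), not their software: it uses Python sets in place of a bit-array, counts canonical k-mers (the lexicographically smaller of a k-mer and its reverse complement, so that reads from either strand match), and omits other refinements of the original method. For each scaffold, it keeps only k-mers validated by the male reads, discarding those likely to stem from assembly errors, and reports the percentage of validated k-mers that never occur in the female reads; Y-linked scaffolds should approach 100%. All names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
NUCLEOTIDES = set("ACGT")


def canonical_kmers(seq, k):
    """Yield canonical k-mers (smaller of k-mer and its reverse complement)."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= NUCLEOTIDES:  # skip k-mers containing N or other symbols
            rc = kmer.translate(COMPLEMENT)[::-1]
            yield min(kmer, rc)


def kmer_set(reads, k):
    """Collect every canonical k-mer found in a collection of reads."""
    kmers = set()
    for read in reads:
        kmers.update(canonical_kmers(read, k))
    return kmers


def percent_unmatched(scaffold, female_kmers, male_kmers, k):
    """Percent of male-validated scaffold k-mers absent from the female reads.

    Returns None when no scaffold k-mer is supported by the male reads.
    """
    valid = {km for km in canonical_kmers(scaffold, k) if km in male_kmers}
    if not valid:
        return None
    unmatched = sum(1 for km in valid if km not in female_kmers)
    return 100.0 * unmatched / len(valid)


K = 15  # k-mer size used in the text
female_reads = ["ACGT" * 10]
male_reads = ["ACGT" * 10, "AAAAACCCCC" * 4]
female_kmers, male_kmers = kmer_set(female_reads, K), kmer_set(male_reads, K)
print(percent_unmatched("AAAAACCCCC" * 4, female_kmers, male_kmers, K))  # 100.0
print(percent_unmatched("ACGT" * 10, female_kmers, male_kmers, K))       # 0.0
```

Real read sets are far too large for in-memory Python sets, which is one reason the published method uses compact bit-arrays instead.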
In the third approach, we analyzed gDNA sequencing reads from aabys males and females to identify k-mers with sexually dimorphic abundances. We used the k-Seek method to count the abundance of 2- to 10-mers in the three male and three female aabys sequencing libraries (Wei et al., 2014). We normalized each k-mer count by multiplying it by the length of the k-mer and dividing by the number of reads in the library.
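The normalization step is a single formula: for a k-mer of length k with raw count c in a library of N reads, the normalized abundance is c·k/N, which scales each count to the number of repeat bases per sequenced read. The sketch applies that formula and, as an illustrative assumption beyond what is described here, summarizes sexual dimorphism as the log2 ratio of the mean normalized abundance in males to that in females. The counts are made up and the function names are hypothetical.

```python
import math


def normalize_kmer_counts(counts, n_reads):
    """Convert raw k-mer counts to (count * k-mer length) / reads in the library."""
    return {kmer: count * len(kmer) / n_reads for kmer, count in counts.items()}


def log2_sex_ratio(kmer, male_libraries, female_libraries):
    """log2 of mean normalized abundance in males over mean in females (assumed summary)."""
    male = sum(lib.get(kmer, 0.0) for lib in male_libraries) / len(male_libraries)
    female = sum(lib.get(kmer, 0.0) for lib in female_libraries) / len(female_libraries)
    return math.log2(male / female)


# Hypothetical counts for three male and three female libraries of 1,000,000 reads each.
male = [normalize_kmer_counts({"AG": 2000, "AAC": 100}, 1_000_000)] * 3
female = [normalize_kmer_counts({"AG": 1000, "AAC": 100}, 1_000_000)] * 3
print(male[0])                                     # {'AG': 0.004, 'AAC': 0.0003}
print(round(log2_sex_ratio("AG", male, female), 6))  # 1.0
```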
The fourth approach identifies nascent sex chromosomes by their elevated heterozygosity in the heterogametic sex (Vicoso and Bachtrog, 2015). We implemented this approach using both gDNA- and mRNA-Seq data. For the gDNA-Seq, we used the Genome Analysis Toolkit (GATK), following the best practices provided by the software developers (McKenna et al., 2010). Starting with the male and female mapped reads from the aabys strain described above, we marked duplicate reads. Regions around insertions and deletions (indels) were identified with RealignerTargetCreator and realigned with IndelRealigner. We then called variants in each of the six aabys sequencing libraries using HaplotypeCaller, separated SNPs and indels with SelectVariants, and used VariantFiltration to remove low-quality variants meeting any of the following criteria (SNPs: QD < 2, MQ < 40, FS > 60, SOR > 4, MQRankSum < −12.5, ReadPosRankSum < −8; indels: QD < 2, ReadPosRankSum < −20, FS > 200, SOR > 10). The remaining high-quality SNPs and indels were then used as known sites to recalibrate base quality scores with BaseRecalibrator and PrintReads. We performed three rounds of variant calling and base recalibration, after which AnalyzeCovariates showed no benefit from additional recalibration. We next used the recalibrated reads from all three replicates of each sex to call variants in males and females using HaplotypeCaller with emission and calling confidence thresholds of 20. We filtered these variants with VariantFiltration using a cluster window size of 35 bp and a cluster size of 3 SNPs, and by removing variants with FS > 20 or QD < 2. We then identified heterozygous SNPs within genes using the gene coordinates from the genome sequencing project (Scott et al., 2014). An example script with our SNP calling pipeline is available in Supplemental Methods S3.
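The hard-filter thresholds listed above can be expressed as simple predicates over a variant's INFO annotations. The sketch is not GATK and is not the script in Supplemental Methods S3. It assumes the INFO field of a VCF record is available as a string, treats an annotation that is absent as passing the corresponding test (an assumption of this sketch), and rejects a variant if it meets any listed criterion. The names are hypothetical.

```python
import operator

# (annotation, comparison, threshold): a variant FAILS if comparison(value, threshold) is True.
SNP_FILTERS = [("QD", operator.lt, 2.0), ("MQ", operator.lt, 40.0),
               ("FS", operator.gt, 60.0), ("SOR", operator.gt, 4.0),
               ("MQRankSum", operator.lt, -12.5), ("ReadPosRankSum", operator.lt, -8.0)]
INDEL_FILTERS = [("QD", operator.lt, 2.0), ("ReadPosRankSum", operator.lt, -20.0),
                 ("FS", operator.gt, 200.0), ("SOR", operator.gt, 10.0)]


def parse_info(info_field):
    """Parse numeric KEY=VALUE entries from a VCF INFO string."""
    values = {}
    for entry in info_field.split(";"):
        if "=" not in entry:        # flag annotations such as DB carry no value
            continue
        key, value = entry.split("=", 1)
        try:
            values[key] = float(value)
        except ValueError:          # e.g. per-allele lists like AC=1,2; ignored here
            pass
    return values


def passes_hard_filters(info_field, filters):
    """True unless a present annotation meets one of the failure criteria."""
    values = parse_info(info_field)
    return not any(key in values and fails(values[key], threshold)
                   for key, fails, threshold in filters)


print(passes_hard_filters("QD=12.3;MQ=59.8;FS=1.5;SOR=0.7", SNP_FILTERS))  # True
print(passes_hard_filters("QD=10.0;FS=70.0", SNP_FILTERS))                 # False
print(passes_hard_filters("QD=10.0;FS=70.0", INDEL_FILTERS))               # True
```

The last two calls show why SNPs and indels are filtered separately: the same FS value fails the SNP cutoff (60) but passes the more permissive indel cutoff (200).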
To call variants from the mRNA-Seq data (accession: GSE67065; Meisel et al., 2015) with the GATK pipeline, we used STAR to align reads from six XY male libraries and six IIIM male libraries separately (Dobin et al., 2013). After a first-pass alignment to the reference genome, we created a new genome index from the splice junctions inferred in that alignment and then performed a second-pass alignment against the new index. We next marked duplicate reads and used SplitNCigarReads with the ReassignOneMappingQuality read filter to reassign a mapping quality of 60 to alignments with a mapping quality of 255 (the value STAR assigns to uniquely mapped reads). Indels were realigned, and three rounds of variant calling and base recalibration were performed as described above for the gDNA-Seq data. We applied GenotypeGVCFs to the variant calls from the two strains for joint genotyping of all samples, and we then used the same filtering parameters as for the gDNA-Seq data to extract high-quality SNPs and indels.
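Counting heterozygous SNPs within annotated genes, so that counts can be compared between groups such as the XY and IIIM males, can be sketched as follows. This is a minimal illustration under assumptions: gene coordinates are 1-based inclusive (scaffold, start, end) tuples, genotypes are VCF-style GT strings, and a linear scan over the genes on each scaffold is used instead of an interval index. Gene names, coordinates, and calls are hypothetical.

```python
from collections import defaultdict


def is_heterozygous(genotype):
    """True for called genotypes with two or more distinct alleles, e.g. 0/1 or 1|2."""
    alleles = genotype.replace("|", "/").split("/")
    if "." in alleles:              # missing call
        return False
    return len(set(alleles)) > 1


def het_snps_per_gene(genes, snps):
    """Count heterozygous SNPs falling inside each gene.

    genes: {gene_id: (scaffold, start, end)} with 1-based inclusive coordinates.
    snps: iterable of (scaffold, position, genotype) for one sample group.
    """
    genes_by_scaffold = defaultdict(list)
    for gene_id, (scaffold, start, end) in genes.items():
        genes_by_scaffold[scaffold].append((start, end, gene_id))
    counts = {gene_id: 0 for gene_id in genes}
    for scaffold, position, genotype in snps:
        if not is_heterozygous(genotype):
            continue
        for start, end, gene_id in genes_by_scaffold.get(scaffold, []):
            if start <= position <= end:
                counts[gene_id] += 1
    return counts


genes = {"gene1": ("scf1", 100, 200), "gene2": ("scf2", 1, 50)}
xy_males = [("scf1", 150, "0/1"), ("scf1", 180, "1|0"), ("scf1", 250, "0/1"), ("scf2", 10, "1/1")]
iiim_males = [("scf1", 150, "0/0"), ("scf2", 10, "0/1"), ("scf2", 20, "./.")]
print(het_snps_per_gene(genes, xy_males))    # {'gene1': 2, 'gene2': 0}
print(het_snps_per_gene(genes, iiim_males))  # {'gene1': 0, 'gene2': 1}
```

Per-gene counts from each group could then be summarized by Muller element to ask whether genes on a candidate sex chromosome carry more heterozygous sites in the group expected to be heterogametic for it.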
5. Data Access
All sequence data have been submitted to GenBank under accession PRJNA342472.
Supplemental Figures and Tables
6. Acknowledgements
This project was initiated during discussions with Andy Clark and Rob Unckless, who provided valuable comments throughout the completion of this work. Jeff Scott kindly supplied the A3 and LPR flies. Illumina sequencing was performed by the University of Houston Sequencing Core, with the assistance of Yinghong Pan and Utpal Pandya. Computational analyses were performed at the University of Houston Center for Advanced Computing and Data Systems, with some assistance from Adrian Garcia and Shuo Zhang. We thank Erin Kelleher for feedback on the preparation of this manuscript. This work was supported by start-up funds from the University of Houston.