Widespread loss of genes on the Y is considered a hallmark of sex chromosome differentiation. Here we show that the initial stages of Y evolution are driven by massive amplification of distinct classes of genes. The neo-Y chromosome of Drosophila miranda initially contained about 2700 protein-coding genes, but has gained over 3200 genes since its formation about 1.5 MY ago, primarily by amplification of proteins ancestrally present on this chromosome. We show that distinct evolutionary processes account for this drastic increase in gene number on the Y. Testis-specific and dosage-sensitive genes are amplified on the Y to increase male fitness. A distinct class of meiosis-related multi-copy Y genes is co-amplified on the X, and their expansion is likely driven by conflicts over segregation. Co-amplified X/Y genes are highly expressed in testis, enriched for meiosis and RNAi functions, and are frequently targeted by short RNAs in testis. This suggests that their amplification is driven by X vs. Y antagonism for increased transmission, where suppression of sex chromosome drive is likely mediated by sequence homology between the suppressor and distorter, through an RNAi mechanism. Thus, newly emerged sex chromosomes are a battleground for sexual and meiotic conflict.
Introduction
Sex chromosomes have originated multiple times from ordinary autosomes (Bachtrog et al. 2011). After suppression of recombination, sex chromosomes evolve independently and differentiate (Charlesworth 1978). The complete lack of recombination on Y chromosomes renders natural selection inefficient, and Y evolution is characterized by a loss of the majority of its ancestral genes (Bachtrog 2013). Indeed, old Y chromosomes of various species contain only a few functional genes, and have instead accumulated massive amounts of repetitive DNA (Bellott et al. 2017; Mahajan and Bachtrog 2017).
Studies of Y chromosomes are greatly hindered by a lack of high-quality reference sequences, and the Y chromosomes of only a handful of mammalian species have been fully sequenced (Hughes et al. 2005; Soh et al. 2014). To date, no high-quality sequences of young Y chromosomes have been examined.
Drosophila miranda has a pair of recently formed neo-sex chromosomes that originated ∼1.5 MY ago, and has served as a model to study the initiation of sex chromosome differentiation (Bachtrog and Charlesworth 2002). The neo-sex chromosomes of D. miranda are still homologous over much of their length, and a previous genomic analysis using Illumina short reads confirmed the notion that genes on the Y are rapidly lost (Zhou and Bachtrog 2012). About half of the roughly 2700 genes ancestrally present on the neo-Y were found to be pseudogenized, and hundreds of genes were entirely missing (Zhou and Bachtrog 2012). However, the high level of sequence similarity between the X and Y, combined with the drastic accumulation of repeats, prevented assembly of the Y chromosome.
We recently generated a high-quality sequence assembly of the neo-Y chromosome of D. miranda using single-molecule sequencing and chromatin conformation capture (Mahajan et al. 2018). Instead of simply shrinking, our assembly revealed that the young neo-Y chromosome dramatically increased in size relative to the neo-X by roughly 3-fold: we assembled 110.5 Mb of the fused ancestral Y and the neo-Y chromosome (Y/neo-Y sequence), and 25.3 Mb of the neo-X (Mahajan et al. 2018). Most of this size increase is driven by massive accumulation of repetitive sequences, and over 50% of the neo-Y derived sequence is composed of transposable elements (Mahajan et al. 2018). Here, we carefully annotate the neo-sex chromosomes, using transcriptomes from multiple tissues, and short RNA profiles, to study the evolution of gene content on this recently formed neo-Y chromosome.
A catalogue of genes and transcription units on the neo-Y
With a comprehensive high-quality reference sequence of the neo-Y chromosome of D. miranda, we systematically catalogue its genes. Automatic annotation of the highly repetitive Y/neo-Y chromosome is challenging, and we used extensive manual curation to validate and correct our gene models (see Methods). Comparison of the neo-sex gene content with that of D. pseudoobscura, a close relative where this chromosome pair is autosomal, allows us to infer the ancestral gene complement and reconstruct the evolutionary history of gene gain and loss along the neo-sex chromosomes (Figure 1A). We bioinformatically identified and manually examined all Y/neo-Y genes that were not simple 1:1 orthologs between species and chromosomes. In total, we identified 6,448 genes on the neo-Y, and 3,253 genes on the neo-X, compared to 3,087 genes on the ancestral autosome that gave rise to the neo-sex chromosome. Thus, contrary to the paradigm that Y chromosomes undergo genome-wide degeneration, our analysis reveals a dramatic increase in the number of annotated genes on the neo-Y, compared to its ancestral gene complement.
Overall, we find 1,736 ancestral single-copy orthologs between the neo-sex chromosomes, i.e. roughly 56% of genes are conserved. Furthermore, we find 143 genes that are present in D. pseudoobscura and the neo-X of D. miranda, but are missing from our neo-Y annotation, and we fail to detect a homolog on the neo-Y by BLAST, i.e. 5% of genes ancestrally present are completely absent on the neo-Y. On the other hand, only 17 genes (0.5% of genes ancestrally present) are absent from the neo-X, a rate of gene loss comparable to autosomes and the ancestral X (Table S1). Thus, the neo-Y is indeed losing its ancestral genes at a high rate, consistent with theoretical expectation (Charlesworth and Charlesworth 2000; Bachtrog 2013) and empirical observations of gene poor ancestral Y chromosomes (Hughes et al. 2005; Soh et al. 2014; Bellott et al. 2017; Mahajan and Bachtrog 2017).
Intriguingly, however, for 457 unique D. pseudoobscura genes, we find multiple copies in our neo-Y chromosome annotation: 1,697 genes from 363 distinct proteins are amplified on the neo-Y but single-copy (or missing) on the neo-X (two of these genes were gained from Muller E; Table S2), and 2,036 Y genes from 94 distinct proteins are also co-amplified on the X/neo-X (where they harbor a total of 647 copies; Table S3, Figure S1). Thus, genes on the Y/neo-Y fall into distinct categories (Figure 1B), and we refer to them as single-copy ancestral Y genes, multi-copy Y genes (which are single-copy on the X/neo-X), and co-amplified X/Y genes. Genes whose ancestral location could not be determined, or genes with more complex evolutionary histories, were not further analyzed (see Methods).
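The three categories defined above amount to a decision rule on per-chromosome copy counts. The short Python sketch below illustrates that rule only; the function name `classify_y_gene`, the label strings, and the reading of "amplified" as "more than one copy" are assumptions made for exposition, and the actual assignments in this study relied on manual curation (see Methods).

```python
def classify_y_gene(y_copies: int, x_copies: int) -> str:
    """Assign a gene to a category from its copy counts.

    y_copies: copies on the Y/neo-Y; x_copies: copies on the X/neo-X.
    Assumption: a gene counts as "amplified" when it has more than one copy.
    """
    if y_copies == 0:
        return "absent from neo-Y"
    if y_copies == 1:
        # Single copy on the Y; an amplified X copy falls outside the
        # three categories analyzed here.
        return "single-copy ancestral Y gene" if x_copies <= 1 else "other"
    if x_copies > 1:
        return "co-amplified X/Y gene"
    # Amplified on the Y, single-copy (or missing) on the X/neo-X.
    return "multi-copy Y gene"


# wurstfest has 145 Y-linked and 5 X-linked copies (Table S3).
print(classify_y_gene(145, 5))
```

Genes with ambiguous copy numbers or complex histories would need an additional "not analyzed" outcome, which this sketch does not model.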
Many amplified gene copies are fragmented, and some have premature stop codons or frameshift mutations (Table S2, S3). We find full-length copies for 786 amplified Y genes (46%), 776 co-amplified Y genes (38%), and 300 co-amplified X genes (46%). Thus, even when ignoring partial gene copies (which may nevertheless function as non-coding transcripts, see below), we still find considerably more genes on the neo-Y than on the neo-X or the ancestral autosome.
Transcriptome analysis shows that most individual gene copies of amplified gene families on the Y/neo-Y are expressed, both for partial genes and full-length transcripts. We detect expression of 71% of individual copies among multi-copy Y genes, and 94% for co-amplified X/Y genes, which suggests that many gene copies may indeed be functional (Table S4).
Different evolutionary processes drive amplification of Y genes
What may drive massive gene amplification on the neo-Y chromosome? Y chromosomes are subject to unique evolutionary forces, and fundamentally different processes appear to cause gene family expansion of multi-copy Y genes versus co-amplified X/Y genes (Figure 1C). The high repeat content on the Y makes this chromosome particularly vulnerable to accumulate multi-copy genes for various reasons. Repetitive sequences provide a substrate for non-allelic homologous recombination and thereby promote gene family expansion (Konkel and Batzer 2010). Indeed, we find several cases where repeats appear to have contributed to gene duplications, on both the X and Y chromosome (Figure S2). Spreading of heterochromatin from repeats globally dampens expression of neo-Y genes (Zhou et al. 2013), and multi-copy gene families may simply be more tolerated on the neo-Y (although many individual gene copies are transcribed, Table S4).
Gene family expansions on the Y can also be beneficial for males. Global transcription is lower from the neo-Y chromosome, and drives the evolution of dosage compensation of homologous neo-X genes (Ellison and Bachtrog 2013). Gene amplification may help compensate for reduced gene dose of neo-Y genes, especially if their neo-X homologs are not yet dosage compensated. Additionally, Y chromosomes are transmitted from father to son, and are thus an ideal location for genes that specifically enhance male fitness (Rice 1984). Y chromosomes of several species, including humans, have been shown to contain multi-copy gene families that are expressed in testis and contribute to male fertility (Skaletsky et al. 2003; Cortez et al. 2014; Bellott et al. 2014).
We find a handful of multi-copy Y gene families that have dozens of gene copies on the Y (six gene families have more than 30 copies, and 14 gene families have >15 copies; Figure 2B, Figure S3), while the vast majority of multi-copy Y gene families only have a few copies (90% of multi-copy Y gene families have fewer than 4 copies). Gene expression and chromatin analysis suggest that different evolutionary forces drive the accumulation of low versus high-copy number multi-copy Y gene families.
Multi-copy Y gene families with a high copy number (i.e. >15 copies) are expressed almost exclusively in testis (Figure 2A, Figure S3), mimicking patterns of gene family amplification of male fertility genes found in other species (Skaletsky et al. 2003; Bellott et al. 2010; Cortez et al. 2014). Their neo-X homologs, in contrast, are expressed predominantly in ovaries (Figure S3), and sex-linkage may have enabled neo-Y and neo-X gametologs to specialize in their male- and female-specific functions, respectively (Rice 1984). Most multi-copy Y gene families, in contrast, have only a few copies and are ubiquitously expressed (Figure 2A). Consistent with gene dosage contributing to increased copy number on the Y, we find that the neo-X homologs of multi-copy Y genes are less likely to be dosage compensated than those of single-copy Y genes (Figure 2C, p-value Fisher’s exact test = 0.007). In particular, male Drosophila achieve dosage compensation by recruiting the MSL complex to their hemizygous X chromosome (Lucchesi and Kuroda 2015), and we find that neo-X homologs of multi-copy neo-Y genes are less likely to be targeted by the MSL complex than the neo-X homologs of single-copy neo-Y genes (Figure 2C). This suggests that multi-copy Y genes are dosage sensitive, and that additional gene copies on the Y contribute to dosage compensation.
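The comparison of MSL targeting between the two gene classes is a test on a 2x2 contingency table. The Python sketch below shows how a two-sided Fisher's exact test can be computed from the hypergeometric distribution. The function name and the example counts are hypothetical (the underlying counts are not given in the text, only the resulting p-value), so the output here is not a reproduction of the reported value.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2

    def prob(x: int) -> float:
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )


# Hypothetical layout: rows = gene class (multi-copy Y, single-copy Y),
# columns = neo-X homolog (not MSL-bound, MSL-bound). Counts are made up.
p_value = fisher_exact_two_sided(8, 2, 1, 5)
print(f"p = {p_value:.4f}")
```

In practice an established statistics package would normally be used; the explicit version is shown only to make the calculation transparent.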
X/Y co-amplified genes reveal ongoing conflict over sex chromosome transmission
Genes that co-amplify on the X and Y chromosome, on the other hand, show testis-biased expression, independent of copy number (Figure 2D). Functional enrichment patterns and short RNA expression profiles suggest that fundamentally different forces drive their co-amplification on the X and Y chromosome (Figure 3). In particular, Y chromosomes compete with the X over transmission to the next generation (Frank 1991; Jaenike 2003), and sex chromosomes may try to cheat fair meiosis to bias their representation in functional sperm (i.e. meiotic drive). Meiotic drive on sex chromosomes, however, reduces fertility and distorts population sex ratios (Jaenike 2003), and creates strong selective pressure to evolve suppressors that silence selfish drivers. The RNAi pathway has been implicated in mediating suppression of sex chromosome drive in Drosophila (Tao et al. 2007a; b; Lin et al. 2018).
Overall, we identify 2683 co-amplified genes on the neo-sex chromosomes of D. miranda from 94 distinct proteins (2036 genes amplified on the Y/neo-Y, and 647 on the X/neo-X). Co-amplified X- and Y-linked gene copies are typically both highly expressed in testis (Figure 2D, Figure 3D; Figure S3, S4). Testis expression of co-amplified X-linked genes is unusual, as testis genes in Drosophila normally avoid the X chromosome (Sturgill et al. 2007), but can be understood under intragenomic conflict models (Tao et al. 2007a; b; Mueller et al. 2008; Meiklejohn and Tao 2010; Mueller et al. 2013). In particular, an X-linked gene involved in chromosome segregation may evolve a duplicate that acquires the ability to incapacitate Y-bearing sperm (Figure 1C). Invasion of this sex-ratio distorter skews the population sex ratio and creates a selective advantage to evolve a Y-linked suppressor that is resistant to the distorter. Suppression may be achieved at the molecular level by increased copy number of the wildtype function or by inactivation of X-linked drivers using RNAi (Tao et al. 2007a; b; Lin et al. 2018). If both driver and suppressor are dosage sensitive, they would undergo iterated cycles of expansion, resulting in rapid co-amplification of both driver and suppressor on the X and Y chromosome (Jaenike 2003).
Consistent with ongoing conflicts over chromosome segregation driving co-amplification of X/Y genes, we find that many of the most highly co-amplified genes have well-characterized functions in meiosis (Figure 2E, Table S3), and are ancestrally expressed in gonads (Figure 3C, Figure S5). Gene ontology (GO) analysis reveals that co-amplified X/Y genes are significantly overrepresented in biological processes associated with meiosis and chromosome segregation (Figure 3A, 3B). In particular, multi-copy Y genes are enriched for GO categories including “nuclear division”, “spindle assembly”, “meiotic spindle midzone assembly”, “DNA packaging”, “chromosome segregation”, or “male gamete generation” (see Table S3). Among the most highly co-amplified X/Y genes are wurstfest (145 copies on the Y and 5 on the X), a gene involved in spindle assembly in male meiosis I; mars (48 Y-linked copies and 6 X-linked copies), a gene involved in kinetochore assembly and chromosome segregation; orientation disruptor (18 Y-linked copies and 5 X-linked copies), a chromosome-localized protein required for meiotic sister chromatid cohesion; and Subito (8 Y-linked copies and 11 X-linked copies), a gene required for spindle organization and chromosome segregation in meiosis (Figure 2E, Figure 4, see Table S3 for additional genes). These important meiosis genes are typically single-copy and highly conserved across insects, but highly co-amplified on the recently evolved D. miranda X and Y chromosome.
Additionally, our GO analysis reveals an overrepresentation of co-amplified X/Y genes associated with piRNA metabolism and the generation of small RNAs (Figure 3A, 3B). Again, this is expected under recurring sex chromosome drive where silencing of distorters is achieved by RNAi, since compromising the short RNA pathway would release previously silenced drive systems (Lin et al. 2018). Noteworthy genes in the RNAi pathway that are typically single-copy but co-amplified on the X and Y include Dicer-2 (26 Y- and 6 X-linked copies), a double-stranded RNA-specific endonuclease that cuts long double-stranded RNA into siRNAs; cutoff (7 Y- and 9 X-linked copies), a gene involved in transcription of piRNA clusters; and shutdown (50 Y- and 22 X-linked copies), a co-chaperone necessary for piRNA biogenesis (Figure 2E, Figure 4, Table S3). Thus, functional enrichment strongly supports meiotic conflict driving co-amplification of X/Y genes.
We gathered stranded RNA-seq and small RNA profiles from wildtype D. miranda testis to obtain insights into the molecular mechanism of sex chromosome drive. Consistent with meiotic drive and suppression through RNAi mechanisms causing co-amplification of X/Y genes, we detect both sense and antisense transcripts and short RNAs derived from the vast majority of co-amplified X/Y genes (Figure 3D, Figure 4). Globally, we find that co-amplified Y genes show significantly higher levels of endo-siRNA production than single-copy Y genes or multi-copy Y genes (Figure 4A; p-value < 10^-16, Table S5). Likewise, siRNA levels are higher for co-amplified X-linked genes than for single-copy X genes or X homologs of multi-copy Y genes (Figure 4A; p-value < 10^-16, Table S5). Targeting of co-amplified X/Y genes by short RNAs in testis demonstrates that siRNA production is not simply a consequence of the repeat-rich environment of the neo-Y but instead a property of co-amplified X/Y genes. This is consistent with cryptic sex chromosome drive having repeatedly led to characteristic patterns of gene amplification of homologous genes on both the X and the Y chromosomes that are targeted by short RNAs.
Conclusions
Contrary to the paradigm that Y chromosomes undergo global degeneration, we document a high rate of gene gain on the recently formed neo-Y chromosome of D. miranda, mainly through amplification of protein-coding genes that were ancestrally present on the autosome that created the neo-Y. Our comparative genomic analysis reveals different types of amplified Y genes: multi-copy genes exclusive to the Y, and genes that are co-amplified on the X and Y, and we show that their acquisition is likely driven by different selective pressures. Multi-copy Y genes presumably increase male fitness, and come in two flavors. Some are selected for and amplify on the Y because of their testis-specific function, and these male-beneficial genes often have dozens of copies on the Y. Their neo-X homologs are often expressed in ovaries, and sex linkage may have allowed these former homologs to specialize in their sex-specific roles (Rice 1984). Ubiquitously expressed housekeeping genes also duplicate on the Y, presumably to mitigate gene dose deficiencies of partially silenced neo-Y genes; these genes are present at a much lower copy number, and their neo-X homologs are targeted less often by the dosage compensation complex.
Co-amplified X/Y genes are highly expressed in testis and often have functions in chromosome segregation and RNAi, and we speculate that their parallel amplification on the X and Y is a result of ongoing X-Y interchromosomal conflicts over segregation. Sequence homology between putative drivers and their suppressors on the sex chromosomes, and their widespread targeting by endo-siRNAs, suggest that RNAi mechanisms are involved in silencing rampant sex chromosome drive. While we cannot determine the sequence of evolutionary events with certainty, the X chromosome is a priori more likely to acquire segregation distorters, creating strong selection to evolve suppressors on the Y. First, natural selection is impaired on the non-recombining Y (Bachtrog 2013), making drivers more likely to originate on the X. Second, the heterochromatic nature of a Y chromosome may render it especially vulnerable to exploitation by selfish elements during meiosis (Helleu et al. 2016). If amplified Y genes are involved in a battle with the X over fair transmission, changes in gene copy number may tip the balance over inclusion into functional sperm, and drive repeated co-amplification of distorters and suppressors on the sex chromosomes.
Rampant sex chromosome drive can have important evolutionary consequences. Strong selective pressure to amplify Y-linked suppressors of meiotic drive may indirectly account for the ongoing degeneration of most ancestral genes on the Y chromosome, and ultimately its complete decay. Specifically, since the Y chromosome lacks recombination, strong positive selection for meiotic drive suppressors can propel linked deleterious mutations to fixation (Charlesworth and Charlesworth 2000). Thus, the decay of ancestral Y genes may be a by-product of silencing recurrent meiotic drivers arising on the X. Patterns of molecular variation are suggestive of episodes of recurrent positive selection shaping neo-Y evolution of D. miranda (Bachtrog 2004), and natural lines of D. miranda show a wide range of sex-ratio bias (with typically female-biased sex ratios (Dobzhansky 1935)). These observations are consistent with recurrent and ongoing conflicts over segregation affecting the genomic architecture of sex chromosomes in this species.
Genetic conflict between X-Y ampliconic genes may also contribute to hybrid sterility and consequent reproductive isolation (Frank 1991; Hurst and Pomiankowski 1991). Segregation distortion can result in male hybrid sterility in Drosophila (Phadnis and Orr 2009), and further functional characterization of co-amplified, lineage-specific X-Y gene families will be needed to test the proposed link between X-Y genetic conflict and hybrid sterility.
X-Y interchromosomal conflict, and its consequent impact on gene amplification on sex chromosomes, may be widespread. In both human and mouse—two species with high-quality reference sequences for both sex chromosomes—the X and Y have co-acquired and amplified genes, and in both cases, meiotic drive has been invoked to explain this co-amplification (Lahn and Page 2000; Cocquet et al. 2009; 2012; Soh et al. 2014). Co-amplified genes have also been found in D. melanogaster (Balakireva et al. 1992), and RNAi mechanisms have been shown to mediate suppression of sex ratio drive in flies (Tao et al. 2007a; b; Lin et al. 2018). Highly amplified gene families have been detected in other mammals (Murphy et al. 2006) and across fruit flies (Ellison et al.), suggesting that sex chromosome drive may be prevalent in evolution; to determine the true phylogenetic range of lineage-specific acquisition and amplification of X-Y genes, high-quality sex chromosome assemblies across more taxa are needed.
Materials and methods
De novo transcriptome assembly
To mask repeats in the genome, we used RepeatMasker (Smith et al.) with custom de novo repeat libraries, generated using RepeatModeler (Smith and Hubley) and Repdenovo (Chu et al. 2016), along with the Drosophila repeat library from Repbase (Bao et al. 2015). Paired-end RNA-seq reads from several male and female tissues (heads, carcass, whole body, testis, ovary, accessory gland, spermatheca, 3rd instar larvae) were then aligned to the repeat-masked genome using HiSat2 (Kim et al. 2015) with the --dta parameter and otherwise default settings. The resulting alignment file was used to assemble the transcriptome using the software StringTie (Pertea et al. 2015) with default parameters. Fasta sequences of the transcripts were extracted from the gtf output produced by StringTie using the gffread utility.
Gene annotation with Maker
We ran Maker (Campbell et al. 2014) three times to iteratively build and improve the gene annotation of the neo-sex chromosomes. For the first Maker run, we used annotated protein sequences for D. melanogaster and D. pseudoobscura from flybase.org, our de novo assembled D. miranda transcripts from the previous section, and the gene predictors Augustus (Stanke and Waack 2003) and SNAP (Korf 2004) to get the initial set of predictions. The parameters est2genome and protein2genome were set to 1 to allow Maker to directly build gene models from the transcript and protein alignments, and we used the Augustus fly gene model and the SNAP D. melanogaster hmm file for this first run.
The predictions from the first round were then used to train Augustus using BUSCO (Simão et al. 2015) and also to train SNAP. The new Augustus gene model and SNAP hmm file were then used during the second Maker run, with the parameters est2genome and protein2genome set to 0. The maximum intron size was increased to 20,000 bp (default 10,000 bp). The results from the second round were then used to train Augustus and SNAP again, before the final round of Maker. This process resulted in a total of 21,524 annotated genes in D. miranda.
Orthology detection
Transcript sequences for D. pseudoobscura were downloaded from flybase.org and only the largest transcript per gene was retained for downstream analyses. De novo annotated D. miranda transcripts were then aligned to this filtered D. pseudoobscura transcript set using BLAST (Camacho et al. 2009). Alignments with percentage identity <60% were discarded and the best alignment was calculated based on the e-value, score, % identity and alignment lengths. Each D. miranda transcript was thus assigned the ortholog that was its best BLAST hit. We identified paralogous genes in the D. pseudoobscura genome as those for which at least 80% of the sequence of one aligned to the other and vice versa. Paralogous genes in the D. miranda–D. pseudoobscura orthology calls were replaced by a single gene name from the duplicated gene family.
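The best-hit assignment described above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the script used in this study: the `Hit` record, the function name `best_hits`, the transcript identifiers, and in particular the order in which e-value, score, identity, and alignment length are compared are all hypothetical, since the text does not specify how the four criteria were combined.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    query: str      # D. miranda transcript
    subject: str    # D. pseudoobscura transcript
    pident: float   # percentage identity
    length: int     # alignment length
    evalue: float
    bitscore: float


def best_hits(hits, min_identity=60.0):
    """Return {query: subject} for the best alignment of each query.

    Alignments below min_identity are discarded. Assumed ranking:
    lowest e-value, then highest score, identity, and length.
    """
    best = {}
    for hit in hits:
        if hit.pident < min_identity:
            continue
        # Negate the e-value so that a larger key is always better.
        key = (-hit.evalue, hit.bitscore, hit.pident, hit.length)
        if hit.query not in best or key > best[hit.query][0]:
            best[hit.query] = (key, hit)
    return {query: hit.subject for query, (_, hit) in best.items()}


# Made-up identifiers and values, for illustration only.
example = [
    Hit("mir_t1", "pse_a", 95.0, 900, 1e-100, 1500.0),
    Hit("mir_t1", "pse_b", 99.0, 300, 1e-50, 500.0),
    Hit("mir_t2", "pse_c", 55.0, 800, 1e-20, 200.0),  # below 60% identity
]
print(best_hits(example))
```

A query whose alignments all fall below the identity cutoff, such as `mir_t2` here, receives no ortholog assignment.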
Identifying multicopy genes
The gene annotation produced by Maker had roughly 2,500 more genes annotated on the Y/neo-Y compared to the neo-X, and hundreds of genes had multiple annotated copies on the Y/neo-Y chromosome (and also the X chromosome and autosomes in some cases). Based on the orthology calls from BLAST, 822 Maker-annotated genes had more than 2 copies on the Y/neo-Y, and 209 of those genes had more than two copies on both the X/neo-X and Y/neo-Y. In our initial Maker annotation, 366 genes were missing on the neo-Y and 155 genes were missing on the neo-X. However, closer inspection revealed that the annotation was often fragmented, especially on the Y/neo-Y chromosome, which led to an overestimation of the number of distinct genes that had duplicated. Subsequent BLAST searches also revealed that Maker often failed to annotate individual copies of gene families. In addition, several genes in the annotation were “chimeras”, where two genes were collapsed into one by Maker; one of the two genes then appeared to be missing from the gene annotation, because the chimera was assigned to the other D. pseudoobscura gene during orthology assignment. Thus, the actual number of missing genes is much smaller than our initial Maker annotation suggested. We therefore manually verified, and if necessary fixed, each gene model that was annotated by Maker and inferred to be either duplicated on the Y/neo-Y or missing from the neo-X and/or Y/neo-Y annotations. We used nucmer (Kurtz et al. 2004) from the mummer package to individually align (one gene at a time) the sequences of their corresponding D. pseudoobscura orthologs to the D. miranda genome with the parameters --maxmatch and --nosimplify. Alignment coordinates were manually stitched together to get full gene coordinates. Only fragments that were at least 25% the length of the corresponding D. pseudoobscura ortholog were counted as duplicates/paralogs in the D. miranda genome.
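The stitching and length-filtering step can be illustrated with the following Python sketch. It is an automated approximation of what was done by hand here, and several details are assumptions: the function names, the `max_gap` parameter used to decide which alignment blocks belong to the same copy, the use of half-open (start, end) coordinates on a single chromosome, and measuring fragment length as the genomic span of the stitched locus.

```python
def stitch(intervals, max_gap):
    """Merge alignment blocks (start, end) that lie within max_gap bp."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start - merged[-1][1] <= max_gap:
            # Extend the previous locus to cover this block.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def count_copies(intervals, ortholog_length, max_gap=5000, min_fraction=0.25):
    """Count stitched loci spanning at least min_fraction of the ortholog.

    max_gap is a hypothetical value; min_fraction=0.25 follows the 25%
    length threshold stated in the text.
    """
    threshold = min_fraction * ortholog_length
    return sum(1 for start, end in stitch(intervals, max_gap)
               if end - start >= threshold)


# Made-up alignment blocks for a 1,000 bp ortholog.
blocks = [(100, 400), (450, 900), (50000, 50200), (90000, 91000)]
print(stitch(blocks, max_gap=5000))
print(count_copies(blocks, ortholog_length=1000))
```

In this example the first two blocks are stitched into one locus, and the 200 bp fragment is excluded because it falls below 25% of the ortholog length.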
We also performed BLAST searches to identify the genes that had been lost from the neo-sex chromosomes.
In total, we annotate 6,448 genes on the neo-Y. Of these, 1,736 are ancestral single-copy Y genes (i.e. they were present on the ancestral autosome that formed the neo-Y). 1,105 of these genes were readily identified on both the neo-X and neo-Y by our Maker annotation, and are used as the single-copy orthology gene set in our analysis. The remaining 631 ancestral single-copy Y genes were initially missed or misclassified by our Maker annotation (i.e. 347 neo-Y genes were wrongly annotated as multi-copy by Maker, but our manual inspection revealed that they were present only as single-copy genes, and 114 neo-X genes and 170 neo-Y genes were missing from the Maker annotation, but found to be present on the neo-X and neo-Y, respectively, after manual checking using nucmer (Kurtz et al. 2004)).
In addition, we identify 457 genes (with 3,733 gene copies) that have become amplified on the neo-Y: 1,697 multi-copy Y genes (with a single copy on the neo-X), and 2,036 co-amplified neo-Y genes (which also amplified on the X/neo-X). Furthermore, we detect 959 genes in our Maker annotation that were not further considered. These “other” genes comprise 159 neo-Y genes that lack a homolog in D. pseudoobscura, 287 neo-Y genes that are present at an unknown location in D. pseudoobscura, 189 single-copy neo-Y genes that are present at multiple other locations in the genome (based on the Maker annotation), and 324 genes (from 49 unique proteins) with complicated mapping which could not be included in any category of our analysis (i.e. genes for which the number of copies was ambiguous based on alignments, such as nested/overlapping genes; genes for which many alignments of variable identity were observed; genes which were amplifying on autosomes and the Y chromosome; chimeric genes).
Thus, after manual verification, we identified 94 genes that have co-amplified on the X/neo-X and the Y/neo-Y, with 647 copies on the X/neo-X and 2036 copies on the Y/neo-Y (and 58 copies on the autosomes). We also identified 363 genes that have only amplified on the Y/neo-Y chromosome, with a total of 1697 copies on the Y/neo-Y. Thus, the Y/neo-Y chromosome has gained at least 3200 gene copies.
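The counts reported in this section can be cross-checked with simple arithmetic, shown here as a short Python sketch. The variable names are ours, and the net-gain calculation assumes that each amplified gene family started from a single ancestral copy on the neo-Y; that simplification is ours, so the result should be read as an approximate consistency check for the "at least 3200" figure.

```python
# Counts taken from the text.
multi_copy_families, multi_copy_y_copies = 363, 1697
co_amp_families, co_amp_y_copies, co_amp_x_copies = 94, 2036, 647

# Amplified gene families and their copies on the Y/neo-Y.
amplified_families = multi_copy_families + co_amp_families
amplified_y_copies = multi_copy_y_copies + co_amp_y_copies

# All co-amplified copies on the sex chromosomes (X/neo-X plus Y/neo-Y).
co_amplified_total = co_amp_y_copies + co_amp_x_copies

# Assumed: one ancestral copy per family, so the gain is copies minus families.
net_gain_on_y = amplified_y_copies - amplified_families

print(amplified_families, amplified_y_copies, co_amplified_total, net_gain_on_y)
```

Under that assumption the Y/neo-Y has gained 3,276 copies, which agrees with the statement that it has gained at least 3200 gene copies.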
We identified 17 genes that are present on chr3 in D. pseudoobscura but are missing on the neo-X in D. miranda; 6 of those genes are found on other chromosomes in the D. miranda genome and 6 are still present on the Y chromosome in D. miranda. We identified 143 genes that are present on chr3 in D. pseudoobscura but absent on the neo-Y in D. miranda, and 138 of those are still present on the neo-X in D. miranda. However, 5 genes have been lost from both the neo-X and neo-Y chromosomes, and BLAST searches failed to identify other chromosomal locations that those genes could have moved to.
Gene expression analysis
Kallisto (Bray et al. 2016) was used to quantify gene expression and calculate TPM values for each gene in our annotation using several male and female tissues (whole body, carcass, 3rd instar larvae, gonads, spermatheca and accessory gland). The R function heatmap.2 from the gplots package (https://cran.r-project.org/web/packages/gplots/index.html) was used to plot heatmaps to visualize tissue-specific differences in gene expression. Each row in the heatmap is a different gene and the different columns represent different tissues. The heatmap was row-normalized to indicate the tissue with the highest expression for each gene.
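Row normalization of an expression matrix can be illustrated as follows. This Python sketch assumes that each gene's TPM values are centered and scaled across tissues (a per-row z-score using the sample standard deviation); the text does not state which exact scaling was applied, so this is one plausible reading rather than a description of the actual plotting call. The function names and the example values are hypothetical.

```python
from statistics import mean, stdev


def row_zscores(tpm_values):
    """Center and scale one gene's TPM values across tissues."""
    mu = mean(tpm_values)
    sd = stdev(tpm_values)
    if sd == 0:
        # A gene with identical expression everywhere has no tissue bias.
        return [0.0 for _ in tpm_values]
    return [(value - mu) / sd for value in tpm_values]


def top_tissue(tissues, tpm_values):
    """Return the tissue with the highest expression for one gene."""
    return max(zip(tpm_values, tissues))[1]


# Made-up TPM values for a single gene.
tissues = ["testis", "ovary", "head"]
tpm = [3.0, 1.0, 2.0]
print(row_zscores(tpm))
print(top_tissue(tissues, tpm))
```

After this scaling, genes with very different absolute expression levels become comparable, and the color of each cell reflects only where a gene is expressed most strongly.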
GO analysis
GO analysis was done using GOrilla (Eden et al. 2009), and the D. melanogaster orthologs of the genes co-amplifying on the X and Y were used as the target set. The GO terms that were enriched and had a p-value less than 10^-3 and FDR less than 0.05 were visualized using the software Revigo (Supek et al. 2011).
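The two-threshold filter on enriched GO terms can be sketched in Python as below. The sketch assumes a Benjamini-Hochberg style correction to obtain FDR q-values from raw p-values; in practice the q-values reported by the enrichment tool itself would be used, and the function names, term labels, and p-values here are hypothetical.

```python
def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so q-values stay monotone.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * m / rank)
        qvalues[index] = running_min
    return qvalues


def significant_terms(terms, pvalues, max_p=1e-3, max_fdr=0.05):
    """Keep terms with p-value < max_p and FDR q-value < max_fdr."""
    qvalues = bh_qvalues(pvalues)
    return [term for term, p, q in zip(terms, pvalues, qvalues)
            if p < max_p and q < max_fdr]


# Made-up terms and p-values, for illustration only.
terms = ["term_A", "term_B", "term_C"]
pvalues = [1e-5, 5e-4, 0.2]
print(significant_terms(terms, pvalues))
```

Applying both cutoffs means that a term must be individually significant and must also survive correction for the number of GO terms tested.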
Analysis of smRNA and totalRNA data
Stranded total RNA paired-end reads were mapped to the D. miranda genome using HiSat2 with default parameters and the --rna-strandness parameter set to RF. The sam file was then filtered to obtain sense and antisense transcription estimates. Single-end small RNA reads were aligned to the genome using bowtie2. bamCoverage from the deeptools package (Ramírez et al. 2014) was used to convert bam alignment files to bigwig format in both cases for visualization in IGV. The number of small RNA reads mapping to each annotated gene was calculated using bedtools (Quinlan and Hall 2010), and these counts were divided by the gene/fragment length. Boxplots were plotted in R for single-copy Y genes, genes that have only amplified on the Y, and genes that have co-amplified on the X and Y.
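The length normalization of small RNA counts can be illustrated with a short Python sketch. It computes reads per base pair for each gene copy and summarizes each gene category by its median, which is the central line a boxplot would show. The function names, category labels, and example numbers are hypothetical, and no additional normalization for sequencing depth is included because none is described in the text.

```python
from collections import defaultdict
from statistics import median


def read_density(read_count, length_bp):
    """Small RNA reads per base pair of a gene or gene fragment."""
    return read_count / length_bp


def median_density_by_category(records):
    """records: iterable of (category, read_count, length_bp) tuples."""
    densities = defaultdict(list)
    for category, read_count, length_bp in records:
        densities[category].append(read_density(read_count, length_bp))
    return {category: median(values) for category, values in densities.items()}


# Made-up counts and lengths, for illustration only.
records = [
    ("single-copy Y", 10, 2000),
    ("single-copy Y", 30, 3000),
    ("co-amplified Y", 600, 1500),
    ("co-amplified Y", 900, 3000),
    ("co-amplified Y", 100, 1000),
]
print(median_density_by_category(records))
```

Dividing by length matters here because many amplified copies are partial, and raw read counts would otherwise understate small RNA production from short fragments.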
Author contributions
S.M. generated the genome assembly and annotation and performed the bioinformatics analysis. D.B. oversaw the project and wrote the manuscript with input from all authors.
Competing interests
The authors declare that no competing interests exist.
Table S1. Gene loss on the different Muller elements of D. miranda
Table S2. Overview of multi-copy Y genes
Table S3. Overview of co-amplified X/Y genes
Table S4. Expression of individual copies of multi-copy Y genes and co-amplified X/Y genes
Table S5. Small RNA mapping from testis, for single copy X and Y genes, multi-copy Y genes and their X homologs, and co-amplified X and Y genes
Figure S1. Location of multi-copy Y genes and co-amplified X/Y genes
Figure S2. Repeats contribute to accumulation of multi-copy genes
Figure S3. Expression patterns of X- and Y-linked genes for A. single-copy X- and Y-linked genes; B. multi-copy Y-linked genes and their X homologs; C. co-amplified X and Y genes. Expression for individual gene copies is shown.
Figure S4. Expression patterns of X- and Y-linked genes for A. single-copy X- and Y-linked genes; B. multi-copy Y-linked genes and their X homologs; C. co-amplified X and Y genes. Total summed expression for all gene copies of a gene family is shown.
Figure S5. Tissue-specific expression patterns of orthologs of co-amplified X/Y genes in D. melanogaster.
Acknowledgements
Funded by NIH grants (R01GM076007, GM101255 and R01GM093182) to DB. We thank L. Gibilisco for generating short RNA libraries.