Abstract
Premise of the study: Shifts in ploidy level can affect the evolutionary dynamics of genomes in a myriad of ways. Population genetic theory predicts that transposable element (TE) proliferation may follow, because the genome-wide efficacy of selection should be reduced and the increase in gene copies may mask the deleterious effects of TE insertions. However, the evidence to date for TE proliferation following an increase in ploidy is mixed, with some studies reporting results consistent with this scenario and others reporting signs of genome downsizing.
Methods: We used high-coverage whole genome sequence data to evaluate the abundance, genomic distribution, and population frequencies of TEs in the self-fertilizing recent allotetraploid Capsella bursa-pastoris, a species with prior evidence for genome-wide reductions in the efficacy of selection at the amino acid level since the transition to selfing. We then compared the C. bursa-pastoris TE profile with those of its two parental species, outcrossing C. grandiflora and self-fertilizing C. orientalis.
Key results: We found no evidence that C. bursa-pastoris has experienced a large proliferation of TEs. Instead, the abundance of TEs, both overall and near genes, and their population frequencies are intermediate between those of its two parental species, C. grandiflora and C. orientalis.
Conclusions: The lack of a shift in TE profile beyond additivity expectations in C. bursa-pastoris may be due to a variety of factors. In general, we argue that allopolyploid lineages that retain high outcrossing rates should provide a ‘perfect storm’ for TE proliferation, whereas highly selfing polyploids may generally experience TE loss.
Introduction
A central goal of population and comparative genomics research is to understand what factors drive the evolution of genome size and structure (Gregory, 2005; Lynch, 2007; Alföldi and Lindblad-Toh, 2013; Koenig and Weigel, 2015). Coupled with decades of information on genome size from flow cytometry across diverse lineages, the growing wealth of whole genome sequence data has revealed how extensively and rapidly genome size and structure can evolve, even among close relatives (Ungerer et al., 2006; Hawkins et al., 2009; Wright and Ågren, 2011; Tenaillon et al., 2011; Leitch and Leitch, 2013; Ågren and Wright, 2015; Ågren et al., 2015). Yet our ability to explain this variation remains in its infancy.
Whole genome duplication via polyploidization has long been considered a major contributor to genome evolution (Adams and Wendel, 2005; Soltis and Soltis, 2012; Hollister, 2015; Soltis et al., 2015). Most obviously, polyploidization causes a direct increase in the total amount of DNA per cell. This initial doubling of genome size has often been followed by diploidization, leading to a pattern of DNA loss over time (Leitch and Bennett, 2004; Lysak et al., 2009; Renny-Byfield et al., 2013; Vu et al., 2015). Moreover, although the direct role of recent polyploidy in genome size evolution can be investigated and controlled for, most plant species have experienced whole genome duplication at some point in their evolutionary history (Vision et al., 2000; Jaillon et al., 2007; Jiao et al., 2011; Vanneste et al., 2014). Incomplete information on this history can therefore make it difficult to fully assess the importance of whole genome duplication events for the evolution of genome size and structure.
In addition to the direct effect of ploidy on genome size, transposable element (TE) proliferation may follow whole genome duplication, because genome redundancy masks deleterious insertions and reduces the efficacy of selection across the genome (recently reviewed in Parisod and Senerchia, 2012; Tayalé and Parisod, 2013). Furthermore, host-mediated silencing of TEs may be disrupted in allopolyploids, in which the increase in ploidy results from interspecific hybridization (Madlung et al., 2002; 2005; Kraitshtein et al., 2010; Yaakov et al., 2011). This combination of relaxed selection and breakdown of silencing mechanisms could drive dramatic changes in genome structure following whole genome duplication, turning gene-dense euchromatic regions with very low TE content into regions with major accumulations of TEs in genic regions. Such a mechanism may explain, for example, the dramatic TE expansion in the maize genome following whole genome duplication (Schnable et al., 2009; Baucom et al., 2009; Diez et al., 2014). On the other hand, genome downsizing in polyploids may lead to a net loss of TEs during diploidization (Parisod et al., 2010). Indeed, studies to date have reported both increases in TE copy number following allopolyploidization, e.g. in Nicotiana tabacum (Petit et al., 2007; 2010; but see Renny-Byfield et al., 2011), and TE loss, e.g. in Orobanche gracilis (Kraitshtein et al., 2010). Overall, the factors driving proliferation versus loss of TEs in polyploids therefore remain poorly understood.
Here, we use high-coverage whole genome sequence data to evaluate the abundance, genomic distribution, and population frequencies of TEs in the self-fertilizing allotetraploid Capsella bursa-pastoris. Capsella bursa-pastoris is a recently derived allotetraploid, with population genomic evidence for genome-wide reduction in the strength of selection on point mutations due to both gene redundancy and its selfing mating system (Douglas et al. 2015), making it an interesting model to examine the early fate of transposable elements following allopolyploidization. We look for evidence of TE proliferation beyond what would be expected from additivity of its two parental diploid species. We discuss our results in light of the literature on the association between polyploidization, mating system, and TE abundance across plant species and use these observations to put forward the Perfect Storm Hypothesis for how mating system and ploidy may interact to allow TE-driven genome expansion.
Material and Methods
Study system
The genus Capsella consists of four species with varying mating systems, ploidy levels, and geographical distributions (Hurka et al. 2012; Figure 1). Selfing is believed to have evolved multiple times from an ancient progenitor of the diploid (2n = 2x = 16) Capsella grandiflora, a self-incompatible species restricted to Albania and northwestern Greece. Most recently, Capsella rubella diverged from C. grandiflora within the last 100,000 years (Foxe et al. 2009; Guo et al. 2009; Slotte et al. 2013; Brandvain et al. 2013). Capsella orientalis is believed to have evolved selfing before C. rubella, also from a C. grandiflora-like ancestor. C. orientalis and C. grandiflora have been inferred to have diverged approximately 930,000 years ago, providing the potential for a longer period of mating system divergence (Douglas et al. 2015). Whereas C. rubella has expanded to a larger Mediterranean distribution, C. orientalis is now found in an area spanning Eastern Europe to Central Asia (Hurka et al. 2012). The origin of the worldwide distributed Capsella bursa-pastoris long remained elusive, but it was recently determined to be an allotetraploid (2n = 4x = 32) resulting from a hybridization event between C. grandiflora and C. orientalis within the last 100,000 to 300,000 years (Douglas et al. 2015). Consistent with this hybrid origin, a principal component analysis of the types of TEs found in C. bursa-pastoris places it intermediate between C. grandiflora and C. orientalis. Moreover, of all insertions shared by two of the three species, the majority are shared between C. bursa-pastoris and either C. grandiflora or C. orientalis, with very few shared between C. grandiflora and C. orientalis to the exclusion of C. bursa-pastoris (Douglas et al. 2015). In this study, we expand the TE analysis of C. bursa-pastoris to test for an accumulation of TEs following its allopolyploid origin.
Identification and quantification of transposable elements
We combined the datasets generated by Ågren et al. (2014) and Douglas et al. (2015). In brief, these studies applied the PoPoolationTE pipeline of Kofler et al. (2012) to 108-bp paired-end Illumina reads from 8 C. bursa-pastoris, 8 C. grandiflora, and 10 C. orientalis individuals. Because the pipeline is designed for pooled population data, we adapted it to use the estimated population frequency of each insertion to classify it as homozygous or heterozygous, as described in Ågren et al. (2014) and Douglas et al. (2015). We used this approach to determine the abundance, genomic locations, and population frequencies of TEs in the three species.
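The genotype classification step can be illustrated with a minimal sketch. It assumes that each insertion in an individual's sample has a PoPoolationTE-style frequency estimate between 0 and 1, and that a per-individual diploid copy number is obtained by counting homozygous insertions twice and heterozygous insertions once. The function names and the cutoffs `het_min` and `hom_min` are hypothetical values chosen for illustration; they are not the thresholds used by Ågren et al. (2014).

```python
def classify_insertion(freq, het_min=0.25, hom_min=0.75):
    """Classify one insertion from its estimated frequency in an individual.

    The cutoffs are hypothetical illustration values: frequencies below
    het_min are treated as absent, those in [het_min, hom_min) as
    heterozygous, and those at or above hom_min as homozygous.
    """
    if not 0.0 <= freq <= 1.0:
        raise ValueError("frequency must lie between 0 and 1")
    if freq >= hom_min:
        return "homozygous"
    if freq >= het_min:
        return "heterozygous"
    return "absent"


def diploid_copy_number(freqs, het_min=0.25, hom_min=0.75):
    """Sum copies over insertions: 2 per homozygous call, 1 per heterozygous call."""
    copies = {"absent": 0, "heterozygous": 1, "homozygous": 2}
    return sum(copies[classify_insertion(f, het_min, hom_min)] for f in freqs)
```

Applied to one individual's list of insertion frequencies, `diploid_copy_number` returns a single count per individual, the kind of value compared across species below.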
Results and Discussion
We quantified the abundance of four major categories of TEs: DNA elements, Helitrons, long terminal repeat (LTR) retrotransposons, and non-LTR retrotransposons. The three species differ in their mean number of TEs (Kruskal-Wallis chi-squared = 21.342, df = 2, p < 0.0001), but all species show similar relative abundances across element categories, with LTR elements making up the bulk of the insertions (Figure 2).
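A Kruskal-Wallis comparison of this kind can be reproduced with the Python standard library alone. The sketch below is not the authors' code; it assumes one TE count per individual per species and applies the standard correction for tied ranks. With three groups the statistic has 2 degrees of freedom, and in that case the chi-squared upper-tail probability simplifies exactly to exp(-H/2).

```python
import math
from collections import Counter


def average_ranks(values):
    """1-based ranks of values; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        # Zero-based positions i..j correspond to ranks i+1..j+1.
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(*groups):
    """Tie-corrected Kruskal-Wallis H for two or more groups of observations."""
    pooled = [x for group in groups for x in group]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total = 0.0
    start = 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        total += rank_sum ** 2 / len(group)
        start += len(group)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)
    # Correction factor for ties: 1 - sum(t^3 - t) / (n^3 - n).
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    return h / (1 - ties / (n ** 3 - n))


def chi2_sf_df2(x):
    """Upper-tail chi-squared probability for exactly 2 degrees of freedom."""
    return math.exp(-x / 2)
```

Usage would look like `chi2_sf_df2(kruskal_wallis_h(counts_a, counts_b, counts_c))`, where the three lists are hypothetical names for per-individual TE counts of each species.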
To test whether C. bursa-pastoris has experienced an accumulation or loss of TEs since its origin, we calculated the expected diploid TE copy number of a C. orientalis × C. grandiflora hybrid and compared it with the observed abundance in C. bursa-pastoris. We randomly paired C. orientalis and C. grandiflora individuals, sampling with replacement, and calculated the average TE copy number of each such in silico cross, repeating this for 1,000 replicates. Comparing these expected copy numbers with the observed abundance, we found that C. bursa-pastoris harbours slightly more insertions than expected under strict additivity (Figure 2; Wilcoxon rank sum test, W = 6038, p = 0.01297).
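The in silico cross procedure and the rank-sum comparison can be sketched as follows. The sketch assumes that each replicate draws one C. orientalis and one C. grandiflora individual with replacement and takes the mean of their TE copy numbers as the expected hybrid value; the function names are hypothetical. `rank_sum_w` returns W in the Mann-Whitney form (rank sum of the first sample minus n(n+1)/2), the form reported by R's `wilcox.test`; p-values are left to a statistics package.

```python
import random


def average_ranks(values):
    """1-based ranks of values; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def in_silico_crosses(orientalis, grandiflora, n_reps=1000, seed=None):
    """Expected TE copy numbers for simulated C. orientalis x C. grandiflora hybrids.

    Each replicate samples one individual per parental species, with
    replacement, and averages their copy numbers (assumed additivity).
    """
    rng = random.Random(seed)
    return [
        (rng.choice(orientalis) + rng.choice(grandiflora)) / 2
        for _ in range(n_reps)
    ]


def rank_sum_w(x, y):
    """Wilcoxon rank-sum W for sample x versus y (Mann-Whitney U form)."""
    ranks = average_ranks(list(x) + list(y))
    return sum(ranks[:len(x)]) - len(x) * (len(x) + 1) / 2
```

A call such as `rank_sum_w(observed, in_silico_crosses(orientalis_counts, grandiflora_counts, seed=1))` (hypothetical variable names) compares observed C. bursa-pastoris counts with the simulated expectation; W close to its maximum of len(x) × len(y) indicates that the observed counts mostly exceed the expected ones.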
Because TE insertions near genes are likely to disrupt gene function, population genetic theory predicts that selection will rapidly remove such insertions (Dolgin and Charlesworth 2008). Following a whole genome doubling event, a tetraploid like C. bursa-pastoris carries twice as many gene copies as its diploid progenitors, so the fitness cost of an individual insertion should be lower. Tetraploids may therefore be expected to accumulate more TEs near genes than diploids. To test this prediction, we used the gene annotation of the C. rubella reference genome (Slotte et al. 2013) to calculate the distance from every TE insertion to its closest gene in all three species. As for overall abundance, we asked whether C. bursa-pastoris has more insertions near genes than expected from additivity in a C. orientalis × C. grandiflora cross. Using the approach outlined above, we calculated the expected TE copy number within 1000 bp of the closest gene for such a hybrid and compared it with the observed abundance in C. bursa-pastoris. We find that C. bursa-pastoris harbours slightly more insertions near genes than expected under strict additivity (Figure 3; Wilcoxon rank sum test, W = 7144.5, p = 0.000126).
Figure 3. Average abundance of transposable elements (TEs) in 100-bp bins by distance to their closest gene in the three Capsella species. Error bars are ±1 standard error. The expected C. bursa-pastoris value was generated by performing 1,000 replicates of in silico crosses between C. orientalis and C. grandiflora, sampling with replacement.
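The distance-to-gene calculation and the 100-bp binning can be sketched as follows. The sketch assumes that insertion positions and gene coordinates refer to the same reference assembly (here, the C. rubella annotation) and that an insertion inside a gene has distance 0. The function names, the half-open bins, and treating "within 1000 bp" as strictly less than 1000 bp are assumptions for illustration. The nearest-gene search is a linear scan per insertion, chosen for clarity rather than speed.

```python
from collections import Counter


def distance_to_gene(pos, start, end):
    """0 if pos lies inside [start, end], else bp to the nearer gene boundary."""
    if start <= pos <= end:
        return 0
    return start - pos if pos < start else pos - end


def closest_gene_distances(insertions, genes):
    """insertions: list of (chrom, pos); genes: dict chrom -> list of (start, end).

    Insertions on sequences without annotated genes are skipped.
    """
    distances = []
    for chrom, pos in insertions:
        spans = genes.get(chrom, [])
        if spans:
            distances.append(min(distance_to_gene(pos, s, e) for s, e in spans))
    return distances


def count_near_genes(distances, max_dist=1000):
    """Number of insertions closer than max_dist bp to their closest gene."""
    return sum(1 for d in distances if d < max_dist)


def bin_distances(distances, bin_size=100, max_dist=1000):
    """Counts per half-open bin [k*bin_size, (k+1)*bin_size) below max_dist."""
    counts = Counter(d // bin_size for d in distances if d < max_dist)
    return [counts.get(k, 0) for k in range(max_dist // bin_size)]
```

Per-individual outputs of `count_near_genes` can be fed into the same in silico cross comparison used for overall abundance, and `bin_distances` gives the per-bin counts that a plot like Figure 3 averages.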
Ågren et al. (2014) previously reported that C. grandiflora showed an excess of rare insertions relative to C. orientalis, consistent with higher TE activity in the former. Here, we used the presence or absence of each TE insertion across all individuals of the three species to classify insertions as either singletons (present in only one individual) or non-singletons (present in more than one individual). C. grandiflora has the highest number of singletons, possibly indicating higher TE activity than in C. orientalis and C. bursa-pastoris, which show similar proportions of singletons (Table 1). However, differences in demographic history between the species likely also contribute to differences in the frequency spectrum.
Overall, we found no evidence that C. bursa-pastoris is experiencing a large proliferation of TEs. Instead, the abundance of TEs, both overall and near genes, and their population frequencies are intermediate between those of its two parental species, C. grandiflora and C. orientalis. This is consistent with our analyses of genome structure variation (Douglas et al. 2015), which indicate that this allopolyploid has not experienced large-scale ‘genome shock’ since its origin.
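The singleton classification can be sketched as below. It assumes, following the text, that carriers of each insertion are counted across all individuals of the three species pooled together, and that each individual's calls are available as a set of insertion identifiers; the function and argument names are hypothetical.

```python
from collections import Counter, defaultdict


def singleton_counts(calls, species_of):
    """Count singleton and non-singleton insertions per species.

    calls: dict individual -> set of insertion ids present in that individual.
    species_of: dict individual -> species name.
    Returns dict species -> (n_singletons, n_non_singletons).
    """
    # Number of individuals (pooled over all species) carrying each insertion.
    carriers = Counter(ins for present in calls.values() for ins in present)
    # Union of insertions observed in each species.
    per_species = defaultdict(set)
    for individual, present in calls.items():
        per_species[species_of[individual]].update(present)
    result = {}
    for species, insertions in per_species.items():
        n_single = sum(1 for ins in insertions if carriers[ins] == 1)
        result[species] = (n_single, len(insertions) - n_single)
    return result
```

Dividing the first element of each tuple by the tuple's sum gives the proportion of singletons per species, which is the quantity compared across species in the text.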
The perfect storm hypothesis
Despite a number of models predicting TE proliferation following allopolyploidization, our results provide little evidence for TE expansion in C. bursa-pastoris. This contrasts with our results from genome-wide SNP patterns (Douglas et al. 2015), which suggest a large-scale reduction in the efficacy of selection on amino acid and conserved noncoding mutations. Consistent with our TE results, recent reviews have suggested that TE proliferation is not consistently observed in allopolyploids (Parisod and Senerchia 2012; Tayalé and Parisod, 2013). One important consideration when predicting the effects of polyploidization on genome evolution may be its association with mating system. Highly outcrossing species are expected to experience high rates of transposable element activity (Wright and Schoen, 1999; Morgan, 2001; Charlesworth and Wright, 2001). Allopolyploidization events associated with the retention of high outcrossing rates could therefore represent a ‘perfect storm’, whereby TE activity remains high while genome redundancy enables rapid proliferation. On the other hand, polyploidy is often associated with elevated rates of self-fertilization compared with diploid relatives, and both asexual reproduction and high rates of selfing are common in polyploid lineages (reviewed in e.g. Mable, 2004; Husband et al., 2008; Robertson et al., 2011; Ramsey and Ramsey, 2014). Highly selfing lineages such as C. bursa-pastoris may therefore experience a reduction in TE copy number and in the spread of new, active transposable elements through populations, because transmission via outcrossing is lost.
It is notable that some of the most well-documented ancient TE expansion events, including those in maize (Schnable et al., 2009; Baucom et al., 2009; Diez et al., 2014) and the Brassica genus (Zhang and Wessler, 2004), are associated with ancient allopolyploidization events involving outcrossing lineages. Whether this association is coincidental or causal will require in-depth comparative analyses of the joint and unique effects of polyploidy and mating system on genome size and TE proliferation. As we gain increasingly detailed insight into the time since the last whole genome duplication event in many lineages, investigating how this timing interacts with mating system to shape genome evolution is becoming increasingly feasible.
Acknowledgements
J.A.Å. was supported by a Junior Fellowship from Massey College, S.I.W. by a Natural Sciences and Engineering Research Council (NSERC) Discovery Grant, and H.R.H. by the National Natural Science Foundation of China (grant number 31370005) and a fellowship from the China Scholarship Council.