Abstract
Transposable elements (TEs) play an essential role in shaping eukaryotic genomes and in organism diversification. It has been hypothesized that bursts of TE activity may correspond to punctuated events of speciation (CArrier SubPopulation, Epi-Transposon and TE-Thrust hypotheses); highly differentiated taxa are therefore expected to bear highly active TEs in their genomes. We created two new parameters to measure the adaptive radiation of taxa and the magnitude of TE activity: the Relative Rate of Speciation (RRS) and the Density of Insertion (DI). Furthermore, we defined as “hot” and “cold” those genomes with high and low DI, respectively. The correlation between RRS and DI, which we called the “Cold Genome” hypothesis, was tested in mammalian families and superorders. Since the ages of TEs can be approximated on a large scale by calculating their distance from the respective consensus sequences, we divided TEs into classes in order to study the evolution of genomes at different time scales. We considered “recent” those TEs with < 1% divergence, and “less recent” those with < 5% divergence. Comparing TE activity in 16 pairs of species belonging to different mammalian families, and the average TE activity of sampled species from the four superorders of Placentalia, we showed that taxa with positive RRS correlate with “hot genomes”, whereas taxa with negative RRS correlate with “cold genomes”. Specifically, the density of recent insertions corresponds to recent macroevolutionary events, while the density of less recent insertions coincides with older events. These results are fully coherent with our “Cold Genome” hypothesis. In addition, our study supports, in both phases of radiation and stasis, the “Punctuated Equilibria” theory in mammals.
Introduction
1. Gradualism and Punctuated Equilibria
The debate between phyletic gradualism and punctuated equilibria (PE) in evolutionary biology is still intense1–3. Phyletic gradualism was embraced by the Modern Synthesis as the evolutionary dynamics of speciation, implying a gradual accumulation of mutations over time until genetic incompatibilities interrupt gene flow. The theory of punctuated equilibria proposes, instead, that “[…] evolution is concentrated in very rapid events of speciation (geologically instantaneous, even if tolerably continuous in ecological time). Most species, during their geological history, either do not change in any appreciable way, or else they fluctuate mildly in morphology, with no apparent direction”4. Evidence in favor of punctuated equilibria comes from diverse fields such as paleontology4–6, phylogenetics7,8 and experimental evolution9. At the same time, gradualism has gained support from studies of various living models11,12.
2. The evolutionary power of TEs
Transposable Elements (TEs) are far from being junk DNA13. In recent years they have been repeatedly linked to essential cellular activities such as telomere repair13–15, rewiring of transcriptional networks16,17, regulation of gene expression18,19, ectopic recombination and chromosomal rearrangements20. TEs are key contributors to evolution and play, or have played, a fundamental role in biological processes of utmost importance13,21–27, such as the emergence of the V(D)J system of acquired immunity22,28,29, placenta development30, embryogenesis31,32 and neurogenesis33–36. Mobile elements make genomes fluid and dynamic37, and organisms evolvable38,39.
Given their huge impact on shaping genomes, Transposable Elements are thought to contribute to the formation of reproductive barriers, facilitating speciation32,40–43. Some authors have proposed a correlation between high TE activity and organismal differentiation44–46. Moreover, the environment and its influence on the epigenetic structure of the cell seem to modulate the mobilization of TEs. The disruption of the epigenome potentially leads to bursts of activity and to a rapid accumulation of the genomic variability necessary for phenotypic innovation and speciation (Epi-Transposon and TE-Thrust hypotheses)44,45. Furthermore, the diversification of TE families is likely to coincide with events of genetic drift (CArrier SubPopulation hypothesis)46. Interestingly, these observations44–46 can explain both gradualistic and punctuated evolution on the basis of the TE content of genomes. The debate between gradualism and punctuated equilibria is still open, but we believe that evolutionary studies of TEs can shed some light on it.
3. Hot and cold genomes
Organisms owe their ability to diversify to their genomic plasticity, and the activity of TEs can substantially contribute to it20,21. Hence, we expect a positive relationship between TE activity and extant biodiversity. For example, within Mammals, the order Monotremata is the most ancient and the poorest in living species (Figure 1); accordingly, the platypus, which belongs to this group, has a genome that harbors the lowest number of recently mobilized elements46. Is it possible that taxa with low rates of speciation are associated with genomes with inactive TEs? Starting from the observation of specific cases, we widened the perspective to a general evolutionary explanation that we called the “Cold Genome” hypothesis. According to this hypothesis, genomes with highly active TEs (“hot genomes”) belong to taxa with high rates of speciation, whereas genomes with inactive TEs (“cold genomes”) belong to taxa with low rates of speciation (Figure 2A). We investigated TE activity using the Density of Insertion (DI) of elements and (as previously proposed in the literature44) the number of TE families at different divergences from consensus. Furthermore, we introduced the concept of Relative Rate of Speciation (RRS) in order to establish the magnitude of speciation rates within a given taxon (Figure 2B). Viable elements act as an evolutionary driving force (leading to punctuated events/bursts of insertions/“hot” genomes), but cells have a plethora of molecular mechanisms that modulate the expression of TEs. These mechanisms seem to intervene in temporally discrete periods to regulate the activity of TEs45 (for example, young LINE-1 elements are repressed via methylation, while ancient TEs are repressed by the KRAB/KAP1 system47), potentially leading to a paucity of innovative elements and a macroevolutionary genomic stasis (“cold” genomes).
In this paper, we investigated the mechanisms of speciation in Mammals, showing that the gradualistic model does not seem able to fully describe the evolution of the extant biodiversity, whereas our new evolutionary hypothesis, which includes PE theory and the genomic impact of transposable elements, might better explain evolutionary dynamics.
Results
4. Clade age is not related to species richness in Mammalia
From a neo-Darwinian point of view, species continuously accumulate mutations that eventually lead to differentiation and speciation; therefore, older taxonomical groups should have had more time to accumulate biodiversity, leading to an overabundance of species in comparison to younger groups12. We tested the phyletic gradualism model in Mammalia by investigating the relationship between clade age and species richness of the 152 mammalian families, using both a linear regression model (lm function in R, stats package47) and a non-parametric correlation test (cor.test function in R47). We retrieved the number of species for all the 152 mammalian families listed in the latest mammalian phylogeny49 from the Catalogue of Life50. The crown age of the mammalian families was estimated from their timed phylogenetic tree49. The calculated regression coefficient is slightly negative (-0.1104) and the R2 is very low (0.00041, P-value 0.8039); hence, there is no statistically significant linear association between the two variables (Figure S1). The rank correlation between clade age and species richness is not significant either (rho 0.01815343, P-value 0.8243). The gradualistic model therefore does not seem to describe mammalian evolution accurately, as the differentiation pattern of mammals appears to be more complex.
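The two tests described above can be illustrated with a minimal Python sketch that uses only the standard library. It is not the R code used in this study (lm and cor.test); the function names and the five data points are hypothetical placeholders, and P-values are not computed.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))


def linear_fit(x, y):
    """Least-squares fit of y on x; returns (slope, intercept, r_squared)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx, pearson(x, y) ** 2


# Hypothetical placeholder data: crown ages and species counts of five families.
clade_age = [10.0, 20.0, 30.0, 40.0, 50.0]
species_richness = [5, 60, 2, 30, 8]
print(spearman_rho(clade_age, species_richness))
print(linear_fit(clade_age, species_richness))
```

A slope and a rho both close to zero, as reported for the 152 families, would indicate that clade age alone says little about species richness.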
5. TE activity and speciation correlate in the whole Mammalia class
TEs are powerful facilitators of evolution20–24,44,45 and they are tightly associated with the evolutionary history of mammals16,30,31. The data on TE families and TE insertions in the genomes of the species considered were analyzed by Jurka and colleagues46. For each TE family, a consensus sequence was produced51, representing the reference sequence for that element. High divergence of a sequence from its consensus approximates, on a large scale, the long time it has had to accumulate mutations, whereas lower divergence represents a more recent and less pronounced molecular differentiation. The mobile elements diverging less than 1% from their consensus sequences, and their respective TE families, were pooled in the “1% dataset”. This dataset represents the most recently mobilized elements, whereas the elements that diverge less than 5% were included in the “5% dataset”, i.e. the list of both recent and more ancient insertions. To better evaluate the activity of mobile elements in the considered genomes/species, we propose a new parameter, herein called Density of Insertion (DI). DI is calculated according to the formula: DI = NI / GS, where NI is the total Number of Insertions (of elements contained in the 1% or 5% datasets) and GS is the Genome Size in gigabases. Accordingly, DI is measured in insertions per gigabase (ins/Gb). We call “1% DI” the parameter calculated using the “1% dataset” and “5% DI” the one calculated using the “5% dataset”. All the parameters measuring TE activity were averaged between species belonging to the same taxonomical family, which allowed analyses to be performed on a larger scale.
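As a minimal sketch of the DI calculation and of the per-family averaging described above, the following Python snippet may help. The function names, family labels and insertion counts are hypothetical and are not taken from the datasets used in this study.

```python
from statistics import mean


def density_of_insertion(n_insertions, genome_size_gb):
    """DI = NI / GS, expressed in insertions per gigabase (ins/Gb)."""
    if genome_size_gb <= 0:
        raise ValueError("genome size must be positive")
    return n_insertions / genome_size_gb


def family_mean_di(species_records):
    """Average DI over the species of each family.

    species_records: iterable of (family, n_insertions, genome_size_gb).
    """
    by_family = {}
    for family, n_insertions, genome_size_gb in species_records:
        di = density_of_insertion(n_insertions, genome_size_gb)
        by_family.setdefault(family, []).append(di)
    return {family: mean(values) for family, values in by_family.items()}


# Hypothetical placeholder records: (family, NI in the chosen dataset, GS in Gb).
records = [
    ("FamilyA", 500, 2.5),
    ("FamilyA", 900, 3.0),
    ("FamilyB", 30, 3.0),
]
print(family_mean_di(records))
```

The same function serves for both the “1% DI” and the “5% DI”: only the insertion count passed as NI changes.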
In order to test whether and how TE activity reflects the mammalian speciation pattern, we calculated the Rate of Speciation (RS) with the formula: RS = NS / CA, where NS is the total number of species of the analyzed taxonomical family and CA is the Crown Age of the same taxon49.
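The RS formula, and the ordering of families by RS used when plotting TE activity, can be sketched in Python as follows. The function names and the three families are hypothetical, and expressing the crown age in millions of years is an assumption of this sketch.

```python
def rate_of_speciation(n_species, crown_age):
    """RS = NS / CA (species per unit of crown age)."""
    if crown_age <= 0:
        raise ValueError("crown age must be positive")
    return n_species / crown_age


def order_by_rs(families):
    """Return family names sorted by increasing RS.

    families: dict mapping name -> (n_species, crown_age).
    """
    return sorted(families, key=lambda name: rate_of_speciation(*families[name]))


# Hypothetical placeholder families: (NS, CA in millions of years, assumed unit).
families = {"A": (40, 20.0), "B": (5, 50.0), "C": (90, 30.0)}
print(order_by_rs(families))
```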
In Figure 3, we show the TE activity measured for all the families. These families are arranged in order of RS. There is an increasing trend for all the parameters from left (low Rate of Speciation) to right (high Rate of Speciation). This possible association was tested through non-parametric statistical correlation (using the R47 function cor.test with the Spearman method) and by generating a linear regression model (lm function in R47, stats package) (Table S2). All the parameters show significant correlation with the Rate of Speciation in the whole Mammalia class (Table S2). In order of significance, the parameters with the best descriptive power are: 5% DI (P-value < 0.005), 1% dataset families (P-value < 0.005), 5% dataset families (P-value < 0.05) and 1% DI (P-value < 0.05). Linear regression models for all the parameters as a function of the Rate of Speciation were estimated (Table S3). All models show positive slopes and have significant P-values. These results indicate that TE activity is tightly associated with speciation events.
6. Families with higher Relative Rate of Speciation show higher TE activity
Mammalian phylogeny seems to support an evolutionary framework in which short bursts of diversification are alternated with longer periods of relative stasis, similarly to what is stated in the punctuated equilibria theory4–6.
In order to validate this observation, we tested two key factors: the activity of Transposable Elements (using the same parameters described above), and the newly introduced Relative Rate of Speciation (RRS). In fact, RS does not factor in when speciation occurred within a taxon, and so intrinsically includes a bias that can skew the results of evolutionary research. The RRS, instead, is a binary parameter, also based on the age of the clade of interest and its species richness, that, given a pair of taxa, identifies which one has the higher speciation activity. Briefly, the taxon that shows a higher number of species and, at the same time, is younger has a positive (+) Relative Rate of Speciation and a putative “hot” genome; consequently, the other taxon has a negative (-) RRS and a putative “cold” genome. All comparisons were performed between species belonging to the same taxonomical order. In this way, the compared species share a common history until the origin of their respective orders.
The RRS attribution can be represented by the logical formulae:
RRS1(+) ⇐ NS1 > NS2 ∧ CA1 < CA2
RRS1(-) ⇐ NS1 < NS2 ∧ CA1 > CA2
If neither of these assertions is true, the RRS is not applicable (NA), and the pair of taxa cannot be compared because it is impossible to distinguish the age of their intra-taxonomical differentiation. This enables the exclusion of non-informative (or misleading) information while still retaining the ability to calculate a reliable proxy of evolvability with only two basic parameters.
As an explanatory example, we describe in detail some comparisons in the order Primates (Figure 2B). Galago, which includes 19 extant species, is a primate of the family Galagidae. This family is more ancient than Cercopithecidae, whereas the number of species in Cercopithecidae is higher than in Galagidae. In this particular case, Galagidae has RRS(-) compared to Cercopithecidae (Figure 2AB). When comparing the Galagidae and Tarsiidae families, the situation is the opposite: Tarsiidae is more ancient and poorer in species than Galagidae, so the latter has RRS(+) (Figure 2B). It is not always possible to determine the RRS for all the potential pairs of families/species. For instance, we could not compare the families Hominidae and Cercopithecidae. In particular, Hominidae seems to have a lower rate of speciation, with only 7 living species compared with the 159 species of Cercopithecidae; at the same time, the family Cercopithecidae is 8 My older, which means that this taxon could have accumulated more species over a longer time.
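The RRS attribution rule and the non-comparable Hominidae/Cercopithecidae case can be sketched in Python as follows. The function name is hypothetical; the species counts (7 and 159) and the 8 My age difference come from the text, whereas the absolute crown ages are placeholders chosen only to respect that difference.

```python
def relative_rate_of_speciation(ns1, ca1, ns2, ca2):
    """Return '+', '-' or None (NA) for taxon 1 relative to taxon 2.

    ns = number of species, ca = crown age (same unit for both taxa).
    """
    if ns1 > ns2 and ca1 < ca2:
        return "+"  # richer and younger: putative "hot" genome
    if ns1 < ns2 and ca1 > ca2:
        return "-"  # poorer and older: putative "cold" genome
    return None  # neither condition holds: the pair is not comparable


# Placeholder crown ages (assumed values); only their 8 My difference is from the text.
hominidae = (7, 20.0)
cercopithecidae = (159, 28.0)
print(relative_rate_of_speciation(*hominidae, *cercopithecidae))
```

Because the rule is antisymmetric, swapping the two taxa of a comparable pair flips the sign, while a non-comparable pair stays NA in both directions.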
In total, we tested our hypothesis on 16 families, represented by 19 species, that encompass six mammalian orders (Table 1).
The four parameters of TE activity, used as proxies for evolvability and differentiability as described above, were measured. The levels of the “1% DI” dataset in all the observed pairs are shown in Figure 4A. The correspondence between putative “hot”/“cold” genomes, based on the RRS, and the “hotness” and “coldness” of the same genomes predicted by the four TE activity parameters was tested with a paired Wilcoxon Signed Rank Test (using the wilcox.test function in R): P-values are significant, with the exception of the “5% DI” dataset (Table 3). In order of significance (and descriptive efficiency), the parameters that best explain the relationship between “hot”/“cold” genomes and TE content are (P-values in parentheses): “1% DI” dataset (0.0013), “5% TE” Families (0.0063), “1% TE” Families (0.0075) and “5% DI” dataset (0.0739). The descriptive efficiency of the chosen parameters is also presented in Figure 4B. When there is no association between the most descriptive parameter (“1% DI”) and RRS, the other parameters show no association either. Within the “1% DI” dataset, 14 out of 16 pairs follow the expected trend of association between DI values and RRS. Among those, 11 pairs show a difference in DI of at least one order of magnitude, up to almost 180-fold higher in the pair Macaca mulatta - Tarsius syrichta. Despite the two exceptions, Microcebus murinus - Callithrix jacchus and Otolemur garnettii - Callithrix jacchus, the statistical support clearly suggests that, in Mammals, TEs are associated with adaptive radiation.
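To illustrate how a paired comparison of this kind works, the following Python sketch implements a two-sided, exact Wilcoxon signed-rank test for small samples without ties. It is a stand-in for, not a reproduction of, the wilcox.test call used in this study; the function name and the DI values are hypothetical.

```python
def wilcoxon_signed_rank_exact(x, y):
    """Paired, two-sided Wilcoxon signed-rank test with an exact P-value.

    Zero differences are dropped; tied absolute differences raise ValueError,
    because the exact null distribution below assumes integer ranks 1..n.
    Returns (w_plus, p_value).
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    magnitudes = [abs(d) for d in diffs]
    if len(set(magnitudes)) != n:
        raise ValueError("tied absolute differences are not handled in this sketch")

    # Rank the absolute differences and sum the ranks of the positive ones.
    order = sorted(range(n), key=lambda i: magnitudes[i])
    w_plus = sum(rank for rank, i in enumerate(order, start=1) if diffs[i] > 0)

    # Null distribution: number of subsets of {1..n} that reach each rank sum.
    max_sum = n * (n + 1) // 2
    counts = [0] * (max_sum + 1)
    counts[0] = 1
    for r in range(1, n + 1):
        for s in range(max_sum, r - 1, -1):
            counts[s] += counts[s - r]

    total = 2 ** n
    lower = sum(counts[: w_plus + 1]) / total  # P(W+ <= observed)
    upper = sum(counts[w_plus:]) / total       # P(W+ >= observed)
    return w_plus, min(1.0, 2 * min(lower, upper))


# Hypothetical placeholder DI values (ins/Gb) for six pairs of taxa.
di_rrs_positive = [1446.0, 820.0, 51.0, 310.0, 95.0, 12.0]
di_rrs_negative = [194.0, 40.0, 6.0, 25.0, 130.0, 3.0]
print(wilcoxon_signed_rank_exact(di_rrs_positive, di_rrs_negative))
```

The first list holds the taxon with RRS(+) in each pair and the second its RRS(-) partner, so a large sum of positive ranks indicates that “hot” genomes tend to have the higher DI.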
It is worth noting that Canis lupus is the only species that, despite its lower total number of mobile element insertions (lower DI: 194 ins/Gb), has more specific TE families (4) than its paired species Felis catus (higher DI: 1446 ins/Gb, 3 specific TE families). The other case in which the “1% DI” and the number of 1% dataset TE families are discordant is the pair Tarsius syrichta - Otolemur garnettii. Otolemur garnettii has in fact more new integrants (DI: 51 ins/Gb) than Tarsius syrichta (DI: 6 ins/Gb) but the same number of TE families (2 each). In both these cases, the greater diversity in TE families (or absence thereof) does not reflect the relative hotness of the species' genome, which is better described by the impact of the elements on the genome. From the examples presented and the statistical tests, DI appears to be a more sensitive parameter than the abundance of TE families, since the diversity of active families is not always a proxy for the activity of mobile elements in the genome.
The parameters measuring TE activity proved accurate even at an intra-family level (as shown in the case of Muridae). Indeed, keeping the taxa with negative RRS constant and interchanging different genera of the same family as a comparison point, we observed variations in the parameter values, but a statistically similar relevance of their association with the RRS (Figure 4B).
These results indicate that the activity of TEs does not vary randomly within the mammalian phylogeny and that the RRS shows a strong association with this activity. Therefore, the RRS allows prediction of the relative level of TE mobility between two taxa, which in turn is highly related to their ability to differentiate and speciate.
7. Ancient TE bursts correlate with the ancient history of Placentalia
Having established that DI is the most descriptive parameter for the intra-order activity of TEs, we tested our hypothesis at a higher taxonomic level.
The combined groups consist of 22 species belonging to “hot” superorders and 5 belonging to “cold” superorders. For both the 1% and 5% DI averaged datasets, Xenarthra (X) and Afrotheria (A) have a mean DI more than threefold lower than Laurasiatheria (L) and Euarchontoglires (E). Specifically for the 5% dataset, the pairs A - E and X - E show a ratio of about 1:5, while for the pairs A - L and X - L the ratio is about 1:8. Minor differences can be observed for the average density of insertions within the “1% DI” dataset. In increasing order of ratio we have X - E (1:3), X - L (1:4), A - E (1:7) and A - L (1:10). The comparison between (X + A) and (E + L) at 1% of divergence from the consensus is non-significant, with a P-value of 0.0973.
Bar plots in Figure 4C show that, in both comparisons, the standard errors of the average DI do not overlap. The Wilcoxon test, performed for the average “5% DI” parameter, showed a significant difference between the two groups (P-value 0.0394). Although the same tendency is clearly visible for the average “1% DI” parameter, the Wilcoxon test gives marginally non-significant results: the putative colder taxa are well differentiated in absolute values, but the standard errors of the pairs X - A and X - E overlap one another slightly. It must be noted that, given the biological paucity of species in the groups Xenarthra and Afrotheria, their 1% and 5% datasets are very small and heterogeneous (i.e. they provide low statistical power). For this reason, it is hard to compare them statistically with the larger datasets of Euarchontoglires and Laurasiatheria.
The discrepancy between the results of the two datasets may be interpreted from an evolutionary point of view. The Density of Insertion at 5% is the least accurate parameter for studying recent events (Figure 4B and Table 3), but it works efficiently in the study of older macroevolutionary events such as the origin of the four superorders of Placentalia (Figure 4C). The divergence of the elements from their consensus tends, on average, to reflect their age, and thus the different datasets describe different periods of time and the related taxonation events.
Conclusions
Transposable Elements are a major source of genomic variability and have more than once impacted evolution through the rise of key molecular processes13–33,36. Even though they seem to be an important factor in adaptive radiations, the genomic signal they left in the extant biodiversity is exposed, like many biological phenomena, to alteration and noise over time. On evolutionary time scales, their activity is modulated, producing alternations of insertional bursts and silencing, which is consistent with the speciation patterns explained by the punctuated equilibria theory. Phenotypic differentiation and adaptive radiation would thus macroscopically reflect the molecular dynamics of TE activity. Furthermore, TEs seem to positively influence speciation, as shown by the results of our study: a high differentiation rate is strongly associated with an increased molecular activity of the mobilome. This is apparent for both general and relative rates of speciation (RS and RRS) and is reflected by all the parameters used as proxies for TE activity and the states of “hot” or “cold” genomes, with Density of Insertion being the most descriptive.
When silencing mechanisms progressively inhibit TE activity (state of “cold” genome), the lack of contribution of TEs to molecular differentiation (negative RRS) seems to lead to the relatively static phase postulated by punctuated equilibria.
The “Cold Genome” hypothesis thus supports the punctuated equilibria theory in both the punctuated differentiation bursts and stasis periods.
Furthermore, we showed that TE insertions describe the emergence of clades according to their estimated age: “less recent” TE bursts are a proxy for older taxonation events (mammalian superorders), “recent” ones for late events (mammalian families).
Whether TE mobilization and the accumulation of new insertions are a cause or an effect of adaptive radiation and speciation remains open for debate, although the results presented (and the intrinsic characteristics of the mobilome's activity) seem to suggest that the former is the more likely hypothesis.