Abstract
The persistence of DNA over archaeological and paleontological timescales in diverse environments has led to a revolutionary body of paleogenomic research, yet the dynamics of DNA degradation are still poorly understood. We analyzed 185 paleogenomic datasets and compared DNA survival with environmental variables and sample ages. We find that cytosine deamination follows a conventional thermal age model, but we find no correlation between DNA fragmentation and sample age over the timespans analyzed, even when controlling for environmental variables. We propose a model for ancient DNA decay wherein fragmentation rapidly reaches a threshold, then slows. The observed loss of DNA over time is likely due to a bulk diffusion process, highlighting the importance of tissues and environments creating effectively closed systems for DNA preservation.
Introduction
The genomic era of massively parallel DNA sequencing has driven a revolutionary body of research using ancient DNA-based genomics (1, 2). Paleogenomics has led to the re-writing of recent hominin evolutionary history (3), nuanced understandings of historical human movements and interactions around the globe (4, 5), breakthroughs in Quaternary paleontology (6–8), evolutionary ecology, the biology of extinct species (9), impacts of humans on ancient ecosystems and biodiversity (10), and the evolution and movements of domestic plants and animals (11–14). The successful probing of ancient epigenomes, microbiomes, and metagenomes further illustrates the flexibility and information value of ancient DNA-based research in the genomic age (15–17). In sum, time-series genomic datasets have proven extremely valuable in diverse research avenues.
In addition to the scale and sensitivity of analysis afforded by genomic methods in ancient DNA research, genomic datasets allow for a revised understanding of the patterns and expectations of DNA survival over millennia. This is beneficial in two key ways: i) Criteria of ancient DNA authenticity warrant updating for the genomic era, and formalized expectations of DNA degradation are necessary for this process; and ii) Better predictive models of DNA degradation may help researchers target specimens likely to yield high information value where destructive analysis is unavoidable. Generally, ancient DNA is expected to be highly fragmented (18) and to carry an abundance of characteristic misincorporations—deaminated cytosine residues appearing as C-to-T transitions in single-stranded fragment overhangs (19). Further, DNA fragmentation is biased by biomolecular context. For example, a short-range (∼10bp) periodicity observed in the distribution of fragment lengths is attributed to the period of a complete turn of the DNA double-helix around a histone (20), which is thought to offer some protection against breakage at histone-adjacent sites. Finally, base compositional biases have been regularly observed in DNA preservation, especially enrichment of GC-content in ancient DNA (21).
The relationships between these characteristic patterns of DNA degradation and the preservational environment and age of tissues are poorly understood. We carried out a meta-analysis of 185 ancient genomic datasets—dating from the Middle Pleistocene to the nineteenth century from 21 published studies (Figure 1; data sources cited fully in Supplemental Methods)—to test for relationships between sample age, environmental variables, and DNA diagenesis.
We used mapDamage 2.0 (23) to quantify deamination, and we developed tests for assessing fragmentation, histone periodicity, and energetic biases in ancient genomic data (described in Supplemental Methods). We analyzed these damage statistics in relation to sample age, annual mean temperature, temperature fluctuation, and precipitation—treated as a proxy for humidity—using simple multivariate linear models (Supplemental Methods). Ultimately, we aimed to establish the key determinants of DNA survival, and the specific patterns of DNA breakdown expected under variable conditions.
Results and Discussion
We found that cytosine deamination is strongly influenced by both sample age and site mean temperature (multiple r² = 0.264; age p = 1.9 × 10⁻⁹; temperature p = 1.52 × 10⁻⁵; model p = 2.54 × 10⁻¹⁰; Figure 2). Previous studies have identified age as the key predictor of deamination (24), and our finding that temperature also contributes is in line with predictions of a time-dependent hydrolytic process where activation energy is achieved more frequently at higher ambient temperatures. A rate of deamination can be calculated for any sample with a known age and partial conversion of exposed cytosines (Supplemental Methods; Figure 3). The resulting rates vary widely and show a strong correlation with temperature (r² = 0.279; p = 1.23 × 10⁻¹²). In sum, deamination is a time-dependent process heavily modulated by temperature. When analyzing DNA fragmentation, however, we found that precipitation and thermal fluctuation were strong predictors (multiple r² = 0.202; precipitation p = 0.0025; temperature fluctuation p = 6.18 × 10⁻⁸) but that age was not significantly correlated with the degree of fragmentation (p = 0.77), even when controlling for environmental conditions.
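As a rough illustration of the per-sample deamination rate mentioned above, the sketch below assumes simple first-order kinetics, in which the fraction of exposed cytosines converted after t years is 1 − exp(−kt). This kinetic form is an assumption made for illustration; the procedure actually used is described in the Supplemental Methods and may differ. The function name and the example values are hypothetical.

```python
import math

def deamination_rate(fraction_deaminated: float, age_years: float) -> float:
    """Per-year rate k solving fraction = 1 - exp(-k * age).

    Assumes first-order kinetics (an illustrative assumption, not
    necessarily the calculation in the Supplemental Methods).
    """
    if age_years <= 0:
        raise ValueError("age_years must be positive")
    if not 0.0 <= fraction_deaminated < 1.0:
        # Complete conversion (fraction == 1) has no finite rate, which is
        # why only partially converted samples are usable here.
        raise ValueError("fraction_deaminated must be in [0, 1)")
    return -math.log(1.0 - fraction_deaminated) / age_years

# Hypothetical sample: 30% of exposed cytosines converted after 5,000 years.
k = deamination_rate(0.30, 5000.0)
```

Under this assumption a fully saturated sample yields no finite rate, which is consistent with restricting the calculation to samples showing partial conversion.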
At present, the ancient DNA literature lacks clear consensus concerning some of the fundamental predictors of DNA fragmentation: One recent study identified a strong age dependency in DNA recovery through qPCR analysis of a regionally controlled time series of bone samples (18). This result was interpreted as evidence that DNA degradation in ancient bone is mainly driven by thermal age-dependent hydrolytic depurination driving rate-constant fragmentation over time. However, a separate analysis (24) found no significant link between sample age and the degree of fragmentation. Consistent with this latter finding, early ancient DNA research pointed to very rapid initial DNA decay followed by subsequent stabilization (25), rather than fragmentation as a rate-constant random decay process. Additionally, controlled experiments using qPCR with recently deceased tissues demonstrate a precipitous immediate decline of endogenous DNA content and/or quality, followed by stabilization hypothesized to be linked to the mineral environment of bone (26). This model likewise contradicts the idea that DNA decay can be thought of strictly in terms of exponential breakdown under a decay constant. In total, evidence has been presented for both a rate-constant decay model and a more age-independent scenario. Here, our meta-analysis points to the statistical decoupling of age and fragmentation. We aimed to validate this finding with three strategies:
First, we recognize that numerous sources of variance cannot be controlled in our meta-analysis across several studies (including sample excavation and storage conditions, wet lab and computational methods, and species and tissue types) and that these sources of variance have the potential to obscure subtle relationships. If major sources of inter-study confounding variance in DNA fragmentation were present, the likely result would be the dampening of any statistical relationship between natural variables and fragmentation, as the in situ signal for fragmentation would be lost. If age were a significant predictor of DNA decay along with thermal fluctuation and humidity, it is difficult to imagine that only the age relationship would be lost due to post-excavation handling and inter-study variation. Therefore, we suggest that confounding variance is not a parsimonious explanation for the lack of a clear age-fragmentation relationship in the presence of a robust environmental association. However, to test this possibility more directly, we restricted our analysis from 185 datasets across 21 studies to 97 Bronze Age human genomes generated from a single study (27). We thereby control for species, tissue type, and biases in sample preparation, and we consider a narrower timeframe and more constrained set of preservational conditions, eliminating several potential sources of confounding variance. Under the same linear model as above (Supplemental Methods), we find that, exactly as in the broader dataset, thermal fluctuation and precipitation are strong predictors of fragmentation (respectively, p = 0.014 and p = 4.2 × 10⁻⁴; multiple r² = 0.25), but age is still not a significant predictor of overall fragmentation (p = 0.420).
Second, we tested the fundamental assumption that from a single archaeological or paleontological site, DNA from older samples is expected to be more fragmented than from younger ones. While we initially analyzed data from 94 different sites, the meta-dataset includes 114 pairs of samples from the same site separated by at least 100 years. Thus for these 114 pairs where we can eliminate inter-site variation, the older sample is predicted to be the more fragmented sample a significant majority of the time under the fundamental assumption that fragmentation increases with age in a single environment. Given 114 pairs of samples, only 55 (0.48) satisfy this assumption (‘successes’). The null hypothesis of 47–67 successes (p = 0.05 calculated using the beta distribution) cannot be rejected, and indeed fewer than half of cases satisfy the basic assumption. By increasing the minimum age difference to 1000 years, we retain 55 valid pairwise comparisons and still observe no relationship between age and fragmentation, with only 27 (0.49) satisfying the basic assumption (null hypothesis at p = 0.05: 23–32 successes). We validated this approach by replicating the procedure with deamination, a known age-linked variable (24, and above). With deamination, we reject the null hypothesis and find a significant age effect as expected (131 comparisons possible, 80 successes (0.61); null hypothesis at p = 0.05: 55–76 successes).
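The pairwise comparison above amounts to a sign test: under the null hypothesis of no age effect, the older sample of each pair is the more fragmented one with probability 0.5. The sketch below is a minimal illustration that computes an exact two-sided binomial p-value for the reported counts. It is a swapped-in technique, not the beta-distribution calculation used for the acceptance ranges quoted in the text, so its numbers need not match those ranges exactly. The function name is hypothetical.

```python
from math import comb

def two_sided_binomial_p(successes: int, trials: int) -> float:
    """Exact two-sided p-value for a binomial count against p = 0.5."""
    # Distance of the observed count from the null expectation trials / 2.
    deviation = abs(successes - trials / 2)
    # Count every outcome at least that far from trials / 2; with p = 0.5
    # each outcome k has probability comb(trials, k) / 2**trials.
    extreme = sum(
        comb(trials, k)
        for k in range(trials + 1)
        if abs(k - trials / 2) >= deviation
    )
    return extreme / 2 ** trials

# Counts reported in the text: (successes, pairs).
p_fragmentation_100yr = two_sided_binomial_p(55, 114)
p_fragmentation_1000yr = two_sided_binomial_p(27, 55)
p_deamination = two_sided_binomial_p(80, 131)
```

With these counts, the two fragmentation comparisons stay above the 0.05 threshold while the deamination comparison falls below it, matching the qualitative conclusions drawn in the text.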
Finally, we routinely observe complete deamination of all exposed cytosine residues. This saturation of measurable deamination has been described in several samples previously (23), and is observed in 14 out of the 185 (7.6%) datasets analyzed here, spanning 2 kya to 500 kya (Figure 4). However, complete deamination in single-stranded overhangs is incongruent with a rate-constant fragmentation model: If fragmentation followed a simple rate-constant process that would yield a robust association between thermal age and fragmentation, new overhangs would continually be exposed with the expectation of intact cytosine, suppressing the proportion of deaminated residues and preempting complete deamination. Even when we simulate deamination rates tenfold faster than the most extreme of those estimated in our meta-analysis, deamination fails to converge to saturation under a rate-constant fragmentation model (Supplemental Methods). In total, observing complete deamination under a rate-constant fragmentation model would require that the deamination rate exceed the fragmentation rate to such a degree that new overhangs are rapidly saturated with deamination, that is, that all exposed cytosine residues are rapidly converted to uracil. Under such extreme deamination rates, however, it is implausible that deamination would show such a robust correlation with age across samples as observed here and elsewhere (24).
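The intuition behind this argument can be sketched with a toy calculation. It is not the simulation described in the Supplemental Methods, and every modeling choice here is an assumption made for illustration: exposed cytosines deaminate with first-order rate k, and overhangs are either all exposed at time zero (an early fragmentation threshold) or exposed at a constant rate throughout the sample's history (rate-constant fragmentation). Function names are hypothetical.

```python
import math

def early_exposure_fraction(k: float, t: float) -> float:
    """Deaminated fraction when all overhangs are exposed at time 0."""
    return 1.0 - math.exp(-k * t)

def continuous_exposure_fraction(k: float, t: float) -> float:
    """Deaminated fraction when overhangs are exposed at a constant rate.

    Averages 1 - exp(-k * s) over exposure durations s that are
    uniformly distributed on [0, t].
    """
    x = k * t
    if x == 0:
        return 0.0
    return 1.0 - (1.0 - math.exp(-x)) / x

# Dimensionless example: k * t = 5 (hypothetical values).
early = early_exposure_fraction(1.0, 5.0)
continuous = continuous_exposure_fraction(1.0, 5.0)
```

In this toy setting the continuously exposed population always lags behind the early-exposed one, because recently exposed overhangs dilute the deaminated fraction with intact cytosines.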
We find strong validation that age does not predict DNA fragmentation in our meta-dataset. However, we recognize that DNA breakdown by hydrolytic depurination is a well-characterized and immutable chemical mechanism by which DNA decays exponentially according to first-order kinetics, producing a measurable half-life signal of molecular depletion (28). The mismatch between this predicted behavior and our findings indicates that the preservation state of ancient DNA is determined by multiple processes, and cannot be attributed to a simple fragmentation rate as suggested in a rate-constant fragmentation model. Instead, we propose a multi-stage DNA fragmentation model: First, physical and biotic stressors cause rapid breakdown of nucleic acids shortly after organism death. While microbes and cellular processes (e.g. autolysis and nuclease activity) rapidly degrade a large fraction of endogenous DNA—depending on tissue type and depositional environment—fragmentation appears to reach an initial threshold and then stabilize somewhat in contexts where DNA has the potential for long-term preservation.
The strong association of humidity and thermal fluctuation with DNA fragmentation suggests that processes like the loss of bioapatite surface area caused by diagenetic recrystallization and physical shearing effects of hydraulic fluctuations in bone, for example, may play a role in the initial breakdown process. Further, DNA may reach a size in bony contexts (the majority of our re-analyzed datasets) where it can penetrate the protective internal porosity of bone and gain some additional protection from the mineral environment. The counterintuitive result that DNA is sometimes better preserved in cooked than uncooked medieval bone may offer support for this scenario (29, although see 30 for further analysis of cremated bone). In our analysis, 15 plant samples from herbaria (31) fit with the overall fragmentation model: comparing fragmentation linear model residuals reveals no significant difference between plant samples and non-plant samples (Welch’s t-test, p = 0.44). However, they make up a very small fraction of the variation here, and because of the possible role of the mineral makeup of bone in DNA preservation across samples (26), we suggest that re-analysis of plant data across a much greater age range will be important in understanding any possible differences in preservation between plant and animal tissues. Over a short timespan, age-dependency in fragmentation has been documented in plant tissues (32), but the current paucity of paleogenomic plant data precludes a comprehensive analysis spanning thousands of years. In total, our meta-analysis and model are necessarily focused on mammalian hard tissue (n=169 out of 185 datasets) given dataset availability. As more datasets are generated from diverse systems and tissue types, we expect further refinement of these general findings to reflect a more nuanced understanding of the specific drivers of DNA diagenesis and the factors underlying preservation.
For example, DNA is integrated into hair during programmed cell death and keratinization leading to some amount of immediate shearing which might affect downstream processes (33). Thus ancient DNA in hair might warrant a modified set of expectations for preservation relative to bony tissue given a certain background environment. Recent experimentation comparing tooth cementum and petrous bone DNA diagenesis reinforces the necessity of integrating sample type information in assessing DNA degradation in the future (30).
We also find that in addition to the humidity and thermal fluctuation pattern, the degree of DNA fragmentation correlates strongly with base compositional biases. Specifically, datasets dominated by short fragments are significantly depleted of weakly-bonded nucleotide motifs (p = 6.79 × 10⁻¹², r² = 0.253; Figure 2; Supplemental Methods), suggesting that DNA breakdown follows predictable patterns with regard to microenvironment and nucleic acid biochemistry. Relatedly, we detected a histone-associated fragmentation bias (20) in the majority of our samples (n=112; Supplemental Methods), and we find that annual mean temperature is associated with the intensity of this pattern (p = 1.2 × 10⁻⁵, r² = 0.16; Figure 2). Specifically, DNA breakdown in colder environments appears to more faithfully reflect cellular architecture and the in vivo genome context, whereas breakdown in warmer conditions is much less discriminant.
Previous research identified a strong age dependency in DNA recovery—assayed by quantitative PCR—in a controlled time-series of bone samples from a regional set of depositional sequences, and interpreted the result as evidence for an exponential decay process due to time-dependent DNA fragmentation (18). However, bulk diffusion of DNA—rather than rate-constant fragmentation—provides an equally parsimonious scenario for the observed qPCR signal. Specifically, the previous study estimated a 521-year half-life for a target fragment of 242bp in the tested environment (18). We estimate, however, that the same qPCR signal is consistent with bulk loss of 0.0013 of all remaining molecules per year as an alternative to rate-constant fragmentation (Supplemental Methods). As such, our results do not conflict with the previous experiment identifying a time-dependent decay behavior in relative copy number of a given fragment size. However, we propose that bulk DNA loss is congruent with both this qPCR signal and our meta-analysis, whereas exponential decay by fragmentation is not supported as the primary mechanism of DNA loss in our analysis. Therefore, we propose that much of the time-dependent nature of ancient DNA recovery may be due to bulk loss of DNA from tissue. Recent research focusing on the dense, non-vascularized petrous part of the temporal bone as a source of high endogenous DNA content (30, 34) demonstrates that targeting “semi-closed” systems with little opportunity for chemical exchange may be the best strategy to continue pushing the boundaries of DNA preservation by combating this diffusion process. This idea has also been robustly illustrated in studies dealing with DNA preserved in hair, which is thought to confer a protective micro-environment that impedes biological degradation, leaching, and possibly hydrolytic damage, and therefore often constitutes a good source of relatively high-quality endogenous DNA (33, 35).
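The correspondence between the 521-year half-life and a bulk loss of roughly 0.0013 of remaining molecules per year can be checked with a short calculation. The sketch below assumes first-order loss, N(t) = N0 · exp(−rate · t); the full derivation is given in the Supplemental Methods and may include details not reproduced here. Variable and function names are hypothetical.

```python
import math

HALF_LIFE_YEARS = 521.0  # half-life reported for the 242 bp target (18)

# First-order loss: N(t) = N0 * exp(-rate * t), so rate = ln(2) / half-life.
bulk_loss_rate = math.log(2) / HALF_LIFE_YEARS

def fraction_remaining(years: float, rate: float = bulk_loss_rate) -> float:
    """Fraction of the starting molecules still present after `years`."""
    return math.exp(-rate * years)

# Fraction of remaining molecules lost over a single year.
annual_loss = 1.0 - fraction_remaining(1.0)
```

Both the continuous rate and the one-year loss fraction round to 0.0013, in agreement with the value quoted in the text.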
We suggest that rate-constant fragmentation through hydrolytic depurination is seldom the limiting factor to long-term DNA preservation, but we offer some caveats: Fragmentation through depurination is a well-characterized process (28), and we do not propose that it is irrelevant for long-term DNA degradation. We suggest, rather, that the rate of this process is significantly slower than previously estimated in many ancient tissues (18), and the signal over the timespan re-analyzed here is overprinted by other factors in a multi-faceted breakdown process. Thus when estimating the value of ‘lambda’ for a dataset (the parameter describing fragment length distribution; Supplemental Methods), we are analyzing the outcome of multiple processes rather than inferring a simple decay rate. Importantly, any paleogenomic meta-analysis is fundamentally limited to those scenarios in which DNA actually survives over Quaternary timescales, and so hydrolytic fragmentation as previously described might be a central mechanism for the total postmortem depletion of DNA in many tissues and conditions. That is to say, we can only analyze DNA that has survived, which may represent an abnormal mode of diagenesis. Our model for ancient DNA decay therefore necessarily speaks only to the special case in which conditions exist for long-term DNA survival. The immutable depurination process likely still imposes practical limits on DNA recovery in deep time, and recovering Mesozoic DNA, for example, remains extremely unlikely. However, semi-closed chemical exchange systems like the petrous bone, though rare, offer excellent potential for the long-term retention of DNA in tissues, and extraordinary preservational micro-environments created by chemical interactions have proven valuable for deep-time protein preservation (36). Breaching the current Middle Pleistocene age boundary of genomics seems entirely plausible.
Data availability
Analyses were based solely on publicly available datasets. Summary data are available for re-analysis as Supplemental Dataset S1. A tar file containing complete metadata and results from analyses, custom scripts, and run logs has been uploaded as Supplemental Dataset S2.
Acknowledgments
Research was supported by NERC Independent Research Fellowship NE/L012030/1 (to LK). We thank Ludovic Orlando, Beth Shapiro, and Kes Schroer for comments on an early version of the manuscript.