Abstract
We present a complete ancient human genome and oral microbiome sequenced from a piece of resinous “chewing gum” recovered from a Stone Age site on the island of Lolland, Denmark, and directly dated to 5,8585,661 cal. BP (GrM13305;5,007±11). We sequenced the genome to an average depth-of coverage of 2.3× and find that the individual who chewed the resin was female and genetically more closely related to western hunter gatherers from mainland Europe, than hunter gatherers from central Scandinavia. We use imputed genotypes to predict physical characteristics and find that she had dark skin and hair, and blue eyes. Lastly, we also recovered microbial DNA that is characteristic of an oral microbiome and faunal reads that likely associate with diet. The results highlight the potential for this type of sample material as a new source of ancient human and microbial DNA.
Our ability to recover ancient genomes from archaeological bones and teeth has revolutionised our understanding of human prehistory (1). The recovery of ancient microbial DNA, meanwhile, has provided fascinating insights into human health in the past and the evolution of the human microbiome (2). However, such analyses are highly dependent on the preservation of suitable samples, which get increasingly scarce as we move back in time. Further, there are periodic gaps in the burial record that can be either explained by poor preservation or burial practices that did not leave any traces behind, e.g. cremations (3, 4). Considering the scarcity of human remains, we hypothesised that there might be other sources of ancient human DNA in the archaeological record, in particular ancient “chewing gum”. These inconspicuous lumps of birch tar have been recovered from several prehistoric sites in Scandinavia and elsewhere in Europe and while their use is still debated, they often show tooth imprints, suggesting that they were chewed (5).
Birch bark resin has been used as adhesive as far back as the Middle Pleistocene (6, 7). However, direct evidence of it being used as a masticant, first appears in the Early Holocene (5). The oldest pieces of chewed birch bark resin found in archaeological contexts, have been dated to the Preboreal of Northern Europe (c.11,700-9,000 BP), such as Huseby Klev in Sweden (8), and Barmose in Denmark (9). Several of these objects were subsequently identified as birch bark tar by gas chromatography-mass spectrometry (GC-MS) analysis, through detection of both triterpenoids betulin and lupeol, which when found together are unique to birch (Betula) (5). Both of these triterpenoids have medicinal properties, such as being anti-microbial, anti-viral, and anti-inflammatory, suggesting that the use of birch resin as masticant in prehistory may also have been medicinal (5).
We used next-generation sequencing to recover a complete ancient human genome from one such artefact (Fig. 1a) that was found at the Stone Age site of Syltholm in Denmark, and directly dated to 5,858-5,661 cal. BP (GrM-13305; 5,007±11) (Fig. 1b). The site is one of the largest Stone Age settlements excavated to date (10) and spans the transition from the Mesolithic to the Neolithic period in Denmark (Fig. S3). In addition to the human genome, we also recovered microbial reads from the gum that enabled us to reconstruct a 5,700 year-old oral microbiome. The human genome sheds light on the dynamics of European populations at the time of the Neolithic transition, while the microbial data highlight the effects of major dietary shifts associated with the introduction of agriculture on our ancestral oral microbiome. Additionally, we found reads that are likely associated with diet. Overall, the results of our study highlight the potential of birch tar mastics as a new and well preserved source of ancient DNA.
We generated over 380 million reads for this sample, over half of which mapped to the human reference genome (Table S2). We used the human reads to reconstruct the complete genome of the Syltholm individual, assembled to an effective depth-of-coverage of 2.3×. All the libraries displayed features characteristic for ancient DNA, including i) short average fragment lengths, ii) an increased occurrence of purines before strand breaks, and iii) an increased frequency of apparent cytosine (C) to thymine (T) substitutions at 5′-ends (Fig. S6). The amount of modern human contamination was estimated to be around 1-3% based on mtDNA (11). The sex of the individual could be determined to be female, based on the fraction of high quality reads (MAPQ ≥30) mapping to the sex chromosomes (12). In addition, we recovered a complete mitochondrial genome, sequenced to 91-fold coverage. The mtDNA haplogroup was determined to be K1e.
Genome-wide affinities
To investigate the population genetic affinities of the Syltholm individual, we merged the ancient genome with over 1,200 modern and ancient genomes (Dataset S1), and performed a principal component analysis (PCA), by projecting the ancient sample onto the reference panel. We find that the Syltholm genome clusters with Palaeolithic and Mesolithic hunter-gatherers from western Europe, including Loschbour (13) and La Braña (14) (Fig. 2a). To explore the ancestral composition of the Syltholm genome further, we performed a supervised ADMIXTURE (15) run, specifying western hunter-gatherers (WHG), eastern hunter-gatherers (EHG), and Neolithic farmers (Barcın) as possible source populations. The result shows that the Syltholm genome is entirely composed of western hunter-gatherer ancestry, without any significant trace of EHG or farmer ancestry (Fig. 2b). To formally test these relationships, we computed a series of f- and D-statistics and found no evidence for significant levels of EHG or farmer gene flow in the Syltholm genome (Fig. 2c, Table S4-5). When using the modeling approach implemented in qpAdm (16), we find that a model assuming 100% WHG ancestry cannot be rejected in favour of two-way admixture. Specifically, modelling the Syltholm genome as a mixture of EHG and WHG ancestries does not significantly improve the fit, and neither does the addition of Neolithic farmer ancestry (Table S6).
Pigmentation, lactase persistence and immune response
We used imputed genotypes for 41 SNPs included in the HIrisPlex-S system (17) to predict hair, eye and skin colour and found that the Syltholm female likely had blue eyes, brown hair and dark skin (Dataset S2). We also examined the allelic state of two SNPs linked with the primary haplotype associated with lactase persistence in modern humans and found that the she carried the ancestral allele for both, indicating that she was lactose intolerant (Dataset S3). In addition, we imputed alleles at 62 loci on 42 genes involved in the immune response in humans and found that she carried at least one derived allele in 21 (50%) of the genes involved (Dataset S3).
Microbiome characterisation
Metagenomic profiling of the non-human reads in our sample revealed a close affinity between the microbial composition of the gum and the oral microbiome profiles from the Human Microbiome Project (HMP) (Fig. 3a). The actual profile (Fig. 3b) closely matches the profiles of oral microbiomes in HMP with two exceptions: a higher count of Neisseriales and a lower occurrence of Veillonellales. To further characterise and validate the microbial taxa present in our sample, we used MGmapper (18). As expected, the majority of reads could be assigned to taxa associated with the oral microbiome, with eight taxa having an effective depth-of-coverage >3× (Table 1). Additionally, we recovered 539 reads that mapped to Human herpes virus 4 (Epstein-Barr virus). We validated these taxa by aligning the non-human reads to the respective reference genomes and inspecting post-mortem DNA damage patterns (Fig. S7) and checking for evenness of coverage (Fig. S8).
Diet
We also identified reads that could associated with taxa related to diet, including mallard (Anas platyrhynchos) and eel (Anguilla anguilla), as well as reads associated with the source of the resin: birch (Betula pendula) (Table 1). We validated the presence of these taxa, as described above, by checking for the presence of damage patterns (Fig. S7) and evenness of coverage (Fig. S8). We also specifically looked for reads mapping to other taxa, such as barley (Hordeum vulgare), emmer (Triticum dicoccum) and einkorn (Triticum monococcum), as well as common hazel (Corylus avellana), but either did not retain any reads after filtering for mapping quality (MAPQ ≥35) or were unable to confirm that the reads are ancient.
The results highlight the potential of birch resin mastics as a new source of ancient human and microbial DNA. The endogenous human DNA content was surprisingly high and the overall preservation of the DNA in the sample was comparable to well-preserved teeth or petrous bones (Table S3) (19–21). We believe that the remarkable preservation can be explained by the antiseptic, hydrophobic environment of the resin, which both inhibits microbial and chemical degradation of DNA. This opens up fascinating new possibilities to extend the spectrum of human ancient DNA studies, particularly for time periods or cultural contexts where human remains are scarce or non-existent. Furthermore, our ability to recover authentic microbial DNA provides information on the state and evolution of the human oral microbiome and the spread of specific oral pathogens, while reads that likely relate to foodstuffs provide fascinating insights into the diet and changing subsistence strategies of prehistoric people.
Population history
From a population history point of view, the 2.3× Syltholm genome (5,858-5,661 cal. BP) provides an important data point at a crucial juncture in European prehistory, the time of Mesolithic/Neolithic transition. In Denmark, this transition occurred around 6,000 years BP and is marked by rapid and dramatic changes in diet and subsistence strategies from a largely marine diet in the Mesolithic to a terrestrial-based diet in the Neolithic (22–24). Culturally, the change is marked by the transition from the Late Mesolithic Ertebølle culture (7,300-5,800 cal. BP) with its flaked stone tools and typical T-shaped antler axes (25, 26), to the early Neolithic Funnel Beaker culture (5,900-5,300 cal. BP) with its characteristic pottery, polished flint artefacts, and domesticated animals (27–29). While human remains from Denmark exist for this time period, none have been sequenced to date. In fact, there are still only a handful of published ancient genomes from Scandinavia as a whole (30–33).
Previous studies have shown that the late hunter-gatherer populations of Europe can be split into western hunter-gatherers (WHG), typified by individuals from present-day Luxembourg (13) and Spain (14), and eastern hunter-gatherers (EHG), generally represented by individuals from northern Russia and Estonia (16). Genetically, hunter-gatherers from central Scandinavia are known to be intermediate along this cline, mirroring their geographic location between the two extremes of this west–east axis (13, 31). According to Günther et al. (32), EHG ancestry reached central Scandinavia from the north, while the WHG ancestry entered from the south, following the retreat of the ice sheets around 12 to 11 ka years ago. Interestingly, the Syltholm genome is entirely composed of WHG ancestry, suggesting that EHG ancestry did not reach southern Denmark in prehistory.
Furthermore, it is striking that the Syltholm genome also lacks Neolithic farmer ancestry, despite its relatively late date. This is all the more interesting because there is evidence for domesticated animals and farming practices at the site from around 6,000 BP (10). It suggests that admixture with farming populations might not have been as pervasive at the time. Having said that, the mtDNA genome we recovered from the resin belongs to haplogroup K1e, which is surprising, as it is more commonly associated with early farming communities (34–38). However, haplogroup K1 has also been observed in Mesolithic individuals in other parts of Europe (39) and it is possible, therefore, that it reached northern Europe during the Mesolithic. In any event, the lack of genome-wide Neolithic farmer ancestry in the Syltholm genome is worthy of note because it suggests that the genetic impact of early farming communities might not have been as extensive as once thought (34).
Phenotypic traits
The phenotypic combination of blue eyes and otherwise dark hair and skin pigmentation has been previously noted in European hunter-gatherers, including La Braña (14) and Cheddar Man (40). Our results indicate that this phenotype was widespread in Mesolithic Europe and that the adaptive spread of light skin pigmentation in European populations only occurred later on in prehistory (32). The absence of adaptive alleles on the MCM6 gene, which have consequences on lactase persistence into adulthood, has also been observed before (14) and fits well with the idea that the rise of dairy farming during the Neolithic created selective pressures that favoured the spread of alleles associated with lactose tolerance in adults (41). The fact that the Syltholm female carries derived alleles in a large number of genes that have a wide range of functions in the human immune system confirms earlier suggestions that the Neolithic transition, with its associated dietary changes and shifts in pathogen exposure, was not a major driving force of adaptive innovation relating to the immune response in modern humans (14).
Oral microbiome
Previous studies (42–44) of microorganisms preserved in mineralised dental plaque (calculus) have shown changes to the ancestral oral microbiome over time. One of the most significant changes occurred with the Neolithic revolution. However, unlike dental calculus, which represents a long-term reservoir of the oral microbiota, the bacteria found in ancient mastics are more likely to give a snapshot of the species active at the time, and to reflect the microbiome of the oral mucosa instead of the plaque microbiome. As most modern clinical studies use saliva as sample material, this facilitates comparative studies. As seen in Fig. 3, the Syltholm sample has a similar microbial composition as modern oral microbiomes, but there are also some clear differences. Most notably, we observe an increase Veillonellales and a decrease in Neisseriales sp., which might indicate a dietary shift away from wild foods to a diet rich in carbohydrates and dairy products following the Neolithic transition (42).
Diet
The Mesolithic/Neolithic transition in Denmark is generally thought to have been characterised by rapid and dramatic changes in subsistence strategies, from a marine diet in the Mesolithic, to a largely terrestrial-based diet in the Neolithic (23, 45). However, several lines of evidence suggest that freshwater and marine resources continued to be exploited during the Neolithic, including fishing equipment, faunal remains, and residue analyses (46–49). We identified several thousand reads in the resin that might be associated with diet, including reads mapping to mallard (Anas platyrhynchos) and eel (Anguilla anguilla) (Table 1). The presence of these two taxa in the sample suggests that the community at Syltholm continued to exploit wild resources well into the Neolithic. This is also supported by the presence of faunal remains at the site, such as duck (Anas sp.; NISP = 50), as well as hundreds of wooden leister prongs and bone points used for catching eel.
In summary, we demonstrate that chewed birch tar is an exceptional source of authentic ancient human and microbial DNA, presumably because of the hydrophobic and antibacterial properties of the resin. We recovered a complete ancient human genome (2.3×) and oral microbiota from one such artefact. While the human genome provides new information on the population dynamics of northern Europe at the time of the Neolithic transition, the microbial data shed light on the state of our ancestral oral microbiome. In addition, the human genome provides information on the evolution of skin pigmentation and other adaptive traits in humans. Lastly, the recovery of sequences derived from foodstuffs also provided insights into diet. This new source of ancient DNA opens up fascinating new possibilities for the field of ancient DNA, especially for time periods where human remains are scarce.
Material and Methods
Sample
The “gum” was found at site MLF906-II (Fig. S1) during excavations carried out by the Museum Lolland-Falster near Rødbyhavn, Denmark. The site spans the Mesolithic/Neolithic transition in Denmark (Fig. S2) and was heavily exploited during the Late Ertebølle and Early/Middle Neolithic periods. The piece of gum was found at a depth of c.2 m below surface in the north-eastern section of the site and measures c.3 cm in length and 1 cm in width (Fig. S3).
Radiocarbon dating
The sample was only sparsely soluble in organic solvent, so we used an acid-base-acid (ABA) treatment. First, the sample was treated with 4% HCl (80°C) and then rinsed to neutrality with ultra-pure water. Second, a basic solution 1% NaOH (RT) was applied, and the reaction vessel rinsed again to neutrality. Finally, a further acid step was applied using 4% HCl (80°C) to ensure no atmospheric CO2 absorbed during the alkaline phase remained in the reaction vessel. After a last rinse to neutrality, the product was thoroughly air dried. An aliquot of approximately 4 mg was then weighed into a tin capsule for combustion in an Elemental Analyser (EA, IsotopeCube NCS, Elementar®). The EA was coupled to an Isotope Ratio Mass Spectrometer (IRMS, Isoprime® 100), which allowed the δ13C value of the sample to be measured, as well as a fully automated cryogenic system that trapped the liberated CO2 into an airtight vessel. The vessel was manually transferred to a vacuum manifold, where a stoichiometric excess of H2(g) (1: 2.5) was added, and the sample CO2(g) reduced to graphite over an Fe(s) catalyst. The graphite was pressed into a cathode for radioisotope analysis in an accelerator mass spectrometer (MICADAS IonPlus®). The MICADAS generated an estimate of the 14C:12C ratio that was close to ±1‰, and from this data, and in accordance with all standard operations and conventions, the 14C date (in yrs BP) was calculated.
FTIR and GC-MS analysis
The birch resin was analysed by Fourier Transform infrared (FTIR) spectroscopy and Gas Chromatography Mass Spectrometry (GC-MS). FTIR was carried out on a Perkin Elmer Spectrum 1000 FT-IR spectrometer using potassium bromide pellets (Fischer Scientific, IR Grade). GC-MS analysis was carried out on a Bruker SCION 456 GC-TQMS equipped with a Restek Rtx-5 capillary column (30 m, 0.25 mm ID, 0.25 μm) programmed for a 1 cm3 min−1 helium flow. 1 μl of sample was injected using a Programmed Temperature Vaporizer, held at 64°C for 0.5 min, raised to 315°C at 200°C min−1 and held at that temperature for 40 min. The split ratio was high from 0 to 0.5 min and then switched to 5. The GC oven temperature was held at 64°C for 0.5 min, then raised to 190°C at 10°C min−1 and then on to 315°C at 4°C min−1 and held at that temperature for 15 min. The sample was hydrolysed in methanolic KOH (Merck) and extracted with GC-grade tert-Butyl methyl ether (MTBE) after acidification. The MTBE extract was methylated with diazomethane (Sigma-Aldrich) prior to injection in the GC-MS system (50). The FTIR analysis produced a spectrum reminiscent of freshly produced birch resin (Fig. S4), while the GC-MS analysis revealed the presence of the triterpenes betulin and lupeol (Fig. S5), which are considered biomarkers of birch resin (51–55).
Sample preparation and DNA extraction
We sampled c.250 mg from the gum and vortexed it briefly in 5% bleach solution to remove any surface contaminants. The sample was then washed in molecular grade water, left to dry, and cut into six smaller pieces weighing around 30-50 mg each. We then extracted DNA from the samples using three different methods: Samples 1-4 were incubated in 1 ml of a proteinase K-based digestion buffer on a rotor at 56°C for c.12 hrs. Subsequently, the supernatant for samples 1 and 2 was concentrated down to ~150 μl using Amicon Ultra centrifugal filters (MWCO 30 kDa), mixed 1:10 with a PB-based binding buffer (56), and purified using MinElute columns, eluting in 30 μl EB. For samples 3 and 4 we added an additional phenol-chloroform step, by adding 1 ml phenol (pH 8.0) to the digestion mix, followed by 1 ml chloroform:isoamyl alcohol. The supernatant was then concentrated and purified, as described above. Samples 5-6 were dissolved in 1 ml chloroform:isoamylalcohol without adding proteinase K. The chloroform:isoamylalcohol was mixed with 1 ml molecular grade water, that then was purified as described above. Following metagenomic profiling, the extracts prepared with chloroform were found to be contaminated with Delftia and Methylophilus sp., highlighting the danger of reagent and laboratory contamination for microbiome studies (57).
Library preparation and sequencing
16 μl of each DNA extract were built into double-stranded libraries using the BEST protocol (58), with minor modifications as described in Mak et al. (59). One extraction NTC was included as well as a single library NTC. 10 μl of each library were amplified and indexed in 50 μl reactions, using a dual indexing approach, as described by Kircher et al. (60). The optimal number of PCR cycles was determined by qPCR (MxPro 3000, Agilent Technologies). The amplified libraries were purified using SPRI-beads and quantified on a 2200 TapeStation (Agilent Technologies) using High Sensitivity tapes. The amplified and indexed libraries were then pooled in equimolar amounts and sequenced on 1/10 of a lane of an Illumina HiSeq 2500 run in SR mode. Following initial screening, additional reads were obtained by pooling libraries #2, #3 and #4 in molar fractions of 0.2, 0.4 and 0.4, respectively and sequencing them on one full lane of an Illumina HiSeq 2500 run in SR mode.
Decay rate estimate
To investigate the rate of human DNA degradation in the ancient chewing gum we examined the DNA read length distributions of the mapped reads, using a previously published method (61). The distribution follows a typical pattern of degraded DNA with an initial increase in the number of reads towards longer DNA fragments, followed by a decline. We observe that the declining part of the distribution follows an exponential decay curve (R2=0.99), as expected if the DNA had been randomly fragmented over time. Deagle et al. (62) showed that the decay constant (λ) in the exponential equation represents the fraction of broken bonds in the DNA strand (the damage fraction) and that 1/λ is the average theoretical fragment length in the DNA library. By solving the equation, we obtain a DNA damage fraction (λ) of 3.4%, which corresponds to a theoretical average fragment length (1/λ) of 29 bp (Table S3). We note that this is not directly comparable to the observed average length, which is affected by lab methods and sequencing technology. If the DNA is found in a stable matrix long term DNA fragmentation can be expressed as a rate and the damage fraction (λ, per site) can be converted to a decay rate (k, per site per year), when the age of the sample is known. By applying an estimated age of 5,700 years for the Syltholm resin, the corresponding DNA decay rate (k) is 5,96−06 breaks per bond per year, which corresponds to a molecular half-life of 1,162 years for a 100 bp DNA fragment. This means that after 1,162 years (post cell death), each 100 bp DNA stretch will have experienced one break on average. This estimated rate of DNA decay for the gum sample seems within the expected age for DNA preserved in a stable matrix in temperate climate zone. For example, the rate is close to that observed in the La Braña sample (14), preserved under similar temperatures as our gum sample (Table S3). By contrast, the DNA decay in human remains from warmer climates is much faster (63). Although these calculations are only based on a single sample, the results suggest that ancient mastics provide remarkable conditions for molecular preservation, with a rate of DNA decay that is comparable to that in well-preserved teeth or petrous bones.
Data processing
Basecalling was done with bcl2fastq 2.2. Only sequences with correct indexes were kept. FastQ files were processed using PALEOMIX 1.2.12 (64). Adapters and low quality reads (Q<20) were removed using AdapterRemoval 2.2.0 (65) only retaining reads >25 bp. Trimmed and filtered reads were then mapped to hg19 (build 37.1) using BWA (66) with seed disabled to allow for better sensitivity (67), as well as filtering out unmapped reads. Only reads with a mapping quality ≥30 were kept and PCR duplicates were removed. MapDamage v.2.0.6 (68) was used to evaluate the authenticity of the retained reads as part of the PALEOMIX pipeline (64), using a subsample of 100k reads per sample (Fig. S5). For the population genomic analyses, we merged the ancient sample with individuals from the Human Origin dataset (13) and over >200 other previously published ancient genomes. At each SNP in the human origin dataset, we sampled the allele with more reads in the ancient sample, resolving ties randomly, resulting in a pseudohaploid ancient sample.
MtDNA and contamination estimates
We used Schmutzi (69) to determine the endogenous consensus mtDNA sequences and to estimate the amount of modern human contamination. Reads were mapped to the Cambridge reference sequence (rCRS) and filtered for MAPQ ≥30. Haploid variants were called using the endoCaller program implemented in Schmutzi (69) and only variants with a posterior probability exceeding 50 on the PHRED scale (probability of error: 1/100000) were retained. We then used Haplogrep 2.2 (70) to determine the mtDNA haplogroups, specifying PhyloTree (build 17) as the reference phylogeny (71).
Imputation and phenotyping
We used information from 41 SNPs from the HirisPlex-S system (17) to identify phenotypes such as skin, eye and hair colour in the ancient individual. First, we used angsd (72) to obtain genotype likelihoods in a 5 Mb region around these SNPs. Subsequently, we used the 5 Mb region to estimate the underlying haplotypes carried by the Syltholm individual at these loci, and from these haplotypes imputed the genotypes at the 41 SNPs of interest. We used multiple posterior probability thresholds, ranging from 0.95 to 0.50, to filter the imputed genotypes. These imputed genotypes were uploaded to the HIrisPlex-S website to obtain the predicted outcomes for the pigmentation phenotypes. In addition, using the same steps as outlined above, we imputed 82 additional SNPs, known to play a role in lactase persistence into adulthood, and immune response (14). We used the imputed genotypes to check for the presence of derived alleles.
Principal component analysis
Principal component analysis was performed using smartPCA (73) by projecting the ancient individuals onto a modern reference panel including around 1,000 modern Eurasian individuals from the HO dataset (13) using the option lsq project. Prior to performing the PCA the data set was filtered for a minimum allele frequency of at least 5% and a missingness per marker of at most 50%. To mitigate the effect of linkage disequilibrium, the data were pruned in a 50-SNP sliding window, advanced by 10 SNPs, and removing sites with an R2 larger than 0, resulting in a final data set of 593,102 SNPs.
Model-based clustering
We used ADMIXTURE v.1.3.0 (15) to estimate the admixture proportions of the ancient individuals in our merged dataset. The algorithm was run in supervised mode, specifying typical Western hunter-gatherers (Loschbour), Eastern hunter-gatherers (Samara), and Neolithic farmers (Barcın), as potential ancestral source populations. To help visualise the results, we plotted population averages for a selected number of ancient populations (Fig. 2.b).
D- and f-statistics
D- and f-statistics were computed using AdmixTools (74). To estimate the amount of shared drift between a set of ancient genomes, including the Syltholm genome, WHG, and EHG, we calculated f4-statistics of the form f4(Yoruba, X; EHG, WHG), where X stands for one of 15 ancient genomes, including the Syltholm genome. Standard errors were calculated using a weighted block jackknife. To confirm the absence of EHG gene flow in the Syltholm genome and to contrast this result with those obtained for other Scandinavian hunter-gatherers, we calculated a set of D-statistics of the form D(Yoruba, EHG; WHG, X) where X stands for one of 15 ancient genomes, including the Syltholm genome. Lastly, we used the qpAdm algorithm from the AdmixTools package (74) to model different admixture scenarios and estimate admixture proportions in the Syltholm genome based on f4-statistics and assuming WHG, EHG, and Neolithic farmers (Barcın) as potential ancestral source (“left”) populations and one modern (Dinka) and seven ancient genomes, including the Mal’ta genome (75), as outgroup (“right”) populations.
Metagenomic profiling
We used MetaPhlan2 (76) to create a metagenomic profile based on the non-human reads. The reads were first aligned to the MetaPhlan2 database (76, 77) using Bowtie2 v2.2.9 aligner (78). PCR duplicates were removed using PALEOMIX filteruniquebam (67). For cross tissue comparisons 689 human microbiome profiles published in the Human Microbiome Project Consortium (79) were initially used, comprising samples from the mouth (N=382), skin (N=26), gastrointestinal tract (N=138), urogenital tract (N=56), airways and nose (N=87). Pairwise ecological distances amongst the profiles were computed at genera and species level using taxon relative abundances and the vegdist function from the vegan package (http://cran.r-project.org/package=vegan). These were used for principal coordinate analysis (PCoA) of Bray-Curtis distances with the R function pcoa. Subsequently, we calculated the average relative abundance of each genus for each of the body sites present in the Human Microbiome Project and visualised the abundance of microbial orders of our sample and the HMP body sites with MEGAN6 (80). To further characterise the metagenomic reads we used MGmapper v.2.7 (18). Trimmed and filtered sequences were split in ten FastQ files, and MGmapper was used on best mode to first align these to the PhiX database, and then align the non-PhiX reads against the human, viral, plant, mammal, other vertebrates, invertebrates, human oral microbiome, and bacterial database, where only unmapped reads were aligned to the next database. The results of the ten read subsets were then concatenated and processed with MGmapper_classify. Besides MGmapper’s default filters, hits with a lower ratio of unique reads/all reads than 0.01 were excluded. To validate the assignments we then used a reference-based mapping approach. Human reads were removed by first aligning all trimmed and filtered reads to the human reference genome hg19 (build 37.1) using BWA and subsequently extracting all unmapped reads. The reference genomes of taxa of interest were obtained from the NCBI Reference Sequence Database. The trimmed and filtered non-human sequences were aligned to each reference genome with BWA (66) and PCR duplicates removed using Picard Tools v.2.13.2 (http://broadinstitute.github.io/picard/). Sequences with mapping quality below 35 were removed, and mapDamage v.2.0.6 (81) was used to estimate postmortem damage (Fig. S7). The breadth and depth of coverage (Fig. S8) were then calculated with bedtools v.2.27.1 (82) and visualized with circos v.0.69-6 (83). As a final check, sequences that aligned to the reference genome were extracted with bamtools v.2.5.1 (84) and subsequently assembled with MEGAHIT v.1.1.1 (85). The resulting assemblies were then used as a query for a nucleotide BLAST search (86) against the nr/nt database.
Funding
This research was funded by a research grant from VILLUM FONDEN (grant no. 22917) awarded to HS. TZTJ and JN were supported by the European Union’s EU Framework Programme for Research and Innovation Horizon 2020 under grant agreement no. 676154 (ArchSci2020). LTL was partly funded by Danish National Research Foundation (DNRF128).
Author contributions
TZTJ and HS designed and led the study. SAS provided the sample for analysis. TZTJ, MHSS, MRE, MCC, MNM and MWD generated the data. TZTJ, JN, KHI, AKF, SG, SHN, MEA, LTL, AJT, MCC, MNM and HS analyzed the data. TZTJ, JN, LTL, AJT, SAS, MJC, MTPG, MS, SR and HS interpreted the data. TZTJ, JN and HS wrote the manuscript with input from all the other authors.
Competing interests
The authors declare no competing interests.
Data and materials availability
Alignment data are available through the European Nucleotide Archive under accession number PRJEB30280.
Acknowledgments
We thank the Museum Lolland-Falster for access to the sample and Miren Iraeta Orbegozo for technical assistance.
References
- 1.↵
- 2.↵
- 3.↵
- 4.↵
- 5.↵
- 6.↵
- 7.↵
- 8.↵
- 9.↵
- 10.↵
- 11.↵
- 12.↵
- 13.↵
- 14.↵
- 15.↵
- 16.↵
- 17.↵
- 18.↵
- 19.↵
- 20.
- 21.↵
- 22.↵
- 23.↵
- 24.↵
- 25.↵
- 26.↵
- 27.↵
- 28.
- 29.↵
- 30.↵
- 31.↵
- 32.↵
- 33.↵
- 34.↵
- 35.
- 36.
- 37.
- 38.↵
- 39.↵
- 40.↵
- 41.↵
- 42.↵
- 43.
- 44.↵
- 45.↵
- 46.↵
- 47.
- 48.
- 49.↵
- 50.↵
- 51.↵
- 52.
- 53.
- 54.
- 55.↵
- 56.↵
- 57.↵
- 58.↵
- 59.↵
- 60.↵
- 61.↵
- 62.↵
- 63.↵
- 64.↵
- 65.↵
- 66.↵
- 67.↵
- 68.↵
- 69.↵
- 70.↵
- 71.↵
- 72.↵
- 73.↵
- 74.↵
- 75.↵
- 76.↵
- 77.↵
- 78.↵
- 79.↵
- 80.↵
- 81.↵
- 82.↵
- 83.↵
- 84.↵
- 85.↵
- 86.↵
- 87.↵
- 88.↵
- 89.↵
- 90.↵