Abstract
Eukaryotic cells carry two genomes, nuclear (nDNA) and mitochondrial (mtDNA), which are ostensibly decoupled in their replication, segregation and inheritance. It is increasingly appreciated that heteroplasmy, the occurrence of multiple mtDNA haplotypes in a cell, plays an important biological role, but its features are not well understood. Until now, accurately determining the diversity of mtDNA has been difficult due to the relatively small amount of mtDNA in each cell (< 1% of the total DNA), the intercellular variability of mtDNA content and copies of mtDNA pseudogenes in nDNA. To understand the nature of heteroplasmy, we developed Mseek, a novel technique that purifies and sequences mtDNA. Mseek yields high purity (> 98%) mtDNA and its ability to detect rare variants is limited only by sequencing depth, providing unprecedented sensitivity and specificity. Using Mseek, we confirmed the ubiquity of heteroplasmy by analyzing mtDNA from a diverse set of cell lines and human samples. Applying Mseek to colonies derived from single cells, we find heteroplasmy is stably maintained in individual daughter cells over multiple cell divisions. Our simulations suggest the stability of heteroplasmy is facilitated by the exchange of mtDNA between cells. We also explicitly demonstrate this exchange by co-culturing cell lines with distinct mtDNA haplotypes. Our results shed new light on the maintenance of heteroplasmy and provide a novel platform to investigate various features of heteroplasmy in normal and diseased tissues.
Introduction
Mitochondria are organelles present in almost every eukaryotic cell[1]. They enable aerobic respiration[2] to efficiently generate ATP, and play an important role in oxygen sensing, inflammation, autophagy, and apoptosis[3, 4]. Mitochondrial activity relies on over a thousand proteins, mostly coded by the nuclear DNA in humans[5], but genes from the mitochondrial genome, a small circular DNA (mtDNA), play a critical role in their function. In humans, the mtDNA is ≈ 16.6 kbp and codes for thirteen proteins critical for the electron transport chain, along with twenty-two tRNAs, two rRNAs and a control region, called the displacement loop (D-loop) (Fig. S1)[6]. The mitochondrial genetic code differs from the nuclear code. In mammalian mitochondria, ATA codes for methionine instead of isoleucine, TGA codes for tryptophan instead of a stop codon, and AGA and AGG are stop codons instead of coding for arginine, hinting at a bacterial origin[7]. Mitochondria are inherited solely from the mother and reproduce without recombination. Each mitochondrion carries multiple mitochondrial genomes (5-10)[8] and each cell contains hundreds to thousands of mitochondria, depending on the tissue[9]. Inherited mutations in mtDNA have been linked to several genetic disorders including diabetes mellitus and deafness (DAD) and Leber’s hereditary optic neuropathy (LHON)[10]. De novo mutations in mtDNA have also been linked to diseases[11, 12, 13, 14].
Heteroplasmy, which is the occurrence of multiple mtDNA haplotypes, has been documented in a variety of human tissues[15] and in samples from the 1000 Genomes Project[16]. Accurate determination of heteroplasmy, especially the low-frequency haplotypes, is needed for disease-association studies with mtDNA, as well as studies of metabolic activity of cancer cells[17]. Deep sequencing is the only means to identify novel mtDNA haplotypes as well as somatic mutations in tissues and perform association studies to link the haplotypes to disease states. However, measurements of heteroplasmy are compromised by copies of large segments of mtDNA, called Nuclear-mtDNA pseudogene sequences (Numts), present in the mammalian nDNA[18] (Fig. S2). Thus, accurate determination of heteroplasmy requires purification of mtDNA. Without purification, Numts contaminate the measurements of mtDNA variants, and introduce inaccuracies in the estimates of heteroplasmy, especially because Numts exhibit sequence variability and occur in variable copy numbers, like any other part of the nDNA. Isolating mtDNA has long been a challenge. In forensics and genealogy, allele-specific primer extensions (SNaPshot) are used for genotyping mtDNA[19]. Hypervariable regions (HVR) in the D-loop have been amplified using PCR[20]. The entire mtDNA has been accessed using primers specific to mtDNA to either perform long-range PCR[21], or amplify overlapping fragments[15]. Isolation of organelles by ultra-high-speed centrifugation has also been used, though the yields are low and the preparations are contaminated with fragmented nuclear DNA[22]. Computational methods have also been used to infer heteroplasmy from whole-exome[23, 16] and whole-genome data[24].
Estimates of heteroplasmy derived from PCR-based methods are error-prone, due to variability in amplification. Errors also arise from clonal amplification of variants introduced by polymerase mistakes, a common problem in PCR-amplicon sequencing. Additionally, sequence and copy number variations of Numts confound results from computational and PCR-based methods in unpredictable ways. Thus, none of the methods outlined above is able to accurately identify low-frequency variants in mtDNA.
We present here Mseek, a novel method to enzymatically purify mtDNA by depleting linear nDNA and inexpensively sequencing it. By applying Mseek to several cell-lines and human peripheral blood mononuclear cells (PBMC), we identified multiple mtDNA haplotypes in the samples. A major benefit of this method is the ability to call extremely rare variants, with sensitivity of calls only limited by the sequencing depth. Sequencing errors can also be overcome with more sequencing, which is not always possible, especially in PCR amplicon sequencing. Additionally, through clonal expansion of single cells from a variety of cell lines, we establish that heteroplasmy is stably maintained at a single cell level through multiple divisions. This suggests active intercellular exchange of mtDNA. This exchange is explicitly demonstrated by co-culturing two different cell-lines with distinct mtDNA haplotypes, labeling one cell line with GFP, and sorting the cells after many generations (> 25) to show mtDNA haplotypes unique to one cell-line selectively appear in the other. These results, in conjunction with simulations, suggest that exchange of mtDNA between cells is a source of renewal and stability.
Results
Mseek: An efficient method to isolate and sequence mtDNA
The potential relevance of mtDNA to many diseases requires a method to accurately determine the diversity of mtDNA in populations of cells. However, as noted, one of the major problems of existing approaches is the presence of nuclear DNA, which contains sequences of high homology to mtDNA (Numts), making it difficult to discern mtDNA from nDNA (Fig. S2). To address this issue, we sought to take advantage of the difference in topology between nDNA and mtDNA, using an exonuclease to digest the linear nDNA while leaving the circular mtDNA intact. Total DNA was extracted from HEK 293T cells, and digested with exonuclease V or left undigested. To determine the outcome, we PCR-amplified sequences specific to nDNA or mtDNA using appropriate primers. As expected, in the undigested samples of total DNA we could detect both nDNA and mtDNA (Fig. 1A). In sharp contrast, in the samples treated with exonuclease V we could only detect mtDNA (Fig. 1B). The lengths of the expected PCR products are shown in Fig. 1C. Using this approach, mtDNA was prepared and sequenced on the Illumina MiSeq platform. Out of a total of 3.05 million 100 nt reads, 1.233 million mapped to the mitochondrial genome and 50,000 (< 2%) mapped to the nDNA. The remainder were adapter dimers, which are sequencing artifacts currently filtered out experimentally using Ampure beads. Over 98% of the mappable reads were derived from mtDNA with an average coverage > 3000X (Fig. 1D). More than 50 distinct samples were processed similarly to consistently obtain high purity mtDNA sequence.
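The relationship between read count and depth can be illustrated with a back-of-the-envelope calculation. The sketch below is not part of the Mseek pipeline; it assumes a reference length of 16,569 bp (the human reference mtDNA) and that every base of every mapped read aligns, so it gives an upper bound on mean coverage. The function name is hypothetical.

```python
# Upper-bound estimate of mean mtDNA coverage from mapped read counts.
# Assumption: the reference is 16,569 bp long and reads align end to end.
MTDNA_LENGTH_BP = 16_569


def mean_coverage(n_reads: int, read_length: int,
                  genome_length: int = MTDNA_LENGTH_BP) -> float:
    """Total sequenced bases divided by the genome length."""
    return n_reads * read_length / genome_length


# Read counts quoted in the text: 1.233 million mapped reads of 100 nt.
coverage = mean_coverage(1_233_000, 100)
print(f"mean coverage is at most about {coverage:.0f}X")
```

This bound is comfortably above the reported average coverage of > 3000X; soft-clipped bases, duplicate removal and uneven coverage would all lower the realized figure.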
The error rate per base of the reads is approximately 1 in 1000 (Q score > 30). Using at least 10 non-clonal reads to make a variant call reduces errors from sequencing to much less than 1 in a million. This coverage also allows removal of variants with a significant bias towards one strand, a known source of errors on the Illumina platform[25]. Contamination from the small amount of nDNA left in the samples does not contribute appreciably to the noise, as Numts are a small fraction of total nDNA. Thus, calling rare variants to any level of sensitivity depends only on the depth of sequencing. This approach, designated Mseek (Fig. 2), provides a means of unmatched efficiency in accurately sequencing the mtDNA contained within a population of cells.
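The effect of requiring a minimum number of supporting reads can be checked with a short calculation. The sketch below is an illustration, not part of Mseek or MiST. It assumes that errors are independent between reads and that a miscalled base is equally likely to be any of the three wrong bases (both are assumptions), and it computes the binomial probability that at least `min_reads` of the reads covering a position carry the same wrong base. The function name and the depth of 100 are illustrative choices.

```python
import math


def prob_false_call(depth: int, min_reads: int, error_rate: float = 1e-3) -> float:
    """Probability that >= min_reads of `depth` reads show one specific wrong base.

    Assumes independent errors, split evenly among the three wrong bases.
    """
    p = error_rate / 3.0
    total = 0.0
    for k in range(min_reads, depth + 1):
        # Binomial term computed in log space to avoid overflow at high depth.
        log_term = (math.lgamma(depth + 1) - math.lgamma(k + 1)
                    - math.lgamma(depth - k + 1)
                    + k * math.log(p) + (depth - k) * math.log1p(-p))
        total += math.exp(log_term)
    return min(total, 1.0)


# Ten supporting reads at 100X coverage, per-base error rate of 1 in 1000.
p_10_reads = prob_false_call(depth=100, min_reads=10)
print(f"{p_10_reads:.1e}")
```

Under these assumptions the probability depends on the depth as well as on the read threshold, so the function can be rerun with the coverage of a given sample.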
Ubiquity of heteroplasmy
Since cell lines are clonally derived, the expectation is that the nDNA (and mtDNA) are identical across cells. We decided to explore the diversity of mtDNA in a variety of cell lines to test the expectation that mtDNA would be homoplasmic in cell lines, since either a slight fitness advantage of one haplotype or drift[26] would lead to clonal selection and homoplasmy. We applied Mseek to thirty samples including four human PBMCs and human cell lines derived from human diploid fibroblasts (501 T), glioma (A382) and breast carcinoma (HCC1806 and MDA-MB-157).
The mtDNA sequences were analyzed for variations in order to infer universal features in mtDNA variability and differences between human cell lines and blood-derived mtDNA. Repeat content of the sequences was computationally identified to estimate nDNA contamination, which ranged from 0.5 - 1.5%, further confirming the specificity of Mseek. Importantly, because of this high degree of mtDNA purity (> 98%) we were able to multiplex all 30 samples in a single MiSeq run, with average coverage of > 100X.
Variants with a frequency of either 0 or 1 in the population arise from homoplasmic mtDNA. Intermediate frequencies between 0 and 1 imply the co-existence of multiple haplotypes in the population. Strikingly, in both cell-lines and human blood-derived mtDNA, we observed variants occurring in the 0.1 - 0.9 frequency range (Fig. 3), indicating that multiple haplotypes were present in the samples. The tool Mutation Assessor[27] was used to label the variants as high, medium, low, or neutral signifying their predicted impact on protein function. Cell-lines and human PBMCs did not exhibit putative deleterious mutations at high frequency, consistent with the expectation that functioning cells should have functional mitochondria.
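The distinction between homoplasmic and heteroplasmic variants can be written down as a simple rule on read counts. The sketch below is illustrative only: the function name and the tolerance `tol` (used to absorb sequencing noise near 0 and 1) are hypothetical and are not taken from the Mseek analysis.

```python
def classify_variant(alt_reads: int, depth: int, tol: float = 0.01) -> str:
    """Classify a site from the fraction of reads carrying the variant.

    `tol` is a hypothetical tolerance for noise near frequencies of 0 and 1.
    """
    if depth <= 0:
        raise ValueError("depth must be positive")
    freq = alt_reads / depth
    if freq <= tol:
        return "absent"          # effectively frequency 0
    if freq >= 1.0 - tol:
        return "homoplasmic"     # effectively frequency 1
    return "heteroplasmic"       # intermediate: multiple haplotypes co-exist


for alt, depth in [(0, 120), (45, 120), (120, 120)]:
    print(alt, depth, classify_variant(alt, depth))
```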
The mtDNA has a few non-coding regions outside of the D-loop which occur as gaps between genes. None of the samples exhibited mutations in these regions, suggesting an evolutionarily conserved role, such as in transcriptional control, for these regions. Each sample had unique, distinguishing mutations, ranging in frequency from 0.36 to 1.0. There were a number of unique variants in the four human PBMC samples (ranging in number from 5 to 15) and in the cell lines (ranging in number from 5 to 21).
Since the cell lines were derived from a variety of tissues, our findings have some level of universality. There were no key distinguishing features between cell-line and human blood-derived mtDNA, in terms of deleterious mutations or degree of heteroplasmy, contrary to findings from a study based on whole-genome sequencing of TCGA samples[24]. Our findings are consistent with another study based on colorectal cancer[15].
Stability of heteroplasmy in cell-lines
The results above indicate heteroplasmy exists within a cell population but do not establish heteroplasmy in individual cells, since a mixture of homoplasmic cells with different haplotypes would give the same result. In order to establish heteroplasmy in individual cells, we placed the severest possible bottleneck on the population by deriving colonies from single cells, utilizing the MDA-MB-157 breast carcinoma and U2OS osteosarcoma lines (Fig. 4). In each of the 8 derived colonies, the variants from the original lines remained, at approximately the same frequencies as in the original tumor lines. The sharing of mutations between the original and derived colonies suggests that the diversity in mtDNA exists in individual cells. The preservation of the frequencies between the original and derived colonies indicates further that this heteroplasmy is uniform across cells in the original line (Fig. 4). Since the new clonal lines underwent at least 25 divisions from the single-cell stage, these results also suggest that heteroplasmy is stably maintained over multiple generations with no signs of selection or drift. Over many divisions, errors in replication should have increased diversity in heteroplasmy, while small differences in fitness and drift should lead to homoplasmy. In fact, drift has been proposed as a mechanism for the selection of homoplasmic mtDNA mutations in tumors[26], which has been corroborated in other studies[15]. In light of these reports, our findings are quite unexpected.
A simple model of mtDNA genetics assumes random assortment of mtDNA haplotypes between daughter cells upon cell division, along with multiplication of mitochondria. This model would predict drift towards homoplasmy, as seen in our simulation of this process (Fig. 5) and by others[26]. The rate of drift in haplotype frequencies is a function of the number of mtDNA molecules per cell and the original frequency of the haplotypes (Fig. 5). After many passages, irrespective of the original mtDNA distribution, the likelihood of two randomly selected cells having the same heteroplasmic mix would be extremely low, which is at odds with the stable and uniform heteroplasmy that we observed in the clonally-derived cell-lines. This suggests the existence of an active mechanism to counteract this drift.
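The random-assortment model can be sketched in a few lines. The code below is not the simulation used for Fig. 5; it is a minimal illustration under stated assumptions: a cell holds a fixed number of mtDNA copies of two haplotypes, every copy is duplicated before division, and the daughter cell receives a random half. All names and parameter values are hypothetical.

```python
import random
import statistics


def divide(n_a: int, n_total: int, rng: random.Random) -> int:
    """Duplicate every mtDNA copy, then pass a random half to the daughter cell.

    Returns the number of haplotype-A copies in the daughter.
    """
    pool = [1] * (2 * n_a) + [0] * (2 * (n_total - n_a))
    return sum(rng.sample(pool, n_total))


def final_frequencies(freq0: float, n_total: int, generations: int,
                      n_lineages: int, seed: int = 0) -> list:
    """Follow independent cell lineages and return their final haplotype-A frequencies."""
    rng = random.Random(seed)
    finals = []
    for _lineage in range(n_lineages):
        n_a = round(freq0 * n_total)
        for _generation in range(generations):
            n_a = divide(n_a, n_total, rng)
        finals.append(n_a / n_total)
    return finals


# Spread of haplotype frequencies across lineages after 25 divisions,
# for cells carrying few (50) or many (500) mtDNA copies.
spread_small = statistics.pstdev(final_frequencies(0.5, 50, 25, 100))
spread_large = statistics.pstdev(final_frequencies(0.5, 500, 25, 100))
print(f"50 copies/cell:  spread = {spread_small:.3f}")
print(f"500 copies/cell: spread = {spread_large:.3f}")
```

In this toy model, lineages that start from the same mix diverge from one another, and they diverge faster when cells carry fewer mtDNA copies. Frequencies of 0 and 1 are absorbing, which is the drift towards homoplasmy described above.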
Exchange of mtDNA between cells within a population is the simplest explanation for the uniformity of heteroplasmy and its stability. Exchange can counteract the effects of drift by bringing the haplotype distribution of each cell closer to the average of the distribution across cells within the population. Other explanations, such as balancing selection[28], could also be invoked to explain the lack of drift. These can be discounted because most variants are neutral and specific to each cell line, which would require a different selection in each cell line without an obvious selective pressure.
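How exchange could counteract drift can be illustrated by adding an exchange step to a random-assortment model. The sketch below is a toy model, not the authors' simulation: after each division, every cell loses `n_swap` randomly chosen mtDNA copies and gains the same number drawn at the population-wide haplotype frequency. The exchange rule, the function names and all parameter values are assumptions made for illustration.

```python
import random
import statistics


def divide(n_a: int, n_total: int, rng: random.Random) -> int:
    """Duplicate every mtDNA copy, then pass a random half to the daughter cell."""
    pool = [1] * (2 * n_a) + [0] * (2 * (n_total - n_a))
    return sum(rng.sample(pool, n_total))


def exchange(counts: list, n_total: int, n_swap: int, rng: random.Random) -> list:
    """Each cell swaps n_swap random copies for copies drawn from the population mix."""
    mean_freq = sum(counts) / (len(counts) * n_total)
    updated = []
    for n_a in counts:
        cell = [1] * n_a + [0] * (n_total - n_a)
        lost_a = sum(rng.sample(cell, n_swap))
        gained_a = sum(rng.random() < mean_freq for _ in range(n_swap))
        updated.append(n_a - lost_a + gained_a)
    return updated


def spread_after(generations: int, n_cells: int, n_total: int,
                 freq0: float, n_swap: int, seed: int) -> float:
    """Standard deviation of haplotype-A frequency across cells after many divisions."""
    rng = random.Random(seed)
    counts = [round(freq0 * n_total)] * n_cells
    for _ in range(generations):
        counts = [divide(n_a, n_total, rng) for n_a in counts]
        if n_swap:
            counts = exchange(counts, n_total, n_swap, rng)
    return statistics.pstdev([n_a / n_total for n_a in counts])


spread_no_exchange = spread_after(50, 100, 100, 0.5, n_swap=0, seed=1)
spread_exchange = spread_after(50, 100, 100, 0.5, n_swap=10, seed=1)
print(f"without exchange: {spread_no_exchange:.3f}")
print(f"with exchange:    {spread_exchange:.3f}")
```

In this toy model the cell-to-cell spread of haplotype frequencies keeps growing without exchange, while with exchange it stays bounded around the population average.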
Experimental demonstration of mtDNA exchange between cells
In order to explicitly demonstrate the exchange of mtDNA between cells, we co-cultured cell-lines with distinct private haplotypes. Two pairs of cell-lines were used: MDA-MB-157 with HCC, and A382 with U2OS. For each pair, one of the cell-lines was labeled with GFP (by transfection with a vector expressing GFP). After approximately 20 passages, the cells were sorted for the GFP marker by FACS, and mtDNA from the sorted cells was sequenced. The sorted cells were greater than 99% pure based on FACS.
Tables 1 and 2 show the results of sequencing mtDNA from these co-culture experiments. We detected variants private to one cell-line in the co-cultured partner cell-line, suggesting the transfer of mtDNA between the cell-lines. Not every private variant was transferred, arguing against the results arising from errors in sorting or cytoplasmic/nuclear exchange between cells. The purity of the sorted cells, based on FACS, further suggests that nuclear exchange does not account for the findings.
Discussion
Accurate sequencing of mtDNA is important for sensitive measurements of heteroplasmy, whose variability can have clinical significance, as a biomarker and in disease progression[29]. Mseek provides a means to purify and deeply sequence mtDNA and determine heteroplasmy accurately by eliminating Numts and PCR-related biases. The sensitivity of Mseek is a function of sequencing depth alone. This is one of the most detailed and extensive surveys of mtDNA from cell lines yet obtained.
Accurate identification of variant frequencies is not possible through deep sequencing methods currently in use. So far, deep sequencing approaches to mtDNA have used either long-range PCR[21, 30], or a multitude of mtDNA-specific PCR primers to amplify short overlapping mtDNA fragments (≈ 650 nt) which are ligated to each other, fragmented and prepared for sequencing[15]. Mining of whole-genome[24, 31] and whole-exome[23, 16] data has also been used to identify mtDNA fragments. A new approach uses the methyl-specific endonucleases MspJI and AbaSI to deplete nDNA that is likely to be methylated[32]. A limitation of this approach is that Numts are not always methylated. PCR-amplicon biases, the inability to identify polymerase errors and contamination from Numts call into question the sensitivity of these methods to low-frequency variants. The ability of Numts to confound analyses is highlighted by a study that used whole-genome data from the TCGA and inferred that deleterious mtDNA mutations are more common in cancer cells compared to normal tissue[24]. In contrast, findings of low mutation rates in tumor mtDNA from a colorectal cancer study[15] are more in line with our findings that cell lines do not exhibit higher rates of deleterious mutations compared to normal cells from human tissues.
We have shown here that cells from a wide range of cell lines and human samples exhibit heteroplasmy, in accord with results from several studies[15, 16]. This suggests that heteroplasmy might be an essential feature of mtDNA. In fact, heteroplasmy seems to provide a fingerprint that can identify cells. A larger survey is needed to understand the resolution of this fingerprint and its ability to distinguish cellular origins. We found that mtDNA from transformed human cell-lines and mtDNA from primary human lymphocytes are similar with respect to the distributions of densities and frequencies of mutations (both benign and deleterious). Non-coding gaps between mtDNA genes are highly conserved, indicating they might be control elements.
Clonal amplification of cells does not lead to a selection of particular mtDNA haplotypes; in fact, heteroplasmy is very stable, at least over the 25 or so divisions of cell lines that we have studied. This stability of heteroplasmy in cell lines is surprising in light of 1) the higher rates of mutation in mtDNA[33], which should increase the diversity of mtDNA, and 2) drift, which should lead to homoplasmy in about 70 generations[26, 15]. The stability of heteroplasmy against drift could arise from exchanges of mtDNA between cells, which can be inferred from our cell-line data (Fig. 4) in conjunction with simulations (Fig. 5) and co-culturing experiments (Tables 1 and 2). The transfer of mtDNA seems to occur in a selective manner, suggesting there are incompatibilities either between the mtDNA haplotypes or between certain haplotypes and the nuclear genome. There is some indication that the amount of transfer increases with the number of passages of co-culture, based on our limited set of experiments, but establishing this definitively requires a longer experiment with sampling at different time points. A co-evolution of mtDNA and nDNA has in fact been suggested earlier[33]. This is also consistent with a study in mice that suggests that mitochondria from different species cannot co-exist[34]. The selective advantage of certain mtDNA haplotypes can additionally contribute to the stability of the mtDNA.
The exact mechanisms of mtDNA transfer are not known. Horizontal transfer of genetic material between species of yeasts has been shown[35] and there is increasing interest in organelle transfer between cells through microtubule formation[36]. Within a cell, networks of mitochondria are created through fusion, mediated by mitofusin, which leads to the exchange of mtDNA[37]. This is necessary for functional mitochondria; knocking out mitofusin causes muscles to atrophy through the accumulation of deleterious mutations[37]. In vivo, exchange of mitochondria between cells has also been demonstrated in the rejuvenation of cells with damaged mitochondria by transfer of functional mitochondria from mesenchymal stem cells[38]. Rejuvenation of cells containing damaged mtDNA by transfer of functional mtDNA from neighboring cells in culture has also been observed[39].
This is the first explicit demonstration of mtDNA transfer between cells with functional mtDNA, whereas previous studies have shown transfer from cells with functional mtDNA into ones with non-functioning mtDNA[39, 38]. The proposed exchange of mtDNA between cells can explain its stability over the lifetime of an organism, and over generations, inferred from the relative lack of major age-related disorders originating in the mtDNA and the ability to infer geographic origins of a person from the mtDNA sequence. The stability of mtDNA against deleterious mutations could also be enhanced by a coupling between replication and transcription[40], ensuring the depletion of non-functional mtDNA by inefficiencies in their replication.
By making mtDNA sequencing economical, Mseek enables large-scale studies of heteroplasmy for GWAS applications and clinical monitoring of mtDNA in tissues. The sequencing of mtDNA in cell-lines allows us, for the first time, to understand the nature of mtDNA variability and its maintenance in cell populations. There is great value in surveying large populations in order to establish the normal range of heteroplasmy for use in GWAS. The transfer of functional mtDNA into diseased cells could be used as therapy to treat disorders arising from mtDNA defects. Somatic mutations in mtDNA could play a role in various human disorders and in aging, especially when the transfer between cells is impeded, and mechanisms involved in mtDNA transfer might be fruitful targets for therapeutic intervention.
Methods
Mseek
We have developed a new method of isolating and sequencing mtDNA (Fig. 2). The results section contains details of its performance. Briefly, the method consists of the following steps. Total DNA is isolated from the sample. The nDNA is digested using Exonuclease V. The products are purified using Ampure beads to remove short fragments. Using PCR primers specific to mtDNA and nDNA, the purity of the treated samples is tested (Fig. 1B). Following this, the sample is fragmented using Covaris and end-repaired. Barcoded adapters compatible with the sequencing platform are ligated to the fragments. The universal adapters are used to amplify the library and prepare it for deep sequencing.
Cell Culture
mtDNA was isolated and sequenced from several cell lines including 293T (a human embryonic kidney-derived cell line), U2OS and Saos-2 (human osteosarcoma cell lines) and MDA-MB-157 (metastatic human breast cancer cell line).
All cells were grown in Dulbecco’s modified Eagle’s medium (DMEM; Invitrogen), 10% heat-inactivated fetal bovine serum (FBS; Invitrogen) and 50 U/ml penicillin and streptomycin (Pen/Strep; Invitrogen). Cultures were maintained at 37 °C in 5% CO2.
Clonal isolation of tumor cells was performed by serial dilution into 96-well plates and visual examination of wells for single cells, which were then expanded for an additional 28-30 population doublings.
Analyses
Sequences that map to repeat elements (which occur only in the nDNA) allow reliable estimation of the level of nDNA contamination, which ranged from 0.5 - 1.5%.
MiST[25], a variant detection tool for whole-exome data, was used to call mtDNA variants. The reference mitochondrial genome has the accession NC_012920 from GenBank. The mtDNA annotations are from MITOMAP13, and SNP annotations are from dbSNP14. The error rate in MiSeq and HiSeq reads is approximately 1 in 1000, so requiring at least 3 non-clonal reads carrying the variant to make a call reduces the error rate to well under 1 in a million. Variants with reads predominantly in one strand are excluded to further reduce errors, based on our previous experience[25].
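The two filters described above, a minimum number of non-clonal supporting reads and the exclusion of strand-biased variants, can be sketched as follows. This is not MiST's implementation. It assumes that reads sharing a start position and strand are clonal duplicates, and it uses a hypothetical cut-off of 90% of supporting reads on one strand to define "predominantly"; the `Read` type and the function name are likewise illustrative.

```python
from typing import Iterable, NamedTuple


class Read(NamedTuple):
    start: int         # leftmost aligned position of the read
    strand: str        # "+" or "-"
    has_variant: bool  # True if the read carries the candidate variant


def passes_filters(reads: Iterable[Read], min_reads: int = 3,
                   max_strand_fraction: float = 0.9) -> bool:
    """Apply a minimum-support filter and a strand-bias filter to one variant site."""
    # Assumption: reads with the same start and strand are clonal; keep one of each.
    unique = {(r.start, r.strand) for r in reads if r.has_variant}
    n_support = len(unique)
    if n_support < min_reads:
        return False
    n_plus = sum(1 for _start, strand in unique if strand == "+")
    # Reject variants whose support comes predominantly from one strand.
    return max(n_plus, n_support - n_plus) / n_support <= max_strand_fraction


balanced = [Read(10, "+", True), Read(25, "-", True), Read(40, "+", True)]
one_strand = [Read(10, "+", True), Read(25, "+", True), Read(40, "+", True)]
clonal = [Read(10, "+", True)] * 5 + [Read(25, "-", True)]
print(passes_filters(balanced), passes_filters(one_strand), passes_filters(clonal))
```

In practice the duplicate definition and the strand-bias cut-off would be chosen to match the library preparation and the variant caller in use.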
We developed a pipeline to assemble the mitochondrial genome from the deep-sequencing data. The reads assembled into a circle, and no large deletions, duplications or other large-scale structures were detected.
Mutation Assessor[27] was used to assess the impact of mtDNA mutations on protein function. This tool uses conservation of structure across orthologues to identify mutations in the DNA (and consequent changes in amino-acids) with potentially deleterious effects. The mutations are rated high, medium, low, or neutral based on their impact on protein function. We highlight the high and medium impact mutations in our graphs, as they may affect mitochondrial function.
Competing interests
Authors’ contributions
AJ and RS designed Mseek and designed several experiments. SA and MW designed several experiments with cell-lines. LL provided human samples and suggested applications. AJ, JS and RL performed Mseek; EB performed cell-line work. AJ and RS wrote the paper.
Author details
3Cold Spring Harbor Laboratory, One Bungtown Road, Cold Spring Harbor, USA.
Acknowledgements
RS, AJ, LL were partially supported by the grant 1R21HG007394-01 from the NIH. Some experiments were also funded by a pilot grant from the Venture Capital Research Funding Program of Children’s Environmental Health Center (CEHC) at Mount Sinai. Brian Brown helped craft the message. Comments from Viviana Simon, and Benjamin tenOever added clarity. Avinash Waghray, Sunniva Bjorklund and Vessela Kristensen caught numerous errors and gave many suggestions to improve the writing.