Abstract
To function properly, mitochondria utilize products of 37 and >1,000 genes encoded by the mitochondrial and nuclear genomes, respectively, which should be compatible with each other. Discordance between mitochondrial and nuclear genetic ancestry could contribute to phenotypic variation in admixed populations. Here we explored potential mito-nuclear incompatibility in six admixed human populations from the Americas: African Americans, African Caribbeans, Colombians, Mexicans, Peruvians, and Puerto Ricans. For individuals in these populations, we determined nuclear genome proportions derived from Africans, Europeans, and Native Americans, the geographic origins of the mitochondrial DNA (mtDNA), as well as mtDNA copy number in lymphoblastoid cell lines. By comparing nuclear vs. mitochondrial ancestry in admixed populations, we show that, first, mtDNA copy number decreases with increasing discordance between nuclear and mitochondrial DNA ancestry, in agreement with mito-nuclear incompatibility. The direction of this effect is consistent across mtDNA haplogroups of different geographic origins. This observation suggests suboptimal regulation of mtDNA replication when its components are encoded by nuclear and mtDNA genes with different ancestry. Second, while most populations analyzed exhibit no such trend, in Puerto Ricans and African Americans we find a significant enrichment of ancestry at nuclear-encoded mitochondrial genes towards the source populations contributing the most prevalent mtDNA haplogroups (Native American and African, respectively). This likely reflects compensatory effects of selection in recovering mito-nuclear interactions optimized in the source populations. Our results provide the first evidence of mito-nuclear effects in human admixed populations and we discuss its implications for human health and disease.
Introduction
Mitochondria participate in some of the most vital functions of eukaryotic cells, such as generation of ATP via oxidative phosphorylation (OXPHOS), regulation of calcium uptake, apoptosis, and metabolism of essential nutrients 1. Mitochondria harbor their own genome (mitochondrial DNA, or mtDNA) with 37 genes encoding 13 proteins essential for OXPHOS and the ribosomal and transfer RNAs required for their translation. Additionally, >1,000 nuclear genes encode proteins involved in mitochondrial function 2,3. The nuclear genome encodes most of the subunits of the OXPHOS complexes, and proteins required for replication and transcription of mtDNA. The nuclear-encoded mitochondrial genes must be transcribed in the nucleus, translated in the cytoplasm, and directed to mitochondria with the help of translocases and mitochondrial membrane proteins, which are themselves encoded by the nuclear genome 2,4. Thus, mitochondrial functions, and therefore many cellular functions in general, rely on fine-tuned interactions between mtDNA and the products of mtDNA-encoded and nuclear-encoded mitochondrial genes. As a result, we expect mtDNA and nuclear-encoded mitochondrial genes to coevolve, i.e. to undergo mito-nuclear coevolution 5–11. This expectation is especially plausible because of a smaller effective population size, higher mutation rate, and thus faster evolution, of mtDNA compared to the nuclear genome 6.
Evidence of mito-nuclear coevolution comes primarily from inter-population hybrids resulting from laboratory crosses of model organisms, such as fruit flies 12–18, marine copepods 19–22, and yeast 23–25. Inter-population hybrids in these organisms frequently exhibit reduced viability and fecundity 16,17,19–21. These phenotypes are associated with altered expression of OXPHOS genes 13,26, reduced OXPHOS activity, decreased ATP production 14, altered mtDNA copy number 20,26, and elevated oxidative damage 21. Fitness can often be restored if the hybrids are backcrossed with the maternal line but not with the paternal line 20, suggesting that their reduced fitness was caused by differences in ancestry between mitochondrial and nuclear genomes, hereafter called ‘mito-nuclear DNA discordance’.
Mito-nuclear incompatibility, which we define as any phenotypic manifestation of mito-nuclear discordance, has also been observed in naturally occurring hybrids of diverging populations. For instance, Morales and colleagues (2016) analyzed two populations of the eastern yellow robin (Eopsaltria australis), which have adapted to different climates and carry different mtDNA haplotypes, and discovered that highly differentiated regions of their nuclear genomes are enriched in nuclear-encoded mitochondrial genes. This observation suggests that divergence between the two populations has been maintained by mito-nuclear incompatibility, in spite of continued gene flow 27. Similarly, Baris and colleagues (2017) analyzed two killifish (Fundulus heteroclitus) populations with divergent mtDNA haplogroups, as well as hybrids between these two populations, and found that admixture fraction across differentiated nuclear loci in hybrids is associated with decreased OXPHOS activity 28. This finding points towards the role of mitochondrial and nuclear ancestry in altering the efficacy of mitochondrial function 28. Such studies demonstrate the utility of naturally occurring hybrid, i.e. admixed, populations in studying mito-nuclear coevolution and the phenotypic effects brought about by its disruption (i.e. by mito-nuclear incompatibility).
Currently, we know very little about the extent of mito-nuclear incompatibility and its contribution to phenotypic variation in humans. Several studies have shown that interactions between mtDNA- and nuclear-encoded factors, when altered by mutations, can modulate disease phenotypes for cardiomyopathy, predisposition to type 2 diabetes, as well as possibly for hearing loss and Huntington’s disease 11,29. Two additional studies have recently explored mito-nuclear incompatibility in humans in more detail. First, Sloan and colleagues (2015) tested for elevated mito-nuclear linkage disequilibrium (LD) across a set of 51 human populations from the Human Genome Diversity Project 30–32. They hypothesized that as human populations diverged, certain allelic combinations between mtDNA and nuclear-encoded mitochondrial genes would have been disfavored if they were incompatible and if the effect sizes of such epistatic interactions were sufficiently large. However, they found no evidence of increased LD at single-nucleotide polymorphisms (SNPs) in such genes relative to the genomic background 30. In the second study, Sharbrough and colleagues (2017) investigated whether nuclear-encoded mitochondrial genes in modern humans are depleted in Neanderthal and Denisovan ancestry 33. Such a depletion is expected because previous studies have found no evidence of introgression of Neanderthal and Denisovan mtDNA into modern humans 34,35. If the divergence between humans and archaic hominins was sufficient to cause mito-nuclear incompatibility, selection could have purged archaic hominin alleles at nuclear-encoded mitochondrial genes because of incompatible or unfavorable interactions with human mtDNA. The authors found a significant underrepresentation of Neanderthal, but not Denisovan, ancestry at such genes in modern humans. These results suggest a complex history of mito-nuclear coevolution in modern humans and other hominins.
Studying recently admixed populations is another approach to test for mito-nuclear coevolution in humans. If mito-nuclear coevolution occurred in diverging human populations, then previously co-adapted mito-nuclear interactions could have been disrupted as a result of recent admixture, and mito-nuclear incompatibility might be observed in admixed populations at individual and population levels (Fig. 1). Within any given admixed individual (Fig. 1A), the nuclear genome, because of its biparental inheritance, recombination, and independent assortment, represents a mosaic of ancestry segments 36,37. However, mtDNA, because it is exclusively maternally inherited, maintains maternal ancestry only (Fig. 1A). Any discordance between mitochondrial and nuclear ancestry may cause mito-nuclear incompatibility within admixed individuals, depending on the degree of divergence between the ancestral populations and the extent to which the nuclear genome differs in ancestry from the mitochondrial genome (Fig. 1A).
Signatures of mito-nuclear coevolution can also manifest at a population level (Fig. 1B), particularly if there is sex bias in the genetic contributions from the source populations, as in the case of admixture in the Americas. Colonization and slave trade in the last 500 years resulted in admixture among Native Americans, Africans, and Europeans in the Americas. Historical and genetic studies agree that the early European settlers were primarily males. This population history, compounded with the social stratification that resulted in directionally skewed gene flow between European males and African and Native American females, has resulted in differences in the frequency of European ancestry among the autosome, X chromosome, Y chromosome, and mtDNA 38–47. Increasing levels of sex-biased admixture can lead to increasing discordance between mitochondrial and nuclear ancestry proportions in admixed populations (Fig. 1B). If coadapted mito-nuclear combinations are disrupted in admixed populations, leading to a reduction in fitness, then we expect selection to act towards restoring them, i.e. to shift the average ancestry fraction at nuclear-encoded mitochondrial genes in favor of the source population contributing the highest proportion of females (Fig. 1B).
In this manuscript, we explored signatures of mito-nuclear incompatibility and coevolution in admixed human populations from the Americas using the data from the 1000 Genomes Project (1000 Genomes Project Consortium et al. 2015). We first tested whether the discordance between mitochondrial and nuclear ancestry in an individual’s genome has an effect on their mtDNA copy number, a cellular phenotype that is a known biomarker for many health-related outcomes, including aging, fertility, and several types of cancers 48–51. Next, we investigated whether within admixed populations there are systematic shifts in ancestry frequency at nuclear-encoded mitochondrial genes towards the source population contributing the highest proportion of mtDNA. Our results present evidence of mito-nuclear incompatibility, and suggest the presence of selection geared towards overcoming it, in human admixed populations. Thus, our observations support the notion that mito-nuclear coevolution has occurred in both non-admixed and admixed human populations.
Materials and Methods
mtDNA copy number estimation
Given the average sequencing depth of the autosomes and mtDNA, and the fact that there are two copies of each autosome per cell, we can compute the number of copies of mtDNA per cell:
$$\text{mtDNA copy number} = 2 \times \frac{\text{average mtDNA depth}}{\text{average autosomal depth}} \qquad (1)$$
$$\text{average depth} = \frac{N \times L}{G} \qquad (2)$$
In the equation for average depth, N is the total number of reads aligning to the chromosome, L is the average length of a read, and G is the size of the chromosome in base pairs. We first calculated mtDNA copy number for each autosomal chromosome separately, and then calculated the mean across all chromosomes.
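The calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline used in the study: the function names and the read counts are hypothetical, and the autosome length is a round placeholder. Only the mtDNA reference length (16,569 bp) is a real value.

```python
from statistics import mean

def average_depth(n_reads, read_length, chrom_length):
    """Average sequencing depth of one chromosome: N * L / G."""
    return n_reads * read_length / chrom_length

def mtdna_copy_number(mt_depth, autosome_depths):
    """Estimate mtDNA copies per cell.

    The ratio of mtDNA depth to autosomal depth is doubled because each
    cell carries two copies of every autosome. The estimate is computed
    per autosome and then averaged, as described in the text.
    """
    return mean(2.0 * mt_depth / depth for depth in autosome_depths)

# Hypothetical toy inputs (assumptions, not values from the study).
mt_depth = average_depth(n_reads=200_000, read_length=100, chrom_length=16_569)
autosome_depths = [
    average_depth(n_reads=7_500_000, read_length=100, chrom_length=249_000_000),
    average_depth(n_reads=7_300_000, read_length=100, chrom_length=243_000_000),
]
print(round(mtdna_copy_number(mt_depth, autosome_depths)))
```

Averaging the per-autosome estimates, rather than pooling all autosomal reads, keeps a single chromosome with unusual coverage from dominating the result.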
A subset of samples (a total of 24) in the 1000 Genomes Project data were sequenced at both low (2-4x) and high (20-40x) coverage. We used these to validate whether mtDNA copy numbers calculated from the low- vs. high-coverage alignments agree with each other. As shown in Fig. S1, the copy numbers generally agree, with a few exceptions. Some samples show appreciable mtDNA copy numbers when calculated using the high-coverage alignments but low copy numbers when calculated using the low-coverage alignments (Fig. S1). The source annotation of these samples indicates that some of them (samples for which this information is available) were sequenced from peripheral blood mononuclear cells (PBMCs) instead of lymphoblastoid cell lines (LCLs) (Fig. S1, Table S1). The difference in mtDNA copy number between the two DNA sources is consistent with previous observations that LCLs carry significantly higher mtDNA copy numbers than PBMCs 52–54. Because annotation for the source DNA is not available for all samples (Table S1), we plotted the density of mtDNA copy number calculated from the low-coverage alignments and observed a clear separation between samples sequenced from PBMCs and LCLs (Fig. S2). We removed all samples with fewer than 250 mtDNA copies per cell from the downstream analysis, to exclude samples sequenced from PBMCs and thereby limit variation due to DNA source. After removing such samples, the correlation coefficient between mtDNA copy number from low-coverage and high-coverage alignments is 0.71, as opposed to 0.66 before removing them.
Global ancestry and mtDNA haplogroup
We downloaded the 1000 Genomes phase 3 vcf files and retained individuals who belonged to the ACB, ASW, CEU, CLM, MXL, PEL, PUR, or YRI populations in subsequent analyses. Global ancestry was calculated using ADMIXTURE 55. For this purpose, we merged the 1000 genomes genotype data with the genotype data from Native American groups published by Mao and colleagues 56. We included SNPs that overlapped across both datasets (a total of 691,435 SNPs) and converted the genotypes to binary format for use with Plink 57,58. We further removed palindromic (A/T, G/C) SNPs to ensure strand consistency across both datasets. Subsequently, the two datasets were merged and SNPs were pruned for LD (r2 threshold of 0.1), which resulted in 88,442 SNPs. We ran unsupervised ADMIXTURE 55 analysis on this genotype dataset for k = 1, 2, 3, 4, and 5 and used the ancestry proportions for k = 3 for all downstream analyses, since it had the lowest cross-validation error (Fig. S3). The three ancestry components, C1-3, correspond to European, African, and Native American ancestry, respectively (Fig. 2A-B).
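The strand-consistency filter mentioned above is easy to illustrate. A SNP is palindromic when its two alleles are complementary bases (A/T or G/C), so the alleles read the same on either strand and a strand flip between datasets cannot be detected. The sketch below is only an illustration with hypothetical function names and toy SNP records; the study performed this step on Plink-format genotype files.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def is_palindromic(allele1, allele2):
    """Return True for A/T and G/C SNPs, whose strand is ambiguous."""
    return COMPLEMENT.get(allele1.upper()) == allele2.upper()

def drop_palindromic(snps):
    """Keep only SNPs whose strand can be resolved.

    `snps` is an iterable of (snp_id, allele1, allele2) tuples.
    """
    return [snp for snp in snps if not is_palindromic(snp[1], snp[2])]

# Hypothetical toy records (identifiers are placeholders).
toy_snps = [("snp1", "A", "T"), ("snp2", "A", "G"), ("snp3", "C", "G")]
print(drop_palindromic(toy_snps))
```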
We used Haplogrep version 2.02 59 to determine the mtDNA haplogroup, for individuals from the 1000 Genomes Project only, as the individuals from Mao et al. 56 were not genotyped for mitochondrial variants. All mtDNA haplogroups were called with high accuracy (minimum posterior probability of 0.78). To increase statistical power, we grouped together haplogroups belonging to the same major haplogroup (e.g. L1b1a3 was grouped with L3d1b1 under the L major haplogroup; Fig. 2C). We further grouped major haplogroups into regional groups, corresponding to pre-colonization origins (L: African; A, B, C, D: Native American; H, J, K, T, U, V, W: European) (Figs. 2C-D, S4). We excluded two individuals (HG01272 and NA19982), whose mtDNA was predicted to belong to the M haplogroup most frequently found in South Asia. This way, both nuclear and mitochondrial ancestry was categorized into only three regional groups: Native American, European, and African (Fig. 2D).
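The two-level grouping described above (full haplogroup to major haplogroup to regional group) can be expressed as a small lookup. This is a hedged sketch: the function name is hypothetical, and it assumes that the first letter of a haplogroup label identifies its major haplogroup. That holds for the labels listed in the text but not for every label in the mtDNA phylogeny, so other labels (for example composite names such as HV) would need explicit handling.

```python
# Regional groups as listed in the text.
REGION_BY_MAJOR_HAPLOGROUP = {
    "L": "African",
    **{h: "Native American" for h in "ABCD"},
    **{h: "European" for h in "HJKTUVW"},
}

def regional_group(haplogroup):
    """Map a haplogroup label (e.g. 'L1b1a3') to a regional group.

    Assumption: the first letter gives the major haplogroup. Returns None
    for haplogroups outside the three regional groups (e.g. M), which the
    text excludes from the analysis.
    """
    major = haplogroup[:1].upper()
    return REGION_BY_MAJOR_HAPLOGROUP.get(major)

print(regional_group("L1b1a3"), regional_group("L3d1b1"))
```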
Local ancestry
Local ancestry for autosomes was generated using RFMix 60 by the 1000 Genomes Project admixture working group as described by Martin and colleagues 61 (https://personal.broadinstitute.org/armartin/tgp_admixture/snp_pos/). For downstream analyses, we masked out regions of the genome where local ancestry was called with a maximum posterior probability below 0.9.
Sex-biased admixture
The relative contribution of males and females from each of the three relevant ancestral groups (African, European, and Native American) was inferred by comparing the ancestry fractions estimated from the X chromosome and the autosomes using the approach described in 62. If $f_i$ is the proportion of ancestors of the admixed group who were female and from population i, and $m_i$ is the proportion who were male and from population i, we assume that for each admixed group (e.g. PUR, CLM etc.), $f_i + m_i = A_i$, the mean autosomal ancestry fraction from population i, where $i \in$ {African, Native American, European}. Furthermore, we assume that $\sum_i f_i = \sum_i m_i = 0.5$. Thus, for values of $f_i$ and $m_i$, the expected ancestry fraction for the X chromosome in the population, $X_i$, is 63:
$$X_i = \frac{2 f_i + m_i}{1.5} \qquad (3)$$
We performed a grid search for the values of $f_i$ and $m_i$ that sum to $A_i$ and minimize the squared deviation between $X_i$, predicted using equation (3), and the mean ploidy-adjusted X-chromosomal ancestry fraction inferred from genotype data. Estimated values of $f_i$ and $m_i$, and confidence intervals around these estimates generated by bootstrapping, are shown in Fig. S5.
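A grid search of this kind could look like the sketch below. It rests on an assumed parameterization that should be checked against the original equations: female and male contributions from each source population add up to its mean autosomal ancestry fraction, female and male contributions each total 0.5 across populations, and the expected X-chromosomal ancestry is (2 × female + male) / 1.5, because females pass on two X chromosomes for every one passed on by males. Function names, the grid resolution, and the toy inputs are hypothetical.

```python
def expected_x(female, male):
    """Expected X-chromosomal ancestry under the assumed model."""
    return (2.0 * female + male) / 1.5

def grid_search(autosomal, x_observed, steps=100):
    """Fit female/male contributions for exactly three source populations.

    `autosomal` and `x_observed` map population name -> ancestry fraction;
    autosomal fractions are assumed to sum to 1. Female contributions are
    placed on a grid that sums to 0.5; the male contribution of each
    population is its autosomal fraction minus its female contribution.
    """
    pops = list(autosomal)
    if len(pops) != 3:
        raise ValueError("this sketch handles exactly three populations")
    best, best_err = None, float("inf")
    for i in range(steps + 1):
        for j in range(steps + 1 - i):
            females = [0.5 * i / steps, 0.5 * j / steps,
                       0.5 * (steps - i - j) / steps]
            # Skip grid points that would imply a negative male contribution.
            if any(f > autosomal[p] + 1e-12 for f, p in zip(females, pops)):
                continue
            err = sum(
                (expected_x(f, autosomal[p] - f) - x_observed[p]) ** 2
                for f, p in zip(females, pops)
            )
            if err < best_err:
                best, best_err = females, err
    if best is None:
        raise ValueError("no admissible grid point found")
    return {p: {"female": f, "male": autosomal[p] - f}
            for f, p in zip(best, pops)}

# Hypothetical toy example: generate X fractions from known contributions
# and check that the search recovers them.
autosomal = {"African": 0.2, "European": 0.6, "Native American": 0.2}
true_female = {"African": 0.15, "European": 0.2, "Native American": 0.15}
x_obs = {p: expected_x(true_female[p], autosomal[p] - true_female[p])
         for p in autosomal}
fit = grid_search(autosomal, x_obs)
print(fit["European"])
```

Confidence intervals like those mentioned in the text could be obtained by resampling individuals, recomputing the mean autosomal and X-chromosomal fractions, and repeating the search.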
Simulation of drift since admixture
We simulated the expected amount of drift in local ancestry since admixture in Puerto Ricans and Colombians using a simple hybrid-isolation demographic model 64,65 where all three ancestry groups mix at some time in the past in proportions equal to the mean global ancestry averaged across individuals in the population. The resulting population is then allowed to mate randomly for g generations, simulated by drawing 2N autosomes, and N/2 copies of mtDNA, in each generation with the probability of drawing a locus of European, African, or Native American ancestry determined by the relative ancestry proportions in the previous generation. After g generations, the final ancestry frequency at each locus, averaged across individuals in the population, is recorded. We repeated this process 10,000 times to simulate the amount of drift for 10,000 independent loci. We used 17 generations for g in PUR and 14 generations for g in CLM, similar to values estimated for these populations by Gravel et al. (2013), and 1,250 for N following Tang et al. (2007). While our model is simple and does not reflect the true demographic history of the populations, which likely involved continued gene flow for many generations as well as complex, non-random mating patterns, our goal is not to infer the true demographic history of these populations but solely to simulate the amount of drift in local ancestry since admixture. As we show in Fig. S6, the simulated data match the observed distribution of local ancestry quite well.
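The hybrid-isolation simulation described above amounts to repeated multinomial resampling of ancestry labels. The following sketch shows the idea for one locus at a time. The function names and the starting proportions are hypothetical placeholders rather than the values used in the study, and the number of loci is kept small so that the pure-Python example runs quickly.

```python
import random
from collections import Counter

def simulate_locus(start_freqs, n_copies, generations, rng):
    """Drift of ancestry frequencies at one locus.

    Each generation, `n_copies` copies are drawn with probabilities equal
    to the ancestry frequencies of the previous generation.
    """
    labels = list(start_freqs)
    freqs = [start_freqs[label] for label in labels]
    for _ in range(generations):
        counts = Counter(rng.choices(labels, weights=freqs, k=n_copies))
        freqs = [counts[label] / n_copies for label in labels]
    return dict(zip(labels, freqs))

def simulate_drift(start_freqs, n_individuals, generations, n_loci,
                   mtdna=False, seed=0):
    """Simulate independent loci: 2N autosomal copies, or N/2 mtDNA copies."""
    rng = random.Random(seed)
    n_copies = n_individuals // 2 if mtdna else 2 * n_individuals
    return [simulate_locus(start_freqs, n_copies, generations, rng)
            for _ in range(n_loci)]

# Hypothetical starting proportions (placeholders, not study estimates).
start = {"African": 0.15, "European": 0.70, "Native American": 0.15}
autosomal_loci = simulate_drift(start, n_individuals=1250, generations=17, n_loci=20)
mtdna_loci = simulate_drift(start, n_individuals=1250, generations=17, n_loci=20, mtdna=True)
print(autosomal_loci[0], mtdna_loci[0])
```

Because mtDNA is sampled from only N/2 copies per generation, its simulated frequencies spread out much more than the autosomal ones over the same number of generations.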
Local ancestry enrichment in nuclear-encoded mitochondrial genes
First, we calculated the frequency of Native American, European, and African ancestry at every SNP in the genome by averaging across all individuals in a population. We then subtracted the mean ancestry fraction across all SNPs from these frequencies to calculate the deviation in local ancestry at each SNP. We used the list of genes from MitoCarta 2.0 and split the list into mitochondrial and non-mitochondrial genes according to the classification provided 66. We further split the mitochondrial genes into two subsets to separately analyze a list of 167 genes curated by Sloan et al. (2015), which code for proteins that are part of the replication and transcription machinery of mtDNA, and the ribosomal and OXPHOS complexes in the mitochondria. We classified this list of genes as ‘High-mt’ and the remaining mitochondrial genes as ‘Low-mt’. An unweighted block bootstrap approach was used to generate the distribution of mean deviation in local ancestry for each gene category. We generated windows of 5 Mb spanning each gene (± 2.5 Mb on either side of a gene’s midpoint) to take into account LD among SNPs. Subsequently, we used bedtools 67 to intersect SNPs, at which local ancestry deviation was calculated previously, with these windows. For each gene category, 167 windows were sampled with replacement, to match the number of genes in the smallest category (i.e. High-mt), and the mean ancestry deviation was calculated, first for each window, and then across windows. This process was repeated 1,000 times to generate a distribution of mean deviation in local ancestry for each gene category.
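The block bootstrap described above can be sketched as follows, assuming the per-SNP ancestry deviations have already been grouped by gene window (the step done with bedtools in the text). Function names, the percentile helper, and the toy data are hypothetical.

```python
import random
from statistics import mean

def block_bootstrap(window_deviations, n_windows=167, n_reps=1000, seed=0):
    """Bootstrap distribution of mean local-ancestry deviation.

    `window_deviations` holds one list of per-SNP deviations per gene
    window. Windows, not SNPs, are resampled, so that SNPs in LD within a
    window stay together. The mean is taken first within each window and
    then across the sampled windows.
    """
    rng = random.Random(seed)
    window_means = [mean(w) for w in window_deviations if w]
    return [mean(rng.choices(window_means, k=n_windows)) for _ in range(n_reps)]

def percentile_interval(values, lower=2.5, upper=97.5):
    """Simple percentile interval (e.g. a 95% bootstrap interval)."""
    ordered = sorted(values)
    def pick(q):
        return ordered[min(len(ordered) - 1, int(q / 100 * len(ordered)))]
    return pick(lower), pick(upper)

# Hypothetical toy data: 300 windows with small random deviations.
toy_rng = random.Random(1)
toy_windows = [[toy_rng.gauss(0.0, 0.02) for _ in range(50)] for _ in range(300)]
distribution = block_bootstrap(toy_windows)
print(percentile_interval(distribution))
```

Sampling the same number of windows (167) for every gene category keeps the widths of the resulting distributions comparable across categories of very different size.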
Results
Nuclear and mitochondrial ancestry proportions in the admixed populations
To study mito-nuclear incompatibility and coevolution in humans, we analyzed genomic alignments and genotype data from six admixed populations that are part of the 1000 Genomes Project 68: (1) African Americans from the Southwest (ASW); (2) African Caribbeans from Barbados (ACB); (3) Colombians from Medellin, Colombia (CLM); (4) Mexicans from Los Angeles (MXL); (5) Peruvians from Lima, Peru (PEL); and (6) Puerto Ricans from Puerto Rico (PUR). We also used data from Utah residents of Northern and Western European ancestry (CEU) and from Yorubans from Ibadan, Nigeria (YRI), who serve as proxies for the European and African source populations, respectively. To represent the Native American ancestry component, we analyzed previously published genotype data 56 from the following four groups: (1) Aymara; (2) Nahuan; (3) Quechua; and (4) Maya. The data set is summarized in Table 1.
We determined global ancestry (the overall contribution of African, European, and Native American ancestry to the nuclear genome) of each analyzed individual using ADMIXTURE (Fig. 2A-B; see Methods for details). In agreement with previous studies 44,61, the individuals from admixed populations derive their genetic ancestry from three primary source populations: Native American, European, and African (ADMIXTURE cross-validation error is lowest at k = 3; Fig. S3; see Methods). The proportion of ancestry from each source population varies among the admixed populations (Fig. 2B; Tables S2 and S3) because of differences in admixture histories 44,45. We also determined the mtDNA haplogroup for each individual (Table S4; see Methods for details). Among the admixed individuals, African (L) and Native American (A, B, C, and D) mtDNA haplogroups are more frequent than European haplogroups (H, J, K, T, U, V, and W; Fig. S4), consistent with female bias in the non-European contribution 69.
mtDNA copy number decreases with increasing mito-nuclear DNA discordance
We hypothesized that increasing discordance between nuclear and mitochondrial ancestry in admixed individuals will lead to an increase in the degree of incompatibility (Fig. 1), for instance, between mtDNA origins of replication and nuclear-encoded mtDNA replication machinery 70,71, and therefore to a decrease in mtDNA replication efficiency. If our hypothesis is correct, then mtDNA copy number should decrease with increasing degree of mito-nuclear DNA discordance (Fig. 1A). To evaluate this prediction, we determined the mtDNA copy number from sequence alignments of DNA extracted from lymphoblastoid cell lines (LCLs; see Methods). We computed mtDNA copy number for each individual from the six admixed populations as well as from the CEU and YRI populations (which were used as proxies of the European and African source populations, respectively; there are no ‘non-admixed’ Native Americans in the 1000 Genomes Project dataset 68). An advantage of using LCLs to study mtDNA copy number variation is that they exhibit high mtDNA content and elevated expression of genes involved in mtDNA replication and transcription, as well as of respiratory genes, consistent with elevated mitochondrial biogenesis 72. Furthermore, because LCLs are maintained following standard protocols in a laboratory, differences in the environments of the individuals from whom they are derived are unlikely to systematically confound our analysis of mtDNA copy number.
We found that mtDNA copy number decreases as nuclear ancestry becomes increasingly dissimilar to mtDNA ancestry (Fig. 3A and Fig. 4), consistent with our hypothesis (Fig. 1A). To obtain this result, we regressed mtDNA copy number against the degree of discordance between mtDNA ancestry and nuclear ancestry, as measured by the fraction of the nuclear genome that is from a different geographical origin than the mtDNA haplogroup. For instance, for individuals with Native American mtDNA haplogroups, mito-nuclear DNA discordance is the proportion of nuclear ancestry that is not Native American (i.e. African plus European). There is a significant negative correlation between mtDNA copy number and mito-nuclear DNA discordance in admixed individuals (Fig. 3A; Beta = −0.193, one-sided P-value = 1.14 × 10^−4, r = −0.19). The intercept of this regression, i.e. the predicted copy number when mito-nuclear DNA discordance is zero, is similar to the median mtDNA copy number in the individuals from source populations, CEU and YRI, who are not admixed (Fig. 3B). The negative correlation between mtDNA copy number and mito-nuclear DNA discordance in admixed individuals is consistent in direction across mtDNA haplogroups from three different geographic origins, Native American (Beta = −1.06; P = 1.53 × 10^−5), African (Beta = −0.63; P = 0.026), and European (Beta = −0.30; P = 0.370), even though it is not always statistically significant. Moreover, in each case, copy number for mtDNA of one geographic origin decreases with increasing nuclear ancestry from each of the other two geographic origins (panels outside of the top-left to bottom-right diagonal in Fig. 4). For instance, for individuals with Native American mtDNA haplogroups, the mtDNA copy number decreases with increase in both African and European ancestry. Conversely, as the ancestry between mtDNA and the nuclear genome becomes more similar, mtDNA copy number increases (top-left to bottom-right diagonal in Fig. 4).
The lack of power, especially in admixed individuals carrying European haplogroups, is likely due to the limited female European contribution (Fig. 4). Overall, our results show that a phenotype, mtDNA copy number, is negatively correlated with mito-nuclear DNA discordance, consistent with mito-nuclear incompatibility in admixed individuals (Fig. 1A).
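To make the discordance measure concrete, the sketch below computes it for a few individuals and fits an ordinary least-squares line of copy number against discordance. It is an illustration only: the function names and toy values are hypothetical, and the analysis in the text may differ in detail (for example in how variables were scaled or which covariates were used).

```python
from statistics import mean

def discordance(mt_region, nuclear_ancestry):
    """Fraction of the nuclear genome NOT from the mtDNA's regional origin."""
    return 1.0 - nuclear_ancestry[mt_region]

def ols_fit(x, y):
    """Ordinary least-squares slope and intercept of y on x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

# Hypothetical toy individuals: (mtDNA regional group, nuclear ancestry
# fractions, mtDNA copy number). Values are placeholders.
individuals = [
    ("Native American",
     {"African": 0.1, "European": 0.3, "Native American": 0.6}, 700.0),
    ("African",
     {"African": 0.8, "European": 0.15, "Native American": 0.05}, 760.0),
    ("Native American",
     {"African": 0.2, "European": 0.7, "Native American": 0.1}, 600.0),
]
x = [discordance(region, ancestry) for region, ancestry, _ in individuals]
y = [copies for _, _, copies in individuals]
slope, intercept = ols_fit(x, y)
print(x, slope < 0)
```

The intercept of such a fit is the predicted copy number at zero discordance, which is the quantity compared with the non-admixed source populations in the text.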
Sex-bias in admixture inferred from the nuclear and mitochondrial genomes
According to our second hypothesis, we expect mito-nuclear DNA discordance at a population level to increase with increasing degree of sex bias in admixture (Fig. 1B). For example, if population 1 contributes only females and population 2 contributes only males to the admixed population (third admixture scenario in Fig. 1B), the expected percentage of mtDNA ancestry from population 1 is 100% compared to 50% for autosomal loci. Our results corroborate previous studies 44–46,61, suggesting that gene flow in the Americas was sex-biased. To quantify the degree of sex bias from a source population, we estimated the male and female contributions to each admixed group from that source population, by comparing the global ancestry proportions on the autosomes and the X chromosome (Figs. 5 and S5; see Methods). In all of the six admixed populations, the European contribution is male-biased and non-European (African and Native American) contribution is female-biased. For example, there were likely more males, than females, who contributed European ancestry to Peruvians (Fig. 5).
We further compared the observed proportion of mtDNA haplogroups of African, European, or Native American origin with the estimated proportion of contributing females from these source populations as inferred from the nuclear genome (Fig. 6). Both independently measure the female contributions and thus are expected to be similar. We found that, in most admixed populations analyzed, the observed frequency of mtDNA haplogroups falls within the expected distribution (generated using a non-parametric bootstrap, see Methods) of the proportion of females from each source population (Fig. 6). However, in Colombians (CLM) and Puerto Ricans (PUR), the frequencies of Native American mtDNA haplogroups are much higher and, concomitantly, the frequencies of European haplogroups are much lower, than expected (Fig. 6). At least two factors could explain this result. First, since mtDNA represents a single genealogical history, it yields a ‘noisier’ estimate of the proportion of females from each parental population than the estimate based on autosomal and X-chromosomal loci, which represent multiple genealogical histories because of recombination and independent assortment. Second, we expect larger fluctuations in mtDNA ancestry as a result of drift because of its smaller effective population size compared to that of autosomal or X-chromosomal loci. To test whether genetic drift since admixture can account for the deviation in mtDNA frequency in Puerto Ricans and Colombians, we simulated the amount of drift expected for mtDNA in these populations based on their admixture history. In both Colombians and Puerto Ricans, drift since admixture is not sufficient to account for the observed deviation in mtDNA ancestry (Fig. S7).
Selection on local ancestry at nuclear-encoded mitochondrial genes
Based on our second hypothesis, we expected selection at nuclear-encoded mitochondrial genes to favor ancestry from the source population contributing the highest proportion of females, to compensate for potentially maladaptive mito-nuclear combinations resulting from recent admixture. For example, in Puerto Ricans, since there is a high frequency of Native American mtDNA, we expect selection to favor Native American ancestry at nuclear-encoded mitochondrial genes. Our ability to detect such a signature at individual loci is limited because of: (1) the large number of genes involved in mitochondrial function (>1000) 2, (2) the low amount of genetic differentiation among human populations (Fst ≈ 0.1) 73, (3) only a relatively recent admixture history in the populations analyzed 44,45,61, and (4) the relatively small sample size of our data set. However, we might be able to detect systematic shifts in the ancestry frequency for nuclear-encoded mitochondrial genes as a group. To test for such a signal of ancestry enrichment at nuclear-encoded mitochondrial loci, for each admixed population analyzed, we first calculated the deviation in local ancestry at each SNP by subtracting the global Native American, European, and African ancestry proportion in that population (see Methods). Subsequently, we downloaded a list of nuclear genes from MitoCarta 2.0 66,74 and split them into mitochondrial (N = 960) vs. non-mitochondrial (N = 17,456) based on their classification (Table S5). The mitochondrial genes encode proteins with experimental evidence of mitochondrial localization, whereas the non-mitochondrial genes have no such evidence 66. We next followed a published approach 30 and further split the 960 mitochondrial genes into two subsets: 167 high-confidence, or ‘High-mt’, genes (genes encoding proteins that are part of the mtDNA replication and transcription machinery, and of ribosomal and OXPHOS complexes) and the remaining 793 ‘Low-mt’ genes (Table S5).
We used a block bootstrap approach to generate a distribution of the mean deviation in local ancestry for each functional gene category (for 167 High-mt, 793 Low-mt, and 17,456 non-mitochondrial genes, see Methods for more details). Based on our hypothesis, local ancestry should deviate from neutral expectations in favor of the source population contributing the highest proportion of mtDNA (Fig. 1B).
We find that in three out of six admixed populations analyzed, the mean ancestry of nuclear-encoded mitochondrial genes does not significantly deviate from expectation (the 95% bootstrapped confidence interval spans the zero line; Fig. 7), consistent with no evidence of selection for local ancestry at such genes. However, we found a significant enrichment in Native American ancestry at High-mt genes in Puerto Ricans, and an enrichment in African ancestry at High-mt genes in African Americans (Fig. 7). Because Native American mtDNA haplogroups are more frequent in Puerto Ricans, and African mtDNA haplogroups are more frequent in African Americans (Fig. 5), these results are consistent with the predictions of our hypothesis (Fig. 1B). However, we also observe a significant enrichment in European ancestry at High-mt genes in Mexicans (Fig. 7), which contradicts our hypothesis. This lack of a consistent pattern of enrichment across admixed populations paints a complex picture of the selection regimes acting on nuclear-encoded mitochondrial genes. It should be noted that the systematic shifts in ancestry that we observe for the High-mt genes are inconsistent with sampling artifacts because we do not observe such patterns for non-mitochondrial genes (category ‘Non-mt’ in Fig. 7).
Discussion
The potential contribution of epistatic interactions between nuclear- and mtDNA-encoded mitochondrial genes to phenotypic variation in human populations is poorly understood. Admixed populations provide a unique opportunity to investigate this question. Because of the difference in the mode of inheritance between mitochondrial and nuclear DNA, and the frequently sex-biased nature of admixture, the ancestry of the mitochondrial and nuclear genomes is frequently mismatched in admixed individuals. Such discordance can potentially disrupt fine-tuned interactions between mtDNA-encoded and nuclear-encoded mitochondrial genes. We investigated the potential consequences of mito-nuclear DNA discordance in six admixed American populations represented in the 1000 Genomes Project. Due to complex admixture history 44,45,61, the degree of discordance in ancestry between mtDNA and nuclear DNA differs both across populations and among individuals from the same population.
First, we tested for effects of mito-nuclear DNA discordance on the phenotype of admixed individuals. We found that mtDNA copy number, a biomarker for several phenotypes 48, decreases as the nuclear and mtDNA become increasingly dissimilar in ancestry. Because mito-nuclear compatibility is important for regulation of mtDNA copy number, its reduction due to mito-nuclear DNA discordance might reflect the incompatibility between the origins of mtDNA replication and nuclear-encoded proteins involved in mtDNA replication 75. Interestingly, several of the studied haplogroups (B, D, H, T, U and J) have fixed differences in the experimentally established mtDNA recognition sites for proteins of the replication machinery 76. These substitutions could lead to population differences in mtDNA binding affinity of polymerase-γ (POLG), mtDNA helicase (TWINKLE), and mitochondrial single strand binding proteins (mtSSB), three of the major proteins involved in mtDNA replication 71, which themselves also exhibit substantial sequence diversity in humans (unpublished data from the Makova Lab). Whether this diversity in the mtDNA recognition sites and in the replication machinery is the result of mito-nuclear co-evolution remains to be tested. Future functional experiments should validate effects of mtDNA haplogroup and nuclear DNA combinations on mtDNA copy number directly. This can be performed, for instance, in cybrids, i.e. cellular hybrid lines carrying the same nuclear genetic background but different mtDNA haplotypes 77. Cybrids carrying different mito-nuclear ancestry combinations would also be helpful in elucidating whether mito-nuclear DNA discordance leads to higher mutation and heteroplasmy burden in mtDNA. Nevertheless, we have shown that mito-nuclear DNA discordance contributes to phenotypic variation in admixed individuals, despite relatively low differentiation among human populations 73,78.
Even though a number of biological and technical factors can influence mtDNA copy number, they are unlikely to bias our results in a systematic way. Mitochondrial biogenesis and decay are dynamic processes which can change in response to environmental factors, such as nutrient availability, oxidative stress, and temperature 79–81. While such factors can covary with ancestry in admixed populations (e.g. socioeconomic status, a predictor of stress, is highly correlated with ancestry in Latino populations 82), we do not think that they have significantly affected our results. This is because we measure mtDNA copy number in lymphoblastoid cell lines (derived from peripheral blood cells) that have been cultured under standard laboratory conditions for long periods of time. Thus, environmental variation among cells is minimal and does not reflect the initial variability that existed among samples when they were first collected. Additionally, even though many environmental variables can covary with ancestry in admixed populations, they could not have confounded our results as the ancestry proportions of the cells were unknown when they were first cultured. Thus, biological variation, minimal due to standardized cell culturing, and technical variation, due to sampling and sequencing, would add some noise to the measurement of mtDNA content rather than a systematic bias and would not lead to the observed correlation between mito-nuclear DNA discordance and mtDNA copy number, a pattern consistent across mtDNA haplogroups. This correlation should be replicated and explored in future studies, ideally with mtDNA copy number measured across biological and technical replicates to reduce noise. It would also be interesting to test the effect of mito-nuclear DNA discordance on other mitochondrial phenotypes such as mitochondrial morphology and rate of ATP production.
Next, we leveraged sex-biased admixture to study effects of mito-nuclear incompatibility at the population level. Consistent with previous studies 44–46,61, we found a significant sex bias in admixture for all six admixed populations studied. In particular, more European males than females, and more African and/or Native American females than males, contributed to admixture. In most admixed populations we found a high level of concordance between the female contribution estimated from nuclear genetic markers and that estimated from mtDNA, for each source population. However, in Colombians and Puerto Ricans, the proportion of Native American ancestry in mtDNA is significantly higher than expected based on the ancestry information in the nuclear genome. This deviation cannot be explained by the genetic drift that mtDNA has experienced since admixture owing to its smaller effective population size. It could instead be due to selection for the Native American haplogroups in Colombians and Puerto Ricans, if the Native American mtDNA was better adapted to the environment. Indeed, climate adaptation may have played a role in mtDNA diversification across human populations 83,84. Differences in the local environment among admixed populations might explain why we do not observe similar deviations in the other admixed populations.
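To make the comparison between nuclear and mtDNA estimates concrete, the sketch below shows one simple way sex-specific contributions can be recovered from nuclear data. It assumes a single-pulse (hybrid-isolation) model in its long-run limit, where the expected autosomal ancestry from a source is (s_f + s_m)/2 and the expected X-chromosomal ancestry is (2 s_f + s_m)/3; here s_f and s_m are the fractions of founding females and males drawn from that source. The function names and example numbers are hypothetical, and the estimator actually used is the one defined in Methods.

```python
def expected_ancestry(s_f, s_m):
    """Expected (autosomal, X) ancestry from one source population.

    Assumes a single admixture pulse followed by random mating, with the
    X-chromosomal value taken at its long-run limit.
    """
    h_autosomal = (s_f + s_m) / 2.0
    h_x = (2.0 * s_f + s_m) / 3.0
    return h_autosomal, h_x

def sex_specific_contributions(h_autosomal, h_x):
    """Invert the two expectations above to recover (s_f, s_m)."""
    s_f = 3.0 * h_x - 2.0 * h_autosomal
    s_m = 4.0 * h_autosomal - 3.0 * h_x
    return s_f, s_m

# Hypothetical source contributing 10% of founding females, 70% of males.
h_a, h_x = expected_ancestry(0.10, 0.70)
s_f, s_m = sex_specific_contributions(h_a, h_x)
print(h_a, h_x, s_f, s_m)
```

Under these assumptions, and in the absence of drift and selection, the mtDNA haplogroup frequency attributable to a source is expected to match s_f, which is the quantity compared with the mtDNA-based estimate.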
The assumed model of admixture dynamics has several limitations, which could lead to biased estimates of the female and male contributions. Specifically, we assume a hybrid-isolation model with equal reproductive variance and similar generation times between males and females. First, models that incorporate continuous gene flow are likely to yield slightly different estimates, especially if admixture occurred recently, i.e. within the last five generations (Goldberg et al. 2015). Because admixture for the populations used in this study started much earlier (>10 generations ago 44,45,61), this is not a major concern in our case. Second, shorter generation times in females than in males would result in a smaller effective population size for the mtDNA compared to the nuclear genome 85. This would lead to stronger drift in mtDNA, which might explain the observed discrepancy in ancestry proportions between mtDNA and the nuclear genome in Colombians and Puerto Ricans without invoking selection. Third, men typically have higher variance in reproductive success than women 86,87, which would increase the effective population size of the mtDNA relative to that of the nuclear genome and might render the observed discrepancies in Colombians and Puerto Ricans non-significant. A detailed discussion of how these competing processes affect the inference of sex bias in admixture dynamics is beyond the scope of this paper, but needs to be addressed in future studies. Despite these concerns, it is clear that the frequency of non-European mtDNA haplogroups is much higher than the frequency of non-European ancestry at nuclear loci, in all six populations.
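The direction of the third effect can be illustrated with textbook approximations for effective population size. These are assumptions of this sketch, not the model used in our analysis: autosomal Ne is taken as 4·Nm·Nf/(Nm + Nf) and mtDNA Ne as Nf/2, where Nm and Nf are the effective numbers of breeding males and females. Higher variance in male reproductive success is represented simply by a smaller Nm; the numbers are hypothetical.

```python
def ne_autosomal(n_males, n_females):
    """Autosomal effective size for unequal numbers of breeding males and females."""
    return 4.0 * n_males * n_females / (n_males + n_females)

def ne_mtdna(n_females):
    """Effective size of a haploid, maternally inherited locus."""
    return n_females / 2.0

def mt_to_autosomal_ratio(n_males, n_females):
    """Ratio of mtDNA to autosomal effective population size."""
    return ne_mtdna(n_females) / ne_autosomal(n_males, n_females)

# Equal numbers of breeding males and females.
equal_ratio = mt_to_autosomal_ratio(1000, 1000)
# Fewer effective breeding males, e.g. due to higher male reproductive variance.
skewed_ratio = mt_to_autosomal_ratio(500, 1000)
print(equal_ratio, skewed_ratio)
```

With equal numbers of breeding males and females the ratio is one quarter; reducing the effective number of males raises it, so mtDNA would experience less drift relative to the nuclear genome than the equal-variance model assumes.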
Finally, we hypothesized that the discordance in the ancestry frequency between mtDNA and nuclear loci would lead to selection on nuclear-encoded mitochondrial genes due to mito-nuclear coevolution. In three out of the six admixed populations studied (African Caribbeans, Colombians, and Peruvians), we found the ancestry of nuclear-encoded mitochondrial genes to be consistent with the expectation based on female contribution estimated from mtDNA data (Fig. 7). This could mean that there has been no selection on mito-nuclear interactions in these populations or that we do not have power to detect it because of drift and/or relatively limited sample size. We observed an enrichment of Native American ancestry in Puerto Ricans, and an enrichment of African ancestry in African Americans, at nuclear-encoded mitochondrial genes. Because Puerto Ricans predominantly carry Native American mtDNA haplogroups, whereas African Americans primarily carry African haplogroups, this observation is consistent with our expectation of selection against mito-nuclear incompatibility in admixed populations. However, we also found a significant enrichment of European ancestry in Mexicans, another population with predominantly Native American mtDNA haplogroups. This observation is inconsistent with selection against mito-nuclear incompatibility in admixed populations. This complex picture among different admixed populations suggests that other competing selective pressures on mitochondrial genes might be involved and that it may not be straightforward to detect signatures of mito-nuclear incompatibility from deviations in local ancestry alone.
Our ability to detect mito-nuclear incompatibility signatures is influenced by several factors including degree of sex bias, time since admixture, effective population size, selection strength, and the number of loci under selection. One potential limitation of this analysis is that we assume that all nuclear-encoded mitochondrial genes have equal effect sizes for mitochondrial function. Since mitochondrial function is a highly complex trait, this assumption is likely incorrect. A more accurate way of testing for selection at these genes would be to weight the contribution of each locus by its effect size. Unfortunately, we do not know what these effect sizes are because genome-wide association studies of mitochondrial phenotypes, such as mtDNA copy number and rate of ATP production, are yet to be conducted in humans. While systematic collection of such data for large cohorts of individuals is pending, it would also be highly informative to explore the effects of mito-nuclear interactions on various health-related phenotypes in large-scale datasets such as the UK Biobank 88.
In conclusion, our results demonstrate that discordance between mtDNA and nuclear ancestry for mitochondrial genes can affect the phenotype of admixed individuals. Note, however, that the observed effect on mtDNA copy number is relatively small and needs to be explored in other cells and tissue types. Other phenotypes potentially affected by mito-nuclear DNA discordance, e.g. ATP production, should be examined as well. Mito-nuclear DNA discordance contributes to disease phenotypes of non-admixed individuals (reviewed in 11); we therefore expect this phenomenon to contribute even more to disease variation in admixed individuals, which needs to be evaluated in future studies. Such evaluation is also critical for making advances in mitochondrial replacement therapy (MRT), a technique in which the mtDNA carrying disease-associated mutations in a patient’s oocyte is replaced with mtDNA from a healthy donor oocyte 89–91. Despite the success of MRT, many human and non-human primate embryos created via mitochondrial replacement do not develop normally 89. Mitochondrial replacement can also lead to detrimental effects on growth, development, respiration, metabolism, aging, fertility, and survival in non-primate animals 89. Human cybrid lines also show variation in mtDNA copy number, ATP turnover rates, reactive oxygen species production, and expression of OXPHOS genes 92. Despite these observations, the degree to which the mitochondrial haplogroup of a donor should match the genetic background of the ‘nuclear’ parents in MRT in humans remains unanswered. The answer to this question is even less clear for admixed individuals whose nuclear genomes are a mix of ancestry from different populations.
Our work highlights the potential of studying admixed individuals to better understand phenotypic effects of mito-nuclear DNA discordance, which will be useful in elucidating MRT-associated risks and evaluating disease susceptibility in contemporary admixed and non-admixed populations 11,93.
Description of Supplemental Data
Supplementary data include eight figures and five tables.
Declaration of Interests
The authors declare no competing interests.
Acknowledgments
We thank Rasmus Nielsen, Mark Shriver, and William Chase for their comments on the manuscript. This project was supported by a seed grant awarded to AAZ and KDM from the Center of Human Evolution and Development (CHED) at The Pennsylvania State University, and by a grant from NIH (R01GM116044). Additional funding was provided by Penn State Eberly College of Sciences, The Huck Institute of Life Sciences at Penn State, and the Penn State Institute for CyberScience, as well as, in part, under grants from the Pennsylvania Department of Health using Tobacco Settlement and CURE Funds. The department specifically disclaims any responsibility for any analyses, interpretations, or conclusions.