ABSTRACT
Mitochondrial DNA copy number (mtDNA-CN) has been associated with a variety of aging-related diseases and with all-cause mortality. However, the mechanism by which mtDNA-CN influences disease is not currently understood. One such mechanism may be through regulation of nuclear gene expression via the modification of nuclear DNA (nDNA) methylation. To investigate this hypothesis, we assessed the relationship between mtDNA-CN and nDNA methylation in 2,507 African American (AA) and European American (EA) participants from the Atherosclerosis Risk in Communities (ARIC) study using the Infinium Human Methylation 450K Beadchip (485,764 CpGs). Thirty-four independent CpGs were associated with mtDNA-CN at genome-wide significance (P<5×10−8). To validate our findings, we assayed an additional 2,528 participants from the Cardiovascular Health Study (CHS) (N=533) and Framingham Heart Study (FHS) (N=1,995). Meta-analysis across all cohorts validated 6 mtDNA-CN associated CpGs at genome-wide significance (P<5×10−8). Additionally, over half of these CpGs were associated with phenotypes known to be associated with mtDNA-CN, including CHD, CVD, and mortality. Experimental modification of mtDNA-CN through knockout via CRISPR-Cas9 of TFAM, a regulator of mtDNA replication, demonstrated that modulation of mtDNA-CN directly drives changes in nDNA methylation and gene expression of specific CpGs and nearby transcripts. Strikingly, the ‘neuroactive ligand receptor interaction’ KEGG pathway was found to be highly overrepresented in the ARIC cohort (P=5.24×10−12), as well as the TFAM knockout methylation (P=4.41×10−4) and expression (P=4.30×10−4) studies. These results demonstrate that changes in mtDNA-CN influence nDNA methylation at specific loci and result in differential gene expression of specific genes, including those acting in the ‘neuroactive ligand receptor interaction’ pathway that may impact human health and disease via altered cell signaling.
INTRODUCTION
Mitochondria are cytoplasmic organelles primarily responsible for cellular metabolism, and have pivotal roles in many cellular processes, including aging, apoptosis and oxidative phosphorylation1. Dysfunction of the mitochondria has been associated with complex disease presentation including susceptibility to disease and severity of disease2. Mitochondrial DNA copy number (mtDNA-CN), a measure of mtDNA levels per cell, while not a direct measure of mtDNA damage, is associated with mitochondrial enzyme activity and adenosine triphosphate production. mtDNA-CN is regulated in a tissue-specific manner and in contrast to the nuclear genome, is present in multiple copies per cell, with the number being highly dependent on cell type3. Further, levels of mtDNA-CN correlate with mitochondrial function4. mtDNA-CN is therefore a relatively easily attainable biomarker of mitochondrial function. Cells with reduced mtDNA-CN show reduced expression of vital complex proteins, altered cellular morphology, and lower respiratory enzyme activity5. Variation in mtDNA-CN has been associated with numerous diseases and traits, including cardiovascular disease6–8, chronic kidney disease9, diabetes10, 11, and liver disease12, 13. Lower mtDNA-CN has also been found to be associated with frailty and all-cause mortality10.
Communication between the mitochondria and the nucleus is bi-directional and it has long been known that cross-talk between nDNA and mtDNA is required for proper cellular functioning and homeostasis14, 15. Specifically, bi-directional cross-talk is essential for the maintenance and integrity of cells16, 17, and interactions between mtDNA and nuclear DNA (nDNA) contribute to a number of pathologies18, 19. However, the precise relationship between mtDNA and the nuclear epigenome has not been well defined despite a number of reports which have identified a relationship between mitochondria and the nuclear epigenome. For example, mtDNA polymorphisms have been previously demonstrated to alter nDNA methylation patterns20 and hyper- and hypo- methylation of nuclear sites has been observed in mitochondria-depleted cancer cell lines21. Additionally, differential DNA methylation in brain tissue and corresponding differential gene expression were observed between strains of mice having identical nDNA, but different mtDNA18 and reduced mtDNA-CN has been associated with inducing cancer progression via hypermethylation of nuclear DNA promoters22. Further, mtDNA-CN has been previously associated with changes in nuclear gene expression23.
Thus, gene expression changes identified as a result of mitochondrial variation may be mediated, at least in part, by nDNA methylation. Further, given that it has been well established that mtDNA-CN influences a number of human diseases, we propose that one mechanism by which mtDNA-CN influences disease may be through regulation of nuclear gene expression via the modification of nDNA methylation.
To this end, we report the results of cross-sectional analysis of this association between mtDNA-CN and nDNA methylation in 5,035 individuals from the ARIC, CHS and FHS cohorts. Further, to determine the causal direction of the association between mtDNA-CN and nDNA methylation, we present results from experimental modification of mtDNA-CN followed by assessment of nDNA methylation and gene expression profiles in mtDNA-CN depleted cell lines.
RESULTS
mtDNA-CN is associated with nuclear DNA methylation at independent genome-wide loci in cross-sectional analysis
We performed an epigenome-wide association study (EWAS) in 2,507 individuals from the Atherosclerosis Risk in Communities (ARIC) study, comprising 1,567 African American (AA) and 940 European American (EA) subjects (Figure 1, Table 1, Table S1). Thirty-four independent CpGs were significantly associated with mtDNA-CN (P<5×10−8) in a meta-analysis combining the race groups (Figure 2, Figure S1, Table 2, Table S2A). This conservative P-value cutoff was confirmed by permutation testing. In stratified analysis of ARIC AA and EA participants, we identified 23 and 15 independent CpGs at epigenome-wide significance, respectively (Figure S2, Table S2B,C). Two CpGs were shared by both race groups (cg26094004 and cg21051031). ARIC AA and EA effect sizes for significant results were strongly correlated (R2=0.49) (Figure S3). Further, 16/23 (70%) of AA cohort-identified CpGs showed the same direction of effect in EA participants (P=0.06) and 12/14 (86%) of EA cohort-identified CpGs displayed the same direction of effect in AA participants (P=0.008). Given these observations, we have focused on the ARIC results from combining both races (N=2,507) in further analyses.
Additionally, an association was observed between increased mtDNA-CN and global hypermethylation (P<2.2×10−16, ß=0.1487) in ARIC AA; however, no such association was seen in ARIC EA (P<0.77, ß=0.013) (Figure S4).
Pathway and biological process analysis displays associations with cell signaling functions and the ‘Neuroactive ligand-receptor interaction’ pathway
To assess the potential mechanism underlying the identified associations, we performed GO and KEGG pathway analysis. mtDNA-CN associated CpGs were annotated with their nearest gene. KEGG analysis identified the neuroactive ligand-receptor interaction pathway (path:hsa04080) to be the top overrepresented pathway (P=5.24×10−12, Permuted P=3.84×10−5) (Table 3a). Further, GO analyses identified a number of biological processes related to cell signaling and ligand interactions including Cell-cell signaling (P=1.42×10−3), Trans-synaptic signaling (P=1.88×10−3) and Synaptic signaling (P=1.88×10−3), among others (Table 3b). These results were confirmed by both permutation testing and through their robustness to ten different associated-CpG cutoffs (cutoff used for final analysis: 300 CpGs).
Validation of CpG associations in independent cohorts
We performed a validation study to replicate findings from the ARIC discovery population in 239 AA and 294 EA participants from the Cardiovascular Health Study (CHS) as well as 1,995 EA participants from the Framingham Heart Study (FHS), for a total of 2,528 individuals (Table 1). Seven of the 34 CpGs identified in the discovery cohort were nominally significant (P<0.05 and displaying the same direction of effect as the ARIC cohort results) (Table 2), and the effect sizes from the ARIC results and the validation meta-analysis were largely correlated (R2=0.36) (Figure 3). Overall, the results were consistent across individual cohorts (Figure S5, Table S2), and analysis of the results from the 34 CpGs across all 3 cohorts (ARIC, CHS and FHS, N=5,035) identified 6 CpGs as validated mtDNA-CN associated CpGs (P<5×10−8) (Table 2, Figure S6).
Establishing causality via TFAM knockout
mtDNA-CN is causative of changes in nuclear DNA methylation at loci of interest
To assess whether modification of mtDNA-CN drives changes to nuclear DNA methylation, we used CRISPR-Cas9 to knock out the TFAM gene, which encodes a regulator of mtDNA replication and whose knockout has been shown to reduce mtDNA-CN24. Heterozygous knockout of the TFAM gene in HEK293T cells resulted in a 5-fold reduction in the expression of TFAM, negligible protein production, and an 18-fold reduction in mtDNA-CN across three biological replicates (Figure 4). We then assayed methylation of the validated mtDNA-CN associated CpGs using the Illumina Infinium Methylation EPIC Beadchip (Table S3). Specifically, we directly assessed methylation levels for 4 of the 6 validated mtDNA-CN associated CpGs and for one surrogate CpG, because 2 CpGs were not present on the EPIC array and a reasonable surrogate was not available for one of them (see Methods). Reduction of mtDNA-CN in TFAM knockout cell lines led to subsequent site-specific changes to DNA methylation for 3 of the 5 EWAS-identified CpGs (nominally significant, P<0.05), two of which were in the expected direction of effect (reduced mtDNA-CN led to an increase in methylation) (Table 4, Figure S7). Further, two of the validated mtDNA-CN associated CpGs were differentially methylated even after Bonferroni correction (P<0.01) (Table 4). Pyrosequencing was also performed for 3 of the 6 sites (the other sites did not pass pyrosequencing quality control), which confirmed methylation changes at all assayed sites (P<0.05) in the expected direction of effect (Table S4).
Global methylation patterns did not show differences between negative control and TFAM knockout cell lines suggesting that these differences are site-specific (Figure S8).
mtDNA-CN is causative of changes in nuclear gene expression at nearby genes of interest
The same TFAM knockout and negative control cell lines were analyzed for differential gene expression nearby the methylated mtDNA-CN associated CpGs using RNA-seq (Table S5). RNA-seq resulted in expected clustering of knockout and control lines (Figure S9). All nominally differentially expressed genes (P<0.05) within 1Mb of the TFAM knockout differentially methylated CpGs were identified (Table S6). Five genes nearby the three differentially methylated CpGs were differentially expressed after Bonferroni correction for the number of genes within 1Mb of each CpG (P<6.41×10−4) (Table 5). The five differentially expressed genes were: IFI35 (P=3.76×10−5) and RAMP2 (P=5.51×10−4) near cg26094004; RPIA near cg26563141 (P=5.04×10−6); and HLA-DRB5 (P=6.50×10−7) and MSH5 (P=2.50×10−4) near cg08899667. These results demonstrate that modulation of mtDNA-CN drives changes in nDNA methylation and gene expression of specific CpGs and transcripts in a cell culture model.
Pathway and biological process analysis of TFAM KO methylation and expression results independently identify pathways identified in cross-sectional analysis
We sought to independently assess the underlying pathways and biological processes that were overrepresented following TFAM knockout in our cell-culture model. Specifically, we analyzed the most over-represented terms resulting from GO and KEGG analysis of our full list of differentially methylated CpGs and differentially expressed genes as well as a list of integrated methylation and expression results (Cutoffs used: TFAM Methylation - top 300 differentially methylated CpGs, TFAM Expression – differentially expressed genes (169 genes), TFAM Integrated Methylation/Expression – top 188 genes). The independent results confirmed the findings from our ARIC cross-sectional analysis. Specifically, KEGG analysis of TFAM knockout results identified the neuroactive ligand-receptor interaction pathway (path:hsa04080) to be the second most overrepresented pathway in the TFAM knockout methylation analysis (P=4.41×10−4) and the top overrepresented pathway in the TFAM knockout RNA sequencing analysis (P=4.30×10−4) (Table 3a). Accordingly, integration of results from TFAM knockout methylation and expression also resulted in strong association with this pathway (P=8.77×10−6). Further, combining of P-values (Fisher’s method) across ARIC meta-analysis, TFAM knockout methylation and TFAM knockout expression analyses yielded a combined P-value of 8.96×10−16 for this pathway which was also the top pathway identified in integrated analysis (Table 3a).
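The combined P-value reported above can be checked with a short sketch of Fisher's method. The function name is illustrative, and the sketch assumes the three analyses are independent, as Fisher's method requires. The statistic X = −2 Σ ln(p) follows a chi-square distribution with 2k degrees of freedom, whose upper tail has a closed form when the degrees of freedom are even.

```python
import math

def fisher_combined_p(p_values):
    """Combine independent P-values with Fisher's method.

    X = -2 * sum(ln p) is chi-square with 2k degrees of freedom; for even
    degrees of freedom the upper tail is exp(-X/2) * sum_{i<k} (X/2)^i / i!.
    """
    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)  # this is X/2
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return math.exp(-half_stat) * total

# Pathway P-values quoted in the text: ARIC meta-analysis,
# TFAM knockout methylation, TFAM knockout expression.
combined = fisher_combined_p([5.24e-12, 4.41e-4, 4.30e-4])
print(f"{combined:.2e}")  # 8.96e-16
```

The result agrees with the combined P-value of 8.96×10−16 given for the neuroactive ligand-receptor interaction pathway.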
The specific genes identified by each analysis to be part of the neuroactive ligand receptor interaction pathway were unique to each study (Table S7), with only one gene (GABRG3) in common between ARIC analyses and TFAM knockout methylation analysis and only one gene (GABRB1) in common between TFAM knockout methylation and expression analyses (Table S7).
GO analyses of TFAM knockout cell lines also confirmed the finding from cross-sectional analysis that biological processes related to cell signaling and ligand interactions including Cell-cell signaling (combined P=7.63×10−8), Trans-synaptic signaling (combined P=2.89×10−7) and Synaptic signaling (combined P=2.97×10−7) were over-represented, among others (Table 3b). These results suggest that mtDNA-CN drives changes to nDNA methylation at sites nearby genes relating to cell signaling processes which in turn may cause gene expression changes to these genes and contribute to disease.
Establishing causality via Mendelian Randomization (MR): Nuclear DNA methylation does not appear to be causative of changes in mtDNA-CN at identified CpGs
Mendelian randomization, a form of instrument variable analysis, was used to further test the direction of causality between mtDNA-CN and nuclear methylation by exploring the relationship between methylation quantitative trait loci (meQTLs) and mtDNA-CN (Table S8). Specifically, if nDNA methylation at our sites of interest is causative of changes in mtDNA-CN, then meQTL SNPs for these CpGs of interest would be expected to also be associated with mtDNA-CN. Alternatively, if mtDNA-CN is not associated with meQTL SNPs, then it would follow that changes to nDNA methylation likely do not drive changes to mtDNA-CN at these CpGs.
We identified 4 independent cis meQTLs in the ARIC EA cohort (Permuted P=7.84×10−4) and 6 independent cis meQTLs in the ARIC AA cohort (Permuted P=9.12×10−4) across 5 mtDNA-CN associated CpGs for use as an instrument variable for MR (Table S8A). We further identified 2 independent meta-analysis derived meQTLs by combining results from ARIC EA and AA cohorts (Permuted P=3.97×10−5, fixed effects (FE) model) (Table S8B).
We then assessed the relationship between meQTL SNPs and mtDNA-CN. The results of the MR were null for each independent meQTL (Bonferroni P=0.005) (Table S8). While our power for a single meQTL varied depending on the specific meQTL assessed, with power to detect an individual association ranging from 0.18 to 0.99 across the 12 meQTLs, overall power was >99% to detect at least 1 associated meQTL. These results support the experimentally established direction of causality by suggesting that modification of nDNA methylation at CpG sites of interest does not directly drive alterations in mtDNA-CN.
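The overall power statement can be illustrated with a minimal sketch. It assumes the meQTL tests are independent, which the text does not state, and it uses only the two reported extremes of per-meQTL power (0.18 and 0.99), so the result is a lower bound: the other meQTLs can only increase the chance of detecting at least one association.

```python
def power_at_least_one(powers):
    """Probability of detecting at least one association,
    assuming independent tests (an assumption of this sketch)."""
    miss_all = 1.0
    for power in powers:
        miss_all *= 1.0 - power
    return 1.0 - miss_all

# Only the reported minimum and maximum per-meQTL power are used here.
lower_bound = power_at_least_one([0.18, 0.99])
print(round(lower_bound, 4))  # 0.9918
```

Even with only these two instruments, the bound exceeds 99%, consistent with the overall power reported above.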
Association of CpG methylation with mtDNA-CN associated phenotypes
Since decreased mtDNA-CN has been associated with a number of aging-related diseases, and given our hypothesis that mtDNA-CN leads to nDNA methylation changes which influence disease outcomes, associated CpGs should also be associated with mtDNA-CN related phenotypes. To test these associations, we performed linear regression and survival analysis for prevalent and incident diseases, respectively, for each of the 6 validated CpGs as they relate to CHD, CVD, and mortality in the ARIC, FHS and CHS cohorts (Table 6, Table S9). Results from each cohort were meta-analyzed to derive an overall association for each validated CpG with each outcome of interest.
We identify nominally significant phenotype associations with at least one of the mtDNA-CN associated traits of interest for 4 of the 6 validated mtDNA-CN associated CpGs (P<0.05). Specifically, results in the expected direction of effect for prevalent CHD and prevalent CVD were identified for two mtDNA-CN associated CpGs (cg26094004 and cg08899667). Similarly, results in the expected direction of effect were identified for the association between all-cause mortality and cg26563141 and cg08899667. Thus, we found cg08899667 to be associated with three of the five mtDNA-CN associated phenotypes, including all-cause mortality (Table 6).
DISCUSSION
We report evidence that changes in mtDNA-CN influence nDNA methylation at specific, validated loci and lead to changes in gene expression of nearby genes, including those acting in the ‘neuroactive ligand receptor interaction’ pathway which may impact human health and disease via altered cell signaling. A number of these associations were validated across three independent cohorts and identified both cross-sectionally and experimentally. Interestingly, these associations were found to be site-specific in nature. It is important to note that the methods used to estimate mtDNA-CN differed between the three cohorts with a qPCR based approach used for CHS, a whole-genome sequencing approach for FHS and microarray analysis for ARIC. This may reflect the robustness of results across mtDNA-CN estimation methods and also explain why some but not all CpGs replicated in our validation analysis25. We also report that our experimental approach using cell lines replicated some but not all of the cohort validated CpGs. These findings likely reflect both the intrinsic differences between cell line data and cross-sectional data as well as the inherent complexity of mitochondrial-to-nuclear signaling which would be expected to vary across cell-types, developmental timepoints and environmental conditions.
DNA methylation as the link between mtDNA-CN and changes in nuclear gene expression
A symbiotic relationship between the nuclear and mitochondrial genomes has developed in eukaryotes. This relationship strongly implicates communication between the mitochondrial and nuclear genomes as vital for proper cell functioning. Epigenetic mechanisms allow for control of gene expression beyond DNA sequence and have the capacity to be influenced by environmental stimuli. Given the function of the mitochondria in meeting cellular energy demands, mitochondria may play an important role in translating environmental stimuli into epigenetic changes. In addition, mtDNA-CN levels are sensitive to a number of chemicals26, highlighting the role of mtDNA as an environmental biosensor. Also supporting the notion that bioenergetics are involved in modulating the epigenetic status of the cell is the observation that clinical phenotypes of mitochondrial diseases are strikingly similar to those found in a number of epigenetic diseases such as Angelman, Rett and Fragile X syndromes27. Further, epigenetic changes in nuclear DNA correlate with reduced cancer survival and low mtDNA-CN correlates with poor survival across a number of cancer types28, 29. Thus, retrograde signals from the mitochondria to the nucleus may be crucial in sensing homeostasis and translating extracellular signals into altered gene expression18.
Our results implicate the neuroactive ligand receptor interaction pathway and in general additional processes involved in cellular signaling. The results also show that although the same pathways are implicated across our independent datasets, the specific genes affected differ between conditions. Interestingly, the neuroactive ligand receptor interaction pathway has been identified as having the second highest number of atherosclerosis candidate genes of any KEGG pathway, harboring 53 atherosclerosis candidate genes (272 total genes in the pathway)30. This is an interesting finding given the association of mtDNA-CN with cardiovascular disease6–8. Perhaps unsurprisingly, this pathway also belongs to the class of KEGG pathways that are responsible for environmental information processing.
Proposed mechanisms for the methylation of nDNA as a result of changes in mtDNA
The precise identity of the signal(s) coming from the mitochondria that might be responsible for modifying nDNA methylation has not yet been identified and warrants further experimentation. Metabolite intermediates, non-coding RNA, and/or histones are likely to play a role in this signaling process. For example, mitochondria-to-nucleus retrograde signaling has been shown to regulate histone acetylation and alter nuclear gene expression through the heterogeneous nuclear ribonucleoprotein A2 (hnRNPA2)23. In fact, histone modifications linked with chromatin activation, namely H4K16, H3K4me3 and H3K36me2, co-vary with mitochondrial content31.
The differentially expressed genes identified from the experimental knockout may provide some evidence with regard to the mechanism behind these findings. For example, IFI35, a gene involved in interferon response, is associated with mtDNA-CN through the antiviral innate immune response32. Further, the differentially expressed genes RAMP2 and MSH5 are known to be related to oxidative phosphorylation protein expression and genome stability, respectively33, 34.
Uncovering the precise nature of this signaling from mitochondria to the nucleus would be expected to expose essential clues that will integrate epigenetic regulation, mitochondrial and genomic polymorphisms, and complex phenotypes. Further assessment of the functional mechanisms underlying the crosstalk between mtDNA-CN, methylation and disease will be required to fully appreciate the diagnostic and therapeutic utility of the interaction between mtDNA and nDNA as identified in this study.
Influence of findings on complex disease etiology
The observation that differential methylation occurred at specific sites throughout the nuclear genome as a result of changes to mtDNA-CN provides an explanation for how mtDNA could alter normal homeostasis as well as susceptibility and/or severity of diseases. It is particularly interesting to note that these changes appear to be site-specific rather than global in nature. The association of mtDNA-CN associated CpGs with mtDNA-CN related disease states lends further support to the hypothesis that modulation of mtDNA-CN not only modifies the nuclear epigenome, and the expression of nearby genes, but does so at locations which may be relevant to disease outcomes, including cardiovascular disease and all-cause mortality. In particular, these observations may explain how mitochondrial-to-nuclear signaling could influence polygenic traits with complex etiology, especially those for which environmental insults play a role. Together, mitochondrial signaling, and subsequent nDNA methylation, may have an important role in modifying gene expression which may in turn lead to disease outcomes or influence the severity of disease manifestation. Thus, the mechanism(s) by which mtDNA-CN influences disease status may be, at least in part, through modification of nDNA methylation and subsequent modification and/or regulation of nuclear gene expression.
Further, these findings have direct implications for the recent emergence of mitochondrial donation in humans as they suggest that mitochondrial replacement into recipient oocytes may lead to unexpected changes to the nuclear epigenome. Thus, with the recent development of mitochondrial replacement therapy, unravelling the complex interplay of the mitochondria and nucleus is also critical to properly informing medical decision makers.
This study design had a number of strengths and limitations. A possible limitation of the cross-sectional analysis is the potential for some common factor we have not been able to account for to influence both mtDNA-CN and nDNA methylation. In experimental analysis, we used HEK293T cells for our knockout studies and we note that the use of a blood cell line may be more relevant to direct interpretation of the results. Further, prevalent disease is subject to reverse causality and therefore the results on prevalent phenotypes should be interpreted with caution. Strengths of this study include the well phenotyped and carefully collected incident disease data, the robustness of the findings across multiple cohorts and ethnic groups, as well as the careful quality control employed. Further, our results stood up to rigorous permutation testing, which increases the reliability of these observations.
CONCLUSION
First, we have shown cross-sectionally that variation in mtDNA-CN is associated with nuclear epigenetic modifications at specific CpGs across multiple independent cohorts. Specifically, six mtDNA-CN associated CpGs, three of which were confirmed in experimental analysis, were robustly identified across three independent cohorts. Second, we found meQTL SNPs to not be associated with mtDNA-CN, suggesting that nuclear methylation at these CpGs does not directly cause altered mtDNA-CN. Third, functional results show that modulation of mtDNA-CN causes site-specific changes to nuclear DNA methylation and RNA expression near genes relating to cell signaling processes including those in the neuroactive-ligand-receptor interaction pathway. Further, mtDNA-CN associated CpGs display association with mtDNA-CN related phenotypes, namely cardiovascular disease and all-cause mortality. These findings demonstrate that the mechanism(s) by which mtDNA-CN influences disease is at least in part via regulation of nuclear gene expression through modification of nDNA methylation. Specifically, the data presented here support the model that modification of mtDNA-CN leads to changes to nDNA methylation which in turn influence the expression of nearby nuclear genes which contribute to disease pathology. These results have implications for understanding the mechanisms behind mitochondrial and nuclear communication as it relates to complex disease etiology as well as the consequences of mitochondrial replacement therapeutic strategies. Taken together, the results confirm that knowledge of nuclear DNA dynamics alone is not sufficient to fully elucidate the etiology of complex disease.
ONLINE METHODS
A flow chart of general methods can be found in Figure 1.
Ethics
The Atherosclerosis Risk in Communities (ARIC) study, Cardiovascular Health Study (CHS) and Framingham Heart Study (FHS) have been approved by the Institutional Review Board (IRB) at each participating institution. All participants provided written informed consent.
The ARIC study design and methods were approved by four different IRBs at each of the collaborating medical institutions: University of Mississippi Medical Center Institutional Review Board (Jackson Field Center); Wake Forest University Health Sciences Institutional Review Board (Forsyth County Field Center); University of Minnesota Institutional Review Board (Minnesota Field Center); and Johns Hopkins University School of Public Health Institutional Review Board (Washington County Field Center). FHS is approved by the IRB at Boston University Medical Center. CHS recruited participants from Medicare lists at 4 sites and IRBs at each site were involved in human subjects approval.
Discovery Study Analysis
The Atherosclerosis Risk in Communities Cohort (ARIC)
The ARIC study is a prospective cohort intended for the study of cardiovascular disease in subjects from four communities across the USA: Forsyth County, NC, northwest suburbs of Minneapolis, MN, Jackson, MS, and Washington County, MD35. Sample characteristics are available in Table 1. Following quality control, 1,567 African Americans (AA) and 940 European Americans (EA) were used as a discovery cohort. Participants for ARIC EA were derived from two existing projects, Brain MRI (81.7%) and OMICS (18.3%). DNA was extracted from peripheral blood leukocyte samples from visit 2 or 3 using the Gentra Puregene Blood Kit (Qiagen; Valencia, CA, USA) according to the manufacturer’s instructions (www.qiagen.com) and hybridized to the Illumina Infinium Human Methylation 450K BeadChip and the Genome-Wide Human SNP Array 6.0.
Estimation of mtDNA-CN from Affymetrix Human SNP 6.0 Arrays
The Affymetrix Genome-Wide Human SNP 6.0 Array was used to estimate mtDNA-CN for each participant as previously described36. Briefly, mtDNA copy number (mtDNA-CN) was determined utilizing the Genvisis software package (http://www.genvisis.org). Initially, a list of high-quality mitochondrial SNPs was hand-curated by employing BLAST to remove SNPs without a perfect match to the annotated mitochondrial location and SNPs with off-target matches longer than 20 bp. The probe intensities of the 25 remaining mitochondrial SNPs were determined using quantile sketch normalization (apt-probeset-summarize) as implemented in the Affymetrix Power Tools software. To correct for DNA quality, DNA quantity, hybridization efficiency and other technical artifacts, surrogate variable analysis was applied to the BLAST filtered, GC corrected log R ratio (LRR) of 43,316 autosomal SNPs. These autosomal SNPs were selected based on the following quality filters: call rate >98%, HWE P-value >0.00001, PLINK mishap for non-random missingness P-value >0.0001, association with sex P-value >0.00001, linkage disequilibrium pruning (r2 <0.30), maximal autosomal spacing of 41.7 kb. The median of the normalized intensity, the LRR, for all homozygous calls was GC corrected and used as the initial estimate of mtDNA-CN for each sample. The final measure of mtDNA-CN is represented as the standardized residuals from a race-stratified linear regression adjusting the initial estimate of mtDNA-CN for 15 surrogate variables (SVs), age, sex, sample collection site, and white blood cell count. Technical covariates such as DNA quality, DNA quantity, and hybridization efficiency were captured via surrogate variable analysis (SVA) as previously described7, 37.
Illumina Infinium Human Methylation 450K Beadchip Analysis
The Infinium Human Methylation 450K BeadChip was used to determine DNA methylation profiles from blood for >450,000 CpGs across the human genome.
Bisulfite Conversion
Bisulfite conversion of 1 µg genomic DNA was performed using the EZ-96 DNA Methylation Kit (Deep Well Format) (Zymo Research; Irvine, CA, USA) according to the manufacturer’s instructions (www.zymoresearch.com). Bisulfite conversion efficiency was determined by PCR amplification of the converted DNA before proceeding with methylation analyses on the Illumina platform using Zymo Research’s Universal Methylated Human DNA Standard and Control Primers.
Normalization and Quality Control
Probes included on the list of cross-reactive 450K probes as reported by Chen et al. were removed prior to analysis38. The cross-reactive target had to match a minimum of 47 bases to be considered cross-reactive. This led to the removal of ∼28,000 probes.
GenomeStudio background correction and BMIQ normalization were performed39 and the wateRmelon R package was used to conduct QC filtering40.
Samples were removed for the following reasons: 1. Failed bisulfite conversion, 2. Call rate <95%, 3. Sex mismatch using minfi, 4. Weak correlation between available genotypes and genotypes on 450K array, 5. Weak clustering according to sex in MDS plot, 6. PCA analysis identified them as an outlier (≥4SD from mean), 7. Failed sex check, 8. Sample pass rate <99%, 9. Only sample to pass on a chip. These filtering settings led to the removal of 68 samples in the AA group and 24 samples in the EA group. If samples were run in duplicate, the sample with the lowest missing rate was retained.
Surrogate Variable Analysis (SVA)
Surrogate variables (SVs) were generated using the sva package in R, protecting mtDNA-CN as the variable of interest37.
Control Probe Principal Components in ARIC European Americans
The control probe principal components are based on 42 measures, which are transformed from control probes and out-of-band probes in the 450K data41.
Statistical Analysis
All statistical analyses were performed using R (version 3.3.3).
Linear Mixed Model – Association between mtDNA-CN and nuclear DNA methylation
Linear-mixed-effects regression analysis was performed to determine the association between mtDNA-CN and nuclear DNA methylation at specific CpGs (Table S1).
ARIC AA
Methylation ∼ MtDNA-CN + Age + Sex + Site + Visit + Chip Position + Plate + CD8 Count + CD4 Count + B-Cell Count + Monocyte Count + Granulocyte Count + Smoking Status + First 10 Surrogate Variables + Chip (as random effect).
ARIC EA
Same model as ARIC AA, with the additional inclusion of Project (Brain MRI or Omics), the first 10 PCs derived from methylation microarray control probes, and the composition of natural killer (NK) cells. Cell types were imputed using the method of Houseman et al.42. All correlations were performed using the Pearson method. Global methylation distributions were assessed by a chi-square test to compare observed to expected site-specific methylation.
Meta-Analysis
A meta-analysis was performed to combine the results from the individual ARIC AA and EA analyses (Table S1). This analysis was done using the standard error scheme implemented in Metal43. CpGs were required to reach P<0.05 in the ARIC AA and EA analyses to be included in the meta-analysis. Associations that met genome-wide significance (P<5.0×10−8) were included in subsequent analyses. One hundred permutations of the meta-analysis were also performed (permuted P=3.94×10−8).
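The standard error scheme in Metal corresponds to a fixed-effect, inverse-variance-weighted combination of per-cohort effect sizes. The following is a minimal Python sketch of that calculation for a single CpG (the analyses in this study were run in R and Metal); the function name and the example effect sizes are hypothetical and are not taken from the study.

```python
import math
from statistics import NormalDist


def inverse_variance_meta(betas, ses):
    """Fixed-effect meta-analysis weighting each cohort by 1 / SE^2."""
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    z = beta / se
    p = 2.0 * NormalDist().cdf(-abs(z))  # two-sided P-value
    return beta, se, z, p


# Hypothetical per-cohort estimates for one CpG (illustrative values only).
beta, se, z, p = inverse_variance_meta([0.10, 0.20], [0.05, 0.05])
```

With equal standard errors the pooled estimate is simply the mean of the two effects, and the pooled standard error shrinks by a factor of √2.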
Residual Bootstrapping
Residual bootstrapping was used to determine the most appropriate genome-wide significance cutoff in ARIC EA and AA cohorts (AA: P<6.22×10−8, EA: P<3.03×10−7). The steps taken were as follows: 1) Residuals were derived from the full model, 2) Fitted values were derived from the null model (model without mtDNA-CN as independent factor), 3) The residuals from Step 1 were resampled and added to the fits from Step 2, 4) Each resulting matrix from Step 3 was run as pseudonull input in the formula lme(pseudonull∼CN+covariates) to refit the full model and obtain null statistics, 4a) The most extreme P-value was pulled from each iteration, 4b) The resulting 100 most extreme P-values were ranked from least to most significant and the 95th value was chosen to be the ‘genome-wide significance level’ for the corresponding cohort. Additionally, the qq plots show minimal inflation in ARIC AA, EA and meta-analysis (Figure S1).
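The bootstrapping steps above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the code used in the study: ordinary least squares with mtDNA-CN as the only predictor stands in for the mixed model lme(pseudonull∼CN+covariates), the null model is intercept-only, P-values use a normal approximation to the t statistic, residuals are resampled with replacement using the same sample indices for every CpG, and all function names and simulated inputs are hypothetical.

```python
import math
import random
from statistics import NormalDist, mean


def fit_line(x, y):
    """Ordinary least squares for y ~ 1 + x; returns (intercept, slope, slope SE)."""
    n = len(x)
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    return intercept, slope, math.sqrt(rss / (n - 2) / sxx)


def slope_p_value(x, y):
    """Two-sided P-value for the slope (normal approximation)."""
    _, slope, se = fit_line(x, y)
    return 2.0 * NormalDist().cdf(-abs(slope / se))


def bootstrap_threshold(cn, methylation, n_iter=100, seed=1):
    """Return the 95th of n_iter ranked extreme P-values (for n_iter=100)."""
    rng = random.Random(seed)
    n = len(cn)
    prepared = []
    for y in methylation:
        a, b, _ = fit_line(cn, y)
        residuals = [yi - a - b * xi for xi, yi in zip(cn, y)]  # step 1: full model
        null_fit = mean(y)                                      # step 2: null model
        prepared.append((residuals, null_fit))
    extreme = []
    for _ in range(n_iter):
        idx = [rng.randrange(n) for _ in range(n)]              # step 3: resample
        p_values = [
            slope_p_value(cn, [null_fit + residuals[i] for i in idx])  # step 4: refit
            for residuals, null_fit in prepared
        ]
        extreme.append(min(p_values))                           # step 4a
    ranked = sorted(extreme, reverse=True)                      # least to most significant
    return ranked[(95 * n_iter + 99) // 100 - 1]                # step 4b


# Simulated null data: 50 samples, 20 CpGs (hypothetical sizes).
rng = random.Random(0)
cn = [rng.gauss(0, 1) for _ in range(50)]
methylation = [[rng.gauss(0, 1) for _ in range(50)] for _ in range(20)]
threshold = bootstrap_threshold(cn, methylation)
```

Taking the most extreme P-value across all CpGs in each iteration is what makes the resulting cutoff a genome-wide, rather than per-CpG, significance level.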
Significant CpGs with high correlation (R2≥0.6) were identified as non-independent, and the CpG with the more significant P-value was retained. Highly correlated CpGs were consistent between the AA and EA results: cg21051031 and cg03964851 (R2: AA=0.62, EA=0.63), and cg06809544 and cg13393978 (R2: AA=0.65, EA=0.70).
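One way to apply this rule is a greedy pass over the CpGs in order of significance, keeping a CpG only if it is not highly correlated with any CpG already kept. The Python sketch below assumes this greedy ordering, which the text does not specify; the function name is hypothetical, the R2 value is the AA estimate reported above, and the P-values are invented for illustration.

```python
def prune_correlated(p_values, r2, threshold=0.6):
    """Keep the most significant CpG from each group with pairwise R2 >= threshold."""
    kept = []
    for cpg in sorted(p_values, key=p_values.get):  # most significant first
        if all(r2.get(frozenset((cpg, other)), 0.0) < threshold for other in kept):
            kept.append(cpg)
    return kept


# P-values are hypothetical; the R2 value is taken from the AA results.
p_values = {"cg21051031": 1e-10, "cg03964851": 1e-9, "cg06809544": 1e-12}
r2 = {frozenset(("cg21051031", "cg03964851")): 0.62}
independent = prune_correlated(p_values, r2)
```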
Validation Cohorts
The Cardiovascular Health Study (CHS)
The CHS is a population-based cohort study of risk factors for coronary heart disease and stroke in adults ≥65 years conducted across four field centers44. The original predominantly European ancestry cohort of 5,201 persons was recruited in 1989-1990 from random samples of the Medicare eligibility lists; subsequently, an additional predominantly African-American cohort of 687 persons was enrolled in 1992-1993 for a total sample of 5,888. The validation cohort includes 239 AA participants and 294 EA participants from CHS with mtDNA-CN and 450K methylation derived from the same visit (Table 1).
mtDNA-CN Estimation using Quantitative PCR
mtDNA copy number (mtDNA-CN) was determined utilizing a multiplexed real time quantitative polymerase chain reaction (qPCR) assay with ABI TaqMan chemistry (Applied Biosystems) as previously described7. Briefly, each well consisted of a VIC-labeled, primer-limited assay specific to a mitochondrial target (ND1), and a FAM-labeled assay specific to a region of the nuclear genome selected for being non-repetitive (RPPH1). Each sample was run in triplicate on a 384-well plate in a 10 µL reaction containing 20 ng of DNA. The cycle threshold (Ct) value was determined from the amplification curve for each target by the ABI Viia7 software. A ΔCt value was computed for each well as the difference between the Ct for the RPPH1 target and the Ct for the ND1 target, as a measure of mtDNA copy number relative to nuclear DNA copy number. For samples with a standard deviation of ΔCt for the three replicates >0.5, an outlier replicate was identified and excluded. If the ΔCt standard deviation remained >0.5 after exclusion, the sample was completely excluded from future analyses. Replicates with Ct values for ND1 >28, Ct values for RPPH1 >5 standard deviations from the mean, or ΔCt values >3 standard deviations from the mean of the plate were removed. Additionally, due to an observed linear increase in ΔCt value by the order in which the replicate was pipetted onto the plate, a linear regression was used to correct for pipetting order. Plate effects were controlled for by a linear regression in which the plate on which a sample was run was treated as a random effect. The final measure of mtDNA-CN is represented as the standardized residuals from a race-stratified mixed linear regression adjusting for age, sex, and sample collection site.
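The per-well ΔCt calculation and the replicate filter can be illustrated with a short Python sketch. Two details are assumptions, since the text does not state them: the outlier is taken to be the replicate farthest from the median, and the remaining replicates are summarized by their mean. Function names and Ct values are hypothetical.

```python
from statistics import mean, median, stdev


def delta_ct(ct_rpph1, ct_nd1):
    """Delta-Ct per well: nuclear (RPPH1) Ct minus mitochondrial (ND1) Ct."""
    return ct_rpph1 - ct_nd1


def summarize_sample(delta_cts, max_sd=0.5):
    """Return the mean Delta-Ct of a sample, or None if the sample is excluded."""
    values = list(delta_cts)
    if stdev(values) > max_sd:
        med = median(values)
        # Assumption: the outlier is the replicate farthest from the median.
        outlier = max(values, key=lambda v: abs(v - med))
        values.remove(outlier)
        if stdev(values) > max_sd:
            return None  # still too variable: exclude the sample
    return mean(values)


# Hypothetical triplicate (RPPH1 Ct, ND1 Ct) pairs for one sample.
wells = [(25.0, 17.0), (25.1, 17.0), (27.0, 17.0)]
sample_value = summarize_sample([delta_ct(r, m) for r, m in wells])
```

Because ΔCt is defined as the RPPH1 Ct minus the ND1 Ct, larger values indicate more mtDNA relative to nuclear DNA.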
Methylation Analysis
Methylation measurements were performed at the Institute for Translational Genomics and Population Sciences at the Harbor-UCLA Medical Center (Los Angeles, CA). DNA was extracted from buffy coat fractions and subsequently underwent bisulfite conversion using the EZ DNA Methylation kit (Zymo Research, Irvine, CA). Methylation was then assayed using the Infinium HumanMethylation450 BeadChip (Illumina Inc, San Diego, CA).
Quality control was performed in the minfi R package45 (version 1.12.0, http://www.bioconductor.org/packages/release/bioc/html/minfi.html). Samples were removed if they had low median intensities (below 10.5 on the log2 scale) across the methylated and unmethylated channels, QC probes falling more than 3 standard deviations from the mean, sex-check mismatches, failed concordance with prior genotyping, or >0.5% of probes failing detection (detection P-value >0.01). Probes with >1% of values below detection were removed. In total, 11 samples were removed for sample QC, resulting in a sample of 323 European-ancestry and 326 African-American samples. Methylation values were normalized using the SWAN quantile normalization method46. Since white blood cell proportions were not directly measured in CHS, they were estimated from the methylation data using the Houseman method42.
Regression Analysis
CHS was analyzed using linear regression with methylation beta values as the dependent variable and mtDNA-CN as the independent variable. Analyses were adjusted for age, sex, batch, measured white blood cell count and estimated cell type counts.
The Framingham Heart Study (FHS)
FHS is a prospective study of individuals from Framingham, Massachusetts47. The validation cohort includes 1,995 EA participants from FHS with mtDNA-CN and 450K methylation derived from the same visit (Table 1).
mtDNA-CN Estimation from Whole Genome Sequencing
Mitochondrial DNA copy number was estimated by applying the fastMitoCalc software48 to harmonized build 37 mappings of TOPMed deep whole genome sequencing data (freeze 5). The estimated mitochondrial copy number is twice the ratio of average mitochondrial sequencing depth to average autosomal sequencing depth. Cohort-specific mtDNA-CN residuals were obtained by regressing mtDNA-CN on age, sex, and WBC counts, and an inverse normal transformation was applied to these residuals.
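The depth-ratio estimate and the inverse normal transformation can be written out as a brief Python sketch. This is illustrative only and does not reproduce fastMitoCalc itself. The rank offset c = 3/8 (Blom) and the averaging of tied ranks are assumptions, as the text does not specify which variant of the transformation was used; function names and example depths are hypothetical.

```python
from statistics import NormalDist


def mtdna_cn_from_depth(mito_depth, autosomal_depth):
    """Copy number per cell: twice the mitochondrial-to-autosomal depth ratio."""
    return 2.0 * mito_depth / autosomal_depth


def inverse_normal_transform(values, c=0.375):
    """Rank-based inverse normal transform; tied values share their average rank."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2.0 + 1.0  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    normal = NormalDist()
    return [normal.inv_cdf((r - c) / (n - 2.0 * c + 1.0)) for r in ranks]


# Hypothetical mean depths: 3000x mitochondrial, 30x autosomal.
copy_number = mtdna_cn_from_depth(3000.0, 30.0)
transformed = inverse_normal_transform([3.0, 1.0, 2.0])
```

The factor of two reflects the two copies of each autosome per diploid cell, so the ratio is expressed as mtDNA copies per cell.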
Methylation Analysis
DNA extraction, methylation quantification (450K BeadChip), and QC were detailed previously49. We obtained lab-specific and cohort-specific DNA methylation residuals by regressing methylation beta values on age, sex, batch effects (plate, column, row), and WBC counts. We applied inverse normal transformation to DNA methylation residuals.
Regression Analysis
A linear mixed model was applied with inverse normal transformed DNA methylation residuals as the dependent variable and inverse normal transformed mtDNA-CN residuals as the independent variable, accounting for family structure.
Validation and all-cohort meta-analyses
A meta-analysis was performed of all validation cohorts (FHS EA, CHS EA, CHS AA). We also performed an all-cohort meta-analysis (ARIC AA, ARIC EA, FHS EA, CHS EA, CHS AA). Both meta-analyses were performed using the standard error scheme implemented in Metal43.
Mendelian Randomization
meQTL Analysis
meQTLs were identified using MatrixEQTL50. Imputed genotypes which were previously derived from ARIC for the relevant participants as well as normalized residuals from our 450K methylation dataset were used in regression analysis. Haplotype phasing was performed using ShapeIt51 and imputation was performed using IMPUTE252. SNPs were filtered for allele frequency >0.05, and imputation quality >0.4. Genotypes were imputed to the 1000G reference panel (Phase I, version 3). The same covariates used for the ARIC EWAS analysis were used to call meQTLs as well as the addition of genotyping PCs (4 for EA, 10 for AA). Only meQTLs that had an individual cohort P value <0.05 were included in the meta-analysis.
A linear model was used for MatrixEQTL and a cis meQTL was defined as having a distance less than 100 kb. Only cis meQTLs derived from the 6 CpGs of interest and which met a cohort-specific permuted P-value cutoff (Permuted P: EA=7.84×10−4, AA=9.12×10−4) or a permuted meta-analysis P-value cutoff (Permuted P, fixed effects (FE) model: 3.97×10−5) were retained for use in Mendelian randomization. Metasoft53 was used for meta-analysis; in addition to the fixed effects (FE) model, a random effects (RE) model and Han and Eskin’s Random Effects model (RE2) were also used and yielded very similar results (Table S8).
Mendelian Randomization Methods
Independent meQTLs were used for MR. Independence was defined by including SNPs in the same linear model. MR with mtDNA-CN as the outcome and methylation as the exposure was undertaken. meQTLs served as the known relationship of genotype on exposure (methylation) and the results of the linear model, lm(mtDNA∼meQTL SNP) were calculated. Power for the MR was calculated using the YZ association function in mRnd54.
Phenotype Analysis
We compared methylation at the 6 validated CpGs to phenotypes that are known to be associated with mtDNA-CN. Phenotypes included prevalent diseases (CHD, CVD) as well as incident diseases (CHD, CVD, Mortality). The analysis was performed as follows for each cohort:
A) Prevalent diseases (CHD, CVD): glm(PRVCVD ∼ resids(methyl) + AGE + SEX + CENTER + RACE, family=binomial(logit))
B) Incident diseases (CHD, CVD, Mortality): coxph(Surv(STime, dead) ∼ resids(methyl) + AGE + SEX + CENTER + RACE)
Where resids(methyl) represents methylation adjusted for all relevant covariates from the EWAS. The event adjudication process in ARIC, CHS and FHS consisted of expert committee review of hospital records, telephone interviews, and death certificates. In addition, adjudicated events between visit 1 and the baseline visit for this study were considered prevalent events.
Analyses of prevalent and incident events in CHS were adjusted for age, sex, clinic site and batch.
In ARIC, prevalent coronary heart disease (CHD) was defined as history of myocardial infarction (MI) or cardiac procedures (heart or arterial surgery, coronary bypass, or angioplasty). Cardiovascular disease (CVD) was defined as either CHD or stroke. Prevalent stroke was defined as stroke at baseline. For all phenotypes, prevalent disease was a combination of self-report at visit 1 plus adjudicated events between visit 1 and the baseline visit. Incident CHD was defined as the first incident MI or death owing to CHD. Incident stroke was defined as the first nonfatal stroke or death owing to stroke. In ARIC, the mean follow-up time was 20.6 years in the EA cohort and 18.1 years in the AA cohort. Follow-up for incident events was administratively censored at December 31, 2016.
CHS and FHS followed similar phenotype definitions as ARIC. For FHS, the mean follow-up time was 6.0 years, individuals were removed if follow-up years equaled 0, and events were adjudicated through December 2016. In CHS, prevalent CVD/CHD was excluded during sampling and events were adjudicated through June 30, 2015. The follow-up time for incident events from the time of methylation measurement was 23 years.
Results from each of the 5 individual cohorts were meta-analyzed across cohorts using an inverse-variance-weighted standard error method43 to derive an overall phenotype association for each CpG of interest.
CRISPR-Cas9 Knockout of TFAM
Generation of TFAM Knockout
The stable TFAM CRISPR-Cas9 knockout was generated in HEK293T cells using the Origene TFAM – Human Gene Knockout Kit via CRISPR (catalog number: KN215488) following the manufacturer’s protocol. The following sgRNA guide sequence was used to generate the stable TFAM knockout lines: GCGTTTCTCCGAAGCATGTG. Lipofection was conducted using Turbofectin 8.0 (catalog number: TF81001). Puromycin was used for selection at a concentration of 1.5 µg/mL. Fluorescence-activated cell sorting (FACS) was used for single cell sorting and clonal expansion. HEK293T cells were grown in DMEM containing 10% FBS and 1% penicillin-streptomycin at 37°C and 5% CO2. Sequencing primers used to confirm the TFAM knockout and proper insertion of the Donor plasmid are as follows:
TFAM_Left_Forward_Primer_2: AGCGACTGTGGACAACTAGC
GFP_Reverse_Primer_2: TCATCTTGTTGGTCATGCGG
Puro-Forward_Primer_1: CACAACCTCCCCTTCTACGAG
TFAM_Right_Reverse_Primer_1: CCCCAAACTCCTTACCTGGG
DNA Isolation
DNA extraction was performed on harvested HEK293T TFAM knockout cells using the AllPrep DNA/RNA Mini Kit (Qiagen #80204) following the manufacturer’s protocol. DNA was eluted in 100 µL ultrapure water. DNA was quantified using a Nanodrop 1000. Low purity samples were subjected to ethanol precipitation.
RNA Isolation
Total RNA was extracted from confluent T75 culture flasks of TFAM CRISPR Negative Control and KO cell lines (p32) using the AllPrep DNA/RNA/Protein Kit (Qiagen #80004). RNA was extracted following the kit manual for RNA extraction, except that all microcentrifuge spins were performed at 10,000 x g. RNA was eluted twice in 50 µL molecular biology grade water and stored at −80°C.
mtDNA-CN Estimation on TFAM Knockout Cell Lines
qPCR was used to measure mtDNA-CN as described above for CHS in section “mtDNA-CN Estimation using Quantitative PCR”.
TFAM Expression Assay
cDNA synthesis was performed with the SuperScript III First-Strand Synthesis System for RT-PCR (ThermoFisher #18080-051) following the manufacturer’s protocol. 1.5 µg of total RNA from each cell line was used as input and primed with 50 ng random hexamers using the appropriate incubation conditions from the manufacturer’s protocol. Following completed cDNA synthesis, samples were quantified using the Qubit ssDNA assay kit (Invitrogen #Q10212) and Qubit 2.0 Fluorometer. Synthesized cDNA was then diluted to 10 ng/µL using ultrapure water and stored at −20°C.
qPCR to determine TFAM gene expression for TFAM KO
20 ng of synthesized cDNA from each cell line was used as input for a 10 µL volume reaction. TFAM cDNA was amplified using TaqMan probe Hs00273372_s1 (20x, FAM-labeled, Applied Biosystems #4331182). GAPDH cDNA served as a housekeeping reference control and was amplified with probe Hs03929097_g1 (20x, VIC-labeled, Applied Biosystems #4448489). Both probes were multiplexed together and all qPCR reactions were conducted at 50°C for 2 min, 95°C for 10 min, and then 40 cycles of 95°C for 15 sec and 60°C for 1 min. Expression fold change was determined using the double delta cycle threshold (ΔΔCt) method with GAPDH as the housekeeping reference control.
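For reference, the double delta Ct calculation can be written as a short Python sketch. It uses the common 2^(−ΔΔCt) form, which assumes approximately equal and near-perfect amplification efficiency for both probes; the function name and Ct values are hypothetical.

```python
def fold_change(ct_target_ko, ct_ref_ko, ct_target_control, ct_ref_control):
    """Relative expression of the target gene in knockout versus control cells."""
    delta_ko = ct_target_ko - ct_ref_ko                 # TFAM minus GAPDH, knockout
    delta_control = ct_target_control - ct_ref_control  # TFAM minus GAPDH, control
    return 2.0 ** -(delta_ko - delta_control)


# Hypothetical Ct values: TFAM amplifies one cycle later in the knockout line.
relative_expression = fold_change(26.0, 18.0, 25.0, 18.0)
```

A one-cycle delay relative to the reference gene corresponds to a halving of expression.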
Total Protein Extraction
Total protein lysates from HEK293T TFAM CRISPR knockout cell lines were extracted using ice-cold radioimmunoprecipitation assay (RIPA) buffer supplemented with Halt Protease and Phosphatase Inhibitor Cocktail (Thermo Scientific #78440). Protein concentrations were quantified using the Pierce BCA Protein Assay Kit (Thermo Scientific #23227) and lysates were stored at −80°C.
Western Blotting
Equal amounts of each lysate were diluted 1:1 with 2x Laemmli Sample Buffer (Bio-Rad #161-0737) supplemented with 5% β-mercaptoethanol. Samples were then heated at 95°C for 5 minutes to denature the proteins. 30 µg of each protein lysate was separated on a 12% polyacrylamide Mini-PROTEAN TGX Gel (Bio-Rad #456-1044) and then transferred to a PVDF membrane (Bio-Rad #1704156) using the Trans-Blot Turbo Transfer System. The membrane was blocked overnight at 4°C in Tris-Buffered Saline and Tween 20 (TBST) containing 5% nonfat milk with gentle shaking. After blocking, the membrane was incubated with rabbit anti-TFAM primary antibody diluted 1:2000 in 5% milk (Abcam #ab131607) and rabbit anti-β-Tubulin primary antibody diluted 1:3000 in 5% milk (Invitrogen #PA5-27552) for 1 hour at room temperature with gentle shaking. The membrane was washed five times with TBST after primary antibody incubation, then incubated with goat anti-rabbit secondary antibody conjugated with horseradish peroxidase (1:20,000 dilution, Abcam #ab97080) in the dark for 1 hour at room temperature with shaking. Signals were visualized by enhanced chemiluminescent substrate (SuperSignal West Pico PLUS, Thermo Scientific #34577) and photographed digitally using the ChemiDoc-It2 Imager.
Methylation Analysis of TFAM Knockout Lines
DNA from the TFAM KO cell lines was hybridized to the Illumina Infinium EPIC BeadChip at The University of Texas Health Science Center at Houston (UTHealth). Bisulfite conversion efficiency was reviewed in the laboratory using the Bead Array Controls Reporter (BACR) tool, and Illumina chemistry (sample independent controls) performed within acceptable specifications. All samples passed, with >97% of CpGs detected at a detection P-value threshold of 0.01.
EPIC BeadChip analysis was performed using the minfi package55. Data were normalized using Functional Normalization41 and differential methylation was calculated using the dmpFinder function in minfi (Table S3).
In the cases where the CpG from the 450k array was not represented on the EPIC array a CpG surrogate was chosen if there was a nearby CpG within 1000 bp upstream or downstream of the original CpG that was highly correlated with the original CpG (R2 ≥0.6) and associated with mtDNA-CN in the ARIC analysis (P<5×10−8).
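The surrogate selection criteria can be expressed as a simple filter, sketched here in Python. The text does not say how a surrogate was chosen when several candidates qualified; this sketch assumes the nearest qualifying CpG is used. The function name, field names, and candidate records are all hypothetical.

```python
def pick_surrogate(original_pos, candidates, max_dist=1000, min_r2=0.6, max_p=5e-8):
    """Return the ID of the nearest candidate meeting all three criteria, or None."""
    eligible = [
        c for c in candidates
        if abs(c["pos"] - original_pos) <= max_dist  # within 1000 bp either side
        and c["r2"] >= min_r2                        # correlated with the original CpG
        and c["p"] < max_p                           # associated with mtDNA-CN in ARIC
    ]
    if not eligible:
        return None
    # Assumption: prefer the closest eligible CpG.
    return min(eligible, key=lambda c: abs(c["pos"] - original_pos))["id"]


# Hypothetical candidates around an original CpG at position 10,000.
candidates = [
    {"id": "cpg_a", "pos": 10400, "r2": 0.70, "p": 1e-9},
    {"id": "cpg_b", "pos": 10100, "r2": 0.50, "p": 1e-12},  # fails the R2 criterion
    {"id": "cpg_c", "pos": 9200, "r2": 0.80, "p": 1e-10},
    {"id": "cpg_d", "pos": 12000, "r2": 0.90, "p": 1e-10},  # too far away
]
surrogate = pick_surrogate(10000, candidates)
```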
RNA sequencing of TFAM Knockout Lines
RNA Preparation
RNA quantification was performed using the Qubit RNA BR Assay (Invitrogen #Q10211) and Qubit 2.0 Fluorometer. The Agilent BioAnalyzer was used for quality control of the RNA prior to library creation, with a minimum RIN of 8.5. Samples were diluted to 300 ng/µL in 12 µL molecular biology grade water, and then submitted to the Genetic Resources Core Facility for RNA sequencing.
Library Preparation and Sequencing
Illumina’s TruSeq Stranded Total RNA kit protocol was used to generate libraries. Specifically, total RNA is converted to cDNA and size selected to 150 to 200 bp in length with 3’ or 5’ overhangs. End repair is performed where 3’ to 5’ exonuclease activity of enzymes removes 3’ overhangs and the polymerase activity fills in the 5’ overhangs. An ‘A’ base is then added to the 3’ end of the blunt phosphorylated DNA fragments which prepares the DNA fragments for ligation to the sequencing adapters, which have a single ‘T’ base overhang at their 3’ end. Ligated fragments are subsequently size selected through purification using SPRI beads and undergo PCR amplification techniques to prepare the ‘libraries’. The BioAnalyzer is used for quality control of the libraries to ensure adequate concentration and appropriate fragment size. The resulting library insert size is 120-200 bp with a median size of 150 bp. Libraries were uniquely barcoded and pooled for sequencing. DNA sequencing was performed in duplicate on an Illumina® HiSeq 2500 instrument using standard protocols for paired end 150 bp sequencing. As per Illumina’s recommendation, 3% PhiX was added to each lane as a control, and to assist the analysis software with any library diversity issues.
Primary Analysis
Illumina HiSeq reads were processed through Illumina’s Real-Time Analysis (RTA) software generating base calls and corresponding base call quality scores. CIDRSeqSuite 7.1.0 was used to convert compressed bcl files into compressed fastq files.
Secondary Analysis
Each independent cell line was sequenced twice. RNA sequencing fastq files were pseudoaligned to Genome Reference Consortium Human Build 37 (GRCh37) using Kallisto56. 100 bootstraps were performed using Kallisto. The R package Sleuth was used for RNA sequencing analysis57 (Table S5). Lane was included as a covariate in the Sleuth model. Differentially expressed genes were defined as those with P<0.05.
Integrated analysis of TFAM knockout methylation and expression
The linear-gwis method in FAST (genotype mode) was used to collapse TFAM KO methylation data into one gene level P-value per gene58. These gene-level methylation results were combined with gene-level gene expression results for the same gene using the Fisher P-value combination method to generate an integrated gene level Methylation/RNA sequencing P-value.
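Fisher's method combines k independent P-values through the statistic X = −2 Σ ln(p), which follows a chi-square distribution with 2k degrees of freedom under the null hypothesis. Because the degrees of freedom are always even, the tail probability has a closed form, which the Python sketch below uses. The function name and input P-values are hypothetical, and all inputs must be greater than zero.

```python
import math


def fisher_combine(p_values):
    """Combined P-value from Fisher's method (chi-square with 2k degrees of freedom)."""
    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)  # X / 2
    # Survival function for even df: exp(-X/2) * sum_{i<k} (X/2)^i / i!
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return math.exp(-half_stat) * total


# Hypothetical gene-level P-values for methylation and expression.
combined = fisher_combine([0.05, 0.05])
```

For two inputs the formula reduces to p1·p2·(1 − ln(p1·p2)), so two P-values of 0.05 combine to roughly 0.017.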
GO/KEGG Analysis
Each CpG was annotated with the nearest gene as defined by the closest gene which harbors the CpG within 1,500 bp of the transcriptional start site and extending to the polyA signal. A bias exists when performing gene set analysis for genome-wide methylation data that occurs due to the differing numbers of CpG sites profiled for each gene59. For this reason, we used gometh for GO and KEGG analysis, since it is based on the goseq method, which accounts for this bias60. We analyzed our individual ARIC/TFAM datasets as well as our TFAM integrated (meth/expression) dataset. We also combined GO/KEGG results for ARIC, TFAM methylation and TFAM RNA sequencing using the Fisher P-value combination method to generate an overall combined P-value for each term. Ten stepwise cutoffs ranging from 75 CpGs to 300 CpGs were evaluated to ensure robustness of results. Final P-value cutoffs used for each analysis were as follows: ARIC Meta-Analysis (300 CpGs, P=5.24×10−12), TFAM Methylation (300 CpGs, P=4.41×10−4), TFAM Expression (169 genes, P=4.30×10−4), TFAM Integrated (Methylation/Expression) (188 genes, P=8.77×10−6).
AUTHOR CONTRIBUTIONS
Concept and design
Castellani, Guallar, Pankratz, O’Rourke, Coresh, Arking.
Acquisition, analysis, or interpretation of data
Castellani, Longchamps, Newcomb, Sumpter, Lane, Brody, Bartz, Grove, Fornage, Floyd, Bressler, Pankow, Tin, O’Rourke, Guallar, Pankratz, Taylor, Wang, Liu, Boerwinkle, Arking.
Drafting of the manuscript
Castellani, Arking.
Critical revision of the manuscript for important intellectual content
Castellani, Longchamps, Floyd, Liu, Tin, Fornage, O’Rourke, Brody, Pankow, Bartz, Arking.
Statistical analysis
Castellani, Longchamps, Lane, Brody, Liu, Guallar, Pankratz, Arking.
Obtained funding
Coresh, Guallar, Boerwinkle, Arking.
Administrative, technical, or material support
Newcomb, Sumpter, Grove, Bressler.
Supervision
Sotoodehnia, Levy, Guallar, Arking.
COMPETING INTERESTS STATEMENT
The authors declare no competing interests.
ACKNOWLEDGEMENTS
Infinium Methylation EPIC BeadChip array hybridization was performed at the UTHealth Human Genetics Center, The University of Texas, Houston, TX. Illumina sequencing was conducted at the Genetic Resources Core Facility, Johns Hopkins Institute of Genetic Medicine, Baltimore, MD. This research was supported by grant R01HL131573 from the US National Institutes of Health. Castellani was supported by a CIHR Postdoctoral Fellowship.
ARIC Acknowledgements The Atherosclerosis Risk in Communities study has been funded in whole or in part with Federal funds from the National Heart, Lung, and Blood Institute, National Institutes of Health, Department of Health and Human Services (contract numbers HHSN268201700001I, HHSN268201700002I, HHSN268201700003I, HHSN268201700004I and HHSN268201700005I). The authors thank the staff and participants of the ARIC study for their important contributions. Funding was also supported by 5RC2HL102419 and R01NS087541.
CHS Acknowledgements Infrastructure for the CHARGE Consortium is supported in part by the National Heart, Lung, and Blood Institute grant R01HL105756. The CHS research was supported by NHLBI contracts HHSN268201200036C, HHSN268200800007C, HHSN268201800001C, N01HC55222, N01HC85079, N01HC85080, N01HC85081, N01HC85082, N01HC85083, N01HC85086; and NHLBI grants U01HL080295, U01HL130114, K08HL116640, R01HL087652, R01HL092111, R01HL103612, R01HL103612, R01HL111089, R01HL116747 and R01HL120393 with additional contribution from the National Institute of Neurological Disorders and Stroke (NINDS). Additional support was provided through R01AG023629 from the National Institute on Aging (NIA), Merck Foundation / Society of Epidemiologic Research as well as Laughlin Family, Alpha Phi Foundation, and Locke Charitable Foundation. A full list of principal CHS investigators and institutions can be found at CHS-NHLBI.org. The provision of genotyping data was supported in part by the National Center for Advancing Translational Sciences, CTSI grant UL1TR000124, and the National Institute of Diabetes and Digestive and Kidney Disease Diabetes Research Center (DRC) grant DK063491 to the Southern California Diabetes Endocrinology Research Center. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
FHS Acknowledgements Whole genome sequencing (WGS) for the Trans-Omics in Precision Medicine (TOPMed) program was supported by the National Heart, Lung and Blood Institute (NHLBI). WGS for “NHLBI TOPMed: Whole Genome Sequencing and Related Phenotypes in the Framingham Heart Study” (phs000974.v1.p1) was performed at the Broad Institute of MIT and Harvard (HHSN268201500014C). Centralized read mapping and genotype calling, along with variant quality metrics and filtering were provided by the TOPMed Informatics Research Center (3R01HL-117626-02S1). Phenotype harmonization, data management, sample-identity QC, and general study coordination, were provided by the TOPMed Data Coordinating Center (3R01HL-120393-02S1). We gratefully acknowledge the studies and participants who provided biological samples and data for TOPMed.
This work is supported by National Institutes of Health (NIH) contract N01-HC-25195 and HHSN268201500001I and grant R01 HL092577, also supported by intramural funding of Dan Levy, National Heart, Lung, and Blood Institute (NHLBI) (for DNA methylation profiling), and Trans-Omics for Precision Medicine (TOPMed) sponsored by NHLBI/NIH. The Framingham Heart Study thanks the study participants and the multitude of investigators who over its 70 year history continue to contribute so much to further our knowledge of heart, lung, blood and sleep disorders and associated traits.
The views expressed in this manuscript are those of the authors and do not necessarily represent the views of the National Heart, Lung, and Blood Institute; the National Institutes of Health; or the U.S. Department of Health and Human Services.