Abstract
Accurate quantification of cellular and mitochondrial bioenergetic activity is of great interest in medicine and biology. Mitochondrial stress tests performed with Seahorse Bioscience XF Analyzers allow the estimation of different bioenergetic measures by monitoring the oxygen consumption rates (OCR) of living cells in multi-well plates. However, studies of statistical best practices for estimating and comparing OCR measurements have been lacking so far. We therefore performed mitochondrial stress tests in 126 96-well plates involving 203 fibroblast cell lines to understand how OCR behaves across different biosamples, wells, and plates. We show that the noise of OCR is multiplicative, that outlier data points can concern individual measurements or all measurements of a well, and that the inter-plate variation is greater than the intra-plate variation. Based on these insights, we developed a novel statistical method, OCR-Stats, that: i) robustly estimates OCR levels by modeling multiplicative noise and automatically identifying outlier data points and outlier wells; and ii) performs statistical testing between samples, taking into account the different magnitudes of the between- and within-plate variations. This led to a significant reduction of the coefficient of variation across plates of basal respiration by 36% and of maximal respiration by 32%. Moreover, using positive and negative controls, we show that our statistical test outperforms existing methods, which suffer either from an excess of false positives (within-plate methods) or of false negatives (between-plate methods). Altogether, the aim of this study is to propose statistical good practices to support experimentalists in designing, analyzing, testing and reporting the results of mitochondrial stress tests using this high-throughput platform.
1. Introduction
Mitochondria are double-membrane-enclosed, ubiquitous, maternally inherited organelles present in most eukaryotic cells (1). They are best known as the powerhouses of the cell (2,3) due to their pivotal role in the cellular energy supply, where ATP is generated by the mitochondrial respiratory chain in a process referred to as oxidative phosphorylation. Furthermore, mitochondria are involved in regulating reactive oxygen species (4), apoptosis (2), amino acid synthesis (5,6), cell proliferation (6), cell signaling (7), and innate and adaptive immunity (8). A decline in mitochondrial function, reflected by diminished electron transport chain activity, is related to many human diseases, ranging from rare genetic disorders (9) to common ones such as cancer (7,10), diabetes (11), neurodegeneration (12), and aging (3). One of the most informative tests of mitochondrial function is the quantification of cellular respiration, as it directly reflects electron transport chain impairment (9) and depends on many sequential reactions from glycolysis to oxidative phosphorylation (13). One of the last steps of cellular respiration is the oxidation of cytochrome c by complex IV, which reduces oxygen to form water. Therefore, estimates of oxygen consumption rates (OCR), expressed in pmol/min, are highly informative about the capacity to synthesize ATP and about mitochondrial function, even more so than measurements of intermediates (such as ATP or NADH) and potentials (16,17).
OCR was classically measured using a Clark-type electrode, which is time-consuming, is limited to whole cells in suspension, requires a high cell yield, and does not allow automated injection of compounds (17). Also, experimentation with isolated mitochondria is of limited value because the cellular regulation of mitochondrial function is lost during isolation (18). In the last few years, a new technology that calculates O2 concentrations from fluorescence (19) in a microplate assay format has been developed by the company Seahorse Bioscience (now part of Agilent Technologies) (20). It allows simultaneous real-time measurements of both OCR and the extracellular acidification rate (ECAR) in multiple cell lines and conditions, reducing the amount of required sample material and increasing the throughput (14,20).
Typically, OCR and ECAR are measured using the Seahorse XF Analyzer in 96-well (or 24-well) plates at multiple time points under three consecutive treatments (Fig. 1), in a procedure known as the mitochondrial stress test (21). Under basal conditions, complexes I-IV use energy derived from electron transport to pump protons across the inner mitochondrial membrane. The resulting proton gradient is subsequently harnessed by complex V to generate ATP. Blocking proton translocation through complex V with oligomycin represses ATP production and restricts electron transport through complexes I-IV because the gradient is no longer used, so that only ATPase-independent OCR remains (Figs. 1A-B). Administration of FCCP, an ionophore, subsequently dissipates the gradient, uncoupling electron transport from complex V activity and increasing oxygen consumption to a maximal level (Figs. 1A-B). Finally, mitochondrial respiration is completely halted using rotenone, a complex I inhibitor. The oxygen consumption that remains is independent of electron transport chain activity (Figs. 1A-B). This approach is label-free and non-destructive, so the cells can be retained and used for further assays (15).
OCR differences in the natural scale between the various stages of this procedure lead to the estimation of six different bioenergetic measures: basal respiration, proton leak, non-mitochondrial respiration, ATP-linked respiration, spare respiratory capacity, and maximal respiration (14,17) (Table 1). An increase in proton leak and a decrease in maximal respiration are indicators of mitochondrial dysfunction (17). ATP-linked respiration, basal respiration, and spare capacity also change in response to ATP demand, which is not necessarily mitochondrion-related, as it may result from the deregulation of any cellular process that alters overall cellular energy demand.
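The six measures are simple differences between interval levels. Table 1 is not reproduced in this section, so the following Python sketch assumes the commonly used definitions (for example, basal respiration = Int1 − Int4, consistent with the basal respiration formula given in Section 2.3). The function name and argument order are illustrative only and are not part of OCR-Stats.

```python
def bioenergetic_measures(int1, int2, int3, int4):
    """Six standard mitochondrial stress test measures (natural scale, e.g. pmol/min).

    int1..int4 are the OCR levels of Int1 (basal), Int2 (oligomycin),
    Int3 (FCCP) and Int4 (rotenone). Definitions are the commonly used ones (assumed).
    """
    return {
        "basal_respiration": int1 - int4,
        "proton_leak": int2 - int4,
        "non_mitochondrial_respiration": int4,
        "atp_linked_respiration": int1 - int2,
        "spare_respiratory_capacity": int3 - int1,
        "maximal_respiration": int3 - int4,
    }

# Hypothetical interval levels in pmol/min
m = bioenergetic_measures(100.0, 30.0, 200.0, 10.0)
print(m["basal_respiration"], m["maximal_respiration"])  # 90.0 190.0
```

Because every measure is a difference of interval levels, any factor that scales all intervals of a well (such as cell number) scales the measures too, which is one motivation for the ratio-based metrics introduced below.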
Current literature describing the Seahorse technology addresses experimental aspects regarding sample preparation (22,23), the number of cells to seed (23,24), and compound concentrations in different organisms (13,22,25). However, studies regarding statistical best practices for determining OCR levels and testing them against one another are lacking. Even the definition of the bioenergetic measures varies between authors, as does the number of time points in each interval (usually three, but in some cases one (26), two (27), or four or more (11)), and whether differences (6,13,28), ratios (12,29), or both (24,25) should be computed. Consequently, comparing results across studies is difficult. Moreover, statistical power analyses for experimental design are often not provided. Differences in OCR between biosamples (e.g., patient vs. control, or gene knockout vs. WT) can be as small as 12–30% (30–32). Therefore, to design experiments with enough power to detect such differences, it is important to know the sources and amplitude of the variation within each sample and to reduce it as much as possible.
We performed and analyzed a large dataset of 126 experiments in 96-well plate format involving 203 different fibroblast cell lines, of which 26 were seeded in more than one plate (Table S1). The large number of between-plate and within-plate replicates allowed us to statistically characterize the nature and magnitude of biases and random variations in these data. We developed a statistical procedure called OCR-Stats to extract robust and accurate oxygen consumption rates for each well, which translates into robust summary values of the multiple replicates within and between plates. The OCR-Stats algorithm includes automatic outlier identification and controls for well and plate biases, which led to a significant increase in accuracy over state-of-the-art methods.
Between-well and between-plate biases, as well as random variations, were found to be multiplicative. This motivated us to define bioenergetic measures based on ratios: ETC-dependent OC proportion, ATPase-dependent OC proportion, ETC-dependent proportion of ATPase-independent OC, and Maximal OC fold change (Table 1).
We provide estimators for each of these ratios and show that they are empirically normally distributed. This permitted the use of linear regression models for assessing the statistical significance of comparisons of bioenergetic measures between two biosamples. Using positive and negative controls from individuals with and without known mitochondrial respiratory defects, we show that OCR-Stats outperforms currently used statistical tests, which suffer either from an excess of false positives (within-plate methods) or of false negatives (between-plate methods).
Furthermore, our study provides experimental design guidance by i) showing that between-plate variation largely dominates within-plate variation, implying that it is important to seed the same biosamples in multiple plates, and ii) providing estimates of the within- and between-plate variances for each bioenergetic measure, allowing for statistical power computations. A free and open-source implementation of OCR-Stats in the statistical language R is provided at github.com/gagneurlab/OCR-Stats.
2. Results
2.1 Experimental design and raw data
We measured OCR, ECAR, and cell number of 203 dermal fibroblast cultures derived from patients suffering from rare mitochondrial diseases and of control cells from healthy donors (normal human dermal fibroblasts, NHDF; Methods, Table S1). These were assayed in 126 plates, all using the same protocol (Methods). In addition, 26 cell lines were grown independently and measured in multiple plates. We will refer to these growth replicates as different biosamples. The NHDF cell line was seeded in all plates to assess potential systematic plate biases. The corners of each plate were left blank, i.e., filled with medium but no cells, to control for changes in temperature (22). A common plate layout is depicted in Fig. 1C, showing how each biosample is present in many replicate wells. We seeded between 3 and 7 biosamples per plate (median = 4). This variation reflects typical experimental set-ups in a lab over multiple years. We then used the standard mitochondrial stress test assay (21), which yields four time intervals with three time points each, denoted by Int1 (before adding any treatment), Int2 (after oligomycin), Int3 (after FCCP) and Int4 (after rotenone) (Fig. 1A). We also flagged wells that did not react as expected to the treatments and discarded them from the statistical analysis (Methods).
2.2 Random and systematic variations between replicates within plates
Representative replicate time series are shown in Fig. 2A, with data from 12 wells for one biosample in a single plate depicting commonly observed variations.
First, outlier data points occurred frequently. We distinguished two types of outliers: the entire series of a well (e.g., well G5 in Fig. 2A) and individual data points (e.g., well B6 at time point 6 in Fig. 2A). In the latter case, eliminating the entire series of well B6 would be too restrictive and would discard the other 11 valid time points. Therefore, outlier detection methods must account for both possibilities.
Second, we noticed that the variation between replicates grows proportionally with the OCR level (Fig. 2B), suggesting that the error is multiplicative. Unequal variance, or heteroscedasticity, can strongly affect the validity of statistical tests and the robustness of estimates. We therefore propose modeling OCR on a logarithmic scale, where the dependency between variance and mean disappears (Figs. 2B, 2C). Respiratory chain enzyme activities, such as that of NADH-ubiquinone reductase, have also been shown to follow log-normal distributions (33).
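A short simulation illustrates why multiplicative noise motivates the log scale. The Python sketch below draws replicate OCR values as level × exp(ε), with ε normally distributed; the noise SD of 0.15 and the four levels are arbitrary assumptions, not estimates from these data. On the natural scale the between-replicate SD grows proportionally with the level, whereas on the log scale it stays close to the noise SD.

```python
import numpy as np

rng = np.random.default_rng(0)
levels = np.array([20.0, 50.0, 100.0, 200.0])  # hypothetical true OCR levels (pmol/min)
sigma = 0.15                                    # hypothetical SD of the log-scale noise

# Multiplicative noise: OCR = level * exp(eps), eps ~ N(0, sigma^2), 5000 replicates per level
ocr = levels[:, None] * np.exp(rng.normal(0.0, sigma, size=(levels.size, 5000)))

sd_natural = ocr.std(axis=1)        # grows proportionally with the level
sd_log = np.log(ocr).std(axis=1)    # roughly constant, close to sigma

for lvl, s_nat, s_log in zip(levels, sd_natural, sd_log):
    print(f"level={lvl:6.1f}  SD(natural)={s_nat:6.2f}  SD(log)={s_log:.3f}")
```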
Third, we observed systematic biases in OCR between wells (e.g., OCR values of well C6 are among the highest, while those of well B5 are among the lowest at all time points, Fig. 2A). Variations in cell number, initial conditions, treatment concentrations, or fluorophore sleeve calibration can lead to systematic differences between wells, which we refer to as well biases. To investigate whether well biases could be largely corrected using cell number, as in (26), we counted the number of cells after the experiments using CyQUANT (Methods). As expected, median OCR for each interval grows approximately linearly with the cell number measured at the end of the experiment (Spearman rho between 0.32 and 0.47, P < 2.2e-16, Fig. S1A). However, the relation is far from perfect, reflecting important additional sources of variation as well as possible noise in the measurement of cell number. Strikingly, dividing OCR by cell count led to a higher coefficient of variation (standard deviation divided by the mean) between replicate wells than without that correction (Fig. S1B). This analysis showed that normalizing for cell number by dividing by raw cell counts is insufficient and motivated us to derive another method to capture well biases. Finally, we found that sex is not significantly associated with OCR levels (Fig. S2), in agreement with (34).
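One mechanism by which division by cell count can increase the coefficient of variation is sketched below as a toy simulation, not a model of these data. It assumes, hypothetically, that the same number of cells is seeded everywhere and that the end-point count carries measurement noise independent of OCR; the noise levels 0.10 and 0.15 are arbitrary. In that regime dividing by the noisy count adds variance instead of removing it.

```python
import numpy as np

rng = np.random.default_rng(1)
n_wells = 10_000
seeded = 20_000   # same number of cells seeded in every well

# Hypothetical log-scale noise levels (assumptions for illustration only)
ocr = 100.0 * np.exp(rng.normal(0.0, 0.10, n_wells))        # OCR with its own well noise
counted = seeded * np.exp(rng.normal(0.0, 0.15, n_wells))   # noisy end-point cell count

def cv(x):
    """Coefficient of variation: standard deviation divided by the mean."""
    return x.std() / x.mean()

print(f"CV raw OCR:          {cv(ocr):.3f}")
print(f"CV OCR / cell count: {cv(ocr / counted):.3f}")
```

When true cell numbers do vary substantially between wells, the count carries real signal and the comparison can go the other way; the simulation only shows why the correction is not guaranteed to help.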
2.3 A statistical model of OCR
Building on these insights, we next introduced a statistical model for OCR within plates. For a given biosample in one plate, we modeled the logarithm $y_{w,t}$ of the OCR of well $w$ at time point $t$ as a sum of a well bias, an interval effect and noise, i.e.,

$$y_{w,t} = \alpha_{i(t)} + \beta_w + \varepsilon_{w,t}, \qquad (1)$$

where $\alpha_{i(t)}$ is the effect of the interval $i(t)$ of time point $t$, $\beta_w$ is the relative bias of well $w$ compared to a reference well, and $\varepsilon_{w,t}$ is the error.
We defined the OCR levels $\theta_i$ as the expected log OCR per interval, averaged over all wells:

$$\theta_i = \alpha_i + \frac{1}{n}\sum_{w=1}^{n}\beta_w, \qquad (2)$$

where $n$ is the number of wells.
Note that the well bias is modeled independently for each plate, i.e., the bias of a certain well in one plate is different from the bias of the well at the same location in another plate.
We now present the OCR-Stats algorithm. For a given plate:
1. Fit the log-linear model (1) by least squares, i.e., by minimizing $\sum_w \sum_t (y_{w,t} - \alpha_{i(t)} - \beta_w)^2$, thus obtaining the coefficients $\hat\alpha_i$ and $\hat\beta_w$. Compute $\hat\theta_i$ using (2).
2. For each time point $t$ in interval $i$ and well $w$, define the OCR residual $e_{w,t} = y_{w,t} - \hat\alpha_{i(t)} - \hat\beta_w$, which is used to identify outliers (Methods, Fig. S3).
3. Identify and remove well-level outliers and refit, iteratively, until no more are found (Fig. S3A-B).
4. Identify and remove single-point outliers and refit, iteratively, until no more are found (Fig. S3C-D).
5. Scale back to the natural scale to compute the bioenergetic measures (e.g., basal respiration $= \exp(\hat\theta_1) - \exp(\hat\theta_4)$), or compute the ratio-based metrics (Tables 1 and 2).
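Steps 1 and 2 can be sketched in Python with NumPy by building the design matrix of model (1) and solving it with ordinary least squares. This is a minimal sketch under our own assumptions (well 0 serves as the reference well, every well has every time point, no outlier handling); the function `fit_log_linear` is hypothetical and is not the R implementation in the OCR-Stats repository.

```python
import numpy as np

def fit_log_linear(y, interval):
    """Least-squares fit of y[w, t] = alpha[interval[t]] + beta[w] + eps[w, t].

    y        : (n_wells, n_timepoints) array of log OCR, one biosample on one plate
    interval : length-n_timepoints integer array of interval labels 0..K-1
    Returns alpha (K,), beta (n_wells,), theta (K,) and the residuals e[w, t].
    """
    n_wells, n_time = y.shape
    n_int = int(interval.max()) + 1
    X = np.zeros((n_wells * n_time, n_int + n_wells - 1))
    for w in range(n_wells):
        for t in range(n_time):
            row = w * n_time + t               # matches the row-major order of y.ravel()
            X[row, interval[t]] = 1.0          # interval effect alpha_{i(t)}
            if w > 0:
                X[row, n_int + w - 1] = 1.0    # well bias beta_w; well 0 is the reference
    coef, *_ = np.linalg.lstsq(X, y.ravel(), rcond=None)
    alpha = coef[:n_int]
    beta = np.concatenate([[0.0], coef[n_int:]])
    theta = alpha + beta.mean()                # equation (2)
    resid = y - alpha[interval][None, :] - beta[:, None]
    return alpha, beta, theta, resid

# Synthetic plate: 12 wells, 4 intervals x 3 time points, hypothetical levels
rng = np.random.default_rng(2)
interval = np.repeat(np.arange(4), 3)
true_theta = np.log([100.0, 30.0, 200.0, 10.0])
true_beta = rng.normal(0.0, 0.1, 12)
y = (true_theta[interval][None, :]
     + (true_beta - true_beta.mean())[:, None]
     + rng.normal(0.0, 0.05, (12, 12)))

alpha, beta, theta, res = fit_log_linear(y, interval)
print(np.exp(theta).round(1))                                        # close to [100, 30, 200, 10]
print(f"basal respiration: {np.exp(theta[0]) - np.exp(theta[3]):.1f}")  # step 5
```

With every well observed at every time point, $\hat\theta_i$ reduces to the mean log OCR of interval $i$ over all wells; the explicit regression becomes necessary once individual outlier points are removed and the design is no longer balanced.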
2.4 Variations within plates
We were then interested in determining the amplitude of the OCR variation between wells inside each plate, in order to compute the number of wells needed to obtain robust estimates $\hat\theta_i$. Using only the NHDF controls, we computed the standard deviation $\hat\sigma_{i,j}$ of the logarithm of OCR across all wells for each plate $j$ and interval $i$. Then, we computed the median across plates, thus obtaining one value $\hat\sigma_i$ per interval. As we work on the logarithmic scale, the error in the natural scale becomes multiplicative and relative. The standard error of the estimates $\hat\theta_i$ can be expressed as $\hat\sigma_i/\sqrt{n_w}$, where $n_w$ is the number of wells. The highest value of $\hat\sigma_i$ was 0.16; therefore, cells should be seeded in 10 wells to obtain a relative error of about 5% ($0.16/\sqrt{10} \approx 0.051$). This result is derived from the variation after removing outliers; considering that around 16.5% of wells were found to be outliers, ideally 10/(1 − 0.165) ≈ 12 wells should be used per biosample to obtain a relative error of 5%.
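The well-number arithmetic above can be reproduced in a few lines of Python. The values 0.16 (largest median within-plate SD of log OCR) and 16.5% (fraction of outlier wells) are taken from the text; rounding up the outlier-inflated number is our own choice.

```python
import math

sigma_max = 0.16          # largest median within-plate SD of log OCR (from the text)
n_wells = 10
se = sigma_max / math.sqrt(n_wells)
print(f"standard error with {n_wells} wells: {se:.4f}")   # about 5% relative error

outlier_fraction = 0.165  # fraction of wells flagged as outliers (from the text)
n_seed = math.ceil(n_wells / (1 - outlier_fraction))
print(f"wells to seed per biosample: {n_seed}")
```

On the log scale, a standard error of about 0.05 corresponds to a relative error of about 5% in the natural scale, since exp(0.05) ≈ 1.05.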
2.5 Variations between plates
After analyzing the OCR variation among wells inside plates, we set out to study the variation across plates. Using data from the NHDF controls, we found that the variability between plates is much larger than between wells in all four intervals (Table S2, Fig. S4). We next asked whether a systematic plate bias exists that could be corrected for. We indeed observed a similar increase in OCR in interval 1 for both biosamples on plate #20140430 with respect to plate #20140428 (Fig. 3A). To test whether this tendency held across the repeated biosamples, we compared all replicate pairings with their respective NHDF controls and found a positive correlation (Fig. 3B). These differences can come from changes in temperature or the use of different sensor cartridges (13). Because the plate biases are systematic, they can be corrected for using a log-linear model (Methods). Nonetheless, the biases do not explain all of the between-plate variation, as the remaining variance is large (relative variance of the residuals: I1: 49.8%, I2: 51.6%, I3: 65.6% and I4: 55.9%). Therefore, when comparing two samples, it is important that they are seeded on the same plate and that the comparison is repeated on multiple plates.
2.6 Statistical testing for the comparison of biosamples
In order to compare the bioenergetic measures of two biosamples, we first need to decide whether to test differences or ratios of the OCR levels in the natural scale. As a cell number effect remains after correcting for well biases (Fig. 3C), we recommend testing ratios of OCR levels (equivalently, differences on the logarithmic scale) (Table 3).
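The argument for ratios can be illustrated under a deliberately simple assumption: that OCR in a well is proportional to its effective cell number (the per-cell rates below are hypothetical). A cell-number factor then changes the difference between two intervals but cancels in their ratio, and a ratio on the natural scale is a difference on the log scale.

```python
import math

def interval_comparisons(rate_a, rate_b, cells):
    """Compare two intervals for a well whose OCR is rate * cells (assumed proportional).

    Returns (difference, ratio, log difference) of interval b versus interval a.
    """
    a, b = rate_a * cells, rate_b * cells
    return b - a, b / a, math.log(b) - math.log(a)

# Hypothetical per-cell rates for Int1 and Int3; cells in thousands per well
for cells in (15, 20, 25):
    diff, ratio, log_diff = interval_comparisons(5.0, 10.0, cells)
    print(cells, diff, ratio, round(log_diff, 4))
```

The difference changes with the cell number, while the ratio and the log difference stay constant.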
Subsequently, for any given OCR ratio $b$ (e.g., the fold change M/Ei), we test the differences of the log OCR ratios of a cell line $f$ versus a control $c$ using the following linear model:

$$d_{b,f,p} = \mu_{b,f} + \varepsilon_{b,f,p}, \qquad (3)$$

where $d_{b,f,p}$ corresponds to the difference of ratio $b$ between cell line $f$ and the respective control on plate $p$. We solve it using linear regression, thus obtaining one value $\hat\mu_{b,f}$ for each ratio $b$ and cell line $f$. We then compare these $\hat\mu_{b,f}$ values against the null hypothesis $\mu_{b,f} = 0$, using test statistics that follow a Student's t-distribution, to compute p-values and confidence intervals (Figs. 4A, 4B, Methods).
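Model (3) can be sketched as a regression with one indicator column per cell line, fitted jointly for one ratio $b$ so that the residual variance is pooled across cell lines. This is one way in which a cell line seeded on a single plate can still receive a standard error. The function name, the example values and the choice of the residual degrees of freedom of this fit (observations minus fitted means) are assumptions of the sketch, not necessarily the exact convention of OCR-Stats.

```python
import numpy as np
from scipy import stats

def fit_log_ratio_model(d, line, alpha=0.05):
    """Fit d_{b,f,p} = mu_{b,f} + eps_{b,f,p} for one ratio b across all cell lines.

    d    : 1-D array, one entry per (cell line, plate): log ratio of the cell line
           minus log ratio of the control on the same plate
    line : 1-D array of cell line labels, same length as d
    Returns {label: (mu_hat, p_value, (ci_low, ci_high))}.
    """
    labels = np.unique(line)
    X = (line[:, None] == labels[None, :]).astype(float)  # one indicator column per line
    mu, *_ = np.linalg.lstsq(X, d, rcond=None)
    resid = d - X @ mu
    df = len(d) - len(labels)                  # residual degrees of freedom (assumed)
    s2 = resid @ resid / df                    # pooled residual variance
    se = np.sqrt(s2 / X.sum(axis=0))           # se of each mu: s / sqrt(number of plates)
    t = mu / se                                # H0: mu = 0
    p = 2 * stats.t.sf(np.abs(t), df)
    q = stats.t.ppf(1 - alpha / 2, df)
    return {lab: (m, pv, (m - q * s, m + q * s))
            for lab, m, pv, s in zip(labels, mu, p, se)}

# Hypothetical log-ratio differences; line "C" was seeded on a single plate
d = np.array([-0.40, -0.35, -0.50, 0.05, -0.02, 0.01, -0.45])
line = np.array(["A", "A", "A", "B", "B", "B", "C"])
for lab, (m, p, ci) in fit_log_ratio_model(d, line).items():
    print(lab, round(m, 3), f"p={p:.2g}", np.round(ci, 3))
```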
2.7 Benchmark of OCR-Stats algorithm
In order to benchmark the OCR-Stats algorithm, we computed the coefficient of variation (standard deviation divided by the mean) of the six bioenergetic measures in the natural scale for all biosamples repeated across plates, using the following methods: i) the default Extreme Differences (ED) method provided by the vendor (Methods), ii) the log-linear model (LL), corresponding to steps 1 and 2 of the OCR-Stats algorithm, iii) complete OCR-Stats (LL + outlier removal), and iv) OCR-Stats after correcting for the plate effect (OCR-PE) using (4) (Methods).
Each step contributed to decreasing the coefficient of variation, resulting in a significant reduction of 36% and 32% in basal and maximal respiration, respectively, for plate-corrected OCR-Stats (OCR-PE) relative to ED (P < 0.03, one-sided Wilcoxon test) (Fig. 5).
2.8 Benchmark of OCR-Stats statistical testing method
We applied OCR-Stats statistical testing, Extreme Differences plus a Wilcoxon test within each plate (within-plate ED), and Extreme Differences plus a Wilcoxon test across plates (across-plate ED) to the M/Ei ratio and maximal respiration (MR) of all 26 cell lines that were seeded in more than one plate (Methods). For every approach, we computed p-values for fold changes against the controls. Six of these cell lines come from patients with rare variants in genes associated with an established cellular respiratory defect, allowing us to assess the sensitivity of each approach (Table S3, (35–39)). Additionally, two cell lines (#73901 and #91410) that showed no significant respiratory defects in earlier studies (40,41) served as negative controls.
The within-plate ED method reported significantly higher or lower MR for 56 out of 69 (81.2%) biosamples with respect to the control (Fig. 4A, Table S3). Moreover, all 26 cell lines had at least one significant biosample, while 11 of them also had at least one non-significant biosample (Fig. 4A). These ambiguous results show the importance of testing on multiple plates and call for a more robust approach than within-plate ED.
One approach to evaluating samples seeded in multiple plates is to perform a Wilcoxon test on the ED values averaged per plate (across-plate ED, Methods). However, this requires at least five plate replicates to obtain significant results. Here, only one cell line, #78661, was found significant this way. On these data, OCR-Stats was much more conservative than within-plate ED and found only 7 out of 26 (26.9%) cell lines to have a significantly lower aggregated M/Ei than the control, including all 6 positive-control cell lines (Figs. 4A, 4B, Table S3). Moreover, OCR-Stats did not report significant M/Ei differences for the two negative controls. There was no evidence against the normality and homoscedasticity assumptions of OCR-Stats, as the quantile-quantile plots of the residuals aligned well along the diagonal (Figs. 4C, S5). Altogether, these results show that OCR-Stats successfully identifies and removes variation within and between plates, providing more stable testing results, which translates into fewer false positives.
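The minimum number of plate replicates follows from the discreteness of exact rank tests. Assuming, for illustration, that the across-plate test is a Wilcoxon signed-rank test on n paired per-plate differences (sample minus control), the smallest attainable exact p-value depends only on n, as the following Python sketch shows.

```python
def min_signed_rank_p(n, two_sided=True):
    """Smallest exact p-value attainable by a Wilcoxon signed-rank test on n paired differences.

    Under H0 each of the 2**n sign patterns of the ranks is equally likely, and the most
    extreme outcome (all differences with the same sign) has probability 1 / 2**n.
    """
    p_one_sided = 1 / 2 ** n
    return min(1.0, 2 * p_one_sided) if two_sided else p_one_sided

for n in range(3, 8):
    print(n, min_signed_rank_p(n, two_sided=False), min_signed_rank_p(n))
```

Under this assumption, a one-sided test can reach p < 0.05 only from n = 5 plates onwards, and a two-sided test only from n = 6, regardless of the effect size.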
Discussion and conclusion
Mitochondrial studies using extracellular fluxes, specifically the XF Analyzer from Seahorse, are gaining popularity; it is therefore of paramount importance to have a proper statistical method to estimate OCR levels from the raw data. Here, we have developed such a method, the OCR-Stats algorithm, which includes approaches to control for well and plate biases, and automatic outlier identification. By doing so, we were able to significantly reduce the coefficient of variation of replicates across plates. Additionally, based on the analysis of intra-plate variation, we suggest that the minimum number of well replicates per biosample in a 96-well plate should be 12.
We found that dividing cellular OCR by cell number introduced more noise than was present in the uncorrected data. Here, we always seeded the same number of cells. Hence, the variation in cell number that we observed across wells at the end of the experiments is likely inflated by measurement noise. In other experimental settings, in which different numbers of cells are seeded, we suggest including in model (1) an offset term equal to the logarithm of the seeded cell number, to control for this variation by design. Also, the Seahorse XF Analyzer can be used on isolated mitochondria and on isolated enzymes, where one normalization approach is to divide OCR by the mitochondrial protein or enzyme concentration (42). However, as described here for cellular assays, robust normalization procedures require careful analysis.
We showed that there is a roughly multiplicative bias between plates that can be controlled for to some extent by including control samples on every plate. To handle this plate bias, we proposed an extension of our within-plate robust linear regression approach that adds a plate-specific term. We demonstrated that OCR comparisons should be done using ratios rather than differences, as this eliminates sources of variation such as cell number. We introduced a linear model, the OCR-Stats statistical testing, and showed that its results agree with previous findings in patients diagnosed with mitochondrial disorders.
With the OCR-Stats statistical test, significance can be reached even for a biosample seeded on only one plate, provided that other between-plate replicates are available to estimate the inter-plate variance. Nevertheless, we still recommend performing at least 3 independent experiments with the same cell lines, as a single result can lead to wrong conclusions (Fig. 4A). Also note that a contaminated sample can increase the variability and thereby affect the significance of other samples. It is therefore important to detect such samples and discard them from further analysis.
In principle, OCR-Stats should also be able to estimate ECAR levels. However, analyses similar to those performed here should be done beforehand to guarantee that the method is indeed applicable. Preliminary investigations suggest that the nature of the noise (outliers, multiplicative) is similar to that of OCR.
Methods
Biological material
All biosamples come from primary fibroblast cell lines of individuals suffering from rare mitochondrial diseases, established in the framework of the German and European networks for mitochondrial disorders mitoNET and GENOMIT. The controls are primary fibroblasts, normal human dermal fibroblasts (NHDF) from neonatal tissue, commercially available from Lonza, Basel, Switzerland.
Measure of extracellular fluxes using Seahorse XF96
We seeded 20,000 fibroblast cells in each well of an XF 96-well cell culture microplate in 80 µL of culture medium and incubated them overnight at 37°C in 5% CO2. The four corners were filled with medium only, for background correction. The culture medium was replaced with 180 µL of bicarbonate-free DMEM, and cells were incubated at 37°C for 30 min before measurement. Oxygen consumption rates (OCR) were measured using an XF96 Extracellular Flux Analyzer (21). OCR was determined at four levels: with no additions, and after adding oligomycin (1 μM); carbonyl cyanide 4-(trifluoromethoxy)phenylhydrazone (FCCP, 0.4 μM); and rotenone (2 μM) (additives purchased from Sigma at the highest quality). After each assay, all wells were manually inspected using a conventional light microscope. Wells for which the median OCR levels did not follow the expected order, namely median(OCR(Int3)) > median(OCR(Int1)) > median(OCR(Int2)) > median(OCR(Int4)), were discarded (977 wells, 10.47%). Note that other cell lines, or cell lines under certain conditions, may not react as expected to the standard treatments, in which case their wells should not be discarded by this criterion. We also excluded from the analysis contaminated wells and wells in which the cells had detached (461 wells, 4.94%). All raw OCR data are available in Table S4.
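The ordering criterion used to discard wells can be written as a small check per well. The Python sketch below assumes, for illustration, that intervals are encoded as integers 0 to 3 for Int1 to Int4; the function name and example values are hypothetical.

```python
import numpy as np

def reacts_as_expected(ocr, interval):
    """True if the median OCR per interval follows Int3 > Int1 > Int2 > Int4.

    ocr      : 1-D array of OCR values for one well (natural scale)
    interval : 1-D integer array of interval labels 0..3 (Int1..Int4)
    """
    med = [np.median(ocr[interval == i]) for i in range(4)]
    return med[2] > med[0] > med[1] > med[3]

interval = np.repeat(np.arange(4), 3)   # 4 intervals x 3 time points
well = np.array([100, 98, 97, 30, 31, 29, 200, 190, 185, 10, 11, 9], dtype=float)
print(reacts_as_expected(well, interval))   # expected order, so the well is kept
```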
Cell number quantification
Cell number was quantified using the CyQuant Cell Proliferation Kit (Thermo Fisher Scientific, Waltham, MA, USA) according to the manufacturer’s protocol. In brief, cells were washed with 200 µL PBS per well and frozen in the microplate at −80°C to ensure subsequent cell lysis. Cells were thawed and resuspended vigorously in 200 µL 1x cell-lysis buffer supplemented with 1x CyQUANT GR dye per well. Resuspended cells were incubated in the dark for 5 min at RT whereupon fluorescence was measured (excitation: 480 nm, emission: 520 nm).
Extreme Differences (default) Method to compute bioenergetics measures
On every plate independently and for each well, take the last OCR measurement of interval 1, the minimum OCR value of intervals 2 and 4, and the maximum OCR value of interval 3 (14). Then, compute the corresponding differences to estimate the bioenergetic measures. Report the results per patient as the mean across wells plus the standard deviation or standard error, separately for each plate.
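The Extreme Differences summaries for one well, and their per-plate aggregation across wells, can be sketched as follows. The integer interval encoding, the subset of measures returned and the example OCR values are illustrative assumptions; the vendor software is not reproduced here.

```python
import numpy as np

def extreme_differences(ocr, interval):
    """Extreme Differences summaries for one well (natural scale).

    Int1: last measurement; Int2 and Int4: minimum; Int3: maximum.
    Returns three example bioenergetic measures computed from these summaries.
    """
    int1 = ocr[interval == 0][-1]
    int2 = ocr[interval == 1].min()
    int3 = ocr[interval == 2].max()
    int4 = ocr[interval == 3].min()
    return {
        "basal_respiration": int1 - int4,
        "atp_linked_respiration": int1 - int2,
        "maximal_respiration": int3 - int4,
    }

interval = np.repeat(np.arange(4), 3)
wells = np.array([   # hypothetical OCR time series of two wells of one biosample
    [100, 98, 97, 30, 31, 29, 200, 190, 185, 10, 11, 9],
    [110, 105, 104, 33, 32, 34, 210, 215, 205, 12, 11, 12],
], dtype=float)
basal = np.array([extreme_differences(w, interval)["basal_respiration"] for w in wells])
print(basal.mean(), basal.std(ddof=1))   # per-plate mean and SD across wells
```

Because each summary is a single extreme value, one outlying time point can shift the estimate directly, which is one reason this approach is sensitive to the outliers discussed in Section 2.2.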
Outlier Removal
For each sample $s$ and well $w$, compute the mean of its squared residuals across time points, $r_w = \frac{1}{T}\sum_{t=1}^{T} e_{w,t}^2$, where $T$ is the number of time points, thus obtaining a distribution $r$. Identify as outliers the wells for which $r_w > \mathrm{median}(r) + 5 \cdot \mathrm{mad}(r)$, where mad, the median absolute deviation, is a robust estimate of the standard deviation (Fig. S3A). We found that deviations of 5 mad from the median were selective enough in practice. Compute the vector of estimates $\hat\theta$ using the remaining wells and iterate this procedure until no more wells are identified as outliers. Convergence required 8 iterations, and around 16.5% of all wells were found to be outliers (Fig. S3B).
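The iterative well-level step can be sketched in Python as below. For brevity, the sketch uses the closed-form least-squares residuals of model (1) that hold when every remaining well has every time point (well mean + interval mean − grand mean as fitted value), which is the case before single-point outliers are removed. The scaling constant 1.4826 in `mad` mirrors the default of R's `mad()` and is an assumption about the exact estimator; the single-point step is not shown.

```python
import numpy as np

def mad(x):
    """Median absolute deviation, scaled by 1.4826 to estimate the SD (as R's mad() does)."""
    return 1.4826 * np.median(np.abs(x - np.median(x)))

def residuals_balanced(y, interval):
    """Least-squares residuals e[w, t] of model (1) when every well has every time point.

    For this balanced additive design the fitted value is
    well mean + interval mean - grand mean.
    """
    n_int = int(interval.max()) + 1
    well_mean = y.mean(axis=1, keepdims=True)
    int_mean = np.array([y[:, interval == i].mean() for i in range(n_int)])
    return y - well_mean - int_mean[interval][None, :] + y.mean()

def remove_outlier_wells(y, interval, k=5.0):
    """Iteratively drop wells with r_w > median(r) + k * mad(r); return a mask of kept wells."""
    keep = np.ones(y.shape[0], dtype=bool)
    while True:
        res = residuals_balanced(y[keep], interval)
        r = (res ** 2).mean(axis=1)                  # r_w: mean squared residual per well
        out = r > np.median(r) + k * mad(r)
        if not out.any():
            return keep
        keep[np.flatnonzero(keep)[out]] = False      # drop flagged wells, then refit

rng = np.random.default_rng(3)
interval = np.repeat(np.arange(4), 3)
y = np.log([100.0, 30.0, 200.0, 10.0])[interval][None, :] + rng.normal(0, 0.05, (12, 12))
y[5, interval == 2] -= 1.0            # hypothetical well that barely responds to FCCP
keep = remove_outlier_wells(y, interval)
print(np.flatnonzero(~keep))          # well 5 is expected among the flagged wells
```

A constant shift of a whole well is absorbed by its well bias $\beta_w$ and is not flagged; only wells whose time course has a different shape produce large residuals.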
Single point outliers are identified in a similar way. After discarding the wells that were found to be outliers in the previous step, categorize as outliers single data points whose (Fig. S3C). Iterate until no more outliers are found. It required 19 iterations until convergence and approximately 6.1% of single points were found to be outliers (Fig. S3D).
Plate effect model
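In an attempt to correct for plate effects, we propose a log-linear model in which the levels $\hat\theta_{i,s,p}$ depend on interval $i$, sample $s$ and plate $p$:

$$\hat\theta_{i,s,p} = \gamma_{i,s} + \beta_{i,p} + \varepsilon_{i,s,p}, \qquad (4)$$

thus obtaining one coefficient $\beta_{i,p}$ for each plate-interval combination. These effects are then used to correct the previous estimates, yielding the final estimates $\tilde\theta_{i,s,p} = \hat\theta_{i,s,p} - \hat\beta_{i,p}$. As for (1), the model is solved using linear regression.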
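For a single interval, model (4) is an additive two-way model over samples and plates with an unbalanced design, since not every sample is seeded on every plate. The Python sketch below fits it by least squares with the first plate (in sorted label order) as reference and removes the fitted plate term on the log scale; the function name, the reference choice and the noise-free example values are assumptions for illustration.

```python
import numpy as np

def plate_effects(theta, sample, plate):
    """Fit theta[k] = gamma[sample[k]] + beta[plate[k]] + eps for ONE interval.

    theta         : 1-D array of log OCR levels, one per sample-plate pair
    sample, plate : label arrays of the same length
    The first plate in sorted label order is the reference (beta = 0).
    Returns a dict plate -> estimated beta.
    """
    samples, s_idx = np.unique(sample, return_inverse=True)
    plates, p_idx = np.unique(plate, return_inverse=True)
    X = np.zeros((len(theta), len(samples) + len(plates) - 1))
    X[np.arange(len(theta)), s_idx] = 1.0                      # sample effects gamma
    mask = p_idx > 0
    X[np.flatnonzero(mask), len(samples) + p_idx[mask] - 1] = 1.0  # plate effects beta
    coef, *_ = np.linalg.lstsq(X, theta, rcond=None)
    beta = np.concatenate([[0.0], coef[len(samples):]])
    return dict(zip(plates, beta))

# Hypothetical, noise-free log levels: NHDF on every plate, A and B on two plates each
theta = np.array([4.0, 4.2, 3.9, 4.5, 4.7, 4.0, 3.7])
sample = np.array(["NHDF", "NHDF", "NHDF", "A", "A", "B", "B"])
plate = np.array(["p1", "p2", "p3", "p1", "p2", "p2", "p3"])
beta = plate_effects(theta, sample, plate)
corrected = theta - np.array([beta[p] for p in plate])
print({p: round(float(b), 3) for p, b in beta.items()})
print(corrected.round(3))   # replicates of each sample now agree
```

The control seeded on every plate is what connects the plates in the design; without such shared samples the plate effects would not be identifiable.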
For benchmarking, we cannot use the plate-corrected estimates obtained from (4), because this would be circular: correcting with $\hat\beta_{i,p}$ forces replicates to have closer values. Therefore, for benchmarking purposes only, we corrected for the plate effect using only the data from the NHDF controls $c$ of each plate, namely:

$$\hat\theta_{i,c,p} = \gamma_{i,c} + \beta_{i,p} + \varepsilon_{i,c,p}.$$

We solved it using linear regression and used the estimated effects $\hat\beta_{i,p}$ as offsets in (1). Then, we recomputed the $\hat\theta_i$ values accordingly. We scaled back to the natural scale to calculate the bioenergetic measures and the coefficient of variation of all repeated biosamples (except the control).
Multi-plate averaging method
For inter-plate comparisons, the multi-plate averaging method takes the average and standard error of the bioenergetic measures obtained with the ED method for all biosamples repeated across plates (Agilent Technologies, 2016).
OCR-Stats statistical testing
To evaluate the OCR ratio $b$ between a sample $f$ and a control $c$, both located on plate $p$, we use the corresponding tested difference (Table 3). We define

$$d_{b,f,p} = \left(\hat\theta_{i,f,p} - \hat\theta_{j,f,p}\right) - \left(\hat\theta_{i,c,p} - \hat\theta_{j,c,p}\right),$$

where $i$ and $j$ are any two different intervals. From the linear regression, we obtain a t-statistic $t = (\hat\mu_{b,f} - d_0)/\mathrm{se}(\hat\mu_{b,f})$, where $d_0 = 0$ is the value against which we compare $\hat\mu_{b,f}$ and se denotes the standard error. The t-statistic follows a t-distribution with $n - 2$ degrees of freedom, from which we compute p-values. Moreover, we obtain confidence intervals $\hat\mu_{b,f} \pm t_{1-\alpha/2,\,n-2}\,\mathrm{se}(\hat\mu_{b,f})$, where $(1 - \alpha)$ is the confidence level and $t_{1-\alpha/2,\,n-2}$ is the $(1 - \alpha/2)$ quantile of the $t_{n-2}$ distribution. Note that the normality assumption holds for the residuals $\varepsilon_{b,f,p}$ (Figs. 4C, S5).
Author contributions
J.G. and H.P. planned the project and oversaw the research. H.P. designed the experiments. V.A.Y. curated and analyzed the data. J.G. devised the statistical analysis. L.S.K., A.I., E.K., M.G., R.K., and A.N. performed the mitochondrial stress test experiments and cell number quantification. V.A.Y., L.W. and J.G. made the figures. V.A.Y. and J.G. wrote the manuscript. All authors performed critical revision of the manuscript.
Acknowledgements
We would like to thank Daniel Bader, Žiga Avsec, Jun Cheng and all members of the Gagneur Lab for valuable discussions and manuscript revision. This study was supported by the German Bundesministerium für Bildung und Forschung (BMBF) through the German Network for mitochondrial disorders (mitoNET, 01GM1113C to H.P.), the E-Rare project GENOMIT (01GM1207 to H.P.), the Juniorverbund in der Systemmedizin ‘mitOmics’ (FKZ 01ZX1405A to J.G., L.W. and V.A.Y.) and the DZHK (German Centre for Cardiovascular Research, L.S.K.). A Fellowship through the Graduate School of Quantitative Biosciences Munich (QBM) supports V.A.Y. H.P. is supported by the EU FP7 Mitochondrial European Educational Training Project (317433). J.G., V.A.Y., L.S.K., R.K. and H.P. are supported by the EU Horizon2020 Collaborative Research Project SOUND (633974). We thank the Cell lines and DNA Bank of Pediatric Movement Disorders and Mitochondrial Diseases of the Telethon Genetic Biobank Network (GTB09003).