ABSTRACT
Genetic integrity of induced pluripotent stem cells (iPSCs) is essential for their validity as disease models and for potential therapeutic use. We describe the comprehensive analysis in the ForIPS consortium: an iPSC collection from donors with neurological diseases and healthy controls. Characterization included pluripotency confirmation, fingerprinting, conventional and molecular karyotyping in all lines. In the majority, somatic copy number variants (CNVs) were identified. A subset with available matched donor DNA was selected for comparative exome sequencing. We identified single nucleotide variants (SNVs) at different allelic frequencies in each clone with high variability in mutational load. Low frequencies of variants in parental fibroblasts highlight the importance of germline samples. Somatic variant number was independent from reprogramming, cell type and passage. Comparison with disease genes and prediction scores suggest biological relevance for some variants. We show that high-throughput sequencing has value beyond SNV detection and the requirement to individually evaluate each clone.
INTRODUCTION
Genetic variants influence cellular mechanisms, thus leading to specific phenotypic presentations in the organism, both in rare and common disease. Neurological disorders like Parkinson’s disease (PD) typically comprise both rare and common genetic risk variants with large and small effect sizes, respectively. Studying the pathomechanism in patient cells is often limited because the disease relevant tissues are not accessible. Human embryonic stem cells (ESC) can be differentiated into cells from all three germ layers (endoderm, mesoderm, ectoderm) but pose legal and ethical issues. In contrast, induced pluripotent stem cells (iPSCs) can be derived from adult tissues using exogenous expression of four transcription factors (POU5F1, SOX2, KLF4, MYC) and can be differentiated into somatic cells in vitro.1–3 Human iPSCs promise not only easy access to cells for scientists interested in disease modelling but also personalized medicine for patients affected by rare diseases.
While different protocols (non- / integrating viral, non-integrating non- / viral) for the generation of iPSC lines have been established, quality control (QC) during reprogramming, differentiation and culturing steps remains an area of active development.4 Loss of genetic integrity as a source of variability in iPSCs5 and in therefrom derived cells is a possible confounder compromising their validity as disease models. Certain genetic variants could be associated with increased risk of cancer or dysfunction when using these cells for regenerative therapeutic interventions. Indeed, tumorigenicity has been reported in transplanted stem cells,6 and a recently published clinical trial using autologous iPSC derived retinal cells7 was temporarily halted due to concerns of tumorigenic potential. Finally, somatic TP53 mutations previously identified in tumors were found in iPSC lines by applying exome sequencing.8 Taken together, a detailed characterization of genetic differences between donor and derived cells should be a central part of any iPSC-QC pipeline to ensure validity and safety.
Several groups and large consortia have studied the origin, quality and quantity of genetic variants found in iPSCs but absent from the donor’s germline.5,9–11 There is high variability in the methods used and the results reported. Also, the nomenclature for variants of different origin is inconsistent and often derives from research on cancer and developmental disorders.
Aneuploidies affecting the number of whole chromosomes in a cell are widely accepted as undesirable aberrations with potentially large effects in cells. Hence, conventional karyotyping is a standard QC measure used to detect these abnormalities in iPSCs. Similarly, somatic copy number variants (CNVs) like microdeletions and -duplications, typically comprising several genes or regulatory elements, are unfavorable. Although CNVs can be detected using chromosomal microarrays (CMA), this technique is not yet generally used to investigate iPSCs. High-throughput sequencing methods (“next-generation sequencing”; NGS) have enabled the exome and genome wide detection of single nucleotide variants (SNVs/indels). Several reports have shown considerable load of SNVs in iPSC.12–15
Here, we describe the ForIPS stem cell biobank resource, a national consortium with the primary goal to establish iPSC technologies to study molecular and cellular mechanisms involved in neurological disorders like PD. We present our approach to a stringent genetic workup, including conventional karyotyping, genetic fingerprinting and CMA in all cell samples. We report results of high coverage exome sequencing in a subset of this cohort selected to establish a suitable pipeline for iPSCs.
RESULTS
Characteristics of individuals included and iPSCs generated in the ForIPS biobank resource
The ForIPS study (Figure 1A) included 23 individuals (11 females and 12 males) of which 9 individuals (5 females, 4 males) were healthy controls without any neurologic disease (CT), 14 were patients affected (AP) by one of three neurological diseases: PD (1 female, 8 males), hereditary spastic paraplegia (HSP, SPG11 gene, OMIM #604360 and *610844; 3 females), monogenic intellectual disability (ID; 2 females). The age at donation of fibroblasts ranged from 22 to 73 years (y) with a median of 45y. In CTs the age range was 23 to 70y with a median of 45y, and in APs the age range was 22 to 73y with a median of 45.5y. The oldest subgroup included individuals with PD (age range 36 to 73y, median 54y). Nine individuals were members of 4 families: “J2C” and “JF” are father and son, “88H”, “O3H” and “82A” are siblings, “PT1” and “CT1” are siblings and “55O” and “G7G” also are siblings (Figure 1B; see also Figure S1 and File S1).
In the iPSC lines derived from fibroblasts, pluripotency was confirmed by positive staining for POU5F1 and NANOG for all iPSC lines, and fluorescence-activated cell scanning (FACS) analysis for TRA-1 −60 was positive for >90% of the cells in each line (Figure S1). All fibroblasts and iPSC lines generated in the ForIPS consortium, which passed these pluripotency criteria were send to genetic QC. A cell suspension from each culture was subject to an initial integrity screening (Figure 1A: “step 1”) using conventional karyotyping to detect aneuploidies and larger chromosomal aberrations. In this first QC step ~15%of iPSC cultures were discarded due to significant chromosomal aberrations (Figure S1 and File S2). In addition, DNA-based fingerprinting (PowerPlex assay) was employed to verify sample identity in most samples or was replaced by CMA based fingerprinting (Figure S1; File S2). Three iPSC lines did not match DNA from donor fibroblasts and were excluded from further analysis. For the remaining lines, fingerprinting matched with the respective fibroblast and with the reported donor sex. Samples which passed the first QC step were included into our subsequent studies (Figure 1B; File S1). This group included 72 primary iPSC lines with a median number of 3 iPSC lines per individual (range 2 to 6) and a median passage number of 14 at time of analysis (range 2 to 39). Forty-nine of these iPSC lines were generated by using integrating retroviral reprogramming (RiPSC) and 23 lines using non-integrating Sendai reprogramming (SiPSC) Yamanaka transcription factors.2,16 RiPSC had a higher median passage number of 15 (range 2 to 39) at analysis compared to 5 for SiPSCs (range 3 to 15). Four RiPSC lines from two individuals (“AY6”, “82A”) were differentiated into midbrain neuronal progenitor cells1(NPCs) and had a median passage number of 7.5 (range 5 to 13). To investigate the relationship between passage number and somatic variants, four RiPSC lines from the same individuals were cultured to higher passages of 30 and 40, respectively.
Detection of somatic CNVs by high density SNP-based CMA
In a second analysis step all study samples passing step 1 (23 fibroblast cultures, 49 RiPSCs, 4 hereof derived NPCs, 4 RiPSCs at passages 30 and 40, and 23 SiPSCs) were screened for CNVs with a high-resolution, single-nucleotide polymorphism (SNP)-based chromosomal microarray (CMA). We used the Affymetrix CytoScan HD array as it is an established and reliable tool in routine germline diagnostics at our Center for Rare Diseases.17,18 Array QC measures passed manufacturer recommended thresholds in 97.2% of analyzed samples (105/108). The CMA data for the other three samples were only marginally below these thresholds and after manual review considered to be of sufficiently good quality (File S2; Figure S3). The CMA for each analyzed culture was visually screened by a trained expert (M.K.) for aberrations d100 kilobases (kb) and absent from donor fibroblasts (Supplementary information). We identified a total of 93 sub-chromosomal CNVs with sizes ranging from 100 kb to 6.4 Mb (megabases) including 48 deletions and 45 duplications (Figure 2). Most aberrations (91/93) were smaller than the lower detection limit of 5 to 10 Mb typically assumed for G-banded karyotyping.19In addition, we observed trisomy of chromosome 12 in three RiPSC cultures (“Ì1JF-R1-002”, “Í1E4-R1-012”, “Ì1E4-R1-016”), twice only present in a sub-population of cells. In the SiPSC line “CT1-S1-010” we detected a copy number gain affecting all terminal markers on chromosome 17q. Despite its size of 5.9 Mb this CNV was not detectable by conventional karyotyping. The chromosomal position indicated the possibility of an unbalanced translocation which was confirmed by fluorescence in situ hybridization (FISH) analysis as a 14p/17q unbalanced translocation probably of somatic origin (Figure 3A, B). While karyotyping and CNV analysis based on intensity data of chromosome 9 showed unremarkable results in SiPSC line “i82A-S1-004”, SNP allele peak distribution uncovered a copy neutral allelic imbalance on the long arm of chromosome 9 indicating a ~30% sub-clonal cell population carrying a partial uniparental isodisomy (Figure 3C; Figure S2).
Next, we compared RiPSCs and SiPSCs to reveal method-specific differences: 58 somatic CNVs were detected in 34 of 49 (69.4%) RiPSCs, and 35 somatic CNVs in 17 of 23 (73.9%) SiPSCs. Only 15 of the RiPSCs (30.6%) and six SiPSCs (26.1%) showed no somatic CNVs. CNV size varied between 106 kb and 6.4 Mb in RiPSC, and between 100 kb and 5.9 Mb in SiPSC lines. The number of affected genes based on Genbank annotation varied between 0 and 139 with a higher variability in RiPSC lines. Three aberrations in RiPSC contained no genes, whereas all aberrations in SiPSC included genes. Our data showed no significant differences regarding number, size and gene content of somatic CNVs between RiPSC and SiPSC clones, indicating a comparable genetic cell quality (Figure 2A, B, C). Also, there was no significant difference between sexes, relatives- and affected-status confounding the analyses (Figure S3).
In four RiPSC clones cultured to higher passages we could not observe any CNV differences during passaging (File S3), and the average somatic CNV number aggregated per individual showed no correlation with passage number (Figure 2D). Additionally, the somatic CNV count was not correlating with the probands’ age at the time of biopsy (Figure 2E).
NPCs showed the same CNVs detected in the corresponding RiPSC clones indicating genetic stability during differentiation (Figure 2A, B, C; File S3). In the NPC culture derived from the RiPSC “i82A-R1-001” we observed two previously fixed CNVs which had lower intensities in the NPC compatible with a ~50% sub-population: A somatic deletion affecting the DLG2 gene and a deletion affecting the genes VCX and PNPLA4(Figure S2). This observation shows that the RiPSC culture was initially oligoclonal, and points to either selective pressure of culture conditions or random genetic drift introduced by manual picking as the cause of the allelic shift in this NPC culture.
Although the identified somatic CNVs were scattered throughout the genome (Figure 2F), we detected three regions representing possible, specific hotspots. First, two overlapping deletions affecting the CTNNA3 gene in 10q21.3 were identified in a RiPSC clone of individuals “88H” and “O3H”, respectively (Figure 3D, File S3). Second, three aberrations within the DLG2 gene were detected: two overlapping deletions in the iPSC clones “i82A-R1-001” and “i82A-R1-002” of “82A” as well as a duplication in the SiPSC clone “iK22-S1-001” of “K22” (Figure S2). Many smaller and overlapping aberrations in both regions were observed in healthy control individuals (Database of Genomic Variants20). Furthermore, a mosaic gain in 20q11.21 including the BCL2L1 gene was revealed in two different RiPSC clones of “PX7” and one clone of “1JF”.
Exome sequencing comparing iPSC and germline donor material to detect SNVs/indels
We selected a subset of samples for comparative exome sequencing with following inclusion criteria: (1) Availability of a germline DNA sample of the donor (blood) which was not a direct progenitor of the cultured cells (fibroblasts). (2) Availability of SiPSC, RiPSC and differentiated NPC lines of the same donor. (3) Access to higher passage samples of the lines. (4) Different affected status, age and sex. As the individuals “AY6”, “PX7”, “88H” and “82A” met these criteria, we selected a total of 34 samples (4 blood, 4 fibroblast, 8 RiPSC, 4 RiSPC passage 30, 4 RiSPSC passage 40, 6 SiPSC, 4 NPC). Exome sequencing on an Illumina HiSeq2500 machine and standard preprocessing resulted in aligned BAM files (Supplementary information) with a median on-target coverage of 163x (range 117x to 264x) and e95% of the exome target being covered by at least 20 reads (File S2).
Based on an initial feasibility test run with six exomes (File S1 and File S4; Figure S4; Supplementary information) and previous experience from pooled21 and somatic variant calling,22 we used the freebayes software,23 which simultaneously calls all classes of small nucleotide variants (SNVs = single nucleotide variants, MNPs = multiple nucleotide polymorphisms, indels = small insertions/deletions; when not specifically stated we use the term SNV/indel for all classes of small variants). All 34 exome samples were called together with 53 in-house controls from the same machine runs with freebayes and resulting variants were annotated with SnpEff.24 From here on we describe somatic variants obtained after applying hard filters to exclude variants with read evidence in the blood samples (Supplementary information; File S4). We considered resulting variants with alternate allele fractions (AF) e 30% as fixed somatic and variants with AF < 30% as low frequency somatic variants (File S4; Figure S4). We identified a median of 38 fixed (minimum 17, maximum 256) and 1651 low frequency (minimum 739, maximum 3988) somatic SNVs/indels per sample in the coding target regions. We only report the results for the fixed variants and did not perform orthogonal validation (e.g. deep amplicon sequencing or digital PCR) for the low frequency somatic variants (see Figure S4) as previously analyzed by others.13
In analogy to the CNV analysis, we investigated SiPSC, RiPSC and NPC exome data for reprogramming or differentiation specific effects. No significant differences were detected for somatic SNV/indel numbers between RiPSC and SiPSC clones or between RiPSC and their derived NPCs (Figure 4A, B). Notably, the variance was higher for RiPSC (Figure 4A), an effect resulting from specific cultures (compare Figure 5A, B) with a much higher variant load.
Like for CNVs, we found no correlation between somatic SNV/indel variant load and passage number (Figure 4C). In contrast to the CNV analysis, the somatic SNV/indel count aggregated per individual showed a strong positive correlation with the probands’ age at the time of biopsy (Figure 4D). However, this observation is influenced by above mentioned iPSC cultures from older (Figure 5).
Next, we analyzed specific properties of the identified somatic SNVs/indels. Variants predicted to have a moderate impact on gene function (mainly missense variants) represent the largest proportion of identified somatic variants (range 35% to 69%) per sample (Figure 5A). In most iPSC samples, somatic variants were mainly SNVs, with only a small portion of indels and MNPs identified. However, four samples showed an unusual high proportion of MNPs (Figure 5B). A closer examination of these samples (File S4) showed that the MNPs are mainly CC>TT dinucleotide mutations at dipyrimidines and that they additionally had an increase in C>T/G>A transitions (Figure 5C), both mutational signatures typical for ultraviolet light (UV) irradiation damage.25
Missense variants represented a large part of the identified somatic SNVs in the iPSC cultures. Compared to truncating variants their functional interpretation is difficult. We used different computational prediction scores to assess their potential pathogenicity. Interestingly the scores obtained for a large portion (CADD: 44,1%, M-CAP: 35,0%, REVEL: 12,6%) of these somatic missense SNVs are above the respective recommended pathogenicity thresholds (Figure 5D).26–28
Our exome study design with concurrent sequencing and analysis of blood germline and parental fibroblast culture samples enabled us to search for evidence of low frequency somatic variants in fibroblasts due to polyclonality (“somatic mosaicism”). While low frequency variants in bulk sequencing data are inherently noisy when analyzed alone, prior knowledge of a fixed variant in a descendent culture sample increases the locus specific probability of low frequency reads being bona fide somatic variants.13,29 Accordingly, the allele fraction (AF) for fixed variants in the analyzed iPSC cell cultures followed an expected normal distribution of around 0.5, while most of the variants with read evidence in the fibroblasts had a lower AF. In addition, variants at the lower coverage tails had a larger variance in AF influenced by random sampling (Figure 5E; Figure S4). We found a correlation between read coverage at somatic variant positions in the iPSC cultures and AF in the corresponding fibroblast culture, indicating that somatic variants at low AF can only be found in the fibroblast if sufficient read coverage is available. Using a simple binomial draw model, we demonstrate that most variants potentially identifiable as being present in the fibroblasts (somatic) indeed do have reads supporting them (Figure 5F). It is likely that the remaining somatic variants are still somatic but only present at a very low AF in the original fibroblast culture and that they were just not detectable by bulk exome sequencing13.
Multiple secondary analyses revealed additional iPSC culture characteristics
While the mitochondrial genome (“chrM”) is not targeted in most commercial exome designs, exome data still contain considerable mitochondrial coverage due to their high copy number in each cell. We calculated the average coverage of chrM (median 263x, minimum 66x, maximum 765x) and normalized it to the coverage of chromosome 1 (File S5). Fibroblast and R/SiPSC cell cultures showed a significantly higher mitochondrial genome dosage than NPC cultures and peripheral blood lymphocytes (PBLs) (Figure 6A).
Likewise, telomeric genomic regions are not targeted in exome designs but have a high relative coverage in the genome. We used two recently described software algorithms (telomerecat30, telomerehunter31) to compute the relative telomere content from exome data and to correlate it with the passage numbers. While the estimates from both algorithms showed a trend towards less telomere content in higher passages, these results were not significant (Figure 6B). It should be noted that the telomeric content of the 53 in-house exome controls used, when correlated with age, also showed a non-significant trend (Figure S5).
In our initial exome variant calling test in RiPSCs we identified variants in the POU5F1 gene locus absent from the parental fibroblast. These were confirmed to be single nucleotide variants from the integrated viral vector (Figure S6). We therefore excluded the genomic regions of all transcription factors used for reprogramming from variant calling (Supplementary information). When examining these regions, we noticed the coverage profile of the RiPSCs having sudden breaks at the exon-intron boundaries like the profile seen in RNAseq. In contrast, fibroblasts and SiPSCs show bell-like shapes over the capture probes, which is typical for capture-based enrichment (Figure 6C). Our observation indicated multiple genomic integrations (Figure S6) of the plasmid with intron-free transcription factor inserts used for reprogramming of the RiPSC lines.
We wondered whether algorithms for CNV detection from exome data could replace or supplement the widely accepted CMA analysis. The CNVkit algorithm32 uses intergenic reads to achieve a more uniform marker coverage across the genome. While several CNVs detected previously by CMA were also called from exome data using this software, several others were missed (Figure 6D; Figure S6; File S3).
Off-target reads can also be used to check sequencing data for DNA of microorganisms like mycoplasma or cross-individual contamination. We used the MinHash based BBSketch algorithm (https://jgi.doe.gov/data-and-tools/bbtools/) to screen our exome files for cell culture contamination but did not find any evidence for high-grade contamination (Figure S5; File S5). Similarly, we could exclude significant cross-individual contamination, a known problem in iPSC cultures.33 using the ContEst34 software (Figure S5; File S5).
DISCUSSION
Since the discovery of reprogramming methods for somatic cells into pluripotency, the stem cell field has rapidly progressed.23 Precise disease modelling and personalized treatment are some of the promises the iPSC technology is beginning to fulfill.7 Though advances are increasingly encouraging, there is still considerable heterogeneity in research practices.4,35 This is especially evident in genetic QC, which in recent years only has received systematic attention in large cohorts.5,9 Despite a wealth of available experience from pioneering genetic fields regarding rare diseases or cancer genetics, the community has not yet agreed upon common minimal standards for an iPSC line to be acceptable as a model and to be safe for therapeutic use. Here, we describe the application of diagnostic grade technologies to ensure genetic integrity for a collection of iPSCs and differentiated progeny cells from the ForIPS consortium.
We confirm the minimal standard of conventional karyotyping and genetic fingerprinting. G-banded karyotyping led to the exclusion of an appreciable proportion of cell lines with numerical chromosomal anomalies, at a comparable frequency with other reports.36 but also large structural chromosomal rearrangements, which are quite frequent in iPSCs (Figure 3A, B; Figure S1; File S2). While this technique is considered relatively cheap, it requires a lot of hands-on work and does not produce results in a computable electronic form. CMA analysis for copy number aberrations can also identify aneuploidies. However, chromosomal rearrangements in a balanced state would be missed (Figure 3C; Figure S1). Some groups perform optical mapping as an alternative screening method.14 Despite its currently higher costs and the need for specific DNA extraction methods, its higher resolution and computational accessibility might make optical mapping a method of choice for structural aberrations. Also, genetic fingerprinting proved to be a valuable first line QC step which allowed us to resolve sample mix-ups. While short tandem repeat (STR)-based methods, like the one we used, are widely employed for identity testing, these do not allow sample tracking in a complete genetic pipeline. A single nucleotide polymorphism based profiling panel for sample tracking37 would likely be more valuable for biobanks.
Our results using high density CMA showed that about 70%of iPSC lines have a detectable somatic CNV e100 kb, independent of the reprogramming method used (Figure 2A, B, C). This fraction is higher than in previous large reports,5,9,38 which can be attributed to variable CMA resolution and differences in filtering and analysis between the studies. Indeed, a smaller study using the same CMA platform we choose, did also find CNVs in a relatively large portion of iPSCs,39 We could point out several genomic regions affected by recurrent CNVs in iPSCs, explainable either by genetic fragility of the locus40or by proliferative survival advantage.41 By applying high coverage exome sequencing we identified SNV/indel variants in the coding gene regions in every iPSC line analyzed, independent of the reprogramming method used (Figure 4A). Interestingly, every primary iPSC line had at least one fixed somatic high impact (truncating) SNV/indel and several somatic missense variants of which a large portion was predicted as damaging to the protein function by different computational scores (Figure 5A, B, D). Several of the identified somatic variants affect genes implicated in cancer or monogenic diseases as well as genes with elevated expression in the brain (Table 1). These findings are well in line with previous reports.12 Our results suggest a functional impact of certain somatic variants in the iPSC lines. Together with the high variability in somatic variant load observed for all variant classes (Figure 2A, Figure 4A), even in isogenic lines, these observations signify that each line must be individually assessed before use in downstream experiments or therapeutic applications. In addition, we found no significant differences between integrating and non-integrating reprogramming methods regarding somatic CNVs (Figure 2A, B) and SNV/indel (Figure 4A) counts, thus supporting a recent publication for SNVs/indels.14 This information is of special value to researchers working with established RiPSC lines.
The relationship between culture passaging and somatic variants count has been controversially discussed in the literature. While early analyses have described a negative correlation between CNV count and passaging,42 recent studies using low resolution CMA5,12 or whole genome sequencing13 could not confirm this. Furthermore, an older study showed an increase in coding SNV counts from 7 to 13 for a single analyzed iPSC line between passage 9 and 40,43 Our results do not support a strong effect of passaging on either CNV or SNV/indel counts (Figure 2E, Figure 4C). The four NPC lines differentiated from RiPSCs in our study showed no additional CNVs (Figure 2A, B, C) and have not significantly acquired SNVs/indels during differentiation (Figure 4B). Together, these data argue against a strong effect of passage number on somatic variant count.
Based on increasing numbers of somatic CNVs in aging individuals as demonstrated in cancer studies,44–46 one would expect to find higher frequencies of this mutation type in iPSCs derived from older donors. Our results, however, demonstrated no significant correlation between donor age and somatic CNV count, confirming similar recent reports.5,12 In contrast to CNVs, somatic SNV/indel load in exome regions has been shown to linearly increase with donor age in iPSCs derived from peripheral blood mononuclear cells.12 We also confirm this observation in our iPSC sample collection derived from skin fibroblasts (Figure 4D). Altogether, our findings and the descriptions in the literature point to differences in the mutational mechanisms and cellular processes involved in the formation of somatic CNVs and SNVs/indels. Our results point to UV irradiation damage related somatic sub-clonality in the parental fibroblasts as a source for SNVs/MNPs and inter-culture variability (Figure 4A, D; Figure 5A, B, C). Recent studies suggest that most variants identified in iPSC, but absent from the donor germline, are already present in a subpopulation of the cells of origin.12,13,15 We also show extensive somatic mosaicism in the parental fibroblast cultures as a source for fixed somatic variants in iPSCs (Figure 5F). Considering the data regarding passaging, we propose that random genetic drift induced by colony picking from poly/oligoclonal cell cultures and not positive selection is a major cause of somatic variation in iPSC clones (Figure 1C). This model is very different from the typical situation in cancer, where few “driver” mutations pose a strong advantage47 in an environment of selective pressure, while most “passenger” variants are neutral (Figure 1C). The goal in iPSC research is not to find detrimental driver mutations but to produce intact cells resembling the donor, thus successful strategies in cancer and iPSC fields will differ.
Mitochondria are crucial for cellular senescence and pluripotency in iPSCs48. Differences in mitochondrial morphology, count49 and mitochondrial DNA (mtDNA) content50,51 during pluripotent stem cell reprogramming and differentiation have been reported.52 Our analysis of the mitochondrial genome content showed significant differences between PBLs, iPSCs and differentiated NPCs, but not between fibroblasts and iPSCs (Figure 6A). A similar method for relative quantification of mtDNA from exome data has recently been compared to gold standard methods.53 These data highlight the added value of high-throughput sequencing reads for complementary analyses with potential use in iPSC characterization. The application of our method in large studies will likely expand our current knowledge of mitochondrial function in iPSCs and their progeny. Our exemplary attempts to telomere content analysis, viral integration and CNV analysis from exome data show that these analyses are in principal possible but need further evaluation and calibration (Figure 6B, C, D). Albeit applicable to exome data, most of the described techniques will likely lead to better results using whole genome sequencing data.
In conclusion, we applied high-resolution diagnostic methods in a systematic pipeline to ensure genetic stability of iPSCs generated in the ForIPS consortium and confirmed several previous associations in an iPSC collection from diverse donors. Most importantly, we showed that different clones have a high variability regarding somatic variant load. This highlights that the genetic evaluation of each individual iPSC clone is fundamental prior to its use as model or for therapeutic purposes. A combination of karyotyping by optical mapping, CMA and exome sequencing will likely provide the best combination regarding cost and efficiency in the next years. As even the smallest variant classes can have detrimental effects on important genes (Table 1), we recommend an inspection of all iPSCs based on three pillars: karyotyping for balanced aberrations, CMA for CNV detection, and NGS to search for SNVs/indels. Ideally these analyses should be performed on the initial iPSC cultures in comparison to an independent germline sample to find the best iPSC line before using these for experiments and again on later derivatives to ensure validity of functional results before publication.
METHODS
Inclusion of subjects in the ForIPS resource
The ForIPS research consortium (http://forips.med.fau.de/) has established an institutional iPSC biobank resource to explore diseases of the brain, particularly PD. All reported iPSC lines with adequate consent have been registered in hPSCreg54. To exchange selected lines for research purposes the scientific board of the UKER biobank will consider each request.
Twenty-three individuals were recruited at the Department of Molecular Neurology (Universitatsklinikum Erlangen). All individuals were phenotypically examined by a clinician experienced with neurological diseases. PD patients were diagnosed by board-examined movement disorder specialists according to consensus criteria of the German Society of Neurology, which are similar to the UK PD Society Brain Bank criteria for diagnosis of PD55. Age at tissue donation, gender, ethnicity and family history were assessed. All participants gave written informed consent to the study prior to donating a skin biopsy from a typically sun unexposed area of the inner upper arm. From this biopsy, a fibroblast stock culture was created. Four individuals additionally donated PBLs for an independent germline DNA sample (Figure 1A). Symptomatic individuals had targeted genetic testing to exclude or confirm monogenic forms of PD, HSP and ID (see Supplementary information). Study approval was granted by the local ethics committee (No. 4485 and 4120).
Reprogramming, differentiation, culture conditions and genetic QC
Detailed methods used for generation of iPSC, differentiation of NPCs, cell culture conditions and for the genetic QC analyses performed are described in the Supplementary information.
AUTHOR CONTRIBUTIONS
A.Re. and B.W. conceived and supervised the study. B.P., M.K., B.W. and A.Re. conceived the methodology. Z.K., J.W., R.A., A.Ra. and F.E. provided patient samples and clinical data. S.P., A.S., J.G, R.A. and M.F. generated iPSC and NPC cultures and verified the pluripotency of iPSCs. C.K. and M.K. performed DNA extraction and genetic fingerprinting. A.B.E. and S.U. generated data for molecular karyotyping and high-throughput sequencing. U.T. and M.K. performed karyotyping and FISH analysis. B.P. and M.K. analyzed and interpreted the molecular data. B.P. and M.K. prepared figures and tables. B.P., M.K., B.W. and A.Re. wrote and edited the manuscript with input from all co-authors.
ADDITIONAL INFORMATION
The consent and ethics approval for the ForIPS study does not cover the deposition of identifiable germ line genomic data of study participants into public repositories. We follow the DFG (German Research Foundation) recommendations for safeguarding good scientific practice and thus internally archive all data for this study. We provide file checksums for all primary array and sequencing data (File S2). These shall be accessible for any legitimate request from the corresponding author (A.Re.). With future consent updates we plan to submit this genetic data to public repositories.
ACKNOWLEDGMENTS
We thank all participating individuals for donating materials. The authors thank Brigitte Dintenfelder and Michaela Kirsch for excellent technical assistance. This study was supported by the Bavarian Ministry of Education and Culture, Science and the Arts within the framework of the Bavarian Network for Induced Pluripotent Stem Cells: ForIPS. Additional support came from the Bavarian Molecular Biosystems Research Network: BioSysNet, the German Federal Ministry of Education and Research (BMBF: 01GQ113, 01GM1520A, 01EK1609B), the DFG funded research training group GRK2162 (B.W., M.R., J.W., A.R.), and the Interdisciplinary Centre for Clinical Research (University Hospital of Erlangen, E23 to A.R., E25 to B.W., J52 to M.R.).
Footnotes
↵6 Co-first author
Competing Interests The authors declare no competing interests.