Abstract
Genetic susceptibility to intellectual disability (ID), autism spectrum disorder (ASD) and schizophrenia (SCZ) often arises from mutations in the same genes, suggesting that they share common mechanisms. We studied genes with de novo mutations in the three disorders and genes implicated by a SCZ genome-wide association study (GWAS). Using biological annotations and brain gene expression, we show that the type of mutation explains enrichment patterns. Across disorders, genes with loss of function mutations and genes with missense mutations were enriched for different pathways, shared with genes highly intolerant to mutations. Expression patterns in the brain account for differences between disorders. Compared to ID, ASD genes are preferentially expressed also in fetal cerebellum and striatum; genes associated with SCZ were most significantly enriched in adolescent cortex. Our study suggests that convergence across neuropsychiatric disorders stems from pathways that are vulnerable to genetic variations, but the spatiotemporal activity of genes contributes to specific phenotypes.
Introduction
The successful genetic dissection of autism spectrum disorder (ASD), intellectual disability (ID) and schizophrenia (SCZ) has resulted in the discovery of a large number of candidate genes (De Rubeis et al., 2014; Hamdan et al., 2014; Iossifov et al., 2014; O’Roak et al., 2012; Rauch et al., 2012). Surprisingly, analysis has suggested that the same genes and the same biological pathways are involved in more than one disorder. First, significant overlap has been found between genes with de novo mutations in ID, ASD and SCZ (Fromer et al., 2014; Hoischen et al., 2014; McCarthy et al., 2014). Second, similar pathways have been identified for each of the three disorders: chromatin regulators and synaptic proteins, especially glutamatergic synapses, are pathways affected by genes with de novo mutations in ID, ASD and SCZ (Ben-David and Shifman, 2013; De Rubeis et al., 2014; Fromer et al., 2014; Hamdan et al., 2014; McCarthy et al., 2014; Parikshak et al., 2013; Willsey et al., 2013). Third, in all three diseases, de novo mutations affect genes intolerant to mutations (so-called constrained genes). De novo loss of function (LoF) mutations in ASD and ID are enriched in a set of 1,003 genes that are significantly depleted of mutations in the general human population (and are presumably under the most significant selective constraint) (Samocha et al., 2014).
While analyses of each disorder separately have identified shared pathways between genes, a systematic cross-disorder comparison has not yet been performed. Such an analysis could yield further insight into the shared, convergent etiology, but critically it could also identify distinct, divergent features that hallmark the genes associated with each of the different disorders.
In this study, we investigate the convergence and divergence between genes associated with ID, ASD and SCZ using a methodology previously applied to individual disorders (Ben-David and Shifman, 2012; Gilman et al., 2012, 2011; Gulsuner et al., 2013; O’Roak et al., 2012; Willsey et al., 2013). To make the analyses comparable and to avoid ascertainment bias, we studied genes identified using the same experimental strategy in each disorder. We focused on functional de novo mutations from exome sequencing studies as they represent a set of prime candidate genes. Our analyses revealed specific pathways that are associated with LoF mutations and different pathways with missense mutations across disorders. Despite the shared attributes, unique patterns of expression in the brain were identified for genes disrupted by de novo mutations in ASD and ID, as well as for genes in GWAS loci for SCZ.
Results
Enrichment of de novo mutations depending on diagnosis, mutation class and sex
We collected sets of genes with coding de novo SNVs identified by genome-wide screens in ID, ASD and SCZ (Table S1). The data included exome sequencing of 195 ID, 3,953 ASD, and 1,027 SCZ cases. As a control, we included de novo mutations identified in unaffected siblings of ASD or SCZ individuals (n = 1,995), and unrelated typically developing individuals (n = 34). These mutations were divided into three classes based on their expected severity: (1) LoF (nonsense, splice site and frameshift), (2) missense (non-synonymous SNVs and indels that did not result in a frameshift), and (3) synonymous mutations. Since most synonymous mutations are expected to be silent, throughout this study we treated synonymous mutations as a negative control for the mutations more likely to be functional (LoF and missense). Our strategy, focusing on de novo mutations, excludes recessive mutations, which were mostly identified for ID by homozygosity mapping (Najmabadi et al., 2011).
We calculated for each disorder and each mutation class the mutation rate per individual and tested the ratios between functional and non-functional mutation rates to correct for experimental confounders. We replicated the higher rate of de novo mutations in ID, and found a significant enrichment for missense mutations in SCZ that was not reported before (Figure 1A). LoF mutations were significantly enriched in ID (FDR corrected P = 5.5×10−8) and ASD (FDR corrected P = 6.0×10−7), but not in SCZ (FDR corrected P = 0.063). For missense mutations, the ratio was significantly higher compared to the control in ID (FDR corrected P = 4.7×10−5), ASD (FDR corrected P = 3.1×10−4) and SCZ (FDR corrected P = 2.3×10−4). Consistent with a previous study (Fromer et al., 2014), the rates of both LoF and missense mutations were higher in ID compared to ASD (FDR corrected PLoF = 1.7×10−6, FDR corrected Pmissense = 3.1×10−4) and SCZ (FDR corrected PLoF = 6.0×10−7, FDR corrected Pmissense = 0.0027). No significant difference was found between ASD and SCZ (FDR corrected PLoF = 0.17, FDR corrected Pmissense = 0.090).
Previous studies in ASD reported a higher rate of de novo SNVs in females compared to males (De Rubeis et al., 2014; Iossifov et al., 2014). We found evidence that de novo LoF mutations occurred at a higher rate in females not only in ASD, but also across disorders (all uncorrected P < 0.05). Surprisingly, de novo missense mutations occurred at a significantly higher rate in males with ID, but not in ASD or SCZ (Figure 1B).
We next tested the overlap in genes between disorders. We found a significant overlap for both LoF (observed to expected ratio [O/E] = 2.94-15.24) and missense mutations (O/E = 2.24-2.80) between disorders (Table 1). The most significant overlap for genes with LoF mutations was between ID and ASD (O/E = 15.24), and for genes with missense mutations between SCZ and ASD (O/E = 2.80). In some pairwise comparisons, there was a small but significant overlap between genes identified in the disorders and the controls (O/E = 0-1.8). For synonymous mutations, there was no significant overlap between conditions (all P-values > 0.05).
Genes with missense mutations are highly connected in the protein-protein interaction network while genes with LoF mutations show less variation in gene expression
Genes differ in their sensitivity to functional mutations. We compared the degree of intolerance to mutation (constraint) for genes with different classes of de novo mutations and between the disorders. For genes with LoF and missense mutations there was a significant difference in the average constraint score among the different conditions (PLoF = 4.9×10−9, Pmissense = 6.7×10−8), but this was not significant for synonymous mutations (P = 0.39) (Figure 2A). The average constraint score for genes with LoF and missense mutations was significantly higher for ID, followed by ASD and SCZ, which were not significantly different from each other, but were significantly different from the controls (Post-hoc tests, Table S2). Relative to the control, the overlap between the 1,003 constrained genes and genes with mutations was significant across the three disorders for LoF and missense mutations (all FDR corrected P < 0.05), but not for genes with synonymous mutations (all FDR corrected P > 0.05) (Figure 2B).
Since genes mutated in ID, ASD and SCZ are under higher selective constraint, we asked why those genes were sensitive to mutations. We hypothesized that the genes may code for proteins having more protein-protein interactions (PPIs). It has been previously shown that mutations in highly connected proteins (hub proteins) are more likely to be lethal (Jeong et al., 2001). Consistent with our hypothesis, we found a significant positive correlation between the median weighted number of interactions and the level of constraint across disorders and mutation types (Pearson correlation r = 0.92, P = 2.4×10−5, Figure 3A). The level of constraint and the number of interactions were also positively correlated across all genes (r = 0.24, P < 2.2×10−16), and the number of interactions for constrained genes was significantly higher compared to non-constrained genes (P < 2×10−16) (Figure 3B).
We expected the genes with functional mutations to be enriched with hub proteins, but we found this to be significant only for missense mutations. We tested the difference between the number of interactions per protein in the three disorders compared to the control. For genes with synonymous and LoF mutations, there was no statistical difference between any of the three disorders and the control (Figure S1). In contrast, there was a significantly higher number of interaction partners for genes with missense mutations in ID, ASD and SCZ (FDR corrected PID = 0.0066, PASD = 0.018, PSCZ = 0.0072) (Figure 3C-E).
Pathogenic missense mutations may be especially associated with hub proteins, because changes in the protein sequence may lead to changes in the ability to interact with other protein partners. Genes may also be under strong selective constraint if the level of expression is critical for proper function (dosage-sensitive genes). Under this hypothesis, we expect constrained genes, as well as genes with pathogenic LoF mutations, to show low variation in gene expression. To test this, we studied previously published single-cell gene expression from human brain (Darmanis et al., 2015). For each gene, we calculated the coefficient of variation (CV) in expression across single neurons. Consistent with our prediction, constrained genes had on average a significantly lower variation in expression relative to unconstrained genes (P < 2×10−16) (Figure 4A). Similarly, genes with LoF mutations in ID and ASD had significantly lower variation in expression relative to controls (FDR corrected PID = 1.4×10−4, PASD = 0.0053), but this was not significant for SCZ (FDR corrected PSCZ = 0.11) (Figure 4B). There was a significant difference in expression variation for genes with missense mutations only for ID (FDR corrected P = 0.0026) (Figure 4C), and no significant difference for synonymous mutations between conditions (FDR corrected P = 0.47) (Figure 4D). Thus, our analysis supports a pathogenic role in all three disorders for mutations in genes intolerant to functional mutations, possibly because some of those genes code for ‘hub’ proteins or proteins sensitive to changes in expression levels (haploinsufficient).
Constrained genes are involved in generation of neurons, expressed in multiple brain regions, but most significantly in early mid-fetal cortex
Since constrained genes are not linked to any specific phenotype, we wanted to characterize their biological processes and expression patterns in the brain. We found that the constrained genes were enriched for biological processes that included generation of neurons, axon development, neurogenesis, and chromatin modification (FDR corrected P < 5×10−31) (Figure 5A). In addition, the constrained genes were significantly associated with neurodevelopmental phenotypes in humans, including neurodevelopmental abnormality, global developmental delay and cognitive impairment (FDR corrected P < 6×10−11).
We next evaluated whether constrained genes are enriched in specific brain regions and developmental stages. We used a previously published method (Dougherty et al., 2010) that gives a measure of enrichment of a gene in a tissue or cell type (specificity index probability (pSI) statistic). We used a pSI dataset based on human brain gene expression to test for enrichment of constrained genes in six different brain regions (amygdala, cerebellum, cortex, hippocampus, striatum, thalamus) and ten developmental periods (from early fetal to young adulthood). The constrained genes were significantly enriched (FDR corrected P < 0.05) across many brain regions (all regions excluding the hippocampus) and across different stages of development, but mid-fetal cortex was the most significant (FDR corrected P = 2.6×10−17) (Figure 5B). The cortex stood out as the brain region most enriched with constrained genes (FDR corrected P = 1.1×10−13) (Figure 5C), as did three critical developmental periods: early fetal, early mid-fetal and adolescence (FDR corrected P = 2×10−8-4×10−3) (Figure 5D).
Preferential expression of genes mutated in ID and ASD in specific brain regions and developmental stages
We next studied the enrichment of genes mutated in the disorders across multiple brain regions and developmental stages (Figure 6). In the case of ID, the enrichment was restricted to the cortex. ID genes with LoF mutations were most significantly enriched in early mid-fetal cortex (FDR corrected P = 0.024) (Figure 6A), while genes with missense mutations showed the strongest enrichment in young adulthood cortex (FDR corrected P = 0.0060) (Figure 6B). Genes with missense mutations in ASD did not show significant enrichment in any brain region. Consistent with previous studies (Parikshak et al., 2013; Willsey et al., 2013), genes with LoF mutations in ASD showed a significant enrichment in early mid-fetal cortex (FDR corrected P = 0.0019) (Figure 6C), a region that was the most significantly associated with ID genes disrupted by LoF mutations and constrained genes.
However, in contrast to ID, the enrichment in ASD was not restricted to the cortex. Two additional regions were enriched in ASD genes with LoF mutations: early mid-fetal striatum (FDR corrected P = 0.0019) and early fetal cerebellum (FDR corrected P = 0.0019) (Figure 6C). Since there is a significant overlap between genes with LoF mutations in ASD and ID, we retested the enrichment excluding the overlapping genes. Despite the smaller number of non-overlapping genes in ID and ASD with LoF mutations (nID = 39, nASD = 506), the enrichment pattern did not change in a noteworthy way in either case (Figure S2). Unlike ASD or ID, in SCZ none of the regions showed significant enrichment after correcting for multiple testing. The most nominally significant enrichment was in the cerebellum during middle-late childhood, when testing genes with missense mutations (nominal P = 0.030, FDR corrected P = 0.62) (Figure 6D). Genes with synonymous mutations and genes with mutations in the control did not show any significant enrichment.
We next analyzed the genes mutated in the three disorders for enrichment of biological processes (Figure S3). Genes with LoF mutations in ID were enriched for chromosome organization and regulation of transcription (FDR corrected P = 0.013). Genes with LoF mutations in ASD were enriched for similar biological processes, including chromatin modification (FDR corrected P = 1.6×10−10) and chromosome organization (FDR corrected P = 4.7×10−8). Genes with LoF mutations in SCZ did not show any significant enrichment. Genes with missense mutations across disorders were significantly enriched with similar and overlapping biological processes related to neuron development (FDR corrected P = 3.4×10−5-1.7×10−13) (Figure S3).
The analysis of the gene co-expression network in the developing human cortex shows similar patterns for ID and ASD genes, strongly depending on mutation class
Our analysis points to enrichment in fetal cortex for genes with functional mutations in both ID and ASD. Since previous studies showed that ASD genes are co-expressed in fetal cortex, we wanted to test the specificity of the enrichment by comparing between ID and ASD. We used a weighted gene co-expression network analysis based on gene expression from cortical development that divided the genes into 18 different modules (labeled M1-M18) (Parikshak et al., 2013). The statistical power to detect enrichment may differ considerably between disorders as the power is a function of the degree of enrichment, but is also a function of the sample size (number of genes in each category) and the fraction of genuine risk genes among the list of mutated genes. To alleviate this problem and directly assess the convergence of mutations in the different disorders to the same modules, we calculated the overall correlation between the enrichment strength in each module, measured as −log10 of the nominal P-values, across the different disorders and the different mutation categories. We used the correlation as the input for hierarchical clustering, allowing us to globally survey how related the different disorders and mutation classes are (Figure 7A).
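The correlation step described above can be illustrated with a minimal Python sketch. The P-values below are hypothetical and not taken from the study, the category names are placeholders, and converting the correlation to a distance as 1 - r before clustering is an assumption; the text only states that the correlation was used as input for hierarchical clustering.

```python
import math

def neg_log10(p_values):
    """Convert nominal P-values to enrichment strengths, -log10(P)."""
    return [-math.log10(p) for p in p_values]

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical nominal P-values for four modules (not values from the study).
module_p = {
    "ID_LoF": [0.001, 0.5, 0.2, 0.9],
    "ASD_LoF": [0.0001, 0.4, 0.3, 0.8],
    "ASD_syn": [0.6, 0.05, 0.7, 0.3],
}
profiles = {name: neg_log10(p) for name, p in module_p.items()}

# Pairwise correlation between enrichment profiles.
names = sorted(profiles)
corr = {(a, b): pearson(profiles[a], profiles[b]) for a in names for b in names}

# Assumed conversion to a distance that a clustering routine could consume.
distance = {pair: 1.0 - r for pair, r in corr.items()}
```

Correlating the full enrichment profile, rather than counting modules that pass a significance threshold, is what makes the comparison less dependent on the number of genes in each category.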
We observed a strong relationship between ASD and ID genes that was highly influenced by the mutation class (Figure 7A, 7B). Specifically, the most significant correlation was observed between genes with LoF mutations in ID and ASD (r = 0.93, FDR corrected P = 1.5×10−7) (Figure 7B). Genes with missense mutations in ASD were most correlated with genes with missense mutations in ID (r = 0.67, FDR corrected P = 0.015). The correlations between genes with LoF and missense mutations in ASD (r = 0.53) or ID (r = 0.44) were not significant (FDR corrected P > 0.05).
When looking at specific modules (Figure 7C), the most significant enrichment was in the M2 module for genes with LoF mutations in ID (FDR corrected P = 0.029) and ASD (FDR corrected P = 0.0018), as well as for constrained genes (FDR corrected P = 3.2×10−12). Based on functional annotations, the M2 module is enriched for chromatin modification (FDR corrected P = 8.4×10−12), chromosome organization (FDR corrected P = 8.8×10−9), and genes known to cause embryonic lethality in mice (FDR corrected P = 4.2×10−6). Across all modules, there was no significant enrichment for SCZ or for genes with synonymous mutations, nor for mutations in the control (Figure 7C).
A previous study included genes with recessive inheritance in the analysis of ID (Parikshak et al., 2013). Based on the sensitivity of our analysis to mutation class, our hypothesis was that genes sensitive to mutations in both copies would be involved in different biological processes. To explore this possibility we tested the enrichment of ID autosomal recessive genes in the WGCNA modules, as well as for functional annotations. Consistent with our hypothesis, the recessive ID genes were not significantly enriched in any module (all FDR corrected P > 0.05), and were enriched for very different functional annotations. The biological process annotations enriched for autosomal recessive ID genes were mainly related to metabolism (e.g. organic acid metabolic process and oxoacid metabolic process) (Figure S4). A similar functional annotation enrichment for metabolic process was observed for genes associated with the autosomal recessive inheritance annotation in the Human Phenotype Ontology (HPO).
Genes implicated by schizophrenia genome-wide association are preferentially expressed in the cortex during adolescence and in DRD2 medium spiny neurons
Since the analysis of de novo mutations in SCZ did not yield a conclusive enrichment in the brain, we turned to analyze common variants associated with SCZ. A genome-wide association study (GWAS) of 36,989 SCZ cases identified 108 loci significantly associated with a small increase in SCZ risk (Ripke et al., 2014). Many of the loci identified by SCZ GWAS contain multiple SNPs in linkage disequilibrium and more than one gene. We therefore used a multistep procedure to prioritize the most likely causal genes.
Testing the list of genes for enrichment of functional annotations revealed a significant enrichment of multiple relevant biological processes and cellular components, such as synaptic transmission, generation of neurons, somatodendritic compartment, and synapse (FDR corrected P < 8×10−4) (Figure S5). The 60 genes that were included in at least one of the significant annotations were considered as the most likely SCZ candidate genes (Table S3). We found that the 60 genes are preferentially expressed in specific brain regions, developmental stages, and neural cell types. Across brain regions and developmental stages the most significant enrichment was for the cortex during adolescence and young adulthood (FDR corrected P = 0.004) (Figure 8A), and across cells for DRD2+ (FDR corrected P = 0.008) and DRD1+ medium spiny neurons (FDR corrected P = 0.01) (Figure 8B).
Discussion
Our analyses of genes that contain loss of function, de novo mutations in ID, ASD and SCZ revealed the existence of different enrichment patterns for different classes of mutation. This enrichment is not related to a specific disorder. Table 2 summarizes our findings and shows the molecular and neuronal processes vulnerable to specific types of mutation. The three disorders show divergence in the specific brain regions and developmental periods affected by genetic variations, consistent with differences in their age of onset.
Our results point to molecular convergence across disorders that in part is due to the fact that de novo mutations affect only one copy of each gene; in other words, the functional mutations have dominant phenotypes. Genes sensitive to LoF or missense mutations influence different types of sensitive pathways that are involved in brain development and function. Our study shows that the analysis of functional de novo mutations is not only influenced by disorder-specific mechanisms, but also captures the processes that are sensitive to specific classes of mutations.
Our results indicate that the mechanisms leading to specific disorders are sensitive to different classes of mutations. The sensitive processes shared across disorders occur at different levels. First, at the molecular level, highly connected ‘hub genes’ are more sensitive to perturbations (Jeong et al., 2001). A gene can be highly connected because the protein product has many interactions with other proteins or because it has a role in regulation of multiple genes via transcription or translation. Our analysis suggests that genes involved in neuron development that are also highly connected in the PPI network are more affected by missense mutations, possibly because those mutations affect protein structure or function. Genes sensitive to LoF mutations are characterized by being chromosome and chromatin organizers and by having low variation in gene expression.
In the analysis of gene co-expression, the correlation between ID and ASD depended on mutation class. This suggests that genes intolerant to specific types of functional mutations are co-expressed during development. For instance, the M2 module that is enriched for chromatin regulators was unsurprisingly associated specifically with genes with LoF mutations in both ASD and ID. The three modules that are enriched for genes with functional mutations in ASD are also enriched with constrained genes. Thus, previous findings that genes with LoF mutations in ASD are co-expressed in fetal cortex (Parikshak et al., 2013; Willsey et al., 2013) do not necessarily reveal ASD-specific mechanisms, but instead expose the type of genes and processes that are sensitive to LoF mutations. This is also supported by the analysis of ID recessive genes, which points to a completely different pathway that is sensitive to mutations affecting both copies of the gene.
Second, at the level of the brain, the analysis of ASD and ID genes together with constrained genes suggests that the cortex, especially during early development, is the region most vulnerable to LoF mutations. It was previously suggested that the cortex might be more vulnerable to genetic mutations because of the intense evolution in the recent lineage leading to humans, and the insufficient time to evolve enough buffering capacity (McGrath et al., 2011). We found that across different brain regions, the genes intolerant to mutations are preferentially expressed in the early fetal period and, to a lesser degree, during adolescence, suggesting that these two periods of brain development are the most sensitive to genetic insults.
Third, our analysis across disorders suggests that males are more vulnerable to LoF mutations, but not necessarily to missense mutations. The trend toward a higher rate of LoF mutations in affected females is consistent with biological differences between the sexes and with more robust brain development in females (Suliman et al., 2014).
Despite the extensive convergence across disorders discussed above, we can still identify unique patterns of preferential expression in the brain for the different disorders. In ID, the enrichment of genes with functional mutations was restricted to the cortex in both the fetal period and young adulthood, depending on the type of mutations. Genes with LoF mutations in ASD showed enrichment not only in fetal cortex, but also in additional brain regions: the cerebellum and striatum. The difference from ID in the enrichment patterns suggests that altered function of multiple brain regions accounts for the range of cognitive, social and restrictive behaviors seen in ASD.
The analysis of genes located in the 108 SCZ loci suggests that common variants associated with SCZ affect synaptic and neuronal processes that are active in the cortex during adolescence and that influence dopaminergic neurons. These results elucidate potential causative mechanisms, which are consistent with SCZ age of onset as well as with the classical theories of cortical dysfunction and the role of dopamine in SCZ (Howes and Kapur, 2009; Winterer and Weinberger, 2004). Not only do antipsychotic drugs increase presynaptic dopamine metabolism and block dopamine D2 receptors, but brain imaging and postmortem studies have also found strong evidence for a dopaminergic hyperfunction in the striatum of patients with SCZ (Howes and Kapur, 2009; Winterer and Weinberger, 2004).
Our analysis provides evidence that dopamine is not only part of the pathophysiology of the disorder but also involved in its etiology. Since extensive GWAS loci are not available for ASD or ID, we cannot directly test the specificity of the results. However, a recent GWAS study of educational attainment, which is genetically correlated with cognitive performance (but not with SCZ), found the candidate genes to be preferentially expressed in the brain during the prenatal period (Okbay et al., 2016). The preferential expression of SCZ genes in adolescent cortex and of educational attainment genes in the prenatal period is consistent with a close relationship between the temporal expression of the genes and the phenotypic manifestation of the genetic variants.
Methods
Data Collection
The de novo mutations analyzed in this paper were collected from 14 different studies: (1) mutations found in children with ASD and their unaffected siblings (De Rubeis et al., 2014; Iossifov et al., 2014, 2012; Neale et al., 2012; O’Roak et al., 2012; Sanders et al., 2012), (2) mutations found in SCZ patients and their unaffected siblings (Fromer et al., 2014; Girard et al., 2011; Gulsuner et al., 2013; Xu et al., 2008), (3) mutations found in ID patients (de Ligt et al., 2012; Gilissen et al., 2014; Hamdan et al., 2014; Rauch et al., 2012), and (4) mutations found in control families (Rauch et al., 2012; Xu et al., 2008). We annotated all the mutations using Annovar (Wang et al., 2010) with the Ensembl annotation, and included only protein coding mutations for further analysis. In total, we analyzed 4,481 ASD mutations, 309 ID mutations, 975 schizophrenia mutations, 1,931 mutations in unaffected siblings, and 44 mutations in control families. The mutations were grouped into LoF, missense and synonymous mutations. LoF mutations included frameshift, stop codon and splicing mutations. Missense mutations were defined as non-synonymous SNVs and indels that were not LoF.
De novo mutation rates
The average number of de novo mutations per individual was calculated for each disorder and each mutation category (LoF, missense and synonymous). To control for factors that influence estimates of absolute rates of de novo mutations, we normalized the number of functional mutations (LoF or missense) to the number of synonymous mutations identified in each study. We then compared the normalized measure between cases and controls. We excluded from the calculation the study of de Ligt et al. (2012), which partly overlapped with Gilissen et al. (2014). In ASD, we used data from two studies that included all the samples from previous work (Iossifov et al., 2014; De Rubeis et al., 2014). A pairwise t-test was used to compare mutation rates across the three disorders. Treating mutations as dichotomous traits, we calculated an odds ratio for the presence of at least one de novo mutation from each mutation category in the disorders relative to the control and tested for differences using Fisher's exact test (fisher.test function in R). An interaction test between sex and mutation rate was performed for each mutation category by calculating the odds ratio of having at least one de novo mutation, separately for males and females. A z score was calculated by dividing the difference in the natural logarithm of the odds ratios between males and females by the square root of the variance of the difference (the square root of the sum of the squares of the separate standard errors):
$$z = \frac{\ln(\mathrm{OR}_{\mathrm{males}}) - \ln(\mathrm{OR}_{\mathrm{females}})}{\sqrt{\mathrm{SE}_{\mathrm{males}}^{2} + \mathrm{SE}_{\mathrm{females}}^{2}}}$$
where OR is the odds ratio and SE is the standard error of its natural logarithm in each sex.
The value of the z score was compared to the standard normal distribution (Altman, 2003). For each mutation type, P-values were corrected for multiple testing using FDR correction.
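The interaction test can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the code used in the study: the standard error of the log odds ratio is taken from the usual 2x2-table approximation (square root of the sum of reciprocal cell counts), the P-value is two-sided, and the counts are hypothetical.

```python
import math

def log_odds_ratio(a, b, c, d):
    """Log odds ratio and its standard error for a 2x2 table.

    a: cases with at least one mutation, b: cases without,
    c: controls with at least one mutation, d: controls without.
    Assumes all four cells are non-zero.
    """
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return log_or, se

def interaction_z(male_table, female_table):
    """z score and two-sided P-value for the male-female difference in log OR."""
    log_or_m, se_m = log_odds_ratio(*male_table)
    log_or_f, se_f = log_odds_ratio(*female_table)
    z = (log_or_m - log_or_f) / math.sqrt(se_m ** 2 + se_f ** 2)
    # Two-sided P-value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p

# Hypothetical counts (not from the study): (a, b, c, d) per sex.
male_table = (10, 90, 10, 90)
female_table = (20, 80, 10, 90)
z, p = interaction_z(male_table, female_table)
```

A negative z here would indicate a larger odds ratio in females than in males, given the male-minus-female ordering assumed in the sketch.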
Overlap between conditions
To examine the overlap across different classes of mutations, we used the dnenrich software (Fromer et al., 2014) to test whether genes with mutations in one disorder were significantly enriched among the genes with the same type of mutations in the other disorders. The dnenrich permutation framework generates random sets of mutations but controls for gene size and structure, sequence coverage and local trinucleotide mutation rate. The overlap of mutated genes between the different disorders was tested using 10,000 permutations. For each mutation category, we tested the overlap of the list of genes in each condition with all other conditions. Genes that were mutated more than once in a specific list were given a weight based on the number of occurrences. The program performs permutations on the entire exome using the original mutation list and creates multiple permuted gene lists, controlling for GC content and gene length. The degree of overlap was calculated by dividing the observed number of overlapping genes by the number of overlaps in the permutations. For each mutation type, P-values were corrected for multiple testing using FDR correction.
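The observed-to-expected ratio can be illustrated with a deliberately simplified permutation sketch in Python. It is not dnenrich: it only weights random draws by gene length, whereas dnenrich also accounts for gene structure, coverage and trinucleotide mutation rate. The function name, gene names, lengths and the add-one P-value estimate are all assumptions made for illustration.

```python
import random

def overlap_obs_exp(mutated_genes, target_genes, gene_lengths, n_perm=10000, seed=0):
    """Observed/expected overlap of a mutation list with a target gene set.

    mutated_genes: one entry per mutation, so recurrent genes count repeatedly.
    gene_lengths: dict of gene -> length, used as the sampling weight (a
    simplification of the null model described in the text).
    """
    rng = random.Random(seed)
    target = set(target_genes)
    genes = list(gene_lengths)
    weights = [gene_lengths[g] for g in genes]
    observed = sum(1 for g in mutated_genes if g in target)
    total = 0
    at_least = 0
    for _ in range(n_perm):
        # Draw as many random "mutations" as were observed.
        permuted = rng.choices(genes, weights=weights, k=len(mutated_genes))
        overlap = sum(1 for g in permuted if g in target)
        total += overlap
        if overlap >= observed:
            at_least += 1
    expected = total / n_perm
    ratio = observed / expected if expected > 0 else float("inf")
    p_value = (at_least + 1) / (n_perm + 1)  # add-one empirical P-value
    return observed, expected, ratio, p_value

# Hypothetical exome of 100 equal-length genes; the target set is the first 10.
gene_lengths = {f"g{i}": 1000 for i in range(100)}
target_genes = [f"g{i}" for i in range(10)]
mutated_genes = ["g0", "g1", "g2", "g3", "g50"]
observed, expected, ratio, p_value = overlap_obs_exp(
    mutated_genes, target_genes, gene_lengths, n_perm=2000
)
```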
Analysis of constrained genes
We used levels of gene constraint and the list of genes with significant constraint from Samocha et al. (2014). We calculated the mean constraint level for genes with mutations in each disorder using the constraint score calculated for missense mutations (presented as z scores). ANOVA was used to test for differences in constraint score, followed by pairwise t-test with FDR correction (pairwise.t.test function in R). We tested the enrichment of the 1,003 constrained genes among the genes mutated in each disorder relative to the control using Fisher’s exact test with FDR correction.
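For readers unfamiliar with Fisher's exact test on a 2x2 table, the following Python sketch shows the two-sided version computed from the hypergeometric distribution. It is an illustration, not the study's code; the counts are hypothetical, and the odds ratio returned is the simple sample odds ratio, which differs from the conditional estimate reported by R's fisher.test.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2
    denom = comb(n, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    p_obs = prob(a)
    # Sum the probabilities of all tables no more likely than the observed one.
    p_value = sum(
        prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7)
    )
    odds_ratio = (a * d) / (b * c) if b * c > 0 else float("inf")
    return odds_ratio, min(1.0, p_value)

# Hypothetical counts (not from the study): constrained / other genes among
# case-mutated genes (40 / 160) and control-mutated genes (20 / 380).
odds_ratio, p_value = fisher_exact_two_sided(40, 160, 20, 380)
```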
Protein-protein interactions
Protein-protein interaction (PPI) data were downloaded from the Mentha project (Calderone et al., 2013) on August 24, 2015. The dataset contains integrated PPI information with a reliability score for each interaction. We used the reliability score to generate a weighted measure of the number of interactors of each protein by summing the reliability scores of all of the protein's interactions. We then tested the difference between the weighted average number of interactions for genes mutated in the different disorders relative to the control, and across the different mutation categories. Significance was based on a Kolmogorov-Smirnov test, followed by FDR correction. Similarly, we tested for the difference between the 1,003 constrained genes and the unconstrained genes.
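The weighted interactor count can be sketched in a few lines of Python. The input format (triples of two protein identifiers and a reliability score) and the function name are assumptions made for illustration; they do not reflect the actual layout of the Mentha download.

```python
from collections import defaultdict


def weighted_degree(interactions):
    """Sum the reliability scores of each protein's interactions.

    interactions -- iterable of (protein_a, protein_b, reliability_score).
    Returns a dict protein -> weighted number of interactors.
    """
    totals = defaultdict(float)
    for protein_a, protein_b, score in interactions:
        totals[protein_a] += score
        # Count a self-interaction only once.
        if protein_b != protein_a:
            totals[protein_b] += score
    return dict(totals)
```

A protein with many low-reliability interactions can therefore score lower than a protein with a few well-supported ones, which is the point of weighting by reliability rather than counting edges.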
Variation in gene expression
We used previously published single-cell gene expression data from the human brain (Darmanis et al., 2015). The data comprised 465 samples. We converted the raw read counts to counts per million (cpm), and then divided the samples into different cell types (Astrocytes, Endothelial, Microglia, Neuron, Oligodendrocytes and Oligodendrocyte progenitors) by clustering them using marker genes for each cell type (the markers used in the original study (Darmanis et al., 2015)). We calculated the coefficient of variation (CV) for each gene across 247 neuronal cells. A Mann-Whitney-Wilcoxon test (wilcox.test function in R) was used to compare the distributions of CVs between the disorders and the control across the different mutation categories, and between constrained and unconstrained genes. P-values were adjusted by FDR correction.
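The two per-gene quantities used here can be illustrated with a minimal Python sketch. It assumes that cpm is computed per cell as the raw count scaled by one million over that cell's total reads, and that the CV is the sample standard deviation divided by the mean; the text does not specify either detail, and the function names are hypothetical.

```python
import statistics


def counts_per_million(counts):
    """Convert one cell's raw read counts (dict gene -> count) to cpm."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("cell has no reads")
    return {gene: count * 1e6 / total for gene, count in counts.items()}


def coefficient_of_variation(values):
    """CV of one gene's expression across cells: sample SD / mean.

    Returns None when the mean is zero, because the CV is undefined there.
    Requires at least two values.
    """
    mean = statistics.fmean(values)
    if mean == 0:
        return None
    return statistics.stdev(values) / mean
```

In the setting described above, coefficient_of_variation would be applied to each gene's cpm values across the 247 neuronal cells.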
Gene expression enrichment in specific brain regions
We used a previously published method (Dougherty et al., 2010), which is based on a specificity index probability (pSI) statistic, a measure of the enrichment of a gene in a tissue or cell type. We used the pSI R package to calculate the enrichment P-values for gene lists in different brain regions (Amygdala, Cerebellum, Cortex, Hippocampus, Striatum, Thalamus), and across different developmental stages (early fetal, early mid fetal, late mid fetal, late fetal, neonatal-early infancy, late infancy, early childhood, middle-late childhood, adolescence, young adulthood). Significance of the overlap with the list of genes with de novo mutations was calculated using the dnenrich program (Fromer et al., 2014) with 10,000 permutations. An FDR procedure was used to correct for multiple regions and stages.
Gene co-expression analysis in the cortex
We used an approach that was previously applied to identify convergence in ASD-associated genes (Ben-David and Shifman, 2012; Parikshak et al., 2013) based on weighted gene co-expression network analysis (WGCNA) (Zhang and Horvath, 2005). The approach relies on the assumption that co-expressed genes are functionally related. By dividing the genes in the genome into modules of co-expressed genes, it is possible to test the enrichment of risk genes in each module (Ben-David and Shifman, 2012; Parikshak et al., 2013). We used a published WGCNA that was based on gene expression from the developing human cortex (from PCW 8 to 12 months) (Parikshak et al., 2013). The network comprised 22,084 genes mapped to 18 modules. The enrichment analysis was performed only on protein-coding genes (n = 15,591). For each mutation class and each disorder, we tested the enrichment in each module using dnenrich (Fromer et al., 2014) with 10,000 permutations. For the constrained genes, we tested the enrichment using Fisher’s exact test (fisher.test function in R). P-values were corrected for multiple testing using FDR correction across all modules. To study the relationship between the disorders and mutation categories, we calculated the Pearson correlation between the -log10 of the nominal P-values, which was used for hierarchical clustering (heatmap.2 function in R).
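The similarity measure used for clustering can be sketched as follows: each gene list is represented by its vector of nominal module-enrichment P-values, the vector is transformed to -log10, and pairs of vectors are compared by Pearson correlation. The function names are hypothetical, and the sketch covers only the correlation step, not the hierarchical clustering itself.

```python
import math


def neg_log10(p_values):
    """Transform nominal P-values (all > 0) to -log10 scale."""
    return [-math.log10(p) for p in p_values]


def pearson(x, y):
    """Pearson correlation between two equal-length numeric vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two vectors of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return sxy / math.sqrt(sxx * syy)
```

Two gene lists enriched in the same modules give a correlation near 1 even when one list yields much smaller P-values overall, because the correlation depends on the pattern across modules rather than on the absolute significance.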
Analysis of SCZ GWAS
We used a multistep procedure to prioritize the most likely causal genes. First, we selected the single SNP with the most significant association signal in each of the 108 loci as the most likely causal SNP. The list of the 108 most significant SNPs was extracted from Ripke et al. (2014). Second, assuming that most regulatory variants are within a relatively short distance from the gene, we included only genes overlapping a 100 kb window around the most significant SNP. RefSeq genes overlapping the 100 kb windows around the SNPs were downloaded from the UCSC Genome Browser. On average, there were around two genes per window, totaling 210 RefSeq genes (including non-coding RNAs) (Table S3). Third, we prioritized the genes based on the assumption that SCZ risk genes share functional annotations and will be more functionally similar to each other than to other genes in the list. The list of genes was tested for annotation enrichment using the ToppGene Suite (Chen et al., 2009). Two genes were removed from the list, as they are part of the nicotinic receptor cluster (CHRNA5-CHRNA3-CHRNB4). Genes that were included in at least one of the significant annotations were considered the most likely SCZ candidate genes. Control gene lists were created by shifting the 100 kb windows by 1 Mb or by selecting random sets of 108 SNPs that were found to be significantly associated with other, unrelated traits in GWASs. No significant enrichment was found for the simulated control gene lists. Tests for enrichment of genes in brain regions or cell types were performed using the previously published method (Dougherty et al., 2010), as described above.
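The second step, selecting genes that overlap a window around a lead SNP, can be illustrated with a short Python sketch. It assumes that the 100 kb window is centered on the SNP (50 kb on each side) and that any partial overlap with a gene's transcribed interval counts; neither detail is stated in the text. The function name and the gene tuple layout are hypothetical.

```python
def genes_in_window(snp_chrom, snp_pos, genes, window=100_000):
    """Return names of genes overlapping a window centered on a SNP.

    genes  -- iterable of (name, chrom, start, end), with start <= end.
    window -- total window size in bp; assumed centered on the SNP.
    """
    half = window // 2
    lo, hi = snp_pos - half, snp_pos + half
    return [
        name
        for name, chrom, start, end in genes
        # Two intervals overlap when each starts before the other ends.
        if chrom == snp_chrom and start <= hi and end >= lo
    ]
```

The control lists described above could reuse the same function after adding 1,000,000 bp to snp_pos, again under the assumption that "shifting" means moving the window center along the chromosome.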
Acknowledgments
We thank Jonathan Flint for his valuable comments on the manuscript. This research was supported by the National Institute for Psychobiology in Israel, founded by The Charles E. Smith Family, and by the Israel Science Foundation (grant no. 688/12). Eyal Ben-David was supported by the Dennis Weatherstone Pre-doctoral Fellowship from Autism Speaks (grant no. 8595).
Financial Disclosures
All authors report no biomedical financial interests or potential conflicts of interest.