Abstract
Autism spectrum disorders (ASD) are a group of related neurodevelopmental diseases displaying significant genetic and phenotypic heterogeneity1-4. Despite recent progress in understanding ASD genetics, the nature of phenotypic heterogeneity across probands remains unclear5,6. Notably, likely gene-disrupting (LGD) de novo mutations affecting the same gene often result in substantially different intelligence quotient (IQ) phenotypes. Nevertheless, we find that truncating mutations that affect the same exon frequently lead to strikingly similar intellectual phenotypes in unrelated ASD probands. Analogous patterns are observed for two independent proband cohorts and several important ASD phenotypes. These results suggest that exons, rather than genes, often represent a unit of effective phenotypic impact for truncating mutations in autism. We find that phenotypic effects are likely mediated by nonsense-mediated decay (NMD) of splicing isoforms, and that autism phenotypes are usually triggered by relatively mild (15-30%) decreases in overall gene dosage. For genes with recurrent truncating mutations, predicted expression changes can be used to infer phenotypic consequences in individual ASD probands. We further demonstrate that LGD mutations in the same exon usually lead to similar expression changes across human tissues. Therefore, analogous phenotypic patterns may also be observed in other developmental genetic disorders.
In this study, we focused on severely damaging, so-called likely gene-disrupting (LGD) mutations, which include nonsense, splice site, and frameshift variants. We used genetic and phenotypic data, including exome de novo mutations and corresponding phenotypes of ASD probands7, for more than 2,500 families from the Simons Simplex Collection (SSC). De novo LGD mutations are observed at significantly higher rates in SSC probands compared to unaffected siblings8,9. This demonstrates a substantial contribution of these mutations to disease etiology in simplex ASD families, i.e. families with only a single affected child among siblings. In this paper, we primarily considered the impact of de novo LGD mutations on several well-studied intellectual phenotypes: full-scale (FSIQ), nonverbal (NVIQ), and verbal (VIQ) intelligence quotients8,10,11. Notably, these scores are standardized by age and normalized across a broad range of phenotypes7.
We first investigated the variability of intellectual phenotypes associated with de novo LGD mutations in the same gene. The IQ differences between probands with mutations in the same gene were only slightly, and not significantly, smaller than differences between all pairs of probands. Specifically, the mean pairwise differences for probands with mutations in the same gene were: 28.3 points for FSIQ (∼11% smaller compared to all pairs of ASD probands, Mann-Whitney U one-tail test P = 0.2), 25.7 points for NVIQ (∼12% smaller, P = 0.14), and 34.9 points for VIQ (∼1.1% smaller, P = 0.5). We next explored whether LGD mutations at similar locations within the same gene resulted, on average, in more similar phenotypes (Fig. 1). Indeed, IQ differences between probands with LGD mutations ≤ 1,000 base pairs (1 kbp) apart, for example, were significantly smaller than differences between probands with more distant mutations; ≤ 1 kbp FSIQ/NVIQ/VIQ average difference 11.5, 10.4, 20.6 points; > 1 kbp average difference 31.4, 28.6, 37.5 points (MWU one-tail test P = 0.002, 0.005, 0.01). However, across the entire range of nucleotide distances between LGD mutations, we did not observe significant correlations between IQ differences and mutation proximity (FSIQ/NVIQ/VIQ Spearman’s ρ = 0.09, 0.1, 0.03, P = 0.5, 0.4, 0.8).
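The pairwise comparison described above can be illustrated with a minimal sketch. The `(gene, iq)` tuple layout, the function names, and the toy values are assumptions made for illustration only; they are not SSC data and not the authors' code. The significance test (Mann-Whitney U) is omitted here.

```python
from itertools import combinations
from statistics import mean


def mean_pairwise_difference(scores):
    """Mean absolute difference over all unordered pairs of scores."""
    return mean(abs(a - b) for a, b in combinations(scores, 2))


def same_gene_pair_differences(probands):
    """Absolute IQ differences for pairs of probands sharing a mutated gene.

    `probands` is a list of (gene, iq) tuples (hypothetical layout).
    """
    by_gene = {}
    for gene, iq in probands:
        by_gene.setdefault(gene, []).append(iq)
    diffs = []
    for iqs in by_gene.values():
        diffs.extend(abs(a - b) for a, b in combinations(iqs, 2))
    return diffs


# Toy, made-up values for illustration only (not SSC data).
toy = [("GENE_A", 60), ("GENE_A", 90), ("GENE_B", 100),
       ("GENE_B", 70), ("GENE_C", 85)]
same_gene = same_gene_pair_differences(toy)
all_pairs = mean_pairwise_difference([iq for _, iq in toy])
```

The two resulting sets of differences (same-gene pairs versus all pairs) are what a one-tailed rank test would then compare.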
To explain the observed patterns of phenotypic similarity, we next considered the exon-intron structure of target genes. Specifically, we investigated truncating mutations affecting the same exon in unrelated ASD probands; we took into account LGD mutations in the exon’s coding sequence as well as disruptions of the exon’s flanking canonical splice sites, since such splice site mutations should affect the same transcript isoforms (Supplementary Fig. 1). Interestingly, the analysis of 16 unrelated ASD probands (8 pairs) with such mutations showed that they have strikingly more similar phenotypes (Fig. 2, red bars) compared to probands with LGD mutations in the same gene (Fig. 2, dark green bars); same exon FSIQ/NVIQ/VIQ average IQ difference 8.9, 8.3, 17.3 points, same gene average difference 28.3, 25.7, 34.9 points (Mann-Whitney U one-tail test P = 0.003, 0.005, 0.016). Notably, the phenotypic similarity only extended to truncating mutations in the same exon. The average IQ differences between probands with mutations in neighboring exons were not significantly different compared to mutations in non-neighboring exons (MWU one-tail test P = 0.6, 0.18, 0.8; Supplementary Fig. 2). Because of well-known gender differences in autism susceptibility11-13, we also compared IQ differences between probands of the same gender harboring truncating mutations in the same exon (Fig. 2, orange bars) to IQ differences between probands of different genders; same gender FSIQ/NVIQ/VIQ average difference 5.4, 7.2, 12.2, different gender average difference 14.7, 10, 25.7 (MWU one-tail test P = 0.04, 0.29, 0.07). Thus, stratification by gender further decreases the phenotypic differences between probands with LGD mutations in the same exon.
We next explored the relationship between phenotypic similarity and the proximity of truncating mutations in the corresponding protein sequences. This analysis revealed that probands with LGD mutations in the same exon often had similar IQs, despite being affected by truncating mutations separated by scores to hundreds of amino acids in protein sequence (Fig. 3a; Supplementary Fig. 3). Notably, probands with LGD mutations in the same exon were more phenotypically similar than probands with LGD mutations separated by comparable amino acid distances but in different exons (FSIQ/NVIQ/VIQ distance-matched permutation test P = 0.010, 0.002, 0.018; Supplementary Fig. 4). We also investigated whether de novo mutations truncating a larger fraction of protein sequences resulted, on average, in more severe intellectual phenotypes. The analysis showed no significant correlations between the fraction of truncated protein and the severity of intellectual phenotypes (Fig. 3b); FSIQ/NVIQ/VIQ Pearson’s R = 0.05, 0.05, 0.06 (P = 0.35, 0.35, 0.28; Supplementary Fig. 5). We also did not find any significant biases in the distribution of truncating de novo mutations across protein sequences compared with the distribution of synonymous de novo mutations (Kolmogorov-Smirnov two-tail test P = 0.9; Supplementary Fig. 6). It is possible that the lack of correlation between phenotypic impact and the fraction of truncated protein is due to signal averaging across different proteins. Therefore, we used a paired test to investigate, for genes with recurrent mutations, whether truncating a larger fraction of the same protein leads to more severe phenotypes. This analysis also showed no significant differences (average FSIQ/NVIQ/VIQ difference -3.3, 0.24, -2.5 points; Wilcoxon signed-rank one-tail test P = 0.90, 0.44, 0.89).
The results presented above suggest that it is the occurrence of de novo LGD mutations in the same exon, rather than simply the proximity of mutation sites in nucleotide or amino acid sequence, that leads to similar phenotypic consequences. To explain this observation, we hypothesized that truncating mutations in the same exon usually affect, due to nonsense-mediated decay (NMD)14, the expression of the same splicing isoforms. Therefore, such mutations should lead to similar functional impacts through similar effects on overall gene dosage and the expression levels of affected transcriptional isoforms. To explore this mechanistic model, we used data from the Genotype and Tissue Expression (GTEx) Consortium15,16, which collected exome sequencing and human tissue-specific gene expression data from hundreds of individuals and across multiple tissues. Using ∼4,400 LGD variants in coding regions and corresponding RNA-seq data, we compared the expression changes resulting from LGD variants in the same and different exons of the same gene (Fig. 4). For each truncating variant, we analyzed allele-specific read counts17 and then used an empirical Bayes approach to infer the effects of NMD on gene expression (see Methods). This analysis demonstrated that the average gene dosage changes were more than 7 times more similar for individuals with LGD variants in the same exon compared to individuals with LGD variants in different exons of the same gene (Fig. 4a); 2.2% versus 17.3% average difference in overall gene dosage decrease (Mann-Whitney U one-tail test P < 2×10⁻¹⁶). Moreover, by analyzing GTEx data for each tissue separately, we consistently found drastically more similar dosage changes resulting from LGD variants in the same exons (Fig. 4a).
Distinct splicing isoforms often have different functional properties18,19. Consequently, LGD variants may affect phenotypes not only through NMD-induced changes in overall gene dosage, but also by altering the expression levels of different splicing isoforms. To analyze changes in the relative expression of specific isoforms, we used GTEx variants and calculated the angular distance metric between vectors describing isoform-specific expression changes (see Methods). This analysis confirmed that changes in relative isoform expression are significantly (∼5-fold) more similar for LGD variants in the same exon compared to variants in different exons (Fig. 4b); 0.1 versus 0.46 average angular distance (Mann-Whitney U one-tail test P < 2×10⁻¹⁶). The results were also consistent across tissues (Fig. 4b). Overall, the analyses of GTEx data demonstrate that the changes in expression due to truncating variants in the same exon are indeed substantially more similar than the changes due to variants in different exons of the same gene.
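As an illustration of the angular distance between two vectors of isoform-specific expression changes, a minimal sketch follows. The exact definition is given in the Methods; the normalization by π used here (so that values fall in [0, 1]) and the function name are assumptions, and the vectors in the usage lines are made up.

```python
import math


def angular_distance(u, v):
    """Angle between two vectors, scaled by pi so that it lies in [0, 1].

    Assumption: the distance is arccos(cosine similarity) / pi. Each vector
    holds one expression-change value per isoform, in the same isoform order.
    """
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("angular distance is undefined for a zero vector")
    # Clamp to [-1, 1] to guard against floating-point rounding.
    cosine = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.acos(cosine) / math.pi


# Made-up vectors: two variants affecting isoforms in the same proportions,
# and a third variant affecting a different isoform.
same_pattern = angular_distance([1.0, 2.0, 0.0], [2.0, 4.0, 0.0])
different_pattern = angular_distance([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])
```

Because the measure depends only on the direction of the vectors, it compares the relative pattern of isoform changes independently of the overall magnitude of the dosage change.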
Truncating variants in highly expressed exons should lead, through NMD, to relatively larger decreases in overall gene dosage. To confirm this hypothesis, we used RNA-seq data from GTEx to quantify the relative exon expression for each exon harboring a truncating variant. To calculate relative exon expression, we normalized GTEx expression values of each exon by GTEx expression values of the corresponding gene. Indeed, we observed a strong correlation between the relative expression levels of exons harboring LGD variants and the corresponding changes in overall gene dosage (Fig. 5; Pearson’s R = 0.69, P < 2×10⁻¹⁶; Spearman’s ρ = 0.81, P < 2×10⁻¹⁶; see Methods).
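The normalization and correlation described above can be sketched as follows. The function names and the toy numbers are hypothetical, and the expression units are assumed to be the same for exon and gene; this is an illustration of the calculation, not the pipeline described in the Methods.

```python
import math


def relative_exon_expression(exon_expression, gene_expression):
    """Exon expression normalized by the expression of its gene."""
    if gene_expression <= 0:
        raise ValueError("gene expression must be positive")
    return exon_expression / gene_expression


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Made-up (exon expression, gene expression) pairs and dosage decreases.
exon_gene = [(2.0, 10.0), (5.0, 10.0), (9.0, 10.0)]
dosage_decrease = [0.05, 0.12, 0.25]
rel_expr = [relative_exon_expression(e, g) for e, g in exon_gene]
r = pearson_r(rel_expr, dosage_decrease)
```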
Notably, NMD-induced dosage changes may mediate the relationship between the expression levels of target exons and the corresponding phenotypic effects of truncating mutations. To investigate this relationship we used the BrainSpan dataset20, which contains exon-specific expression from human brain tissues. The BrainSpan data allowed us to estimate expression dosage changes resulting from LGD mutations in different exons of ASD-associated genes (see Methods). Notably, it is likely that there is substantial variability in the sensitivity of intellectual phenotypes to dosage changes across human genes. Therefore, to quantify the IQ sensitivities for genes with recurrent truncating mutations in SSC, we considered a simple linear dosage model. Specifically, we assumed that changes in probands’ IQs are linearly proportional to decreases in gene dosage; we further assumed the average neurotypical IQ (100) for wild type gene dosage. We restricted our analysis to LGD mutations predicted to cause NMD-induced gene dosage changes, i.e. we excluded mutations within 50 bp of the last exon junction complex21. Using this model, we estimated the sensitivity of IQs to dosage changes for each gene with recurrent truncating ASD mutations (Supplementary Fig. 7; see Methods). Calculated in this way, the IQ sensitivity for a gene is equal to the estimated phenotypic effect of a truncating mutation in an exon with average expression.
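A minimal sketch of such a linear dosage model is shown below, assuming that a gene's sensitivity is fit by least squares with the intercept fixed at the wild-type IQ of 100. The fitting procedure, function names, and toy values are assumptions made for illustration; the estimation actually used is described in the Methods.

```python
WILD_TYPE_IQ = 100.0  # average neurotypical IQ assumed at wild-type dosage


def fit_iq_sensitivity(dosage_decreases, iqs):
    """Least-squares slope s in IQ = 100 - s * dosage_decrease for one gene.

    Assumption: the intercept is fixed at WILD_TYPE_IQ, so only the slope
    (IQ points lost per unit decrease in gene dosage) is estimated.
    """
    numerator = sum(d * (WILD_TYPE_IQ - iq)
                    for d, iq in zip(dosage_decreases, iqs))
    denominator = sum(d * d for d in dosage_decreases)
    if denominator == 0:
        raise ValueError("at least one non-zero dosage decrease is required")
    return numerator / denominator


def predict_iq(sensitivity, dosage_decrease):
    """Predicted IQ for a mutation causing the given dosage decrease."""
    return WILD_TYPE_IQ - sensitivity * dosage_decrease


# Made-up values: three probands with mutations in the same gene.
sensitivity = fit_iq_sensitivity([0.1, 0.2, 0.3], [90.0, 80.0, 70.0])
predicted = predict_iq(sensitivity, 0.25)
```

Under this formulation, multiplying the fitted sensitivity by the dosage decrease expected for an exon with average expression gives the gene-level effect referred to above.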
The aforementioned model revealed that mutation-induced dosage changes are indeed strongly correlated with the normalized phenotypic effects; FSIQ/NVIQ/VIQ Pearson’s R = 0.56, 0.63, 0.51, permutation test P = 0.03, 0.02, 0.02 (Fig. 6a; Supplementary Fig. 8). Reassuringly, only very weak or no correlations were obtained using randomly permuted data, i.e. when truncating mutations were randomly re-assigned to different exons in the same gene (average FSIQ/NVIQ/VIQ Pearson’s R = 0.11, 0.18, 0.01; SD = 0.23, 0.20, 0.21; see Methods). Since the heritability of intelligence is known to significantly increase with age22, we also investigated how the results depend on the age of probands. When we restricted our analysis to the older half of probands in SSC (median age 8.35 years), the strength of the correlations between the predicted dosage changes and normalized phenotypic consequences increased further; FSIQ/NVIQ/VIQ Pearson’s R = 0.68, 0.75, 0.60; permutation test P = 0.03, 0.019, 0.05 (Fig. 6b; Supplementary Fig. 9). The strong correlations between target exon expression and intellectual ASD phenotypes suggest that, when gene-specific effects are taken into account, a significant fraction (30%-40%) of the relative phenotypic effects of de novo LGD mutations can be explained by the resulting dosage changes in target genes.
Next, we evaluated the ability of our linear dosage model to predict the effects of LGD mutations on non-normalized IQs. For each gene with multiple truncating mutations, we used our regression model to perform leave-one-out predictions of each mutation’s effect on proband IQ scores (Fig. 6c, inset; see Methods). Notably, for LGD mutations that trigger NMD, the prediction errors of the dosage model were significantly smaller than the differences in IQ scores between probands with LGD mutations in the same gene; FSIQ/NVIQ/VIQ median prediction error 12.2, 11.0, 20.6 points; same gene median IQ difference 24.0, 22.0, 30.5 points; MWU one-tail test P = 0.019, 0.014, 0.017 (Fig. 6c; Supplementary Fig. 10). The predictions based on probands of the same gender had significantly smaller errors compared to predictions based on probands of the opposite gender, confirming functional differences in ASD genetics between genders; same gender FSIQ/NVIQ/VIQ median error 11.1, 9.1, 15.9 points; different gender median error 19.0, 19.9, 33.0 points (MWU one-tail test P = 0.03, 0.018, 0.02). Moreover, the prediction errors decreased for older probands; for example, for probands older than 12 years, median FSIQ/NVIQ/VIQ error 7.0, 7.6, 10.0 points (Fig. 6c, Supplementary Fig. 10, Supplementary Fig. 11).
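The leave-one-out procedure can be sketched for a single gene as follows. This is an illustration under stated assumptions: the per-gene sensitivity is refit by least squares with the intercept fixed at an IQ of 100, and the function name and toy values are hypothetical rather than taken from the Methods.

```python
from statistics import median


def leave_one_out_errors(dosage_decreases, iqs, baseline=100.0):
    """Absolute prediction errors for one gene, leaving out each proband in turn.

    For each proband, the sensitivity s in IQ = baseline - s * dosage_decrease
    is fit on the remaining probands and used to predict the held-out IQ.
    """
    n = len(iqs)
    if n != len(dosage_decreases) or n < 2:
        raise ValueError("need at least two mutations in the gene")
    errors = []
    for held_out in range(n):
        train = [(d, iq)
                 for i, (d, iq) in enumerate(zip(dosage_decreases, iqs))
                 if i != held_out]
        denominator = sum(d * d for d, _ in train)
        if denominator == 0:
            raise ValueError("training mutations have no dosage decrease")
        s = sum(d * (baseline - iq) for d, iq in train) / denominator
        prediction = baseline - s * dosage_decreases[held_out]
        errors.append(abs(prediction - iqs[held_out]))
    return errors


# Made-up values: two probands with mutations in different exons of one gene.
errors = leave_one_out_errors([0.1, 0.2], [90.0, 70.0])
median_error = median(errors)
```

The resulting errors are the quantities that would be compared against the raw IQ differences between probands with mutations in the same gene.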
Although we primarily analyzed the impact of autism mutations on intellectual phenotypes, similar dosage and isoform expression changes in affected genes may also lead to analogous patterns for other quantitative ASD phenotypes23,24. Indeed, for LGD mutations predicted to lead to NMD, we observed similar results for several other key phenotypes. Specifically, probands with truncating mutations in the same exon exhibited more similar adaptive behavior abilities compared to probands with mutations in the same gene (Fig. 7a, Supplementary Fig. 12); Vineland Adaptive Behavior Scales (VABS)25 composite standard score difference 4.7 versus 12.1 points (Mann-Whitney U one-tail test P = 0.017). In contrast, VABS differences between probands with truncating mutations in the same gene were not significantly smaller than for randomly paired probands (Fig. 7a, Supplementary Fig. 12); 12.1 versus 13.7 points (MWU one-tail test P = 0.23). Probands with truncating mutations in the same exon also displayed more similar motor skills; the Purdue Pegboard Test, 1.2 versus 3.0 for the average difference in normalized tasks completed with both hands (MWU one-tail test P = 0.02; Supplementary Fig. 13; see Methods). Coordination scores in the Social Responsiveness Scale questionnaire were also more similar in probands with mutations in the same exon; 0.6 versus 1.1 for the average difference in normalized response (MWU one-tail test P = 0.05; Supplementary Fig. 14).
Finally, we sought to validate the observed phenotypic patterns using an independent cohort of ASD probands. To that end, we analyzed an independently collected dataset from the ongoing Simons Variation in Individuals Project (VIP)26. The analyzed VIP dataset contained genetic information and VABS phenotypic scores for 41 individuals with de novo LGD mutations in 12 genes. Reassuringly, and consistent with our findings in SSC, probands from the VIP cohort with truncating de novo mutations in the same exon also exhibited strikingly more similar VABS phenotypic scores compared to probands with mutations in the same gene (Fig. 7a, Supplementary Fig. 15); VABS composite standard score difference 6.0 versus 12.4 (Mann-Whitney U one-tail test P = 0.014). Similar to the SSC cohort, LGD mutations in neighboring exons did not result in more similar behavior phenotypes; VABS composite standard score average difference 13.6 points (MWU one-tail test P = 0.6). The fraction of truncated proteins also did not show significant correlation with the VABS scores of affected probands (Pearson’s R = -0.08, P = 0.7). Overall, these results confirm the phenotypic patterns observed in the SSC cohort, indicating the generality of the reported findings.
Using VABS scores from both SSC and VIP, we next investigated whether, analogous to the IQ phenotypes (Fig. 3a), the similarity of VABS scores is primarily due to the presence of mutations in the same exon, rather than proximity of truncating mutations within the corresponding protein sequence. Indeed, LGD mutations in the same exon often resulted in similar adaptive behavior abilities even when the corresponding mutations were separated by hundreds of amino acids (Fig. 7b; Supplementary Fig. 16). By comparing mutations in the same exon to mutations in different exons that are separated by similar amino acid distances, we confirmed that probands with mutations in the same exon were significantly more phenotypically similar (permutation test P = 3×10⁻⁴; Supplementary Fig. 17; see Methods).
Discussion
Previous studies explored phenotypic similarity in syndromic forms of ASD due to mutations in specific genes27-31. Nevertheless, across a large collection of contributing genes, the nature of the substantial phenotypic heterogeneity in ASD remains unclear. Our study reveals several main sources of the observed heterogeneity in simplex ASD cases triggered by highly penetrant truncating mutations. There is substantial variability in IQ sensitivity to dosage and isoform expression changes across human genes (Supplementary Fig. 7). We also estimate that, due to the imperfect efficiency of NMD, truncating mutations usually result in relatively mild changes in gene dosage, with an average decrease in overall expression of ∼15-30% (Supplementary Fig. 18; see Methods). Nevertheless, when gene-specific sensitivities are taken into account, the relative phenotypic effects are significantly correlated with expression dosage changes, which depend on the target exon expression (Fig. 6). Furthermore, even perturbations leading to similar dosage changes in the same gene may result in diverse phenotypes if different functional isoforms are affected. When the same isoforms are perturbed, as is the case for LGD mutations in the same exon, the phenotypic diversity in unrelated probands decreases even further (Fig. 2). Overall, these results demonstrate that for truncating de novo mutations, exons, rather than genes, represent a unit of effective phenotypic impact. It is also likely that differences in genetic background and environment represent other important sources of phenotypic variability32-34. As the heritability of IQ phenotypes usually increases with age, it is reassuring that we observe a substantially higher correlation between phenotypes and gene dosage changes for older probands (Fig. 6b).
In the present study, we focused specifically on simplex cases of ASD, in which de novo LGD mutations are highly penetrant. In more diverse cohorts, individuals with LGD mutations in the same exon will likely display substantially greater phenotypic heterogeneity. For example, the Simons Variation in Individuals Project identified broad spectra of phenotypes associated with specific variants in the general population26,35-37. We also observed significantly larger phenotypic variability for probands from sequenced family trios, i.e. families without unaffected siblings (Supplementary Fig. 19). For these probands, the enrichment of de novo LGD mutations is likely to be substantially lower and the contribution from genetic background larger38, thus resulting in more pronounced phenotypic variability.
Our study may have important implications for the future of precision medicine32,39,40. From a therapeutic perspective, compensatory expression increases of undamaged alleles, defined according to mutation-specific dosage changes, may provide a general approach for treating ASD probands affected by highly penetrant LGDs. From a prognostic perspective, the results suggest that by sequencing and phenotyping sufficiently large patient cohorts harboring truncating mutations in different exons of contributing ASD genes, it may be possible to understand likely phenotypic consequences, at least for cases resulting from highly penetrant de novo LGD mutations in simplex families. Furthermore, because we observe similar patterns of expression changes across multiple human tissues, medically relevant phenotypic analyses may also be extended to other developmental disorders caused by highly penetrant truncating mutations.