ABSTRACT
Hundreds of genes are implicated as risk factors for autism spectrum disorder (ASD). However, the mechanisms through which they are associated with ASD remain unclear. Here, we analyzed transcriptomics from ASD toddlers and discovered a core gene network with dysregulated gene co-expression. The identified network includes highly expressed processes in fetal-stage brain development and is dysregulated in neuron models of ASD. We found ASD risk genes across diverse functions are upstream and regulate this core network. In particular, many risk genes impact the network through the RAS/ERK, PI3K/AKT, and WNT/β-catenin signaling pathways. Finally, the dysregulation degree of this network positively correlates with early-age ASD clinical severity. Thus, our results provide insights into how the heterogeneous genetic basis of ASD could converge on a core network with consequence on the postnatal outcome of toddlers with ASD. Deeper study into this may help decipher the molecular basis of ASD and decode the complex link between its genetic and phenotypic variation.
INTRODUCTION
Autism spectrum disorder (ASD) is a neurodevelopmental disorder with prenatal and early postnatal biological onset 1-3. Genetic factors contribute to the predisposition and development of ASD with estimated heritability rates of 50-83% 4,5. Large-scale genetic studies have implicated several hundred risk (rASD) genes that could be associated with many different pathways, cell processes, and neurodevelopmental stages 6-8. This highly heterogeneous genetic landscape has raised challenges in elucidating the biological mechanisms involved in the disorder. While rigorous proof remains lacking, current evidence suggests that rASD genes fall into networks and biological processes 6,7,9-13 that modulate one or more critical stages of prenatal and early postnatal brain development, including neuronal proliferation, migration, neurite growth, synapse formation and function 3,8. However, these insights are mostly gained from focused studies on single rASD genes (see Courchesne et al.3 for a recent review) or based on transcriptome data of non-ASD brains 9-11, leaving an incomplete picture of molecular changes at the individual level and relationships with early-age clinical heterogeneity.
To further complicate efforts to discern the molecular bases of ASD, the implicated rASD genes are largely identified through de novo loss-of-function mutations in their coding sequence. Such events account for 5-10% of the ASD population, and most of the heritability is estimated to reside in common variants also seen in the typically developing population 5,14-17. Currently, there is a paucity of data on whether ASD cases with known rASD gene mutations manifest as special subtypes of ASD with distinct molecular etiology, or whether they share mechanisms with the general ASD population.
To address these fundamental questions, it is important to understand which molecular processes are perturbed in prenatal and early postnatal life in ASD individuals, assess how they vary among subjects, and evaluate how these perturbations relate to rASD genes and early-age ASD clinical symptoms. It is expected that the genetic changes in ASD alter gene expression and signaling in the early-age developing brain 3,7,11,18. Therefore, capturing dysregulated gene expression at prenatal and early postnatal ages may help unravel the underlying molecular organization of ASD. Unfortunately, doing so is particularly challenging as ASD brain tissue cannot be obtained at these early stages, and all available postmortem ASD brains are from much older ages, well beyond the ages when rASD genes are at peak expression and the disorder begins. However, in contrast to neurons, which have a limited time window for proliferation and maturation, other cell types, such as blood cells, constantly regenerate. Given the strong genetic basis of ASD, some dysregulated developmental signals may continually recur in blood cells and thus be studied postnatally 19-21.
Reinforcing this notion, it was recently demonstrated that genes that are broadly expressed across many tissues are major contributors to the overall heritability of complex traits 22, and it was postulated that this could be relevant to ASD. Lending credence to this, previous studies have reported the enrichment of differentially expressed genes in ASD blood for the regulatory targets of CHD8 20 and FMR1 23 genes, two well-known rASD genes. Similarly, lymphoblastoid cells of ASD cases and iPS-derived models of fragile-X syndrome show over-expression of mir-181 with a potential role in the disorder 24. Likewise, leukocytes from ASD toddlers show perturbations in biological processes, such as cell proliferation, differentiation, and microtubules 25-29, and these coincide with dysregulated processes seen in neural progenitor cells (NPCs) and neurons, derived from iPS cells from ASD subjects 30,31. Ultimately, establishing the signatures of ASD in other tissues will be important to facilitate the study of the molecular basis of the disorder in living ASD subjects in the first years of life.
Here we leverage transcriptomic data from leukocytes, stem cell models, and the developing brain to study the underlying architecture of transcriptional dysregulation in ASD, its connection to rASD genes, and its association with prenatal development and clinical outcomes of ASD toddlers. Specifically, we discovered a conserved dysregulated gene network by analyzing leukocyte transcriptomic data from 1-4 years old ASD and typically developing (TD) toddlers. The dysregulated network is enriched for pathways known to be perturbed in ASD neurons, impacts highly expressed processes in prenatal brain development, and is dysregulated in iPS cell-derived neurons from ASD cases. Consistent with the postulated structure of complex traits 22,32, we show that rASD genes across diverse functional groups converge upon and regulate this core network. Importantly, this core network is disrupted to different levels of severity across ASD individuals, and is correlated with clinical severity in individual ASD toddlers. Thus, our results demonstrate how the heterogeneous genetic basis of ASD converges on a biologically relevant core network, capturing the underlying possible molecular etiology of ASD.
RESULTS
Leukocytes display transcriptome over-activity in ASD male toddlers
To identify the unique transcriptional response of ASD subjects, we analyzed 253 leukocyte gene expression profiles obtained from 226 male toddlers (119 ASD and 107 TD, Table S1). Robust linear regression modeling of the data identified 1236 unique differentially expressed (DE) genes (437 downregulated and 799 upregulated; FDR < 0.05). Jack-knife resampling demonstrated that the expression pattern of DE genes was not driven by a small number of cases, but rather shared with the vast majority of ASD subjects (Fig S1). The expression patterns were validated in a replicate dataset of 56 randomly resampled toddlers. We further confirmed the expression patterns of DE genes on another partially independent and one entirely independent cohort (Fig S1-S4).
We employed a systems approach to decipher how the transcriptional perturbations in leukocytes of ASD toddlers are organized in gene networks (Fig 1.a). We reasoned that ASD associated interactome rewiring is most pronounced in networks of DE genes. To identify such rewiring, we first extracted a static network (that is, the network is indifferent to the cell context) composed of high confidence physical and regulatory interactions among DE genes, as obtained from multiple databases (Methods). We next pruned the static network using our leukocyte transcriptome data to obtain context-specific networks of each study group separately (that is, the networks differ based on their cognate gene expression data). The context specific network of each study group was obtained by only retaining interactions in the static network that were significantly co-expressed within that group with FDR <0.05. To ensure the robustness of our conclusions, we replicated all presented results on two other networks with different numbers of genes and interactions obtained from additional resources (Methods).
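The pruning step described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes Pearson correlation as the co-expression measure, a Fisher z-transform normal approximation for the p-value, and Benjamini-Hochberg adjustment for the FDR. The function names, gene labels, and expression values are hypothetical.

```python
import math
from statistics import NormalDist


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def corr_pvalue(r, n):
    """Two-sided p-value for r via the Fisher z-transform (an approximation)."""
    r = max(min(r, 0.999999), -0.999999)  # keep atanh finite
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def prune_network(static_edges, expression, fdr=0.05):
    """Keep only static-network edges that are co-expressed within one study group."""
    n = len(next(iter(expression.values())))
    pvals = [corr_pvalue(pearson(expression[a], expression[b]), n)
             for a, b in static_edges]
    qvals = benjamini_hochberg(pvals)
    return [edge for edge, q in zip(static_edges, qvals) if q < fdr]


# Hypothetical expression values for three genes across ten samples of one group.
toy_expression = {
    "A": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "B": [2.1, 3.9, 6.2, 8.0, 9.8, 12.1, 14.2, 15.9, 18.1, 20.0],
    "C": [5, 1, 4, 2, 6, 3, 5, 2, 4, 3],
}
toy_static_edges = [("A", "B"), ("A", "C")]
context_specific = prune_network(toy_static_edges, toy_expression)
```

Running the same function once on the ASD samples and once on the TD samples would yield two networks over the same candidate edges that differ only in which edges survive, which is the sense in which the networks are context-specific.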
The context-specific networks (DE-ASD and DE-TD) include published physical and regulatory interactions among DE genes that exhibit within-group co-expression in our data. DE-ASD and DE-TD networks are composed of a similar set of genes (i.e., those expressed in the leukocytes that are differentially expressed between ASD and TD samples), but the wiring of the two networks differs based on the co-expression patterns within each study group. To assess the possibility that intracellular pathways were being specifically modulated in ASD, we created a merged network by considering the union of interactions in the DE-ASD and DE-TD networks. We next examined the co-expression strength of the merged network in ASD and TD individuals (Methods) 33-35. This proxy for the transcriptional activity of gene networks 9 demonstrated that co-expression strength was higher in the ASD than the TD samples (Fig 1.b; p-value < 0.01; paired Wilcoxon-Mann-Whitney test). The stronger co-expression, which is driven by the DE-ASD network, suggests a higher level of concerted activation or suppression of pathways involving DE genes in ASD toddlers. This elevated co-expression activity (herein referred to as over-activity) of the network was reproducible in the other two ASD datasets and replicable across alternative analysis methods (Fig S1-S4).
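To make the comparison of co-expression strength concrete, the sketch below scores every edge of a merged network by its absolute Pearson correlation within each group and then compares the paired per-edge values. It is an illustration under stated assumptions: the exact strength measure is defined in the Methods and may differ, and an exact one-sided sign test is used here as a simpler stand-in for the paired rank-based test reported in the text. All names and values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def edge_strengths(edges, expression):
    """Absolute correlation of each edge within one study group."""
    return [abs(pearson(expression[a], expression[b])) for a, b in edges]


def sign_test_greater(x, y):
    """Exact one-sided sign test: p-value that paired values in x tend to exceed y."""
    wins = sum(1 for a, b in zip(x, y) if a > b)
    n = sum(1 for a, b in zip(x, y) if a != b)  # ties are dropped
    if n == 0:
        return 1.0
    # P(X >= wins) for X ~ Binomial(n, 0.5)
    return sum(math.comb(n, k) for k in range(wins, n + 1)) / 2 ** n


# Hypothetical merged network and per-group expression values.
merged_edges = [("G1", "G2"), ("G2", "G3")]
asd = {"G1": [1, 2, 3, 4], "G2": [2, 4, 6, 8], "G3": [8, 6, 4, 2]}
td = {"G1": [1, 2, 3, 4], "G2": [2, 1, 4, 3], "G3": [1, 3, 2, 4]}

asd_strength = edge_strengths(merged_edges, asd)
td_strength = edge_strengths(merged_edges, td)
p_value = sign_test_greater(asd_strength, td_strength)
```

Because the same edges are scored in both groups, the test is paired by edge; with only two toy edges the p-value cannot be small, so a real analysis would need the full merged network.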
In summary, the leukocyte transcriptional networks of the DE genes show higher than normal co-expression activity in ASD toddlers. Moreover, the dysregulation pattern is present in a large percentage of ASD toddlers, as evidenced by the resampling analyses and the other two ASD datasets.
The leukocyte-based gene network captures transcriptional programs of brain development
We next assessed the potential association of the leukocyte-based network to the spatiotemporal neurodevelopmental signals relevant to ASD. By overlaying our network on the in vivo neurodevelopmental RNA-Seq transcriptome data from BrainSpan 36,37, we found that the DE-ASD network was enriched for genes that are strongly expressed in the neocortex at prenatal and early postnatal periods (p-value <4.3×10^-30; Fig 1.c).
To investigate the spatiotemporal activity pattern during brain development, we measured the co-expression strength of interactions in the leukocyte-based network at different neurodevelopmental time windows across brain regions using BrainSpan. We found that the highest co-expression activity of the DE-ASD network temporally coincided with peak neural proliferation in brain development (10-19 post conception weeks 3,8) across the brain and then decreased in activity at later time points (Fig 1.d). Further supporting the transcriptional activity of the leukocyte-derived network in prenatal brain, we found evidence that the DE-ASD network is preserved at the co-expression level between ASD leukocytes and prenatal brain (Fig 1.e).
Networks of rASD genes are associated with the DE-ASD network
We next analyzed the DE-ASD network in the context of other studies to explore the relevance of our leukocyte-based signature to neocortical development. Parikshak et al. previously reported gene co-expression modules that are responsive to the developmental trajectories of cortical laminae during prenatal and early postnatal ages 10. A subset of these modules shows enrichment in rASD genes 10. We examined the overlap of our leukocyte-derived network with all modules from Parikshak et al. 10. The DE-ASD network preferentially overlapped with rASD gene-enriched modules from that study (Fig 1.f; Table S2). This suggests that our DE-ASD network is functionally related to rASD genes during neocortical development. We confirmed the significant overlap of our DE-ASD network with the networks of rASD genes reported in two other studies 7,9, indicating the robustness of the results (Fig 1.f). Intriguingly, the prenatal brain co-expression network of high confidence rASD genes was more similar to that of ASD leukocytes than TD leukocytes (Fig 1.g), suggesting that neurodevelopmental transcriptional programs related to rASD genes might be more represented in the ASD leukocyte transcriptome than in TD samples.
With the observed overlap patterns, we next tested for enrichment of rASD genes in our DE-ASD network. For this analysis, we assessed the overlap of DE-ASD network with different rASD gene lists of different size and varying confidence levels. Surprisingly, this analysis demonstrated that rASD genes are not enriched in the DE-ASD network (p-value >0.19; Methods).
The DE-ASD network is enriched for the regulatory targets of rASD genes
Many high confidence rASD genes have regulatory functions 3,7,11,18. Although the perturbed DE-ASD network is not enriched for rASD genes, it overlaps with co-expression modules and networks of known rASD genes. At the mechanistic level, the observed co-expression of rASD and DE genes in the prenatal brain could be due to the regulatory influence of rASD genes on the DE-ASD network, and thereby mutations in rASD genes could cause the network over-activity and brain maldevelopment in ASD.
To elucidate if rASD genes could regulate the DE-ASD network, we examined if the regulatory targets of rASD genes are enriched in the DE-ASD network. Indeed, we observed that the DE-ASD network is enriched for genes regulated by two high confidence rASD genes, CHD8 38-40 and FMR1 41 (Fig 2.a). To more systematically identify regulators of the network, we evaluated the overlap of the DE-ASD network with the regulatory targets from 845 assays in the ENCODE project 42 and 615 manually curated assays in ChEA 2016 43. Strikingly, we observed that the DE-ASD network is significantly enriched for the regulatory targets of 11 out of 20 high confidence and suggestive confidence rASD genes (OR: 2.54; p-value: 0.05; Fig 2.b; Table S3).
The DE-ASD network is preferentially linked to high confidence rASD genes
The rASD genes were often not differentially expressed in ASD leukocytes, and therefore the DE-ASD network was not enriched in rASD genes. However, to explore if rASD genes could regulate the DE-ASD network, we expanded the DE-ASD network by including rASD genes. Thus, we obtained an expanded ASD (XP-ASD) network (Table S4). To construct the XP-ASD network, we used a similar approach to that of the DE-ASD network. We first curated a high confidence static network of DE and 965 speculated rASD genes. The context-specific XP-ASD network was next inferred by retaining only the significantly co-expressed interacting pairs in ASD samples. This pruning step results in the removal of genes from the static network that do not show significant co-expression patterns with their known partners or regulatory targets in ASD leukocytes. Accordingly, the XP-ASD network included a total of 316 out of 965 (36%) likely rASD genes.
Our list of 965 rASD genes included rASD genes of both high confidence (e.g., recurrently mutated in ASD individuals) and low confidence (some even found in typical siblings of ASD individuals). We reasoned that if the XP-ASD network is truly relevant to the prenatal etiology of ASD, a preferential incorporation of high confidence rASD genes would be expected in the leukocyte-derived XP-ASD network. By following different analytical methods, different groups have separately categorized rASD genes into high and low confidence 7,16,44. Importantly, we found a reproducible enrichment of high confidence rASD genes in the XP-ASD network (Fig 2.c) with a significant enrichment for strong evidence rASD genes with de novo protein truncating variants in ASD subjects (hypergeometric p-value <3.06×10^-6). Further corroborating the regulatory role of rASD genes on DE-ASD network, we found a significant enrichment of rASD genes with DNA binding activities in the XP-ASD network (OR: 3.1; p-value <2.1×10^-12; Fig S7). Furthermore, the XP-ASD network was not enriched for rASD genes classified as low confidence (p-value >0.24). As negative controls, we constructed two other networks by including genes with likely deleterious and synonymous mutations in siblings of ASD individuals. Consistent with a possible role of XP-ASD networks in ASD, we found these negative control genes are not significantly associated with the DE genes (p-values >0.41; Fig 2.c). The preferential addition of high confidence and regulatory rASD genes supports the relevance of the XP-ASD network to the pathobiology of ASD, and the likelihood that the high confidence rASD genes are regulating the DE-ASD network.
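A hypergeometric enrichment test of the kind reported above asks how likely it is to see at least the observed number of high confidence genes among those retained in the network, if retained genes were drawn at random from the candidate list. The sketch below is a generic illustration with hypothetical counts; how the authors defined the background set is not stated in this section, so the parameter names are assumptions.

```python
from math import comb


def hypergeom_enrichment(universe, in_category, in_network, overlap):
    """Upper-tail hypergeometric p-value P(X >= overlap).

    universe    -- number of candidate genes considered (background)
    in_category -- candidates belonging to the category of interest
    in_network  -- candidates retained in the network
    overlap     -- retained candidates that belong to the category
    """
    total = comb(universe, in_network)
    upper = min(in_category, in_network)
    tail = sum(
        comb(in_category, k) * comb(universe - in_category, in_network - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts: 10 candidates, 4 in the category, 3 retained, 2 overlapping.
p_toy = hypergeom_enrichment(universe=10, in_category=4, in_network=3, overlap=2)
```

The exact integer arithmetic of `math.comb` avoids the numerical underflow that floating-point factorials would cause for lists of several hundred genes.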
rASD genes show potential suppressing effects on the DE-ASD network
To explore the regulatory effect of the rASD genes on the DE genes, we analyzed their interaction types (i.e., positive or negative correlations, alluding to activator or repressor activity). Comparative analysis of DE- and XP-ASD networks indicated a significant enrichment of negative correlations between rASD and DE genes (p-value <3.1×10^-4; Fisher’s exact test), suggesting more of an inhibitory role of rASD genes on the DE genes (Fig 3.a).
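The comparison of interaction signs can be framed as a 2×2 contingency table and tested with Fisher's exact test. The sketch below assumes one plausible layout (rows: rASD-DE edges versus DE-DE edges; columns: negative versus positive correlation); the authors' exact table is not given in this section, and the counts shown are hypothetical.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test for the table [[a, b], [c, d]].

    Returns P(top-left cell >= a) with all margins held fixed, i.e. the
    probability of an association at least as strong in the positive direction.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    tail = sum(
        comb(col1, k) * comb(n - col1, row1 - k)
        for k in range(a, upper + 1)
    )
    return tail / comb(n, row1)


def odds_ratio(a, b, c, d):
    """Sample odds ratio of the table [[a, b], [c, d]] (infinite if b*c == 0)."""
    return float("inf") if b * c == 0 else (a * d) / (b * c)


# Hypothetical edge counts:
#                 negative  positive
# rASD-DE edges       3         1
# DE-DE edges         1         3
p_toy = fisher_exact_greater(3, 1, 1, 3)
or_toy = odds_ratio(3, 1, 1, 3)
```

A one-sided test matches the directional question asked in the text, namely whether negative correlations are over-represented among rASD-DE edges.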
Supporting the inhibitory role of rASD genes, the DE-ASD network was enriched for genes that were up-regulated by the knock-down of CHD8 in neural progenitor and stem cells; but not for those that were down-regulated (Fig 3.b) 38-40. Consistently, we observed in our dataset an overall up-regulation of genes that are also up-regulated in knock-down experiments of the transcriptional repressor CHD8 (p-value <0.039 across three different studies; GSEA), but not for those that are down-regulated. We observed a similar up-regulation pattern for the binding targets of the FMR1 rASD gene in the ASD transcriptome (p-value: 0.078; GSEA). The potential inhibitory role of rASD genes on the DE-ASD network was further supported in an independent dataset on neural differentiation. Specifically, we observed an anti-correlated expression pattern between the rASD and the DE genes from the XP-ASD network in in vitro-differentiated human neural progenitors (Fig 3.c).
Signaling pathways are central to the leukocyte-based networks
We next identified key pathways involved in the XP-ASD and DE-ASD networks. Biological process enrichment analysis of the XP-ASD network demonstrated it is highly enriched for signaling pathways (Fig 4.a; Table S5). Moreover, the DE-ASD network was highly enriched for PI3K/AKT, mTOR, and related pathways (Fig 4.b). To delineate mechanisms by which rASD genes could dysregulate DE genes, we compared enriched biological processes of DE and rASD genes involved in the XP-ASD network. DE genes were more enriched for cell proliferation related processes, particularly PI3K/AKT and its downstream pathways such as mTOR, autophagy, viral translation, and FC receptor signaling (Fig 4.a-b). However, the rASD genes were better enriched for processes involved in neuron differentiation and maturation including neurogenesis, dendrite development and synapse assembly (Fig 4.a).
Our results suggest up-regulation and elevated co-expression activity of PI3K/AKT and its down-stream pathways in ASD leukocytes (Fig 4.a-b). These processes are involved in brain development and growth during prenatal and early postnatal ages 3,45,46 and focused studies on rASD genes have implicated them in ASD 3,8,47,48. Further supporting the over-activity of the PI3K/AKT and its down-stream pathways in our cohort of ASD toddlers, gene set enrichment analysis demonstrated genes involved in PI3K/AKT signaling, mTOR pathway and the targets of the FOXO1 transcriptional repressor (the two main downstream processes of the PI3K/AKT) are altered in ASD leukocytes in directions that are consistent with PI3K/AKT over-activity (Supplementary Notes).
We further investigated the DE-ASD and XP-ASD networks using an integrated hub analysis approach (Methods). In the DE-ASD network, 63% of hub genes were involved in or regulated by the PI3K/AKT pathway including PIK3CD, AKT1 and GSK3B (Fig 4.c). The PI3K/AKT pathway is known to be active in the prenatal brain and to be involved in neural cell proliferation and maturation 3. Consistent with a potential regulatory role of rASD genes on the DE-ASD networks, genes that were only hubs in the XP-ASD network were highly enriched for the regulatory genes associated with neuronal proliferation and maturation, including regulatory members of the RAS/ERK (e.g., NRAS, ERK2, ERK1, SHC1), PI3K/AKT (e.g., PTEN, PIK3R1, EP300), and WNT/β-catenin (e.g., CTNNB1, SMARCC2, CSNK1G2) pathways (Fig 4.c; Table S6-S7). While PI3K/AKT (a hub in DE-ASD and XP-ASD networks) promotes proliferation and survival, the ERK pathway (a hub in the XP-ASD network) can trigger differentiation of neural progenitor cells by mediating PI3K/AKT associated signaling pathways 3,49-51.
rASD genes regulate DE-ASD genes through specific signaling pathways
We further explored if perturbation to the rASD genes leads to the perturbation of the DE-ASD network through changes in the RAS/ERK, PI3K/AKT, and WNT/β-catenin pathways. To assess this, we leveraged genome-wide mutational screening data in which gene mutations were scored based on their effects on the activity of the RAS/ERK, PI3K/AKT, and WNT/β-catenin signaling pathways 52. The activity of the signaling pathways was directly measured based on the phosphorylation state of ERK, AKT, and β-catenin proteins 52. Consistent with functional enrichment and hub analysis results, we found that rASD genes in the XP-ASD network are significantly enriched for regulators of RAS/ERK, PI3K/AKT, and WNT/β-catenin pathways (Fig 4.d; p-value <1.9×10^-10; Wilcoxon-Mann-Whitney test). Specifically, regulators of these pathways (FDR<0.1) accounted for the inclusion of 39% of rASD genes in the XP-ASD network. As a control, no significant enrichment for the regulators of RAS/ERK, PI3K/AKT, and WNT/β-catenin pathways was observed among rASD genes that were not included in the XP-ASD network (Fig 4.d). These results support the regulatory role of rASD genes on the DE-ASD network through perturbation of RAS/ERK, PI3K/AKT, and WNT/β-catenin signaling pathways.
In summary, our XP-ASD network decomposition suggests a modular regulatory structure for the XP-ASD network in which diverse rASD genes converge upon and dysregulate activity of the DE genes (Fig 4.a). Importantly, for a large percentage of rASD genes, the dysregulation flow to the DE genes is canalized through highly inter-connected signaling pathways including RAS/ERK, PI3K/AKT, and WNT/β-catenin.
The DE-ASD network is over-active in neuron models of ASD
Our results demonstrate the presence of an over-active network in leukocytes of living ASD toddlers. Furthermore, they implicate the over-activity of the DE-ASD network in the prenatal etiology of ASD by demonstrating the activity of the perturbed network during brain development and its associations with high confidence rASD genes. Also, our results suggest that the network over-activity signal is present in a large percentage of our ASD toddlers and is associated with neural proliferation and maturation.
To further validate these results, we first examined if the DE-ASD network is over-active in iPS cell-derived neural progenitors and neurons of ASD toddlers compared to those of TD cases. For this, we analyzed the transcriptomes of iPS cells from 8 ASD individuals with macrocephaly and 6 TD individuals 30, which were differentiated into neural progenitor and neuron stages. Analysis of the DE-ASD at neural stages demonstrated that the network is over-active in these ASD neuron models (Fig 5), suggesting the functional relevance of identified leukocyte molecular signatures to the abnormal ASD brain development.
Network dysregulation is associated with ASD severity
We evaluated the potential role of the DE-ASD network activity on the development of early-age ASD symptoms. For this, we first tested if the same gene dysregulation patterns exist across individuals at different levels of ASD severity. Indeed, we observed that the fold change patterns of DE genes are almost identical across different ASD severity levels (Fig S11). The implicated RAS/ERK, PI3K/AKT, and WNT/β-catenin pathways in our model are well known to have pleiotropic roles during brain development from neural proliferation and neurogenesis to neural migration and maturation with implications in ASD 3, suggesting the DE-ASD network is involved in various neurodevelopmental related processes. At the mechanistic level, this suggests that the spectrum of autism could be mediated through the extent of dysregulation of the DE-ASD network, as it is composed of high confidence physical and regulatory interactions. Hence, we examined whether the magnitude of the co-expression activity level of the DE-ASD network correlated with clinical severity across individual ASD toddlers. Indeed, we found that the extent of gene co-expression activity within the DE-ASD network was correlated with ASD toddlers’ ADOS social affect deficit scores, the ASD diagnostic gold standard (Fig 6). To assess the significance of observed correlation patterns, we repeated the analysis with 10,000 permutations of the ADOS social affect scores of ASD individuals (see inset boxplots in Fig 6). This analysis demonstrated the significance of the observed correlations (Fig 6). Our results suggest the perturbation of the same network at different extents can potentially result in a spectrum of postnatal clinical severity levels in ASD toddlers.
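The permutation procedure described above can be sketched as follows. This is an illustrative outline rather than the authors' analysis: it assumes a Pearson correlation between a per-unit network activity score and the ADOS social affect score, a two-sided comparison, and a fixed random seed for reproducibility. The variable names and values are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def permutation_pvalue(activity, scores, n_perm=10000, seed=0):
    """Observed correlation and its permutation p-value.

    The clinical scores are shuffled n_perm times; the p-value is the fraction
    of shuffles whose absolute correlation is at least the observed one, with
    the usual +1 correction so that the estimate is never exactly zero.
    """
    rng = random.Random(seed)
    observed = pearson(activity, scores)
    shuffled = list(scores)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson(activity, shuffled)) >= abs(observed):
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical activity scores and clinical scores for eight units.
toy_activity = [0.10, 0.15, 0.22, 0.30, 0.34, 0.41, 0.47, 0.55]
toy_scores = [4, 6, 7, 9, 11, 12, 15, 17]
r_obs, p_perm = permutation_pvalue(toy_activity, toy_scores, n_perm=2000)
```

Shuffling the clinical scores breaks any real link to network activity while preserving both marginal distributions, so the shuffled correlations form the null distribution against which the observed value is judged.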
CONCLUSION
While ASD demonstrates a strong genetic basis, it remains elusive how implicated genes are connected to the molecular dysregulations that underlie the disorder at prenatal and early postnatal ages. Towards this, we developed a systems biology framework that integrates transcriptomic dysregulations in living ASD toddlers with current knowledge on ASD risk genes to explain ASD associated fetal-stage brain transcriptomic changes and clinical outcomes. Specifically, we found a dysregulated transcriptional network that shows elevated gene co-expression activity in ASD toddlers. This core network was robustly associated with rASD genes with likely deleterious mutations in ASD subjects. Such rASD genes have potentially large effect sizes on the etiology but occur in a small percentage of the ASD population 53,54. We show that many rASD genes may exert their regulatory effect on this DE-ASD core network through the inter-connected RAS/ERK, PI3K/AKT, and WNT/β-catenin signaling pathways. The connection of the DE-ASD network (constructed with data from the general ASD pediatric population) with high confidence rASD genes provides empirical evidence of shared mechanisms underlying ASD in both those with highly penetrant rASD genes and those of other etiologies (e.g., common variants) in the wider ASD population.
The key aspect of our signature is that it is constructed based on transcriptomic data from young living ASD toddlers. This allows us to correlate its variations with the core clinical features of the same ASD toddlers. Indeed, the dysregulation degree of the DE-ASD network correlated with deficits in the toddlers’ ADOS social affect scores. Social and behavioral deficits are also suggested to be correlated with the genetic variations in ASD subjects 55,56; and previous studies have established the effect of the PI3K/AKT signaling pathway (central to the DE-ASD core network and significantly altered in ASD leukocytes) on social behaviors of mouse models 47,48. Together, these observations suggest that the etiological roots of ASD converge on gene networks that correlate with the symptom severity in ASD individuals. Moreover, our results reinforce the hypothesis that stronger dysregulation of the same core network could lead to higher severity in the ASD cases. The DE-ASD core network is enriched for pathways implicated in ASD, is strongly associated with high confidence rASD genes, and correlates with ASD severity. However, we note that a direct causal relationship between the co-expression activity of the network and ASD remains to be established. Moreover, our network co-expression activity measure is a summary score from the strongest signal in our dataset (i.e., differentially expressed genes) at a group level (i.e., severity level). Therefore, by design, it may not capture the heterogeneity that could exist within each group. As detailed below, future work is needed to explore the causal relationship of our gene network to ASD development, symptoms, and the potential existence of other dysregulation mechanisms in ASD individuals.
The emerging architecture of complex traits suggests that gene mutations often propagate their effects through regulatory networks and converge on core pathways relevant to the trait 22,32. Our findings support the existence of an analogous architecture for ASD, wherein rASD genes with diverse biological roles overlap in their down-stream function. Although not significantly overlapping with rASD genes, we found that the DE-ASD network is significantly co-expressed with rASD genes in both leukocyte and brain. We also illustrated that the DE-ASD network could be controlled by rASD genes through direct transcriptional regulation or highly interconnected signaling pathways. We postulate that the DE-ASD network is a primary convergence point of ASD etiologies, including its genetic basis as we elaborated for rASD genes, in a large portion of the ASD population. This predicts that the spectrum of autism in such cases is correlated with the degree and mechanism of the perturbation of the DE-ASD network. A detailed analysis of iPS cell-derived ASD neurons demonstrated the dysregulation of the leukocyte-based DE-ASD network in ASD neurons, supporting the neural-level relevance of the findings to ASD etiology and its prevalence in the ASD population. Furthermore, direct clinical-level relevance is demonstrated by the high correlation we found between degree of dysregulation in the DE-ASD core network and ASD symptom severity in the ASD toddlers.
The currently recognized rASD genes are not fully penetrant for the disorder, except for a handful of syndromic genes 53,54,57,58. Our analysis of the XP-ASD network provides some insights on how the effects of rASD genes could potentially combine to result in ASD. Although some rASD genes could directly modulate the DE-ASD network at the transcriptional level, our results suggest that the regulatory consequence of many rASD genes on the DE-ASD network is canalized through the PI3K/AKT, RAS/ERK, and WNT/β-catenin signaling pathways. The structural and functional interrogation of the DE-ASD network localized the PI3K/AKT pathway to its epicenter and demonstrated enrichment for processes down-stream of this pathway. Moreover, we found that high confidence rASD genes are better connected to the DE-ASD core network, suggesting that the closeness and influence of genes on these signaling pathways is correlated with their effect size on the disorder. These results indicate that perturbation of the PI3K/AKT, RAS/ERK, and WNT/β-catenin signaling pathways through gene regulatory networks may be an important etiological route for ASD that could be associated with the disorder severity level in a relatively large fraction of the ASD population. Congruent with this hypothesis, cell and animal models of ASD have demonstrated the enrichment of high confidence rASD genes for the regulators of the RAS/ERK, PI3K/AKT, and WNT/β-catenin signaling pathways 3,8,11,18,47,48,51. These signaling pathways are highly conserved and pleiotropic, impacting multiple prenatal and early postnatal neural development stages from proliferation/differentiation to synaptic and neural circuit development 3. Such multi-functionalities could be the underlying reason that we detected the signal in ASD leukocytes.
It is necessary to analyze large subject cohorts from unbiased, general pediatric community settings to capture the heterogeneity that underlies ASD at early ages. This study presents the largest transcriptome analysis on early-age ASD cases thus far from such settings. However, the analyzed dataset is still of a modest size, and as such our analysis was focused on the strongest signal that best differentiates ASD cases from TD individuals (i.e., differentially expressed genes). Here we illustrate that the captured signal is informative about the transcriptional organization of ASD and shows promise in bridging the gap between genetic and clinical outcomes. Future studies with larger datasets are required to not only replicate these results, but also explore other long-standing questions in the field, such as the basis of gender bias that exists in ASD or the potential molecular mechanisms that differentiate high functioning from low functioning cases. However, perhaps the most exciting direction is to expand the presented framework to systematically diagnose, classify and prognostically stratify ASD cases at early postnatal ages based on the underlying molecular mechanisms. The concept of precision molecular medicine for ASD can only be actualized via approaches that illuminate the early-age living biology of ASD 3,18,21. ASD toddler-derived iPS cell studies show ASD is a progressive prenatal and early postnatal disorder that involves a cascade of diverse and varying molecular and cellular changes such as those resulting from dysregulation of the pathways and networks highlighted herein 3,30,31. As such, dynamic, individual-based molecular assays in infants and toddlers will be essential to develop. 
The presented framework could prove invaluable for the development of quantitative, molecular-based measures for ASD diagnosis and prognosis by identifying specific molecular dysregulations that we show are observable in leukocytes of a large fraction of living ASD toddlers at young ages.
Materials and Methods
Participant recruitment and clinical evaluation
The primary aim of this study was to associate the transcriptome dysregulations present in ASD leukocytes with the ASD risk genes. However, the currently available genetic information is mostly based on males, and less is known about the genetic basis of ASD females. Therefore, we focused on male toddlers for the transcriptome analysis in this study, specifically 264 male toddlers with the age range of 1 to 4 years. Part of the transcriptome data of this study (153 individuals) was reported previously 21,59 and a similar methodology was employed for participant recruitment and sample collection from 111 new cases 21. Research procedures were approved by the Institutional Review Board of the University of California, San Diego. Parents of subjects underwent Informed Consent Procedures with a psychologist or study coordinator at the time of their child’s enrollment.
About 70% of toddlers were recruited from the general population as young as 12 months using an early detection strategy called the 1-Year Well-Baby Check-Up Approach 60. Using this approach, toddlers who failed a broadband screen, the CSBS IT Checklist 61, at well-baby visits in the general pediatric community settings were referred to our Center for a comprehensive evaluation. The remainder of the sample was obtained by general community referrals. All toddlers received a battery of standardized psychometric tests by highly experienced Ph.D. level psychologists including the Autism Diagnostic Observation Schedule (ADOS; Module T, 1 or 2), the Mullen Scales of Early Learning and the Vineland Adaptive Behavior Scales. Testing sessions routinely lasted 4 hours and occurred across 2 separate days. Toddlers younger than 36 months in age at the time of initial clinical evaluation were followed longitudinally approximately every 9 months until a final diagnosis was determined at age 2-4 years. For analysis purposes, toddlers (median age, 27 months) were categorized into two groups based on their final diagnosis assessment: 1) ASD: subjects with ASD diagnosis or ASD features; 2) TD: typically developing (TD) controls. For more information see Table S1.
ADOS scores at each toddler’s final visit were used for correlation analyses with DE-ASD network activity scores. All but 4 toddlers were tracked and diagnosed using the appropriate module of the ADOS (i.e., Toddler, 1, or 2) between the ages of 24-49 months (Table S1), an age where the diagnosis of ASD is relatively stable 62-64; the remaining 4 toddlers had their final diagnostic evaluation between the ages of 18 to 24 months.
Blood sample collection and microarray gene expression processing
Blood samples were usually taken at the end of the clinical evaluation sessions. To monitor health status, the temperature of each toddler was measured using a digital ear thermometer immediately preceding the blood draw. The blood draw was rescheduled for a different day when the temperature was higher than 99°F. Moreover, blood was not drawn if a toddler had an illness (e.g., cold or flu), as observed or stated by parents. We collected four to six milliliters of blood into ethylenediaminetetraacetic acid (EDTA)-coated tubes from all toddlers. Blood leukocytes were captured and stabilized by LeukoLOCK filters (Ambion) and were immediately placed in a −20°C freezer. Total RNA was extracted following standard procedures and manufacturer’s instructions (Ambion).
RNA labeling, hybridization, and scanning were conducted at the Scripps Genomic Medicine center (CA, USA) using Illumina BeadChip technology. All arrays were scanned with the Illumina BeadArray Reader and read into Illumina GenomeStudio software (version 1.1.1). Raw Illumina probe intensities were converted to expression values using the lumi package 65. We employed a three-step procedure to filter for probes with reliable expression levels. First, we only retained probes that met the detection p-value <0.05 cut-off in at least 3 samples. Second, we required the probes to have expression levels above the 95th percentile of negative probes in at least 50% of samples. The probes with detection p-value >0.1 across all samples were selected as negative probes and their expression levels were pooled together to estimate the 95th percentile expression level. Third, for genes represented by multiple probes, we retained the probe with the highest mean expression level across our dataset, after quantile normalization of the data. These criteria led to the selection of 14,854 protein-coding genes as expressed in our leukocyte transcriptome data, which is similar to the previously reported estimate of 14,631 protein-coding genes (chosen based on Entrez IDs) for whole blood by the GTEx consortium 66. To ensure results are not affected by variations in the procedure of selecting expressed genes, we replicated all of our analyses (redoing DE analysis and reconstructing HC DE and XP networks) by choosing 13,032 protein-coding genes as expressed (Fig S14).
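The three-step probe filter described above can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: the function names `percentile` and `filter_probes`, the dictionary-based input layout, and the linear-interpolation percentile are all assumptions made for the example, and the expression values are assumed to be already quantile normalized.

```python
from statistics import mean

def percentile(values, q):
    """Percentile (q in [0, 100]) of a non-empty list, by linear interpolation."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)

def filter_probes(expr, det_p, probe_gene):
    """Select one reliably expressed probe per gene.

    expr, det_p: dict probe -> list of per-sample expression / detection p-values
    probe_gene:  dict probe -> gene identifier
    (hypothetical layout, chosen only for this sketch)
    """
    n_samples = len(next(iter(expr.values())))
    # Negative probes: detection p-value > 0.1 in every sample.
    negatives = [p for p in expr if all(v > 0.1 for v in det_p[p])]
    if not negatives:
        raise ValueError("no negative probes available to estimate background")
    # Pool negative-probe expression levels and take their 95th percentile.
    background = percentile([v for p in negatives for v in expr[p]], 95)
    best = {}
    for probe, values in expr.items():
        # Step 1: detection p-value < 0.05 in at least 3 samples.
        if sum(v < 0.05 for v in det_p[probe]) < 3:
            continue
        # Step 2: expression above background in at least 50% of samples.
        if sum(v > background for v in values) < 0.5 * n_samples:
            continue
        # Step 3: per gene, keep the probe with the highest mean expression.
        gene = probe_gene[probe]
        if gene not in best or mean(values) > mean(expr[best[gene]]):
            best[gene] = probe
    return best
```

Calling `filter_probes` returns a mapping from each retained gene to its selected probe; the number of keys corresponds to the count of genes considered expressed under these thresholds.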
Data processing and differential gene expression analysis of microarray datasets
We subdivided our microarray samples into three datasets to assess the reproducibility of the results. The primary dataset included 253 high quality samples and was used for the discovery of the dysregulation signal. The second dataset replicated 56 randomly selected male toddlers from the primary dataset (35 ASD and 21 TD). The third dataset was composed of 48 male toddlers with 24 independent, non-overlapping ASD cases, while 21 out of 24 TD cases overlapped with the primary dataset. The second and third datasets were microarrays generated at the same time, but included different subjects not in the primary dataset. All three datasets used Illumina microarray technology. However, the primary dataset was analyzed by Illumina HT-12 Chips, while the second and third datasets used Illumina WG-6 Chips. The pre-processing and downstream analysis of the three datasets were conducted separately. The data are available in the Gene Expression Omnibus database (GSE42133; GSE111175).
The primary dataset was originally composed of 275 samples from 240 male ASD and TD individuals. Quality control analysis was performed to identify and remove 22 outlier samples from the dataset. Samples were marked as outliers if they showed low microarray signal intensity (average signal more than two standard deviations below the overall mean), deviant pairwise correlations, deviant cumulative distributions, deviant multi-dimensional scaling plots, or poor hierarchical clustering, as described elsewhere 20. After removing low-quality samples, the primary dataset had 253 samples from 226 male toddlers, including 27 technical replicates. High reproducibility was observed across technical replicates (mean Spearman correlation of 0.917 and median of 0.925). For each pair of technical replicates, we randomly removed one sample from the dataset.
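Two of the quality-control quantities mentioned above, the low-intensity criterion and the Spearman correlation between technical replicates, can be illustrated with a short sketch. The function names and the input layout are hypothetical, the use of the sample standard deviation is an assumption, and the other outlier criteria (pairwise correlations, cumulative distributions, multi-dimensional scaling, clustering) are not covered.

```python
from statistics import mean, stdev

def low_intensity_outliers(sample_signals, n_sd=2.0):
    """Flag samples whose average signal is more than n_sd SDs below the overall mean.

    sample_signals: dict sample -> list of probe intensities (hypothetical layout).
    """
    averages = {s: mean(v) for s, v in sample_signals.items()}
    values = list(averages.values())
    threshold = mean(values) - n_sd * stdev(values)
    return sorted(s for s, a in averages.items() if a < threshold)

def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

For a pair of technical replicates, `spearman` would be applied to the two expression vectors over the same set of probes.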
The limma package 67 was then applied to quantile-normalized data for differential expression analysis, in which moderated t-statistics were calculated by robust empirical Bayes methods 68. Sample batch was used as a categorical covariate (total of two batches; both Illumina HT-12 platforms). Exploration graphs indicated that linear modeling of the batch covariate was effective at removing its influence on expression values (Fig S13). MA-plots of the primary dataset did not show an overall bias in the fold change estimates (Fig S1). DE analysis identified 1236 differentially expressed genes with Benjamini-Hochberg FDR <0.05.
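The Benjamini-Hochberg adjustment used for the FDR thresholds in this work can be written in a few lines. The sketch below is a generic implementation of the standard step-up procedure, not code from the study; the function name `benjamini_hochberg` is chosen only for illustration.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

Genes whose adjusted p-value falls below the chosen cut-off (0.05 for the differential expression analysis) would then be called significant.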
We performed multiple analyses to confirm that our results (1) are replicable in the other two microarray datasets, (2) are robust to alterations in the analysis pipeline, (3) are not affected by the batches or potential hidden covariates, (4) are present in the vast majority of samples, and (5) are not driven by changes in the blood cell type composition between ASD and TD toddlers (Figs S1-S4).
Reproducibility of transcriptional over-activity of DE-ASD networks in an independent RNA-Seq dataset
We performed RNA-Seq experiments on 56 samples from an independent cohort of 12 TD (19 samples) and 23 ASD (37 samples) male toddlers. None of the subjects overlapped with those in the primary dataset. This allowed us to ensure our results are neither subject- nor platform-specific (i.e., microarray vs. RNA-Seq).
RNA-Seq libraries were sequenced at the UCSD IGM genomics core on a HiSeq 4000. We processed the raw RNA-Seq data with a pipeline that starts with quality control using FastQC 69. Low-quality bases and adapters were removed using Trimmomatic 70. Reads were aligned to the genome using STAR 71. STAR results were processed using SAMtools 72, and transcript quantification was done with htseq-count 73. Subsequently, lowly expressed genes were removed and data were log count per million (cpm) normalized (with prior read count of 1) using limma 67. We performed SVA analysis 74 on the normalized expression data and included the first surrogate variable as a covariate to account for potential hidden confounding variables. Differential expression analysis was performed using the limma package with subjects modeled as random effects.
ASD risk genes
ASD risk genes were extracted from the SFARI database 44 on Dec. 7, 2016. We also included the reported risk genes from a recent meta-analysis of two large-scale genetic studies, containing genes mutated in ASD individuals but not present in the Exome Aggregation Consortium (ExAC) database 16. Together, these two resources provided 965 likely rASD genes that were used for the construction of XP-ASD networks (Table S8). Previously published genes with likely gene damaging and synonymous mutations in ASD siblings were retrieved from Iossifov et al. 15.
ASD high confidence risk genes were extracted from the SFARI database (genes with confidence levels of 1 and 2), Kosmicki et al. 16 (recurrent gene mutations in ASD individuals, but not present in the ExAC database), Sanders et al. 17, and Chang et al. 7. Strong evidence genes with de novo protein truncating variants in ASD subjects were extracted from Kosmicki et al. 16 and included rASD genes that were not in the ExAC database and had a probability of loss-of-function intolerance (pLI) score above 0.9. Gene names in these datasets were converted to Entrez gene IDs using DAVID tools 75.
To assess the overlap of DE-ASD networks with rASD genes, we considered our list of all rASD genes (965 genes), different lists of high confidence rASD genes (varying in size and composition) and their combinations, including all SFARI rASD genes, SFARI genes levels 1-to-3, SFARI genes levels 1 and 2, strong evidence rASD genes from Kosmicki et al. 16, and strong evidence rASD genes from Sanders et al. 17.
Functional characterization of DE-ASD networks
We set two criteria to identify biological processes that are differentially expressed between ASD and TD samples and are enriched in the DE-ASD networks. First, we required the biological process to be significantly changed between ASD and TD transcriptome samples. Second, we required the biological process to be significantly enriched in the DE-ASD networks.
GSEA identified multiple gene sets that were significantly upregulated in ASD samples (FDR <0.12; Table S9), using the R version of the GSEA package and the msigdb.v5.1 database (downloaded on Oct. 20, 2016) 76,77. Significantly enriched processes in the DE-ASD networks were identified by examining the overlap of GSEA-identified significantly altered gene sets with the DE-ASD networks based on empirical permutation tests, and p-values were corrected for multiple testing using the Benjamini-Hochberg procedure. We excluded gene sets annotated as associated with specific reference datasets in MSigDB since their generalizability to our dataset has not been established (Table S9).
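An empirical permutation test for the overlap between a network and a gene set can be sketched as below. The null model is an assumption made for this example (random gene sets of the same size as the network, drawn without replacement from a background of expressed genes); the text does not specify the exact sampling scheme, and the function name and parameters are hypothetical.

```python
import random

def permutation_overlap_pvalue(network_genes, gene_set, background, n_perm=10000, seed=0):
    """Empirical p-value for the overlap between a network and a gene set.

    Assumed null: random gene sets of the network's size drawn from `background`.
    Returns (observed overlap, empirical p-value).
    """
    rng = random.Random(seed)
    pool = sorted(set(background))
    network = set(network_genes) & set(pool)
    target = set(gene_set) & set(pool)
    observed = len(network & target)
    hits = 0
    for _ in range(n_perm):
        sample = rng.sample(pool, len(network))
        if len(target.intersection(sample)) >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)
```

The resulting p-values, one per gene set, would then be corrected for multiple testing with the Benjamini-Hochberg procedure.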
Biological enrichment analysis of XP-ASD networks
Significantly enriched Gene Ontology biological processes (GO-BP) were identified by Fisher’s exact test on terms with 10-2000 annotated genes. The terms with Benjamini-Hochberg estimated FDR <0.1 were deemed significant. The enriched terms were next clustered based on the GO-BP tree, extracted from the AmiGO database using the RamiGO package in R 78. The general terms with more than 1000 annotated genes that spanned two or more clusters were removed. The list of enriched GO-BP terms and their clustering are provided in Table S5.
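For an enrichment question of this kind, the one-sided Fisher's exact test reduces to an upper-tail hypergeometric probability. The sketch below assumes a one-sided test for over-representation (the sidedness is not stated in the text) and uses hypothetical argument names.

```python
from math import comb

def enrichment_pvalue(n_background, n_annotated, n_network, n_overlap):
    """One-sided Fisher's exact test p-value for over-representation.

    Probability of seeing at least n_overlap annotated genes when n_network genes
    are drawn without replacement from n_background genes, of which n_annotated
    carry the term.
    """
    upper = min(n_annotated, n_network)
    total = comb(n_background, n_network)
    tail = sum(
        comb(n_annotated, k) * comb(n_background - n_annotated, n_network - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total

def term_size_ok(n_annotated, min_size=10, max_size=2000):
    """Term-size filter matching the 10-2000 annotated-gene range in the text."""
    return min_size <= n_annotated <= max_size
```

Only terms passing `term_size_ok` would be tested, and the resulting p-values adjusted to control the FDR at 0.1.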
Deciphering potential regulators of DE-ASD networks
To identify genes that potentially regulate DE-ASD networks, we examined the overlap of DE-ASD networks with the targets of human transcription factors identified as part of the ENCODE project 42 and the curated ChEA 2016 database 43. We performed overlap analysis with each of the three DE-ASD networks separately using the Enrichr portal. Some of the transcription factors were assayed multiple times. To obviate potential biases, we used Fisher’s method to combine the enrichment p-values across assays related to a given transcription factor during the analysis of each DE-ASD network. Next, p-values were corrected using the Benjamini-Hochberg procedure. Only transcription factors whose targets were significantly enriched in all three DE-ASD networks were considered as significantly overlapping with the DE-ASD networks (FDR <0.1).
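Fisher's method combines k independent p-values through the statistic X = -2 Σ ln(p_i), which follows a chi-square distribution with 2k degrees of freedom under the null. Because the degrees of freedom are even, the tail probability has a closed form, so a sketch needs only the standard library. The function name is illustrative, and all p-values are assumed to be strictly positive.

```python
from math import exp, log

def fisher_combine(pvalues):
    """Combine independent p-values (each > 0) with Fisher's method."""
    k = len(pvalues)
    half_stat = -sum(log(p) for p in pvalues)  # X / 2, where X = -2 * sum(ln p)
    # Chi-square survival function with 2k degrees of freedom:
    #   exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return exp(-half_stat) * total
```

For a transcription factor assayed several times, the enrichment p-values from its assays would be passed to `fisher_combine` to obtain one p-value per factor and network before multiple-testing correction.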
Brain developmental gene expression data
Normalized RNA-Seq transcriptome data during human neurodevelopmental time periods were downloaded from the BrainSpan database on Dec. 20, 2016 36,37. To calculate correlations, normalized RPKM gene expression values were log2(x+1) transformed.
Neural progenitor differentiation data
Microarray transcriptome data from differentiation of primary human neural progenitor cells to neural cells 79 were downloaded from the NCBI GEO database (GSE57595). The data were already quantile normalized and ComBat batch-corrected 80. For genes with multiple probes, we retained the probe with the highest mean expression value.
To observe the transcriptome response of XP-ASD networks during neuron differentiation, we correlated the gene expression patterns with the developmental time points, considering the differentiation time as an ordinal variable. The results are represented in Fig S7.
ASD induced pluripotent stem cells (iPSC) data
ASD iPSC data 30 were downloaded from GEO (GSE67528). Gene expression counts were normalized with the TMM method 81 and filtered to exclude low-expressed genes (genes with count per million greater than 1 were retained). To calculate the correlations, normalized RNA-Seq gene expression values were log(x+1) transformed.
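The expression filter and transform described above can be illustrated with a simplified sketch. Several choices here are assumptions rather than details from the text: plain counts per million are used without TMM scaling factors, the natural logarithm is used for log(x+1), and the `min_samples` parameter (the number of samples in which a gene must exceed the threshold) is hypothetical. Library sizes are assumed to be non-zero.

```python
from math import log

def cpm(counts_by_gene):
    """Counts per million per sample, without TMM scaling (a simplification)."""
    n_samples = len(next(iter(counts_by_gene.values())))
    lib_sizes = [sum(c[j] for c in counts_by_gene.values()) for j in range(n_samples)]
    return {
        gene: [1e6 * c[j] / lib_sizes[j] for j in range(n_samples)]
        for gene, c in counts_by_gene.items()
    }

def filter_and_log(counts_by_gene, min_cpm=1.0, min_samples=1):
    """Keep genes with CPM > min_cpm in at least min_samples samples, then log(x+1)."""
    values = cpm(counts_by_gene)
    kept = {
        gene: v for gene, v in values.items()
        if sum(x > min_cpm for x in v) >= min_samples
    }
    return {gene: [log(x + 1.0) for x in v] for gene, v in kept.items()}
```

The transformed values returned by `filter_and_log` are the kind of input on which gene-gene correlations would be computed.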
Regulatory effect of gene mutations on signaling pathways
Data were extracted from a genome-wide mutational study that monitored the regulatory effect of gene mutations on phosphorylation status of 10 core genes of different signaling pathways and processes 52. Genes whose mutations affected the phosphorylation status of the core signaling genes with FDR <0.1 were considered as the regulators of the cognate signaling pathway.
Acknowledgments
The authors would like to thank Dr. Lilia Iakoucheva for the critical review of this manuscript. This work was supported by NIMH R01-MH110558 (EC, NEL), NIMH R01-MH080134 (KP), NIMH R01-MH104446 (KP), NFAR grant (KP), NIMH P50-MH081755 (EC), Brain & Behavior Research Foundation NARSAD (TP), and generous funding from the Novo Nordisk Foundation through the Center for Biosustainability at the Technical University of Denmark (NNF10CC1016517).