Genes involved in cholesterol cascades are linked to brain connectivity in one third of autistic patients

Javier Rasero; Antonio Jimenez-Marin; Ibai Diez; Mazahir T. Hasan; Jesus M. Cortes

doi:10.1101/2020.09.18.304055

Abstract

The large heterogeneity in the symptomatology and severity of autism spectrum disorder (ASD) is a major obstacle to the design of effective therapies. Beyond behavioral phenotypes, subtype stratification strategies are needed that can be applied to large populations and that combine different neurobiological characteristics, based both on the large-scale organization of the human brain and on neurogenetic fingerprints. Here, we make use of ABIDE, the largest publicly available database of functional neuroimaging in ASD, to which we applied rigorous data harmonization across scanning institutions before using consensus clustering to evaluate patterns of brain connectivity. As a result, we identified three subtypes of ASD. The first was characterized by a mixture of hyper- and hypoconnectivity, stronger network segregation and weaker integration, and it represented approximately 13% of all patients. The second subtype comprised 31% of the patients and was characterized by hyperconnectivity but no topological differences with respect to the group of typically developing controls. The third was the most numerous subtype, comprising 52% of all patients, and was characterized by hypoconnectivity, decreased network segregation and increased integration. We also defined a neurobiological signature for each of these subtypes, detailing the connections and structures most specific to each subtype. Strikingly, at the behavioral level, none of the neuropsychological scores used in the diagnosis of ASD is capable of differentiating any of the subtypes from the other two. Finally, we used the Allen Human Brain Atlas of brain-wide gene transcription maps to show that subtype 2 is extraordinarily enriched in biological processes related to the synthesis, regulation and transport of cholesterol and other lipoproteins, one of the mechanisms previously attributed to ASD.
We also show that this lipid-susceptible ASD subtype may be characterized by network dysfunction, unlike the other two subtypes, which show more structural alterations in the connectome. Thus, our study provides compelling support for the prospect of cholesterol-related therapies in this subset of autistic individuals.

Introduction

Autism is a very heterogeneous disorder, with symptoms ranging from impaired social communication and/or interaction to restricted and/or repetitive patterns of behaviour, interests and activities (1–3). Owing to this wide heterogeneity in symptomatology, and as recommended in The Diagnostic and Statistical Manual of Mental Disorders (DSM–5), this disparate condition is referred to as autism spectrum disorder (ASD), in which the term “spectrum” emphasizes the variation in the type and severity of symptoms (4). We know that ASD results from complex interactions between genetic, epigenetic and environmental factors during development (5–9), yet the factors and mechanisms that produce these precise phenotypic patterns remain largely unknown.

Several mechanisms have been proposed to underlie ASD, including an imbalance between excitation and inhibition (E/I) during development (10, 11). However, the factors driving this disease are not well understood, making therapeutic interventions to restore the E/I balance in ASD a major challenge (12). Motivated by the Smith-Lemli-Opitz Syndrome (SLOS), a genetic condition with ASD symptoms characterized by impaired cholesterol biosynthesis (13), it has been suggested that factors related to the synthesis, metabolism and transport of lipids, cholesterol and other lipoproteins might play a relatively general role in ASD (14–16). Indeed, therapeutic interventions based on cholesterol supplementation have been attempted (17), although the final results of these studies and their therapeutic benefits at the behavioural level are not yet clear. These two mechanisms, E/I imbalance and lipid deficits, appear to be related (18–20), although a precise causal correspondence between them is also unknown.

At the macroscopic level, several brain structures appear to be implicated in ASD, many of which form part of the so-called social network of the brain (21), including the primary motor cortex, fusiform gyrus, amygdala, cerebellum, insula, somatosensory cortex and anterior cingulate cortex (22–24), the latter playing an important role in the large anatomical heterogeneity observed in ASD (25). At the level of brain networks, the frontal, default mode and salience networks have been implicated in ASD (23, 26–30). Moreover, the neuroanatomical structures implicated in ASD are not static but undergo changes throughout development (31–33), as do social functioning and communication (34).

Here, we build on previous studies that applied consensus clustering to brain connectivity matrices (35, 36), and we search for ASD subtypes by representing the multivariate connectivity pattern of each brain region as a vector whose dimension equals the number of regions and whose components are the connectivity between the given region and every other region. Consequently, if two subjects belong to the same subtype, each brain structure connects to the rest of the brain in a similar way in both, so that the subtypes capture similarity between the brains of patients through multivariate aspects of large-scale brain connectivity. We performed our subtyping analyses on 880 autistic patients, based on functional magnetic resonance imaging (fMRI) data from the Autism Brain Imaging Data Exchange (ABIDE) repository (37). These data were first harmonized (38–41) to overcome different sources of variability in the neuroimaging data, such as scanning institution, sex and age.

Linking the connectivity-based ASD subtypes to their neurogenetic signatures may also shed light on the biological mechanisms that differentiate each subtype. In doing so, we must take into consideration that ASD is a highly polygenic condition, currently thought to involve about 944 genes according to the SFARI Gene human database (as of July 15th, 2020); see also (42). To date, a precise association between the entire transcriptome and brain connectomics in ASD patients has yet to be explored, although an association between transcriptomics and brain morphology was recently assessed (43), showing that genes enriched for synaptic transmission and downregulated in individuals with autism were associated with variations in cortical thickness. Here, we take a step further and ask whether it is possible to perform a correspondence analysis between the large-scale connectivity patterns of each ASD subtype and the Allen Human Brain Atlas (AHBA) of whole-brain transcriptional data (44), following a methodology similar to that used previously (45–47). Our hypothesis is that by identifying the genes whose expression maps coincide most closely with the connectivity maps, we can better understand the neurogenetic signature and neurobiological mechanisms associated with each subtype of ASD, a major challenge in this field.

Materials and Methods

Participants

A total of N = 1890 subjects from the ABIDE repository participated in this study, of whom 880 were ASD patients and 1010 were typically developing controls (TDCs). The data come from 24 different institutions (see Table S1); for each institution, we recorded the centre’s name, number of subjects contributed, mean age, sex distribution and number of ASD cases. For each participant, we obtained both anatomical and fMRI data. The acquisition parameters for each scanning institution can be found at http://fcon_1000.projects.nitrc.org/indi/abide/. Moreover, we assessed cognitive performance and disease severity using the Autism Diagnostic Observation Schedule-Generic (ADOS-G), the Autism Diagnostic Interview-Revised (ADI-R) and Intelligence Quotient (IQ) sub-scores: verbal IQ (VIQ), performance IQ (PIQ) and full-scale IQ (FIQ). A brief description of all this neuropsychological information can be found in Table S2.

Neuroimaging pre-processing

A state-of-the-art pre-processing pipeline was implemented using FSL, AFNI (48) and MATLAB. We first applied slice-timing correction; each volume was then aligned to the middle volume to correct for head-motion artefacts, followed by intensity normalization. We next regressed out 24 motion parameters, as well as the average cerebrospinal fluid (CSF) and average white matter signals. A band-pass filter between 0.01 and 0.08 Hz was applied, and linear and quadratic trends were removed. All voxels were spatially smoothed with a 6 mm FWHM kernel and, finally, FreeSurfer was used for brain segmentation and cortical parcellation. A total of 86 regions were generated: 68 cortical regions from the Desikan-Killiany Atlas (34 in each hemisphere) and 18 subcortical regions (left/right thalamus, caudate, putamen, pallidum, hippocampus, amygdala, accumbens, ventral DC and cerebellum). The parcellation of each subject was projected onto the individual functional data, and the mean functional time series of each region was computed. Furthermore, we reordered those of the 86 regions that are part of the social network (for details see Table S3). Finally, a connectivity matrix was obtained for each subject by Fisher z-transforming the Pearson correlation coefficients between the regions’ time series.
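The last steps of this pipeline, from regional time series to a Fisher z-transformed connectivity matrix, can be sketched as follows. This is a minimal illustration assuming the regional time series and nuisance regressors have already been extracted (the voxel-wise FSL/AFNI/FreeSurfer steps are not reproduced). The function name, the second-order zero-phase Butterworth filter and the joint regression of trends and nuisance signals are our assumptions, not details taken from the study.

```python
import numpy as np
from scipy.signal import butter, filtfilt


def connectivity_matrix(ts, nuisance, tr, band=(0.01, 0.08), order=2):
    """Fisher z-transformed functional connectivity from regional time series.

    ts: (n_timepoints, n_regions) mean signal of each parcel.
    nuisance: (n_timepoints, n_regressors), e.g. 24 motion parameters + mean CSF + mean WM.
    tr: repetition time in seconds.
    """
    n_t = ts.shape[0]
    t = np.linspace(-1.0, 1.0, n_t)
    # Regress out nuisance signals together with linear and quadratic trends (assumed order).
    design = np.column_stack([np.ones(n_t), t, t**2, nuisance])
    beta, *_ = np.linalg.lstsq(design, ts, rcond=None)
    clean = ts - design @ beta
    # Zero-phase Butterworth band-pass; filter type and order are assumptions.
    nyquist = 0.5 / tr
    b, a = butter(order, [band[0] / nyquist, band[1] / nyquist], btype="band")
    clean = filtfilt(b, a, clean, axis=0)
    r = np.corrcoef(clean.T)     # region-by-region Pearson correlation
    np.fill_diagonal(r, 0.0)     # arctanh(1) is infinite, so the diagonal is zeroed
    return np.arctanh(r)         # Fisher z-transform
```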

Data Harmonization

To harmonize our multi-institution data, we used an in-house implementation of Combat (https://pypi.org/project/pycombat), which adjusts for batch effects through linear mixed modelling and empirical Bayes methods (39). In our case, batch effects may reflect the different image-acquisition set-ups at each institution (e.g., MRI scanner manufacturer, different head coils and/or software, gradient coils, magnetic field strength, etc.). Let Y_ijk represent the value of connectivity entry k for subject j at institution i. Combat adjusts Y_ijk by estimating the coefficients of the following linear mixed model:

$$Y_{ijk} = \alpha_k + X_{ij}\beta_k + \gamma_{ik} + \delta_{ik}\varepsilon_{ijk} \tag{1}$$

where α_k is the fixed intercept, β_k the fixed slopes for the variables in a design matrix X, γ_ik and δ_ik the location and scale institution factors modelled as random effects, and ε_ijk the residual error. One of the strengths of Combat is its use of an empirical Bayes (EB) approach to better estimate γ_ik and δ_ik, an iterative step that is particularly relevant when sample sizes are small. Specifically, it assumes that the two parameters controlling the random effects are drawn from the following prior distributions:

$$\gamma_{ik} \sim \mathcal{N}\left(\gamma_i, \tau_i^2\right) \tag{2}$$

$$\delta_{ik}^2 \sim \text{Inverse-Gamma}\left(\lambda_i, \theta_i\right) \tag{3}$$

The hyperparameters γ_i, τ_i², λ_i and θ_i are estimated empirically using an expectation-maximization (EM) procedure, as described in (39). The harmonized data Ŷ_ijk then read:

$$\hat{Y}_{ijk} = \frac{Y_{ijk} - \hat{\alpha}_k - X_{ij}\hat{\beta}_k - \hat{\gamma}^{*}_{ik}}{\hat{\delta}^{*}_{ik}} + \hat{\alpha}_k + X_{ij}\hat{\beta}_k \tag{4}$$

where α̂_k and β̂_k are the fitted coefficients of Eq. 1, and γ̂*_ik and δ̂*_ik are the EB estimates of the institution effects. In the original Combat implementation, the design matrix X encodes the effects of interest to be preserved during harmonization, such as age, sex and group-level variables. However, since one of our main goals is to find ASD subtypes, in addition to batch/institution effects we may need to remove other sources of variability that could strongly affect the connectivity values and thereby hide the true underlying structure of each ASD subtype. This can easily be incorporated into the Combat framework by modifying Eq. 1 as follows:

$$Y_{ijk} = \alpha_k + X_{ij}\beta_k + C_{ij}\eta_k + \gamma_{ik} + \delta_{ik}\varepsilon_{ijk} \tag{5}$$

where we now explicitly separate the effects to be kept, X, from other possible sources of covariation, C, such as sex and age, which we want to remove during harmonization, and η_k denotes their coefficients. Such an extension of Combat has already been proposed and shown to remove additional confounding bias (40). After fitting Eq. 5, the harmonized dataset was therefore computed as:

$$\hat{Y}_{ijk} = \frac{Y_{ijk} - \hat{\alpha}_k - X_{ij}\hat{\beta}_k - C_{ij}\hat{\eta}_k - \hat{\gamma}^{*}_{ik}}{\hat{\delta}^{*}_{ik}} + \hat{\alpha}_k + X_{ij}\hat{\beta}_k \tag{6}$$

In our scenario, we consider the group factor (TDC vs ASD) as the variable of interest to be preserved after harmonization, and age and sex as the covariate (C) effects to be removed, in addition to the institution effects encoded in γ_ik and δ_ik.
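To make the extended model (Eq. 5) and its harmonized output concrete, the Python sketch below implements the location/scale adjustment without the empirical Bayes step: the shrunken estimates γ̂*_ik and δ̂*_ik are replaced by plain per-institution estimates, and each feature is standardized by a pooled residual SD, as in the original Combat formulation. It is a simplified illustration with hypothetical names, not the in-house implementation or the pycombat API.

```python
import numpy as np


def combat_no_eb(Y, batch, X, C):
    """Location/scale harmonization with preserved effects X and removed covariates C,
    WITHOUT empirical Bayes shrinkage (per-batch estimates are used directly).

    Y: (n_subjects, n_features) data, e.g. vectorized connectivity links.
    batch: (n_subjects,) institution labels.
    X: (n_subjects, p) effects to preserve (no intercept column), e.g. ASD/TDC group.
    C: (n_subjects, q) covariates to remove, e.g. age and sex.
    """
    Y = np.asarray(Y, dtype=float)
    levels, idx = np.unique(batch, return_inverse=True)
    n_b, p = len(levels), X.shape[1]
    B = np.eye(n_b)[idx]                                  # one-hot institution indicators
    coef, *_ = np.linalg.lstsq(np.hstack([B, X, C]), Y, rcond=None)
    gamma_raw, beta, eta = coef[:n_b], coef[n_b:n_b + p], coef[n_b + p:]
    weights = B.sum(axis=0) / len(Y)
    alpha = weights @ gamma_raw                           # grand intercept alpha_k
    keep = alpha + X @ beta                               # alpha_k + X beta_k, added back at the end
    resid = Y - B @ gamma_raw - X @ beta - C @ eta
    sigma = np.sqrt((resid**2).mean(axis=0))              # pooled residual SD per feature
    Z = (Y - keep - C @ eta) / sigma                      # standardized; batch effects still present
    Y_adj = np.empty_like(Y)
    for b in range(n_b):
        m = idx == b
        gamma_hat = Z[m].mean(axis=0)                     # location effect of institution b
        delta_hat = Z[m].std(axis=0, ddof=1)              # scale effect of institution b
        Y_adj[m] = sigma * (Z[m] - gamma_hat) / delta_hat + keep[m]
    return Y_adj
```

With the EB step, gamma_hat and delta_hat would be shrunk towards institution-wide priors before the final line of the loop; the structure of the adjustment is otherwise the same.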

Consensus clustering

ASD subtyping was investigated using a consensus clustering approach applied to brain connectivity matrices (35, 36). First, for each brain region i, we defined a distance matrix between each pair of ASD subjects u and v as:

$$D^i_{uv} = \sqrt{2\left(1 - r^i_{uv}\right)} \tag{7}$$

where rⁱ_uv is the Pearson correlation between the connectivity patterns of subjects u and v for region i. The connectivity pattern is a vector whose dimension equals the number of regions, each component being the connectivity between the given region and another region. Then, each distance matrix Dⁱ is partitioned into k groups of subjects using a k-medoids clustering method (49), and the resulting clustering is encoded in an adjacency matrix whose entries are 1 if a pair of subjects belongs to the same cluster and 0 otherwise. Subsequently, an N × N consensus matrix C is obtained by averaging these adjacency matrices across nodes. Hence, each entry C_uv is the number of partitions in which subjects u and v are assigned to the same group divided by the total number of partitions. Finally, the consensus matrix is averaged over k = 2, …, 20, so that information about the underlying structure at different resolutions is combined in the final consensus matrix; for more details, see (35).
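The construction of the consensus matrix can be sketched in Python with NumPy. This sketch assumes the correlation distance D = sqrt(2(1 − r)) between subjects' connectivity profiles and uses a simple alternating (Voronoi-iteration) k-medoids written for illustration; the original work used the k-medoids method of (49) and the pipeline of (35), so the function names and details here are hypothetical.

```python
import numpy as np


def kmedoids(D, k, rng, max_iter=100):
    """Simple alternating k-medoids on a precomputed distance matrix D."""
    n = D.shape[0]
    medoids = rng.choice(n, size=k, replace=False)
    for _ in range(max_iter):
        labels = np.argmin(D[:, medoids], axis=1)
        new = medoids.copy()
        for c in range(k):
            members = np.flatnonzero(labels == c)
            if members.size:
                # member minimizing the total distance to the rest of its cluster
                new[c] = members[np.argmin(D[np.ix_(members, members)].sum(axis=1))]
        if np.array_equal(new, medoids):
            break
        medoids = new
    return np.argmin(D[:, medoids], axis=1)


def consensus_matrix(conn, ks=range(2, 21), seed=0):
    """conn: (n_subjects, n_regions, n_regions) connectivity with a finite (e.g. zeroed) diagonal."""
    rng = np.random.default_rng(seed)
    n_sub, n_reg, _ = conn.shape
    total = np.zeros((n_sub, n_sub))
    n_partitions = 0
    for i in range(n_reg):
        r = np.corrcoef(conn[:, i, :])                     # similarity of region-i profiles across subjects
        D = np.sqrt(np.clip(2.0 * (1.0 - r), 0.0, None))   # correlation distance
        for k in ks:
            labels = kmedoids(D, k, rng)
            total += labels[:, None] == labels[None, :]    # co-assignment (adjacency) matrix
            n_partitions += 1
    return total / n_partitions                            # fraction of partitions co-assigning u and v
```

Because every region contributes the same number of partitions, averaging over regions and then over k is equivalent to the single average taken here.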

ASD subtyping

The consensus matrix C was further used to compute a modularity matrix as follows:

$$B = C - \gamma P \tag{8}$$

where P is the expected co-assignment matrix, which is uniform because it is derived from a null ensemble obtained by permuting the cluster labels 1000 times, and γ is the resolution parameter, which we set to 1 as in the Newman–Girvan definition of modularity (50). The modularity matrix B encodes all the information about the interactions between subjects at different levels. One could therefore define any distance on this matrix and then use a k-medoids analysis to assess clustering. Instead, we fed B into a generalized Louvain method for community detection on modularity matrices (https://github.com/GenLouvain/GenLouvain), which yields an optimal partition without specifying the number of clusters in advance.
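The null model and the modularity matrix can be sketched as follows: P is estimated by permuting the labels of each partition that entered C, B = C − γP, and a candidate partition is scored by summing B over pairs of subjects placed in the same community, which (up to normalization) is the quantity a Louvain-type algorithm such as GenLouvain maximizes. The Louvain optimization itself is not reproduced, and the function names are illustrative.

```python
import numpy as np


def null_coassignment(partitions, n_perm=1000, seed=0):
    """Expected co-assignment P under random relabelling of each partition.

    partitions: (n_partitions, n_subjects) integer cluster labels (the partitions behind C).
    """
    rng = np.random.default_rng(seed)
    n_part, n = partitions.shape
    P = np.zeros((n, n))
    for labels in partitions:
        for _ in range(n_perm):
            shuffled = rng.permutation(labels)
            P += shuffled[:, None] == shuffled[None, :]
    return P / (n_part * n_perm)


def modularity_matrix(C, P, gamma=1.0):
    """B = C - gamma * P."""
    return C - gamma * P


def partition_quality(B, labels):
    """Sum of B over distinct pairs assigned to the same community (unnormalized modularity)."""
    same = labels[:, None] == labels[None, :]
    np.fill_diagonal(same, False)
    return B[same].sum()
```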

Separability of the subtypes with respect to TDC

To assess the separability of brain connectivity profiles between each ASD subtype and the TDC group, we used Multivariate Distance Matrix Regression (MDMR), as described previously (51, 52). For each brain region i, the distance matrix Dⁱ defined in Eq. 7 was computed and then, for each predictor column x in a design matrix X, MDMR yields a pseudo-F statistic F_xⁱ:

$$F_x^i = \frac{\operatorname{tr}\left(H_x G^i\right)/m_x}{\operatorname{tr}\left[(I - H)\,G^i\right]/(N - m - 1)} \tag{9}$$

where tr denotes the trace operator, N the number of observations, m the number of predictors in X (excluding the intercept), m_x the degrees of freedom of predictor x, H_x the isolated effect of predictor x in the usual hat matrix H = X(XᵀX)⁻¹Xᵀ, and Gⁱ the so-called Gower matrix built from the distance matrix Dⁱ (53). Like the F-statistic in a standard ANOVA, Eq. 9 compares the variance explained by a predictor with the unexplained variance. However, the pseudo-F statistic does not follow Fisher’s F-distribution under the null hypothesis; therefore, the corresponding p-values must be computed by permutation (52). Finally, to estimate how much variability can be attributed to each predictor, a pseudo-R² effect size can be computed by dividing the numerator of Eq. 9, without its degrees of freedom, by the total sum of squared pairwise distances in the Gower matrix, i.e., R²_x = tr(H_x Gⁱ)/tr(Gⁱ). As in standard linear models, this effect size quantifies the proportion of the total sum of squares that is explained by the predictor.
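A compact NumPy sketch of Eq. 9 and the pseudo-R² for a single region is given below. It assumes the standard Gower centring G = J(−D²/2)J with J = I − 11ᵀ/N, a single-column predictor of interest, and a permutation scheme that shuffles subjects jointly in the rows and columns of G. Names are illustrative, and published MDMR implementations (51, 52) may use different permutation schemes when covariates are present.

```python
import numpy as np


def gower(D):
    """Gower-centred matrix G = J A J with A = -D**2 / 2 and J = I - 11^T / N."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    return J @ (-0.5 * D**2) @ J


def hat(X):
    """Projection (hat) matrix onto the column space of X."""
    return X @ np.linalg.pinv(X)


def mdmr(D, X, col, n_perm=999, seed=0):
    """Pseudo-F, pseudo-R^2 and permutation p-value for predictor `col` of X.

    D: (N, N) distance matrix for one region; X: (N, m) predictors without intercept.
    """
    N, m = X.shape
    G = gower(D)
    X1 = np.column_stack([np.ones(N), X])             # add the intercept
    H = hat(X1)
    H_x = H - hat(np.delete(X1, col + 1, axis=1))     # isolated effect of predictor `col`
    I_H = np.eye(N) - H
    m_x = 1                                           # one column -> one degree of freedom

    def pseudo_f(Gm):
        # tr(A @ B) == sum(A * B) for symmetric A and B
        return (np.sum(H_x * Gm) / m_x) / (np.sum(I_H * Gm) / (N - m - 1))

    F = pseudo_f(G)
    r2 = np.sum(H_x * G) / np.trace(G)
    rng = np.random.default_rng(seed)
    null = np.empty(n_perm)
    for b in range(n_perm):
        perm = rng.permutation(N)
        null[b] = pseudo_f(G[np.ix_(perm, perm)])     # permute subjects in the distance matrix
    p = (1 + np.sum(null >= F)) / (n_perm + 1)
    return F, r2, p
```

Note that tr(G) equals the sum of squared pairwise distances over distinct pairs divided by N, which is the denominator of the pseudo-R².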

Graph topological metrics

The connectivity matrices of the ASD subtypes were compared using four metrics from graph theory: transitivity (segregation), global efficiency (integration), assortativity (resilience) and small-worldness (communication). Transitivity is defined as the ratio of triangles to connected triplets in the network, global efficiency as the average inverse shortest path length, and assortativity as the correlation between the degrees of the nodes at the two ends of each link (54). Small-world networks are simultaneously highly integrated and segregated (55), and they can be detected when λ ∼ 1 and γ > 1 (56), with γ = C/C_rand and λ = L/L_rand the ratios of the average clustering coefficient and of the average shortest path length, respectively, of the observed network to those of randomized networks (57). In our case, randomized networks were obtained by rewiring each node five times while preserving the number of nodes, the number of edges and the degree distribution of the original network. Taking these considerations into account, we evaluated the small-worldness of each connectivity matrix using the ratio σ = γ/λ, a positive quantity greater than 1 for small-world networks: the higher the value, the more small-world the network.
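These metrics can be computed with NetworkX. The sketch below assumes a binary, undirected and connected graph (nx.average_shortest_path_length raises an error otherwise) and uses nx.random_reference with niter=5 as a stand-in for the degree-preserving rewiring described above; the study may have used a different toolbox, so this is an illustration rather than the authors' code.

```python
import networkx as nx
import numpy as np


def graph_metrics(A, n_rand=10, niter=5, seed=0):
    """Topological metrics of a binary, undirected, connected adjacency matrix A (zero diagonal)."""
    G = nx.from_numpy_array(A)
    metrics = {
        "transitivity": nx.transitivity(G),                       # segregation
        "global_efficiency": nx.global_efficiency(G),             # integration
        "assortativity": nx.degree_assortativity_coefficient(G),  # resilience
    }
    C = nx.average_clustering(G)
    L = nx.average_shortest_path_length(G)   # raises if G is disconnected
    C_rand, L_rand = [], []
    for r in range(n_rand):
        # degree-preserving rewiring (about `niter` swaps per edge) that keeps G connected
        R = nx.random_reference(G, niter=niter, connectivity=True, seed=seed + r)
        C_rand.append(nx.average_clustering(R))
        L_rand.append(nx.average_shortest_path_length(R))
    gamma = C / np.mean(C_rand)
    lam = L / np.mean(L_rand)
    metrics["small_worldness"] = gamma / lam  # sigma = gamma / lambda
    return metrics
```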

Following a common practice to remove spurious correlations, i.e., weak correlations that could mask other, more important nodes in the network, we binarized the harmonized connectivity matrices by applying a continuous range of proportional thresholds, which keep the same density of the strongest (positive) connections across all individual networks (58). Specifically, the density of network links was varied continuously between 5% and 30%. Finally, we evaluated the topological differences between pairs of groups (ASD subtypes and TDC) at a density of 10% using ordinary least squares (OLS) regression, controlling for age, sex, VIQ, FIQ, PIQ and overall functional connectivity. This last confounder was included to avoid overestimating topological differences in network organization caused by the proportional thresholding of the connectivity matrices.
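Proportional thresholding can be sketched as follows: for a target density, keep the strongest positive links of the upper triangle and symmetrize. The function name is hypothetical, and the rounding of the number of retained links is handled in the simplest possible way.

```python
import numpy as np


def proportional_threshold(W, density):
    """Binary graph keeping the strongest positive links so that edge density is `density`.

    W: symmetric weighted connectivity matrix; density: fraction of possible links, e.g. 0.10.
    """
    n = W.shape[0]
    iu = np.triu_indices(n, k=1)
    w = W[iu]
    n_keep = int(round(density * w.size))
    strongest = np.argsort(w)[::-1][:n_keep]   # indices of the largest weights
    strongest = strongest[w[strongest] > 0]    # never keep non-positive links
    A = np.zeros((n, n))
    A[iu[0][strongest], iu[1][strongest]] = 1.0
    return A + A.T
```

Sweeping `density` over a grid between 0.05 and 0.30 gives the range of thresholds described above, while the group comparison uses `density=0.10`.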

Neurobiological signature

To identify the connections most specific to each subtype, which can therefore provide a neurobiological signature for each subtype, we used a machine learning approach: a Support Vector Machine (SVM) with a linear kernel combined with Recursive Feature Elimination (RFE) (59). As the classification problem is multiclass, given that there are three subtypes, we adopted the so-called one-versus-rest strategy, fitting a single classifier per subtype with the observations in that subtype as the positive class and all remaining observations as the negative class. RFE is a wrapper method for feature selection: the weights are first computed for the full set of features (here, the links in the connectivity matrices), the feature with the lowest absolute weight is then eliminated, and this process is repeated until the desired number of features is reached. Since this number is in principle arbitrary, we embedded the process in a 5-fold cross-validation scheme, computing the average out-of-sample balanced accuracy at each elimination step. In principle, the number of links with the highest cross-validation performance would be taken as valid; however, to reduce the complexity of each resulting circuit while maintaining competitive performance, we applied the “1-Standard Error Rule” for feature selection, which chooses the smallest number of features whose accuracy lies within one standard error (SE) of the highest accuracy achieved (60). Finally, the specific circuit formed by the resulting links was taken as the neurobiological signature of each subtype.

Transcriptomics

To build brain transcription maps, we took advantage of the publicly available data in the AHBA (44). The dataset consists of MRI images and a total of 58,692 microarray-based transcription profiles of about 20,945 genes, sampled from 3,702 different sites across the brains of six human donors. To pool all the transcription data into a single brain template, we followed a procedure similar to that employed elsewhere (61). First, we re-annotated the probes to genes using the re-annotator toolkit (62). Second, we removed probes with insufficient signal based on the sampling proportion (SP), calculated for each brain as the number of samples with a signal above the background noise divided by the total number of samples. Probes with an SP lower than 70% in any of the six brains were removed from the analysis, ensuring sufficient sampling power in all brains. Next, for each gene we chose the probe with the maximum differential stability (DS), which quantifies the reproducibility of gene expression patterns across brain regions and individuals. DS was calculated using spatial correlations similar to those employed previously (63), but using the Desikan-Killiany atlas, whose regions were eroded with a Gaussian kernel with a full width at half maximum (FWHM) of 2 mm to eliminate false-positive sampling sites (i.e., those that belong not to the region of interest but to a neighbouring one). To remove inter-subject differences, the transcription values for each gene and brain were transformed into Z-scores and pooled across the six brains, yielding a single map based on the MNI coordinates provided in the dataset.
Finally, to eliminate the spatial dependencies of the transcription values at the sampling sites (i.e., to correct for the fact that nearby sites have more strongly correlated transcription), we obtained a single transcription value for each region of the Desikan-Killiany atlas as the median of all the values belonging to that region.
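The probe filtering, probe selection and regional summarization can be sketched as follows, assuming expression has already been averaged within atlas regions for each donor brain when computing DS. DS is computed here as the mean pairwise Pearson correlation of a probe's regional profile across brains, a common definition; the helper names are hypothetical, and the re-annotation and erosion steps are not reproduced.

```python
import numpy as np


def sampling_proportion(signal, background):
    """Fraction of a probe's samples (in one brain) whose signal exceeds background noise."""
    return np.mean(np.asarray(signal) > background)


def differential_stability(regional_expr):
    """Mean pairwise Pearson correlation of a probe's regional profile across brains.

    regional_expr: (n_brains, n_regions) expression averaged within atlas regions.
    """
    r = np.corrcoef(regional_expr)
    iu = np.triu_indices(r.shape[0], k=1)
    return r[iu].mean()


def select_probe(probes, sp, min_sp=0.7):
    """Index of the probe (among those of one gene) with maximum DS, after dropping probes
    whose sampling proportion is below `min_sp` in any brain; None if no probe survives.

    probes: list of (n_brains, n_regions) arrays; sp: (n_probes, n_brains).
    """
    passing = [j for j in range(len(probes)) if np.all(sp[j] >= min_sp)]
    if not passing:
        return None
    return max(passing, key=lambda j: differential_stability(probes[j]))


def regional_median(values, region_of_sample, n_regions):
    """Median of the (Z-scored, pooled) sample values in each atlas region; NaN if empty."""
    out = np.full(n_regions, np.nan)
    for r in range(n_regions):
        v = values[region_of_sample == r]
        if v.size:
            out[r] = np.median(v)
    return out
```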

Association between subtypes and transcriptomics

We first computed, for each subtype, the statistical spatial maps that were later associated with the transcriptomic data. Specifically, we computed a distance matrix per region of interest (ROI) using the whole-brain connectivity patterns of the harmonized functional matrices. We then performed an MDMR for each of these distance matrices to assess the differences between each subtype and the TDC group, controlling for age, sex and IQ scores (FIQ, VIQ and PIQ). As a result, we obtained a spatial map of F-statistics per subtype. For each gene, we then calculated a similarity index as the Pearson correlation coefficient between the F-values and the transcription values, both represented as vectors whose dimension equals the number of regions in the atlas. This procedure was carried out separately for each subtype.
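Computing the similarity index for all genes at once amounts to a vectorized Pearson correlation between the F-map and each column of a region-by-gene expression matrix. A minimal sketch with hypothetical names:

```python
import numpy as np


def similarity_index(f_map, expression):
    """Pearson correlation between an F-statistic map and every gene's expression map.

    f_map: (n_regions,) MDMR pseudo-F per region for one subtype.
    expression: (n_regions, n_genes) regional expression values.
    """
    f = (f_map - f_map.mean()) / f_map.std()
    e = (expression - expression.mean(axis=0)) / expression.std(axis=0)
    return f @ e / len(f_map)      # (n_genes,) correlation coefficients
```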

Identification of relevant genes and gene ontology

Relevant genes were identified by combining the results of two different strategies. The first was a data-driven strategy (DDS), an exploratory approach to identify the genes whose transcription correlates most strongly with the F-statistic map of each subtype. In particular, we chose genes with similarity-index values corresponding to |Z| > 2, i.e., outliers of the correlation distribution in both the negative and positive tails. Genes in the positive tail (Z > 2) were designated pos-corr genes (P), whereas those in the negative tail (Z < −2) were designated neg-corr genes (N). This threshold is in principle an arbitrary choice that guarantees the selection of outlier genes in both tails; for a Gaussian distribution, it corresponds to selecting genes outside the central ~95% interval, i.e., below the 2.3rd or above the 97.7th percentile. Therefore, P genes tend to be more strongly expressed in the brain regions with a larger F, that is, in the regions where connectivity differs most from the corresponding TDC, whereas N genes tend to be more weakly expressed in those regions. Subsequently, for each subtype and tail, we performed a gene ontology (GO) biological process (64) and Reactome pathway (65) overrepresentation test using PANTHER v15.0 (http://pantherdb.org/), with the entire Homo sapiens genome as the reference list and a Fisher’s exact test with Bonferroni correction (p < 0.05). To enhance interpretability, we only report enrichments of at least 2-fold.
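The DDS tail selection reduces to z-scoring the similarity indices across genes and thresholding at |Z| > 2. A sketch with hypothetical names; for a standard normal variable, P(Z > 2) ≈ 0.023, which is where the 2.3rd and 97.7th percentiles come from.

```python
import numpy as np


def tail_genes(similarity, genes, z_thr=2.0):
    """Split genes into pos-corr (P) and neg-corr (N) lists using |Z| > z_thr,
    with Z computed across all genes' similarity indices."""
    similarity = np.asarray(similarity, dtype=float)
    z = (similarity - similarity.mean()) / similarity.std()
    pos = [g for g, zz in zip(genes, z) if zz > z_thr]
    neg = [g for g, zz in zip(genes, z) if zz < -z_thr]
    return pos, neg
```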

The second strategy was a hypothesis-driven strategy (HDS), a more confirmatory analysis performed on genes from https://gene.sfari.org/database/human-gene/ belonging to Category 1, i.e., non-syndromic genes with high confidence of being implicated in ASD owing to the presence of at least three de novo likely-gene-disrupting mutations reported in the literature. This list includes 192 genes and coincides with that published elsewhere (66). For the genes that were also P or N genes, statistical significance was assessed by surrogate-data testing: the BrainSMASH tool (67) was used to build null distributions by generating 10,000 random maps with the same spatial autocorrelation as the F-statistic map.
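Given a set of spatially autocorrelated surrogate F-maps, the surrogate test for one gene can be sketched as below. We assume the surrogates were generated beforehand, for example with BrainSMASH's `Base` class (constructing `Base(x=f_map, D=distance_matrix)` and calling the instance with `n=10000`, as we understand its documented usage); the two-sided p-value with a +1 correction is our choice, not necessarily the study's.

```python
import numpy as np


def surrogate_pvalue(f_map, gene_map, surrogates):
    """Two-sided p-value of corr(f_map, gene_map) against surrogate F-maps.

    surrogates: (n_surrogates, n_regions) maps sharing the spatial autocorrelation of f_map.
    """
    r_obs = np.corrcoef(f_map, gene_map)[0, 1]
    s = surrogates - surrogates.mean(axis=1, keepdims=True)
    g = gene_map - gene_map.mean()
    r_null = (s @ g) / (np.linalg.norm(s, axis=1) * np.linalg.norm(g))   # one r per surrogate
    return (1 + np.sum(np.abs(r_null) >= abs(r_obs))) / (1 + len(surrogates))
```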

Results

A total of 880 ASD patients and 1010 TDCs participated in this study, whose workflow is shown in figure 1. In summary, our analysis comprised the following steps: 1. Data preparation, pre-processing and harmonization to eliminate the variability associated with differences in scanning at each institution and with the participants’ age and sex; 2. ASD subtyping by applying consensus clustering to brain connectivity matrices; 3. Characterization of connectivity as hyperconnectivity, hypoconnectivity or a mixture of the two; 4. Association between transcriptomics and the F-statistic maps, which capture the regional separability between each ASD subtype and the TDCs. The results of each of these steps are described in detail below.

Figure 1. General workflow.

Preparation and pre-processing of the ABIDE dataset. Harmonization using the Combat algorithm to remove institution, age and sex effects. ASD subtyping through a consensus clustering approach applied to brain connectivity matrices. Characterization of subtypes from their neurobiological signature and neurogenetic profiles using brain maps of the entire transcriptome.

Data harmonization

We ran several exploratory tests to assess differences in our dataset related to scanning institution, sex, age and diagnosis (ASD vs TDC). From a demographic point of view, no statistical differences (two-sided t-test, p = 0.82) were found between the age distributions of the TDC (μ = 16.03, σ = 8.62) and ASD groups (μ = 16.12, σ = 9.17). By contrast, there was a significant difference in the male:female ratio between these two groups (Fisher’s exact test, p = 1.29 × 10⁻⁷). At the connectivity level, we tested, across individuals, the association of each link in the functional connectivity matrix with age and found that 2236 edges had a significant association after false discovery rate (FDR) correction, yet this may be strongly influenced by the large size of our dataset, as the effect sizes were generally small (mean r = 0.07, σ = 0.05, max r = 0.319). Similarly, a t-test indicated that 565 connectivity links differed significantly between males and females, again with small effect sizes (mean Cohen’s d = 0.09, σ = 0.06, max d = 0.32). To assess differences in the connectivity matrices among scanning institutions, we regressed out age, sex and diagnosis through ordinary least squares (OLS) and then applied Levene’s test to each link to assess whether its variance was equal across institutions. There were 1498 links with significantly different variances across institutions after FDR correction. As explained in the Methods, we ran an extended version of Combat to remove these sources of variability and the confounding effects that might affect the subsequent subtyping of the ASD group, while retaining the between-group variability in our data.
Except for our group variable, link connectivity values across institutions had very homogeneous and comparable distributions after data harmonization (figure S1), and the effect sizes decreased dramatically (figure S2), such that there was no association between the links and any of the confounding variables (institution, age, sex).
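The institution check can be sketched as follows: residualize each link on age, sex and diagnosis with OLS, apply Levene's test across institutions, and control the FDR with the Benjamini–Hochberg procedure. The BH step is written out for self-containment, and since the Levene variant used in the study (mean- or median-centred) is not stated, the SciPy default here is an assumption; function names are hypothetical.

```python
import numpy as np
from scipy import stats


def benjamini_hochberg(p, alpha=0.05):
    """Boolean mask of hypotheses rejected at FDR level alpha (Benjamini-Hochberg)."""
    p = np.asarray(p)
    m = p.size
    order = np.argsort(p)
    below = p[order] <= alpha * np.arange(1, m + 1) / m
    reject = np.zeros(m, dtype=bool)
    if below.any():
        reject[order[: np.flatnonzero(below).max() + 1]] = True
    return reject


def site_variance_links(Y, site, covariates, alpha=0.05):
    """Residualize each link on covariates, then test equal variances across institutions."""
    design = np.column_stack([np.ones(len(Y)), covariates])
    beta, *_ = np.linalg.lstsq(design, Y, rcond=None)
    resid = Y - design @ beta
    groups = [resid[site == s] for s in np.unique(site)]
    # SciPy's default center="median" is the Brown-Forsythe variant; center="mean" is classic Levene
    p = np.array([stats.levene(*[g[:, k] for g in groups]).pvalue for k in range(Y.shape[1])])
    return benjamini_hochberg(p, alpha)
```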

Subtyping

The harmonized data were used to find subtypes (or clusters) in the ASD group. We applied a consensus clustering approach to the brain connectivity matrices to combine information about the connectivity patterns of all brain regions at different resolutions. After applying a modularity-maximization algorithm to the resulting modularity matrix, three subtypes were found (figure S3): the first comprised 121 subjects, a prevalence of 13.75% with respect to the entire group; the second contained 278 subjects (31.59%); and the third contained 464 subjects (52.73%). The remaining 17 subjects were considered unclassified, since the community detection algorithm could not assign them to any specific subtype. None of the resulting subtypes differed in age (one-way ANOVA, p = 0.37) or sex (χ² test, p = 0.45).

Behavioural assessment

We attempted to characterize each ASD subtype based on the different behavioural scores to determine whether the subtypes had a behavioural signature. Neither the tests nor their sub-scores differentiated significantly between the three subtypes (Table 1). However, as expected given the ASD diagnosis, the three subtypes had significantly lower scores than the TDC group (figure S4).


Table 1: Discrimination of the ASD subtypes based on the behavioural scores

The average behavioural scores of the individuals within each ASD subtype and the uncorrected p-values from a one-way ANOVA. None of the scores could differentiate between the three subtypes.

Connectivity class

We next asked whether the type of connectivity could differentiate each subtype from the TDC group and hence performed an OLS regression controlling for age, sex and the three intelligence quotient measures (VIQ, FIQ and PIQ). First, we assessed differences in the overall connectivity per subject, defined as the average of the positive correlation coefficients in the upper triangle of the harmonized connectivity matrix (figure 2a). Interestingly, the contrast between the TDC subjects (baseline) and the subtype 2 individuals was significantly positive (β = 0.09, p_bonf < 0.01), meaning that overall connectivity tended to be greater in subtype 2 than in the TDC group, indicative of overall hyperconnectivity. By contrast, the opposite was true for subtype 3 (β = −0.05, p_bonf < 0.01), implying overall hypoconnectivity in this subtype. Finally, we did not find any significant difference between subtype 1 and the TDC group (β = 0.02, p_bonf = 0.21), indicating a more balanced or mixed situation in terms of overall connectivity. Using the absolute correlation values of the connectivity matrix, or the median or trimmed mean as the overall connectivity metric, preserved all these findings: hypoconnectivity for subtype 3, hyperconnectivity for subtype 2 and a combination of the two classes for subtype 1.
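The overall connectivity measure and the subtype-versus-TDC contrasts can be sketched as below, assuming group labels coded 0 for TDC and 1 to 3 for the subtypes. The sketch returns only OLS point estimates; p-values and the Bonferroni correction would come from a full regression package such as statsmodels. Function names are hypothetical.

```python
import numpy as np


def overall_connectivity(W):
    """Mean of the positive entries in the upper triangle of a connectivity matrix."""
    vals = W[np.triu_indices_from(W, k=1)]
    return vals[vals > 0].mean()


def subtype_contrasts(y, group, covariates, subtypes=(1, 2, 3)):
    """OLS of y on subtype dummies (TDC, coded 0, is the baseline) plus covariates.
    Returns the coefficient of each subtype relative to TDC."""
    dummies = np.column_stack([(group == s).astype(float) for s in subtypes])
    design = np.column_stack([np.ones(len(y)), dummies, covariates])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return dict(zip(subtypes, beta[1:1 + len(subtypes)]))
```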

Figure 2. The connectivity class and neurobiological signature of each subtype. (a) Raincloud plots of the individual average connectivity values for the TDC group (blue) and the three ASD subtypes (pink, orange and green). The mean value of the TDC group is marked as the baseline by a dashed red line. Values above the baseline correspond to hyperconnectivity and those below it to hypoconnectivity. Note that subtype 2 shows hyperconnectivity, subtype 3 hypoconnectivity and subtype 1 a mixture of the two, the latter with a histogram and mean value similar to those of the TDC group. (b) Differences between groups in several topological measures (assortativity, transitivity, global efficiency and small-worldness), calculated after thresholding the individual connectivity matrices to retain the strongest 10% of positive links. Statistical differences were assessed using OLS, controlling for age, sex, IQ scores (FIQ, VIQ and PIQ) and average connectivity, and correcting for multiple testing with the Bonferroni-Holm procedure: *p<0.05, **p<0.01, ***p<0.001. Comparing panels a and b shows that the hyperconnectivity of subtype 2, like the hypoconnectivity of subtype 3, is not the result of linearly scaling the weight values by a constant, since significant differences appear in some of these topological metrics; the hyper- and hypoconnectivity profiles are therefore much more complex, reflecting non-linear effects whose mechanisms are unknown. (c) t-SNE visualization of ASD individuals using a combination of the most discriminating links per ASD subtype obtained from Recursive Feature Elimination (RFE) with a linear Support Vector Machine (SVM). The plot shows reasonably good clustering. (d) Brown bar plots on the left: for each ASD subtype, the total number of selected links per bilateralized region after recursive feature elimination. Glass-brain plots on the right: for the two regions with the largest number of selected links, the absolute link-weight maps are depicted on the brain.
(e) Separability of each ASD subtype with respect to the TDC group, shown by the distribution across regions of the pseudo-R² statistic estimated with Multivariate Distance Matrix Regression (MDMR).

Beyond the overall connectivity analyses, we asked whether a characteristic connectivity class could also be found for each subtype at the link level. To this end, we ran the same regression model with each individual link of the connectivity matrices as the response variable (figure S5). Compared with the TDC group, most links had higher values in subtype 2 individuals, indicative of link hyperconnectivity, and lower values in subtype 3 patients, reflecting hypoconnectivity. For subtype 1, the distribution of link differences was more centred on zero, indicating a mix of hyper- and hypoconnectivity at the link level, similar to that observed for overall connectivity.