Abstract
Objective To develop a new fMRI network inference method, BrainNET, that uses an efficient machine learning algorithm to quantify the contribution of each region of interest (ROI) in the brain to a specific target ROI and thereby estimate the network.
Methods BrainNET uses Extremely Randomized Trees (ERT) to estimate network topology from fMRI data and is modified to generate an adjacency matrix representing brain network topology without reliance on arbitrary thresholds. Open source simulated fMRI data of fifty subjects in each of twenty-eight simulations under various confounding conditions with known ground truth were used to validate the method. Performance was compared with Pearson correlation. The real-world performance was then evaluated in a publicly available Attention-deficit/hyperactivity disorder (ADHD) dataset including 135 Typically Developing Children (mean age: 12.00, males: 83), 75 ADHD Inattentive (mean age: 11.46, males: 56) and 93 ADHD Combined (mean age: 11.86, males: 77) subjects. Network topologies were inferred using BrainNET and Pearson correlation. Graph metrics were extracted to determine differences between ADHD groups. An extension to BrainNET (B-Corr) was also developed, in which the BrainNET adjacency matrix is combined with the Pearson correlation output to remove false positives.
Results BrainNET demonstrated excellent performance across all simulations and varying confounders. It achieved significantly higher accuracy and sensitivity than Pearson correlation (p < 0.05). In the ADHD dataset, BrainNET was able to identify significant changes (p < 0.05) in graph metrics between groups. No significant changes in graph metrics between ADHD groups were identified using Pearson correlation. The B-Corr method provided similar results to BrainNET.
I. INTRODUCTION
The brain is a complex interconnected network that balances segregation and specialization of function with strong integration between regions, resulting in complex and precisely coordinated dynamics across multiple spatiotemporal scales [1]. Connectomics and graph theory offer powerful tools for mapping, tracking, and predicting patterns of disease in brain disorders by modelling brain function as complex networks [2]. Studying brain network organization provides insight into global network connectivity abnormalities in neurological and psychiatric disorders [3]. Several studies suggest that pathology accumulates in highly connected hub areas of the brain [4, 5] and that cognitive sequelae are closely related to the connection topology of the affected regions [6]. An understanding of network topology may allow prediction of expected levels of impairment, determination of recovery following an insult, and selection of individually tailored interventions for maximizing therapeutic success [7]. A large number of network inference methods are being used to model brain network topology, with varying degrees of validation. A recent study [8] evaluated some of the most common methods, including correlation, partial correlation, and Bayes net, for inferring network topology from simulated resting state functional magnetic resonance imaging (fMRI) data with known ground truth and found that performance can vary widely under different conditions.
Development of statistical techniques for valid inferences on disease-specific group differences in brain network topology is an active area of research. Machine learning methods have been used in neuroimaging for disease diagnosis and anatomic segmentation [9]. Very few studies have attempted to apply machine learning methods to infer brain networks [9-12].
Recent work on machine learning approaches for inference of Gene Regulatory Networks (GRN) has demonstrated good performance [13-15]. Interestingly, these same approaches can be used to infer brain networks. In this study, we describe a new network inference method called BrainNET, inspired by machine learning methods used to infer GRN [16].
Validation of BrainNET was performed using fMRI simulations with known ground truth, as well as real-world ADHD fMRI datasets. Publicly available simulated resting state fMRI data [8] were used to validate BrainNET’s ability to infer networks. The real-world performance of BrainNET was then evaluated in a publicly available data set of Attention-deficit/hyperactivity disorder (ADHD). ADHD is one of the most common neurodevelopmental disorders in children, with significant socioeconomic and psychological effects [17, 18], and is very difficult to diagnose [19]. ADHD involves widespread but often subtle alterations in multiple brain regions affecting brain function [19-25]. Neuro Bureau, a collaborative neuroscience forum, has released fully processed open source fMRI data, “ADHD-200 preprocessed”, from several sites [26, 27], providing an ideal dataset to test the BrainNET model and compare its performance with standard Pearson correlation.
II. MATERIALS AND METHODS
A. Datasets
MRI Simulation Data
Open source rs-fMRI simulation data were used to validate the BrainNET model [8]. The data were simulated based upon the dynamic causal modelling fMRI forward model, which uses the non-linear balloon model for vascular dynamics in combination with a neural network model [8]. The open source dataset has 28 simulations, each including simulated data for 50 subjects with a varying number of nodes and several confounders (e.g., shared input between the nodes, varying fMRI session lengths, noise, cyclic connections and hemodynamic lag variability changes). Additional details on the simulations can be found in the original study [8].
ADHD data
Preprocessed rs-fMRI data were obtained from the ADHD-200 database (http://fcon1000.projects.nitrc.org/indi/adhd200/). Seven different sites contributed a total of 776 rs-fMRI data acquisitions to the ADHD-200 database. The data were preprocessed using the Athena pipeline and provided in 3D NIfTI format. Additional information on the Athena pipeline and the “ADHD 200 preprocessed” data is given by Bellec et al. [26].
In our study, subjects identified with ‘No Naïve medication’ status or with rs-fMRI data of questionable quality were excluded. The remaining subjects were age-matched between the groups, resulting in 135 Typically Developing Children (TDC) (mean age: 12.00, males: 83), 75 ADHD Inattentive (ADHD-I) (mean age: 11.46, males: 56) and 93 ADHD Combined (ADHD-C) (mean age: 11.86, males: 77) subjects. Mean time series from the 116 ROIs in the AAL atlas [28] were extracted using the Nilearn package [29].
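As a rough illustration of what extracting a mean time series per ROI involves, the sketch below averages voxel signals inside each atlas label using plain NumPy. It is a stand-in for the idea, not the Nilearn call used in the study; the function name `roi_mean_time_series` and the array layout (a 4D BOLD array plus a 3D integer label volume with 0 as background) are assumptions made for this example.

```python
import numpy as np

def roi_mean_time_series(bold, labels):
    """Average the voxel time series inside each labelled ROI.

    bold   : array of shape (X, Y, Z, T), a 4D BOLD volume (assumed layout)
    labels : integer array of shape (X, Y, Z); 0 marks background
    Returns (roi_ids, series), where series has shape (T, n_rois).
    """
    roi_ids = np.unique(labels)
    roi_ids = roi_ids[roi_ids != 0]  # drop the background label
    # bold[labels == r] has shape (n_voxels_in_roi, T); average over voxels
    series = np.column_stack(
        [bold[labels == r].mean(axis=0) for r in roi_ids]
    )
    return roi_ids, series
```

The resulting (time points x ROIs) array is the kind of input assumed by the other sketches in this section.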
B. BrainNET Model Development
The objective of BrainNET is to infer the connectivity from fMRI data as a network with N different ROIs in the brain (i.e., nodes), where edges between the nodes represent the true functional connectivity between nodes. The measurements at all nodes are collected as $X = \{x_1, x_2, \ldots, x_N\}$, where $x_i$ is the vector of the m time points measured at node i:

$$x_i = (x_i^1, x_i^2, \ldots, x_i^m)^T$$

Our method assumes that the fMRI measurement of BOLD (Blood Oxygen Level Dependent) activation at each node is a function of the activation of each of the other nodes, with additional random noise.
For the jth node with m time points, a vector can be defined denoting all nodes except the jth node as

$$x_{-j} = (x_1, x_2, \ldots, x_{j-1}, x_{j+1}, \ldots, x_N)$$

The measurements at the jth node can then be represented as a function of the other nodes as

$$x_j = f_j(x_{-j}) + \varepsilon_j$$

where $\varepsilon_j$ is random noise specific to node j. We further assume that the function $f_j()$ only exploits the data of nodes in $x_{-j}$ that are connected to node j. The function $f_j()$ can be solved in various ways in the context of machine learning. Since the nature of the relationship between different ROIs in the brain is unknown and expected to be non-linear [30], we chose a tree based ensemble method, as it works well with a large number of features with non-linear relationships and is computationally efficient. We utilized Extremely Randomized Trees (ERT), an ensemble algorithm similar to Random Forest, which aggregates several weak learners to form a robust model. At each tree node, ERT draws randomized candidate splits from a random subset of predictors and then selects the “best split” from this limited number of choices [31]. Finally, outputs from individual trees are averaged to obtain the best overall model [32]. BrainNET infers a network with N different nodes by dividing the problem into N different sub problems and solving the function $f_j()$ for each node independently, as illustrated in Figure 1. The steps are listed below:
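For j = 1 to N nodes:
Fit the ERT regressor with the data of all nodes except the jth node to find the function $f_j$ that minimizes the following mean squared error, where $x_j^k$ is the measurement at node j at time point k:

$$\frac{1}{m} \sum_{k=1}^{m} \left( x_j^k - f_j(x_{-j}^k) \right)^2$$

Extract the weight of each node for predicting node j, $w_j = (w_1, w_2, \ldots, w_N)$, where $w_n$ is the weight of node n in predicting node j and n = 1 to N.
Append the weight values to the importance matrix.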
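The loop above can be sketched with scikit-learn's `ExtraTreesRegressor`, whose `feature_importances_` attribute returns impurity-based importances normalized to sum to one. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name `importance_matrix`, the number of trees, and the random seed are hypothetical choices made for illustration.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor

def importance_matrix(ts, n_trees=100, seed=0):
    """Build an N x N importance matrix from node time series.

    ts : array of shape (m time points, N nodes)
    Row j holds the importances of all other nodes for predicting node j;
    the diagonal stays zero because a node never predicts itself.
    n_trees and seed are illustrative defaults, not values from the paper.
    """
    n_nodes = ts.shape[1]
    imp = np.zeros((n_nodes, n_nodes))
    for j in range(n_nodes):
        others = [i for i in range(n_nodes) if i != j]
        model = ExtraTreesRegressor(n_estimators=n_trees, random_state=seed)
        model.fit(ts[:, others], ts[:, j])  # predict node j from the rest
        imp[j, others] = model.feature_importances_
    return imp
```

Because each model is fitted on a single subject's time series, the sketch needs no separate training set, which matches the subject-level design described in the text.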
The importance score for each node (Nodej) to predict (Nodei) is defined as the total decrease in impurity due to splitting the samples based on Nodej [31]. Let S denote a node split in the tree ensemble and let ($S_L$, $S_R$) denote its left and right child nodes. Then, the decrease in impurity from node split S based on Nodej to predict Nodei is defined as

$$\Delta\mathrm{Impurity}(S) = \mathrm{Impurity}(S) - \frac{N_L}{N_P}\,\mathrm{Impurity}(S_L) - \frac{N_R}{N_P}\,\mathrm{Impurity}(S_R)$$

where $S_L$ and $S_R$ are the left and right splits and $N_P$, $N_L$, $N_R$ are the numbers of samples reaching the parent, left and right nodes, respectively. Let $\mathbb{V}_k$ denote the set of node splits in the kth tree of the ensemble that use ROIj for splitting. Then, the importance score of Nodej for predicting Nodei is calculated as the average of the impurity decreases across all trees, i.e.

$$\mathrm{Importance}(ROI_{ji}) = \frac{1}{T} \sum_{k=1}^{T} \sum_{S \in \mathbb{V}_k} \Delta\mathrm{Impurity}(S)$$

where T is the number of trees in the ensemble.
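To make the impurity decrease concrete, the sketch below computes it for a single split as the parent impurity minus the sample-weighted impurities of the two children. Using the variance of the target as the impurity measure is an assumption here (it is the usual choice for regression trees); the paper does not name the measure, and `impurity_decrease` is a hypothetical helper.

```python
import numpy as np

def impurity_decrease(y_parent, y_left, y_right):
    """Impurity decrease of one split, with variance as the impurity.

    y_parent : target values reaching the parent node
    y_left, y_right : target values sent to the left and right children
    """
    n_p, n_l, n_r = len(y_parent), len(y_left), len(y_right)
    return (
        np.var(y_parent)
        - (n_l / n_p) * np.var(y_left)
        - (n_r / n_p) * np.var(y_right)
    )

# A split that separates the target values perfectly removes all variance
y = np.array([0.0, 0.0, 4.0, 4.0])
print(impurity_decrease(y, y[:2], y[2:]))
```

Summing such decreases over every split that uses a given node, and averaging over trees, gives the importance score described above.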
Importance values extracted using a typical Random Forest model can be biased in the presence of two or more correlated features since the model will randomly assign importance to any one of the equally important features without any preference [33]. This problem is avoided by using the ERT regressor.
The importance of each node in predicting every other node's time series is extracted from the model, and an N x N importance matrix (where N is the number of nodes) is generated with the diagonal equal to zero. Each row of the importance matrix represents the normalized weights of each node in predicting the target node. The row-wise normalization affects the extracted adjacency matrix in two ways. First, the upper triangular values of the importance matrix are not the same as the lower triangular values. We therefore average the upper and lower triangles of the matrix to make it symmetric before determining the presence of a connection between nodes. This procedure does not allow the directionality of the connections to be determined. Second, the sum of each row in the importance matrix is one. Since the importance values are normalized with respect to the number of nodes in the analysis, we used a threshold equal to a theoretical probability value that is inversely proportional to the number of nodes in the network (i.e., threshold = 1/number of nodes) to produce a final adjacency matrix representing the network topology. This results in a threshold that changes dynamically with the number of nodes in the network.
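The symmetrization and thresholding steps can be sketched in a few lines of NumPy. The function name `adjacency_from_importance` is hypothetical, and the use of a strict inequality at the 1/N threshold is an assumption, since the text does not say how ties are handled.

```python
import numpy as np

def adjacency_from_importance(imp):
    """Turn a row-normalized N x N importance matrix into an adjacency matrix."""
    n_nodes = imp.shape[0]
    # Average the upper and lower triangles so the matrix becomes symmetric
    sym = (imp + imp.T) / 2.0
    np.fill_diagonal(sym, 0.0)
    # Threshold at 1 / (number of nodes); strict inequality is an assumption
    return (sym > 1.0 / n_nodes).astype(int)
```

For the 116 AAL regions this threshold is 1/116, roughly 0.0086, while for a 5-node simulation it is 0.2.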
C. Analysis
Network topologies were inferred in the rs-fMRI simulation data using BrainNET. The average of the network estimation across the 50 simulated subjects in each simulation was compared against the ground truth to calculate accuracy, sensitivity and specificity for BrainNET and Pearson correlation. In the Pearson correlation method, pairwise similarity between the nodes was calculated to create a correlation matrix [8]. A combined method called B-Corr was also created by masking the Pearson correlation matrix with the adjacency matrix derived from BrainNET. The output from B-Corr retains only the connections determined by BrainNET, with Pearson correlation values assigned to those connections. This allows analysis of connectivity changes between nodes, which cannot be performed with the BrainNET adjacency matrix alone. To determine the effect of the number of nodes on the BrainNET threshold, we computed False Positive Rates (FPR) with respect to the number of nodes in the first four fMRI simulations, which share the same confounders and differ only in the number of nodes (Fig. 2).
After validating BrainNET on the simulated data, we applied it to the real-world ADHD data to evaluate whole brain network changes in the ADHD subtypes, i.e., ADHD Combined (ADHD-C) and ADHD Inattentive (ADHD-I), compared to Typically Developing Children (TDC). The conventional Pearson correlation and B-Corr methods were also used on the same dataset to infer fMRI networks. The BrainNET model was applied to extract an importance matrix for each subject. The importance matrix was then thresholded at 1/number of nodes (e.g., 1/116 for the AAL atlas regions) to obtain an adjacency matrix for each subject (AdjImp). Functional network connectivity was calculated between the 116 ROIs using Pearson correlation and thresholded at a correlation coefficient of 0.2 (AdjCorr). This threshold was determined based on a sensitivity threshold analysis (Fig. 3). The B-Corr adjacency matrix (AdjB-corr) was derived by masking the correlation matrix (AdjCorr) with the adjacency matrix of BrainNET (AdjImp). Graph theoretic metrics were extracted using each of these methods for each group. Network differences between the three groups (TDC, ADHD-I and ADHD-C) were then computed using t-tests on the graph metrics.
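A comparison of an inferred adjacency matrix against a known ground truth can be sketched as follows. The sketch assumes undirected networks and counts each node pair once (upper triangle only); how the original evaluation treated direction in the ground truth is not stated here, and `edge_metrics` is a hypothetical helper name.

```python
import numpy as np

def edge_metrics(pred_adj, true_adj):
    """Accuracy, sensitivity, specificity and FPR over unique node pairs."""
    iu = np.triu_indices_from(true_adj, k=1)  # each undirected pair once
    pred = pred_adj[iu].astype(bool)
    true = true_adj[iu].astype(bool)
    tp = np.sum(pred & true)
    tn = np.sum(~pred & ~true)
    fp = np.sum(pred & ~true)
    fn = np.sum(~pred & true)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
        "fpr": fp / (fp + tn) if (fp + tn) else float("nan"),
    }
```

The false positive rate returned here is one minus the specificity, which is the quantity tracked against the number of nodes in the threshold analysis.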
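The Pearson thresholding and the B-Corr masking can be sketched with NumPy as below. Two points are assumptions made for this example: the 0.2 threshold is applied to the signed correlation rather than its absolute value, and the mask is applied to the correlation values themselves so that the retained edges keep their weights. The helper names `pearson_adjacency` and `b_corr` are hypothetical.

```python
import numpy as np

def pearson_adjacency(ts, r_threshold=0.2):
    """Correlation matrix and thresholded adjacency from (time x node) data."""
    corr = np.corrcoef(ts, rowvar=False)  # columns are nodes
    np.fill_diagonal(corr, 0.0)           # ignore self-correlation
    adj = (corr > r_threshold).astype(int)
    return corr, adj

def b_corr(corr, adj_brainnet):
    """Keep correlation values only where BrainNET reports a connection."""
    return corr * adj_brainnet
```

Masking in this way yields a weighted matrix, so connection strengths can still be compared between groups on the edges that BrainNET retains.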
Graph Metrics
Graph theoretical metrics representing global and local characteristics of network topology were used to compare between the groups in the ADHD dataset. The NetworkX package in Python was used to extract the graph theoretical metrics, including Density, Average Clustering Coefficient and Characteristic Path Length [34]. Density of the graph is defined as the ratio of the number of connections in the network to the number of possible connections in the network. The clustering coefficient of a node is the fraction of pairs of the node’s neighbors that are themselves connected to each other. The Average Clustering Coefficient (ACC) of a graph is the mean of this value over all nodes in the network. Networks with a high clustering coefficient are considered locally efficient networks. Characteristic Path Length (CPL) is the average shortest path length between nodes in the graph, where the shortest path between two nodes is the minimum number of edges that must be traversed to get from one node to the other. CPL indicates how easily information can be transferred across the network [1].
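The three metrics can be computed with NetworkX roughly as follows. Because the average shortest path length is undefined for a disconnected graph, this sketch computes CPL on the largest connected component; that choice is an assumption for illustration and may differ from how the study handled disconnected networks. The function name `graph_metrics` is hypothetical.

```python
import numpy as np
import networkx as nx

def graph_metrics(adj):
    """Density, average clustering coefficient and characteristic path length."""
    g = nx.from_numpy_array(np.asarray(adj))
    # Assumption: evaluate CPL on the largest connected component only
    largest = g.subgraph(max(nx.connected_components(g), key=len))
    return {
        "density": nx.density(g),
        "acc": nx.average_clustering(g),
        "cpl": nx.average_shortest_path_length(largest),
    }
```

Applied per subject to a binary adjacency matrix, this returns one value of each metric, which can then be compared between groups.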
III. EXPERIMENTAL RESULTS
A. BrainNET Inference of Network Topology in Simulated fMRI Data
The accuracy, sensitivity, and specificity for each method across all 28 simulated data sets are presented in Table 1. BrainNET achieved higher accuracy and sensitivity than the correlation method in all the simulations, as shown in Fig. 4. The overall accuracy and sensitivity of BrainNET for the 28 simulations were significantly higher than those of the Pearson correlation method, with p-values of 0.0009 and 0.0001, respectively. BrainNET achieved an accuracy of 94.69%, sensitivity of 96.31% and specificity of 88.02%, whereas the Pearson correlation method achieved 90.03%, 89.79% and 90.35%, respectively, across the 28 simulations. Although the specificity of BrainNET was lower than that of the correlation method (p = 0.04), it was robust across the simulations and comparable to the specificity of the correlation method. BrainNET thus demonstrated significantly increased accuracy and sensitivity with a tradeoff of slightly lower specificity. The BrainNET threshold analysis is presented in Figure 2. Use of a threshold value inversely proportional to the number of nodes did not increase the False Positive Rate (FPR).
B. Comparison of BrainNET and Pearson Correlation on ADHD Data
BrainNET was able to identify significant changes (p < 0.05) in the graph metrics of brain network topology in the ADHD data. A significant increase in CPL and a significant decrease in density were demonstrated between TDC and ADHD, between TDC and each ADHD subtype, and between the ADHD-C and ADHD-I subtypes (Table 2). There were no significant changes in ACC between the groups. Pearson correlation was not able to detect significant changes in any of the above whole brain analyses. The B-Corr method provided similar results to the BrainNET model.
IV. DISCUSSION
BrainNET is based on ERT [35] to generate an importance matrix. The ERT regressor is used to develop a tree based ensemble model to predict each node’s time series from all other node time series. The tree based ensemble methods are ideal for inferring complex functional brain networks as they are efficient in learning non-linear patterns even where there are a large number of features [36]. The importance matrix is then thresholded to generate an adjacency matrix representing the fMRI topology. The BrainNET model is applicable to both resting-state and task-based fMRI network analysis. It can be easily adapted to datasets with varying session length and can be used with different parcellation schemes. A unique feature of the BrainNET approach is that it is implemented at the subject level. It does not need to be trained on big datasets as it infers the network topology based on each individual subject’s data.
A. BrainNET Inference of Network Topology in Simulated fMRI Data
BrainNET demonstrated excellent performance across all the simulations and varying confounders. It achieved significantly higher accuracy and sensitivity than Pearson correlation (p<0.05). BrainNET performance remained high in the simulations across varying session lengths, number of nodes, neural lags, cyclic connections, and changing number of connections. Even for the simulations in which only one node had much higher activation signal than the other nodes (simulation 24), the model achieved 84% accuracy and 85.3% sensitivity. BrainNET performance was weakest for Simulation 11. In this simulation, there are 10 nodes, and each node shares a relatively small amount of the other node time series in a proportion of 0.8:0.2. Since the features have shared data between the nodes in this simulation, it limits discrimination of true connectivity between nodes. BrainNET still outperformed the correlation method in Simulation 11. The sharing of data between nodes can be minimized in fMRI analysis by selecting independent regions using anatomical parcellation or methods such as ICA. In simulation 13, the nodes had many indirect connections. The correlation method performed poorly, identifying many false positives (specificity of only 45%), whereas BrainNET achieved a specificity of 86% with higher accuracy and sensitivity compared to the correlation method. In simulation 16, the nodes were simulated to have a greater number of connections. BrainNET specificity dropped to 75.43% while maintaining high accuracy and sensitivity. This suggests that the ability of BrainNET to find all the connections in highly connected hub nodes may be affected.
Thresholding the importance matrix can change the network topology drastically. Thresholding can be applied to suppress spurious connections that may arise from measurement noise and imperfect connectome reconstruction techniques, and to potentially improve statistical power and interpretability [7]. However, depending on the threshold value, the connection density may vary from network to network after the threshold has been applied. This can lead to wide variability in computed graph metrics, as they are typically very sensitive to the number of edges in a graph. Identifying an appropriate threshold to infer the underlying brain network topology is therefore critical. In the BrainNET model, as the number of nodes increases, the importance values decrease because they are normalized across all the nodes in a row. Our choice of threshold for BrainNET, 1/(number of nodes), represents a theoretical probability for the presence of a connection between nodes. One concern with this approach is that as the number of nodes increases, the threshold decreases, which may result in more false positives at the lower threshold value. We performed a specificity analysis to determine the false positive rate as the number of nodes increases. The false positive rate actually improved with increasing number of nodes (Figure 2), showing that the decreased threshold value does not degrade the performance of BrainNET.
A major strength of the BrainNET approach is that it provides a unique threshold to determine the true network topology. In correlation-based approaches, there is no defined correlation cutoff to determine the true network topology. Rather, multiple approaches are employed, or multiple thresholds applied to generate different networks. Typically, the network cost has been used to define the cutoff value for defining true connections in correlation-based approaches [37]. Multiple costs are then applied to generate multiple instances of the network topology, and analyses are performed to determine the variation in network metrics across these costs, or variation in group differences across thresholds [38]. The BrainNET approach provides a single threshold obviating the need for these imprecise and convoluted thresholding approaches.
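For contrast, cost-based thresholding of a correlation matrix can be sketched as keeping only the strongest fraction of edges, where that fraction is the network cost. This is a generic illustration of the idea and not the procedure of any cited study; the function name `threshold_by_cost` and the ranking by signed correlation are assumptions.

```python
import numpy as np

def threshold_by_cost(corr, cost):
    """Binary adjacency keeping the top `cost` fraction of possible edges."""
    n = corr.shape[0]
    iu = np.triu_indices(n, k=1)          # each undirected pair once
    weights = corr[iu]
    n_keep = int(round(cost * len(weights)))
    adj = np.zeros((n, n), dtype=int)
    if n_keep > 0:
        keep = np.argsort(weights)[::-1][:n_keep]  # strongest edges first
        adj[iu[0][keep], iu[1][keep]] = 1
        adj = adj + adj.T                 # mirror to the lower triangle
    return adj
```

Repeating this over a range of costs produces one network per cost, which is why analyses based on it have to report results across several thresholds rather than for a single topology.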
B. Evaluation on ADHD Data
BrainNET was able to identify statistically significant changes in graph metrics between ADHD subjects and typically developing children. The correlation method failed to identify statistically significant differences between any of the groups (Table 2). The direction of the BrainNET findings was consistent across the ADHD groups when compared with TDC: there was a decrease in density and an increase in CPL in ADHD compared to TDC. The decrease in density suggests that the number of connections is decreased in ADHD. This can be interpreted as an increase in the cost of wiring in the brain. The increase in CPL is expected given the decrease in density and suggests that there is increased difficulty in transferring information across the brain in ADHD. The trends in the graph metrics obtained with the correlation method do not convey much information, and their interpretation may be misleading, as none was close to significance. The B-Corr method, however, which combines correlation and BrainNET, showed significant changes between all groups, similar to BrainNET. Previous studies have shown that ADHD is often associated with changes in the functional organization of the brain [18, 21]. BrainNET analysis of the ADHD data supports the notion that the functional organization of the brain changes in ADHD, and it was effective in identifying the subtle changes in the ADHD subjects.
V. CONCLUSION
We describe BrainNET, a new network inference method for estimating fMRI connectivity that was adapted from gene regulatory network inference methods. We validated the proposed model on ground truth simulation data [8]. BrainNET outperformed Pearson correlation in terms of accuracy and sensitivity across simulations and various confounders, such as the presence of cyclic connections, and even with truncated fMRI sessions of only 2.5 min. We also describe a method of thresholding correlation-based networks using the BrainNET results (B-Corr). We evaluated the performance of BrainNET on the open source “ADHD 200 preprocessed” data from Neuro Bureau. BrainNET and B-Corr were both able to identify significant changes in global graph metrics between ADHD groups and TDC, whereas correlation alone was unable to find any differences. BrainNET can be used independently or combined with other existing methods as an effective tool to understand network changes and to determine the true network topology of the brain under various conditions and disease states.
Footnotes
This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.