Abstract
Background and aims The main challenge in analysing functional magnetic resonance imaging (fMRI) data from extended samples of subject (N>100) is to extract as much relevant information as possible from big amounts of noisy data. When studying neurodegenerative diseases with resting-state fMRI, one of the objectives is to determine regions with abnormal background activity with respect to a healthy brain and this is often attained with comparative statistical models applied to single voxels or brain parcels within one or several functional networks. In this work, we propose a novel approach based on clustering and stochastic rank aggregation to identify parcels that exhibit a coherent behaviour in groups of subjects affected by the same disorder and apply it to default-mode network independent component maps from resting-state fMRI data sets.
Methods Brain voxels are partitioned into parcels through k-means clustering, then solutions are enhanced by means of consensus techniques. For each subject, clusters are ranked according to their median value and a stochastic rank aggregation method, TopKLists, is applied to combine the individual rankings within each class of subjects. For comparison, the same approach was tested on an anatomical parcellation.
Results We found parcels for which the rankings were different among control subjects and subjects affected by Parkinson’s disease and amyotrophic lateral sclerosis and found evidence in literature for the relevance of top ranked regions in default-mode brain activity.
Conclusions The proposed framework represents a valid method for the identification of functional neuromarkers from resting-state fMRI data, and it might therefore constitute a step forward in the development of fully automated data-driven techniques to support early diagnoses of neurodegenerative diseases.
1. Introduction
The recent advances in functional magnetic resonance imaging (fMRI) have made available high-quality data characterized by ever higher resolution images and shorter repetition times. This translated into an explosion of the dimensionality of data, thus generating the need of analysis techniques able to cope with the increased complexity of the problem.
To investigate how information is represented in the brain of healthy subjects, and how neurodegenerative diseases affect the physiological mechanisms underlying such a representation, a wide variety of statistical and computational methods have been applied to extract meaningful patterns of neural activity from fMRI data.
In population-level analysis, brain voxels (volumetric pixels) are usually analysed in isolation with traditional univariate techniques such as t-test or ANOVA. However, when the number of variables is huge, as is the case with fMRI data, issues such as the number of false positive and the multiple comparisons problem need to be addressed (see, e. g, Eklund et al. 2016). Moreover, voxels are features deriving from a geometrical representation of the brain that does not reflect the actual organization of neurons. While this might not constitute an issue in single-subject study, in a cohort study inter-subject variability hinders the generalizability of the results: in fact, a voxel found to be significant on a given subject may not be significant on a different subject, or even fall in a different brain region.
When conducting population studies, the goal is to analyse group-specific behaviour starting from the product of a first-level analysis consisting in single-subject activation maps. One of the limitations of this approach is that, due to the low signal-to-noise ratio of single-subject images, and to within and between subject variability, the measurable effects might be small or masked by noise. For this reason, in this work we propose an approach based on similarity within groups as opposed to traditional comparative approaches that focus on searching discriminative patterns.
We present a new framework based on clustering and stochastic rank aggregation methods to identify functional neuromarkers starting from single-subject activation maps. Clustering has been previously applied in fMRI data analysis to extract patterns from raw time series (Goutte 1999) or from second level features extracted from data (Goutte et al. 2001), and in group level analyses (Thirion et al. 2006; Heuvel et al. 2008). In combination with single-subject independent component analysis (ICA) (Hyvärinen and Oja 2000), clustering has been also used to identify the most similar ICA components within a single group of subjects (Esposito et al. 2005), but to the best of our knowledge this is the first time that it is used in combination with rank aggregation to identify regions of the brains that show a common behaviour among subjects belonging to the same class or group and investigate the differences between different classes or groups. Since this approach is fully data-driven, there is no need for strong assumptions about data distribution as in parametric models, and can therefore be equally applied to, e. g., ICA maps derived from resting-state fMRI data or conventional activation maps derived from a general linear model analysis of activation time-courses. Nor it is necessary to specify in advance regions of interest, since a small subset of informative regions automatically emerges from the analysis. This framework was tested on a real-world data set consisting of individual ICA-derived default-mode network (DMN) maps from resting-state fMRI scans of healthy controls and of subjects affected by amyotrophic lateral sclerosis (ALS) and Parkinson’s disease (PD). Although very different in their clinical manifestations, both these pathologies affect the motor system. However, previous structural and functional studies have reported widespread extra-motor changes in patients with ALS, with frontal deficits linked to neuropsychological impairment and differences in the activation of the default mode and sensorimotor network (Lomen-Hoerth et al. 2003; Kiernan et al. 2011; Luo et al. 2012; Verstraete et al. 2014; Chiò et al. 2014; Verstraete and Foerster 2015; Menke et al. 2017). Network-based studies on PD revealed altered connectivity in the sensorimotor network, DMN, fronto-parietal network and temporal-occipital networks (Gattellaro et al. 2009; Olde Dubbelink et al. 2014; Tessitore et al. 2014; Suo et al. 2017; Herrington et al. 2017; Hepp et al. 2017). The choice of validating the proposed framework using the DMN is aimed at facilitating the interpretation of the results. The DMN is by far the most studied resting-state network, it appears to be the most stable across subjects both in health and in disease, it is known to interact with higher order networks and can be considered a cognitive baseline for a subject (Greicius et al. 2003; Damoiseaux et al. 2006; Harrison et al. 2008; Buckner et al. 2008; Tedeschi and Esposito 2009; Whitfield-Gabrieli and Ford 2012; Raichle 2015). Additionally, previous works have investigated the role on the DMN in association with ALS (Mohammadi et al. 2009; Tedeschi et al. 2012; Agosta et al. 2013; Iyer et al. 2015; Trojsi et al. 2015) and PD (Van Eimeren et al. 2009; Tessitore et al. 2012b; Gorges et al. 2013; Amboni et al. 2015; Luo et al. 2015). On the other hand, higher order networks imply other clinical considerations related to the diseases under study. For example, the sensory-motor network is known to be suppressed in subjects affected by ALS and is characterized in general by a lower signal-to-noise ratio (Mohammadi et al. 2009; Tedeschi and Esposito 2009; Tedeschi et al. 2012) while in PD patients it has been shown that levodopa enhances the sensorimotor network functional connectivity in the supplementary motor area, a region where drug-naïve PD patients exhibit reduced signal fluctuations compared with untreated patients (Esposito et al. 2013). While the proposed method can be applied to any network, a study on sensory/motor function should consider also other clinical factors that can potentially affect the results and consequently their interpretation.
2. Methods
2.1. Ethics statement
The institutional review board for human subject research at the Università degli Studi della Campania “Luigi Vanvitelli” approved the study and all subjects gave written informed consent before the start of the experiments.
2.2. Participants
Data come from a cohort of 121 subjects, with age ranging from 38 to 82 years (mean age 63.87 ± 8.2). Specifically, they include 41 patients (20 women) from a clinical study on amyotrophic lateral sclerosis (Tedeschi et al. 2012); 37 patients (14 women) from a clinical study on Parkinson’s Disease (Tessitore et al. 2012a; Tessitore et al. 2012b; De Micco et al. 2013; Amboni et al. 2015) and 43 control subjects (23 women) from the same clinical studies. All control subjects were free from (and had no history of) neurological disorders.
2.3. MRI Data Acquisition and Pre-processing
A 3T scanner equipped with an 8-channel parallel head coil (General Electric Healthcare, Milwaukee, Wisconsin) was used for the acquisition of MRI images.
A sequence of 240 volume was acquired using gradient-echo T2* -weighted MR imaging (TR = 1508ms, axial slices = 29, matrix = 64 × 64, field of view = 256mm, thickness = 4mm, inter-slice gap = 0mm).
Subjects were asked to stay awake and motionless and to keep their eyes closed during the scans.
To register and normalize fMRI images, high resolution T1-weighted sagittal images were acquired in the same session (GE sequence IR-FSPGR, TR = 6988ms, TI = 1100ms, TE = 3.9ms, flip angle = 10, voxel size = 1mm × 1mm × 1.2mm).
Data pre-processing was performed with BrainVoyager QX (Brain Innovation BV, Maastricht, the Netherlands) including slice timing correction, 3D rigid body motion correction and high-pass temporal filtering. After registration to structural images, functional images were normalized to fit the Talairach standard space using a 12-parameter affine transformation and resampled to an isometric 3mm grid covering the entire Talairach box. Finally, all volumes were visually inspected to assess the impact of geometric distortion on the final images, which was judged to be negligible for a whole-brain analysis.
For each subject, 40 independent components (ICs) were extracted with the fastICA algorithm (Hyvarinen 1999), accounting for more than 99.9% of the total variance. The number of ICs corresponds to one sixth of the number of time points (Greicius et al. 2007; Bosch et al. 2018). Among the extracted components, the one associated with the DMN was selected as the one with the highest goodness of fit (GoF) with a DMN mask1 from a previous study on the same MRI scanner with the same protocol and pre-processing (Esposito et al. 2010), where the GoF is defined as the difference between the average component score inside the mask minus the average component score outside the mask (Greicius et al. 2004; Esposito and Goebel 2011). To avoid ICA sign ambiguity, each component sign was adjusted to have all GoF values as positive (see, e. g., (Esposito et al. 2008)).
The DMN mask used for this study was originally obtained from a self-organizing group-level ICA (sogICA) analysis (Esposito et al., 2005) on a different group of subjects by applying a voxel-level threshold of p=0.05 (Bonferroni corrected for multiple voxel-level comparisons). Thus, following (Smith et al. 2009), 2009, to select (and validate) the DMN mask, the same threshold was applied to all original sogICA components and the resulting masks were applied to an externally validated (“well-matched”) DMN component templated available on line (https://www.fmrib.ox.ac.uk/datasets/brainmap+rsns/). The best matching between the sogICA component mask and the external well-matched DMN component map resulted from the highest average ICA score of the latter among all candidate masks.
2.4. Overview of the methodology
Following the pre-processing and the extraction of the DMN from each subject’s data (as described in section 2.3), the input data consist of a matrix N ×V where N is the number of subject and V is the number of voxels, and each entry (i, j) represents the contribution of voxel j to the DMN map of subject i.
First, voxels are partitioned into parcels using one of the two methodologies detailed in the following section. The parcellation is applied to the data of each subject and a representative feature (the median) is selected for each parcel. A ranking is then computed for each subject by sorting in descending order the extracted features. Finally, rankings are aggregated by class of subjects through stochastic rank aggregation (section 2.6); the goal is obtaining a subset of brain regions that share a common behaviour within classes. See Figure 1 for a graphic overview of the methodology.
2.5. Brain parcellation
Working on brain parcels instead of single voxels is convenient for many reasons. As mentioned above, brain activity is likely to span over multiple voxels. Therefore, the aggregation of several voxels in a single agglomerated feature may reduce redundancy and improve signal-to-noise ratio, and this could in turn increase the prediction accuracies of learning models.
To validate our method for the selection of relevant regions of the brain based on stochastic rank aggregation, we adopted two different approaches to brain parcellation: the former based on anatomical information and the latter based on a data-driven clustering technique.
Anatomical parcellations are derived from an atlas that defines brain regions on a template image. The one used in this work is the Automated Anatomical Labelling (AAL) atlas (Tzourio-Mazoyer et al. 2002) that consists of 90 parcels. Since this approach is data-independent, it allows an objective comparison of the results of different models.
Clustering methods can be applied after the standard pre-processing of fMRI data to segment data in groups maximizing a given measure of intra-group similarity. We adopted k-means clustering and Pearson’s correlation coefficient as measure of similarity between voxels across subjects. Moreover, to obtain more stable and reliable sets of features, clustering solutions were enhanced through consensus techniques, i.e. different clustering solutions were combined into a final clustering to improve the quality of individual data clusterings, as described in a previous work (Galdi et al. 2017). The consensus-based approach can be briefly described as follows: 1) first, several partitions of voxels are generated with k-means with random initialization, to ensure variability among base solutions; 2) a voxel-similarity matrix (consensus matrix) is built by counting how many times each pair of voxels is assigned to the same cluster across partitions; 3) a consensus clustering solution is obtained by applying a hierarchical clustering algorithm that uses the consensus matrix as a similarity matrix; 4) finally, the information contained in the consensus matrix is used to perform a further feature selection step, by retaining only a stable subset of voxels that are consistently assigned to the same groups. This is attained by applying two thresholds: σ, that defines how often two features should be clustered together to be considered a stable pair, and ν, that is the minimum number of stable pairs a voxel needs to be part of to be selected. These two thresholds are tuned on the data. In our previous work, the number of clusters (k=500) was chosen heuristically to guarantee that clusters were sufficiently big to be easily mapped to an anatomical region for a posteriori validation. After the application of the consensus-based voxel filtering, the final solution consists of 405 brain parcels. Since part of the data were used as a training set to obtain the consensus solution, the final clustering was applied on a test set to extract the features for the subsequent steps of the analysis.
To demonstrate that our consensus-based approach identified a stable solution, we generated 100 bootstrap samples of the original dataset (sampling with replacement 2/3 of the initial set of subjects) and repeated the whole procedure on each sample (von Luxburg 2010), varying the number of clusters k ∈{90, 100, 200, 300, 400, 500}. We then computed the adjusted normalized mutual information (NMI) between each pair of partitions to assess the stability of the resulting clustering solutions (Meilă 2007; Vinh et al. 2009; Vinh and Epps 2009). Here, we have chosen the adjusted version of the NMI score because it corrects the effect of agreement solely due to chance and it accounts for the fact that the NMI is generally higher for two clusterings with a larger number of clusters, regardless of whether there is actually more information shared (Vinh et al. 2009; Amelio and Pizzuti 2015).
Results are shown in figure 2, before and after the consensus-based voxel filtering. We can observe that the NMI score increases as the number of cluster increases, and the application of the consensus-based filter improves the score in every test. Note that after the application of the filter, some voxels are discarded from each partition, hence the NMI score is computed only on the intersection of voxels between each pair of partitions. Figure 3 shows the percentage of voxels that (on average) are selected by the filter (in yellow), and the (average) percentage of filtered voxels that are in the intersection between two partitions. This latter value is an indirect measure of the stability of the filtered solutions. Increasing the number of clusters in the partitions leads to a decreasing percentage of filtered voxels, indicating that the base solutions were indeed less stable. However, the consensus solution that we adopted (k=500, post-filtering) is the one with the highest NMI score, and the average percentage of voxels in the intersection is still reasonably high (∼60%).
Once voxels are segmented in parcels, a representative feature can be extracted from each group to obtain a compressed representation of the input data. On the one hand this can be considered as a dimensionality reduction step, aimed to decrease the number of variables and thus facilitating the application of more sophisticated statistical models. On the other hand, aggregating the information described by several voxels allows working at a higher level of abstraction, that of brain regions, and this has multiple advantages: a) it captures the modular organization of the brain; b) it aids the generalization of the results since clusters of voxels are built across subjects; c) features are easier to interpret because they can be reliably mapped to brain regions. Compared to other dimensionality reduction techniques (e.g. PCA), clustering produces features that can be put into bijective correspondence with voxels, retaining information useful for visualization and interpretation of the discriminative features.
The main drawback of atlas-based approaches is that they are prone to errors in the segmentation of functional regions and this might translate into a decreased sensitivity of the models. Data-driven approaches are, by contrast, more sensitive to noise and since they are not based on a priori defined anatomical regions they require a further step to map features onto an atlas to allow a biological interpretation.
2.6. TopKLists
TopKLists (Schimek et al. 2012; Schimek et al. 2015) is a stochastic rank aggregation method that, starting from an ensemble of rankings of a set of items, outputs a new ranking of a subset of the same objects. It works by estimating the rank position k beyond which the concordance among the input rankings degenerates into noise. Once k has been computed, all rankings are cut at position k, thus selecting the top k elements of each ordered list (hence the name). Finally, all the sublists of length k are aggregated through a cross-entropy Monte Carlo method (Lin and Ding 2009).
The estimation procedure for the cut point can be summarized as follows. First, a cut point is estimated for each pair of ordered lists. For each pair, an indicator vector I is built, where Ii= 1 if the item ranked i in the first list is ranked no more than δ positions away from rank i in the second list, and Ii = 0 otherwise. Namely, a value of 1 means that a given object was assigned a similar ranking in both lists. The indicator vectors I1, …, IN are assumed to be independent Bernoulli random variables, with for i < i0 and for i ≥ i, where pi is the probability that Ii= 1 and i0 is the rank position where the consensus of the two lists breaks down and noise takes over2. In other words, before position i0, the probability that Ii = 1 is above chance level; after position i0, it is equal to chance level. For each vector Ii, an estimate of i0 is computed with an iterative procedure, alternating a sequence of two steps. At each iteration, the algorithm updates the current estimate of the position i0, with the initial estimate being the first position of Ii. Odd iterations start rv positions to the left of the current estimate, while even iterations start rv positions to the right of the current estimate. At iteration sj, a sample of size v is extracted consisting of elements Ii with i comprised among the first v indices to the right of , if j is odd, or to the left of , if j is even (where is the position where the previous iteration ended). Informally, at odd (even) iterations, the sample is extracted taking v consecutive items starting rv positions to the right (left) of the current estimated position (see Figure 4 for a schematisation of these stages). The sample mean is used to estimate the consensus probability pi: for even iterations and for odd iterations . In even iterations, we move to the left by unitary steps until we reach the first point where . In odd iterations, we move to the right as long as the inequality holds. The threshold zv is defined as with , to control for moderate deviations in the degree of correspondence (Hall and Schimek 2012). The algorithm terminates when one of the following stop conditions is met: a) the algorithm enters a loop between two adjacent stages; b) for some j, . Once an estimate of i0 is computed for each pair of ordered lists, the final cut point is computed as the maximum of the estimated values.
In this work, TopKLists was applied to combine the rankings of brain parcels expressed by each subject of a class. Specifically, for each subject, the medians of the brain regions were computed and sorted in descending order, then all rankings of a class were aggregated in a single list, containing a subset of regions that were ranked similarly across subjects of the same class. This means that the concordance among rankings and subsequently the length of the final aggregate ranking are estimated using only data from subjects of the same class, hence for each class the number of retained regions can vary.
3. Results
Tables 1 and 2 report the regions selected by TopKLists for each class with the relative rankings for anatomical parcels and clusters, respectively (the number of voxels and the Talairach coordinates of the geometric mean is reported for each region).
The number of anatomical areas selected per class (30 for controls, 27 for ALS, 26 for PD) was higher than the number of selected clusters (8 for controls, 9 for ALS, 8 for PD) albeit the size of functional clusters was smaller than the size of anatomical areas. Figure 5 represents with Venn’s diagrams the overlap of regions and clusters shared among classes. There are more anatomical areas shared among all three classes than class-specific anatomical areas; conversely, there are more class-specific than shared clusters. For the ALS group, there are 5 class-specific parcels in the clustering-based ranking while there are none in the anatomical-based one.
Figures 6 and 7 show a comparison between the anatomical and the clustering-based solutions in corresponding regions, on an MRI image and on an inflated representation of the cortex. Cluster 265 and AAL area 25 (corresponding to the left medial orbitofrontal cortex) are both in the first or the second rank in all three classes (figure 6). Both parcels identify approximately the same region, but the functional cluster is smaller and has a substantial overlap with Brodmann area 10.
Based on their respective rankings, AAL region 86 (right middle temporal gyrus) can be compared with functional clusters 380, 382 and 383 (figure 7). In this case, the AAL parcel is present in the top rankings of all classes, while clusters 380 and 382 are specific for the control class, and cluster 382 is specific to the ALS class.
Figures 8 to 11 represent with a box-plot the distribution of the median of each of the selected regions across subjects of the same class (for anatomical regions in figures 8 to 10 and for clusters in figure 11), with the parcels ordered according to the rankings. In most cases, class specific regions (black box-plots) exhibit less variability than others,. In the anatomical-based solution for the control class (figure 8), the five class-specific regions occupy the highest ranks and exhibit narrow distributions; all of the remaining regions are shared among all classes except for two that are shared only with the ALS class and are present in the first half of the ranking. The only class-specific region for PD occupies the first rank, followed by the two regions shared with the ALS class (figure 9). The remaing regions are shared with all classes and show a high variability across subjects (figure 9). For what concerns the anatomical areas relative to the ALS class (figure 10), there are no class-specific regions, but the ones shared with PD and controls, respectively, are listed in the top ranks with narrow distributions. Also in this case the regions shared among all classes exhibit a high variability. Considering now the clustering-based solutions (figure 11), we can see that in general the parcels show a reduced variability compared to the anatomical regions, but this is not unexpected since the clusters are smaller and the selected parcels are fewer. Another difference with the anatomical approach is that class-specific parcels occupy the second half of the ranking while the clusters shared by all classes (265, 98 and 182) are in the top ranks.
To assess the statistical significance of the results, we first computed the modified Kendall tau distance3 (Lin 2010) between the ranked lists of regions selected for each pair of conditions, to have a quantitative measure of how different the respective solutions are. We then estimated the empirical distribution of chance by repeating our analysis using 1000 random permutations of the phenotypes (i.e. we randomly assigned each subject to one of the three groups, keeping everything else the same) and computing the Kendall tau distances again for each of the permutations. Finally, we computed the p-value for each comparison by counting how many times the measured distance was greater in the random permutations than in the original analysis. Figure 12 shows the empirical distributions computed for each comparison together with the Kendall tau distance obtained in the original analysis. Multiple comparison correction was performed with the Benjamini-Hochberg procedure, separately for the results based on the anatomical parcellation and for those based on clustering. Significant differences (corrected p-value<0.05) were detected when comparing Controls vs. ALS and Controls vs. PD when using the anatomical parcellation, and ALS vs. PD when using clustering. Please note that the value of the Kendall distances computed between sets of anatomical areas and between clusters are not directly comparable, since they are defined in two different spaces (the 90 anatomical areas and the 405 clusters, respectively).
The stability of the results was assessed by repeating the same analysis on subgroups of subjects from each condition, specifically 1000 bootstrap samples were generated by sampling with replacement 2/3 of the initial groups of subjects. The similarity between two sets of regions selected for two subgroups from the same condition was measured with the Jaccard index, that is defined as the size of the intersection divided by the size of the union of the sets and is comprised between 0 (no overlap) and 1 (complete overlap). A final score per condition was computed as the average Jaccard index across all the comparisons. Since varying the initial set of subjects results in a different clustering for each subsample, when comparing sets of clusters the Jaccard index was computed between the sets of voxels constituting the clusters included in the solutions. Moreover, to verify if clusters were lying in the same anatomical regions, we identified the AAL parcels containing the clusters and computed the Jaccard index between sets of parcels. Results are reported in Table 3. A Wilcoxon rank-sum test indicated that all pair-wise differences were significant (p<0.001).
4. Discussion
A novel framework based on clustering and stochastic rank aggregation has been evaluated using DMN maps from resting-state fMRI scans of ALS and PD patients and of healthy controls. As an alternative to clustering, a purely anatomical definition of brain parcels to extract regional DMN features for ranking was also considered. Since our goal was to demonstrate the utility of the proposed approach, we deemed out of the scope of this paper to extend the analysis to other resting state networks.
While in the clustering-based analyses about 2% of the clusters (8 or 9 out of 405) were selected in the final rankings, in the case of anatomical defined areas up to one third of the parcels (30 out of 90) are part of the solution. One reason for this might be that anatomical areas are bigger and fewer compared to clusters and since the median is used as representative feature this might flatten the differences across subjects resulting in more conforming rankings. This would also explain why more anatomical parcels are shared in the rankings across classes. Indeed, observing the Venn diagrams in figure 5, most of the anatomical areas are shared among the three classes, while there are more class-specific regions when using clusters. Another aspect to consider when interpreting the overlapping among selected regions is that for each subject the input data correspond to the independent component that best fits a template of the DMN. Therefore, it is reasonable to expect activations in approximately the same regions. It should also be noted that the number of overlapping regions is likely influenced by the size and number of input parcels. Indeed, working with larger (fewer) parcels might smooth out the underlying signal thus masking differences between classes and consequently resulting in more overlapping regions. However, the aim of aggregating rankings by class is precisely that of revealing patterns that are disease-specific: differences might be subtle and undetectable with a standard parametric test, but within-class coherence can provide further insights into disease-dependent alterations. Although not all between-group differences were significant, the results of permutation testing (figure 12) confirmed that the proposed approach can indeed highlight differences between conditions, thus providing a new method to search for a set of regional features characteristic of a specific neurological condition, i.e. what is usually called a neuromarker.
It is interesting to observe that the selected anatomical parcels cover most of the DMN, that is the spatial component that was extracted by ICA, meaning that although not very selective, the approach based on the anatomical parcellation led to the subset of regions that contribute to this network. As mentioned above, clusters are smaller than anatomical parcels and might therefore unveil differences at a finer granularity. Indeed, the example of figure 7, where a region included in the anatomical solutions of all three classes is compared to two clusters that are specific for the control group and one cluster that is specific for the ALS class, suggests that working with smaller regions might bring to light slight differences that are not evident at a higher level, because they are averaged out when considering a larger region. Furthermore, the fact that the clustering-based solution was able to detect a difference between the ALS and PD conditions, while the selected anatomical areas identified a contrast only when comparing patients to controls, could indicate that the data-driven approach is more sensitive to variations in the underlying signal that can differentiate pathological conditions. This could be explained considering that the clustering is built on functional data, while the anatomical parcellation is based on a priori information, hence specific aspects of the neurodegenerative conditions might be lost in this representation. Nevertheless, most of the clusters in the solution are included in one of the anatomical parcels (see last column of Table 2), often with a similar ranking between the two approaches, demonstrating the consistency of the results.
If we study in detail the clusters selected for each class, we can observe that clusters 86, 95 and 160, that are class specific for PD, all lie in Brodmann area 40, in the inferior parietal cortex, that has been shown to be relevant for this disease in previous fMRI studies: in Tessitore et al. (2012b) a decreased functional connectivity of the bilateral inferior parietal cortex in the DMN was observed in patients with PD; the results of Amboni et al. (2015) suggest that a functional disconnection of the frontoparietal network could be associated with mild cognitive impairment in PD. Another class specific cluster for PD is cluster 297 in Brodmann area 39 in the medial temporal lobe, whose relation with PD has been investigated in Gorges et al. (2013) and Tessitore et al. (2012b), where a decreased connectivity in the DMN was observed between this area and the posterior cingulate cortex, and with the prefrontal cortex, respectively.
Clusters 98 and 265, that are high-ranking in all three classes, lie in Brodmann area 10 in the prefrontal cortex, a region related to both ALS and PD: a weaker connectivity of the prefrontal region was observed in ALS patients in the DMN (Mohammadi et al. 2009; Agosta et al. 2013) and in the salience network (Trojsi et al. 2015); a deactivation of the medial prefrontal cortex was measured in PD patients in the DMN (Van Eimeren et al. 2009; Gorges et al. 2013) and in the fronto-parietal network (Amboni et al. 2015); Tessitore et al. (2012a) showed that PD patients with freezing of gait present reduced functional connectivity within the executive-attention network in the middle frontal gyrus. Considering the five clusters that are specific to the ALS class, we observe that clusters 381 and 383 are located in the temporal lobe; cluster 314 is included in Brodmann areas 18 (in the occipital lobe) and 7, in the left precuneus, that has been previously associated to ALS in the study by Agosta et al. (2013), where this region exhibited enhanced connectivity in the DMN; cluster 223 covers Brodmann areas 23 and 24, corresponding to the posterior and anterior cingulate cortex, mentioned in Tedeschi et al. (2012), where the DMN showed a disease-by-age interaction in the posterior cingulate cortex, and in Mohammadi et al. (2009) where the DMN showed less activation in ALS patients compared to controls; cluster 425 lies in Brodmann area 8 on the middle frontal gyrus, that has been associated with cognitive deficit in ALS patients in a PET study by Wicks et al. (2008), while in Terada et al. (2016) the gray matter volume measured within the right middle frontal gyrus in ALS patients was significantly lower than in healthy controls. If we observe the clusters that are class specific for control subjects, two of them (clusters 380 and 382) are on the middle temporal gyrus and one (cluster 386) on the fusiform gyrus. Both these regions are mentioned in studies on cortical thickness that investigated healthy aging as opposed to neurodegenerative disorders: in a work by Convit et al. (2000), the volumes of the fusiform gyrus and the middle (and inferior) temporal gyrus are shown to predict decline to Alzheimer’s disease (AD) in non-demented elderly; while in a work of Hänggi et al. (2011), the volume of the right middle temporal gyrus revealed promising diagnostic values to distinguish AD from mild cognitive impairment. Another study (Huettel et al. 2001) investigated the aging-related changes of the haemodynamic response in regions surrounding the fusiform gyrus. Finally, three clusters (182, 197 and 341) fall in regions of white matter and might therefore be resulting from noise.
From the point of view of the stability, the results based on the anatomical parcellation appear to be more stable (Table 3), however this is not unexpected for two reasons: first, the anatomical parcellation is the same across subsamples, while the clustering-based solutions, for their data-driven nature, change with every different subsample; second, as mentioned above, the selected anatomical areas tend to cover up to one third of the total number of parcels, while the selected clusters cover only about the 2% of the total number of clusters, hence a larger overlap is more likely with the anatomical areas. Additionally, considering the high-dimensional nature of the input data, the stability of the clustering-based results is also affected by the limited sample size; this issue could however be alleviated using larger data sets. Another aspect that might negatively affect stability is the choice of the clustering algorithm used to generate the initial parcellation; we adopted k-means but other methods might find more robust brain clusters (Thirion et al. 2014). Nevertheless, the bottom row of table 3 shows that although the selected voxels are not exactly the same across different subsamples of subjects, about a half of the identified anatomical regions are shared among solutions. Indeed, when evaluating clustering results, the most relevant aspect to consider is whether the information represented by clusters is consistently preserved, and the fact that the anatomical areas containing clusters in each subsample are overlapping is an indirect measure of how stable the information content of clusters is. It should be also pointed out that the essential role of the consensus clustering stage in the presented approach is to improve the selection of data entering the rank aggregation and the TopKLists algorithm, rather than provide a new (and stable) data-driven parcellation to be proposed as a generalizable alternative to the predefined anatomical parcellation. Indeed, as illustrated above, the use of clustering within the proposed pipeline increases the sensitivity of the rank aggregation method specifically in detecting (expectedly subtler) differences between two different pathological conditions, and such evidence remains valid even if the parcellation itself obtained via clustering is not very stable across possible subgroups and cannot be as stable as a pre-defined parcellation. In other words, the role of clustering in the presented approach (i.e. data selection for ranking) is somewhat similar to the role of principal component analysis (PCA) for subspace selection in group-level factor analyses. When performing factor analyses of high-dimensional data, the data are sometimes projected to a low-dimensional subspace (e.g. the first k principal components) and then the PCA scores are used for the factorial analyses. However, if repeating the same analysis on subgroups (e.g. to see whether the same group effects are significant or not) it is not necessarily relevant to see how similar the subspaces of different subgroups would be, especially if one would be interested to see how the group effects of interest (e.g. the difference between conditions) remain significant or not in each bootstrap sample.
Another interesting result of the stability analysis is that, regardless of the type of analysis, the ALS solutions are the most stable; this could suggest that the pattern of the activation within the DMN for this condition has a more distinctive signature and as a consequence there is greater consistency when comparing results between subsamples.
We also tried to validate our results using Neurosynth (Yarkoni et al. 2011). Note that these results are however limited to the terms/documents currently available in the Neurosynth database. For example, there are only 15 studies on ALS included in the text corpus, as opposed to the 104 available on PD. Moreover, the term ‘amyotrophic lateral sclerosis’ is not included in the list of features available for term-based meta-analysis. For this reason, we restricted part of the analyses to PD only. Since we used Talairach space and Neurosynth is based on the MNI space, we converted our coordinates using the same transformation used in the Neurosynth software. We performed two distinct analyses. The first was the reverse-inference analysis, to see if the coordinates of the selected regions per each class were associated to terms relevant to the studied conditions by previous literature (according to Pearson’s correlation between z-score activation maps, with FDR-adjusted p-value<0.05). This was done for all the terms available in the database, and for 200 topics derived from the terms applying Latent Dirichlet allocation (Poldrack et al. 2012). Table 4 and 5 report the results for terms and topics, respectively.
From table 4 we can observe that the retrieved terms are more generic than disease-specific. They mostly refer to brain regions (e.g., precuneus, posterior cingulate, parietal cortex), ICA (e.g., component, independent componenct, ica), the default mode network (e.g., default, default mode, dmn), and functions associated with the default mode network (autobiographical, episodic memory, recollection, memory retrieval, beliefs) (Buckner et al. 2008). This replicates also in the topic-based analysis (table 5) for the controls and ALS classes, where the retrieved topic can all be associated to the DMN and its function (3_memory_retrieval_episodic, 14_network_default_dmn, 10_judgments_judgment_referential). However, for the PD class, there are some additional terms and topics not shared with the other two classes, referring to executive functions (attention, calculation, control, working memory, numbers, 116_number_numerical_arithmetic), inhibition (37_inhibition_response_inhibitory), and visuospatial orienting (185_spatial_location_orienting). All these three domains have been previously linked to PD (Monchi et al. 2007; Beato et al. 2008; Possin et al. 2008; Pereira et al. 2009; McKinlay et al. 2010; Disbrow et al. 2013; Dirnberger and Jahanshahi 2013; Jahanshahi et al. 2015; Di Rosa et al. 2017).
A second analysis consisted in computing a map of activations derived from the meta-analysis of studies associated with the term “Parkinson” (this was not possible for ALS). We then looked at the overlapping between the regions selected with our approach (1315 activations) and the 4135 activations resulting from the meta-analysis. Although the voxels in the intersection were only 19, all activations were found in Brodmann area 40 (inferior parietal cortex), that is the region in which three out of the four clusters specific to the PD class were found, and that was reported to be overactive in PD patients performing motor tasks (Samuel et al. 1997; Hanakawa et al. 1999).
5. Conclusion
We presented a novel methodology to automatically detect regions that show a common behaviour in a class of subjects. Looking for commonalities instead of differences between groups is advantageous because the actual differences might be masked by noise of different origins. This approach takes into account both inter-subject variability and noise by excluding from the analysis brain parcels whose patterns of activation are incoherent across subjects of the same diagnostic group, while retaining regions for which a sufficient degree of consensus exists. The further advantage of combining this technique with a clustering-based parcellation is that of making the whole approach data-driven and hence more sensitive to subtle variations in the data that are however consistent across subjects of the same class. In this work, the proposed methodology was applied on DMN maps derived from resting-state fMRI acquisitions of healthy controls, PD and ALS patients, and results are consistent with previous literature, thus indicating that this approach might be a suitable method to search for a set of regional features characteristic of a specific neurological condition, i.e. what is usually called a neuromarker, which is a fundamental step in the development of fully automated data-driven techniques to support early diagnoses of neurodegenerative diseases.
Nonetheless, the same framework can be applied with parcels derived from pre-existing brain atlases. As a proof of concept, we showed an example of application of our method in combination with the AAL atlas, because it is the most commonly adopted in literature and our goal was to facilitate the comparison of findings. Our results suggest that subtler differences can be uncovered when using smaller, more localized regions of interest: compared to the results obtained with the anatomical parcellation, the regions identified through clustering were able to highlight differences between neurological conditions better than between healthy controls and patients. However, these results are not conclusive, and further experiments comparing a range of healthy and neurological conditions (as well as statistal tests on new data) would be needed to confirm that this holds in general. In addition, to prove the superiority of a data-driven parcellation over a predefined atlas it would be necessary to replicate this analysis on other datasets and to extend the study to other atlantes. A promising candidate would be the multi-modal brain parcellation recently proposed by Glasser et al. (2016), yet this approach requires different imaging modalities that were not available for the dataset under study.
Another limitation of this study is that we did not explicitly explore the effect of the resolution of the brain parcellation on the stability of the final rankings, although we partly addressed the effect of the choice of the parcellation on the outcome of the analysis. While it is likely that working with larger regions (fewer parcels) would increase the stability of the final rankings as in the case of the anatomical parcellation, this is not necessarily an advantage since averaging the signal over a large parcel might flatten some of the differences that might be relevant to characterize different neurological conditions. A systematic study on the effect of the number of parcels in fMRI data analyses has previously shown that the number of clusters (k) can greatly affect the accuracy and reproducibility of the results and that k=200 is a lower bound to optimally represent the variability of functional information in fMRI data (Thirion et al. 2014).
In future work, it would be interesting to apply the same method to task fMRI data, where it is easier to detect activation loci and more prior knowledge is available to define regions of interest. This would allow observing how the rankings of regions vary between healthy subjects and patients in the performance of specific tasks.
Conflict of Interest
The authors declare that they have no conflict of interest.
Information Sharing Statement
Source code (Matlab and R scripts) implementing the methods described in this paper, as well as the preprocessed (and anonymized) group data sets used in the analyses, are available on GitHub (https://github.com/paola-g/STRAIN).
Footnotes
↵1 The template binary mask was computed as the mean DMN map of a separate population of control subjects.
↵2 A second assumption is that pi decreases when i increases.
↵3 The original version of the Kendall tau distance simply counts the number of pairwise discordances between the two ranked lists defined on the same sets of objects; the modified version can deal also with partial lists.