Abstract
Background: Schizophrenia (SZ) and bipolar disorders (BD) share substantial neurodevelopmental components affecting brain maturation and architecture. This necessitates a dynamic lifespan perspective in which brain aberrations are inferred from deviations from expected lifespan trajectories. We applied machine learning to diffusion tensor imaging (DTI) indices of white matter structure and organization to estimate and compare brain age between patients with SZ, patients with BD and healthy controls.
Methods: We obtained DTI data from patients with SZ (n=648), BD (n=185) and healthy controls (n=990) across 10 clinical cohorts. We trained six cross-validated models using different combinations of DTI data from 927 controls, and applied the models to estimate individual brain ages in the test set. We assessed group differences using linear models, accounting for age, sex and scanner.
Results: 10-fold cross-validation revealed high accuracy for all models. Compared to controls, the model including all feature sets significantly over-estimated the age of patients with SZ (d=.29) and BD (d=.15), with similar effects for the other models. Meta-analysis converged on the same findings. Fractional anisotropy (FA)-based models were more sensitive than models based on other metrics. Using a reduced set of global features instead of regional features yielded converging results.
Conclusions: Brain age prediction based on DTI provides informative and robust proxies for brain white matter integrity and health. Our results further suggest that white matter aberrations in SZ and BD primarily consist of anatomically distributed deviations from expected lifespan trajectories that generalize across cohorts and scanners.
Introduction
Schizophrenia (SZ) and bipolar (BD) spectrum disorders are severe mental disorders with partly overlapping clinical characteristics and pathophysiology. Both are highly heritable (1) with a substantial neurodevelopmental aetiology (2, 3). Along with evidence of accelerated age-related brain changes in adult patients with SZ (4–6), the neurodevelopmental origin supports a dynamic lifespan perspective in which genetic and biological factors interact with age-related environmental and physiological processes.
Aberrant myelination and brain wiring during adolescence have been included among the neurobiological features of severe mental disorders, and white matter (WM) aberrations have been documented before disease onset (7–11). Brain imaging has shown that normative WM development follows a characteristic non-linear trajectory with peak maturation around the third or fourth decade (12–14). Compared to healthy controls, adult patients with SZ or BD exhibit anatomically distributed group-level differences in various diffusion-based indices of WM structure (15, 16).
Supporting a neurodevelopmental origin, patients with adolescent-onset SZ have been shown to exhibit WM aberrations (17) and an altered and delayed developmental trajectory (18) compared to age-matched typically developing peers. Further, children and adolescents with increased symptom burden, albeit presumably at subclinical levels, were found to exhibit altered diffusion-based WM properties compared to peers with low or no symptoms of mental distress (19), highlighting a critical role of WM development in mental health in youths. The degree to which group differences observed between adult patients and healthy controls accelerate during the course of the adult lifespan is unclear. The neurodegenerative account of schizophrenia and severe mental illness is debated (20) and lacks unequivocal support from imaging studies (16, 21), but some studies have suggested stronger age-related deterioration of the brain in patients compared to controls (22, 23).
Despite converging evidence of case-control differences both preceding and following disease onset, recent brain imaging studies have documented substantial heterogeneity within patient groups (24, 25). In contrast to conventional group-level analyses, brain age prediction using machine learning on imaging features allows for brain-based phenotyping at the individual level, and enables an efficient dimensionality reduction of the neuroimaging data into one or more biologically informative summary measures (26, 27). The discrepancy between an individual’s chronological age and predicted brain age, sometimes referred to as the brain age gap (BAG), has been found to be higher in patients with SZ (5, 28, 29) and several other brain disorders (29). However, these previous studies used grey matter features exclusively for brain age prediction. Thus, given the well-documented role of WM aberrations in patients with mental illness (15, 30–32), brain age prediction based on diffusion imaging is clearly warranted.
To fill this gap in the literature, we compared individual BAGs between patients diagnosed with SZ or BD and healthy controls (HC) using four conventional metrics (fractional anisotropy (FA), mean diffusivity (MD), radial diffusivity (RD) and axial diffusivity (AD)) obtained from diffusion tensor imaging (DTI). We used an independent training set comprising healthy individuals (n=927, aged 18.00-94.96 years) and applied the resulting models to our test sample, including patients with SZ (n=648), BD (n=185) and HC (n=990) from 10 independent cohorts. To specifically assess the robustness and quantify the heterogeneity of effects across cohorts, we adopted a meta-analytic statistical framework.
Since the different DTI-based metrics carry partly independent biological information (33–35), we trained six different models based on various combinations of the DTI metrics, which allowed us to compare prediction accuracy and subsequent group differences between the metrics. Based on converging evidence of widespread WM aberrations in patients with severe mental disorders (15), we hypothesized higher BAG in patients with SZ and BD compared to HC, with stronger effects in SZ compared to BD. To test the relevance of the varying spatial resolution of the feature sets, which is important to inform the discussion regarding the anatomical specificity of brain WM aberrations, we compared models including various atlas-based tracts of interest (TOIs) with models including only global features. Based on previous studies comparing the prediction accuracy and sensitivity between metrics (16, 27), we hypothesized that FA would enable both high age prediction accuracy and sensitivity to group differences, but remained agnostic concerning the additional value of the remaining features.
Materials and methods
We combined diffusion MRI data from 2750 individuals from 11 sites/studies across 10 different scanners. Figure 1A, Supplementary Figures S1-S2 and Supplementary Tables S1-S2 summarise key demographics per cohort. Supplementary Table S3 summarises the MRI systems and diffusion acquisition protocols.
The dataset was split into a training set and a test set. Supplemental Figure S2 shows the age distribution within each cohort in the training set. Briefly, the training set consisted of 927 HC covering the full adult lifespan (mean age=53.81, s.d.=18.38, range=18.00-94.96 years). The test set comprised 990 HC (mean age=34.70, s.d.=11.24, range=17.52-68.97), 185 patients with BD (mean age=33.12, s.d.=10.53, range=18.40-64.48) and 648 patients with SZ (mean age=34.49, s.d.=11.40, range=18.00-66.00).
MRI acquisition and processing
A summary of the MRI acquisition protocol for each cohort is presented in Supplementary Table S3. Imaging analyses were performed using the Oxford Center for Functional Magnetic Resonance Imaging of the Brain (FMRIB) Software Library (FSL) (36–38). To correct for geometrical distortions and eddy currents, all cohorts were processed using eddy (http://fsl.fmrib.ox.ac.uk/fsl/fslwiki/eddy) (39, 40). The two cohorts (TOP1 and TOP2) that had collected blip-up/blip-down sequences were additionally processed using topup (http://fsl.fmrib.ox.ac.uk/fsl/fslwiki/topup) (36, 41) prior to eddy. Along with correcting for susceptibility-induced distortions, eddy currents and motion within an integrated framework, eddy detects and replaces slices affected by signal loss due to bulk motion during diffusion encoding (40).
Fitting of the diffusion tensor was done using dtifit in FSL, yielding conventional DTI metrics, including fractional anisotropy (FA), and mean (MD), radial (RD) and axial (AD) diffusivity. FA, MD, RD and AD maps were further processed using tract-based spatial statistics (TBSS) (42). FA volumes were skull-stripped and aligned to the FMRIB58_FA template supplied by FSL using nonlinear registration (FNIRT) (43). Next, a mean FA image was derived and thinned to create a mean FA skeleton, representing the center of all tracts common across subjects. We thresholded and binarized the mean FA skeleton at FA>0.2. The procedure was repeated for MD, AD and RD. For each individual, we calculated the mean skeleton FA, MD, AD and RD, as well as mean values within 23 regions of interest (ROIs, Supplemental Table S4) based on two probabilistic white matter atlases provided with FSL (i.e. the ICBM-DTI-81 white-matter labels atlas and the JHU white-matter tractography atlas (44–46)). In total, we derived 96 DTI features per individual (four metrics, each with the mean skeleton value plus 23 ROI means).
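The feature-extraction step reduces each skeletonised map to one global mean and 23 regional means. A minimal sketch of that arithmetic on flattened voxel lists is shown below. It assumes the subject's map has already been registered to template space and projected onto the skeleton, and that each ROI is supplied as its own boolean mask (ROIs from two different atlases can overlap, so a single label image would not suffice). The function and ROI names are hypothetical; this illustrates the averaging only and is not the FSL/TBSS implementation.

```python
def binarize_skeleton(mean_fa_skeleton, threshold=0.2):
    """Boolean skeleton mask: True where the mean FA skeleton exceeds the threshold."""
    return [fa > threshold for fa in mean_fa_skeleton]


def skeleton_features(subject_map, skeleton_mask, roi_masks):
    """Mean of one subject's metric over the skeleton and within each ROI on the skeleton.

    subject_map   -- skeletonised FA, MD, RD or AD values, one per voxel
    skeleton_mask -- output of binarize_skeleton (same voxel order)
    roi_masks     -- dict mapping a hypothetical ROI name to a boolean voxel mask
    """
    on_skeleton = [v for v, s in zip(subject_map, skeleton_mask) if s]
    features = {"mean_skeleton": sum(on_skeleton) / len(on_skeleton)}
    for name, roi in roi_masks.items():
        vals = [v for v, s, r in zip(subject_map, skeleton_mask, roi) if s and r]
        # An ROI without skeleton voxels yields NaN instead of a division error.
        features[name] = sum(vals) / len(vals) if vals else float("nan")
    return features


# Toy example: five voxels, two illustrative (hypothetical) ROIs.
mask = binarize_skeleton([0.1, 0.3, 0.5, 0.25, 0.15])
rois = {"roi_a": [True, True, False, False, False],
        "roi_b": [False, False, True, True, True]}
print(skeleton_features([1.0, 2.0, 3.0, 4.0, 5.0], mask, rois))
# {'mean_skeleton': 3.0, 'roi_a': 2.0, 'roi_b': 3.5}
```

Repeating this for each of the four metrics with 23 ROI masks gives the 4 × 24 = 96 features per individual described above.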
Quality assessment
Subjects with poor image quality due to subject motion or other visible image artefacts (e.g. due to metal) were removed. Additionally, prior to statistical analyses we employed a multistep quality assessment (QA) procedure (16) that included the maximum voxel intensity outlier count (MAXVOX) and the temporal signal-to-noise ratio (tSNR) (47). Manual inspection of the datasets flagged by this QA suggested adequate quality. Thus, we present results on the full dataset, with supplementary results after stringent QA (see (16) for additional information).
Brain age prediction
We trained six models for age prediction. Our main model included all 96 features across all DTI metrics. To assess metric specificity, we trained four additional models, one per DTI metric (FA, RD, MD or AD), each based on all ROIs for that metric. To test the value of including regionally specific information, we trained a sixth model including only the global mean skeleton values of the four metrics.
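Writing out the six feature sets makes the feature counts explicit: four metrics times 24 features (the mean skeleton value plus 23 ROI means) gives the 96 features of the full model. The sketch below is illustrative only. The feature names are hypothetical placeholders, and it assumes each single-metric model uses that metric's mean skeleton value alongside its 23 ROI means, which the text implies but does not state.

```python
METRICS = ["FA", "MD", "RD", "AD"]
N_ROIS = 23  # regional features per metric, as in the text


def metric_features(metric):
    """Hypothetical names for one metric: the mean skeleton value plus 23 ROI means."""
    rois = [f"{metric}_roi_{i:02d}" for i in range(1, N_ROIS + 1)]
    return [f"{metric}_mean_skeleton"] + rois


def build_feature_sets():
    """The six models: full, four single-metric, and global mean skeleton only."""
    sets = {"full": [f for m in METRICS for f in metric_features(m)]}
    for m in METRICS:
        sets[m] = metric_features(m)  # assumption: includes the mean skeleton value
    sets["global"] = [f"{m}_mean_skeleton" for m in METRICS]
    return sets


feature_sets = build_feature_sets()
for name, features in feature_sets.items():
    print(name, len(features))
# full 96, FA 24, MD 24, RD 24, AD 24, global 4 (one line each)
```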
The following pipeline for brain age prediction was identical for all six models. We used the xgboost framework in R (48) to build the prediction models. The number of rounds (nround), maximum depth (max_depth) and subsample ratio (subsample) were tuned using 5-fold cross-validation on the training data, with early stopping if the prediction error did not improve for 20 rounds. The learning rate (eta) was fixed at eta=0.01. Apart from the default settings, the following parameters were used in the final model: nround=1400, max_depth=14.
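xgboost performs this cross-validated tuning internally (in R, for example, `xgb.cv()` accepts `nfold` and `early_stopping_rounds` arguments). Purely to make the stopping rule concrete, the following sketch reimplements the fold assignment and the "no improvement for 20 rounds" criterion in plain Python. It is an illustration, not xgboost's code; the function names are hypothetical, and in practice the per-round errors would be the mean validation error across the five folds.

```python
import random


def kfold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def early_stopping_round(cv_errors, patience=20):
    """Scan per-round CV errors and stop once `patience` rounds pass without improvement.

    Illustrative re-implementation of the rule, not xgboost's internals.
    Returns (best_round, best_error), with rounds numbered from 1.
    """
    best_round, best_err = 0, float("inf")
    for rnd, err in enumerate(cv_errors, start=1):
        if err < best_err:
            best_round, best_err = rnd, err
        elif rnd - best_round >= patience:
            break  # `patience` consecutive rounds without a new minimum
    return best_round, best_err


folds = kfold_indices(927, 5)  # e.g. 927 training subjects split into 5 folds
print([len(f) for f in folds])  # [186, 186, 185, 185, 185]

# Toy error curve: improves for five rounds, then plateaus.
print(early_stopping_round([10, 9, 8, 7, 6] + [6.5] * 30))  # (5, 6)
```

Called with different seeds, the same fold generator could also serve for the repeated 10-fold cross-validation described in the next paragraph.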
Prior to model training, we regressed out the main effect of scanner in the entire dataset while accounting for age, age² and sex, using linear models in R (49). To estimate the reliability of our age prediction model, we used a 10-fold cross-validation procedure within the training sample and repeated the cross-validation step 100 times to provide a robust estimate of model predictive accuracy. Within the same procedure, we tested the performance of the trained model by predicting age in unseen subjects in the test sample. Applying the model to the test sample 100 times yielded both a mean estimate and an estimate of its uncertainty. We then calculated the correlation between predicted age (mean across the 100 iterations) and chronological age as a measure of model performance, in addition to the mean absolute error (MAE, in years) and root mean square error (RMSE). For each individual, we calculated the discrepancy between predicted and chronological age, i.e. the BAG. Following recent recommendations (50), we regressed out the main effect of age on BAG using linear models in R, yielding a residualized BAG (BAGR) that was used to calculate MAE and RMSE and for group comparisons across cohorts.
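The scanner correction, brain age gap and accuracy measures described above can be sketched with ordinary least squares. This is a minimal, dependency-free illustration under stated assumptions, not the authors' R code: the scanner model is fitted one feature at a time, the first scanner level serves as an arbitrary reference, and predicted ages are assumed to be already averaged over the 100 repetitions. As the text computes MAE and RMSE from BAGR, the helper simply takes a list of errors (BAG or BAGR). Function names are hypothetical.

```python
import math


def ols(X, y):
    """Least-squares coefficients by solving the normal equations (X'X) b = X'y."""
    p = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting on the augmented matrix.
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]


def remove_scanner_effect(y, age, sex, scanner):
    """Fit y ~ age + age^2 + sex + scanner; subtract only the scanner terms.

    Assumption: one feature at a time, first sorted scanner label as reference.
    """
    m = sum(age) / len(age)  # centre age so the normal equations stay well conditioned
    levels = sorted(set(scanner))[1:]
    X = [[1.0, a - m, (a - m) ** 2, s] + [1.0 if sc == lv else 0.0 for lv in levels]
         for a, s, sc in zip(age, sex, scanner)]
    scan_beta = ols(X, y)[4:]
    return [yi - sum(b * d for b, d in zip(scan_beta, row[4:]))
            for yi, row in zip(y, X)]


def brain_age_gap(pred, age):
    """Raw BAG (predicted minus chronological age) and BAG residualized on age."""
    bag = [p - a for p, a in zip(pred, age)]
    m = sum(age) / len(age)
    b0, b1 = ols([[1.0, a - m] for a in age], bag)
    bag_r = [g - (b0 + b1 * (a - m)) for g, a in zip(bag, age)]
    return bag, bag_r


def mae_rmse(errors):
    """Mean absolute error and root mean square error of a list of errors."""
    n = len(errors)
    return sum(abs(e) for e in errors) / n, math.sqrt(sum(e * e for e in errors) / n)


def pearson_r(x, y):
    """Pearson correlation, e.g. between predicted and chronological age."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

For instance, predictions with a purely linear age bias (predicted = 0.8 × age + 10) give a BAG that varies with age but a BAGR of zero for every individual, which is the intended effect of the residualization. Centring age before squaring changes the age coefficients but not the scanner coefficients that are removed.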
Statistical analyses
Statistical analyses were performed using R version 3.3.3 (2017-03-06) (49). We tested for main effects of diagnosis using linear models with BAGR as the dependent variable and group, sex and site as independent variables, and performed pairwise group comparisons as appropriate. Using the metafor package (51) in R, we adopted a meta-analytic framework to assess the heterogeneity and generalizability of the results. A random-effects model was used to weight the primary studies prior to aggregating the effect sizes. Effect sizes were aggregated using the estimated marginal means of BAG from each group contrast (HC/SZ, HC/BD and BD/SZ), accounting for age, age² and sex. For effect size estimates we used Hedges’ g. Cochran’s heterogeneity statistic Q was used to test the homogeneity of effect sizes, and its significance was assessed with a χ² test with k−1 degrees of freedom, where k is the number of cohorts. The heterogeneity was quantified using the I² statistic, which expresses the proportion of total variability in effect sizes that reflects inconsistency between cohorts rather than sampling error.
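The meta-analytic quantities named above can be written out directly. The sketch below computes Hedges' g (group 1 minus group 2, e.g. patients minus controls) and a common large-sample approximation of its variance from per-cohort group means, standard deviations and sample sizes, then pools cohorts with a random-effects model and reports Cochran's Q, τ² and I². Two substitutions should be noted: it uses the DerSimonian-Laird estimator of τ², whereas metafor's `rma()` defaults to restricted maximum likelihood (REML), and it takes plain group summaries as input rather than estimated marginal means. The χ² p-value for Q is omitted to stay within the Python standard library. Function names are hypothetical.

```python
import math


def hedges_g(m1, s1, n1, m2, s2, n2):
    """Bias-corrected standardized mean difference (group 1 minus group 2) and its variance."""
    sp = math.sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))
    j = 1 - 3 / (4 * (n1 + n2) - 9)  # small-sample correction factor
    g = j * (m1 - m2) / sp
    var = 1 / n1 + 1 / n2 + g ** 2 / (2 * (n1 + n2))  # large-sample approximation
    return g, var


def random_effects_dl(effects, variances):
    """Random-effects pooling with Cochran's Q and I^2 (in %).

    Uses the DerSimonian-Laird tau^2 estimator (metafor's rma() defaults to REML).
    """
    k = len(effects)
    w = [1 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = k - 1  # Q is referred to a chi-square distribution with k-1 df
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c)
    w_re = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0  # Higgins-Thompson form
    return {"pooled": pooled, "se": se, "tau2": tau2, "Q": q, "df": df, "I2": i2}
```

As a toy illustration (not study data), two cohorts with effects 0.0 and 1.0, each with variance 0.1, give Q = 5, τ² = 0.4 and I² = 80%, showing how between-cohort spread inflates τ² and I².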
Results
Brain age predictions
Age prediction in the training set using 10-fold cross validation revealed high correlations between chronological and predicted age for the main model including all features (r=.855, 95% CI: .845-.865, MAE=7.28, RMSE=9.37).
Figure 1B shows predicted age plotted as a function of chronological age for the test set when using the full feature set, and Table 1 summarizes prediction accuracy for all six models. Age prediction accuracy for the full model was high in HC (r=.593, MAE=7.98, RMSE=10.1), and in patients with BD (r=.576, MAE=8.89, RMSE=11.4) and SZ (r=.553, MAE=9.47, RMSE=12.00). While all models performed relatively well, prediction accuracy was highest for the full model, and the global mean skeleton model outperformed the ROI-based single-metric models.
Group differences in BAGR
Figure 1C shows the distributions of BAGR within each group, and Table 1 and Supplemental Figure S3 summarize the results from the group comparisons. Briefly, all models revealed significant main effects of group, with higher BAGR in patients with SZ and BD compared to HC. The FA model yielded the strongest effect size for the main group effect, although the full, mean skeleton, MD and RD models revealed similar and converging patterns. All analyses revealed higher BAGR in BD and SZ compared to HC, with effect sizes ranging between d=0.1 and d=0.34. The model based on AD revealed less consistent results, and was the only model not showing a significant group difference between SZ and HC.
Meta-analysis and heterogeneity in effects between cohorts
Figure 2 shows a forest plot summarizing the results from the meta-analytical approach for the full model. Supplemental Figures S4-S8 show results from the other models. In short, the results revealed a significantly higher brain age gap in SZ and BD compared to HC, with moderate effect sizes. The analysis did not support a group difference in brain age gap between BD and SZ. Whereas the effect sizes varied slightly between cohorts for the full model, the Q and I² statistics indicated low and non-significant heterogeneity. Supplemental Figure S9 shows each cohort’s contribution to the heterogeneity and its influence on the result of the meta-analysis.
Quality control
Supplemental Figure S10 summarises the results from the multistep QA. Briefly, higher BAGR was observed in SZ and BD compared to HC across all levels of QA, with highly similar effect sizes.
Discussion
The aetiology of severe mental disorders has a substantial neurodevelopmental component, which is reflected, among other characteristics, in altered brain maturational trajectories during the formative years of childhood and adolescence and in group-level differences in adult patient populations. Along with evidence of genetic and clinical overlap with several aging-related conditions, including cardiovascular risk factors and increased mortality, the neurodevelopmental account supports the need for a dynamic lifespan perspective in the search for disease mechanisms. Here, in ten different cohorts comprising healthy controls and patients with SZ and BD, we used machine learning to estimate brain age from DTI-based indices of white matter structure and organization. This novel approach yielded five main results. First, in a large independent training set we found high accuracy of brain age prediction across the adult lifespan using DTI features, supporting the feasibility and sensitivity of the approach. Second, applying the models to an independent test set revealed a significantly higher brain age gap in patients with SZ and BD compared to HC. Third, follow-up meta-analysis and tests of heterogeneity suggested high consistency across independent cohorts and scanners. Fourth, brain age models based on FA showed higher sensitivity than models based on the other metrics, both alone and combined. Finally, a reduced set of global mean skeleton features and a larger set of regional atlas-based features yielded highly converging results. We next discuss the implications of these findings in more detail.
Brain age prediction provides an informative summary measure that may serve as a proxy for brain integrity and health across normative and clinical populations. Neuroimaging-derived white and grey matter phenotypes carry distinct biological information about brain integrity, and tissue-specific brain age models may provide higher sensitivity and specificity to relevant biological processes than conventional models based on grey matter features alone (27). DTI has been broadly applied in clinical neuroscience research due to its proposed sensitivity to microstructural properties of brain tissue. However, previous studies documenting higher brain age in patients with severe mental disorders were based on grey matter models only (5, 28, 29). To test whether previous findings of clinical deviations from normative grey matter trajectories generalize to white matter, we performed brain age prediction using different combinations of DTI-based metrics. In line with previous findings (27), we obtained high age prediction accuracy across most models. Supporting previous evidence that regional DTI-based indices of brain aging reflect relatively low-dimensional and global processes (12, 52), we found similar age prediction accuracy for the reduced model comprising global mean skeleton values only and the model including the extended set of regional features. Although brain aging shows some regional heterogeneity, these findings suggest that most of the information relevant for brain age prediction is captured at the global level. This conjecture is also supported by a recent twin study demonstrating that a large proportion of the estimated heritability of specific tracts is accounted for by a global factor (53).
Likewise, we found that the sensitivity to group differences was not strongly dependent on the inclusion of the full feature set. Indeed, the effect size obtained when comparing patients with SZ and HC was slightly higher for the global mean skeleton model than for the full model. These findings are in line with recent evidence of anatomically widely distributed group differences between healthy controls and patients with SZ (15). Interestingly, the largest effect when comparing SZ and HC was obtained for the FA-only model, supporting the sensitivity of FA to clinical differences in WM properties (15, 16). Higher predicted brain age in the patient groups compared to healthy controls may indicate accelerated brain aging in patients with severe mental disorders. However, our cross-sectional design does not permit any inference about brain aging per se, and previous reports of relatively age-invariant group differences in brain volumetry (21) and DTI indices (16) suggest that the observed group differences in brain age may instead reflect differences that accumulate early in life. Because the current study included adults only, we cannot address the maturational trajectories in the formative years. Although the use of diffusion MRI as the basis for age prediction is novel, higher grey matter brain age has been shown in several brain and mental disorders (29, 54). We extend these previous findings by documenting higher DTI-based white matter brain age in both SZ and BD and, although with moderate effect sizes, we show that the effects are relatively robust across cohorts and scanners, with only minor heterogeneity in effect sizes between cohorts.
We found no significant difference in BAGR between BD and SZ, supporting previous evidence of partly overlapping clinical and biological characteristics between these two diagnostic categories (16, 55, 56). While the current results support the existence of a common set of mechanisms across disorders, future studies utilizing a broader range of imaging modalities in combination with specific genetic, clinical (symptoms, cognitive function etc.) and biological phenotypes may allow for the identification of specific diagnostic signatures and subgroups. However, inherent limitations of the classical case-control design in mental health research have recently been emphasized using neuroimaging data (24, 25). In particular, the current lack of biologically informed diagnostic criteria should motivate future studies to consider alternative approaches that promote a novel clinical nosology based both on symptomatology and data-driven clustering (57) and on brain-based and biological phenotypes cutting across diagnostic boundaries.
Our results document robust group-level deviations in white matter structure, manifesting as older-appearing brains in patients with severe mental disorders compared to their healthy peers. Whereas DTI-based markers are sensitive to a range of biological and anatomical characteristics, their limited specificity does not allow for inference on the distinct neurobiological mechanisms involved. Myelin integrity and myelin packing density are among the proposed candidate mechanisms for observed changes in DTI metrics (34, 58, 59), but the specificity is low, and the current results probably reflect a combination of different neurobiological processes and macroanatomical differences. Previous evidence has implicated myelin-related abnormalities and neuroinflammation both in the pathophysiology of severe mental disorders and in brain aging (60–63). Future studies may benefit from advanced diffusion models based on multi-shell diffusion MRI, which allow for stronger inference on the microstructural milieu of the brain tissue, including microstructural indices based on different diffusion scalar metrics (e.g., Neurite Orientation Dispersion and Density Imaging (NODDI) (64, 65), diffusion kurtosis imaging (DKI) (66), white matter tract integrity (WMTI) (67) and restriction spectrum imaging (RSI) (68)).
In line with previous findings of widely distributed effects in well-powered studies of brain aging (12) and schizophrenia (15), we found similar age prediction accuracy and subsequent group differences in brain age for the model including only global mean skeleton values and the model including a range of regional features extracted from atlas-based tracts and regions of interest. Although specific symptoms and clinical traits may map preferentially onto specific neuroanatomical subsystems (see e.g. (19)), these novel results suggest that a large proportion of the relevant variance associated with age, and of the corresponding deviations in the patient groups, is captured by primarily global brain processes. This has implications for our understanding of the anatomical heterogeneity and dimensionality of brain aging and severe mental illness.
In addition to the anatomical distribution of effects, the spatiotemporal dynamics of brain development and aging and their deviations in patients with mental disorders remain unclear. The individual-level onset and rate of the group-level deviations from the normative white matter trajectory are unknown and can only be inferred using longitudinal designs covering sensitive periods of neurodevelopment. Previous studies have shown both delayed neurodevelopment during adolescence (18) and accelerated aging in adulthood (5) in patients with severe mental disorders. These observations are not mutually exclusive, and future studies should aim at disentangling the lifespan dynamics, e.g. by including individuals with a wider age range and pursuing longitudinal designs in individuals across a wide range of functional levels and risk. The latter may be particularly pertinent to disentangle primary disease mechanisms from secondary factors related to the disease, including medication and lifestyle factors such as nutrition, physical activity and education. Although possible effects of psychotropic drugs on the brain are a topic of great interest and importance (69–71), the current design, in common with other studies employing a cross-sectional and non-randomised design, does not allow us to make inferences about the effects of medication and other clinical and lifestyle factors on brain age; these should be investigated in future, appropriately designed studies. Meanwhile, previous studies reporting associations with medication status in smaller samples need to be interpreted in light of the lack of significant associations in the largest DTI study to date (15).
In conclusion, in this multi-sample study including patients from 10 different cohorts, we report higher brain age in patients with SZ and BD compared to HC using various DTI-based indices of white matter structure and organization. Although the effect sizes were modest, our unique design allowed us to specifically quantify the heterogeneity and robustness of effects across cohorts and scanners, supporting brain age prediction using diffusion MRI as a sensitive marker in the clinical neurosciences.
Acknowledgements
This work was funded by the South-Eastern Norway Regional Health Authority (2014097, 2015073, 2016083, 2016044), the Western Norway Regional Health Authority (911820, 911679), the Research Council of Norway (213700, 204966, 249795, 223273, 213727), KG Jebsen Stiftelsen and the European Commission’s 7th Framework Programme (#602450, IMAGEMEND). KaSP was supported by grants from the Swedish Medical Research Council (SE: 2009-7053; 2013-2838; SC: 523-2014-3467), the Swedish Brain Foundation, Åhlén-stiftelsen, Svenska Läkaresällskapet, Petrus och Augusta Hedlunds Stiftelse, Torsten Söderbergs Stiftelse, the AstraZeneca-Karolinska Institutet Joint Research Program in Translational Science, Söderbergs Königska Stiftelse, Professor Bror Gadelius Minne, Knut och Alice Wallenbergs stiftelse, Stockholm County Council (ALF and PPG), Centre for Psychiatry Research, and KID-funding from the Karolinska Institutet. Data collection and sharing for this project was provided by the Cambridge Centre for Ageing and Neuroscience (CamCAN). CamCAN funding was provided by the UK Biotechnology and Biological Sciences Research Council (grant number BB/H008217/1), together with support from the UK Medical Research Council and University of Cambridge, UK. CCNMD was supported through NIH Grants P50 MH071616 and R01 MH56584. CNP was supported by the Consortium for Neuropsychiatric Phenomics (NIH Roadmap for Medical Research grants UL1-DE019580, RL1MH083268, RL1MH083269, RL1DA024853, RL1MH083270, RL1LM009833, PL1MH083271, and PL1NS062410). The HUBIN project was supported by the Swedish Research Council (2006-2992, 2006-986, K2007-62X-15077-04-1, 2008-2167, K2008-62P-20597-01-3, K2010-62X-15078-07-2, K2012-61X-15078-09-3, 2017-00949, K2015-62X-15077-12-3), the regional agreement on medical training and clinical research between Stockholm County Council and the Karolinska Institutet, and the Knut and Alice Wallenberg Foundation.
StrokeMRI was supported by the Research Council of Norway (249795, 248238), the South-Eastern Norway Regional Health Authority (2014097, 2015044, 2015073, 2016083), and the Norwegian ExtraFoundation for Health and Rehabilitation (2015/FO5146). Data collection and sharing for the NMorphCH project was funded by NIMH grant A R01 MH056584. The BergenPsykose project was supported by the European Research Council (ERC AdG) (693124), and Western Norway Health-Authorities (912045).