Abstract
The expanding behavioral repertoire of the developing brain during childhood and adolescence is shaped by complex brain-environment interactions and flavored by unique life experiences. The transition into young adulthood offers opportunities for adaptation and growth, but also increased susceptibility to environmental perturbations, such as the characteristics of social relationships, family environment, quality of schools and activities, financial security, urbanization and pollution, drugs, cultural practices, and values, that all act in concert with our genetic architecture and biology. Our multivariate brain-behavior mapping in 7,577 children aged 9-11 years across 585 brain imaging phenotypes, and 617 cognitive, behavioral, psychosocial and socioeconomic measures revealed three population modes of brain co-variation, which were robust as assessed by cross-validation and permutation testing, taking into account siblings and twins, identified using genetic data. The first mode revealed traces of perinatal complications, including pre-term and twin-birth, eclampsia and toxemia, shorter period of breast feeding and lower cognitive scores, with higher cortical thickness and lower cortical areas and volumes. The second mode reflected a pattern of socio-cognitive stratification, linking lower cognitive ability and socioeconomic status to lower cortical thickness, area and volumes. The third mode captured a pattern related to urbanicity, with particulate matter pollution (PM2.5) inversely related to home value, walkability and population density, associated with diffusion properties of white matter tracts. These results underscore the importance of a multidimensional and interdisciplinary understanding, integrating social, psychological and biological sciences, to map the constituents of healthy development and to identify factors that may precede maladjustment and mental illness.
Introduction
The complexity and idiosyncratic characteristics of the human mind originate in an intricate web of interactions between genes, brain circuits, behaviors, economic, social and cultural factors during childhood and adolescence. The major life changes associated with the transition into young adulthood offer opportunities for adaptation and growth, but also increased susceptibility to detrimental perturbations, such as the characteristics of social and parental relationships, family environment, quality of schools and activities, economic security, urbanization and pollution, drugs, cultural practices, and values, that all act in concert with our genetic architecture and biology. A multidimensional understanding of the interplay of these factors is paramount to map the constituents of healthy development and to identify factors that may precede maladjustment and mental illness.
Population-based neuroimaging now allows us to take a bird's-eye view of this stupendous multiplicity, and to bring hitherto unseen patterns into focus1. The Adolescent Brain Cognitive Development (ABCD) study2 provides brain images of more than 10,000 children aged 9-11 years across the US and includes a broad range of cognitive, behavioral, clinical, psychosocial and socioeconomic measures. While each neuroimaging feature typically explains a minute amount of unique variance in behavioral outcome3, 4, their combined predictive value is non-negligible, including predictive patterns for identification of individuals5, 6 and characteristics such as age7, 8, cognitive ability9 and psychopathology9. This added value of multivariate and combinatorial approaches for prediction of complex traits is highly analogous to the substantial polygenic accumulation of small effects in the genetic architecture of complex human traits and disorders10, 11.
Adolescence is a transition period between childhood and adulthood and a period of protracted brain maturation, associated with heightened sensitivity to the social and cultural environment12. For most individuals, this transition results in successful acquisition of the skills and coping strategies required for adulthood and subsequent independence from caregivers; however, it is also a period of increased risk for mental health issues13, with possible life-long repercussions. Mapping positive and negative factors impacting the brain as well as psychological adjustment during the transition from childhood to adulthood is therefore of pivotal importance. Combining levels of information using latent-variable approaches which model all available information may reveal interpretable patterns among multiple brain imaging features and variables such as cognition and socio-demographics4, 14. One recent example revealed that a wide range of cognitive, clinical and lifestyle measures constitute a “positive-negative” dimension associated with adult brain network functional connectivity1.
Here we used an analogous approach in 7,577 children aged 9-11 from the ABCD study, collected at 21 sites across the US, combining canonical correlation analysis (CCA) with independent component analysis (ICA) to derive population-level modes of co-variation, linking behavioral, psychosocial, socioeconomic and demographical variables (behavioral measures) to a wide set of neuroimaging phenotypes. Each resulting mode represents an association between a linear combination of behavioral measures and a separate combination of imaging features that show similar variation across participants14. In order to avoid overfitting, which is particularly important when employing data-driven approaches, and due to the high number of inter-correlated features, CCA was performed after data reduction with principal component analysis, and robustness and reliability of the identified modes were assessed using stratified cross-validation and permutation testing with restricted exchangeability, taking into account siblings and twins based on participants’ genetic data. To express results in the original variable space, CCA-ICA subject weights were correlated back into the original data. Based on earlier reports of population-level associations between measures of life outcomes and brain connectivity1 and structure15 in adults, the known and rising socioeconomic inequalities in the US16, as well as the impact of socioeconomic factors on child brain development17, 18, we expected to find traces of social stratification in the child brain.
Methods
ABCD data access
We accessed MRI, behavioral, clinical and genetic data from ABCD Annual curated release 2.0.1. The data as well as release notes including documentation of measures, scanning protocols and imaging QC can be accessed using the following NIMH data archive DOI: http://dx.doi.org/10.15154/1503209.
Behavioral, clinical, cognitive and demographical data
Tabulated data were imported and processed using R (https://cran.r-project.org). We accessed data from 11,853 participants. Supplementary Table 1 lists the behavioral measures included in the analysis. We used the function ‘nearZeroVar’ from the R-package ‘caret’ (v. 6.0-81, https://github.com/topepo/caret/) to identify and exclude any continuous variables with zero or near-zero variance, and categorical variables with a ratio of > .95 for the most common compared to the second most common response. For each remaining variable we derived robust z-scores by expressing each score's absolute deviation from the median in units of the median absolute deviation19 (MAD), and removed values with a z > 4 (4 x MAD). Those with a z > 3 were manually inspected: e.g. measures of facility income, time spent on phone and several measures of area deprivation have scores with z > 3, but were kept in the analysis. We then excluded variables with less than 90% of retained datapoints, before excluding subjects with less than 90% retained data across the retained variables. The remaining subjects (n=11,809) were included for further analysis.
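The outlier rule described above can be illustrated with a minimal Python sketch. It assumes that the robust z-score is the absolute deviation from the median divided by the unscaled MAD, which is one reading of "4 x MAD"; the function names `robust_z` and `mask_outliers` and the example values are hypothetical and are not taken from the analysis code.

```python
from statistics import median


def robust_z(values):
    """Absolute deviation from the median, expressed in units of the MAD."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        raise ValueError("MAD is zero; robust z-scores are undefined")
    return [abs(v - med) / mad for v in values]


def mask_outliers(values, threshold=4.0):
    """Replace values whose robust z exceeds the threshold with None (missing)."""
    z = robust_z(values)
    return [v if zi <= threshold else None for v, zi in zip(values, z)]


# Hypothetical scores: the last value lies far from the bulk of the data.
scores = [10, 11, 12, 13, 14, 15, 100]
cleaned = mask_outliers(scores)
```

Values flagged in this way are set to missing rather than dropped, so that the subsequent 90% completeness criteria can be applied to the same matrix.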
MRI Imaging derived phenotypes
We accessed T1 and T2 (n=11,534) and DWI (n=11,400) tabulated data from ABCD curated release 2.0.1. Supplementary Table 2 lists the MRI features included in the analysis. We included participants who passed quality assurance using the recommended QC parameters (T1: n=11,359, T2: n=10,476, DWI: n=10,414) described in the ABCD 2.0.1 Imaging Instruments Release Notes and who had all included modalities available (n=9,811). ABCD preprocessing and QC steps are described in detail in the methodological reference for the ABCD Study by Hagler et al20. For each included imaging phenotype, we expressed each score's deviation in units of the median absolute deviation (MAD), and removed values exceeding 3 MAD. Subjects with less than 90% of features retained in any of the imaging modalities, and features with less than 90% of retained subjects, were excluded from analysis. The remaining subjects (n=9,016) were included for further analysis.
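The 90% completeness criteria can be sketched as follows. This is an illustration only: it assumes that removed values are marked as missing (`None`), and that features are filtered before subjects, as stated for the behavioral data; the order for the imaging data is not specified in the text. The function name `completeness_filter` and the toy matrix are hypothetical.

```python
def completeness_filter(rows, min_fraction=0.9):
    """rows: one list per subject, with None marking removed or missing values.

    Returns (kept_subject_indices, kept_column_indices).
    """
    n_subjects = len(rows)
    n_columns = len(rows[0])
    # Step 1: keep columns observed in at least min_fraction of subjects.
    kept_columns = [
        j for j in range(n_columns)
        if sum(r[j] is not None for r in rows) / n_subjects >= min_fraction
    ]
    # Step 2: keep subjects observed in at least min_fraction of kept columns.
    kept_subjects = [
        i for i, r in enumerate(rows)
        if sum(r[j] is not None for j in kept_columns) / len(kept_columns) >= min_fraction
    ]
    return kept_subjects, kept_columns


# Toy matrix: 10 subjects x 3 features, with a few values removed.
rows = [[1.0, 2.0, 3.0] for _ in range(10)]
rows[0][0] = None   # feature 0 is still observed in 9 of 10 subjects
rows[1][2] = None   # feature 2 is observed in only 8 of 10 subjects
rows[2][2] = None
kept_subjects, kept_columns = completeness_filter(rows)
```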
Genetic data
We accessed genetic data for 10,627 participants to identify siblings and twins. We used genome-wide complex trait analysis21 to create a genetic relationship matrix after performing the following filtering: removal of SNPs in the major histocompatibility complex (25:35 Mb region on chr6) and the inversion region of chr8 (7:13 Mb); SNPs with genotyping rate <99%, minor allele frequency < 5%, pairwise pruning of SNPs in linkage disequilibrium (r2 > 0.2, window of 5,000, step of 500). To account for siblings and twins in the dataset, three groups were created based on the following genetic relatedness cut-offs, <.4, >.4 & <.6, and >.8, with the two latter groups containing pairs of siblings, and used for stratified cross-validation and creation of permutation exchangeability blocks.
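As an illustration of the grouping step, the following Python sketch bins a pairwise genetic relatedness value using the cut-offs given above. The group labels ("low", "mid", "high") and the function name `relatedness_group` are hypothetical. Values falling between .6 and .8, or exactly on a cut-off, are not covered by the stated ranges and are therefore returned as `None` here rather than assigned by guesswork.

```python
def relatedness_group(value):
    """Assign a pairwise relatedness value to one of the three stated ranges."""
    if value < 0.4:
        return "low"
    if 0.4 < value < 0.6:
        return "mid"
    if value > 0.8:
        return "high"
    return None  # not covered by the cut-offs stated in the text


# Hypothetical relatedness values for four pairs of participants.
pairs = {("a", "b"): 0.02, ("c", "d"): 0.51, ("e", "f"): 0.98, ("g", "h"): 0.70}
groups = {pair: relatedness_group(value) for pair, value in pairs.items()}
```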
Canonical correlation analysis
We performed CCA22 using MATLAB R2019b. Participants with MRI, behavioral and genetic data (n=7,577) were included. We applied a rank-based normal transformation to the behavioral/clinical data using ‘palm_inormal’ from FSL's PALM23 (v. 0.52, https://fsl.fmrib.ox.ac.uk/fsl/fslwiki/PALM). Next, we residualized all measures with respect to age and sex using linear models. Imaging phenotypes were also residualized for site/scanner, and volumetric features were also corrected for estimated total intracranial volume (eTIV from Freesurfer). eTIV was also included as a variable in the analysis to capture associations with global volume, in addition to the eTIV-corrected volumes capturing associations with regional specificity. For both MRI and behavioral measures, missing values were imputed with ‘knnimpute’, replacing missing data based on the k nearest-neighbor columns using Euclidean distance (k=3). An alternative approach without imputation is described below and did not change results. Data were then z-normalized and submitted (separately for imaging and behavioral data) to PCA (Supplementary Fig. 1), to avoid issues with rank deficiency and to increase robustness of estimated modes by avoiding fitting to noise. We extracted the first 200 components for both the imaging and behavioral data, and submitted these to CCA.
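The sequence of steps above (rank-based normal transformation, residualization, PCA, CCA) can be sketched in Python with NumPy. This is a simplified illustration on simulated data, not the MATLAB pipeline used here: imputation and the site/eTIV corrections are omitted, the rank offset in `inverse_normal` is a simple choice that may differ from the one used by ‘palm_inormal’, and all function names, dimensions and the number of components are hypothetical.

```python
import numpy as np
from statistics import NormalDist


def inverse_normal(x):
    """Rank-based inverse normal transform of a 1-D array (ties broken by order)."""
    ranks = np.argsort(np.argsort(x)) + 1
    return np.array([NormalDist().inv_cdf((r - 0.5) / len(x)) for r in ranks])


def residualize(data, confounds):
    """Remove the linear effect of the confounds (plus an intercept) from each column."""
    design = np.column_stack([np.ones(len(confounds)), confounds])
    beta, *_ = np.linalg.lstsq(design, data, rcond=None)
    return data - design @ beta


def pca_scores(data, n_components):
    """z-normalize the columns and return the leading principal component scores."""
    z = (data - data.mean(axis=0)) / data.std(axis=0)
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


def cca(x, y):
    """Canonical correlations and canonical variates via QR and SVD."""
    qx, _ = np.linalg.qr(x - x.mean(axis=0))
    qy, _ = np.linalg.qr(y - y.mean(axis=0))
    u, r, vt = np.linalg.svd(qx.T @ qy)
    return r, qx @ u, qy @ vt.T


# Simulated data: one shared latent factor drives both sets of measures.
rng = np.random.default_rng(0)
n = 500
latent = rng.standard_normal(n)
age_sex = rng.standard_normal((n, 2))            # stand-in confounds
behaviour = latent[:, None] + rng.standard_normal((n, 6))
imaging = latent[:, None] + rng.standard_normal((n, 8))

behaviour = np.column_stack([inverse_normal(col) for col in behaviour.T])
behaviour = residualize(behaviour, age_sex)
imaging = residualize(imaging, age_sex)
r, beh_variates, img_variates = cca(pca_scores(behaviour, 3), pca_scores(imaging, 3))
```

In this sketch `r` holds the canonical correlations in descending order, and the first pair of columns in `beh_variates` and `img_variates` corresponds to the strongest mode.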
Cross-validation
To assess the reliability and generalizability of the resulting CCA-modes we performed the following 10-fold cross-validation procedure: For each iteration (n=100) of the cross-validation loop the dataset was randomly divided into 10 folds, stratified by the genetic relatedness groups, and ensuring that sibling and twin pairs were kept together to avoid training on one sibling/twin in a pair and testing on the other. While keeping each fold (10% of participants) out once, we submitted the remaining data (90% of participants) to PCA (separately for behavioral and MRI data) and then to CCA. Next we multiplied the kept-out behavioral measure and MRI feature matrices with the estimated PCA coefficient matrices, before multiplying the resulting PCA scores with the canonical coefficients and then correlating the resulting CCA scores. Finally, we took the average of these canonical correlations across the 10 folds (Supplementary Fig. 2). This procedure was repeated 100 times to derive mean canonical correlations for kept-out data, which were used for calculating p-values after permutation testing. We also correlated the CCA subject measure and MRI coefficients derived for kept-out participants with those from the full analysis.
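The fold assignment described above can be illustrated with a short Python sketch in which related participants always share a fold. It assumes that each participant carries a family identifier and that each family belongs to a single relatedness stratum; both the identifiers and the function name `family_folds` are hypothetical, and the stratification shown (dealing families round-robin within each stratum) is one simple way to balance strata across folds, not necessarily the one used in the analysis.

```python
import random
from collections import defaultdict


def family_folds(family_ids, strata, n_folds=10, seed=0):
    """Assign participants to folds so that members of a family share a fold."""
    rng = random.Random(seed)
    families_by_stratum = defaultdict(list)
    for family, stratum in sorted(set(zip(family_ids, strata))):
        families_by_stratum[stratum].append(family)
    fold_of_family = {}
    counter = 0
    for stratum in sorted(families_by_stratum):
        families = families_by_stratum[stratum]
        rng.shuffle(families)
        # Deal families round-robin so each stratum is spread across folds.
        for family in families:
            fold_of_family[family] = counter % n_folds
            counter += 1
    return [fold_of_family[family] for family in family_ids]


# Toy sample: 20 singletons and 10 sibling pairs (40 participants in total).
family_ids = list(range(20)) + [20 + i // 2 for i in range(20)]
strata = ["unrelated"] * 20 + ["related"] * 20
folds = family_folds(family_ids, strata, n_folds=5)
```

Within each fold, PCA and CCA would then be estimated on the remaining participants only and applied to the kept-out participants, as described above.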
Permutation testing
To assess significance of the resulting CCA-modes, we ran 1000 iterations of the same 10-fold cross-validation procedure described above, but with the order of participants of the imaging phenotype matrix randomly permuted in each iteration, respecting twin/sibling relationships, and collecting canonical correlations for the kept rather than the kept-out data to account for overfitting by the CCA. We then collected the maximum canonical correlation across CCA-modes (i.e. mode 1) for each permutation to form a null distribution from which to calculate familywise error (FWE) corrected p-values. P-values for each of the CCA-modes were calculated by dividing the count of permuted maximum R-values (including the observed value) >= the mean of cross-validated R-values by the number of permutations. CCA-modes with a corrected p-value < .01 were included for further analysis (Supplementary Fig. 2).
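The max-statistic p-value calculation can be sketched in Python as follows. The sketch assumes that the observed value is treated as one additional member of the null distribution, so that the denominator is the number of permutations plus one; the wording above could also be read as dividing by the number of permutations alone. The function name `fwe_p_value` and the toy null values are hypothetical.

```python
def fwe_p_value(observed_r, null_max_r):
    """Share of the max-statistic null distribution (plus the observed value) >= observed."""
    exceed = 1 + sum(r >= observed_r for r in null_max_r)
    return exceed / (len(null_max_r) + 1)


# Toy null distribution: maximum canonical correlation from nine permutations.
null_max_r = [0.10, 0.12, 0.08, 0.15, 0.11, 0.09, 0.13, 0.14, 0.07]

# Every mode is compared against the same max-statistic null distribution,
# which is what provides familywise error control across modes.
p_strong = fwe_p_value(0.61, null_max_r)
p_weak = fwe_p_value(0.12, null_max_r)
```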
CCA-ICA
The canonical variates become increasingly difficult to interpret due to their orthogonality. Since we had more than one significant mode, and following procedures described by Miller et al14, we used ICA to obtain more interpretable modes: we extracted and combined the behavioral and MRI CCA-scores for the three significant variates, correlated these with the original data matrix, transformed the correlations using a Fisher Z-transform, and submitted these to ICA. We estimated three components (the number of significant and extracted CCA covariance-modes) using fastICA24. To assess the reliability and generalizability of the ICA decomposition we reran 100 iterations of the 10-fold cross-validation procedure described above, this time including ICA estimation after the PCA and CCA step, and then correlated ICA subject-weights derived from kept-out data to those from the full analysis (Supplementary Fig. 2). To assess and plot the significant CCA-ICA modes in the full original variable space, we correlated the subject weights for each CCA-ICA mode with the original age- and sex (+ eTIV) adjusted matrices. For each significant mode of population covariation, we also plotted the variable text/descriptions for the 35 variables with the highest explained variance in the original adjusted data (lists of all variables and associated descriptions, correlations and ICA weights can be found in Supplementary Tables 3-5). ICA subject weight histograms are shown in Supplementary Fig. 3. The explained variance of single variables ranged from 10% to 40% for the most highly involved items on these population modes, which is in a similar range as reported employing a similar approach in the adult UK Biobank sample14. For visualization purposes, we produced scatter-plots using the highest-loading variables for each mode, color-coded by each individual’s score on the respective modes (Supplementary Fig. 4).
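The step that precedes the ICA, correlating the CCA scores with the original variables and applying a Fisher Z-transform, can be sketched in Python with NumPy. The sketch uses simulated data and hypothetical names (`score_data_correlations`, `fisher_z`); the ICA itself is not shown, and the resulting modes-by-variables matrix would be passed to an ICA routine such as fastICA.

```python
import numpy as np


def score_data_correlations(scores, data):
    """Pearson correlation of each score column with each data column (modes x variables)."""
    zs = (scores - scores.mean(axis=0)) / scores.std(axis=0)
    zd = (data - data.mean(axis=0)) / data.std(axis=0)
    return zs.T @ zd / len(scores)


def fisher_z(r):
    """Fisher Z-transform of correlation coefficients."""
    return np.arctanh(r)


# Simulated stand-ins: 200 subjects, 3 CCA score columns, 5 original variables.
rng = np.random.default_rng(1)
scores = rng.standard_normal((200, 3))
data = rng.standard_normal((200, 5)) + scores[:, [0]]  # variables load on the first mode

corr = score_data_correlations(scores, data)
z = fisher_z(corr)
```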
Consistency across sex and race/ethnicity
To assess the degree of similarity of the patterns across the sexes, we split the CCA-ICA subject weights by sex (Supplementary Figs. 5–7) and compared sex-specific subject-weight-with-variable correlations to those estimated for the full analysis. Correlations for the three modes ranged between r=.95 and r=1. The aim of this work was not to make comparisons of population sub-groups, but to detect general population patterns. Since many of the included indicators relating to inequality and socioeconomics are known to differ between ethnic minority groups, we did not regress these variables out of the data. To show that the detected patterns are generalizable we computed subject-weight-with-variable correlations for groups based on parent-ascribed race/ethnicity (Supplementary Figs. 8–10), excluding those with a frequency < 5% of the total sample (retaining “black”, “white” and “other”), and compared these to the full analysis. Correlations for the three modes ranged between r=.76 and r=1. These results indicate that the patterns are generalizable across sexes and ethnicity/race.
Consistency across sites
To further assess the generalizability and robustness of the CCA-ICA patterns we computed site/scanner-wise CCA-ICA variable-correlations, and correlated these with those from the full model (Supplementary Figs. 11–13). The patterns of the three modes are mostly consistent across sites, but with some sites deviating more from the full analysis modes than others (r=.22 to r=.85). All imaging phenotypes were adjusted for site; however, ABCD collects data at 21 sites across the continental US (https://abcdstudy.org/about) and population-level demographical differences are expected.
Alternative approach without imputation
To ensure that the results were not affected by the imputation procedure for missing data points, we also used an alternative and previously described approach1 in which we estimated the subject x subject covariance matrix, ignoring missing values, before projecting this approximated covariance matrix to the nearest positive-definite covariance matrix using the MATLAB tool ‘nearestSPD’ (https://www.mathworks.com/matlabcentral/fileexchange/42885-nearestspd), thereby avoiding the need for imputation of missing values. The correlations between the CCA scores for the first three modes between the original analysis using imputation and this approach were r= .99, r=.96 and r=.96, respectively.
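The projection step can be illustrated with a short NumPy sketch. It is not the ‘nearestSPD’ implementation: it assumes that clipping the eigenvalues of the symmetrized matrix at a small positive floor is an acceptable stand-in, and the function name `nearest_positive_definite`, the floor value and the example matrix are hypothetical.

```python
import numpy as np


def nearest_positive_definite(matrix, floor=1e-10):
    """Symmetrize, then clip eigenvalues at a small positive floor."""
    sym = (matrix + matrix.T) / 2
    vals, vecs = np.linalg.eigh(sym)
    # Rebuild the matrix from its eigenvectors and the clipped eigenvalues.
    return (vecs * np.clip(vals, floor, None)) @ vecs.T


# A symmetric matrix with unit diagonal that is not positive definite, as can
# arise when each element is estimated from a different subset of variables.
approx_cov = np.array([
    [1.0, 0.9, -0.9],
    [0.9, 1.0, 0.9],
    [-0.9, 0.9, 1.0],
])
fixed = nearest_positive_definite(approx_cov)
```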
Alternative number of PCA components
To investigate the impact of choosing a stricter criterion for the number of retained PCA components, we reran the analysis with 100 PCA components, and compared the resulting CCA scores for the first three modes with those from the original analysis, yielding correlations of r=.98, r=.91, and r=.91, respectively.
Adjusting data for age²
To address the possibility of non-linear relationships between age and the various demographic, clinical and MRI features, we reran the analysis with age² added along with the original confound variables, and compared the resulting CCA-ICA subject weights for the first three modes with those from the original analysis, yielding correlations of r=.99, r=.96 and r=.97, respectively.
Results
We identified three distinct modes of co-variation, linking brain features to perinatal and early life events, socio-cognitive factors and urbanicity (Fig. 1). Canonical correlations for the first three modes were significant and robust as assessed by 10-fold cross-validation and permutation testing (out-of-sample r=.61, r=.42 and r=.38, respectively; all permuted p = 0.001; Supplementary Fig. 2).
Mode 1 links perinatal factors and obstetric complications to cognitive ability and brain morphology in late childhood (Fig. 2). Having a twin, premature birth, birth complications requiring oxygen, Caesarian section, (pre)-eclampsia, toxemia and jaundice are associated with shorter duration of breast feeding, parent-reported delayed motor development and lower cognitive scores, and are linked to a pattern of cortical morphometry and white matter diffusion measures in several brain regions, with lower cortical volume and area and higher thickness in middle temporal, lateral orbitofrontal and inferior parietal cortex among the highest-loading imaging features.
Mode 2 captures a pattern of economic deprivation and poverty, with the highest loading measures being related to the area deprivation index (ADI), such as parent unemployment, neighborhood median household income, income disparity and violence (Fig. 3). The mode links these measures to lower maternal age at child birth, lower parent education level, unplanned pregnancy, shorter duration of breast feeding, higher number of half-siblings, higher levels of religiosity, and the child having fewer nightly sleep hours on average, lower grades in school and worse performance on cognitive tests, jointly forming a dimension of socio-cognitive stratification. This dimension is associated with lower cortical thickness, area and volume, with total volume, lateral occipital cortical volumes and thickness, and bilateral lingual thickness among the highest-loading imaging features.
Mode 3 links higher particulate matter air pollution (PM2.5) and area deprivation to lower population density, lower levels of NO2, lower neighborhood walkability, lower home value and rent, but higher home ownership percentage, higher number of half-siblings, and living in a state which has not legalized marijuana for medical use (as of 2016). The mode (Fig. 4, Supplementary Table 5) is further associated with reporting emerging signs of puberty such as body hair and with lower area and volumes across the cortex, as well as with white matter indices such as fractional anisotropy (FA), radial diffusivity (RD), neurite density (ND) and tract volumes, with the highest loading measures being related to the parahippocampal cingulum, the uncinate fasciculus, and corpus callosum.
Discussion
Adolescence is a transition period between childhood and adulthood, associated with heightened sensitivity to the social and cultural environment12. While for most individuals the transition results in successful acquisition of the skills and coping strategies required for adulthood and subsequent independence from caregivers, it also coincides with increased risk for mental health issues and psychological maladjustment13. Research addressing the social, economic and environmental conditions affecting adolescent development, facilitating health and leading to fulfilling adult lives is therefore critical. Here we discuss three modes of population co-variation, each linking behavioral, clinical, psychosocial, socioeconomic and demographical measures to neuroimaging in 7,577 children aged 9-11 years.
The first mode links obstetric complications and early life factors such as duration of breast feeding and motor development, with cognitive ability, cortical surface area, thickness and volume in late childhood. Obstetric complications increase the risk of later cognitive deficits and mental disorders25. The present results support that children with a history of obstetric and perinatal complications show delayed brain development and are consistent with reports associating birth weight with cortical area and brain volume in childhood and adolescence26, underscoring the importance of taking perinatal factors into account when studying child and adolescent brain development.
The second mode captures a socio-cognitive stratification pattern associated with brain volume and regional measures of cortical thickness, area and volume. Conceptually, the mode shares similarities with a positive-negative population mode linked to brain functional connectivity1 and structure15 in adults. The mode links several positive and negative life-events and environmental circumstances, with the highest loading factors being related to socioeconomic status, such as poverty, parent unemployment and education level. It further captures several factors known to be related to social deprivation, such as degree of family-planning and early pregnancies, neighborhood level of violence and level of religious beliefs. These constitute important environmental conditions for neurodevelopment that these children receive from their parents, their community and society at large. Consistent with the literature on the effect of social deprivation on child development, this mode is also associated with less sleep, worse school performance and lower cognitive ability. Cognitive ability is moderately heritable in childhood and adolescence9, 27, 28, and this likely partly explains the association between child cognition, academic performance and SES29. However, the effects of poverty, low socioeconomic status and early life adversity on brain and cognitive development18, 30 also underscore the role and importance of social policies aimed at reducing disparities31 that put some children at a disadvantage, often with life-long consequences for opportunities, mental and physical health and quality of life. The neurotypical developmental trajectory at this age is characterized by apparent cortical thinning, likely partly reflecting synaptic pruning32 and myelination33. Thicker cortex with higher SES is consistent with reports of accelerated brain maturation in children from low-SES families34, 35.
Indeed, across species and in humans, early life adversity is associated with accelerated maturation of neural systems, possibly at a cost of increased risk for later mental health problems36.
The third mode reflects an inverse association between particulate matter air pollution (PM2.5) and area deprivation on one side, and home value, walkability, and population density on the other. Specific geographical information for this mode cannot be discerned, since the ABCD does not provide geographical data about its participants; however, this mode fits a known socio-economic settlement pattern: low-pollution, high-walkability “sweet-spot” neighborhoods in urban areas are typically skewed toward higher-SES households, contrasted with high-pollution, low-walkability “sour-spot” neighborhoods associated with lower income37. Exposure to PM2.5 is associated with adverse health outcomes and disproportionally affects lower-income households38. Interestingly, this pattern was associated with the legal status of medical marijuana (as of 2016), possibly indicative of geographical differences for this pattern across the US states. Here we document an association with cortical area and volumes, as well as associations with diffusion properties of brain white matter pathways, in particular the parahippocampal cingulum, uncinate fasciculus, corpus callosum and forceps minor.
While these population level patterns are highly interesting, the cross-sectional and the non-experimental design warrant caution. People and their brains, genes and environments are not varying randomly, but are highly correlated39, likely along multiple dimensions, which complicates causal and mechanistic inference. This is especially relevant for population-based neuroimaging, in which subtle confounds can induce spurious associations4. These general caveats notwithstanding, these valuable resources represent an unprecedented opportunity to reveal co-varying patterns of socio-demographics, cognitive abilities, mental health, and brain imaging data, beyond simple bivariate associations, which are potentially highly informative of the biology, psychology and sociology of childhood and adolescent brain development and psychological adaptation.
In contrast to the standard regression approach, which models one outcome variable at a time and typically includes only a few covariates, the combined multivariate approach employed here considers the full pattern of co-variability between variables. Our approach is therefore well suited for capturing population patterns by maximizing statistical power. However, it does not allow for interpretation of specific associations between pairs of variables. Overfitting can be a challenge with multivariate approaches, in particular in small samples and for complex models40. Currently there are no comparable samples to ABCD in which to independently assess the generalizability of the results. However, the current results were obtained in a large sample, using data reduction as well as 10-fold cross-validation with all relevant analysis steps performed within the cross-validation and permutation loop, to avoid over-fitting and assess generalizability. All the patterns are purely correlational and also treated analytically and reported as such. It is also entirely possible, and highly probable, that these patterns are further correlated with other important phenomena not measured or included in the current analysis. The current approach also effectively captures differential patterns involving the same measures. For example, higher cortical thickness, indicative of delayed maturation, is independently associated both with socio-cognitive stratification, higher cognitive ability and SES, as well as with obstetric and perinatal complications, lower cognitive ability, and delayed speech and motor development. Another example is duration of breast feeding, which was independently associated both with obstetric complications and with socio-cognitive stratification, each associated with a differential pattern of brain differences.
These independent and co-existing associations with brain structure emphasize the importance of multidimensional considerations for understanding child and adolescent neurodevelopment and support that political priorities and decisions aiming to improve health outcomes and adaptation during transformative life phases should be based on interdisciplinary perspectives integrating social, psychological and biological sciences41.
Supplementary information
1. Supplementary Figures
Supplementary Fig. 5-7: Modes 1-3 by sex
Supplementary Fig. 8-10: Modes 1-3 by ethnicity/race
Supplementary Fig. 11-13: Modes 1-3 by site/scanner
2. Supplementary Tables
Tables are available in an OSF.io data repository (Center for Open Science) for the current paper.
Supplementary Table 1: Lists the included behavioral variable names, and their associated descriptions and instrument.
Supplementary Table 2: Lists the included MRI variable names, and their associated descriptions.
Supplementary Tables 3-5: Lists of all included variables, their descriptions, ICA weights and correlations with the CCA-ICA subject weights for Modes 1-3.
Acknowledgements
D.A. is funded by the South-Eastern Norway Regional Health Authority (2019107). T.K. is funded by the Research Council of Norway (276082). A.F.M. gratefully acknowledges support from the Dutch Organisation for Scientific Research via a Vernieuwingsimpuls VIDI fellowship (016.156.415) and a Wellcome Trust Innovator award (215698/Z/19/Z). S.M.S. is funded by a Wellcome Trust grant (203139/Z/16/Z). L.T.W. is funded by the European Research Council under the European Union’s Horizon 2020 research and innovation program (ERC Starting Grant 802998), the Research Council of Norway (249795), the South-East Norway Regional Health Authority (2019101), and the Department of Psychology, University of Oslo.
Footnotes
Disclosures: The authors declare no conflict of interest.