Abstract
The recent availability of comprehensive, brain-wide gene expression atlases such as the Allen Human Brain Atlas (AHBA) has opened new opportunities for understanding how spatial variations on the molecular scale relate to the macroscopic neuroimaging phenotypes. A rapidly growing body of literature is demonstrating relationships between gene expression and diverse properties of brain structure and function, but approaches for combining expression atlas data with neuroimaging are highly inconsistent, with substantial variations in how the expression data are processed. The degree to which these methodological variations affect findings is unclear. Here, we outline a seven-step analysis pipeline for relating brain-wide transcriptomic and neuroimaging data and compare how different processing choices influence the resulting data. We suggest that studies using AHBA should work towards a unified data processing pipeline to ensure consistent and reproducible results in this burgeoning field.
Introduction
Over the past two decades, human imaging genetics has emerged as a powerful strategy for understanding the molecular basis of macroscopic neural phenotypes measured across the entire brain (Meyer-Lindenberg and Weinberger, 2006, Muñoz et al., 2009, Arslan, 2015, Hashimoto et al., 2015). Traditionally, this work has involved correlating allelic variation at one or more genetic loci with variation in one or more imaging-derived phenotypes (IDPs), initially through candidate gene studies and more recently at a genome-wide level. The latter has been facilitated by the formation of large consortia, such as ENIGMA (Thompson et al., 2014). A common assumption in this work is that variants associated with an IDP (or nearby variants tagged by the associated variant) influence gene expression or protein abundance, which in turn alters cellular function and ultimately affects the studied IDP. However, multiple environmental and other factors can impact gene activity (Fraser et al., 2005, Choi and Kim, 2007, Cole, 2009) and the functional roles of many IDP-linked variants, which are usually identified through large-scale statistical analyses, are often unknown. As a result, the mechanisms through which a given variant may influence phenotypic variation can be unclear. Moreover, the expression levels of many genes vary substantially across brain regions (Hawrylycz et al., 2015), and these spatial variations cannot be inferred from DNA sequence alone.
Assays of gene expression provide a more direct measure of gene function. Expression assays are invasive, requiring direct access to neural tissue, and technical limitations have historically constrained analyses of gene expression in the brain to small sets of areas studied in isolation. Recent advances in the development of high-throughput tissue processing and bioinformatics pipelines have overcome these limitations, resulting in datasets of gene expression across a large fraction of the genome in a large number of brain regions, and through various stages of development [see Keil et al. (2018) for a detailed overview]. While some of the human atlases span multiple brain areas, only the Allen Human Brain Atlas (AHBA) offers high resolution coverage of nearly the entire brain, comprising expression measures for more than 20,000 genes taken from 3702 spatially distinct tissue samples. Critically, the samples have been mapped to the stereotaxic space, allowing researchers to directly relate spatial variations in gene expression to spatial variations in IDPs (for more details on the AHBA see supplementary material S1).
This unprecedented capacity to link molecular function to macroscale brain organization has given rise to the nascent field of imaging transcriptomics, which has begun to yield new insights into how regional variations in gene expression relate to functional connectivity within canonical resting-state networks (Richiardi et al., 2015, Forest et al., 2017); fiber tract connectivity between brain regions (Goel et al., 2014); temporal and topological properties of large-scale brain functional networks (Cioli et al., 2014, Vértes et al., 2016); the specialization of cortical and subcortical areas (Krienen et al., 2016, Parkes et al., 2017, Anderson et al., 2018); regional maturation during embryonic and adolescent brain development (Kirsch and Chechik, 2016, Whitaker et al., 2016); and pathological changes in brain disorders (Rittman et al., 2016, Romme et al., 2017, McColgan et al., 2018, Romero-Garcia et al., 2018a). Software toolboxes to facilitate the integration of brain-wide transcriptomic and imaging data have also been developed (French and Paus, 2015, Gorgolewski et al., 2015, Rizzo et al., 2016, Rittman et al., 2017).
Analyses in imaging transcriptomics are often highly multivariate, relating expression measures of around 20,000 genes in each of around 10²–10³ brain regions to one or more distinct IDPs quantified in each region, and therefore require extensive data processing. The impact of data processing choices on the results of neuroimaging analyses is well documented, with strategies for the correction of motion-related and global signal fluctuations in functional MRI being a prime example (Power et al., 2015, 2017, Ciric et al., 2017, Parkes et al., 2018). Comparable scrutiny has not yet been applied to the many processing choices that can affect the analysis of transcriptomic atlases and their relation to IDPs. At the time of writing, more than 30 studies have linked the AHBA gene expression measures to human neuroimaging data. The lack of a standard processing pipeline for gene expression data means that the degree to which the results of this work are robust to different methodological choices remains unclear.
As the field develops, it is important to establish methodological guidelines to ensure consistent and reproducible results, and to support valid interpretation. In this paper, we offer a practical guide to some of the key steps in processing the AHBA gene expression data and examine the potential impact of methodological choices available at each step. We focus on the AHBA, as it is the most spatially comprehensive and widely used gene expression atlas in the field (Hawrylycz et al., 2012).
The paper is organised as follows. We begin by summarizing some basic aspects of how gene expression is quantified, and general characteristics of the AHBA. We then outline several key steps in a basic workflow for relating gene expression measures to imaging data and examine the impact of methodological choices at each step. In the final section we make some recommendations for best practice, and provide directions for further research.
Measuring gene expression
Gene expression is a process through which genetic information encoded by sequences of DNA is read and used to synthesize a particular gene product, such as a protein or RNA molecule (Szymański and Barciszewski, 2002). The order of nucleotides within each gene determines the structure and function of the resulting product, which in turn affects cellular function and drives phenotypic variability. While the DNA of each cell in the organism is identical, different cells and anatomical structures express different phenotypes (e.g., neurons versus lymphocytes) due to differences in gene expression. The process through which a sequence of DNA is expressed is complex, but (for present purposes) can be divided into two main stages: (1) transcription, which occurs when an unwound segment of DNA is read to produce messenger RNA (mRNA); and (2) translation, which occurs when mRNA is used to synthesize proteins (Krebs et al., 2014). Gene expression is commonly approximated by measuring mRNA levels of a particular gene and is thus an index of gene transcriptional activity. Gene transcription is an indirect proxy for protein abundance, which is ultimately determined by gene translation. This distinction is important as several studies have shown that mRNA and protein levels within a tissue can vary significantly (Futcher et al., 1999, Gygi et al., 1999, Greenbaum et al., 2003) and gene expression (transcriptional activity) and protein abundance (translational activity) are not always positively correlated (Margineantu et al., 2007, Schwanhäusser et al., 2013).
In the AHBA, transcriptional activity has been measured using microarray, which quantifies the expression levels of thousands of genes at once by measuring the hybridization of cRNA (Cy3-labeled RNA) in a tissue sample to a particular spot on the microarray chip. Each of these spots, called probes, maps to a unique location of the DNA and contains single-stranded nucleic acid profiles that are ready to anneal to their complementary targets in the process of hybridization. Relative levels of gene expression in a tissue sample are then quantified by measuring the fluorescence at each sequence-specific location, which is proportional to the amount of complementary mRNA in a sample (Tarca et al., 2006). This method provides a cost-effective way to measure gene expression in a high-throughput manner. However, it is limited to known gene sequences, is prone to background noise due to indirect assessment of expression values, and spatial biases can result from variability in lateral diffusion of target molecules on the chip (Steger et al., 2011). Expression measures can also be affected by cross-hybridization artefacts arising when cRNA anneals to an imperfectly matched probe.
Microarray is typically performed on bulk tissue samples, and the cellular composition of a sample can strongly influence its gene expression profile. As a result, two samples with varying densities of different cell types may show transcriptional differences simply because of their different cellular composition. This is an important consideration when comparing data acquired from samples taken from different parts of the brain, since variations in the density of distinct cell types may drive differences in regional gene expression. In addition, variations in the way tissue samples are acquired, handled and processed, age at death (Glass et al., 2013), sex (Berchtold et al., 2008, Trabzuni et al., 2013), ethnicity (Spielman et al., 2007), brain pH (Mexal et al., 2006), post-mortem interval (Zhu et al., 2017), and RNA degradation (Jaksik et al., 2015), can all affect expression measures. Another potential influence arises from batch effects, caused by samples being processed at different times, by different staff, or in different labs; even changing atmospheric ozone levels can impact the final measures (Fare et al., 2003) [see Scherer (2009) for an overview]. The Allen Institute has implemented a series of steps to mitigate this variability as much as possible, as outlined in the Allen Human Brain Atlas technical white paper (Allen Human Brain Atlas, 2013).
One final consideration is that any individual gene expression assay provides a static snapshot of a dynamic process. Gene expression changes through development, and as a function of experience, environmental exposures and other factors (Fraser et al., 2005, Choi and Kim, 2007, Berchtold et al., 2008, Cole, 2009, Birdsill et al., 2011, Naumova et al., 2012, Kumar et al., 2013). The further advancement of developmental atlases of gene expression (Johnson et al., 2009, Colantuoni et al., 2011, Kang et al., 2011, Fertuzinhos et al., 2014, Bakken et al., 2016) will help to shed light on these dynamic processes.
A general workflow for processing brain-wide transcriptomic data
The AHBA consists of microarray data in 3702 spatially distinct samples taken from six neurotypical adult brains. The samples are distributed across cortical, subcortical, brainstem and cerebellar regions in each brain, and quantify the expression levels of more than 20,000 genes (for more details see supplementary material S1). Different brain regions were sampled across each of the six AHBA donors to maximize spatial coverage. Figure 1 shows the variability of coverage across individual brains.
Each tissue sample is associated with a numeric structure ID, name and structure label (cortex, cerebellum, or brainstem) in addition to the MRI voxel coordinates in native image space and MNI stereotaxic coordinates, which can be used to match samples to other imaging data (Figure 1). The AHBA also provides: (1) a binary indicator of when the level of a given transcript exceeds background levels, which can be used for quality control purposes; (2) RNA-seq data for a subset of tissue samples in two donor brains (120 samples each), which can be used for cross-validating expression measures (as we show below); and (3) magnetic resonance images, including T1-weighted, T2-weighted, T2-weighted gradient echo and FLAIR scans for all six brains, and diffusion-weighted images for two brains. These scans were collected prior to the dissection for anatomical visualization.
The AHBA samples were processed over approximately three years, which raises concerns about possible batch effects. Expression data were subjected to normalization procedures within a single brain, as well as between brains, to minimize the effect of non-biological biases such as array-specific differences, dissection method, and RNA quality differences among others, while maintaining biologically-relevant variance. Detailed information about the normalization procedures is provided in the technical white paper (Allen Human Brain Atlas, 2013). Despite these procedures, we show below that large inter-individual differences in gene expression remain, such that samples from the same brain tend to have more similar gene expression compared to the samples from other brains. These differences must be taken into account when combining data across all six brains.
Beyond the processing steps applied by the Allen Institute, a number of other steps are required to link expression measures and neuroimaging data. Here we outline seven major steps, which represent the core features of a typical workflow. The data processing steps, summarized in Figure 2, are: (i) verifying probe-to-gene annotations; (ii) filtering of probes that do not exceed background noise; (iii) probe selection, where representative probes (or a summary measure) are selected to index expression for a gene; (iv) sample assignment, where tissue samples from the AHBA are mapped to specific brain regions in an imaging dataset; (v) normalization of expression measures to account for inter-individual differences and outlying values; (vi) gene-set filtering, to remove genes that are inconsistently expressed across the six brains and/or to select genes in a hypothesis-driven way based on the research question; and (vii) accounting for the spatial patterns in gene expression. The first six processing steps produce the region × gene matrix that can be used for the regional analyses. The final step of accounting for the spatial autocorrelation in the gene expression measures depends on the particular research question. The potential need to account for spatial effects arises because gene expression is more strongly correlated between samples that are separated by short distances compared to those that are far apart, a pattern that has been described in humans (Richiardi et al., 2015, Krienen et al., 2016, Vértes et al., 2016, Pantazatos and Li, 2017), mouse (Fulcher and Fornito, 2016) and C. elegans (Arnatkevičiūtė et al., 2018).
Although this spatial autocorrelation is, in itself, an important neurobiological feature of the brain transcriptome (Gryglewski et al., 2018), it is critical for any analysis claiming a specific association between spatial variations in gene expression and a given IDP to show that the association exceeds what would be predicted by lower-order spatial gradients of gene expression. In the following sections, we outline some of the choices that can be made at each of these steps and consider their impact on analysis with some recommendations summarized in the conclusions section. Code and data used for data processing and the following analyses are available at github https://github.com/BMHLab/AHBAprocessing and figshare https://figshare.com/sZ441295fe494375aa0c13 respectively.
Step 1. Probe-to-gene re-annotation
In microarray experiments, probe sequences correspond to a unique portion of DNA and are assigned to genes based on available genome sequencing databases (O’Leary et al., 2016). While the AHBA (and other platforms) provide annotation tables where probes are mapped to genes, this information becomes outdated with each update of the sequencing databases. For example, at the time of the AHBA release in 2013, 18% of probes were not annotated to any gene. Using updated sequencing information, we can find corresponding genes for more than 2000 probes that previously were not matched to any gene, while some probes are matched to different genes than before. At the same time, some probes cannot be unambiguously mapped to any gene using updated sequencing data and therefore should be excluded from further analyses. An accurate probe-to-gene mapping is essential for obtaining biologically meaningful findings. It is therefore necessary to re-assign probes to genes using the most current information available. This re-annotation can be done using several methods and toolboxes, some of which are summarized in Table 1. To our knowledge, only three studies using the AHBA have performed probe-to-gene re-annotation (Richiardi et al., 2015, Eising et al., 2016, Romero-Garcia et al., 2018b).
To investigate how probe-to-gene annotations change over time, we supplied a list of all available 60 bp length AHBA probe sequences (n = 58,692) to the Re-annotator toolkit (Arloth et al., 2015) (Table 1). We found that 45,821 probes (78%) were uniquely annotated to a gene and could be related to an Entrez ID, a stable identifier for a gene generated by the Entrez Gene database at the National Center for Biotechnology Information (NCBI). A total of 19% of probes were not mapped to a gene, and just under 3% were mapped to multiple genes and could not be unambiguously annotated. Of the probes that were unambiguously annotated to a gene, 3438 (7.5%) of the annotations differed from those provided by the AHBA: 1287 probes were re-annotated to new genes and 2151 probes that were not previously assigned to any gene in the AHBA could now be annotated. Additionally, 6211 (~ 10%) probes in the initial AHBA dataset had inconsistent gene symbol, ID or gene name information according to the NCBI database (https://www.ncbi.nlm.nih.gov/), as of 5th March 2018. Because of these differences, we recommend obtaining probe-to-gene annotations and retrieving the gene symbol, ID and name from the latest version of NCBI (ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/GENE_INFO/Mammalia/). Hereafter, we present all analyses using this newly re-annotated set of 45,821 probes, corresponding to 20,232 unique genes.
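The bookkeeping behind these counts can be sketched in a few lines of Python. This is a minimal illustration rather than the Re-annotator toolkit itself: it assumes the re-annotation output has already been parsed into a mapping from each probe to the list of genes its sequence matches, and the function names and the probe and gene identifiers are hypothetical.

```python
from collections import Counter


def classify_probes(probe_to_genes):
    """Label each probe by the number of distinct genes its sequence maps to."""
    labels = {}
    for probe, genes in probe_to_genes.items():
        n_genes = len(set(genes))
        if n_genes == 0:
            labels[probe] = "unannotated"
        elif n_genes == 1:
            labels[probe] = "unique"
        else:
            labels[probe] = "ambiguous"
    return labels


def keep_unique(probe_to_genes):
    """Return {probe: gene} for probes that map to exactly one gene."""
    return {
        probe: next(iter(set(genes)))
        for probe, genes in probe_to_genes.items()
        if len(set(genes)) == 1
    }


# Toy input with hypothetical identifiers (not real AHBA probes).
toy_mapping = {
    "probe_1": ["GENE_A"],
    "probe_2": [],                    # no match: excluded
    "probe_3": ["GENE_B", "GENE_C"],  # ambiguous: excluded
    "probe_4": ["GENE_A"],
}
label_counts = Counter(classify_probes(toy_mapping).values())
unique_probes = keep_unique(toy_mapping)
```

Only the probes in `unique_probes` would be carried forward, mirroring the exclusion of unannotated and ambiguously annotated probes described above.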
Step 2. Data filtering
Microarray experiments are prone to background noise due to non-specific hybridization, so appropriate controls must be employed to discriminate expression signal from noise. Variability in measured intensity values is greater for lower hybridization intensities, where signal levels approach background (Quackenbush, 2002). This problem is often addressed by removing a fixed percentage of probes with lowest intensity or using only array elements that show statistically significant expression differences (increase) from the background (Quackenbush, 2002). Each probe in each sample of the AHBA has been assigned a binary indicator for whether it measures an expression signal that exceeds background levels (see Figure 1). This assignment is done on the basis of two criteria: 1) a t-test comparing the mean signal of a probe to the background (p < 0.01) indicating that the mean signal of the probe’s expression is significantly different from the background; and 2) the difference between the background and the background subtracted signal is significant (> 2.6 × background standard deviation).
Filtering genes based on the AHBA binary indicator [intensity based filtering (IBF)] can have a marked effect on the final set of genes included for analysis; however, only a few published studies using the AHBA data have reported using IBF (Hawrylycz et al., 2012, Richiardi et al., 2015, Burt et al., 2017). For example, if we exclude probes that did not exceed the background in at least 50% of all cortical and subcortical samples across all subjects, we exclude 30% of probes (13,844 out of 45,821), assaying 4486 out of 20,232 genes (Figure 3A). In other words, if no filtering is performed, > 22% of genes will have expression levels consistent with background noise in at least half of the tissue samples.
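As a rough sketch of this filter, the following Python keeps a probe only if its binary indicator is set in at least a given fraction of samples. The function name, the data layout (one list of 0/1 flags per probe) and the toy values are assumptions made for illustration; only the 50% threshold is taken from the example above.

```python
def intensity_based_filter(above_background, min_fraction=0.5):
    """Return the probes whose signal exceeds background in at least
    `min_fraction` of samples.

    above_background: {probe_id: [0/1 flag per tissue sample]}
    """
    kept = []
    for probe, flags in above_background.items():
        fraction = sum(flags) / len(flags)
        if fraction >= min_fraction:
            kept.append(probe)
    return kept


# Toy flags for three hypothetical probes across four samples.
toy_flags = {
    "probe_1": [1, 1, 1, 0],  # 75% of samples above background: kept
    "probe_2": [0, 0, 1, 0],  # 25%: removed
    "probe_3": [1, 0, 1, 0],  # exactly 50%: kept
}
kept_probes = intensity_based_filter(toy_flags, min_fraction=0.5)
```

Raising or lowering `min_fraction` trades the number of retained genes against the amount of background noise admitted, which is why the threshold should be reported alongside any results.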
To further investigate the impact of IBF, we examined how filtering affects the average correlation between expression values quantified by multiple probes for the same gene. Given that expression measures of different probes are expected to be comparable, IBF should increase the inter-probe agreement. Figure 3B shows the distribution of the average between-probe correlation, estimated before and after IBF. Starting with an initial set of 17,769 genes with multiple probes, applying IBF to exclude probes that do not exceed background in at least 50% of regions removes 6579 genes. It is evident that the distribution of between-probe correlations is pushed towards higher values.
We next compared the mean between-probe correlations obtained before and after IBF, focusing on the 11,190 genes with multiple probes that were retained after filtering. For 10,093 of these genes, the average correlation was identical, while for the remaining 1097 genes (~ 10%), the mean correlation was significantly greater following IBF (Spearman’s rank correlation (denoted as ρ throughout the text): ρ = 0.47 vs ρ = 0.30; p = 7 × 10⁻⁵⁴, Wilcoxon rank-sum test; Figure 3C). Gene score resampling (GSR) analysis (Gillis et al., 2010) revealed that IBF excluded genes that are involved in generic cellular, immunological and metabolic processes that are not specific to the brain (see supplementary file enrichmentExpression.csv for results and supplementary material S2 for more details). While the exact threshold for IBF remains to be chosen by the researchers, these results indicate that IBF is effective in mitigating noise in the microarray gene expression measures.
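The average between-probe correlation for a single gene can be computed along the following lines. This is a self-contained sketch under stated assumptions, not the code used for Figure 3: Spearman's ρ is implemented by hand as the Pearson correlation of average ranks, the function names are hypothetical, and the three probe profiles are made-up values.

```python
from itertools import combinations
from math import sqrt


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) for constant input."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


def mean_between_probe_correlation(probe_profiles):
    """Average Spearman's rho over all pairs of probes for one gene."""
    pairs = list(combinations(probe_profiles, 2))
    return sum(spearman(a, b) for a, b in pairs) / len(pairs)


# Three hypothetical probes for one gene, measured in five samples.
toy_profiles = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.0, 4.0, 6.0, 8.0, 10.0],  # same ordering as the first probe
    [5.0, 4.0, 3.0, 2.0, 1.0],   # reversed ordering
]
toy_mean_rho = mean_between_probe_correlation(toy_profiles)
```

In this toy case one probe pair agrees perfectly and two pairs are perfectly anti-correlated, so the mean is −1/3; a gene whose probes disagree this strongly would be a clear candidate for closer inspection.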
Step 3. Probe selection
Multiple probes can be used to measure the expression level of a single gene at different exons (segments of RNA molecules that code for a protein or peptide sequence), which can increase the reliability of the measurement. After performing re-annotation and IBF, 71% of genes in the AHBA were measured with at least two probes (compared to 93% in the original data). One might expect that probes measuring the expression of the same gene should show consistent expression patterns, but this is not always the case. For example, even after IBF, the correlation between probes measuring the expression levels of the same gene is ρ < 0.3 (Spearman rank correlation) for more than 20% of genes (Figure 3B). Investigators have used different strategies to derive a representative measure of gene expression. Some of the strategies used in published work are summarized in Table 2.
To evaluate how the gene expression measures vary under different probe selection methods, we estimated a single summary measure of expression for each gene indexed by multiple probes, according to one of the methods listed in Table 2. We also evaluated a few other methods beyond the ones used in the previous literature, such as selecting the probe with the maximum coefficient of variation across samples (CV), or the probe with the highest proportion of samples with expression levels exceeding background noise (signal proportion). In addition, we included random probe selection (averaged over 100 repeats) for comparison. We then took the expression vector of each gene across tissue samples and computed the Spearman rank correlation coefficients between these vectors estimated for each possible pair of methods.
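To make the contrast between selection criteria concrete, the sketch below scores the probes of one gene under four simple criteria (maximum intensity, maximum variance, maximum CV and signal proportion) and also computes the mean across probes. It is illustrative only: the function names, the use of population variance, and the toy expression values and background flags are assumptions, and the other methods in Table 2 are not implemented.

```python
from statistics import mean, pstdev, pvariance


def probe_scores(expression, flags):
    """Score one probe under several selection criteria."""
    return {
        "max_intensity": mean(expression),
        "max_variance": pvariance(expression),
        "max_cv": pstdev(expression) / mean(expression),
        "signal_proportion": sum(flags) / len(flags),
    }


def select_probe(probes, flags, criterion):
    """Return the id of the probe with the highest score for `criterion`."""
    return max(probes, key=lambda p: probe_scores(probes[p], flags[p])[criterion])


def mean_across_probes(probes):
    """Summarise a gene as the sample-wise mean over all of its probes."""
    return [mean(values) for values in zip(*probes.values())]


# Two hypothetical probes for one gene, measured in four samples.
toy_probes = {
    "probe_a": [8.0, 8.5, 9.0, 8.5],  # high intensity, low variance
    "probe_b": [2.0, 4.0, 6.0, 4.0],  # low intensity, high variance
}
toy_flags = {
    "probe_a": [1, 1, 1, 1],
    "probe_b": [0, 1, 1, 1],
}
choices = {
    criterion: select_probe(toy_probes, toy_flags, criterion)
    for criterion in ("max_intensity", "max_variance", "max_cv", "signal_proportion")
}
gene_mean = mean_across_probes(toy_probes)
```

Here the intensity-related criteria pick `probe_a` while the variance-related criteria pick `probe_b`, a small-scale version of the disagreement between families of methods examined in this section.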
Figure 4A shows the average correlation between expression measures selected using different criteria, averaged across 17,769 genes, that is, all genes for which multiple probes are available. Since most studies using the AHBA do not report using IBF, we show results for unfiltered data (similar results have been obtained using data after IBF, see Figure S2). The average correlation coefficients between probe selection methods range between , indicating that the probe selection method can have a major impact on expression estimates. The method of summarising the expression measures for a gene as the mean across all available probes is the most highly correlated, on average, to all the other methods. Variance-related methods [coefficient of variation, maximum variance, connectivity-variance and highest loading on first PC of non-normalized data (Parkes et al., 2017)] are similar to each other, but different to other methods. Consistency (DS) and intensity (max intensity, signal proportion and connectivity-intensity) related methods, on the other hand, are more correlated with each other. Notably, the correlations between gene expression measures selected based on the highest CV and those selected using the consistency/intensity-related criteria are much lower than those resulting from the random probe selection strategy, indicating that these methods favour dissimilar properties of expression measures for probe selection. A more detailed discussion of these results is presented in supplementary material S3.
The lack of a gold standard makes it difficult to choose between different probe selection options. One strategy is to use RNA-seq data as an external reference (Miller et al., 2014b). RNA-seq allows precise quantification of the amount of RNA in the sample without reliance on existing knowledge about genome sequences [for an overview, see Wang et al. (2009), Kukurba and Montgomery (2015)]. It is also free of the background noise artefacts that are known to contaminate hybridisation-based gene expression measures and therefore provides a more reliable estimate of gene expression. Samples from two AHBA brains previously analyzed using microarray were reprocessed using RNA-seq to provide expression data for more than 20,000 genes, in 120 samples in each brain (Hawrylycz et al., 2012). Comparing expression values for matching structures in each of the two brains allows us to select probes that correlate most strongly with RNA-seq, providing an additional quality control measure to cross-validate probe selection.
Considering that 17,609 of the 20,232 genes in the microarray data have RNA-seq measures, we first aimed to evaluate whether excluding the ~ 13% of genes that do not overlap between the datasets would eliminate brain-relevant genes. We verified this using over-representation analysis (ORA): the genes removed are not enriched in brain-specific functionality but rather are related to septin assembly and organization, as well as the negative regulation of RNA splicing (see supplementary file enrichmentExpression.csv for results and supplementary material S2 for more details).
We then examined the correlations between microarray and RNA-seq expression measures in the 17,609 genes that overlap between the RNA-seq and microarray datasets across 112 brain regions, as shown in Figure 4B. Most correlations are low, with 52% of genes exhibiting a correlation ρ < 0.3 and only 23% of genes exhibiting a correlation ρ > 0.5. This divergence between RNA-seq and microarray is likely to be caused by inaccuracies in the microarray measurements. Using GSR analysis (Gillis et al., 2010) we find that genes with higher correlations between microarray and RNA-seq are related to neuronal connectivity and communication related processes, with categories such as ‘transmission of nerve impulse’, ‘ensheathment of neurons’, ‘myelination’ and ‘glial cell development’ demonstrating the strongest enrichment (see supplementary file enrichmentExpression.csv for results and supplementary material S2 text for more details). This analysis demonstrates that RNA-seq data can be used as a reference to select brain-relevant and reliably measured genes.
Figure 4C shows that, compared to other probe selection methods, RNA-seq demonstrates the highest similarity to intensity/consistency-based approaches (ρ > 0.8, Spearman’s rank correlation), with DS showing the highest correlation. In contrast, variance-based methods are no more similar to the RNA-seq measures than random probe selection (ρ < 0.75, Spearman’s rank correlation). Given that RNA-seq data are only available for a limited number of samples (with only 87% of genes being represented), and the data come from only two of the six donor brains in the AHBA, Figure 4C indicates that DS may be a reasonable alternative method for probe selection that can be generalized to the full AHBA.
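Selecting the probe that best matches RNA-seq can be sketched as follows. This is a simplified illustration under stated assumptions: expression values are taken to be already averaged within matching structures, Spearman's ρ is computed with the rank-difference formula (valid only when there are no tied values), and all names and numbers are hypothetical.

```python
def ranks(values):
    """1-based ranks; assumes there are no tied values."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_no_ties(x, y):
    """Spearman's rho via 1 - 6 * sum(d^2) / (n * (n^2 - 1)); no ties allowed."""
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


def select_by_rnaseq(probes, rnaseq):
    """Pick the probe whose regional profile correlates best with RNA-seq."""
    scores = {probe: spearman_no_ties(values, rnaseq) for probe, values in probes.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical regional expression for one gene in five matched structures.
toy_rnaseq = [1.0, 3.0, 2.0, 5.0, 4.0]
toy_probes = {
    "probe_a": [7.1, 7.9, 7.5, 8.8, 8.2],  # same regional ordering as RNA-seq
    "probe_b": [6.0, 5.0, 6.5, 5.5, 7.0],  # largely unrelated ordering
}
best_probe, rho_by_probe = select_by_rnaseq(toy_probes, toy_rnaseq)
```

A threshold on the winning correlation could additionally be used to drop genes for which no probe agrees with RNA-seq, although the appropriate cut-off is a choice left to the analyst.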
Step 4. Assigning samples to brain regions
The AHBA provides gene expression data for multiple spatially localized tissue samples (Figure 1). When relating such data to macroscopic IDPs, it is necessary to generate some mapping between the spatial location of each tissue sample and the particular spatial unit of analysis (e.g., voxel, brain region) used to construct the IDP. This mapping is facilitated by the AHBA including an MNI coordinate (and voxel coordinate) for each tissue sample, and MRI data acquired for each individual brain contained in the AHBA. Each tissue sample is also associated with an anatomical structure ID, which can be related to corresponding higher order structures using the Allen Institute anatomical ontology, allowing brain structures to be identified at different resolution scales.
Existing studies have used several approaches to map tissue samples to regions-of-interest (ROIs) in imaging data. One strategy has been to match samples to structures based on the name of a given anatomical sample. The simplest approach is to use the anatomical structure names provided by the AHBA [see Allen Human Brain Atlas (2013), Tan et al. (2013), Myers et al. (2015), Chen et al. (2016), Kirsch and Chechik (2016), Hecker et al. (2017), Lee et al. (2017), Negi and Guda (2017)], but these regions do not directly correspond to brain parcellations typically used in imaging analyses, so precise alignment with imaging data can be difficult. An alternative approach is to use the MNI (or voxel) coordinates of each sample (Goyal et al., 2014, Cioli et al., 2014, French and Paus, 2015, Richiardi et al., 2015, Komorowski et al., 2016, Krienen et al., 2016, Rizzo et al., 2016, Burt et al., 2017, Parkes et al., 2017, Romme et al., 2017, Shin et al., 2017, Anderson et al., 2018, Romero-Garcia et al., 2018b). It is possible to either assign samples to brain regions in a single parcellation defined in MNI space (Krienen et al., 2016, Keo et al., 2017, Parkes et al., 2017, Romme et al., 2017), or to assign samples to regions based on parcellations of each individual AHBA brain (Romero-Garcia et al., 2018a). The former approach is simpler, but a characteristic of the AHBA is that the MNI coordinates provided for each tissue sample are based on spatial normalizations that were tailored to each individual brain. Specifically, two of the AHBA brains were scanned in cranio and normalized to MNI space via a linear transformation, whereas the other four were acquired ex cranio and normalized using an affine followed by a non-linear transformation [for details see (Allen Human Brain Atlas, 2013)] with the deformation fields also being smoothed to facilitate matching the images to Nissl stains. 
These differences across brains will influence the accuracy of the normalization across the six brains, which is compounded by differences in tissue distortion that occurred during sample handling and processing.
To overcome these issues, a parcellation scheme can be applied to each individual donor brain. This method can more accurately account for individual differences in donor brain anatomy but is contingent on being able to generate appropriate transformations between native and MNI space for an accurate parcellation. For cortex, the accuracy of the parcellation can be greatly enhanced by parcellating and normalizing the surface; parcellation of non-cortical areas requires volumetric normalization. In our own work, we have been able to segment the cortical surfaces of the six AHBA brains with reasonable accuracy (assessed by visual inspection), and we supply four different volumetric parcellations mapped at different resolutions to each brain: the Desikan-Killiany (Desikan et al., 2006), comprising 34 nodes per hemisphere, the group-level HCPMMP1 (Glasser et al., 2016), comprising 180 nodes per hemisphere, and two random parcellations comprising 100 and 250 nodes per hemisphere, respectively.
Once a particular parcellation has been generated, tissue samples should be assigned to the nearest region of the parcellation. In this assignment, a threshold can also be applied, to avoid assigning samples beyond a certain distance threshold. The distance between sample and region is commonly estimated as the Euclidean distance in 3D space. This sample-region distance has been computed in different ways, including representing a region in space by its centroid coordinate (Vértes et al., 2016, Whitaker et al., 2016, McColgan et al., 2018), or taking the minimum distance between the sample and any voxel in the region (French and Paus, 2015, Parkes et al., 2017, Romme et al., 2017). The latter approach is more accurate, given that regions in any given parcellation vary in size and folded geometry (e.g., Figure 5A).
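The difference between the two distance definitions can be illustrated with a minimal sketch. The function name `assign_samples`, the toy coordinates, and the use of -1 to mark unassigned samples are assumptions made for illustration and are not part of any released pipeline; the 2 mm default simply mirrors the threshold discussed in this section.

```python
import numpy as np


def assign_samples(sample_xyz, voxel_xyz, voxel_labels, max_dist=2.0):
    """Assign each sample to the region of its nearest parcellation voxel.

    Returns (labels, nearest_dist). A label of -1 marks samples whose
    nearest voxel lies farther away than max_dist (same units as the
    coordinates, e.g. mm).
    """
    sample_xyz = np.asarray(sample_xyz, dtype=float)
    voxel_xyz = np.asarray(voxel_xyz, dtype=float)
    voxel_labels = np.asarray(voxel_labels, dtype=int)

    # Pairwise Euclidean distances, shape (n_samples, n_voxels).
    diff = sample_xyz[:, None, :] - voxel_xyz[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    nearest = dist.argmin(axis=1)
    nearest_dist = dist[np.arange(len(sample_xyz)), nearest]

    labels = voxel_labels[nearest]          # fancy indexing returns a copy
    labels[nearest_dist > max_dist] = -1    # apply the distance threshold
    return labels, nearest_dist


# Toy parcellation: region 1 is elongated (x = 0..6), region 2 is compact.
voxel_xyz = [[x, 0, 0] for x in range(7)] + [[10, 0, 0], [11, 0, 0]]
voxel_labels = [1] * 7 + [2] * 2
sample_xyz = [[7, 0, 0], [0, 5, 0]]

labels, nearest_dist = assign_samples(sample_xyz, voxel_xyz, voxel_labels)

# Centroid-based alternative for the first sample, for comparison.
centroids = {1: np.array([3.0, 0.0, 0.0]), 2: np.array([10.5, 0.0, 0.0])}
centroid_choice = min(
    centroids,
    key=lambda r: np.linalg.norm(np.array(sample_xyz[0]) - centroids[r]),
)
```

In this toy case the first sample lies 1 mm from the elongated region 1 and is assigned to it by the minimum-voxel-distance rule, whereas the centroid rule would select region 2; the second sample is 5 mm from the nearest voxel and is left unassigned. The dense distance matrix is only practical for small inputs; for whole-brain volumes a spatial index would be a more suitable choice.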
In this process of assigning samples to regions, errors can occur if the mapping is not done separately for (i) broad anatomical divisions (cortex, subcortex, cerebellum and so on); and (ii) left and right hemispheres. That is, cortical samples listed as coming from the left hemisphere in the AHBA ontology should only be mapped to left cortical voxels (as samples were taken from anatomically known positions in the brain), right subcortical or cerebellar samples to right subcortical/cerebellar voxels and so on. In our own experience, we have observed that subcortical samples (as indicated by the AHBA ontology) can be mapped to cortical regions of the parcellation because a cortical voxel may be closer (or vice versa). Similarly, if no separation between hemispheres is performed, 58 out of 2748 cortical and subcortical samples are assigned to an incorrect side of the brain when using the Desikan-Killiany (Desikan et al., 2006) parcellation (Figure 5B). While the majority of those samples are very close to the midline, several are clearly incorrectly mapped to the stereotaxic space, such as two samples in the frontal pole, which are assigned to the left side of the brain according to the AHBA annotations but have a positive MNI x-coordinate. The same is true for some samples from the mammillary body and cingulate gyrus, which are labelled as coming from the right hemisphere but have negative MNI x-coordinates (Figure 5B). To avoid potential mistakes, samples with mismatching assignments should be excluded.
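One simple way to detect such mismatches is to compare the annotated hemisphere with the sign of the MNI x-coordinate. The sketch below is illustrative only: the function name and the `"L"`/`"R"` labels are assumed conventions rather than AHBA field names, and samples lying exactly on the midline (x = 0) are treated as consistent with either side.

```python
import numpy as np


def hemisphere_mismatch(mni_x, annotated_side):
    """Flag samples whose annotated side disagrees with the sign of MNI x.

    In MNI space negative x is left and positive x is right.
    annotated_side holds "L" or "R" (a hypothetical encoding).
    """
    mni_x = np.asarray(mni_x, dtype=float)
    side = np.asarray(annotated_side)
    left_but_positive = (side == "L") & (mni_x > 0)
    right_but_negative = (side == "R") & (mni_x < 0)
    return left_but_positive | right_but_negative


# Toy example: the second and third samples are inconsistent.
mni_x = [-30.0, 4.0, -2.0, 25.0]
side = ["L", "L", "R", "R"]
mismatch = hemisphere_mismatch(mni_x, side)

# Keep only samples whose annotation and coordinate agree.
kept_x = np.asarray(mni_x)[~mismatch]
```

The same mask can be used to restrict the candidate voxels for each sample to the matching hemisphere and anatomical division before computing distances.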
A second consideration is to set a distance threshold for assigning samples to regions, to ensure that samples further than a given threshold away from the parcellation will not be assigned. As shown in Figure 5B, only around 50% of samples are directly mapped to a parcellation when using the Desikan-Killiany (Desikan et al., 2006) parcellation (i.e., their coordinates correspond to a voxel inside the parcellation). Increasing the distance threshold will allow some tolerance for small errors in spatial normalization. Figure 5C shows that assigning samples that are up to 2 mm away from any voxel in the parcellation increases the proportion of assigned samples to almost 90%, with additional increases in the distance threshold yielding only minor gains in the number of assigned samples. We therefore use a 2 mm distance threshold in our analyses.
Step 5. Six brains, one atlas: accounting for individual variability
In cases where a given brain region is assigned multiple samples, we must generate some aggregate measure of expression for that region. Most commonly this is done by taking a mean across the samples assigned to a given region. A complication of the AHBA is that the samples come from different brains. As we show below, each brain shows a distinct transcriptomic profile, which must be addressed before data from different brains can be aggregated.
The AHBA is often used to represent a general transcriptomic profile of the adult human brain. However, it comprises data taken from people aged 24 to 57 years, of different ethnicities, sexes, medical histories, causes of death, and post-mortem intervals (Table 3). Many of these factors can impact gene expression (Fraser et al., 2005, Berchtold et al., 2008, Kumar et al., 2013, Trabzuni et al., 2013). One way to address this brain-specific variance is to conduct analyses separately in each brain. However, spatial coverage of different brain areas in the AHBA varies from person to person. Collapsing samples from all brains therefore allows a single atlas to be derived with maximum spatial coverage across the brain. In this case, an appropriate correction for donor-specific transcriptomic differences is required.
Considering that the Allen Institute applied a range of data normalization procedures to remove batch effects and artefactual inter-individual differences, most studies using the AHBA have not taken into account the additional inter-individual differences that might be important when aggregating data across six donor brains. Here we investigated whether intrinsic inter-individual differences in expression play a major role by projecting all tissue samples from six donor brains into a two-dimensional transcriptional principal components space. Figure 6A plots loadings of each cortical tissue sample on the first two principal components of gene expression for all six donors (for the whole brain see Figure S3). This unsupervised projection of samples into gene expression space captures the latent dimensions of variance between all samples and broadly separates the six donors (regardless of where a tissue is located in the brain), indicating that each donor has a distinctive gene expression profile. In other words, while the data normalization procedures applied by the Allen Institute prior to data release removed batch effects and artefactual inter-individual differences, a considerable degree of intrinsic donor-specific variance remains and must be accounted for in order to perform valid data aggregation.
One approach for addressing donor-specific effects is to perform a leave-one-out analysis, where the analysis is repeated six times, excluding one of the brains at each iteration (Parkes et al., 2017, McColgan et al., 2018). This approach can ensure that the results are not driven by a single brain. A more direct way of eliminating the inter-individual differences in expression measures is to normalize the gene expression data separately for each subject (Rizzo et al., 2016, Liu et al., 2017, Negi and Guda, 2017, Romme et al., 2017, Romero-Garcia et al., 2018a). With this approach, each gene’s expression values are normalized across regions separately for each donor in order to reflect the relative expression of each gene across regions, within a given brain (Figure 6B-D). A desirable normalization procedure should offer robustness to outlying values and quantify expression on the same scale across donors to enable direct comparison. Most studies using the AHBA have used z-score normalization (Rizzo et al., 2016, Negi and Guda, 2017, Romero-Garcia et al., 2018a),
$$x_{\mathrm{norm}} = \frac{x_i - \bar{x}}{\sigma},$$
where $\bar{x}$ represents the mean, $\sigma$ represents the standard deviation and $x_i$ is the expression value of a gene in a single sample. The estimates of $\bar{x}$ and $\sigma$ are appropriate for symmetric distributions, whereas gene expression distributions across brain samples are often non-symmetric, and can contain outliers, which can bias these summary statistics. Figure 6E demonstrates the sensitivity of z-score normalization to outlying values. A variety of outlier-robust normalizations exist, such as the Hampel hyperbolic tangent transformation. Here we focus on a variant of a normalization method used by Fulcher and Fornito (2016), the scaled robust sigmoid (SRS) normalization (Fulcher et al., 2013). This approach normalizes gene expression values based on an outlier-robust sigmoid function,
$$x_y = \frac{1}{1 + \exp\left(-\frac{x_i - \langle x \rangle}{\sigma}\right)},$$
where $\langle x \rangle$ represents the median and $\sigma$ represents the standard deviation, before rescaling normalized values to a unit interval,
$$x_{\mathrm{norm}} = \frac{x_y - \min(x_y)}{\max(x_y) - \min(x_y)}.$$
This normalization is robust to outliers and ensures equivalent scaling of expression values for each person. Figures 6C and E show the effectiveness of SRS in dealing with outliers and scaling. Other strategies for removing donor-specific effects involve using linear models applied to cross-donor combined data. For example, donor-specific effects can be treated as an additional batch effect and removed via linear modelling using the R/Bioconductor software package limma (Ritchie et al., 2015). While this approach removes inter-individual differences in gene expression, the linear model is sensitive to outliers. This correction in turn can be followed by SRS normalization to minimize the influence of outliers (Figure 6D).
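The z-score and SRS normalizations described above, applied separately to each donor, can be sketched as follows. This is a minimal illustration that follows the definitions as worded in the text (median for location, standard deviation for scale); the function names, the use of the population standard deviation, and the toy data are assumptions made here.

```python
import numpy as np


def zscore(x):
    """Standard z-score: subtract the mean, divide by the standard deviation."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()


def scaled_robust_sigmoid(x):
    """Sigmoid centred on the median, then rescaled to the unit interval."""
    x = np.asarray(x, dtype=float)
    s = 1.0 / (1.0 + np.exp(-(x - np.median(x)) / x.std()))
    return (s - s.min()) / (s.max() - s.min())


def normalize_within_donor(expr, donor, method):
    """Apply `method` to each gene (column) across the samples of each donor.

    expr  : array of shape (n_samples, n_genes)
    donor : donor identifier for each sample
    """
    expr = np.asarray(expr, dtype=float)
    donor = np.asarray(donor)
    out = np.empty_like(expr)
    for d in np.unique(donor):
        rows = donor == d
        out[rows] = np.apply_along_axis(method, 0, expr[rows])
    return out


# Toy data: donor 2 has the same spatial pattern as donor 1 plus an offset of 10.
expr = np.array([[1.0, 5.0], [2.0, 3.0], [4.0, 4.0],
                 [11.0, 15.0], [12.0, 13.0], [14.0, 14.0]])
donor = [1, 1, 1, 2, 2, 2]

srs = normalize_within_donor(expr, donor, scaled_robust_sigmoid)
z = normalize_within_donor(expr, donor, zscore)
```

After within-donor normalization the constant offset between the two toy donors disappears, and under SRS every gene spans the interval from 0 to 1 within each donor. A more outlier-resistant scale estimate could be substituted for the standard deviation in `scaled_robust_sigmoid` without changing the rest of the sketch.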
To account for potential between-sample differences in gene expression, Burt et al. (2017) introduced within-sample normalization across genes before subject-specific normalization across samples (see Figure S4). Indeed, some samples can show a markedly different expression profile (extremely low or high values across all genes) from other samples in close spatial proximity, which may be caused by measurement artefacts. The influence of these artefacts can be minimized by applying within-sample cross-gene normalization to quantify relative expression levels within a given sample, before normalizing across samples. To quantify the effect of the initial within-sample normalization, we calculated the correlations between expression values across genes and samples in two cases: (i) when only cross-sample normalization for each gene was applied; (ii) when both cross-gene normalization within each sample and cross-sample normalization for each gene were applied. While the correlation values were relatively high ($\mathrm{median}_{\mathrm{sample}}(r) = 0.969$, IQR = 0.04; $\mathrm{median}_{\mathrm{gene}}(r) = 0.856$, IQR = 0.1), the initial within-sample normalization was beneficial in reducing potential measurement artefacts in the data.
One additional consideration is that the spatial distribution of tissue samples across individual brains in the AHBA is not uniform. As such, different brains can contribute a different number of samples to any given brain region (Figure 1 and Figure 7). In light of this variability, we have two choices: we can either average all samples falling within a region, meaning that the average may be driven by a subset of individuals who have more samples localized to that region, or we can average at the level of each individual donor brain before aggregating across people (Figure 7). The latter approach ensures that each donor makes an equal contribution to the mean, provided that all genes are normalized to the same scale. The choice between these two options can be made according to the researcher's preference.
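The two aggregation options can be compared with a short sketch. The function name `region_means`, the region labels, and the toy values are illustrative assumptions; expression values are assumed to have already been normalized to a common scale within each donor.

```python
import numpy as np


def region_means(expr, region, donor=None):
    """Average expression per region.

    expr   : array of shape (n_samples, n_genes)
    region : region label for each sample
    donor  : optional donor identifier for each sample. If given, samples
             are first averaged within each donor and the donor means are
             then averaged, so every donor contributes equally.
    """
    expr = np.asarray(expr, dtype=float)
    region = np.asarray(region)
    out = {}
    for r in np.unique(region):
        in_r = region == r
        if donor is None:
            # Pooled mean: donors with more samples carry more weight.
            out[r.item()] = expr[in_r].mean(axis=0)
        else:
            d = np.asarray(donor)[in_r]
            donor_means = [expr[in_r][d == k].mean(axis=0) for k in np.unique(d)]
            out[r.item()] = np.mean(donor_means, axis=0)
    return out


# Toy data for one gene: donor 1 contributes three samples to region "A",
# donor 2 contributes one.
expr = np.array([[0.2], [0.2], [0.2], [0.8], [0.4]])
region = ["A", "A", "A", "A", "B"]
donor = [1, 1, 1, 2, 2]

pooled = region_means(expr, region)
balanced = region_means(expr, region, donor)
```

In the toy example the pooled mean for region "A" is pulled towards donor 1 (0.35), whereas the donor-balanced mean weights both donors equally (0.5).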
Step 6. Gene filtering
The AHBA consists of more than 20,000 unique genes, of which only a fraction is expected to show consistent regional variations in expression across the brain. Many analyses interested in transcriptomic signatures of IDPs will be primarily interested in these brain-specific genes. Various methods for pre-selecting genes of interest have been adopted, including selecting: (i) disease-specific genes (Rittman et al., 2016, Romme et al., 2017, Yokoyama et al., 2017), (ii) genes related to a priori hypotheses (Goyal et al., 2014, Komorowski et al., 2016, Krienen et al., 2016, Acevedo-Triana et al., 2017), or (iii) genes that are expressed consistently across all six AHBA brains, as quantified using the DS measure (Hawrylycz et al., 2015). Genes with high DS values demonstrate consistent patterns of regional variation in expression across the six AHBA subjects, and have been shown to be enriched for brain-related biological function (Hawrylycz et al., 2015). Filtering based on DS thus offers a more targeted approach for investigating relationships between IDPs and gene expression compared to the whole-genome analysis. The selection of disease-specific genes is traditionally based on previous GWAS studies (Satake et al., 2009, Simón-Sánchez et al., 2009, Höglinger et al., 2011, Ferrari et al., 2014, Ripke et al., 2014, Kouri et al., 2015), while gene selection based on an a priori hypothesis can depend on other factors such as a specific involvement in clinical disorders (Komorowski et al., 2016, Acevedo-Triana et al., 2017). One particular set of 19 genes demonstrating a selective enrichment in the upper layers of the human cortex compared to mouse [Human Supragranular Enriched (HSE) genes] has been extensively investigated and was found to be implicated in both the functional (Krienen et al., 2016) and topological organisation of the brain (Vértes et al., 2016, Romero-Garcia et al., 2018b). 
While the choice between applying a gene filtering strategy and analysing whole-genome data depends strongly on the research question, investigations of the relationships between IDPs and patterns of gene expression using the AHBA may benefit from initial DS-based filtering.
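A DS-style consistency score can be sketched as the mean pairwise correlation between donors' regional expression profiles, computed gene by gene. This is an assumption-laden illustration: the use of Pearson correlation, the requirement that every donor has a value for every region, the function name, and the 0.5 cut-off are choices made here for clarity and are not taken from Hawrylycz et al. (2015).

```python
from itertools import combinations

import numpy as np


def differential_stability(expr):
    """Mean pairwise Pearson correlation of regional profiles across donors.

    expr : array of shape (n_donors, n_regions, n_genes) with no missing
           values. Returns one score per gene.
    """
    expr = np.asarray(expr, dtype=float)
    n_donors = expr.shape[0]

    # Standardize each donor's regional profile for every gene, so that the
    # mean of the product of two profiles equals their Pearson correlation.
    z = (expr - expr.mean(axis=1, keepdims=True)) / expr.std(axis=1, keepdims=True)

    pairs = list(combinations(range(n_donors), 2))
    ds = np.zeros(expr.shape[2])
    for a, b in pairs:
        ds += (z[a] * z[b]).mean(axis=0)
    return ds / len(pairs)


# Toy data: 3 donors, 5 regions, 2 genes.
# Gene 0 has the same regional pattern in every donor; gene 1 does not.
gene0 = [[1, 2, 3, 4, 5], [2, 4, 6, 8, 10], [0, 1, 2, 3, 4]]
gene1 = [[1, 2, 3, 4, 5], [5, 4, 3, 2, 1], [1, 2, 3, 4, 5]]
expr = np.stack([np.column_stack([g0, g1]) for g0, g1 in zip(gene0, gene1)])

ds = differential_stability(expr)
keep = ds > 0.5  # hypothetical cut-off, chosen only for this example
```

Genes passing the cut-off would then be retained for downstream analyses; in practice the threshold (or a percentile of the DS distribution) is a study-specific choice.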
Step 7. Accounting for spatial effects
The application of steps 1 to 6 results in a processed region × gene matrix of transcription level values, which can be used for further analyses. Typically, the data are linked to IDPs at either the regional level, or at the level of pairs of regions (i.e., patterns of correlated gene expression, or CGE, between pairs of brain regions are related to pair-wise measures of structural or functional connectivity between those regions). In both cases, we seek to understand how spatial variations in gene expression or CGE relate to spatial variations in the IDP. One complicating factor is that cortical regions that are located in close proximity are more likely to share similar gene expression patterns (Richiardi et al., 2015, Krienen et al., 2016, Vértes et al., 2016, Pantazatos and Li, 2017, Richiardi et al., 2017). A similar spatial autocorrelation of gene expression has been reported in the mouse brain (Fulcher and Fornito, 2016) and in the head of the nematode C. elegans (Arnatkevičiūtė et al., 2018). In some respects, this is an interesting and physiologically meaningful trend that warrants further investigation. However, if an IDP varies across the brain in a manner that reproduces a spatial gradient in gene expression, any apparent association between the IDP and gene expression measures may be driven by low-order spatial effects. Depending on the research question, especially when a direct relationship between an IDP and gene expression is evaluated, it is important to confirm that the identified association is stronger than what would be expected based on the spatial autocorrelation properties of gene expression (if such an effect is claimed).
A critical first step in understanding spatial biases in gene expression is to define distances between brain regions. These distances can be estimated by (i) calculating the Euclidean distance between regions; (ii) estimating the shortest distance within the grey matter volume; or (iii) estimating the shortest distance on the cortical surface (Figure 8; see supplementary material S4 for more details). The Euclidean distance is the simplest method, but it approximates distances as straight lines that do not respect cortical geometry. Calculating distances within the grey matter volume or on the cortical surface is more biologically plausible, as these distances take the cortical geometry into account. A comparison of these methods, shown in Figure 8D, demonstrates that the Euclidean approach yields shorter distances, on average, than the other methods, while the anatomically constrained volume-based and surface-based approaches yield similar distance estimates in cortex. Note that only the Euclidean approach can be generalized for measuring distances to subcortex.
Spatial effects are most easily examined in the context of analyses of correlated gene expression (CGE). Such analyses focus on patterns of pair-wise or multivariate transcriptional coupling between regions, where transcriptional coupling is estimated as a correlation between regional expression profiles. Such measures of CGE can then be related to some inter-regional IDP, such as a measure of functional or structural connectivity (Richiardi et al., 2015, Fulcher and Fornito, 2016, Arnatkevičiūtė et al., 2018). Figure 9A shows that CGE decays sharply as a function of increasing spatial distance (on the pial surface) between regions in the cortex; relationships for other distance measures are qualitatively similar (see Figure S5). In line with previous findings in different species (Fulcher and Fornito, 2016, Arnatkevičiūtė et al., 2018), the dependence of CGE on distance can be approximated as an exponential (Figure 9A) and therefore the residuals of the exponential fit could be further used in the analyses (Figure 9B). Extending this relationship to the whole-brain including samples from both cortex and subcortex is complicated by a strong anti-correlation between cortical and subcortical gene expression (Hawrylycz et al., 2015). Thus, separate normalization procedures for cortical and subcortical regions and corrections for different types of region pairs can be applied (see supplementary material S5 and Figure S6 for more details). Note also that the dependence of CGE on distance can vary as a function of the gene set and parcellation (see Figure S7).
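The distance correction described above can be sketched as an exponential fit followed by extraction of residuals. The functional form $A\exp(-d/\lambda) + B$, the grid search over $\lambda$, the function name, and the synthetic data are all assumptions made for illustration; the fitting procedure used to produce Figure 9 may differ.

```python
import numpy as np


def fit_exponential(d, y, lambdas):
    """Fit y ~ A * exp(-d / lam) + B.

    For each candidate decay length lam, A and B are obtained by linear
    least squares; the lam with the smallest squared error is returned.
    """
    d = np.asarray(d, dtype=float)
    y = np.asarray(y, dtype=float)
    best = None
    for lam in lambdas:
        X = np.column_stack([np.exp(-d / lam), np.ones_like(d)])
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        sse = ((y - X @ coef) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, float(coef[0]), float(lam), float(coef[1]))
    _, A, lam, B = best
    return A, lam, B


# Synthetic, noise-free example: CGE decays with distance (arbitrary units).
d = np.linspace(5.0, 150.0, 60)
cge = 0.8 * np.exp(-d / 30.0) - 0.1

A, lam, B = fit_exponential(d, cge, lambdas=np.arange(5, 101, 5))

# Distance-corrected CGE: what remains after removing the fitted trend.
residuals = cge - (A * np.exp(-d / lam) + B)
```

With real data, `d` and `cge` would hold one entry per pair of regions, and the residuals would be carried forward into analyses relating CGE to connectivity. A finer grid or a nonlinear optimizer could be used where a more precise estimate of the decay length is needed.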
Characterizing and removing distance dependence can be relatively straightforward in analyses of CGE. Addressing spatial relationships in analyses of regional gene expression can be more challenging since distance is defined between pairs of regions, whereas a regional expression value is a property of a single region. Some promising strategies to deal with this issue involve comparing observed findings relative to an appropriate null model. One class of methods uses spatially constrained permutation of the original data. Arbitrarily defined regions are not independent from one another, so some spatial constraints are required to account for these dependencies during permutation. As an example, a block permutation algorithm implemented by Vértes et al. (2016) accounted for spatial relationships between regions by aggregating areas into spatially contiguous subsets (blocks) according to the Desikan-Killiany atlas, and then permuting the resulting blocks rather than individual regions. Vasa et al. (2018) introduced a spatial permutation test based on the rotation of regional coordinates in the spherical projection such that the relative spatial relationships between regions are preserved. Matching between original and rotated coordinates, therefore, allows the regional measure to be permuted while controlling for spatial contiguity and hemispheric symmetry. Burt et al. (2017) used a spatial lagged autocorrelation model to characterise the spatial dependency between observed gene expression values. While these approaches provide some valid options, thorough evaluation of these null models is an important avenue of future work.
Conclusions
Imaging transcriptomics provides an unprecedented opportunity to uncover the molecular basis of large-scale brain organization. Given the rapid development of this field and its heavy reliance on publicly available data, there is a pressing need for standardized data processing pipelines that will facilitate the comparison of findings across studies. Our analysis delineates seven core steps of a basic workflow and demonstrates how choices at each step may affect the final expression measures. We summarize some preliminary recommendations for best practice in Table 4.
Considerable further work is required, particularly in the development of methods for addressing spatial correlations in the data. The development of standardized workflows will be essential to ensure reproducibility, particularly as gene expression atlases become more widely available and increase in their sophistication (Lein et al., 2007, Harris et al., 2010, Miller et al., 2014a). We have focused here on the processing of expression measures and removal of inherent biases in the data. Another area requiring further work is the development of appropriate statistical methods for relating IDPs to transcriptomic measures. For example, there is considerable variability in the software packages used for enrichment analyses, each of which makes different assumptions and uses different annotations of genes to gene ontology and other categories (Rhee et al., 2008). It will be important to understand how the available choices for analyzing these data affect reproducibility.
Conflicts of interest
The authors declare no competing financial interests.
Acknowledgments
We would like to thank A/Prof David Powell and Dr. Sarah Williams for valuable comments regarding the gene expression data processing. AF was supported by the Australian Research Council (ID: FT130100589) and National Health and Medical Research Council (ID: 1146292). BDF was supported by an NHMRC Early Career Fellowship (ID: 1089718).