Abstract
A major obstacle to treating Alzheimer’s disease (AD) is our lack of understanding of the molecular mechanisms underlying selective neuronal vulnerability, which is a key characteristic of the disease. Here we present a framework to integrate high-quality neuron-type specific molecular profiles across the lifetime of the healthy mouse, which we generated using bacTRAP, with postmortem human functional genomics and quantitative genetics data. We demonstrate human-mouse conservation of cellular taxonomy at the molecular level for AD vulnerable and resistant neurons, identify specific genes and pathways associated with AD pathology, and pinpoint a specific functional gene module underlying selective vulnerability, enriched in processes associated with axonal remodeling, and affected by both amyloid accumulation and aging. Overall, our study provides a molecular framework for understanding the complex interplay between Aβ, aging, and neurodegeneration within the most vulnerable neurons in AD.
Introduction
Selective neuronal vulnerability is a shared property of most neurodegenerative diseases1. The molecular basis for this selectivity remains unknown. In the early stages of Alzheimer’s Disease (AD), the most common form of dementia, clinical symptoms (such as memory loss) are caused by the selective degeneration of principal neurons of the entorhinal cortex layer II (ECII), followed by CA1 pyramidal cells in the hippocampus. In contrast, other brain regions, such as the primary sensory cortices, are relatively resistant to degeneration until later stages of the disease2-8.
AD is characterized by two major pathological hallmarks: accumulation of the Aβ peptide (the main constituent of amyloid plaques) and formation of neurofibrillary tangles (NFT, aggregates of hyperphosphorylated tau proteins which are thought to occur downstream of Aβ accumulation). Amyloid plaques do not accumulate in discrete brain areas. Rather, they are relatively widespread across most regions of the neocortex, followed by the entorhinal cortex and hippocampus of AD patients9,10. In contrast, NFTs exhibit the same regional pattern as neurodegeneration11-13. The co-occurrence of NFTs and neurodegeneration, as well as the fact that the best pathological correlate for clinical symptoms to date is the extent of NFT formation14-16, highlight the importance of tau pathology. Genetic analyses have revealed the importance of microglia in the disease. Yet the molecular drivers for the neuronal component of the pathological cascade that leads from Aβ accumulation to NFT formation and neurodegeneration are still largely unknown. While there might be regional differences in microglia identity, recent evidence suggests that microglia regional particularities are mainly driven by regional differences in the neighboring neurons17.
To understand and model cell type-specific vulnerability in AD, we must thus gain insight into the molecular-level differences between healthy vulnerable and resistant neurons that predispose some neurons, before any pathological process becomes visible, to develop tau pathology much faster than others. This requires high-quality cell type-specific profiles of both vulnerable and resistant neurons. While some neuron types of relevance to AD were profiled in a mouse hippocampal study18, the most vulnerable neuron type in early AD (ECII) has not previously been studied ex vivo. In humans, Small et al. profiled whole EC and dentate gyrus (DG) in control and AD patients19, and the Allen Brain Atlas (ABA) provides a large dataset for a number of human brain regions20, but neither of these studies is cell type-specific. A comprehensive dataset of neuron-specific AD-relevant profiles has been generated by Liang et al.21. While valuable, human samples, including those in the studies cited above, are inevitably subject to degradation and postmortem changes and, in the context of AD, do not allow for direct probing of the effect of aging and Aβ accumulation on gene expression.
Furthermore, a key challenge to achieving a molecular understanding of selective neuronal vulnerability in AD is that vulnerability and pathology are likely not simply the result of a few genes acting in isolation. Previous work examining whole brain lysates from AD patients and non-demented individuals22-24 demonstrated the promise of network analyses in AD, but these studies were limited to larger brain regions and thus could not address cell type-specific vulnerability. Deciphering the pathological cascade requires cell type-specific systems-level analysis and modeling of the complex molecular interactions that underpin the vulnerability of specific neurons to AD. Here, we provide the first molecular framework to understand the interactions between age, Aβ, and tau within neurons. Our approach (Fig. 1a) integrates the precision of cell type-specific profiling across age in the non-diseased mouse with computational modeling of human neuronal-omics (e.g., expression, interaction) data. It also combines this network modeling with human disease information from quantitative genetics data, ensuring relevance for human AD biology. The neuron-specific networks are available for download and exploration in an interactive web interface (alz.princeton.edu).
Results
Cell type-specific profiling of mouse neurons with differential vulnerability to AD
To investigate the selective vulnerability of neurons in AD, we generated cell type-specific expression profiles spanning the entirety of adulthood for vulnerable and resistant neurons using the bacTRAP (Bacterial Artificial Chromosome – Translating Ribosome Affinity Purification) technology in normal mice25,26. The bacTRAP technology enabled us to assay AD-relevant neuronal cell types with genome-wide coverage, measure transcripts ex vivo (as opposed to postmortem), and specifically capture actively translated (rather than all transcribed) genes.
We focused on the vulnerable principal neurons of ECII and pyramidal neurons of CA1, and on five types of resistant neurons, namely pyramidal neurons of CA2, CA3, primary visual cortex (V1), primary somatosensory cortex (S1), and granule cells of DG. Specifically, we constructed different transgenic mouse lines for each type of neuron, overexpressing the ribosomal protein L10a fused to the green fluorescent protein (GFP) under the transcriptional control of a driver specific to that type of neuron (Fig. 1b, Supplementary Note). The bacTRAP procedure then consists of immunoprecipitation of GFP-tagged polysomes from GFP-L10a-expressing cells, thus isolating actively translated neuron-specific mRNAs for RNA-sequencing. Previous work using bacTRAP or similar technologies (e.g., RiboTag) has demonstrated the strong enrichment for cell type-specific signal from cells expressing the tagged ribosomal protein25-31.
We first performed multidimensional scaling analysis of the resulting bacTRAP data and found that the samples (3-12 biological replicates per neuron type per age) clustered primarily by tissue location, as expected, with a clear separation between the ECII, hippocampal regions (CA1, CA2, CA3, DG), and neocortical regions (S1 and V1) (Fig. 2a). We further verified the expression patterns of known neuron-type specific markers (Fig. 2b) and identified the top enriched genes for each neuron type in our data (Fig. 2c, Supplementary Table 1). Comparisons with the semi-quantitative in situ hybridization (ISH) data in the ABA (Supplementary Note, Supplementary Fig. 1) show that our data include the cell type-specific signals in these datasets, while providing substantially higher regional and quantitative, genome-scale coverage. Thus, our approach provides a high-quality, genome-wide assay of ex vivo neuron-type specific expression in AD-vulnerable and resistant regions of the brain.
To characterize molecular signatures for AD-vulnerable cells in the non-disease state, we compared gene expression profiles of ECII and CA1 neurons against the five AD-resistant neuron types in wild-type mice. Among the significantly enriched processes we found many AD-relevant pathways (Fig. 2d, Supplementary Table 2). Furthermore, known Alzheimer’s Disease genes were significantly enriched in vulnerable neurons (KEGG AD genes, Wilcoxon rank sum test, p-value < 9.41e-7). These results support the hypothesis that intrinsic properties of ECII and CA1 neurons, present even in healthy individuals, render these neurons preferential substrates for the development of AD pathogenesis.
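The gene-set enrichment tests reported here and in later sections rely on the Wilcoxon rank sum test. As an illustration only, the following minimal sketch shows how such a one-sided test can be computed for a gene set against a genome-wide score; it uses a normal approximation without tie correction, and the function name, gene labels, and toy scores are hypothetical rather than taken from the study.

```python
import math

def rank_sum_enrichment(scores, gene_set):
    """One-sided Wilcoxon rank-sum (Mann-Whitney U) test, normal approximation.

    Asks whether genes in `gene_set` tend to have higher scores than the
    remaining genes. Ties get average ranks; the variance is not tie-corrected.
    """
    genes = sorted(scores, key=scores.get)      # ascending by score
    ranks = {}
    i = 0
    while i < len(genes):
        j = i
        while j + 1 < len(genes) and scores[genes[j + 1]] == scores[genes[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[genes[k]] = (i + j) / 2 + 1   # average 1-based rank of the tie group
        i = j + 1
    in_set = [g for g in genes if g in gene_set]
    n1, n2 = len(in_set), len(genes) - len(in_set)
    if n1 == 0 or n2 == 0:
        raise ValueError("gene set must be a proper, non-empty subset of scored genes")
    u = sum(ranks[g] for g in in_set) - n1 * (n1 + 1) / 2
    mean = n1 * n2 / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean) / sd
    p_upper = 0.5 * math.erfc(z / math.sqrt(2))  # P(Z >= z)
    return u, p_upper

# Toy data (hypothetical): ten genes scored 1..10, gene set = three top-scoring genes.
toy_scores = {f"g{i}": float(i) for i in range(1, 11)}
u_stat, p_val = rank_sum_enrichment(toy_scores, {"g8", "g9", "g10"})
```

For real analyses with thousands of genes, an established statistics library would normally be preferred over this hand-rolled approximation.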
Neuron-specific spatial homology between mouse and human
An important question for interpreting model organism studies of AD is whether the molecular identity of neurons is conserved between mouse and human. Previous comparisons using spatially resolved semi-quantitative ISH32 or transcriptomics and proteomics without cellular resolution33,34 have suggested that mouse and human regional expression patterns are correlated, but the conservation of expression across neuronal subtypes requires further exploration. In humans, fully quantitative data at cell type-specific resolution is lacking across the regions most relevant for AD. However, the discrete brain structure microarray data from the ABA20 captures enough regional specificity for an expression-based comparison between the seven mouse neuronal subtypes and 205 human brain regions. We calculated a spatial homology score between molecular signatures for each mouse neuron type and each human brain region, generating 1,435 pairwise spatial homology measurements. Remarkably, of all these possible mappings, we found a near perfect match between each mouse profile and its corresponding relevant human brain region (Fig. 3, Supplementary Table 3). This confirms the validity of leveraging the power of ex vivo neuron-specific molecular profiles in the mouse to gain relevant insight into the molecular characteristics of the most vulnerable neurons in human AD. While there are differences in lifespan and other factors relevant to AD that may facilitate the degeneration of human neurons35, our comparison supports the notion that physiological differences between vulnerable and resistant neurons are conserved. Our study provides, to our knowledge, the first systematic evidence that the molecular identity of AD-relevant cell types is conserved between the mouse and human brain. This supports our approach of combining the cell-type-specific signals in healthy mouse neurons with the AD-relevant signals in large collections of human data.
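The pairwise comparison described above (7 mouse neuron types × 205 human regions = 1,435 scores) can be sketched as follows. The exact definition of the spatial homology score is not given in this section, so the sketch substitutes a Spearman rank correlation between signatures over shared orthologous genes; that choice, the function names, the region labels, and the toy values are all assumptions made for illustration.

```python
import math

def average_ranks(values):
    """Return 1-based ranks, averaging ranks within groups of tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation = Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("constant signature: correlation undefined")
    return cov / math.sqrt(vx * vy)

def homology_scores(mouse, human):
    """Score every (mouse neuron type, human region) pair."""
    return {(m, h): spearman(mv, hv)
            for m, mv in mouse.items() for h, hv in human.items()}

def best_matches(scores):
    """For each mouse neuron type, keep the highest-scoring human region."""
    best = {}
    for (m, h), s in scores.items():
        if m not in best or s > best[m][1]:
            best[m] = (h, s)
    return best

# Toy signatures over four shared genes (hypothetical values and labels).
mouse_sig = {"ECII": [3, 1, 2, 5], "CA1": [1, 4, 2, 0]}
human_sig = {"entorhinal_region": [30, 12, 20, 41], "CA1_region": [2, 9, 5, 1]}
matches = best_matches(homology_scores(mouse_sig, human_sig))
```

With the full data, `homology_scores` would return 7 × 205 = 1,435 entries, one per mouse-human pair.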
In silico modeling of gene networks in AD-relevant neuronal cell types
AD neurodegeneration is the result of multiple molecular-level changes to the system of interacting genes and pathways within vulnerable neurons. We model this system with cell type-specific functional networks, i.e., maps of functional relationships between proteins in the specific cellular contexts of the different types of neurons. Specifically, a functional relationship represents the common involvement of two proteins, either directly or indirectly, in a biological pathway in the cell type of interest. We recently developed a regularized Bayesian network integration method to construct tissue-specific functional networks36. These network-level models are an effective first approximation of the functional landscape of a cell and have been successfully applied to the study of diseases36-38. It was, however, previously impossible to apply this method to construct networks at neuron-specific resolution because of limitations in high-quality cell type-specific gene expression annotations in human. Given the strong concordance between our mouse neuron-specific molecular signatures and their corresponding human brain regions, we used the signatures as positive examples to extract cell-specific signal from a large human data compendium including thousands of gene expression, protein-protein interaction, and shared regulatory profile datasets to construct human neuron-type specific functional networks. We have made the resulting seven in silico human genome-wide network models, each representing one AD-vulnerable or resistant neuron type in the non-disease state, available both for download and dynamic, query-based exploration at http://alz.princeton.edu.
To identify functional characteristics and differences specific to neuron types vulnerable or resistant to AD, we examined the functional cohesiveness of biological processes (i.e., a measure of network connectivity among genes known to be part of that process) in each corresponding functional network model (Supplementary Table 4). We found that pathways neuroprotective in AD39-41 appeared more cohesive in AD resistant neurons than in vulnerable neurons, namely the transforming growth factor beta receptor signaling pathway (in DG) and the canonical Wnt signaling pathway (in DG, S1 and V1). On the other hand, mitochondrial processes like apoptotic mitochondrial changes and mitochondrial fission were cohesive in CA1 and ECII respectively, which is consistent with the saliency of mitochondrial dysfunction at early stages of the disease42. Strikingly, we found that the processes with largest functional cohesiveness in vulnerable compared to resistant networks were all related to microtubule organization. This is the first evidence that these tau-regulated processes may intrinsically differ between healthy vulnerable and resistant neurons.
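To make the notion of functional cohesiveness concrete, here is a minimal sketch that scores a process as the mean edge weight among its member genes relative to the mean edge weight over all gene pairs in the network. The precise statistic used in the study is not specified in this section, so this ratio, the function names, and the toy network are assumptions for illustration.

```python
from itertools import combinations

def mean_pair_weight(network, genes):
    """Mean edge weight over all unordered pairs of `genes`.

    `network` maps frozenset({gene_a, gene_b}) -> edge weight in [0, 1];
    pairs without an entry are treated as weight 0 (an assumption).
    """
    pairs = list(combinations(sorted(genes), 2))
    if not pairs:
        raise ValueError("need at least two genes")
    return sum(network.get(frozenset(p), 0.0) for p in pairs) / len(pairs)

def cohesiveness(network, gene_set):
    """Within-process connectivity relative to the network-wide background."""
    all_genes = set().union(*network)           # every gene that appears in an edge
    background = mean_pair_weight(network, all_genes)
    return mean_pair_weight(network, gene_set) / background

# Toy network (hypothetical): genes A, B, C are tightly linked; D is peripheral.
toy_network = {
    frozenset({"A", "B"}): 0.9, frozenset({"A", "C"}): 0.8,
    frozenset({"B", "C"}): 0.7, frozenset({"A", "D"}): 0.1,
    frozenset({"B", "D"}): 0.2, frozenset({"C", "D"}): 0.3,
}
score = cohesiveness(toy_network, {"A", "B", "C"})
```

A score above 1 means the process is more tightly connected than an average set of gene pairs in that neuron-type network; comparing the same process across the seven networks would then highlight neuron-type differences.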
Identifying AD-associated genes through integration of AD GWAS and the ECII functional network
To identify potential AD-associated genes, we then combined these network models of vulnerable neuron function with unbiased disease signal from human quantitative genetics data. Specifically, we developed an approach, Network-Wide Association Study 2.0 (NetWAS 2.0), that extends our previously described36 framework with a probabilistic subsampling method to take into account gene-level confidence from quantitative genetics studies. This machine learning approach leverages genome-wide association studies (GWAS) in conjunction with a functional network specific to the region of interest to identify cell type-specific network patterns that are predictive of a disease of interest, reranking all genes based on disease relevance significantly better than the original GWAS36.
We applied NetWAS 2.0 using the network model for the most vulnerable neuron (ECII) to reprioritize genes based on an AD GWAS for Braak stage (NFT pathology-based staging)43 (Supplementary Table 5). Notably, MAPT (microtubule-associated protein tau, the gene that encodes tau, the primary component of NFTs) was ranked first among all 23,950 reprioritized genes. MAPT was not even nominally significant in the initial GWAS (initial GWAS tau p-value = 0.269). This illustrates the power of NetWAS 2.0 to extract (through cell type-specific functional networks) important disease-relevant signals that may be hidden in the original GWAS. Overall, while the original GWAS for Braak stage was somewhat enriched for known AD genes (genes from the KEGG AD gene set, Wilcoxon rank sum test, p-value < 0.199), the reprioritized gene ranking was much more significantly enriched for these genes (Wilcoxon rank sum test, p-value < 1.60e-4). We also observed strong enrichment of genes involved in regulation of Aβ accumulation and NFT formation (Wilcoxon rank sum test, p-value < 1.29e-10, p-value < 2.2e-16, respectively; gene sets curated by a curator independent from the analyses, Supplementary Table 6) (Fig. 4a, b). Known AD neuroprotective pathways, like neurotrophin signaling44 and Wnt signaling pathway40,41 were also predicted to be strongly associated (Supplementary Table 7). Lastly, we highlight the association of neurotransmitter secretion with AD (FDR < 2.57e-24). Dysregulation of this pathway is one of the most prominent effects of Aβ accumulation45, and the resulting hippocampal network hyperactivity was suggested to be a crucial contributor to AD pathogenesis46-48. As the AD signal in NetWAS 2.0 comes only from unbiased GWAS data (i.e., no prior AD disease knowledge was incorporated), the NetWAS 2.0 results thus provide a data-driven, unbiased prioritization of AD-associated processes out of the many pathways that, over time, have been associated with tau pathology.
Beyond these well-characterized associations, one of the most significantly enriched pathways in the NetWAS 2.0 results was a microtubule-related process, regulation of microtubule cytoskeleton organization (FDR < 1.38e-27) (Supplementary Table 7). This is consistent with our connectivity analysis, where we discovered that microtubule-regulating pathways were particularly cohesive in vulnerable neurons. Together, these results support a hypothesis that microtubule-regulating pathway genes may cooperate with MAPT for the formation of NFTs in vulnerable neurons. Our data also strongly support a role for mRNA splicing and transport in AD pathogenesis (RNA splicing, FDR = 4.48e-10; RNA transport, FDR = 3.32e-16). RNA binding proteins in these processes have recently emerged as major players in various non-AD neurodegenerative diseases49, and recent studies suggested possible involvement in AD (TIA1 protects against tau-mediated degeneration50, CELF1 is one of the main GWAS hits51, and the activity of ELAVL proteins is altered in AD brains52).
Association of NetWAS 2.0 genes with AD pathology
We next investigated the link between key drivers of the AD pathological cascade (Aβ accumulation and age) and AD-vulnerability-associated genes identified by NetWAS 2.0 analysis. To enable the direct analysis of the ECII-specific effects of Aβ accumulation in AD, we crossed our ECII-bacTRAP mice with an AD mouse model (APP/PS1 mice). These mice overexpress mutant APP and PSEN1 and have increased levels of Aβ in the cortex and hippocampus53. We profiled ECII neurons at 6 months of age, when the first plaques are starting to form (Supplementary Table 8). Genes significantly downregulated in these APP/PS1 mice were strongly enriched in our top NetWAS 2.0 gene predictions (Wilcoxon rank sum test, p-value < 9.21e-14). Additionally, genes associated with aging in ECII of wild-type mice (24-month- versus 5-month-old mice) (Supplementary Table 8) were strongly enriched at the top of our ranking (Wilcoxon rank sum test, p-value < 4.19e-13). Our finding that Aβ and aging modulate the expression of genes NetWAS 2.0 predicts to be associated with AD indicates that these genes might connect Aβ accumulation and NFT formation in the age-dependent pathological cascade within vulnerable neurons.
To examine the possible relationship between top NetWAS 2.0 genes and human AD pathology directly, we then used data from two independent human datasets. The Adult Changes in Thought study (ACT)54 provides paired gene expression data and pathology measurements from hippocampus samples of elderly individuals at risk for dementia. For each gene, we calculated the correlation between expression level and amount of amyloid plaques. We found that expression of our top gene predictions was significantly more correlated with amyloid plaque amount than either background or genes implicated in the original Braak stage-GWAS (bootstrap p-value < 0.0001, Fig. 4c). Furthermore, our predictions were very significantly enriched in genes differentially downregulated in tangle-bearing ECII neurons of sporadic AD patients measured in a different study (relative to non-tangle-bearing neurons, Wilcoxon rank sum test, p-value < 2.2e-16)55. Together, this consensus of results indicates that the top NetWAS 2.0 gene predictions may highlight novel genes that participate in the AD pathological cascade within neurons.
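The comparison of top predictions against background can be illustrated with a simple resampling test. The sketch below draws random gene sets (with replacement) from the background and asks how often their mean expression-plaque correlation matches or exceeds that of the top predictions; the exact bootstrap procedure used in the study is not described in this section, so the statistic, function name, and toy values are assumptions.

```python
import random

def resampling_p_value(corr_by_gene, top_genes, n_resamples=10000, seed=0):
    """Empirical p-value for the mean correlation of `top_genes`.

    corr_by_gene: gene -> correlation between expression and plaque amount.
    Random gene sets of the same size are drawn with replacement from all
    genes; the +1 terms keep the estimate strictly above zero.
    """
    rng = random.Random(seed)
    background = list(corr_by_gene.values())
    k = len(top_genes)
    observed = sum(corr_by_gene[g] for g in top_genes) / k
    hits = 0
    for _ in range(n_resamples):
        draw = rng.choices(background, k=k)
        if sum(draw) / k >= observed:
            hits += 1
    return observed, (hits + 1) / (n_resamples + 1)

# Toy data (hypothetical): 100 genes with correlations 0.01..1.00,
# and the five most strongly correlated genes as "top predictions".
toy_corr = {f"g{i}": i / 100 for i in range(1, 101)}
toy_top = [f"g{i}" for i in range(96, 101)]
obs_mean, p_emp = resampling_p_value(toy_corr, toy_top, n_resamples=2000)
```

Note that the smallest p-value this estimator can return is 1/(n_resamples + 1), so resolving p < 0.0001 requires at least 10,000 resamples.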
Identification of AD-associated functional modules
To better understand the processes and pathways through which these genes are associated with NFT formation and AD, we used a shared-nearest-neighbor-based community-finding algorithm56 to cluster the genes with top NetWAS 2.0 ranks into functional modules within the ECII network (Fig. 4d, Supplementary Table 9). We identified four modules, each enriched in distinct AD-associated processes, including RNA splicing (module A), metabolism (module B), neurotransmitter release (module C), and neuron differentiation (module D). Interestingly, several pathways were shared across multiple modules, including microtubule organization (A, C, D) and axonogenesis (B, C, D), supporting a central role for these processes in AD pathogenesis (Supplementary Table 9).
We then further characterized these functional modules by examining their relationship to aging as well as Aβ accumulation and NFT formation in vulnerable neurons (Supplementary Table 10). We found that the neurotransmitter-secretion-related module C genes showed decreased expression in the context of Aβ accumulation in the mouse (our APP/PS1 mouse profiling) and have significantly lower expression in aged wild-type mice. Thus, module C is a good candidate for linking Aβ accumulation with aging in the AD pathological cascade. Furthermore, module C was the only module with ECII-specific signal for tau pathology (i.e., significantly enriched in genes downregulated in NFT-bearing ECII neurons of AD patients, but not strongly correlated with tau in non-ECII regions of the human hippocampal formation (ACT study)). Additionally, only module C demonstrated significantly tighter cohesiveness in ECII versus resistant neurons (Student’s t-test, intersection-union test, p-value < 0.0135). Thus, while modules A, B, and D may represent pathways common to general AD progression in any neuron type, module C may confer the surplus of susceptibility specific to ECII neurons. As this vulnerability-specific module represents processes related to both axon structural remodeling and presynaptic excitability, it is tempting to speculate that specific AD vulnerability of ECII neurons may be linked to their lifelong maintenance of a state of high axonal plasticity (Fig. 4e).
Functional association of α-synuclein, tau, and PTB in ECII neurons
To identify genes in this vulnerability-specific module that underlie ECII susceptibility in relation to early stage AD, we examined the connectivity and centrality of the module members across all seven neuron-specific networks. Intuitively, two genes are tightly connected in a specific neuronal context if they have a high confidence link in the functional network for that neuronal type; this suggests involvement of these genes in shared processes. A highly central gene is one that has many high confidence links across the network, indicating involvement of this gene in a wide array of processes. Within module C, MAPT (tau) was the most centrally connected out of all 668 module C genes, and our analysis pointed to SNCA (α-synuclein) as potentially driving the ECII specificity of this vulnerability-specific module. This is based on the finding that not only are MAPT and SNCA tightly connected to each other in the ECII network, but α-synuclein also has the highest differential network centrality between ECII and the resistant neurons. This suggests that α-synuclein is associated with many more processes in ECII neurons compared to other types of neurons, that tau cooperates with α-synuclein in many of these processes, and that α-synuclein may contribute to NFT formation upon dysregulation of these processes. The novel association between MAPT and SNCA in the context of AD neuronal vulnerability is supported by previous work demonstrating physical as well as functional interaction between these two proteins in other neurodegenerative disorders (reviewed in 57). For example, tau and α-synuclein have been previously described to influence each other’s aggregation into pathological lesions in Parkinson’s disease as well as in mice overexpressing these genes58-61. However, a role for endogenous α-synuclein in the formation of NFT has not been previously described, although a large proportion of AD patients present α-synuclein pathology62.
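The idea of differential network centrality can be sketched as follows, using weighted degree (the sum of a gene's edge weights) as the centrality measure and comparing the ECII network against the average of the resistant networks. The centrality measure actually used is not specified in this section, so weighted degree, the function names, and the toy edge weights are assumptions and do not reproduce the study's networks.

```python
def weighted_degree(network):
    """Sum of edge weights per gene; `network` maps frozenset pairs -> weight."""
    centrality = {}
    for pair, weight in network.items():
        for gene in pair:
            centrality[gene] = centrality.get(gene, 0.0) + weight
    return centrality

def differential_centrality(target, others):
    """Centrality in `target` minus the mean centrality across `others`."""
    in_target = weighted_degree(target)
    in_others = [weighted_degree(n) for n in others]
    return {
        gene: c - sum(o.get(gene, 0.0) for o in in_others) / len(in_others)
        for gene, c in in_target.items()
    }

# Toy networks with made-up weights (illustrative only, not study data).
ecii_net = {
    frozenset({"SNCA", "MAPT"}): 0.9,
    frozenset({"SNCA", "PTB"}): 0.8,
    frozenset({"MAPT", "PTB"}): 0.5,
}
resistant_net = {
    frozenset({"SNCA", "MAPT"}): 0.2,
    frozenset({"SNCA", "PTB"}): 0.1,
    frozenset({"MAPT", "PTB"}): 0.5,
}
diff = differential_centrality(ecii_net, [resistant_net])
most_ecii_specific = max(diff, key=diff.get)
```

In this toy setup the gene whose links strengthen most in the target network receives the largest differential score, which mirrors the reasoning applied to α-synuclein above.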
PTB, a regulator of alternative splicing63, was the most highly connected protein to both α-synuclein and tau in the ECII network. Interestingly, PTB was detected in a screen for tau splice factors as one of the regulators of tau exon 1064. Regulation of exon 10 is of high relevance for tau pathology, as its inclusion gives rise to four- rather than three-microtubule-binding-repeat tau (4R- and 3R-tau, respectively). An imbalance between 4R- and 3R-tau has been repeatedly shown to give rise to tau pathology in different tauopathies as well as in AD (reviewed in 65). Furthermore, NFTs in ECII neurons have been shown to be devoid of 4R-tau, in contrast to other hippocampal neurons that have both tau isoforms66,67. Regulation of tau splicing by PTB in ECII neurons could initiate tau pathology specifically in these neurons, potentially contributing to their vulnerability.
Discussion
Little is known about the molecular basis of selective neuronal vulnerability in AD and the molecular pathways that lead to NFT formation and neurodegeneration. Furthermore, no animal model comprehensively recapitulates every aspect of human AD pathogenesis. Here, we provide an integrative and unbiased framework for the study of this disease that combines advantages of both mouse models and of human data. Our approach 1) models AD vulnerable and resistant human neurons in silico with high-quality cell type-specific molecular profiles generated in the non-diseased mouse and a compendium of publicly available human data, 2) leverages human quantitative genetics to identify AD-relevant genes and pathways within these in silico models, and 3) experimentally tests in the mouse the effect of age and Aβ, a major AD endophenotype, on the predicted AD genes, elucidating the pathological cascade of AD. Our approach is general and applicable to any complex disease with selective cell vulnerability where relevant human GWAS data are available. For neurodegenerative diseases with a complex multicellular pathogenesis, the approach also allows for the identification of cell type-specific pathological pathways.
Using this approach, we identify molecular mechanisms underlying neuronal vulnerability in AD. In addition to significantly predicting many of the gene candidates previously associated with AD, we also outline novel pathways linking Aβ and tau pathology. Specifically, our unbiased, data-driven analyses place microtubule dynamics at the center of AD pathogenesis. We find that this process is both closely associated with NFT formation and one of the most salient characteristics of the most vulnerable neuronal subtype. As key regulators of neuronal architecture and intraneuronal trafficking, microtubules are the endpoint of many neuronal functional processes. Thus, it is important to determine which specific pathways lead to dysregulation of microtubule dynamics in the context of AD. While a conclusive answer to this question requires further study, our analyses of ECII-vulnerability highlight two potential candidate processes: axonogenesis (which includes tau) and synaptic vesicle release (which includes α-synuclein). Both have been previously linked to microtubule remodeling68,69 and are connected to microtubule genes within the vulnerability-specific module. Such interactions could be more prominent in ECII neurons (known to display considerable axon arborization70) than in other cell types – which could confer exceptional axonal plasticity to ECII neurons, but could also be responsible for ECII vulnerability.
Materials and Methods
Animal models
All experiments were approved by the Rockefeller University Institutional Animal Care and Use Committee (RU-IACUC protocols #07057, 10053, 13645-H), and were performed in accordance with the guidelines described in the US National Institutes of Health Guide for the Care and Use of Laboratory Animals. Mice were housed in rooms on a 12 h dark/light cycle at 22 °C and maintained with rodent diet (Picolab) and water available ad libitum. Mice were housed in groups of up to five animals. All bacTRAP mice and APP/PS1 mice (B6.Cg-Tg(APPswe,PSEN1dE9)85Dbo/Mmjax purchased from the Jackson Lab) were maintained in a heterozygous state by crossing them with non-transgenic C57Bl/6J mice (also purchased from the Jackson Lab). For cell type-specific profiling in wild-type mice, only male mice were used, and the tissue from two males was pooled. Each type of neuron was profiled at 4-5 months, 12 months, and 24 months. For comparing ECII neurons in wild-type and APP/PS1 mice, both male and female mice were used, and each sample corresponded to the tissue of one mouse.
bacTRAP transgene construction
To construct cell type-specific bacTRAP mice, we searched for drivers specific to each type of neuron. For that purpose, we mined the ABA and GENSAT for genes expressed selectively in the different cell types of interest. We selected the following genes for their enriched expression in the cell type of interest compared to neighboring cell types: Rasgrp2 and Sh3bgrl2 (ECII principal neurons), Sstr4 (CA1 pyramidal neurons), Cacng5 (CA2 pyramidal neurons), Gprin3 (CA3 pyramidal neurons), Calca (V1 pyramidal neurons), and Cartpt (S1 pyramidal neurons). Regulatory regions of these genes should drive expression in the corresponding neuron types. We thus used these genes to construct corresponding bacTRAP mice according to previously described procedures71. Specifically, we obtained the bacterial artificial chromosomes (BACs) where the open reading frame (ORF) for each of these genes is most centrally located, ensuring that both upstream and downstream regulatory sequences are driving the expression of the bacTRAP construct: RP23-307B16 (Sh3bgrl2), RP23-199D5 and RP24-344N1 (Rasgrp2), RP23-126C5 (Sstr4), RP23-329L1 (Cacng5), RP23-181A2 (Calca), RP24-68J22 (Cartpt) (Children’s Hospital and Research Center at Oakland). We modified each BAC to place the eGFP-L10a cDNA under the control of each gene’s regulatory sequences71. For each gene, we cloned by PCR a small homology arm corresponding to approximately 500 bp of sequence upstream of the ORF, stopping 5 bp before the ORF (sequences of the small homology arms in Supplementary Table 11), in the S296 shuttle vector (a pLD53.SC2 plasmid containing the cDNA for eGFP-L10a). For each BAC, we transformed the BAC and the S296 vector containing the corresponding small homology arm into recA-expressing bacteria. We monitored the proper integration of eGFP-L10a at the beginning of each ORF by Southern blot. We prepared a purified BAC stock using a cesium chloride gradient, and linearized the BAC with PI-SceI.
The Rockefeller University Transgenic Services performed pronuclear injection of the linearized BACs on a C57Bl/6J (Jackson Lab) background. F1 and F2 of the different founder lines were then tested for proper expression pattern. One of the founder lines with the Sstr4 BAC (Sstr4#19 line) presented ectopic expression in granule cells from the dentate gyrus and no expression in CA1 neurons. We thus used Sstr4#19 for granule cell profiling. We used another founder line (Sstr4#7) for CA1 neurons. We also separately obtained Cck- and Gprin3-bacTRAP mice, which were previously described26,72.
Cell type-specific molecular profiling
To isolate cell type-specific mRNA, bacTRAP mice from the different transgenic lines were decapitated after slight CO2 intoxication, and brains were promptly removed. For each transgenic line, we dissected the minimal area where transgene expression is restricted to the cell type of interest (for ECII bacTRAP lines, we made a coronal cut around -3.3 mm antero-posterior (AP); for Sh3bgrl2-bacTRAP, we then scooped the hippocampus off the tissue caudal to the cut, discarded it, and kept the tissue located ventral to the rhinal fissure; for Rasgrp2-bacTRAP, we took all the tissue caudal to the -3.3 mm AP cut, and ventral to a horizontal cut around -3 mm dorso-ventral (DV); for Sstr4#7- and Sstr4#19-bacTRAP lines, we used the whole hippocampus; for the Cck-, Cacng5- and Gprin3-bacTRAP lines, we used all the hippocampus rostral to a coronal cut around -3.3 mm AP; for Calca-bacTRAP, we made a sagittal cut around +3.6 mm medio-lateral (ML) on each side and a coronal cut around -3 mm AP, and we extracted the cortex respectively dorsal and caudal to these cuts, while cutting out the mEC; for Cartpt-bacTRAP, we made coronal cuts around 1.75 mm AP, -0.25 mm AP, and -2.25 mm AP and, for each slice, we dissected out the part of the cortex that contains the somatosensory cortex).
We then performed bacTRAP purification following the previously described procedure28, with two differences. First, the volume of lysis buffer used for tissue homogenization depends on the size of each particular brain region. The buffer volumes for each bacTRAP line are shown in Supplementary Table 11. Second, we used the RNeasy Plus Micro Kit (Qiagen) to purify RNA after bacTRAP, and RNA was thus detached from beads using the RLT Plus buffer supplemented with 1% β-mercaptoethanol (MP Biomedicals). RNA integrity was evaluated with a Bioanalyzer RNA 6000 Pico chip (Agilent), and RNA was quantified by fluorescence detection with Quant-iT RiboGreen RNA reagent (ThermoFisher). All samples included in the study had RNA Integrity Numbers above 7. Five ng of RNA were then used for reverse transcription with the Ovation RNA-seq v2 kit (NuGEN). cDNAs were cleaned up using a QIAquick PCR purification kit (Qiagen). Double-stranded cDNAs were quantified by fluorescence detection using Quant-iT PicoGreen dsDNA reagent (ThermoFisher). cDNAs (200 ng) were sonicated in a 120 µl volume using a Covaris S2 ultrasonicator (duty cycle, 10%; intensity, 5; cycles/burst, 100; time, 5 minutes) to generate fragments of 200 bp on average. The fragmented cDNAs were then used to construct sequencing libraries using the TruSeq RNA sample prep kit v2 (Illumina). Library concentration was evaluated using a Bioanalyzer, and libraries were multiplexed. Multiplexes were then sequenced at the Rockefeller University Genomics Resource Center with a HiSeq 2500 sequencer (Illumina).
Histology
To study the expression pattern of the bacTRAP transgene, bacTRAP mice were transcardially perfused with 4% paraformaldehyde, and brains were dissected out, immersion-fixed for one hour in 4% paraformaldehyde, and frozen in OCT compound (TissueTek). Sections 40 μm thick were cut on a CM3050 S cryostat (Leica). Sections were permeabilized in PBS with 0.1% Fish Gelatin (Sigma), 2% normal goat serum (Jackson ImmunoResearch), and 0.1% Triton X-100 and then stained overnight at 4°C in PBS with 0.1% Fish Gelatin and 2% normal goat serum with a chicken anti-GFP antibody (1/300). The primary antibody was detected with an Alexa 488-donkey anti-chicken secondary antibody (1/300). After the last wash, sections were mounted with Prolong Gold Medium containing DAPI. Sections were imaged using a Zeiss LSM 510 META laser scanning confocal microscope. Images were minimally processed using Photoshop (Adobe Systems) to enhance brightness and contrast for optimal representation of the data.
RNA-seq analysis
RNA-sequencing reads were mapped to the mouse genome (Ensembl 75) using STAR (version 2.3.0e, default parameters)73, and gene-level counts were quantified using htseq-count (version 0.9.1)74. Genes were subjected to an expression detection threshold of 1 count per million reads per gene in more than 3 samples, and oligodendrocyte, endothelial, and ependymal cell gene clusters were excluded to focus on the neuronal signal. Differential expression and multidimensional scaling analyses were performed using edgeR (version 3.8.6)75.
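As an illustration only, the expression detection filter can be sketched in plain Python. The function names and the dictionary-based input layout are hypothetical, the analysis itself was run with edgeR, and the sketch assumes that "threshold of 1 count per million" means a CPM of at least 1 and that every sample has a non-zero library size.

```python
def cpm(counts_by_sample):
    """Convert raw counts to counts per million, separately for each sample.

    counts_by_sample: dict mapping sample name -> dict of gene -> raw count.
    """
    result = {}
    for sample, counts in counts_by_sample.items():
        library_size = sum(counts.values())  # assumed to be non-zero
        result[sample] = {g: c * 1e6 / library_size for g, c in counts.items()}
    return result


def detected_genes(counts_by_sample, min_cpm=1.0, min_samples=3):
    """Keep genes with CPM >= min_cpm in MORE than min_samples samples."""
    cpm_values = cpm(counts_by_sample)
    genes = set().union(*(c.keys() for c in counts_by_sample.values()))
    kept = set()
    for gene in genes:
        n_pass = sum(
            1 for s in cpm_values if cpm_values[s].get(gene, 0.0) >= min_cpm
        )
        if n_pass > min_samples:
            kept.add(gene)
    return kept
```

A gene that passes the CPM threshold in exactly 3 samples is dropped under this reading, because the text requires more than 3 samples.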
Spatial homology analysis
Human brain microarray data were downloaded from the ABA (http://human.brain-map.org/static/download)76. Brain regions that were measured in fewer than 3 out of the 6 subjects profiled were excluded from downstream analysis to ensure robustness.
We calculated an ontology-aware spatial homology score between each of our 7 mouse neuron types and each of the 205 human brain regions robustly measured by the ABA.
The score is defined for mouse neuron type $i$ and human brain region $j$ (thus $T_m = 7$ and $T_h = 205$) and is computed over human genes $g$. Here $\mu^{m}_{g,i}$ and $\sigma^{m}_{g,i}$ are respectively the mean and standard deviation of expression for the mouse functional ortholog77 of gene $g$ in mouse neuron type $i$ (in log2(rpkm)). $N_i$ is the number of samples for neuron type $T_i$, while $\mu^{m}_{g,\neg i}$ and $\sigma^{m}_{g,\neg i}$ are the mean and standard deviation of expression values for the mouse functional ortholog for unrelated neuron types (e.g., for neuron type hippocampus CA2, $\mu^{m}_{g,\neg i}$ would be the mean expression of all non-hippocampus neuron types). The quantile used was $q = 0.9$. Normalized microarray expression values as processed by the Allen Institute for Brain Science were used to calculate the corresponding scores ($\mu^{h}_{g,j}$, etc.) for gene $g$ in human. Intuitively, $z^{m}_{g,i}$ is a normalized enrichment score for the mouse functional ortholog of gene $g$ in tissue $T_i$ of the mouse. $S_i$ is the set of genes that are both highly expressed (high $\mu^{m}_{g,i}$) and highly specific (high $z^{m}_{g,i}$) to tissue $T_i$, thus providing a strong molecular signature for that tissue. This signature is combined with the enrichment scores from human to produce a final spatial homology score.
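The general shape of such a score can be sketched as follows. This is a minimal sketch under stated assumptions, not the formula used in the study: the enrichment score is assumed to be a Welch-style statistic, the signature is assumed to be the set of genes at or above the q-quantile of both mean expression and enrichment, and the final score is assumed to be the average human enrichment over signature genes. All function names are hypothetical.

```python
import math
import statistics


def enrichment_score(target, background):
    """Welch-style enrichment of one gene (ASSUMED form).

    target, background: lists of log2 expression values (at least 2 each,
    not all identical, otherwise the standard error is zero).
    """
    mean_t, mean_b = statistics.mean(target), statistics.mean(background)
    var_t, var_b = statistics.variance(target), statistics.variance(background)
    se = math.sqrt(var_t / len(target) + var_b / len(background))
    return (mean_t - mean_b) / se


def quantile_threshold(values, q):
    """Value at quantile q using a simple nearest-rank rule."""
    ordered = sorted(values)
    index = min(len(ordered) - 1, int(q * len(ordered)))
    return ordered[index]


def signature(mean_expr, enrich, q=0.9):
    """Genes at or above the q-quantile of BOTH mean expression and enrichment."""
    expr_cut = quantile_threshold(list(mean_expr.values()), q)
    enrich_cut = quantile_threshold(list(enrich.values()), q)
    return {
        g for g in mean_expr
        if mean_expr[g] >= expr_cut and enrich[g] >= enrich_cut
    }


def homology_score(signature_genes, human_enrich):
    """Average human enrichment over signature genes measured in human (ASSUMED)."""
    shared = [human_enrich[g] for g in signature_genes if g in human_enrich]
    return statistics.mean(shared) if shared else float("nan")
```

Under these assumptions, a human region scores highly for a mouse neuron type when the genes that mark that neuron type in mouse are also enriched in the human region.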
Construction of functional networks
We then used these cell type-specific molecular signatures to construct a cell type-specific gold standard (see Gold standard section below), which we then used to integrate a human genome-scale data compendium (see Human data compendium section below) to construct cell type-specific functional networks based on our tissue-specific regularized Bayesian integration method36 (see Data integration section below).
Gold standard
The cell type-specific gold standard was constructed by combining a functional interaction standard and cell type-specific signatures. The functional interaction gold standard was constructed from the presence or absence of gene co-annotations to biological process terms from the Gene Ontology (GO). These terms were selected by experts according to whether the term could be experimentally verified through targeted molecular experiments. For each of these 337 selected GO terms, we obtained all experimentally derived gene annotations (i.e., annotations with GO evidence codes: EXP, IDA, IPI, IMP, IGI, IEP). After gene propagation in the GO hierarchy, gene pairs co-annotated to any of the selected terms were considered positive examples, whereas gene pairs lacking co-annotation to any term were considered negative examples, except in cases where the two genes were (i) separately annotated to highly overlapping GO terms (hypergeometric p-value < 0.05) or (ii) co-annotated to higher-level GO terms that may still indicate the possible presence of a functional relationship.
We then combined our expanded cell type-specific molecular gene signature sets (q = 0.75) with this functional interaction standard by defining the four classes of edges (C1, C2, C3, and C4) as described in Greene et al.36, with the adjustment of allowing genes annotated to nervous system tissues to be considered for the C2 negative example class (to emphasize cell type-specificity in relation to other general nervous system genes, rather than excluding them based on the hierarchical tissue ontology as in Greene et al.).
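The core of the functional interaction standard, before the two exclusion rules and before the cell type-specific edge classes are applied, can be sketched as follows. The function name and the input layout (a mapping from GO term to its already-propagated gene set) are hypothetical, and the sketch deliberately omits the exclusions for overlapping terms and higher-level co-annotation.

```python
from itertools import combinations


def gold_standard_pairs(term_to_genes):
    """Label gene pairs from term annotations.

    Positive: the two genes share at least one selected term.
    Negative: both genes are annotated, but never to the same selected term.
    Returns two sets of frozenset({gene_a, gene_b}).
    """
    positives = set()
    for genes in term_to_genes.values():
        positives.update(frozenset(p) for p in combinations(sorted(genes), 2))
    annotated = sorted(set().union(*term_to_genes.values()))
    negatives = {
        frozenset(p)
        for p in combinations(annotated, 2)
        if frozenset(p) not in positives
    }
    return positives, negatives
```

Pairs are stored as unordered sets so that (A, B) and (B, A) are treated as the same edge.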
Human data compendium
We downloaded and processed 31,157 human interaction measurements and brain expression-based profiles from over 24,000 publications, as well as experimentally defined transcription factor binding motifs, chemical and genetic perturbation data, and microRNA target profiles.
Physical interaction data were downloaded from BioGRID (version 3.2.118)78, IntAct (Nov 2014)79, MINT (2013-03-26)80, and MIPS (Nov 2014)81. Interaction edges from BioGRID were discretized into five bins (0-4), depending on the number of experiments supporting the interaction. For all other interaction databases, edges were discretized based on the presence or absence of an interaction.
A total of 6,907 expression profiles from 268 human brain expression datasets were downloaded from the Gene Expression Omnibus (GEO)82. Duplicate samples were collapsed, and genes with values missing in over 30% of the samples were removed. All other missing values were imputed83. Normalized Fisher’s z-transformed expression scores were calculated per pair of genes and discretized into the corresponding bin: (-∞, -1.5), [-1.5, -0.5), [-0.5, 0.5), [0.5, 1.5), [1.5, 2.5), [2.5, 3.5), [3.5, 4.5), [4.5, ∞).
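The transformation and binning step can be sketched as follows. The bin edges are those listed above; the normalization (standardizing the Fisher z-scores across all gene pairs within one dataset) is an assumption about what "normalized" means here, and the function names are hypothetical.

```python
import math
import statistics
from bisect import bisect_right

# Edges of the eight bins: (-inf, -1.5), [-1.5, -0.5), ..., [4.5, inf)
BIN_EDGES = [-1.5, -0.5, 0.5, 1.5, 2.5, 3.5, 4.5]


def fisher_z(r):
    """Fisher z-transform of a correlation coefficient with |r| < 1."""
    return math.atanh(r)


def bin_index(score):
    """Index (0-7) of the left-closed bin that contains a normalized score."""
    return bisect_right(BIN_EDGES, score)


def normalized_bins(correlations):
    """Standardize Fisher z-scores within one dataset, then bin each of them.

    ASSUMPTION: normalization is a plain z-standardization across gene pairs;
    the correlations must not all be identical (non-zero standard deviation).
    """
    z = [fisher_z(r) for r in correlations]
    mean, sd = statistics.mean(z), statistics.pstdev(z)
    return [bin_index((value - mean) / sd) for value in z]
```

Using `bisect_right` makes each interval closed on the left, so a score of exactly -1.5 falls in [-1.5, -0.5) rather than in (-∞, -1.5).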
Experimentally defined transcription factor binding motifs were downloaded from JASPAR84, and the 1 kb upstream region of each gene was scanned for the presence of binding motifs using FIMO85 from the MEME software suite86. For each pair of genes, the Fisher z-transformed Pearson correlation of binding profiles was calculated and discretized into one of the corresponding bins: (-∞, -1.5), [-1.5, -0.5), [-0.5, 0.5), [0.5, 1.5), [1.5, 2.5), [2.5, 3.5), [3.5, 4.5), [4.5, ∞).
Chemical and genetic perturbation and microRNA target profiles were downloaded from the Molecular Signatures Database (MSigDB, c2:CGP and c3:MIR gene sets, respectively)87. For each pair of genes, similarity based on the weighted mean of the number of shared profiles (weighted by the specificity of the profile, 1/len(genes)) was calculated and discretized into the corresponding bin: (-∞, -1.5), [-1.5, -0.5), [-0.5, 0.5), [0.5, 1.5), [1.5, 2.5), [2.5, 3.5), [3.5, 4.5), [4.5, ∞).
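One possible reading of the specificity-weighted similarity is sketched below. It assumes that the "weighted mean" is taken over an indicator of whether both genes belong to a profile, with each profile weighted by 1/len(genes); the function name is hypothetical, and the subsequent normalization and binning are not shown.

```python
def shared_profile_similarity(gene_a, gene_b, profiles):
    """Weighted mean of a 'both genes present' indicator across profiles.

    profiles: non-empty list of non-empty gene sets. Each profile is weighted
    by its specificity, 1 / len(profile), so sharing a small gene set counts
    more than sharing a large one.
    """
    weights = [1.0 / len(p) for p in profiles]
    shared = sum(
        w for w, p in zip(weights, profiles) if gene_a in p and gene_b in p
    )
    return shared / sum(weights)
```

With this weighting, two genes that co-occur only in a very large, unspecific gene set receive a score close to zero.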
Data integration
We applied our tissue-specific regularized Bayesian integration method36 for each of the 7 neuron types to train a naïve Bayesian classifier by comparing against the positive and negative examples from the cell type-specific gold standard. For each cell type, we constructed a binary class node representing the indicator function for whether a pair of genes have a cell type-specific functional relationship, conditioned on additional nodes representing each of the datasets in the data compendium. Each model was then applied to all pairs of genes in the data compendium to estimate the probability of tissue-specific functional interactions. All code for data integration is available in our open-source Sleipnir library for functional genomics88.
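For intuition, the posterior computed by a naïve Bayesian classifier over binned datasets can be sketched as follows. This is a generic naïve Bayes calculation, not the Sleipnir implementation, and it omits the regularization used in the published method; the function name and the input layout are hypothetical.

```python
import math


def naive_bayes_posterior(prior, observed_bins, cond_pos, cond_neg):
    """Posterior P(functional relationship | binned evidence) for one gene pair.

    prior: prior probability of a functional relationship (0 < prior < 1).
    observed_bins: dict dataset -> bin index observed for this gene pair.
    cond_pos / cond_neg: dict dataset -> list of P(bin | class), learned from
        the positive and negative gold standard examples respectively.
    """
    log_pos = math.log(prior)
    log_neg = math.log(1.0 - prior)
    for dataset, index in observed_bins.items():
        log_pos += math.log(cond_pos[dataset][index])
        log_neg += math.log(cond_neg[dataset][index])
    # Normalize in log space to avoid underflow when many datasets are combined.
    top = max(log_pos, log_neg)
    pos, neg = math.exp(log_pos - top), math.exp(log_neg - top)
    return pos / (pos + neg)
```

Datasets in which a gene pair has no measurement are simply left out of `observed_bins`, so the posterior falls back to the prior when no evidence is available.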
Network connectivity analysis
We calculated a z-score for the cohesiveness of various biological process GO terms in each of the neuron-specific networks:

$$z_{GO} = \frac{X_{GO} - X_{null}}{SE_{null}}$$

where $X_{GO}$ is the mean posterior probability of all gene pairs within a particular GO term, and $X_{null}$ and $SE_{null}$ are respectively the mean and standard error of the null distribution (based on gene sets randomly sampled within all genes with a GO annotation, with equivalent size to the GO term in question).
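A minimal sketch of this calculation, taking the z-score as the observed mean minus the null mean, divided by the null standard error, is given below. It assumes that the standard error of the null is estimated as the standard deviation of the means of the randomly sampled gene sets; the function names, the number of random sets, and the dictionary-based network representation are hypothetical.

```python
import random
import statistics
from itertools import combinations


def mean_pair_probability(genes, edge_prob):
    """Mean posterior probability over all gene pairs within a gene set.

    edge_prob: dict mapping frozenset({gene_a, gene_b}) -> posterior probability.
    """
    pairs = combinations(sorted(genes), 2)
    return statistics.mean(edge_prob[frozenset(p)] for p in pairs)


def cohesiveness_z(term_genes, background_genes, edge_prob, n_random=1000, seed=0):
    """z = (X_GO - X_null) / SE_null, with a size-matched random null."""
    rng = random.Random(seed)
    background = sorted(background_genes)
    observed = mean_pair_probability(term_genes, edge_prob)
    null = [
        mean_pair_probability(rng.sample(background, len(term_genes)), edge_prob)
        for _ in range(n_random)
    ]
    # ASSUMPTION: SE_null is the standard deviation of the sampled set means.
    return (observed - statistics.mean(null)) / statistics.stdev(null)
```

A positive score indicates that genes within the term are more tightly connected in the network than size-matched random gene sets.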
NetWAS 2.0 on AD GWAS
Here, using an AD GWAS for Braak stage (NFT pathology-based staging)43 as gold standard and the ECII-specific functional network neighborhoods as features, we applied NetWAS 2.0 with n=10,000 to rank each of the 23,950 genes for potential association to AD.
We trained support vector machine classifiers89 using (i) nominally significant (p-value < 0.01) GWAS genes as positive examples, (ii) randomly sampled non-significant genes with probability proportional to their GWAS p-value as negatives, and (iii) the network neighborhoods of genes as features. Thus, genes with lower p-values (i.e., more significant) would have a lower chance of being chosen as a negative example than genes with higher p-values. Gene-level p-values were obtained using the versatile gene-based association study 2 (VEGAS2, version:16:09:002) software90.
To ensure robustness, we independently sampled n such sets of negatives and trained n support vector machines. After applying each of the support vector machines to re-rank genes, we aggregated the n rankings into a final NetWAS 2.0 gene ranking. Intuitively, the key advance of the NetWAS 2.0 method is that it leverages the GWAS p-values as opposed to treating all non-significant genes as having equal probability of being negative examples as in the original NetWAS method36.
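The two steps that distinguish this procedure, p-value-weighted sampling of negatives and aggregation of the n rankings, can be sketched as follows. The support vector machine training itself is not shown. Two choices in the sketch are assumptions that the text does not specify: the number of negatives drawn per round (set equal to the number of positives) and the aggregation rule (mean rank). The function names are hypothetical.

```python
import random


def sample_negatives(pvalues, threshold=0.01, n_negatives=None, seed=0):
    """Split genes into positives and one sampled set of negatives.

    Positives: genes with p-value below the threshold.
    Negatives: drawn without replacement from the remaining genes, with
    probability proportional to the GWAS p-value at each draw.
    """
    rng = random.Random(seed)
    positives = [g for g, p in pvalues.items() if p < threshold]
    candidates = sorted(g for g, p in pvalues.items() if p >= threshold)
    # ASSUMPTION: as many negatives as positives unless stated otherwise.
    n_negatives = len(positives) if n_negatives is None else n_negatives
    chosen = set()
    while len(chosen) < min(n_negatives, len(candidates)):
        remaining = [g for g in candidates if g not in chosen]
        weights = [pvalues[g] for g in remaining]
        chosen.add(rng.choices(remaining, weights=weights, k=1)[0])
    return positives, sorted(chosen)


def aggregate_rankings(rankings):
    """Combine rankings (best gene first) by mean rank (ASSUMED rule)."""
    genes = rankings[0]
    mean_rank = {
        g: sum(r.index(g) for r in rankings) / len(rankings) for g in genes
    }
    return sorted(genes, key=lambda g: (mean_rank[g], g))
```

Calling `sample_negatives` with n different seeds yields the n independent negative sets, each of which would train one classifier.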
Establishment of the expert-curated gene set
To establish amyloid and NFT gene sets, we recruited a laboratory member, specialized in AD, completely independent from our study, who was unaware of any of the NetWAS 2.0 results. We asked the curator to search for genes involved in tau phosphorylation, aggregation, cleavage, folding, localization, clearance (for the NFT set), and in Aβ production, clearance, aggregation (for the amyloid set). The searches were done with PubMed, and included publications released between January 2000 and April 2017.
Analysis of NetWAS 2.0 predictions
Comparison against the Adult Changes in Thought study
We downloaded paired RNA-seq transcriptomes and neuropathological quantifications from the Adult Changes in Thought (ACT) study (http://aging.brain-map.org/download/index)54. We then calculated, for every gene, the Fisher’s z-transformed absolute Spearman’s correlation between its expression in the hippocampus and IHC amyloid plaque load (ffpe) across all samples.
To aggregate the scores without selecting an arbitrary cutoff, we calculated an amyloid plaque association score for each percentile cutoff by averaging the transformed correlation scores for the top x% of NetWAS 2.0 genes (with x=1%, 5%, 10%, 15%, …,100%). We compared these scores against their counterparts calculated based on ranking by the p-values in the Braak GWAS study. For the background distribution, we sampled an equivalent number of genes 1000 times per percentile cutoff.
To calculate bootstrapped 95% confidence intervals for the NetWAS 2.0 amyloid plaque association scores, we subsampled genes with replacement within each percentile cutoff.
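The percentile-cutoff score and its bootstrap interval can be sketched as follows. The function names are hypothetical, and the sketch assumes a percentile bootstrap in which genes within the top fraction are resampled with replacement and the per-gene score is the transformed correlation described above.

```python
import random
import statistics


def top_fraction_score(ranked_genes, gene_score, fraction):
    """Mean score of the top `fraction` of a ranked gene list (best first)."""
    n_top = max(1, round(fraction * len(ranked_genes)))
    return statistics.mean(gene_score[g] for g in ranked_genes[:n_top])


def bootstrap_ci(ranked_genes, gene_score, fraction, n_boot=1000, seed=0):
    """Percentile bootstrap 95% interval for the top-fraction mean score."""
    rng = random.Random(seed)
    n_top = max(1, round(fraction * len(ranked_genes)))
    top = ranked_genes[:n_top]
    means = sorted(
        statistics.mean(gene_score[g] for g in rng.choices(top, k=n_top))
        for _ in range(n_boot)
    )
    return means[int(0.025 * n_boot)], means[int(0.975 * n_boot) - 1]
```

Evaluating `top_fraction_score` over a grid of fractions (0.01, 0.05, 0.10, and so on up to 1.0) gives the curve of association scores across percentile cutoffs.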
Analysis of Dunckley et al. ECII expression dataset
We downloaded microarray expression profiles measuring tangle-bearing and control LCM ECII neurons55. Data normalization and differential expression analysis were performed using limma (version 3.22.7)91. Genes with Benjamini-Hochberg multiple hypothesis test-corrected FDR ≤ 0.05 were considered significantly differentially expressed.
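For reference, the Benjamini-Hochberg adjustment used for this threshold can be written in a few lines. This is a generic sketch of the standard procedure with a hypothetical function name, not the limma code used in the analysis.

```python
def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values (FDR), in the same order as the input."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Genes whose adjusted value is at most 0.05 would then be called significantly differentially expressed.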
Identification of functional modules
To identify functional modules represented in our top NetWAS 2.0 genes, we created an ECII subnetwork using the top 10% (i.e., top 2,395) of NetWAS 2.0 ranked genes. Then, we used an approach based on shared k-nearest-neighbors (SKNN) and the Louvain community-finding algorithm56 to cluster the network into distinct modules. This approach alleviates the effect of high-degree genes and accentuates local network structure by connecting genes that are likely to be functionally clustered together in the ECII network. We calculated the ECII SKNN network by using the number of shared top k-nearest neighbors between genes as edge weights and taking the subnetwork defined by the top 5% of edge weights as the subnetwork for downstream analysis. The clustering presented here was calculated with k=50, but we confirmed that the clustering was robust for k between 10 and 100. Enrichment of Gene Ontology biological process terms and of other experiment-derived gene sets of interest in each module were calculated using one-sided Fisher’s exact tests, with Benjamini-Hochberg multiple hypothesis test correction to calculate FDR.
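The shared k-nearest-neighbor edge weights can be sketched as follows. The function names and the dictionary-based network representation are hypothetical; selection of the top 5% of edge weights and Louvain clustering are not shown.

```python
from itertools import combinations


def top_k_neighbors(gene, edge_prob, genes, k):
    """The k genes with the highest edge probability to `gene`.

    edge_prob: dict mapping frozenset({gene_a, gene_b}) -> posterior probability.
    Ties are broken alphabetically so the result is deterministic.
    """
    others = [g for g in genes if g != gene]
    others.sort(key=lambda g: (-edge_prob[frozenset((gene, g))], g))
    return set(others[:k])


def sknn_weights(edge_prob, genes, k):
    """Edge weight = number of top-k neighbors shared by the two genes."""
    neighbors = {g: top_k_neighbors(g, edge_prob, genes, k) for g in genes}
    return {
        frozenset((a, b)): len(neighbors[a] & neighbors[b])
        for a, b in combinations(sorted(genes), 2)
    }
```

Because the weight depends only on overlap between neighbor lists of fixed size k, a gene with many strong edges cannot dominate the clustering through its degree alone.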
Gene connectivity analysis
For each gene $g$ in each cell type-specific functional network, we calculated a z-score for gene connectivity, a measure of how central a gene is in the network:

$$z_g = \frac{\bar{x}_g - \mu}{\sigma / \sqrt{n}}$$

where $\bar{x}_g$ is the average posterior probability of edges incident on gene $g$, and $\mu$, $\sigma$, and $n$ are respectively the mean, standard deviation, and number of all edges in the network.
End Matter
Author Contributions and Notes
J.-P.R., V.Y., O.G.T., and P.G. conceived and designed the research with inputs from M.H. and P.H. J.-P.R. generated mice with help from E.F.S., N.H., and L.B. J.-P.R., Z.P., S.K., and C.A. performed bacTRAP experiments. V.Y. and O.G.T. conceived the computational analyses. V.Y. performed all bacTRAP data analyses, generated and analyzed functional networks, reprioritized genes, and re-analyzed publicly available datasets. V.Y. and J.-P.R. analyzed results from the computational analyses. M.F. curated amyloid and NFT lists. J.-P.R., V.Y., O.G.T., and P.G. wrote the manuscript with inputs from M.F. and A.B.C.
Funding
V.Y. was supported in part by US NIH grant T32 HG003284. O.G.T. is a senior fellow of the Genetic Networks program of the Canadian Institute for Advanced Research (CIFAR). This study was supported by NIH R01 GM071966 to O.G.T., the United States Army Medical Research and Materiel Command (USAMRMC), Award No. W81XWH-14-1-0046 to J.-P.R., the National Institute on Aging of the NIH, Award RF1AG047779 to P.G., the Fisher Center for Alzheimer’s Disease Research to P.G., the JPB Foundation to P.G., and Cure Alzheimer’s Fund to P.G. Opinions, interpretations, conclusions and recommendations are those of the author and are not necessarily endorsed by the sponsors.
Acknowledgments
We thank Ana Milosevic, Revathy Chottekalapanda, Heike Rebholz, Markus Riessland, Benjamin Kolisnyk, Ruth Dannenfelser, Rachel Sealfon, Chandra Theesfeld, and Ran Zhang for their critical reading of the manuscript; Elisabeth Griggs for assistance with graphic design; and the Transgenic Services, Bioimaging Resource Center, and Genomics Resource Center at Rockefeller University.