Abstract
Enhancers are the primary DNA regulatory elements that confer cell type specificity of gene expression, thus giving rise to the hundreds of different cell types that make up a complex organism. Recent studies characterizing individual enhancers have revealed their potential to direct heterologous gene expression in a highly cell-type-specific manner. However, it has not yet been possible to systematically identify such enhancers for each of the many cell types in an organism. We have developed PESCA, a scalable and generalizable method coupling ATAC and single-cell RNA sequencing protocols, to identify cell-type-specific enhancers that should enable genetic access and perturbation of gene function in virtually any mammalian cell type. Focusing on the highly heterogeneous mammalian cerebral cortex, we apply PESCA to find enhancers and generate viral reagents capable of accessing and manipulating a rare subset of somatostatin-expressing cortical interneurons with high specificity. This study demonstrates the utility of this readily generalizable platform for developing new cell-type-specific viral reagents, with significant implications for both basic and translational research.
Introduction
Enhancers are DNA elements that regulate gene expression to produce the unique complement of proteins necessary to establish a specialized function for each cell type in an organism. Large scale efforts to build a definitive catalogue of cell types(1–4) based on their gene expression have only recently been met with attempts to map epigenomic regulatory landscapes as well(5–7), enabling a mechanistic understanding of the underlying gene expression that is critical for cell-type-specific development, identity, and unique function. Importantly, characterization of individual enhancers has revealed their potential to direct highly cell-type-specific gene expression in both endogenous and heterologous contexts(8–11), making them ideal for developing tools to access, study, and manipulate virtually any mammalian cell type.
Indeed, despite being able to transcriptionally catalogue distinct cell subpopulations, our efforts to study their function are fundamentally impeded by our limited ability to specifically access them. Nowhere is this more apparent than in the mammalian cerebral cortex, which is composed of a staggeringly intricate and interconnected ensemble of cell types that together enable the complex computations that the brain must make. Glutamatergic excitatory neurons propagate signal across neural circuits, while GABAergic inhibitory interneurons play an essential role in cortical signal processing by modulating neuronal activity, balancing excitability, and gating information(12–14). Although relatively sparse, interneurons are highly diverse; for example, somatostatin-expressing cortical interneurons comprise several anatomically, electrophysiologically, and molecularly defined cell types whose dysfunction is associated with neuropsychiatric and neurological disorders(2, 15, 16). Given the vast diversity of cell types in the brain, and the inability of our current transgenic tools to access most neuronal cell types, enhancer-driven viral reagents have the potential to become the next generation of cell-type-specific transgenic tools enabling, for the first time, facile, inexpensive, cross-species, and targeted observation and functional study of neuronal cell-types and circuits.
Despite the potential of cell-type-specific enhancers to revolutionize neuroscience research, cell-type-restricted gene regulatory elements (GREs) have not yet been systematically identified. Moreover, functional evaluation of candidate GRE-driven viral vectors is currently laborious, expensive, and low-throughput, typically relying on the production of individual viral vectors and the assessment of expression across a limited number of cell types by in situ hybridization or immunofluorescence. The lack of a high-throughput platform for rapid identification and functional testing of enhancers is therefore a critical bottleneck impeding the generation of new cell-type-specific viral reagents required to elucidate the function of each cell type in a complex organism.
To address these issues, we developed a scalable Paralleled Enhancer Single Cell Assay (PESCA) to identify and functionally assess the specificity of hundreds of GREs across the full complement of cell types present in any tissue of interest. In the PESCA protocol, the expression of a barcoded pool of AAV vectors harboring GREs is analyzed by single-nucleus RNA sequencing (snRNA-seq) to evaluate the specificity of each constituent GRE across tens of thousands of individual cells in the target tissue, through the use of an orthogonal cell-indexed system of transcript barcoding (Fig. 1a).
We validated the efficacy of PESCA in the murine primary visual cortex by identifying GREs that have the ability to confine AAV expression to a layer-restricted, electrophysiologically distinct subset of somatostatin (SST)-expressing interneurons, and showed that these vectors can be used to modulate neuronal activity in a cell-type-specific manner. We chose to focus on SST neurons in the brain because this population is known to be diverse and to be composed of relatively rare subpopulations(2, 15, 17), and thus might serve as a good test case for the efficacy of PESCA. As described below, these results highlight the utility of PESCA for identifying an enhancer that drives gene expression selectively in a subset of neurons and establish the use of PESCA as a platform of broad interest to the research and gene therapy community, enabling the generation of cell-type-specific AAVs for virtually any cell type.
Results
GRE selection and library construction
To identify candidate SST interneuron-restricted gene regulatory elements (GREs), we carried out comparative epigenetic profiling of the three largest classes of cortical interneurons: somatostatin (SST)-, vasoactive intestinal polypeptide (VIP)-, and parvalbumin (PV)-expressing cells. To this end, we employed the recently developed isolation of nuclei tagged in specific cell types (INTACT)(5) method to isolate purified chromatin from each of these cell types from the cerebral cortex of adult (6-10-week-old) mice. Assay for transposase-accessible chromatin using sequencing (ATAC-Seq)(7), which marks nucleosome-depleted gene regulatory regions based on their enhanced accessibility to in vitro transposition by the Tn5 transposase, was then used to identify genomic regions with enhanced accessibility in the SST (n = 279,221), PV (n = 275,631), and VIP (n = 258,646) chromatin samples. These datasets can be used as a resource to identify putative gene regulatory elements as candidates for driving cell-type-specific gene expression in the numerous SST-, PV-, or VIP-expressing interneuron subtypes across diverse cortical regions.
Among these putative gene regulatory regions, 16,386 (5.9%) were enriched or uniquely present in SST cells relative to PV- or VIP-expressing interneurons (Fig. 1b, c). To enrich for GREs that could serve as useful reagents to study and manipulate interneurons across mammalian species, including humans, we subsequently filtered the resulting list to exclude GREs with poor mammalian sequence conservation (Methods, Supplementary Fig. 1). The remaining elements were ranked based on cell-type enrichment (Methods), and the top 287 SST-enriched GREs were selected for screening to identify enhancers that drive gene expression selectively in SST interneurons of the primary visual cortex (Fig. 1d).
A PCR-based strategy was used to simultaneously amplify and barcode each GRE from mouse genomic DNA (Methods). To minimize sequencing bias due to the choice of barcode sequence, each GRE was paired with three unique barcode sequences. The resulting library of 861 GRE-barcode pairs was pooled and cloned into an AAV-based expression vector, with the GRE element inserted 5’ to a minimal promoter driving a GFP expression cassette and the GRE-paired barcode sequences inserted into the 3’ untranslated region (UTR) of the GRE-driven transcript (Methods, Fig. 2a, Supplementary Fig. 2). This configuration was chosen to maximize the retrieval of the barcode sequence during single-cell RNA sequencing. The library was packaged into AAV9, which exhibits broad neural tropism(18). The complexity of the resulting rAAV-GRE library was then confirmed by Next Generation Sequencing, detecting 802 of the 861 barcodes (93.2%), corresponding to 285 of the 287 GREs (99.3%) (Fig. 2b).
PESCA screen identifies GREs highly enriched for SST+ interneurons
To quantify the expression of each rAAV-GRE vector across the full complement of cell types in the mouse visual cortex, we used a modified single-nucleus RNA-Seq (snRNA-Seq) protocol to first determine the cellular identity of each nucleus and then quantify the abundance of the GRE-paired barcodes in the transcriptome of nuclei assigned to each cell type. Two injections (800 nL each) of the pooled AAV library (1 × 10^13 viral genomes/mL) were first administered to the primary visual cortex (V1) of two 6-week-old C57BL/6 mice. Twelve days following injection, the injected cortical regions were dissected and processed to generate a suspension of nuclei for snRNA-Seq using the inDrops platform(19, 20) (Methods). A total of 32,335 nuclei were subsequently analyzed across the two animals, recovering an average of 866 unique non-viral transcripts per nucleus, representing 610 unique genes (Supplementary Fig. 3a, b).
Since droplet-based high-throughput snRNA-Seq samples the nuclear transcriptome with low sensitivity(19), viral-derived transcripts were initially detected in only 3.9% of sampled nuclei. We therefore designed a modified PCR-based approach to enrich for barcode-containing viral transcripts, which yielded deep coverage of AAV-derived transcripts with simultaneous shallow coverage of the non-viral transcriptome. PCR enrichment increased the viral transcript recovery 382-fold in the sampled nuclei, to an average of 15.6 unique viral transcripts, 6.0 unique GRE-barcodes, and 5.7 unique GREs per cell (Fig. 2b, Supplementary Fig. 3c). Using this modified protocol, viral transcripts were identified across 86% of cells (Supplementary Fig. 3d, e), with a high correlation (r = 0.9, p < 2.2×10^−16) observed between the abundance of each barcoded AAV in the library and the number of cells infected by that AAV (Supplementary Fig. 3f).
Nuclei were classified into ten cell types using graph-based clustering and expression of known marker genes (Methods, Fig. 2c, d, Supplementary Fig. 4). The average expression of each viral-derived barcoded transcript was analyzed across all ten cell types, and an enrichment score was calculated from the ratio of expression in Sst+ nuclei compared to all Sst− nuclei. As expected, sets of three barcodes associated with the same GRE showed significantly correlated enrichment scores (r = 0.53 ± 0.03, p < 2.2×10^−16) (Fig. 2e, f, Supplementary Fig. 5), and this correlation was abolished when barcodes were randomly shuffled (shuffled r = 0.002 ± 0.06; Wilcoxon test between data and shuffled data, p = 0.003).
Having confirmed a robust, non-random correlation in enrichment scores among the three barcodes associated with each GRE, we next computed a single expression value for each of the 287 viral drivers by aggregating expression data from barcodes associated with the same GRE. Differential gene expression analysis between Sst+ and Sst− cells for each rAAV-GRE revealed a marked overall enrichment of viral-derived transcripts in the Sst+ subpopulation (Supplementary Fig. 6a). Indeed, multiple viral drivers were identified that promoted highly specific reporter expression in the Sst+ subpopulation (q < 0.01, fold-change > 7, Fig. 2g-i, Supplementary Fig. 6b).
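The barcode-level consistency check described above can be illustrated with a minimal Python sketch. The exact form of the enrichment score is not specified in the text, so the log2 ratio of mean expression with a small pseudocount used here is an assumption, as are all function names; this is an illustration of the idea rather than the analysis code used in the study.

```python
import math
import random
from itertools import combinations

def enrichment_score(sst_pos_counts, sst_neg_counts, pseudocount=1e-3):
    """log2 ratio of mean expression in Sst+ vs. Sst- nuclei (assumed form)."""
    mean_pos = sum(sst_pos_counts) / len(sst_pos_counts)
    mean_neg = sum(sst_neg_counts) / len(sst_neg_counts)
    return math.log2((mean_pos + pseudocount) / (mean_neg + pseudocount))

def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

def same_gre_correlation(scores_by_gre):
    """Correlate enrichment scores over all pairs of barcodes sharing a GRE.

    Each pair is added in both orders so the result does not depend on
    the arbitrary ordering of barcodes within a GRE.
    """
    xs, ys = [], []
    for scores in scores_by_gre.values():
        for a, b in combinations(scores, 2):
            xs.extend([a, b])
            ys.extend([b, a])
    return pearson(xs, ys)

def shuffled_correlation(scores_by_gre, seed=0):
    """Same statistic after randomly reassigning scores to GREs (null control)."""
    rng = random.Random(seed)
    flat = [s for scores in scores_by_gre.values() for s in scores]
    rng.shuffle(flat)
    it = iter(flat)
    shuffled = {g: [next(it) for _ in scores] for g, scores in scores_by_gre.items()}
    return same_gre_correlation(shuffled)
```

Repeating `shuffled_correlation` over many seeds would give a null distribution against which the observed same-GRE correlation can be compared.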
In situ characterization of rAAV-GRE reporter expression
We next sought to validate the cell-type-specificity of the resulting hits using methods that do not rely on single-cell sequencing-based approaches. To this end, we selected three of the top five viral drivers (GRE12, GRE22, GRE44), as well as a control viral construct lacking the GRE element (ΔGRE), for injection into V1 of adult transgenic Sst-Cre; Ai14 mice, in which SST+ cells express the red fluorescent marker tdTomato. Fluorescence analysis twelve days following injection with rAAV-GRE12/22/44-GFP revealed strong yet sparse GFP labeling centered around cortical layers IV and V (Fig. 3a-c). By contrast, the control rAAV-ΔGRE-GFP showed a strikingly different pattern of GFP expression concentrated around the sites of injection, with expression in a larger number of cells (Fig. 3d). Many rAAV-GRE12/22/44-GFP virally infected cells were indeed SST-positive, as shown by the high degree of overlap between GFP and tdTomato expression: 90.7 ± 2.1% for rAAV-GRE12-GFP (170 cells, 4 animals); 72.9 ± 4.2% for rAAV-GRE22-GFP (1164 cells, 3 animals); and 95.8 ± 0.6% for rAAV-GRE44-GFP (759 cells, 4 animals) (Fig. 3e, f, Supplementary Fig. 7). By contrast, we observed that only 27.2 ± 1.9% of GFP+ cells following rAAV-ΔGRE-GFP infection were also positive for tdTomato expression (2066 cells, 3 animals, Fig. 3e, f), indicating that the tested GREs serve to effectively restrict AAV payload expression to SST+ interneurons. It is notable that the GREs seemingly not only promote expression in SST+ cells but also reduce background expression in SST− cells, suggesting that the tested GREs confer both enhancer and insulator functionality. Consistent with this hypothesis, the incorporation of the GREs into the rAAV both increased the number of SST+/GFP+ cells (1.7-2-fold) and dramatically (3-32-fold) decreased the number of SST− cells that expressed GFP (Fig. 3g, Supplementary Fig. 8).
To further investigate the specificity of our viral drivers among cortical interneuron cell types, we injected each construct into Vip-Cre; Ai14 mice, in which all VIP+ cells express tdTomato, or used fluorescent antibody staining to label PV-expressing cells (Supplementary Fig. 9). Fluorescent signal analysis indicated that only a small percentage of GFP+ cells were either VIP+ or PV+ (rAAV-GRE12-GFP+ [2.6 ± 2.6%], rAAV-GRE22-GFP+ [3.5 ± 2.0%] and rAAV-GRE44-GFP+ [6.0 ± 2.7%], Fig. 3h). This confirms that among major interneuron cell classes, all three vectors are highly SST-specific.
Because at least five subtypes of cortical SST+ interneurons have been identified based on the laminar distribution of their cell bodies and projections(15, 21), we also investigated the laminar distribution of GFP-expressing cells for the three Sst-enriched viral drivers. Intriguingly, the majority of rAAV-GRE12-GFP+ and rAAV-GRE44-GFP+ SST+ cells were found to reside in layers IV and V, distinct from the distribution observed for the full SST+ cell population in visual cortex (p = 1.3×10^−6, p < 2.2×10^−16, respectively, Mann-Whitney U test, two-tailed, Fig. 3i, j), raising the possibility that these constructs may preferentially label a specific subtype(s) of SST+ interneuron. Consistent with this hypothesis, we observed that these two viral drivers mediated reporter expression in only a relatively small fraction of all SST+ cells within the region of infection (44.5 ± 12.0% for rAAV-GRE12-GFP and 35.9 ± 6.2% for rAAV-GRE44-GFP) compared to rAAV-GRE22-GFP (Fig. 3k, Supplementary Fig. 10). Together, these findings suggest that PESCA may support the isolation of viral drivers capable of discriminating between fine-grained cell types within a given interneuron cell class.
Electrophysiological characterization of rAAV-GRE-GFP-expressing SST subtypes
In addition to variability in laminar distribution, different electrophysiological phenotypes have also been observed in cortical SST interneurons(22, 23). To determine whether AAV-GRE reporters can be used to distinguish electrophysiologically distinct SST subtypes, we injected rAAV-GRE44-GFP into the visual cortex of adult Sst-Cre; Ai14 mice and performed whole-cell current-clamp recordings from double GFP and tdTomato positive neurons (rAAV-GRE44-GFP+), as well as immediately nearby tdTomato positive but GFP negative cells (rAAV-GRE44-GFP−). Our recordings indicate that both rAAV-GRE44-GFP+ and rAAV-GRE44-GFP− neurons are adapting SST interneurons with high input resistances and features consistent with deep layer cortical SST neurons(22, 24) (Fig. 4a, b).
However, rAAV-GRE44-GFP+ SST neurons are distinct in several electrophysiological parameters. Action potentials of rAAV-GRE44-GFP+ SST neurons are significantly broader than those of rAAV-GRE44-GFP− SST neurons (Fig. 4c, d), perhaps due to differences in the expression of certain channels in these subgroups of SST neurons, such as voltage-activated potassium channels and BK calcium-activated potassium channels(25, 26). Furthermore, rAAV-GRE44-GFP+ SST neurons have a lower rheobase and fire action potentials with a slower rising phase and at lower maximal frequencies compared to rAAV-GRE44-GFP− SST neurons (Fig. 4a, d, Supplementary Table 1). Our electrophysiology experiments further illuminate the potential of PESCA to identify novel, functionally distinct subgroups of previously defined interneuron types.
Modulation of neuronal activity with rAAV-GREs
Finally, we evaluated whether the identified viral drivers support sufficiently high and persistent levels of payload expression to effectively modulate SST+ cell physiology. Designer receptors exclusively activated by designer drugs (DREADDs) are a commonly employed viral payload to dynamically regulate neuronal activity in response to the synthetic ligand clozapine-N-oxide (CNO)(27). We therefore injected the visual cortex of adult wild-type mice (6-8-week-old) with rAAV-GRE12-Gq-DREADD-tdTomato and performed electrophysiological recordings from tdTomato+ cells of acute cortical slices in a whole-cell, current-clamp configuration two weeks post-injection. All recordings from tdTomato+ cells evoked with depolarizing current steps showed striking sensitivity to CNO, as shown by significantly increased firing rates and depolarized resting membrane potentials during bath application of CNO (Fig. 4e-g). To ensure that increases in firing rate upon CNO application were specific to infected SST+ neurons, we performed recordings from nearby uninfected pyramidal neurons that were identified by morphology and found that there was no statistically significant increase in firing rate upon CNO application (Fig. 4h-j). These data demonstrate the ability of these reagents to robustly and specifically modulate the activity of SST+ cells in non-transgenic animals.
Discussion
The PESCA platform merges the principle of massively parallel reporter assays (MPRA)(28, 29) with scRNA-seq and represents a significant advancement in current approaches to viral vector design, as it enables the rapid screening of hundreds of viral permutations for enhanced cell-type-specificity. In this study, we applied PESCA to screen putative enhancer elements for drivers that robustly and specifically target a rare SST+ population of GABAergic interneurons in the mouse central nervous system. Given our understanding of the generality of enhancer function across tissues, this approach could be readily applied to virtually any other neuronal or non-neuronal cell types, diverse model organisms, tissues, and viral types. Moreover, PESCA is not limited to GRE screening; the method can be easily adapted to assess the cell-type-specificity of viral capsid variants or other mutable aspects of viral design. This study addresses the urgent need for new tools to access, study, and manipulate specific cell types across complex tissues, organ systems, and animal models, demonstrating the importance and broad utility of PESCA beyond neuroscience research. Moreover, as the promise of gene therapy to treat and cure a broad range of diseases is being realized, PESCA has the potential to pave the way for a new generation of targeted gene therapy vehicles for diseases with cell-type-specific etiologies, such as congenital blindness, deafness, cystic fibrosis, and spinal muscular atrophy.
Author Contributions
S.H. conceived the study and designed the experiments. H.S. performed ATAC-Seq. M.A.N. and S.H. analyzed ATAC-seq data. S.H. cloned the library and performed the PESCA screen. M.A.N. mapped sequencing data and viral barcodes. S.H. analyzed snRNA-seq data. S.H., C.K. and O.F.W. cloned individual viral constructs. O.F.W. performed stereotactic injections and tissue sectioning. S.H. performed imaging and image analysis. C.P.T. performed and analyzed electrophysiology recordings. M.E.G. and E.C.G. advised on all aspects of the study. S.H., C.P.T., M.A.N., E.C.G. and M.E.G. wrote the manuscript.
Competing Financial Interests
The authors declare no competing financial interests.
Methods
Mice
Animal experiments were approved by the National Institutes of Health and Harvard Medical School Institutional Animal Care and Use Committee, following ethical guidelines described in the US National Institutes of Health Guide for the Care and Use of Laboratory Animals. For INTACT we crossed Sst-IRES-Cre (The Jackson Laboratory Stock # 013044), Vip-IRES-Cre (The Jackson Laboratory Stock # 010908) and Pv-Cre (The Jackson Laboratory Stock # 017320) with SUN1-2xsfGFP-6xMYC (The Jackson Laboratory Stock # 021039) and used adult (6-12 wk old) male and female F1 progeny. For PESCA screening we used adult (6-10 wk) C57BL/6J (The Jackson Laboratory, Stock # 000664) mice. For confirmation of hits we crossed Sst-IRES-Cre (The Jackson Laboratory Stock # 013044), Vip-IRES-Cre (The Jackson Laboratory Stock # 031628) and Gad2-IRES-Cre (The Jackson Laboratory Stock # 028867) mice with Ai14 mice (The Jackson Laboratory Stock # 007914) and used adult (6-12 wk old) male and female F1 progeny. All mice were housed under a standard 12 hr light/dark cycle.
INTACT purification and in vitro transposition
INTACT employs a transgenic mouse that expresses a cell-type-specific Cre and a Cre-dependent SUN1-2xsfGFP-6xMYC (SUN1-GFP) fusion protein. Nuclear purifications were performed from whole cortex of adult mice as previously described using anti-GFP antibodies (Fisher G10362)(1, 2). Isolated nuclei were gently resuspended in cold L1 buffer (50 mM Hepes pH 7.5, 140 mM NaCl, 1 mM EDTA, 1 mM EGTA, 0.25% Triton X-100, 0.5% NP40, 10% Glycerol, protease inhibitors), and pelleted at 800g for 5 minutes at 4°C. DNA libraries were prepared from the nuclei using the Nextera DNA Library Prep Kit (Illumina) according to manufacturer’s protocols. The final libraries were purified using the Qiagen minelute kit and sequenced on a Nextseq 500 benchtop DNA sequencer (Illumina).
ATAC-seq mapping
All ATAC-seq libraries were sequenced on the Nextseq 500 benchtop DNA sequencer (Illumina). Seventy-five base pair (bp) single-end reads were obtained for all datasets. ATAC-seq experiments were sequenced to a minimum depth of 20 million (M) reads. Reads for all samples were aligned to the mouse genome (GRCm38/mm10, December 2011) using default parameters for the Subread (subread-1.4.6-p3, (Liao et al., 2013)) alignment tool after quality trimming with Trimmomatic v0.33 (Bolger et al., 2014) with the following command: java -jar trimmomatic-0.33.jar SE -threads 1 -phred33 [FASTQ_FILE] ILLUMINACLIP:[ADAPTER_FILE]:2:30:10 LEADING:5 TRAILING:5 SLIDINGWINDOW:4:20 MINLEN:45. Nextera adapters were trimmed out for ATAC-seq data. Duplicates were removed with samtools rmdup. To generate UCSC genome browser tracks for ATAC-seq visualization, BEDtools was used to convert output bam files to BED format with the bedtools bamtobed command. Published mm10 blacklisted regions (Consortium, 2012) were filtered out using the following command: bedops --not-element-of 1 [BLACKLIST_BED]. Filtered BED files were scaled to 20 M reads and converted to coverageBED format using the BEDtools genomecov command. bedGraphToBigWig (UCSC-tools) was used to generate bigWIG files for the UCSC genome browser.
ATAC-seq peak calling and quantification
Two independent peak calling algorithms were employed to ensure robust, reproducible peak calls. First, tag directories were created using HOMER makeTagDirectory for each replicate, and peaks were called using default parameters for findPeaks with -style factor. MACS2 was also called using default parameters on each replicate. The summit files output by MACS2 were converted to bed format and each summit extended bidirectionally to achieve a total length of 300 bp. As the ATAC-seq peak calls would ultimately be used to narrow down potential regulatory elements for screening of a limited subset, we applied the overly stringent requirement that a peak be called by both approaches in a given replicate for its inclusion in the final peak list for that sample. Peaks identified in any sample in this way were aggregated to produce a final superset of 323,369 regulatory elements called as accessible in at least one cell type. The featurecounts package was used to obtain ATAC-seq read counts for each of these accessible putative GREs.
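The summit extension and two-caller requirement described above can be sketched in Python as follows. The text does not state how overlap between HOMER and MACS2 calls was defined or which coordinates were retained, so the any-overlap criterion, the choice to keep the MACS2-derived interval, and the function names are all assumptions for illustration.

```python
def extend_summit(chrom, summit, total_len=300):
    """Extend a MACS2 summit bidirectionally to a fixed-width BED interval.

    Coordinates are 0-based, half-open; the start is clamped at 0.
    """
    half = total_len // 2
    start = max(0, summit - half)
    return (chrom, start, start + total_len)

def overlaps(a, b):
    """True if two (chrom, start, end) intervals share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]

def consensus_peaks(macs2_peaks, homer_peaks):
    """Keep MACS2-derived intervals that are also supported by a HOMER peak."""
    return [p for p in macs2_peaks if any(overlaps(p, h) for h in homer_peaks)]

# Hypothetical toy input: two MACS2 summits, one HOMER peak.
macs2 = [extend_summit("chr1", 1000), extend_summit("chr1", 50000)]
homer = [("chr1", 900, 1100)]
shared = consensus_peaks(macs2, homer)
```

For genome-scale peak lists, an interval tree or a sorted sweep (as implemented in BEDtools or BEDOPS) would replace the quadratic scan used here.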
Identification of SST-enriched GREs
We used genomic coordinates of a superset of 323,369 genomic regions identified as a union of ATAC-Seq peaks across various cell types in the mouse cortex (manuscript in preparation) as a list of reference coordinates over which to quantify the ATAC-Seq signal from SST+, VIP+ and PV+ cells. A matrix was constructed representing the mean ATAC-Seq signal in SST+, VIP+ and PV+ cells for each of the 323,369 GREs and normalized such that the total ATAC-Seq signal from each cell population was scaled to 10^7. Fold-enrichment was calculated for each region/GRE as [(signal in cell type A) + 1] / [mean(signal in cell types B and C) + 1]. GREs were subsequently ranked based on fold-enrichment score.
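The normalization and fold-enrichment calculation above can be written out as a short Python sketch. The formula follows the text; the function names and the toy signal values are hypothetical.

```python
def normalize_to_total(signal, total=1e7):
    """Scale one cell population's per-GRE signal so that it sums to `total`."""
    s = sum(signal)
    return [x * total / s for x in signal]

def fold_enrichment(signal_a, signal_b, signal_c):
    """[(signal in A) + 1] / [mean(signal in B and C) + 1]."""
    return (signal_a + 1) / ((signal_b + signal_c) / 2 + 1)

def rank_gres(signals):
    """Rank GREs by SST fold-enrichment, highest first.

    `signals` maps a GRE name to normalized (SST, VIP, PV) signal.
    """
    scored = {g: fold_enrichment(sst, vip, pv) for g, (sst, vip, pv) in signals.items()}
    return sorted(scored, key=scored.get, reverse=True)

# Hypothetical normalized signals for three regions.
example = {"gre_a": (9.0, 0.0, 2.0), "gre_b": (1.0, 1.0, 1.0), "gre_c": (0.0, 4.0, 4.0)}
ranking = rank_gres(example)
```

The pseudocount of 1 in numerator and denominator keeps regions with very low signal in the comparison cell types from producing extreme ratios.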
Identification of conserved GREs
To identify GREs whose sequence is highly conserved across mammals, we first needed to identify an appropriate conservation score to use as a threshold for high conservation. We reasoned that by analyzing the conservation of DNA sequences of the same length, but an arbitrary distance of 100,000 bases away from each identified GRE, we would generate a set of DNA sequences whose conservation can be used to determine this threshold.
To this end, conservation scores for GREs and corresponding GRE-distal sequences were calculated using the bigWigAverageOverBed command to determine the average PhyloP score of each sequence based on mm10.60way.phyloP60wayPlacental.bw PhyloP scores (http://hgdownload.cse.ucsc.edu/goldenpath/mm10/phyloP60way/)(3). After plotting the conservation score (phyloP, 60 placental mammals) of 323,369 GRE-distal sequences, we determined the conservation score of the 95th percentile of this distribution (PhyloP score = 0.5) and chose it as a minimal conservation score needed to classify any GRE as conserved.
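The thresholding logic described above can be sketched in Python. The direction of the 100,000-base offset and the percentile convention (nearest rank) are not specified in the text and are assumptions here, as are the function names; the per-interval PhyloP averages are taken as given (for example, as produced by bigWigAverageOverBed).

```python
import math

def distal_interval(chrom, start, end, offset=100_000):
    """Same-length interval shifted by `offset` bases (downstream shift assumed)."""
    return (chrom, start + offset, end + offset)

def percentile(values, q):
    """Nearest-rank percentile for an integer q in (0, 100]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]

def conserved_gres(gre_scores, background_scores, q=95):
    """GREs whose mean PhyloP score reaches the q-th percentile of the background."""
    threshold = percentile(background_scores, q)
    return {g for g, score in gre_scores.items() if score >= threshold}

# Hypothetical scores: background from GRE-distal sequences, two candidate GREs.
background = [i / 100 for i in range(1, 101)]
kept = conserved_gres({"gre_a": 0.96, "gre_b": 0.50}, background)
```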
Viral barcode design
Viral barcode sequences were chosen to be at least 3 insertions, deletions, or substitutions apart from each other to minimize the effects of sequencing errors on the correct identification of each barcode. The R library "DNABarcodes" and the following functions were used:
initialPool = create.dnabarcodes(10, dist=3, heuristic="ashlock");
finalPool = create.dnabarcodes(10, pool = initialPool, metric="seqlev");
The result was a list of 1164 10-base barcodes that fit our initial criteria.
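As an independent sanity check on a barcode set like the one above, the minimum pairwise edit distance can be computed directly. The Python sketch below uses the classic Levenshtein distance, which is an assumption made for simplicity: it is not the same as the Sequence-Levenshtein ("seqlev") metric used by the R package, which can be smaller, so passing this check does not by itself guarantee the seqlev property. Function names are hypothetical.

```python
from itertools import combinations

def levenshtein(a, b):
    """Classic edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution or match
        prev = cur
    return prev[-1]

def min_pairwise_distance(barcodes):
    """Smallest edit distance between any two barcodes in the set."""
    return min(levenshtein(a, b) for a, b in combinations(barcodes, 2))

# Hypothetical 10-base barcodes, not taken from the actual library.
toy_barcodes = ["AAAAAAAAAA", "AAAAAAACCC", "CCCAAAAAAA"]
ok = min_pairwise_distance(toy_barcodes) >= 3
```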
Amplification of GREs and barcoding
Genomic PCR
PCR primers were designed using primer3 2.3.7(4) such that a 150-400 bp flanking sequence was added to each side of the GRE. The forward primers contained a 5’ overhang sequence for downstream In-Fusion (Clontech) cloning into the AAV vector (5’-GCCGCACGCGTTTAAT). The reverse primers contained a 5’ overhang sequence containing the recognition sites for AsiSI and SalI restriction enzymes (5’-GCGATCGCTTGTCGAC). Hot Start High-Fidelity Q5 polymerase (NEB) was used according to the manufacturer’s protocol with mouse genomic DNA as template.
Barcoding PCR
The unpurified PCR products from the genomic PCR were used as templates for the barcoding PCR. A forward primer containing the sequence for downstream In-Fusion (Clontech) cloning into the AAV vector (5’-CTGCGGCCGCACGCGTTTA) was used in all reactions. Reverse primers were constructed featuring (in the 5’ → 3’ direction): 1) a sequence for downstream In-Fusion (Clontech) cloning into the AAV vector (5’-GCCGCTATCACAGATCTCTCGA), 2) a unique 10-base barcode sequence, and 3) a sequence complementary to the AsiSI and SalI restriction enzyme recognition sites that were introduced during the first PCR (5’-GCGATCGCTTGTCGAC). Three different reverse primers were used for each of the GREs amplified during the genomic PCR. Hot Start High-Fidelity Q5 polymerase (NEB) was used according to the manufacturer’s protocol.
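The three-part layout of the barcoding reverse primer can be made concrete with a small Python sketch. The two fixed sequences are taken from the text; the function names and the example barcode are hypothetical, and the sketch only concatenates sequences without checking melting temperature or secondary structure.

```python
INFUSION_ARM = "GCCGCTATCACAGATCTCTCGA"   # In-Fusion cloning sequence (from the text)
ASISI_SALI_SITE = "GCGATCGCTTGTCGAC"      # anneals to the sites added in the genomic PCR

_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def reverse_complement(seq):
    """Reverse complement of an uppercase DNA sequence."""
    return "".join(_COMPLEMENT[base] for base in reversed(seq))

def barcoding_reverse_primer(barcode):
    """Assemble the reverse primer 5'->3': In-Fusion arm, barcode, AsiSI/SalI site."""
    if len(barcode) != 10 or set(barcode) - set("ACGT"):
        raise ValueError("barcode must be 10 bases of A/C/G/T")
    return INFUSION_ARM + barcode + ASISI_SALI_SITE

# Hypothetical barcode used only to illustrate the layout.
primer = barcoding_reverse_primer("ACGTTGCAAC")
```

Because this is a reverse primer, its sequence appears reverse-complemented on the opposite strand of the amplicon; for example, `reverse_complement(INFUSION_ARM)` gives the In-Fusion arm as read on that strand.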
PESCA Library cloning
All PCR reactions were pooled and the amplicons purified using Agencourt AMPure XP. The pAAV-mDlx-GFP-Fishell-1 was a gift from Gordon Fishell (Addgene plasmid # 83900). The plasmid was digested with PacI and XhoI, leaving the ITRs and the polyA sequence. In-Fusion was used to shuttle the pool of GRE PCR products into the vector. Following transformation into High Efficiency NEB 5-alpha Competent E. coli and recovery, SalI and AsiSI were used to linearize the AAV vector containing the GREs. The expression cassette containing the human HBB promoter and intron followed by GFP and WPRE was isolated by PCR amplification from pAAV-mDlx-GFP-Fishell-1. The expression cassette was ligated with the linearized GRE-library-containing vector using T4 ligase and transformed into High Efficiency NEB 5-alpha Competent E. coli to yield the final library. Fifty colonies were Sanger sequenced to determine the correct pairing between GRE and barcode and the correct arrangement of the AAV vector.
AAV preparation
The pooled PESCA library or individual AAV constructs (100 μg) were packaged into AAV9 at the Boston Children’s Hospital Viral Core. The titers (2-50 × 10^13 genome copies/mL) were determined by qPCR. Next generation sequencing using the NextSeq 500 platform was used to determine the complexity of the pooled PESCA library (Fig. 2a).
V1 cortex injections
Animals were anesthetized with isoflurane (1–3% in air) and placed on a stereotactic instrument (Kopf) with a 37°C heated pad. The PESCA library (AAV9, 1.9 × 10^13 genome copies/mL) was stereotactically injected in V1 (800 nL per site at 25 nL/min) using a sharp glass pipette (25-45 μm diameter) that was left in place for 5 min prior to and 10 min following injection to minimize backflow. Two injections were performed per animal at coordinates 3.0 and 3.7 mm posterior, 2.5 mm lateral relative to bregma, and 0.6 mm ventral relative to the brain surface.
Individual rAAV-GRE constructs were stereotactically injected at a titer of 1 × 10^11 genome copies/mL (250 nL per site at 25 nL/min). All injections were performed at two depths (0.4 and 0.7 mm ventral relative to the brain surface) to achieve broader infection across cortical layers. The injection coordinates relative to bregma were 3.0 or 3.7 mm posterior, 2.5 or −2.5 mm lateral.
Nuclear isolation
Single-nuclei suspensions were generated as described previously(1), with minor modifications. V1 was dissected and placed into a Dounce with homogenization buffer (0.25 M sucrose, 25 mM KCl, 5 mM MgCl2, 20 mM Tricine-KOH, pH 7.8, 1 mM DTT, 0.15 mM spermine, 0.5 mM spermidine, protease inhibitors). The sample was homogenized using a tight pestle with 10 strokes. IGEPAL solution (5%, Sigma) was added to a final concentration of 0.32%, and 5 additional strokes were performed. The homogenate was filtered through a 40-μm filter, and OptiPrep (Sigma) added to a final concentration of 25% iodixanol. The sample was layered onto an iodixanol gradient and centrifuged at 10,000g for 18 minutes as previously described(1, 2). Nuclei were collected between the 30% and 40% iodixanol layers and diluted to 80,000-100,000 nuclei/mL for encapsulation. All buffers contained 0.15% RNasin Plus RNase Inhibitor (Promega) and 0.04% BSA.
snRNA-Seq library preparation and sequencing
Single nuclei were captured and barcoded whole-transcriptome libraries were prepared using the inDrops platform as previously described(5, 6), collecting five libraries of approximately 3,000 nuclei from each animal. Briefly, single nuclei along with single primer-carrying hydrogels were captured into droplets using a microfluidic platform. Each hydrogel carried oligo(dT) primers with a unique cell-barcode. Nuclei were lysed and the cell-barcode-containing primers were released from the hydrogel, initiating reverse transcription and barcoding of all cDNA in each droplet. Next, the emulsions were broken and cDNA across ~3,000 nuclei was pooled into the same library. The cDNA was amplified by second strand synthesis and in vitro transcription, generating an amplified RNA intermediate which was fragmented and reverse transcribed into an amplified cDNA library.
For enrichment of virally derived transcripts, a fraction (3 μL) of the amplified RNA intermediate was reverse transcribed with random hexamers without prior fragmentation. PCR was next used to amplify virally derived transcripts. The forward primer was designed to introduce the R1 sequence and anneal to a sequence uniquely present 5’ of the viral-barcode sequence in the viral transcripts (5’-GCATCGATACCGAGCGC). The reverse primer was designed to anneal to a sequence present 5’ of the cell-barcode (5’-GGGTGTCGGGTGCAG). The result of the PCR is preferential amplification of the virally derived transcripts, while simultaneously retaining the cell-barcode sequence necessary to assign each transcript to a particular cell/nucleus. Following PCR amplification (18 cycles, Hot Start High-Fidelity Q5 polymerase), all the libraries were indexed, pooled, and sequenced on a NextSeq 500 benchtop DNA sequencer (Illumina).
inDrop sample mapping and viral barcode deconvolution by cell
The published inDrops mapping pipeline (github.com/indrops/indrops) was used to assign reads to cells. To map viral sequences, a custom annotated transcriptome was generated using the indrops pipeline build_index command supplied with newly generated reference files. For each cloned GRE, one additional contig comprising a shared 5’ sequence (gcatcgataccgagcgcgcgatcgc), the given 10 bp barcode, and a shared 3’ sequence (tcgagagatctgtgatagcggc) was appended to the GRCm38.dna_sm.primary_assembly.fa genome file. These sequences were also appended to the GRCm38.88.gtf gene annotation file, with all sequences assigned the same gene_id and gene_name, but unique transcript_id, transcript_name, and protein_id. After inDrops pipeline mapping and cell deconvolution, the pysam package was used to extract the ‘XB’ and ‘XU’ tags, which contain cell barcode and UMI sequences, respectively, from every read in the inDrops pipeline-output bam files that mapped uniquely to any one of the custom viral contigs (i.e. requiring the read to map to the 10 bp barcode with at most 1 mismatch). These barcode-UMI combinations were condensed to generate a final cell x GRE barcode UMI counts table for each sample.
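The contig construction and the final condensing step can be sketched as follows. This is a minimal illustration, not the code used in the study: the function names are hypothetical, and the pysam reader deliberately omits the unique-mapping and mismatch filters, because how those are encoded in the bam file depends on the aligner settings.

```python
from collections import defaultdict

SHARED_5P = "gcatcgataccgagcgcgcgatcgc"  # shared 5' sequence from the text
SHARED_3P = "tcgagagatctgtgatagcggc"     # shared 3' sequence from the text


def build_contig(barcode: str) -> str:
    """Shared 5' sequence + 10 bp viral barcode + shared 3' sequence."""
    if len(barcode) != 10:
        raise ValueError("expected a 10 bp barcode")
    return SHARED_5P + barcode.lower() + SHARED_3P


def count_umis(records):
    """Condense (cell_barcode, umi, contig) triples into a cell x contig table.

    Duplicate triples are counted once, so each value is a number of
    unique UMIs rather than a number of reads.
    """
    table = defaultdict(lambda: defaultdict(int))
    for cell, umi, contig in set(records):
        table[cell][contig] += 1
    return {cell: dict(row) for cell, row in table.items()}


def iter_viral_records(bam_path, viral_contigs):
    """Yield (XB, XU, contig) for reads aligned to one of the viral contigs.

    Requires pysam. Filtering for unique mapping is intentionally left out
    (assumption: it is applied upstream or added here by the user).
    """
    import pysam  # imported lazily so the helpers above work without it

    with pysam.AlignmentFile(bam_path, "rb") as bam:
        for read in bam:
            if read.is_unmapped or read.reference_name not in viral_contigs:
                continue
            if read.has_tag("XB") and read.has_tag("XU"):
                yield read.get_tag("XB"), read.get_tag("XU"), read.reference_name
```

A counts table for one sample would then be obtained with `count_umis(iter_viral_records(path, contig_names))`, where `path` and `contig_names` are placeholders for the sample bam file and the set of viral contig names.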
Embedding and identification of cell types
Data from all nuclei (two animals, 5 libraries of ~3,000 nuclei per animal) were analyzed simultaneously. Viral-derived sequences were removed for the purposes of embedding, clustering, and cell type identification. The initial dataset contained 32,335 nuclei, with more than 200 unique non-viral transcripts (UMIs) assigned to each nucleus. The R software package Seurat(7, 8) was used to cluster cells. First, the data were log-normalized and scaled to 10,000 transcripts per cell. Variable genes were identified using the FindVariableGenes() function. The following parameters were used to set the minimum and maximum average expression and the minimum dispersion: x.low.cutoff=0.0125, x.high.cutoff=3, y.cutoff=0.5. Next, the data were scaled using the ScaleData() function, and principal component analysis (PCA) was carried out. The FindClusters() function using the top 30 principal components (PCs) and a resolution of 1.5 was used to determine the initial 29 clusters. Based on the expression of known marker genes, we merged clusters that represented the same cell type. Our final list of cell types was: Excitatory neurons, PV Interneurons, SST Interneurons, VIP Interneurons, NPY Interneurons, Astrocytes, Vascular-associated cells, Microglia, Oligodendrocytes, and Oligodendrocyte precursor cells.
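For readers unfamiliar with the normalization step, the per-nucleus transformation can be sketched in a few lines. The analysis itself was run in R with Seurat; the Python function below is only an illustration and assumes the commonly documented log-normalization form, log(1 + count / total × scale factor), with a scale factor of 10,000.

```python
import math


def log_normalize(counts, scale_factor=10_000):
    """Log-normalize the UMI counts of one nucleus.

    Each gene count is divided by the nucleus total, multiplied by the
    scale factor, and transformed with log(1 + x).
    """
    total = sum(counts)
    if total == 0:
        return [0.0 for _ in counts]
    return [math.log1p(c / total * scale_factor) for c in counts]


# Example: a nucleus with three genes and four UMIs in total
normalized = log_normalize([1, 1, 2])
```

Because every nucleus is rescaled to the same total before the log transform, differences in sequencing depth between nuclei do not dominate the downstream variable-gene selection and PCA.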
Enrichment calculation
Viral vector expression for each of the 861 barcodes across the ten cell types was calculated by averaging the expression of barcoded transcripts across all the individual nuclei that were assigned to that cell type. The relative fold-enrichment in expression toward Sst+ cells was computed as the ratio of the mean expression in Sst+ cells and the mean expression in Sst- cells: (mean(Sst+ cells)+0.01)/ (mean(Sst- cells)+0.01).
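The fold-enrichment formula can be written out as a short function. This is a sketch under stated assumptions: the function and argument names are hypothetical, and "Sst- cells" is taken to mean every nucleus not assigned to the SST interneuron cluster.

```python
from statistics import mean

PSEUDOCOUNT = 0.01  # added to numerator and denominator, as in the text


def sst_fold_enrichment(expression, cell_types, target="SST Interneurons"):
    """(mean in target cells + 0.01) / (mean in all other cells + 0.01).

    `expression` holds one value per nucleus for a single barcode and
    `cell_types` holds the matching cell type labels. Both groups must be
    non-empty, otherwise statistics.mean raises an error.
    """
    in_target = [e for e, t in zip(expression, cell_types) if t == target]
    others = [e for e, t in zip(expression, cell_types) if t != target]
    return (mean(in_target) + PSEUDOCOUNT) / (mean(others) + PSEUDOCOUNT)
```

The pseudocount keeps the ratio finite when a barcode is not detected at all outside the target population.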
Viral GRE expression for each of the 287 GREs was calculated at the single-nucleus level as the sum of the expression of the three barcodes that were paired with that GRE. Average GRE-driven expression across the ten cell types was calculated by averaging the expression of the GRE transcripts across all the individual nuclei that were assigned to that cell type. The relative fold-enrichment in GRE expression toward Sst+ cells was determined as the ratio of the mean expression in Sst+ cells and the mean expression in Sst- cells: (mean(Sst+ cells)+0.01)/ (mean(Sst- cells)+0.01).
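Collapsing barcode-level counts to GRE-level counts in a single nucleus amounts to a grouped sum. The sketch below assumes a lookup table from barcode to GRE; the names `barcode_counts` and `barcode_to_gre` are hypothetical.

```python
def gre_expression(barcode_counts, barcode_to_gre):
    """Sum barcode-level UMI counts of one nucleus into GRE-level counts.

    `barcode_counts` maps barcode -> UMI count for the nucleus, and
    `barcode_to_gre` maps each barcode to the GRE it was paired with.
    """
    totals = {}
    for barcode, count in barcode_counts.items():
        gre = barcode_to_gre[barcode]
        totals[gre] = totals.get(gre, 0) + count
    return totals
```

Since 861 barcodes map onto 287 GREs, each GRE total is the sum of at most three barcode counts per nucleus.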
Differential gene expression
To identify which of the GRE-driven transcripts were statistically enriched in Sst+ vs. Sst− cells, we carried out differential gene expression analysis using the R package Monocle2(9). The data were modeled and normalized using a negative binomial distribution, consistent with snRNA-seq experiments. The functions estimateSizeFactors(), estimateDispersions() and differentialGeneTest() were used to identify which of the GRE-derived transcripts were statistically enriched in Sst+ cells. GREs whose false discovery rate (FDR) was less than 0.01 were considered enriched.
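The FDR cutoff can be illustrated with a small multiple-testing sketch. The text does not name the correction procedure, so the Benjamini-Hochberg adjustment used here is an assumption, and the function names are hypothetical; the p-values themselves would come from the differential expression test.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


def enriched(pvalue_by_gre, fdr=0.01):
    """Names of GREs whose adjusted p-value falls below the FDR cutoff."""
    names = list(pvalue_by_gre)
    qvalues = benjamini_hochberg([pvalue_by_gre[n] for n in names])
    return [n for n, q in zip(names, qvalues) if q < fdr]
```

Note that a significance test alone does not indicate direction, so in practice the FDR filter would be combined with the fold-enrichment toward Sst+ cells.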
Fluorescence microscopy
Sample preparation
Mice were sacrificed and perfused with 4% PFA followed by PBS. The brain was dissected out of the skull and post-fixed with 4% PFA for 1-3 days at 4°C. The brain was mounted on the vibratome (Leica VT1000S) and coronally sectioned into 100 μm slices. Sections containing V1 were arrayed on glass slides and mounted using DAPI Fluoromount-G (Southern Biotech).
Sample imaging
Sections containing V1 were imaged on a Leica SPE confocal microscope using an ACS APO 10x/0.30 CS objective (Harvard NeuroDiscovery Center). Tiled V1 cortical areas of ~1.2 mm by ~0.5 mm were imaged at a single optical section to avoid counting the same cell across multiple optical sections. Channels were imaged sequentially to avoid any optical crosstalk.
Immunostaining
To identify parvalbumin (PV)+ cells, coronal sections were washed three times with PBS containing 0.3% TritonX-100 (PBST) and blocked for 1 h at room temperature with PBST containing 5% donkey serum. Sections were incubated overnight at 4°C with mouse anti-PVALB antibody 1:2000 (Millipore), washed again three times with PBST, and incubated for 1 h at room temperature with 1:500 donkey anti-mouse 647 secondary antibody (Life Technologies). After washing in PBST and PBS, samples were mounted onto glass slides using DAPI Fluoromount-G.
Quantification of the percentage of GFP+ cells that were SST+, VIP+, and PV+
Across all images, coordinates were registered for each GFP+ cell that could be visually discerned. An automated ImageJ script was developed to quantify the intensity of each acquired channel for a given GFP+ cell. We created a circular mask (radius = 5.7 μm) at each coordinate representing a GFP positive cell, background subtracted (rolling ball, radius = 72 μm) each channel, and quantified the mean signal of the masked area. To identify the threshold intensity used to classify each GFP+ cell as either SST+, VIP+ or PV+, we first determined the background signal in the channel representing SST, VIP or PV by selecting multiple points throughout the area visually identified as background. These background points were masked as small circular areas (radius = 5.7 μm), over which the mean background signal was quantified. The highest mean background signal for SST, VIP and PV was conservatively chosen as the threshold for classifying GFP+ cells as SST+, VIP+ or PV+, respectively.
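The thresholding rule can be summarized in code. The quantification itself was done with an ImageJ script; the Python below is only a sketch of the decision rule, with hypothetical function names, and it assumes a cell is called positive when its mean masked signal is strictly above the highest background mean.

```python
def classify_cells(cell_means, background_means):
    """Flag each GFP+ cell as marker-positive for one channel.

    The threshold is the highest mean signal among the background regions,
    which is the conservative choice described in the text.
    """
    threshold = max(background_means)
    return [signal > threshold for signal in cell_means]


def percent_positive(flags):
    """Percentage of GFP+ cells classified as marker-positive."""
    return 100.0 * sum(flags) / len(flags)


# Example: four GFP+ cells and three background regions in the SST channel
flags = classify_cells([10, 50, 30, 20], [12, 20, 25])
sst_percent = percent_positive(flags)
```

Using the maximum rather than the mean of the background measurements lowers the chance of calling a dim, out-of-focus cell positive, at the cost of possibly missing weakly labeled cells.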
Quantification of the distribution of cells as a function of distance from pia
A semiautomated ImageJ algorithm was developed to trace the pia in each image, generate a Euclidean Distance Map (EDM), and calculate the distance from the pia to each GFP+ cell.
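Conceptually, reading a cell's value from a distance map built on the traced pia is the same as taking the distance from the cell to the nearest pia pixel. The brute-force sketch below illustrates this; it is not the ImageJ implementation, and the function name and the `um_per_pixel` calibration parameter are hypothetical.

```python
import math


def distance_from_pia(cells, pia_points, um_per_pixel=1.0):
    """Distance from each cell to the nearest traced pia point.

    `cells` and `pia_points` are sequences of (x, y) pixel coordinates.
    The result is converted to micrometers with `um_per_pixel`.
    """
    return [
        min(math.dist(cell, point) for point in pia_points) * um_per_pixel
        for cell in cells
    ]
```

For large images a precomputed distance map is far more efficient than this pairwise search, but both yield the same per-cell distances up to the sampling of the pia trace.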
Quantification of the percentage of SST+ cells that were GFP+
An automated algorithm was developed to identify SST+ cells after appropriate background subtraction, image thresholding, masking, and filtering for all objects of appropriate size and circularity. The number of SST+ objects (cells) was then counted within a minimal polygonal area that encompassed all GFP+ cells in that image. The ratio of the number of GFP+ cells to the number of SST+ cells within the area of infection (here identified as the area with discernible GFP+ cells) was calculated.
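One way to realize the "minimal polygonal area" is the convex hull of the GFP+ cell coordinates. That interpretation is an assumption, as are the function names below; the sketch counts the SST+ cells that fall inside the hull and returns the GFP+ to SST+ ratio.

```python
def cross(o, a, b):
    """Z-component of the cross product of vectors o->a and o->b."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Convex hull (monotone chain) of (x, y) tuples, counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def inside_hull(hull, p):
    """True if p lies inside or on the boundary of a counter-clockwise hull."""
    n = len(hull)
    return all(cross(hull[i], hull[(i + 1) % n], p) >= 0 for i in range(n))


def gfp_to_sst_ratio(gfp_cells, sst_cells):
    """Number of GFP+ cells divided by the number of SST+ cells in the hull.

    Raises ZeroDivisionError if no SST+ cell lies within the infected area.
    """
    hull = convex_hull(gfp_cells)
    sst_in_area = [c for c in sst_cells if inside_hull(hull, c)]
    return len(gfp_cells) / len(sst_in_area)
```

Restricting the SST+ count to the hull avoids diluting the ratio with SST+ cells that lie outside the region the virus actually reached.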
Slice Preparation
Acute, coronal brain slices containing visual cortex of 250–300 μm thickness were prepared using a sapphire blade (Delaware Diamond Knives, Wilmington, DE) and a VT1000S vibratome (Leica, Deerfield, IL). Mice were anesthetized through inhalation of isoflurane, then decapitated. The head was immediately immersed in an ice-cold solution containing (in mM): 130 K-gluconate, 15 KCl, 0.05 EGTA, 20 HEPES, and 25 glucose (pH 7.4 with NaOH; Sigma). The brains were quickly dissected and cut in the same ice-cold, gluconate-based solution while oxygenated with 95% O2/5% CO2. Slices then recovered at 32°C for 20–30 minutes in oxygenated artificial cerebrospinal fluid (ACSF) containing (in mM): 125 NaCl, 26 NaHCO3, 1.25 NaH2PO4, 2.5 KCl, 1.0 MgCl2, 2.0 CaCl2, and 25 glucose (Sigma), adjusted to 310–312 mOsm with water.
Electrophysiological Recordings
Using an Olympus BX51WI microscope equipped with a 60x water immersion objective, we used fluorescence illumination to identify rAAV-GRE44-GFP+ (red and green) and rAAV-GRE44-GFP− (only red) SST neurons in the area of injection/AAV infection (Fig. 4a-d). rAAV-GRE44-GFP− neurons were recorded if they were in the same field of view as rAAV-GRE44-GFP+ neurons under 60x. For rAAV-GRE12-Gq-DREADD-tdTomato experiments (Fig. 4e-j), tdTomato+ cells and morphologically identified pyramidal neurons in the same field of view under 60x were recorded. Whole-cell current clamp recordings of these neurons in coronal visual cortex slices of P50 to P80 wild-type mice were performed using borosilicate glass pipettes (3–6 MΩ, Sutter Instrument, Novato, CA) filled with an internal solution (in mM): 116 KMeSO3, 6 KCl, 2 NaCl, 0.5 EGTA, 20 HEPES, 4 MgATP, 0.3 NaGTP, 10 NaPO4 creatine (pH 7.25 with KOH; Sigma). Neurobiotin (1.5%) was occasionally included in the internal solution to allow for post-hoc morphological reconstruction of recorded cells. All experiments were performed at room temperature in oxygenated ACSF. Series resistance was compensated by at least 60% in a voltage-clamp configuration before switching to current-clamp (“I Clamp Normal”). After break-in, a systematic series of 1-second current injections ranging from −100 pA to 500 pA was applied to each cell using the User List function in the “Edit Waveform” tab of pClamp. After baseline firing rates were calculated, CNO (2 μM, Sigma) was bath applied. An average of at least three trials for each current injection was calculated before and during CNO application.
Data Acquisition and Analysis
For electrophysiology, current-clamp data were acquired using Clampex 10.2 and an Axopatch 200B amplifier, filtered at 2 kHz, and digitized at 20 kHz with a DigiData 1440 data acquisition board (Molecular Devices, Sunnyvale, CA). Analysis of electrophysiological parameters was done using Clampfit (Molecular Devices, Sunnyvale, CA), Prism 7 (GraphPad Software, La Jolla, CA), Excel (Microsoft, Redmond, WA), and custom software written and generously shared by Dr. Bruce Bean in Igor Pro version 6.1.2.1 (WaveMetrics, Lake Oswego, OR). Membrane potentials in this study were not corrected for the liquid junction potential and are thus positively biased by 8 mV. For analysis of action potential waveform in Fig. 4a-d and Supplementary Table 1, the first action potential that appeared during a current injection equivalent to the rheobase was analyzed, as well as the first action potential of the subsequent two current injections. For example, if the rheobase were 20 pA, then all the parameters defined in the next section were analyzed for the first action potential elicited with 20, 25, and 30 pA of injected current, and averaged.
Definition of Electrophysiological Parameters
AP Height (in millivolts): the difference between the peak of the action potential and the most negative voltage during the afterhyperpolarization immediately following the spike.
AP Peak (in millivolts): the most depolarized (positive) potential of the spike.
AP Trough (in millivolts): the most negative voltage reached during the afterhyperpolarization immediately following the spike.
Fmax initial (in Hertz): the average of the reciprocal of the first three interspike intervals, measured at the maximal current step injected before spike inactivation.
Fmax steady-state (in Hertz): the average of the reciprocal of the last three interspike intervals, measured at the maximal current step injected before spike inactivation.
Rate of rise (in volts per second): maximal voltage slope (dV/dt) during the upstroke (rising phase) of the action potential.
Rheobase (in picoamperes): the minimal 1000 ms current step (in increments of 5 pA) needed to elicit an action potential.
Rin (in megaohms, MΩ): input resistance, determined by using Ohm’s law to measure the change in voltage in response to a −50 pA, 1000 ms hyperpolarizing current at rest.
Spike adaptation ratio: the ratio of Fmax steady-state to Fmax initial.
Spike width (in milliseconds, used interchangeably with spike half-width): the width at half-maximal spike height as defined above.
τm (in milliseconds): membrane time constant, determined by fitting a monoexponential curve to the voltage change in response to a −50 pA, 1000 ms hyperpolarizing current at rest.
Threshold (in millivolts): the membrane potential at which dV/dt = 5 V/s.
Vrest (in millivolts): resting membrane potential a few minutes after breaking in without any current injection.
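Several of these definitions translate directly into short calculations. The sketch below is illustrative and is not the Igor Pro or Clampfit code used for the analysis: the function names are hypothetical, the intervals are taken to be those between consecutive spikes within one current step, and the threshold is reported as the last sample before dV/dt first reaches 5 V/s (which equals 5 mV/ms).

```python
from statistics import mean


def instantaneous_rates(spike_times_s):
    """Reciprocal of each interval between consecutive spikes, in Hz."""
    return [1.0 / (b - a) for a, b in zip(spike_times_s, spike_times_s[1:])]


def fmax_initial(spike_times_s):
    """Mean of the reciprocals of the first three intervals."""
    return mean(instantaneous_rates(spike_times_s)[:3])


def fmax_steady_state(spike_times_s):
    """Mean of the reciprocals of the last three intervals."""
    return mean(instantaneous_rates(spike_times_s)[-3:])


def adaptation_ratio(spike_times_s):
    """Fmax steady-state divided by Fmax initial."""
    return fmax_steady_state(spike_times_s) / fmax_initial(spike_times_s)


def input_resistance_mohm(delta_v_mv, current_pa=-50.0):
    """Ohm's law: R = dV / I. One mV per pA equals 1000 megaohms."""
    return delta_v_mv / current_pa * 1000.0


def threshold_mv(voltage_mv, dt_ms, slope_v_per_s=5.0):
    """Membrane potential where dV/dt first reaches the slope criterion.

    Returns None if the criterion is never met in the trace.
    """
    for v0, v1 in zip(voltage_mv, voltage_mv[1:]):
        if (v1 - v0) / dt_ms >= slope_v_per_s:  # mV/ms is the same as V/s
            return v0
    return None
```

At the 20 kHz digitization rate used here, `dt_ms` would be 0.05. An adaptation ratio below 1 indicates that firing slows over the course of the current step.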
Acknowledgements
We thank D. Tom for help with image analysis, Drs. W. Renthal, J. Green, and members of the Greenberg lab for discussions, the HMS Single Cell Core for single-nucleus RNA-seq sample processing, Boston Children’s Hospital Viral Core for AAV packaging, Dr. B. Bean for feedback on electrophysiology, and the Harvard NeuroDiscovery Center Enhanced Neuroimaging Core for imaging and image analysis. This work was supported by the National Institutes of Health BRAIN Initiative grant R01 MH114081-01, NIH grant T32GM007753 to M.A.N., and NIH Training in the Molecular Biology of Neurodegeneration grant 5T32AG000222-23 to S.H.
Footnotes
One sentence summary: Highly parallelized functional evaluation of enhancer activity in single cells generates new cell-type-specific tools with broad medical and scientific applications.