Abstract
We present Barcoded Oligonucleotides Ligated On RNA Amplified for Multiplexed and parallel In-Situ analysis (BOLORAMIS), a reverse-transcription (RT)-free method for spatially resolved, targeted, in-situ identification of single or multiple RNA targets. As a proof of concept, we profiled 154 distinct coding and small non-coding transcripts, ranging from 18 nucleotides (nt) in length upwards, in over 200,000 individual human induced pluripotent stem cells (iPSCs), and demonstrated compatibility with multiplexed detection by fluorescent in-situ sequencing. We use BOLORAMIS data to identify differences in spatial localization and cell-to-cell expression heterogeneity. Our results demonstrate that BOLORAMIS is a generalizable toolset for targeted, in-situ detection of coding and small non-coding RNA in single or multiplexed applications.
Manuscript
Single-cell transcriptomics is a rapidly evolving field, with recent developments in multiplexed in-situ technologies paving the way for spatial imaging of the genome and transcriptome at unprecedented resolution1. We proposed Fluorescent In Situ Sequencing (FISSEQ) in 2003, and in 2014 demonstrated the generation and sequencing of highly multiplexed, spatially resolved in-situ RNA libraries in cells and tissues2,3. While FISSEQ’s novelty lay in enabling unbiased discovery, it is primarily limited by poor detection sensitivity (<0.005% of the sensitivity of single-molecule FISH (smFISH)) and is unsuited to targeted in-situ transcriptomics4,5. Padlock probes have been used for in-situ sequencing in multiplexed transcriptomics, with single-base resolution on a small number of transcripts6. However, in both methods detection efficiency depends on RT efficiency and is subject to noise from variable priming efficiency and random-priming-induced bias7. LNA-modified primers have been shown to increase RT efficiency with padlock probes (∼30%), but they require careful calibration and can be prohibitively expensive for genome-wide applications (∼2 orders of magnitude more costly than unmodified primers)5,6,8.
smFISH-based multiplexed transcriptome imaging methods overcome the limitations of RT by hybridizing a plurality of Oligopaint-like encoded DNA probes directly to target RNA and then reading out target locations with a two-stage hybridization scheme9–14. While smFISH-based methods offer the highest in-situ RNA detection efficiency, they are best suited to long transcripts (>1,500 nt) and can yield lower signal than RCA-based methods (∼50-200 vs ∼800-1,000 fluorophores per transcript, respectively). As a result, a large fraction of biologically interesting RNA species, including short non-coding RNA, remains largely inaccessible to multiplexed in-situ (MIS) methods1,15. Consequently, there is strong demand for robust MIS RNA detection methods that overcome the limitations of each approach (sensitivity, cost and transcript-size constraints) while retaining their desirable features (scalability, single-nucleotide discrimination and high detection efficiency).
To fill this gap, we developed BOLORAMIS (Barcoded Oligonucleotides Ligated On RNA Amplified for Multiplexed In-Situ), a reverse-transcription-free direct RNA detection method for imaging coding and small non-coding RNA in multiplexed in-situ applications. BOLORAMIS combines combinatorial molecular indexing with direct RNA-dependent ligation and clonal amplification of barcoded padlock probes, and builds on recent improvements in RNA-splinted DNA ligation methods6,15–23.
In BOLORAMIS, cells are fixed, permeabilized and incubated with barcoded single-stranded DNA (ssDNA) probes containing two target-RNA-complementary termini and a linker segment containing a universal sequencing anchor. The sequencing anchor is flanked by dual 6-base barcodes, providing sufficient barcode diversity for whole-transcriptome multiplexing (∼10^6 unique barcodes). The probes are ligated directly on the target RNA with PBCV-1 DNA ligase to generate circularized, barcoded ssDNA probes. The single-molecule signal is then amplified by rolling circle amplification (RCA) with incorporation of aminoallyl deoxyuridine 5′-triphosphate (aminoallyl-dUTP). The amplicons are crosslinked to the cellular protein matrix using an amine-reactive bifunctional linker, generating spatially structured three-dimensional FISSEQ libraries (Figure 1A)3,5.
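The probe layout described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the design software used here: the function names, the example anchor and barcode sequences, and the arm-orientation convention (the 5' arm pairs with the upstream half of the target site, so that the probe's 3'-OH end abuts its 5' end at the ligation junction) are assumptions made for illustration. The sketch also computes the raw combinatorial space of two 6-base barcodes, 4^12 ≈ 1.7 × 10^7, an unfiltered upper bound that comfortably exceeds the ∼10^6 figure quoted; any sequence filtering would reduce it and is not modeled.

```python
# Hypothetical sketch: assemble a barcoded padlock probe and size the barcode space.

# DNA complement of RNA (U) or DNA (T) target bases.
COMPLEMENT = {"A": "T", "U": "A", "T": "A", "G": "C", "C": "G"}


def revcomp_to_dna(seq: str) -> str:
    """Reverse complement of an RNA or DNA sequence, written as DNA."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def build_padlock(target_rna: str, five_arm_len: int,
                  bc1: str, anchor: str, bc2: str) -> str:
    """Return a linear probe 5'->3': 5' arm + bc1 + anchor + bc2 + 3' arm.

    Assumed convention: the 5' arm pairs with the upstream (5') part of the
    target site and the 3' arm with the downstream part, so after
    hybridization the probe's 3'-OH end sits next to its 5' end.
    """
    if not 0 < five_arm_len < len(target_rna):
        raise ValueError("five_arm_len must split the target into two arms")
    five_arm = revcomp_to_dna(target_rna[:five_arm_len])
    three_arm = revcomp_to_dna(target_rna[five_arm_len:])
    return five_arm + bc1 + anchor + bc2 + three_arm


def barcode_space(bc_len: int = 6, n_barcodes: int = 2) -> int:
    """Unfiltered number of barcode combinations (4 bases per position)."""
    return 4 ** (bc_len * n_barcodes)


if __name__ == "__main__":
    # Made-up 10-nt target and placeholder barcode/anchor sequences.
    probe = build_padlock("ACGUACGUAC", 4, "AAAAAA", "GGG", "CCCCCC")
    print(probe)
    print(barcode_space())  # 4**12 raw combinations
```

Varying `five_arm_len` while keeping the total target length fixed mirrors the arm-length titration described in the next paragraph.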
Using BOLORAMIS, we readily generated dense punctate amplicons from targeted RNA detection in a wide range of continuous cell lines, as well as in frozen human and mouse brain sections (Figures 1, S1-S3). As expected, non-targeting negative-control probes, or omission of PBCV-1 DNA ligase, resulted in almost no signal (Figure S1-f, g). We further confirmed detection specificity by designing a probe targeting the human 3’ untranslated region (3’ UTR) of the actin beta gene (ACTB), which shares ∼95% sequence identity with the mouse ortholog, and testing detection in both mouse and human cell lines. BOLORAMIS generated dense amplicons specifically in human HeLa cells, but not in mouse NIH-3T3 fibroblasts (Figure S1, supplemental table S9). As a control, a probe targeting a perfectly conserved 3’ UTR sequence produced comparable numbers of amplicons in both human and mouse cell lines (Figure S2).
Next, we characterized the relative detection sensitivity and true discovery rate (TDR) of BOLORAMIS as a function of 5’ targeting arm length, for a probe with a total conserved RNA hybrid length of 25 nt. We systematically tested probes with 5’ arm hybrid lengths from 2 to 24 bases and quantified the mean spots per cell for each condition for ACTB. The false-discovery rate (FDR) was estimated for each condition using a mismatch probe carrying a single-base mismatch at the 3’ terminus. To minimize bias, the donor/acceptor ligation junction bases were kept constant across all 5’/3’ RNA hybrid length combinations. Overall, relative detection sensitivity increased asymmetrically with 5’ arm length, peaking at a 5’ arm length of 18 bases with a TDR of 96.6% and ∼29-fold discrimination against the probe with a single-base 3’ mismatch (Figure 1-h, i, supplemental figures 1, 3, supplemental table 1). These results indicate that the method is suitable for direct RNA detection with high specificity, approaching single-base resolution (supplemental S3).
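The sensitivity/specificity comparison can be expressed as a small calculation. This sketch assumes, since the definition is not spelled out in the text, that every spot from the single-base 3'-mismatch probe counts as a false detection, so FDR = mismatch / (match + mismatch) and TDR = 1 - FDR. Under that assumption the two reported figures are mutually consistent: a 29-fold discrimination gives TDR = 29/30 ≈ 96.7%, close to the 96.6% reported. The function names and the input layout are hypothetical.

```python
def discrimination_metrics(match_spots: float, mismatch_spots: float) -> dict:
    """Fold discrimination, FDR and TDR from mean spots per cell.

    Assumes every spot from the single-base 3'-mismatch probe is a false
    detection: FDR = mismatch / (match + mismatch), TDR = 1 - FDR.
    """
    if match_spots <= 0 or mismatch_spots < 0:
        raise ValueError("need match_spots > 0 and mismatch_spots >= 0")
    fdr = mismatch_spots / (match_spots + mismatch_spots)
    fold = match_spots / mismatch_spots if mismatch_spots else float("inf")
    return {"fold": fold, "fdr": fdr, "tdr": 1.0 - fdr}


def best_arm_length(results: dict) -> int:
    """Return the 5' arm length with the highest matched-probe signal.

    `results` maps 5' arm length (nt) -> (match_spots, mismatch_spots).
    """
    return max(results, key=lambda length: results[length][0])


if __name__ == "__main__":
    m = discrimination_metrics(29.0, 1.0)  # a 29-fold discrimination
    print(f"TDR = {m['tdr']:.3f}, FDR = {m['fdr']:.3f}")
```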
We next evaluated BOLORAMIS performance in detecting a wide range of coding and non-coding RNA targets in a biologically meaningful context. We designed 269 probes targeting 77 miRNAs and an equal number of transcription factors (TFs), expressed at varying abundance levels in human iPSCs (supplemental methods, supplemental table 7)24. Each probe was tested individually in situ in PGP1 human iPSCs in 384-well plates. Single-cell spot counts were quantified from a total of 217,206 human iPSCs using an automated imaging and image-processing pipeline (Figure 2, supplemental figure S4, Supplemental Pipeline, Supplemental table S9).
On the whole, BOLORAMIS expression values reflected a stem-cell expression signature (Figure 2, b, f, g). For example, several of the highest-ranked mRNAs corresponded to pluripotency markers (ZFP42, GATA2, SMAD1, ID1, KAT7, OCT4, SOX2, NANOG and ZFX) (Figure 2, b). Conversely, the mRNAs with the lowest BOLORAMIS single-cell expression values were associated with promoting cellular differentiation; the lowest-ranked TFs were NR61A, NEUROG1, LIN28B, TAL1, NR4A2, CREB1, OLIG2 and NEUROG2 (Figure 2, b). Single-cell BOLORAMIS counts ranged from 1 to 1,000 spots per cell, with a mean of 53.8 ± 74.9 (SD) (Supplemental Figure S5). The highest mean BOLORAMIS expression value was 236.95 puncta per cell (ZFP42, a pluripotency gene), comparable to the reported upper limit of high-throughput bDNA smFISH detection at similar magnification (Figure 2, b, e)12,25. For a given probe, mean spot counts per cell were highly reproducible across independent BOLORAMIS assays (Pearson’s r: 0.859, Supplemental Figure S6, a). We estimate that BOLORAMIS performs better than RNA-seq for transcripts expressed at low abundance (≤5 FPKM): on average, we observed over 71 spots per cell for transcripts with bulk RNA-seq values ≤5 FPKM, a level roughly estimated to correspond to ∼1 transcript per cell26.
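The two summary statistics in this paragraph, replicate reproducibility and the average signal for low-abundance transcripts, can be sketched as follows. The inputs (one mean spots-per-cell value per probe and per replicate assay, plus matched bulk FPKM values) and the numbers in the usage block are hypothetical, and Pearson's r is computed on untransformed values because the text does not state whether a transform was applied.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson's r between two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def mean_spots_at_low_fpkm(spots, fpkm, cutoff=5.0):
    """Average of per-probe mean spots/cell over probes with FPKM <= cutoff."""
    low = [s for s, f in zip(spots, fpkm) if f <= cutoff]
    return mean(low) if low else float("nan")


if __name__ == "__main__":
    # Hypothetical per-probe means from two independent assays.
    rep1 = [12.0, 48.5, 90.1, 210.0]
    rep2 = [15.2, 40.3, 101.7, 198.4]
    print("replicate r =", round(pearson(rep1, rep2), 3))
    print("low-FPKM mean =", mean_spots_at_low_fpkm(rep1, [1.0, 3.2, 40.0, 120.0]))
```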
We next compared BOLORAMIS mRNA measurements with published RNA-seq values. As a population, BOLORAMIS probes showed a weak positive correlation with bulk RNA-seq values (Pearson’s r: 0.142). We suspected that this low correlation might result from poor ligation efficiency for certain donor/acceptor pairs23. To test this hypothesis, we split the data from 192 probes into the 16 donor/acceptor ligation junction categories and found a strong dependence of the Pearson correlation on junction composition, ranging from −0.351 to 0.928 across the 16 categories. Consistent with previous reports, probes with a G at either the donor or acceptor position correlated poorly with bulk RNA-seq measurements (Figure 2, c)23. Interestingly, junctions with a C at the donor or acceptor position (dT/C, dC/C, dA/C or dC/A) displayed the highest correlations with RNA-seq values (Pearson’s r: 0.928, 0.856, 0.573 and 0.57, respectively) (Figure 2, c). Probes with the best ligation junctions (TC, CC and AC) displayed a higher Pearson’s correlation with bulk RNA-seq (0.739, Figure 2, d).
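Stratifying probes by ligation junction, as described above, amounts to a group-by followed by a per-group Pearson correlation. In this sketch the junction labels (e.g. "TC" for the dT/C junction), the tuple layout and the minimum group size of 3 are assumptions, and per-probe values are correlated untransformed.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson's r; returns NaN if either variable has zero variance."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / sqrt(sxx * syy)


def correlation_by_junction(probes, min_probes=3):
    """Map each ligation-junction label to Pearson's r vs. bulk RNA-seq.

    `probes` yields (junction, mean_spots_per_cell, rnaseq_value) tuples.
    Junctions with fewer than `min_probes` probes are skipped.
    """
    groups = defaultdict(lambda: ([], []))
    for junction, spots, rnaseq in probes:
        xs, ys = groups[junction]
        xs.append(spots)
        ys.append(rnaseq)
    return {
        junction: pearson(xs, ys)
        for junction, (xs, ys) in groups.items()
        if len(xs) >= min_probes
    }
```

With four possible bases at each side of the junction, this grouping yields at most 16 categories, matching the stratification in the text.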
We also observed a dependency between probe melting temperature (Tm) and Pearson’s correlation with bulk RNA-seq values, which explained some of the variance in the data. In general, probes with higher Tm displayed a higher Pearson’s correlation with bulk RNA-seq values (R² = 0.721, supplemental table S11). Because the short length of miRNAs constrains probe design, miRNA-targeting probes were designed to avoid a G at either the donor or acceptor position to improve ligation efficiency; these probes exhibited a Pearson’s correlation of 0.524 with published bulk miRNA nCounter expression measurements in PGP1 hiPSCs (Figure 2, e and supplemental table 3)24.
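The Tm dependence can be quantified with an ordinary least-squares fit and its coefficient of determination (R²). How probes were grouped to pair a Tm with a correlation value is not stated here, so this sketch simply takes hypothetical paired values, for example the mean Tm and Pearson's r of each probe group; the Tm calculation itself is not shown.

```python
from statistics import mean


def linear_fit_r2(x, y):
    """Least-squares line y = intercept + slope * x and its R^2."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    if sxx == 0:
        raise ValueError("x has zero variance")
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return slope, intercept, 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical (mean Tm in degC, Pearson's r) pairs for probe groups.
    tm = [48.0, 52.0, 56.0, 60.0, 64.0]
    r = [0.05, 0.30, 0.25, 0.60, 0.70]
    slope, intercept, r2 = linear_fit_r2(tm, r)
    print(f"slope={slope:.3f} per degC, R^2={r2:.3f}")
```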
Single-cell expression measurements provide unique insight into population heterogeneity, which is not possible with bulk methods that average signal across the population. To assess single-cell expression variability across a population of cells, we calculated for all targets the coefficient of variation (COV), defined as the ratio of the standard deviation to the mean, expressed as a percentage. The mean COV of expression across all targets was 77.3 ± 27% (SD), ranging from 38.7% to 217.1% (Figure 2-f, g). Independent probes targeting the same mRNA displayed, on average, lower variability than the overall single-cell expression heterogeneity: the mean COV between at least two independent probes targeting the same RNA was 52.63 ± 29.46% (SD, n=53, Supplemental Figure S4-b).
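The COV definition above translates directly into code. This sketch uses the sample standard deviation (the text does not say whether sample or population SD was used); the second function, a hypothetical helper, applies the same formula across the per-probe means of independent probes that target the same RNA.

```python
from statistics import mean, stdev


def cov_percent(values):
    """Coefficient of variation: sample SD divided by mean, as a percentage."""
    m = mean(values)
    if m == 0:
        raise ValueError("mean is zero; COV is undefined")
    return 100.0 * stdev(values) / m


def probe_agreement_cov(probe_means_by_rna):
    """COV across independent probes' mean counts, for RNAs with >= 2 probes.

    `probe_means_by_rna` maps an RNA name to a list of per-probe mean
    spots per cell (hypothetical input layout).
    """
    return {
        rna: cov_percent(means)
        for rna, means in probe_means_by_rna.items()
        if len(means) >= 2
    }
```

For example, per-cell counts of 10, 20 and 30 spots give a mean of 20, a sample SD of 10, and therefore a COV of 50%.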
The single-cell spatial distribution of BOLORAMIS spots appeared non-random and was consistent for a given probe. To quantify differences in spatial localization, we calculated the ratio of mean nuclear to mean cytoplasmic puncta counts (N:C ratio) and identified several transcripts whose signal was preferentially localized in the nucleus or cytoplasm in iPSCs (Figure 2, h). For example, we observed significantly higher nuclear signal from probes targeting SMAD1, hsa-miR-148b-3p, POU4F2, HEY1, GBX2, FOXD3, hsa-miR-301-3p, OTX2 and NFATC1 (N:C ratio Z-score ≥ 1.64) (Figure 2, h). In contrast, probes targeting KAT7, IRX4, THAP11, ZFX, TLX3, SOX8 and LIN28 showed higher cytoplasmic localization (N:C ratio Z-score ≤ −1.64) (Figure 2, h, supplemental Figure S6).
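The N:C classification can be sketched as follows: compute each probe's ratio of mean nuclear to mean cytoplasmic puncta per cell, Z-score the ratios across probes, and apply the ±1.64 cutoffs used above. The input layout and the use of the sample SD for the Z-score are assumptions.

```python
from statistics import mean, stdev


def classify_localization(probes, z_cut=1.64):
    """Label probes by the Z-scored nuclear:cytoplasmic (N:C) puncta ratio.

    `probes` maps a probe name to (nuclear_counts, cytoplasmic_counts),
    two per-cell lists. The N:C ratio is mean nuclear / mean cytoplasmic
    puncta per cell; Z-scores are computed across all probes.
    """
    ratios = {name: mean(nuc) / mean(cyto) for name, (nuc, cyto) in probes.items()}
    values = list(ratios.values())
    mu, sd = mean(values), stdev(values)
    labels = {}
    for name, ratio in ratios.items():
        z = (ratio - mu) / sd
        if z >= z_cut:
            labels[name] = "nuclear"
        elif z <= -z_cut:
            labels[name] = "cytoplasmic"
        else:
            labels[name] = "neither"
    return labels
```

A one-sided cutoff of 1.64 corresponds to roughly the upper 5% of a standard normal distribution, which is presumably why it is used as the threshold in both directions.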
We tested our ability to generate multiplexed in-situ targeted RNA detection libraries by pooling individual BOLORAMIS probes targeting miRNA (77-plex), mRNA (77-plex) or splice junctions (18-plex) in both HeLa and hiPSC cell lines. To verify compatibility with in-situ sequencing chemistries, which depend on clonal, spatially resolved fluorescent amplicons, we quality-checked all libraries with single-base sequencing, which produced bright, punctate, spectrally resolved fluorescent spots in the expected color space (Figure 1, b, c, d, Supplemental Figure S6). BOLORAMIS amplicons were highly stable and withstood multiple imaging and sequencing cycles (Supplemental Figure S8).
As a proof of concept for simultaneous multiplexed detection of coding and small non-coding RNA with FISSEQ, we constructed a pooled BOLORAMIS library with probes targeting 14 transcripts (4 miRNAs and 10 mRNAs). We sequenced 3 bases of the barcodes using sequencing-by-ligation chemistry and extracted relative barcode frequencies with 98.26% accuracy (511 correct barcodes out of 520, Figure 4). Barcode enrichment was non-random: only 14 of the 64 possible triplet barcodes were observed, of which 9 were expected in our library. The most frequent barcodes corresponded to the iPSC markers ID1, NANOG, hsa-miR-148b-3p, SOX2, GATA and KAT7 (Figure 4, b and Supplemental table 7). Interestingly, we did not detect barcodes corresponding to ZFP42 and POU5F1. We believe this is likely a barcode misrepresentation artifact resulting from single-base misreads during base calling. Ongoing developments in our lab and others, including error-correcting barcodes, longer sequencing reads, better sequencing chemistries and improved base-calling algorithms, should address these issues in the near future (Supplemental Figure S8)27,28.
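The barcode statistics reported above (the fraction of reads matching library barcodes, the number of the 4^3 = 64 possible triplets observed, and how many of those were expected) can be tallied as below. Reads are assumed to be already base-called barcode strings, and the barcode-to-target mapping in any usage is hypothetical. No error correction is attempted, consistent with the 3-base reads described.

```python
from collections import Counter


def barcode_summary(reads, expected, bc_len=3):
    """Tally decoded barcodes against the expected library.

    `reads` is a list of base-called barcode strings (one per amplicon);
    `expected` maps each library barcode to its target name.
    """
    counts = Counter(reads)
    n_correct = sum(n for bc, n in counts.items() if bc in expected)
    observed = set(counts)
    return {
        "accuracy": n_correct / len(reads) if reads else float("nan"),
        "possible": 4 ** bc_len,  # all possible barcodes of this length
        "observed": len(observed),
        "observed_expected": len(observed & set(expected)),
        # Counter returns 0 for absent barcodes, so undetected targets drop out.
        "per_target": {expected[bc]: counts[bc] for bc in expected if counts[bc]},
    }
```

With 511 of 520 reads matching library barcodes, `accuracy` evaluates to 511/520, the ratio reported in the text.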
Discussion
We have shown that BOLORAMIS is an RT-free, in-situ single-molecule RNA detection method capable of visualizing single or multiplexed coding and small non-coding RNA targets. This is the first demonstration of simultaneous detection of coding and small non-coding RNA with in-situ sequencing. Each BOLORAMIS barcoded amplicon results from a precise, target-RNA-dependent single-molecule ligation event, with near single-base specificity. BOLORAMIS and random-hexamer FISSEQ are complementary methods with distinct and largely non-overlapping applications. Unlike FISSEQ, BOLORAMIS cannot be used for de novo discovery, since the probe sequences must be predetermined.
Based on published estimates, SplintR-mediated direct RNA detection outperforms random-hexamer-RT FISSEQ for targeted transcriptomics by more than 3 orders of magnitude in detection efficiency (∼20% vs ∼0.005%), and circumvents random-priming RT artifacts and the over-representation of abundant housekeeping RNA such as rRNA3–5,19,21. Further, this approach retains the key advantages of the barcoded padlock probe system (multiplexing and targeting sensitivity) without requiring RT or prohibitively expensive LNA-modified primers6,8.
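The size of the efficiency gap follows directly from the two quoted estimates: ∼20% versus ∼0.005% is a ∼4,000-fold difference, or log10(4000) ≈ 3.6 orders of magnitude. A one-line check:

```python
from math import log10


def orders_of_magnitude(high: float, low: float) -> float:
    """log10 of the fold difference between two efficiencies (as fractions)."""
    return log10(high / low)


# ~20% (direct RNA detection) vs ~0.005% (random-hexamer FISSEQ), as fractions.
fold = 0.20 / 0.00005
print(round(fold), round(orders_of_magnitude(0.20, 0.00005), 2))
```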
Because RNA-splinted DNA ligation requires only a short footprint, BOLORAMIS gives access to transcripts up to ∼2 orders of magnitude shorter than those accessible to tiled smFISH-based methods (∼20 nt vs ∼1,500 nt, respectively), opening a vast repertoire of previously inaccessible RNA species for single or multiplexed analysis, including small non-coding RNA, which have traditionally been challenging to detect29,30. To the best of our knowledge, this is also the first reported direct comparison of bulk miRNA measurements with in-situ single-molecule miRNA measurements.
Cost analysis
We estimate the BOLORAMIS assay to be over an order of magnitude cheaper than LNA-based assays and within the cost of typical high-content screening (∼$1.5/well) (Supplemental table 8). BOLORAMIS requires far fewer probes than smFISH-based approaches. We expect multiplexing to reduce the per-target cost by a further 1-3 orders of magnitude. The BOLORAMIS library construction workflow is straightforward and robust, can be completed within 1-2 days, and is amenable to automation.
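The expected multiplexing savings can be illustrated with simple arithmetic. This sketch assumes, as a simplification not stated in the text, that the per-well cost (∼$1.5) stays roughly constant as probes are pooled, so the per-target cost falls in proportion to the plex; under that assumption, 10- to 1,000-plex pools correspond to the 1-3 orders of magnitude mentioned. The plex levels shown are illustrative.

```python
def cost_per_target(cost_per_well: float, plex: int) -> float:
    """Per-target cost if `plex` targets share one well at a fixed per-well cost."""
    if plex < 1:
        raise ValueError("plex must be at least 1")
    return cost_per_well / plex


# Illustrative plex levels; ~$1.5/well is the estimate quoted above.
for plex in (1, 10, 100, 1000):
    print(f"{plex:>5}-plex: ${cost_per_target(1.5, plex):.4f} per target")
```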
Applications towards human cell atlas
The Human Cell Atlas (HCA) aims to generate comprehensive reference maps of all human cells. Single-cell RNA-seq (scRNA-seq) methods offer the highest sequencing depth, but they lose spatial information and scale more slowly. We believe BOLORAMIS can be a highly complementary approach for spatially mapping transcriptomic signatures obtained from bulk/scRNA-seq methods, to identify new cell types in a high-resolution spatial context. Further, BOLORAMIS may be valuable for constructing large-scale spatial maps of RNA splice variants and non-coding RNA.
Limitations
The success of any targeted RNA detection method, including BOLORAMIS, depends on probe design, and efficiency can vary with RNA availability, probe hybridization Tm and ligation junction parameters. We anticipate that improved probe design algorithms could directly improve detection efficiency. While PBCV-1-based direct RNA detection offers high detection specificity, its overall fidelity must be further improved, without compromising sensitivity, for applications such as single-nucleotide polymorphism detection31. We are actively working to improve in-situ sequencing read depth and read length, and to develop better base-calling algorithms with error-correcting codes. We expect partition sequencing combined with expansion microscopy to overcome the vast majority of current resolution limitations3,32.
Author Contributions
EPRI initiated the project and made high-level designs and plans with GMC and JA. EPRI, SP, SL and KJ planned and executed experiments; EPRI and MF developed image-processing and data analysis pipelines. EPRI, SP and SL collected, analyzed and interpreted data. SP and KJ performed probe optimization experiments. SL performed high-resolution confocal imaging. JM mined and helped design splice-junction experiments. TF helped with the custom microscopy setup (for in-situ sequencing and high-resolution imaging). SR helped obtain the high-content imaging technical setup. EPRI performed high-content imaging, automated quantification and statistical analysis. REK, SP and EPRI designed and executed mouse brain tissue experiments. ATW, DG, FC, SA, AS and ESB shared unpublished data and insights, and helped plan experiments. GMC, JA, DG and SA helped troubleshoot detection and in-situ sequencing data analysis. BY and LA provided human brain tissue samples, advice and insights. DM helped with RT-padlock experiments. CC explored early strategies for in-situ RNA detection using in-situ PCR. PR explored early padlock detection methods. AS, TF, KJ, MF and EPRI planned and developed in-situ sequencing automation strategies. GMC supervised the project. EPRI wrote the manuscript with GMC and JA, with contributions, edits and revisions from all authors.
Acknowledgements
We thank Thouis Ray Jones for helping with software development for in-situ sequencing. We also gratefully acknowledge Jay H. Lee, Evan Daugharty, Brian Turczyk, Rich Terry, Samuel Inverso, Rigel Chan, Elaine Lim, Dima Ter-Ovanesyan, Alexander Shineman Garruss and Alejandro Chavez for engaging discussion, advice and insights on various steps. We also thank Daniel Collett (IDT) for support with oligo probe synthesis. We gratefully acknowledge funding and support from The Center for Genomically Engineered Organs (CGEO, RM1HG008525, and NHGRI), Department of Energy (DOE, DE-FG02-02ER63445), NIH 1R01MH113279-01 (NIH, G.C. and B.A.Y.) and The Wyss Institute for Biologically Inspired Engineering at Harvard University.