Abstract
RNA is a critical component of chromatin in eukaryotes, both as a product of transcription, and as an essential constituent of ribonucleoprotein complexes that regulate both local and global chromatin states. Here we present a proximity ligation and sequencing method called Chromatin-Associated RNA sequencing (ChAR-seq) that maps all RNA-to-DNA contacts across the genome. ChAR-seq provides unbiased, de novo identification of targets of chromatin-bound RNAs including nascent transcripts, chromosome-specific dosage compensation ncRNAs, and genome-wide trans-associated RNAs involved in co-transcriptional RNA processing.
Introduction
Much of the eukaryotic genome is transcribed into non-coding RNA (ncRNA), and several studies have established that a subset of these ncRNAs form ribonucleoprotein complexes that bind and regulate chromatin1–3. Some of the best-studied ncRNAs are those involved in dosage compensation, which include roX1 and roX2 in Drosophila and Xist in mammals. In Drosophila, roX1 and roX2 are part of the male-specific lethal (MSL) complex that coats the single male X chromosome to induce H4K16 acetylation and increase transcription4. In female mammals, Xist is expressed from a single locus on the X and coats one of two X chromosomes in order to silence transcription5. Other ncRNAs, such as HOTAIR6,7, HOTTIP8, and enhancer RNAs9, have been shown to regulate expression of specific genes by localizing to chromatin and recruiting activating or repressing proteins. Finally, repetitive ncRNA transcripts have roles at chromosomal loci essential in maintaining genomic integrity over many cell divisions, including TERRA at telomeres10 and alpha-satellites near centromeres11. Despite these well-studied examples, other functional ncRNAs are likely yet to be discovered, the genomic targets of most chromatin-associated ncRNAs are unknown, and the mechanisms by which these ncRNAs regulate chromatin remain largely unexplored.
Genomic methods for studying the localization of specific RNA transcripts include ChIRP7, CHART12, and RAP13. These techniques hybridize complementary oligonucleotides to pull down a single target RNA and then identify its DNA- or protein-binding partners using next-generation sequencing or mass spectrometry13. However, de novo discovery of chromatin-associated RNAs remains limited to computational predictions1 or association with previously known factors14. Alternatively, nuclear fractionation allows isolation of bulk chromatin and subsequent identification of chromatin-bound RNAs via sequencing, but does not provide sequence-resolved maps of RNA binding locations along the genome15. To overcome these limitations, we have developed ChAR-seq, a proximity ligation and sequencing method (Figure 1A) that both identifies chromatin-associated RNAs and maps them to genomic loci (Figure 1B).
Results
We developed and performed ChAR-seq using Drosophila melanogaster CME-W1-cl8+ cells, a male wing disc derived cell line with a normal karyotype and well-characterized epigenome and transcriptome16,17. ChAR-seq is an in situ method18 for capturing genome-wide RNA-to-DNA contacts. Briefly, cells are cross-linked with formaldehyde and permeabilized, then RNA is partially fragmented and soluble RNA is washed away. The chromatin-cross-linked RNA is then ligated to an oligonucleotide duplex ’bridge’ molecule and reverse transcribed. Genomic DNA is then digested and ligated onto the other end of the oligonucleotide ’bridge’, creating a link between chromatin-associated RNA and proximal DNA. Finally, the ligated RNA is fully converted to cDNA by RNase H digestion and second strand synthesis, and the chimeric molecules are purified, processed, and sequenced.
To enable the capture and analysis of RNA-to-DNA contacts, the oligonucleotide bridge (see Supplementary Figure 1) was designed to have several key features: 1) the 5’-adenylated end (5’App) enables increased ligation specificity for 3’-terminated ssRNA in the presence of truncated T4 Rnl2tr R55K K227Q ligase19 (Supplementary Figure 2), 2) the sequence of the bridge does not exist in the yeast, fly, mouse or human genomes and encodes a defined polarity, 3) the end of the bridge contains a restriction site for specific ligation to DNA, and 4) the bridge is biotinylated so that it can be captured and enriched. After the bridge was ligated to RNA in situ, the molecules were stabilized by reverse transcription using Bst3.0 polymerase, which can traverse the DNA-RNA junction. The genomic DNA was then digested using the restriction enzyme DpnII, which produces a median fragment size of 200 bp (Supplementary Figure 3). The digested genomic DNA was then ligated to the bridge adaptor using T4 DNA ligase.
Upon conversion of RNA-DNA contacts to a covalent chimera, the chimeric molecules were sequenced using 152 bp single-end reads. Sequencing across the bridge junction ensures identification of the RNA and DNA portions of the chimeric molecule by reading the polarity of the bridge (Figure 1B). The RNA/cDNA (Figure 1B, red) and the genomic DNA side (Figure 1B, black) of each read were computationally split and aligned to the transcriptome and genome. After post-processing for unique alignments, repeat removal, and removal of blacklisted regions, each RNA molecule was mapped to the single genomic location to which it was ligated (see Extended Methods and Supplementary Figure 4), resulting in 24.3 million high-confidence unique mapping events for 16,817 RNA transcripts. All individual RNA-to-DNA contacts for a given transcript were then combined to produce a genome-wide association map for each individual transcript (Figure 1B). To ensure that ChAR-seq signal was not due to spurious bridge-to-DNA ligation, we performed a control experiment in which we added RNase A and RNase H to lysed cells before the RNA-to-bridge ligation. This RNase treatment dramatically reduced the number of bridge molecules identified, demonstrating that bridge ligation is indeed RNA-dependent (Supplementary Figure 5).
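The read-splitting step described above can be illustrated with a minimal sketch. The bridge sequence `BRIDGE`, the helper names and the minimum flank length are hypothetical placeholders (the actual bridge is shown in Supplementary Figure 1), and the orientation convention (RNA-derived sequence upstream of a forward-oriented bridge, genomic DNA downstream) is an assumption made only for illustration.

```python
# Hypothetical bridge sequence for illustration only; the real bridge
# is described in Supplementary Figure 1.
BRIDGE = "ACCGGCGTCCAAGTTCATTTAATTAAGTCGCAG"

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def split_read(read, bridge=BRIDGE, min_len=15):
    """Split a chimeric read into (rna_side, dna_side), or return None.

    Assumed convention: when the bridge is found in forward orientation,
    the RNA-derived sequence lies upstream of it and the genomic DNA lies
    downstream. If the bridge is only found after reverse-complementing
    the read, the read is flipped first, so the returned pair always has
    the same polarity relative to the bridge.
    """
    for candidate in (read, reverse_complement(read)):
        idx = candidate.find(bridge)
        if idx == -1:
            continue
        rna_side = candidate[:idx]
        dna_side = candidate[idx + len(bridge):]
        # Discard reads whose flanks are too short to align uniquely.
        if len(rna_side) >= min_len and len(dna_side) >= min_len:
            return rna_side, dna_side
        return None
    return None
```

In this sketch the two returned fragments would then be aligned separately, the first to the transcriptome and the second to the genome; exact-match bridge detection is a simplification, and a real pipeline would likely tolerate sequencing errors in the bridge.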
Only the 3’-hydroxyl of each RNA is available for ligation to the bridge, thus the polarity of each RNA molecule with respect to its transcriptional direction can be determined by its orientation with respect to the polarity of the bridge. The majority (85% of total) of the RNAs captured in our assay were sense, with the largest single subtype represented by sense-stranded mRNA (32% of total), due to the capture of nascent transcripts (Figure 1C). Most of the chromatin-associated antisense transcripts that we identified arose from ncRNA or intronic regions. In fact, 96% of the antisense mRNAs were intronic in origin, with 64% of these originating from a single 119 kb gene (CG42339), suggesting the presence of unannotated ncRNAs in this region. The remaining chromatin-associated RNA detected in our assay arose from non-protein coding transcription (see Figure 1C), of which 18% was small nucleolar RNA (snoRNA) and 19% was small nuclear RNA (snRNA).
ChAR-seq generated RNA-to-DNA contacts can be aggregated (Figure 1D, see e.g., Total RNA), grouped by RNA class (Figure 1D, see e.g., mRNA) or viewed individually (Figure 1D). Individual RNAs mapped by ChAR-seq generally fell into one of three classes. In the first class, RNAs were found around the locus from which they are transcribed (Figure 1D, see, e.g., Hsromega, chinmo, ten-m). In the second class, RNAs were found bound to chromatin in trans, generally distributed across most or all of the genome, often in addition to a peak around the gene body from which the RNA is transcribed (Figure 1D, see e.g., snRNA:U2, snRNA:7SK). In a third class, RNAs that are part of the dosage compensation complex (Figure 1D, see roX1 and roX2) were enriched on and coat the X chromosome. To investigate this first class of RNAs, we compared aggregated RNA-to-DNA contacts with data from nascent transcription sequencing using PRO-seq20, and observed qualitative agreement between PRO-seq and ChAR-seq data sets (Figure 1E, see PRO-seq and Total RNA). Nevertheless, most RNA-to-DNA contacts in our dataset are associated in trans to genomic regions outside of the gene body from which the RNA is transcribed. For example, RNAs with strong enrichment over their own gene body, such as those in class I, have on average ~20% cis contacts (Supplementary Figure 6).
ChAR-seq data can be visualized in a two-dimensional contact plot, where the genomic locus from which the RNA is transcribed is represented on the y-axis in linear genome coordinates, and the x-axis defines the genomic location where each RNA was bound. These plots provide a useful overview of the entire dataset. When we generated these contact plots for ncRNA (Figure 2A), mRNA (Figure 2B) and snRNA (Figure 2C), we observed strong horizontal lines that represent RNA transcripts that are transcribed from a single locus but are found distributed throughout the genome (class II), or in the special case of roX1 and roX2, specifically along the X chromosome (class III). Furthermore, RNAs found at sites from which they are transcribed clustered tightly along the diagonal, a feature most pronounced for mRNAs (class I) (Figure 2B). Many of the RNAs we found distributed broadly across the genome are bona fide small nuclear RNAs (snRNAs) associated with transcription (Figure 2C). In fact, one of these, snRNA:7SK, is an abundant snRNA that functions as a scaffold for a large, transcription-controlling ribonucleoprotein complex that includes p-TEFb, Hexim and LARP7, while other broadly distributed snRNAs are components of the spliceosome (e.g., snRNA:U2), which largely functions co-transcriptionally21.
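As a rough illustration of how such a contact plot could be assembled, the sketch below bins each contact by the linear genome coordinate of the RNA's locus of origin (row) and of the bound DNA site (column). The function names, the input format and the use of a single concatenated coordinate system are assumptions for illustration, not the pipeline used in this study.

```python
from collections import Counter, defaultdict


def contact_matrix(contacts, bin_size):
    """Bin RNA-to-DNA contacts into a sparse two-dimensional matrix.

    contacts: iterable of (rna_origin_pos, dna_contact_pos) pairs, both
    given as 0-based positions in linear (concatenated) genome coordinates.
    Returns a Counter keyed by (row_bin, col_bin), where the row is the
    locus the RNA is transcribed from and the column is where it was bound.
    """
    matrix = Counter()
    for origin, target in contacts:
        matrix[(origin // bin_size, target // bin_size)] += 1
    return matrix


def row_spread(matrix):
    """Count, for each row bin, the number of distinct column bins contacted.

    Rows with a large spread correspond to horizontal lines in the plot
    (RNAs from one locus found across the genome); rows whose contacts sit
    at col_bin == row_bin lie on the diagonal.
    """
    columns = defaultdict(set)
    for (row, col) in matrix:
        columns[row].add(col)
    return {row: len(cols) for row, cols in columns.items()}
```

A sparse dictionary is used here rather than a dense array because most bin pairs are empty at fine bin sizes.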
To identify RNAs highly enriched for substantial chromatin interactions, we plotted the normalized cumulative distribution of the number of contacts observed for each gene (Figure 2D). The majority of the RNAs in our dataset (15,020 out of 16,817; 89%) had fewer than 10 FPKM contacts (Figure 2D), and were excluded from further analysis. The remaining 1,797 RNAs (11%) accounted for 84% (20.4 million) of all chromatin contacts in our data set. To estimate the contribution of total RNA abundance to this interaction signal, we performed RNA-seq for the CME-W1-cl8+ cell line and compared RNA expression levels with RNA-to-DNA contacts identified by ChAR-seq (Figure 2E). Unsurprisingly, we observed a correlation between RNA expression and chromatin-RNA contacts; however, a cluster of RNAs clearly generated more chromatin interactions than would be expected from the overall expression levels (Figure 2E). Using both the number of contacts and the fold-enrichment over RNA expression, we identified 73 RNAs that had more than 100 FPKM contacts and were enriched more than four-fold above expectation, though many were enriched by 2-5 orders of magnitude (Figure 2E, red symbols; Supplementary Figure 7-8).
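The two-threshold selection described above can be sketched as follows. This is a simplified illustration: the function names are hypothetical, the "expectation" is assumed to be the RNA-seq expression level itself (in FPKM), and the pseudocount that guards against division by zero is also an assumption. The actual expectation model may differ (see Extended Methods).

```python
def contact_fpkm(n_contacts, transcript_len_bp, total_contacts):
    """Contacts per kilobase of transcript per million mapped contacts."""
    return n_contacts * 1e9 / (transcript_len_bp * total_contacts)


def select_enriched(rnas, min_fpkm=100.0, min_fold=4.0, pseudocount=1.0):
    """Return the sorted names of RNAs passing both thresholds.

    rnas: dict mapping RNA name -> (contact_fpkm, expression_fpkm).
    An RNA is kept when its contact FPKM exceeds min_fpkm and its
    contact FPKM divided by (expression FPKM + pseudocount) exceeds
    min_fold. Treating expression as the expected contact level is an
    assumption made only for this sketch.
    """
    selected = []
    for name, (contacts, expression) in rnas.items():
        fold = contacts / (expression + pseudocount)
        if contacts > min_fpkm and fold > min_fold:
            selected.append(name)
    return sorted(selected)
```

For example, an RNA with 500 FPKM contacts and an expression of 10 FPKM would pass both filters, whereas one with 200 FPKM contacts and 100 FPKM expression would fail the fold-enrichment filter.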
We developed ChAR-seq using the male CME-W1-cl8+ line, reasoning that the ncRNAs roX1 and roX2 would serve as an internal positive control. Both roX1 and roX2 are part of the MSL2 complex, which binds across the X chromosome in male flies to recruit chromatin modifiers that increase transcriptional output (Figure 3A)22. Indeed, ChAR-seq data showed roX1 and roX2 to be 7.6-fold (p-value < 10^−10) and 8.1-fold (p-value < 10^−10) enriched for interactions on the X chromosome, respectively (Figure 3B,C). In contrast, female flies express Sex lethal (Sxl), which binds to msl2 mRNA to prevent its translation, blocking assembly of the MSL2 complex22. Importantly, roX1 and roX2 require MSL2 for X chromosome-specific localization22; therefore, female cells should lack detectable spreading of these ncRNAs along the X chromosome. When we performed ChAR-seq in a female Drosophila melanogaster cell line, Kc167, we did not detect any significant roX2 localization on the X chromosome (Figure 3D), but observed excellent agreement in interaction signal from other RNAs across both cell lines (Supplementary Figure 9, Figure 3C male, CME-W1-cl8+ and Figure 3D, female, Kc167, see e.g., snRNA:7SK and Hsromega).
High-resolution maps of roX1 and roX2 localization have previously been generated using ChIRP-seq, which hybridizes probes against a known RNA and pulls down the associated chromatin for sequencing7,23. Comparing ChIRP-seq to ChAR-seq for both roX1 and roX2 (Figure 3E), we found that DNA contact locations were in surprisingly good agreement despite the fact that ChAR-seq reads are spread across all RNAs while ChIRP-seq reads map the specific RNA target, resulting in a large disparity in the effective sequencing depth between the methods. In ChIRP-seq, virtually all of the signal is attributable to interactions between chromatin and the target RNA. In contrast, ChAR-seq captures all RNA and DNA contacts, so that any given target RNA will comprise a subset of the total RNA-chromatin contacts in the dataset. In the case of roX1 and roX2, we observed 32,308 and 87,453 contacts, representing 0.1% and 0.36% of the ChAR-seq dataset, respectively. In contrast, the ChIRP-seq datasets plotted in Figure 3E represent ~24M and ~21M reads for roX1 and roX2, respectively. This indicates that ChAR-seq can identify RNA peaks along chromatin with high sensitivity for a given RNA.
The resolution with which we can measure the localization of an RNA to a given genomic site constrains our ability to assess its potential modes of action. To measure the accuracy of ChAR-seq measurements of RNA interaction with DNA, we used the ChIRP-seq data set to calculate the base-pair resolution of the method. We expected this resolution to be bounded—in part—by the local DpnII cut frequency (Supplementary Figure 3) and the number of contacts for any given RNA. We divided the X chromosome into evenly sized bins and calculated correlation coefficients between ChIRP-seq and ChAR-seq datasets at increasing bin sizes for both roX1 and roX2 (Figure 3F). Using this method, we noted a bi-phasic increase of the correlation coefficient, corresponding to a minor plateau around 200 bp and a major plateau at ~25 kbp. The minor plateau is likely due to the DpnII distribution bias in the ChAR-seq tracks, while the major plateau is an estimate of the resolution of our assay, which is on the order of other proximity-ligation sequencing assays like Hi-C24.
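The bin-size scan described above can be sketched in a few lines. The function names and the input format (lists of 0-based contact positions along one chromosome) are assumptions for illustration; the correlation is a plain Pearson coefficient computed on per-bin counts.

```python
import math


def bin_counts(positions, chrom_len, bin_size):
    """Count contacts in evenly sized bins along one chromosome.

    positions: 0-based coordinates, each smaller than chrom_len.
    """
    n_bins = -(-chrom_len // bin_size)  # ceiling division
    counts = [0] * n_bins
    for pos in positions:
        counts[pos // bin_size] += 1
    return counts


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (NaN if constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def correlation_by_bin_size(pos_a, pos_b, chrom_len, bin_sizes):
    """Correlate two contact tracks at each of several bin sizes."""
    return {
        size: pearson(bin_counts(pos_a, chrom_len, size),
                      bin_counts(pos_b, chrom_len, size))
        for size in bin_sizes
    }
```

Two tracks whose peaks are offset by less than one bin width correlate poorly at fine bin sizes and well once the bins are wide enough to absorb the offset, which is the behavior used here to read off an effective resolution from the plateau.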
To test if we could identify the functional roles for our most highly enriched RNAs, we clustered the snRNA class of RNAs based on their genomic contacts. These snRNAs collectively comprised 23% of all the RNA-to-DNA contacts in our dataset (Figure 4A) and are a substantial component of the spliceosome, a multi-megadalton ribonucleoprotein complex that catalyzes pre-mRNA splicing25,26. The composition and conformation of the spliceosome is highly dynamic, though two dominant species exist in eukaryotes: the major spliceosome comprised of U1, U2, U4, U5 and U6 snRNAs, and the minor spliceosome comprised of U4:atac, U6:atac, U5, U11, and U1226. Many members of this class of snRNAs have highly similar gene duplication variants in the Drosophila genome. We therefore first calculated the base sequence similarity of these variants to one another and aggregated signals that were tightly clustered (Supplementary Figure 10). When we then correlated genome-wide binding signal within this class, we found that the distribution patterns of the major spliceosome snRNAs U1, U2, U4, U5, U6 clustered together along with snRNA:7SK (Figure 4B), which is part of the p-TEFb complex that relieves pausing of RNA Polymerase II at promoters27 and may participate in the release of paused polymerase during RNA splicing28. The components of the minor spliceosome did not cluster together, likely due to their low abundance26 and consequently low representation in our dataset.
We next reasoned that spliceosome RNAs, as part of the co-transcriptional RNA processing machinery, should also be enriched in gene bodies. We therefore aggregated spliceosomal RNA signals over gene bodies (Figure 4C, red lines), putative enhancers29 (Figure 4C, blue dashed lines) and a random distribution of genomic bins of similar size (Figure 4C, black lines). We observed an enrichment of snRNAs (7SK, U2 and U6), but not roX1 or roX2, over gene bodies (Figure 4C) with a broad peak around transcription start sites, in good agreement with ChIRP data for 7SK in mice30.
In contrast to the small number of well-defined and well-characterized snRNAs involved in splicing, there are more than 200 snoRNAs in flies31 that are significantly divergent in sequence and, surprisingly, were highly represented in our dataset (Supplementary Figure 7-8 and Figure 1C). Most of these snoRNAs either have unknown function or are computationally identified and indirectly implicated in the maturation and modification of ribosomal RNA31.
To determine if our enriched chromatin-associated RNAs, in particular snRNAs and snoRNAs, might localize to euchromatic or heterochromatic states or with specific transcription factors, we cross-correlated our ChAR-seq signal against modENCODE datasets available for the CME-W1-cl8+ cell line. To normalize the signals for comparison, we first calculated the expected contacts per 2 kb bin for each RNA under a uniform distribution, based on the total number of genome-wide contacts for each RNA and the number of DpnII sites per bin. This null model was then used to calculate the log2 ratio of the observed to the expected contacts per bin for each RNA, which was then transformed into a z-score ((x−μ)/σ) based on the whole-genome mean (μ) and standard deviation (σ). Similarly, we re-binned the modENCODE tracks, removed bins that did not contain a DpnII site, and transformed the log2 mean-shift values to a z-score. We then calculated the pair-wise Pearson correlation coefficients between each signal track, and then clustered the data (Figure 4D). We observed discrete clustering of roX1 and roX2 with known dosage compensation complex factors, MOF, the histone modifications H4K16ac and H3K36me332, and JIL-1 kinase33, validating this analytical approach (Figure 4D). Beyond this sub-cluster of dosage compensation factors, the remainder of the chromatin-associated RNAs fell into two distinct and anti-correlated categories: those associated with active chromatin and transcription (e.g., RNAPII, H4K8ac, H3K18ac) or heterochromatin (e.g., HP2, H3K9me3, HP1a) (Figure 4D). In particular, we note that snRNA:U2 and snRNA:7SK cluster tightly with the transcription-associated chromatin marks, while many of the snoRNAs and minor spliceosome snRNAs that we identified are associated with heterochromatin, likely due to co-localization of heterochromatin factors to the nucleolus. 
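The normalization described above can be sketched as follows. The function names are hypothetical, and two details are assumptions not stated in the text: a pseudocount is added to observed and expected counts so that empty bins do not produce log2(0), and the population (rather than sample) standard deviation is used for the z-score.

```python
import math


def expected_contacts(total_contacts, dpnii_sites_per_bin):
    """Expected contacts per bin under a uniform null model.

    Contacts are distributed in proportion to the number of DpnII
    sites in each bin.
    """
    total_sites = sum(dpnii_sites_per_bin)
    return [total_contacts * s / total_sites for s in dpnii_sites_per_bin]


def log2_ratio_zscores(observed, dpnii_sites_per_bin, pseudocount=1.0):
    """Z-scored log2(observed / expected) per bin for one RNA.

    Bins without a DpnII site are dropped. Returns a dict mapping the
    bin index to its z-score, z = (x - mu) / sigma, where mu and sigma
    are taken over all retained bins. The pseudocount is an assumption
    made for this sketch.
    """
    expected = expected_contacts(sum(observed), dpnii_sites_per_bin)
    ratios = {
        i: math.log2((obs + pseudocount) / (exp + pseudocount))
        for i, (obs, exp, sites) in enumerate(
            zip(observed, expected, dpnii_sites_per_bin))
        if sites > 0
    }
    values = list(ratios.values())
    mu = sum(values) / len(values)
    sigma = math.sqrt(sum((v - mu) ** 2 for v in values) / len(values))
    if sigma == 0:
        raise ValueError("signal is constant; z-score is undefined")
    return {i: (v - mu) / sigma for i, v in ratios.items()}
```

Once every RNA track and every comparison track has been converted to z-scores on the same set of bins, pairwise Pearson correlation coefficients between tracks can be computed and clustered.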
Interestingly, snRNA:U5, a component of both the major and minor spliceosome, has variants that clearly cluster with either transcriptionally active chromatin (63BC) or heterochromatin (23D, 38ABa, and 34A). Previous work has shown that the snRNA:U5:38ABa variant (Figure 4D, heterochromatin cluster) exhibits a unique tissue-specific expression profile with the greatest abundance in neural tissue, which led the authors to propose isoform-dependent functions in alternative splicing34. The differential clustering between euchromatin and heterochromatin that we observe for snRNA:U5 variants, and indeed for major versus minor spliceosome snRNAs, might reflect such isoform-specific functions of the spliceosome in different chromatin states.
Discussion
ChAR-seq maps the chromosomal binding sites of all chromatin-associated RNAs, independent of whether they are associated as nascent transcripts or bound as part of ribonucleoprotein complexes (RNPs). In this way, ChAR-seq can be thought of as a massively parallelized de novo RNA mapping assay capable of generating hundreds to thousands of RNA-binding maps. ChAR-seq also detects multiple classes of chromatin-associated RNAs. We validated ChAR-seq using chromosome-specific ncRNAs roX1 and roX2 associated with dosage compensation. The comparison between ChAR-seq and ChIRP-seq, which vary dramatically in the sequencing depth needed to analyze a specific RNA, highlights the utility of ChAR-seq as a de novo chromatin-associated RNA discovery tool. ChAR-seq also maps nascent RNAs found at the loci from which they are transcribed. ChAR-seq is similar to a recently published method38, but has two key distinctions. First, proximity ligations are performed in situ in intact nuclei, which reduces nonspecific interactions18. Second, ChAR-seq uses long single-end reads to sequence across the entire junction of the ‘bridge’, ensuring that RNA-to-DNA contacts are mapped with high confidence and reporting on the polarity of the bridge-ligated RNA.
We used ChAR-seq to discover and map several dozen ncRNAs that are pervasively bound across the genome. Many of these ncRNAs are components of ribonucleoprotein complexes associated with transcription elongation (snRNA:7SK), splicing (snRNA:U2, etc.) and RNA processing (snRNAs, snoRNAs and scaRNAs). Interestingly, more than half of the chromatin-associated RNAs identified based on our enrichment criteria are snoRNAs, most of which (but not all) correlate with heterochromatin. Generally, snoRNA ribonucleoproteins (snoRNPs) use intermolecular base pairing to direct chemical modification of the 2’-hydroxyl groups or the isomerization of uridines to pseudouridine3, and snoRNAs are known abundant components of chromatin in both flies35 and mice36. Despite their abundance and their known role in RNA modification, we do not yet understand the functions of these modifications, or the implications of the abundance of snoRNAs and scaRNAs in cells or associated with chromatin3. Finally, we demonstrate that ChAR-seq can be used with orthogonal genome-wide datasets to identify and classify RNAs that are associated with specific chromatin states (e.g., euchromatin vs heterochromatin), which we expect will be particularly useful in higher organisms that use lncRNAs such as HOTAIR, HOTTIP and BRAVEHEART as scaffolds for ribonucleoproteins that regulate facultative heterochromatin.
We anticipate that ChAR-seq will be a powerful new high throughput discovery platform capable of simultaneously identifying new chromatin-associated RNAs and mapping their chromatin binding sites (and associated epigenetic chromatin states), all of which will be particularly useful in comparing ‘epigenomic’ changes that coincide with cellular differentiation and/or tumorigenesis.
Author contributions
JCB, VIR, WLJ and DJ conceived of the idea and planned experiments. JCB, NAT and OKS prepared all ChAR-seq libraries. DJ prepared ATAC-seq libraries. JCB, NAT, VIR, DJ and OKS processed and analyzed data. AFS, WJG, and JMS provided advice and material support. All authors discussed and interpreted the results and contributed to the writing and editing of the manuscript.
Method Summary
Drosophila melanogaster CME-W1-cl8+ cells (Drosophila Genome Resource Center) were grown in T-75 flasks at 27°C in Shields & Sang M3 media supplemented with 5 μg/mL insulin, 2% FBS, 2% fly extract and 100 μg/mL Pen-Strep16. Approximately 100-400 million cells were harvested for each library by centrifugation at 2000 × g for 2-4 minutes, resuspended in fresh media plus 1% formaldehyde and fixed for 10 minutes at room temperature. Fixation was quenched by adding 0.2 M glycine and mixing for 5 minutes at room temperature. Cells were centrifuged at 2000 × g for 2-4 minutes, resuspended in 1 mL of PBS, and centrifuged again at 2000 × g for 2 minutes. The supernatant was aspirated and discarded, and the cell pellet was flash frozen in liquid nitrogen and stored at −80°C until needed. Cells were thawed in lysis buffer and the cross-linked nuclei and cellular material were isolated by centrifugation for the in situ ligation protocol (see Extended Methods for details).
Briefly, RNA was lightly and partially chemically fragmented by heating in the presence of magnesium. The pellet was isolated and washed, and RNA ends were ligated using truncated T4 Rnl2tr R55K K227Q ligase (hereafter referred to as trT4KQ RNA ligase) to an oligonucleotide ’bridge’ molecule containing a 5’-adenylated ssDNA overhang. The RNA ligase was inactivated, the pellet was washed and the RNA strand was stabilized by first strand synthesis of the RNA through extension of the bridge by Bst 3.0 polymerase. The polymerase was inactivated and the pellet was washed. Genomic DNA was then digested with DpnII, followed by ligation of the DpnII-digested genomic DNA to the opposite end of the oligonucleotide ’bridge’. Second strand synthesis was then performed using RNase H and DNA Polymerase I to complete cDNA synthesis of the RNA-encoded side of the new, chimeric molecules. The sample was then deproteinized and crosslinks were reversed by heating overnight in SDS and proteinase K. DNA was then ethanol precipitated and sheared to ~200 bp fragments using a Covaris focused ultra-sonicator. DNA fragments containing the biotinylated bridge were then purified using magnetic streptavidin-coated beads. DNA ends were repaired using the NEBNext End Repair and dA tailing module, and ligated to NEBNext hairpin adaptors for Illumina sequencing. The adaptor hairpin was cleaved using USER, and DNA fragments were amplified by ~8-12 rounds of PCR with NEBNext Indexing Primers for Illumina (TruSeq compatible). The partially amplified library was then purified using AMPure XP beads to remove adaptor dimers, and the optimum number of additional PCR cycles was determined by qPCR to achieve approximately 30% saturation. Library amplification was then completed by the additional rounds of PCR, and the library was purified and size selected to a target range of 100-500 bp using AMPure XP beads.
The size distribution of the library was checked by capillary electrophoresis using an Agilent Bioanalyzer, and quantified using qPCR against a phiX Illumina library standard curve. Libraries were sequenced using the Illumina MiSeq platform for quality control, and subsequently sequenced on the Illumina NextSeq platform (Stanford Functional Genomics Facility) using single-end 152 bp reads. Data was processed and analyzed using a custom pipeline (see Extended Methods for details).
Acknowledgements
This project was funded by a Stanford Center for Systems Biology (NIH P50 GM107615) Seed Grant to JCB, DJ, VIR & WLJ. JCB and DJ were also supported by NIH Ruth L. Kirschstein National Research Service Awards (F32GM116338 to JCB) and (F32GM108295 to DJ). JCB was also supported by the Stanford School of Medicine Dean’s Fellowship. VIR was supported by the Walter V. and Idun Berry Fellowship. NAT was supported by the Stanford Genetics Training Program (5T32HG000044-19). OKS was supported by the Molecular Pharmacology Training Grant (NIH T32-GM113854-02). WLJ was supported by an NIH T32 Training Fellowship (GM007276) and the National Science Foundation Graduate Research Fellowship (DGE-114747). We acknowledge support from NIH RO1 HD085135 to JMS and AFS, HHMI-Simons Faculty Scholar Award to JMS, NIH grants P50HG00773501 and R21HG007726 to WJG, and R01GM106005 to AFS. We would like to thank Julia Salzman, Kyle Eagen, and members of the Straight, Greenleaf and Skotheim labs for thoughtful discussions. We thank Lucy O’Brien for sharing equipment. We acknowledge the Drosophila Genomics Resource Center (NIH grant 2P40OD010949-10A1) for providing cell lines and the Stanford Functional Genomics Facility for providing sequencing services.