Scalable Design of paired CRISPR Guide RNAs for Genomic Deletion
================================================================

* Carlos Pulido-Quetglas
* Estel Aparicio-Prat
* Carme Arnan
* Taisia Polidori
* Toni Hermoso
* Emilio Palumbo
* Julia Ponomarenko
* Roderic Guigo
* Rory Johnson

## Abstract

Using CRISPR/Cas9, diverse genomic elements may be studied in their endogenous context. Pairs of single guide RNAs (sgRNAs) are used to delete regulatory elements and small RNA genes, while longer RNAs can be silenced through promoter deletion. We here present CRISPETa, a bioinformatic pipeline for flexible and scalable paired sgRNA design based on an empirical scoring model. Multiple sgRNA pairs are returned for each target. Any number of targets can be analyzed in parallel, making CRISPETa equally appropriate for studies of individual elements, or complex library screens. Fast run-times are achieved using a precomputed off-target database. sgRNA pair designs are output in a convenient format for visualisation and oligonucleotide ordering. We present a series of pre-designed, high-coverage library designs for entire classes of non-coding elements in human, mouse, zebrafish, *Drosophila* and *C. elegans*. Using an improved version of the DECKO deletion vector, together with a quantitative deletion assay, we test CRISPETa designs by deleting an enhancer and exonic fragment of the *MALAT1* oncogene. These achieve efficiencies of ≥50%, resulting in production of mutant RNA. CRISPETa will be useful for researchers seeking to harness CRISPR for targeted genomic deletion, in a variety of model organisms, from single-target to high-throughput scales.

Keywords

* CRISPR
* Cas9
* sgRNA
* knockout
* deletion
* genome editing
* long noncoding RNA
* lncRNA
* loss of function
* enhancer
* microRNA
* MALAT1

## Introduction

CRISPR/Cas9 is a simple and versatile method for genome editing that can be applied to deleting virtually any genomic region for loss-of-function studies.
Recent vector tools have been developed for complex library cloning that are compatible with pooled screening (1,2). Whether performing pooled screens on hundreds of targets, or deletion of a single target, researchers need to design efficacious pairs of sgRNAs. We present here a flexible and scalable software pipeline to address the needs of both types of project.

CRISPR/Cas9 makes it possible to investigate the function of genomic elements in their endogenous genetic context. The Cas9 nuclease is recruited to desired genomic sites through its binding to an engineered, single guide RNA (sgRNA) (3). Early studies focussed on protein coding genes, utilizing individual sgRNAs to induce small indel mutations in genomic regions encoding target proteins’ open reading frames (ORFs). Such mutations frequently give rise to inactivating frameshift mutations, resulting in complete loss of function (4,5). The delivery of a single sgRNA in such experiments is technically straightforward, and can be scaled to genome-wide, virally-delivered screens.

CRISPR has also been brought to bear on non-coding genomic elements, including regulatory regions and non-coding RNAs, which have traditionally resisted standard RNA interference (RNAi) (6,7). Unlike coding genes, functional knockout of non-coding elements with a single sgRNA is probably not practical, because small indel mutations caused by single sgRNAs are less likely to ablate function. Instead, a deletion strategy has been pursued: a pair of sgRNAs is used to recruit Cas9 to sites flanking the target region (1,7). Simultaneous strand breaks are induced, and non-homologous end joining (NHEJ) activity repairs the lesion. In a certain fraction of cases, this results in a genomic deletion with a well-defined junction.

Cas9 targeting is achieved by engineering the 5’ region of the sgRNA. This hybridises to a complementary “protospacer” region in DNA, immediately upstream of the “protospacer adjacent motif” (PAM) (8).
For the most commonly used *S. pyogenes* Cas9 variant, the PAM sequence consists of “NGG”. A growing number of software tools are available for the selection of optimal protospacer targeting sequences (9-15). The key selection criteria are (1) the efficiency of a given sequence in terms of generating mutations, and (2) “off-targeting”, or the propensity for recognising similar, yet undesired, sites in the genome. Based on experimental data, scoring models for on-target efficiency have been developed, for example that presented by Doench et al. (13). At the same time, tools have become available for identifying unique sgRNA sites genome-wide, mitigating to some extent the problem of off-targeting (16). However, few tools presented so far are designed for large-scale designs, and to the best of our knowledge, none was created to identify the optimal sgRNA *pairs* required for deletion studies.

To address this need, we here present a new software pipeline called CRISPETa (CRISPR Paired Excision Tool) that selects optimal sgRNAs for deletion of user-defined target sites. The pipeline has two useful features: first, it can be used for any number of targets in a single, rapid analysis; second, it returns multiple, optimal pairs of sgRNAs, with maximal predicted efficiency and minimal off-target activity. The pipeline is available as both standalone software and as a user-friendly webserver. In addition, we make available a number of pre-designed deletion libraries for various classes of non-coding genomic elements in a variety of species. Finally, we validate CRISPETa predictions experimentally by means of an improved version of the published DECKO deletion technique (1). Using a quantitative deletion assay, we find that CRISPETa predictions are highly efficient in deleting fragments of a human gene locus, resulting in detectable changes to the cellular transcriptome. CRISPETa is available at [www.crispeta.crg.eu](http://www.crispeta.crg.eu).
## Materials and Methods

### Details of CRISPETa Code

The pipeline is outlined in Figure 1B. As input, CRISPETa requires a standard BED6-format file describing all target regions. This file must contain coordinates of one or more targets. Unstranded entries are assigned to the + strand, while those without identifiers are assigned a random ID. CRISPETa first defines design regions based on parameters *g/du/dd/eu/ed* (see Table 1 for full list of parameters) (Figure 1A,B), and extracts their sequences using the BEDtools *getfasta* function. Design regions are searched for canonical PAM elements (NGG) using a regular expression. For every such PAM, a total of 30 nucleotides (NNNN[20nt]NGGNNN) are stored. Protospacers containing the RNA Pol III stop sequence (TTTT) are removed.

![Figure 1:](http://biorxiv.org/https://www.biorxiv.org/content/biorxiv/early/2016/05/22/052795/F1.medium.gif)

Figure 1: Overview of CRISPETa pipeline. (A) Schematic of CRISPR-mediated genomic deletion. The aim is elimination of the Target region through recruitment of a pair of Cas9 proteins. Red boxes represent protospacers, the 20 bp upstream of a PAM and recognised by the sgRNA. (B) The CRISPETa workflow. (C) Sequence preferences that drive the efficiency prediction score. Note that positions outside the protospacer are also predictive of efficiency.

[Table 1.](http://biorxiv.org/content/early/2016/05/22/052795/T1) User-defined parameters.

[Table 2](http://biorxiv.org/content/early/2016/05/22/052795/T2)

Next, candidate protospacers are searched against a precomputed, database-stored list of potential protospacers and their number of similar sequences with up to 4 mismatches, genome-wide (see “Off-target analysis” section, below). By default, protospacers with one or more off-targets with ≤2 mismatches are discarded (this cutoff can be modified by the user through parameter *t*).
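The candidate search and default off-target filter described above can be illustrated with a minimal Python sketch. It is not the CRISPETa source code: the function names, the `max_mismatch` argument (a simplified stand-in for parameter *t*) and the decision to scan the reverse complement of the design region are assumptions made for illustration only.

```python
import re

# 30-nt window as described in the text: 4 nt + 20-nt protospacer + NGG + 3 nt.
# A lookahead is used so that overlapping sites are all reported.
SITE_30MER = re.compile(r"(?=([ACGT]{24}[ACGT]GG[ACGT]{3}))")

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_candidates(design_seq):
    """Return (strand, offset, 30mer, protospacer) for each NGG site.

    Offsets on the '-' strand are relative to the reverse-complemented
    sequence. Scanning both strands is an assumption of this sketch.
    """
    seq = design_seq.upper()
    candidates = []
    for strand, strand_seq in (("+", seq), ("-", revcomp(seq))):
        for match in SITE_30MER.finditer(strand_seq):
            site = match.group(1)
            protospacer = site[4:24]
            if "TTTT" in protospacer:  # RNA Pol III stop sequence
                continue
            candidates.append((strand, match.start(), site, protospacer))
    return candidates


def passes_off_target(counts, max_mismatch=2):
    """counts = genome-wide hits with 0, 1, 2, 3 and 4 mismatches.

    A candidate passes if its only perfect match is itself and it has no
    other site with 1..max_mismatch mismatches (hypothetical argument).
    """
    return counts[0] == 1 and all(c == 0 for c in counts[1:max_mismatch + 1])
```

For example, `find_candidates(region_sequence)` would be called once per design region, and each returned protospacer would then be checked with `passes_off_target` against its precomputed mismatch counts.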
Remaining protospacers are then compared with the positive and negative mask BED files using BEDtools *intersectBed.* Candidate sequences not fully overlapping the positive mask file, or overlapping the negative mask by at least one basepair, are tagged as “disfavoured”. Next, 30mer regions encompassing remaining protospacers, including disfavoured ones, are assigned an efficiency score (see below) between 0 and 1, and those above the score threshold (controlled by parameter *si*) are carried forward.

Next, candidate sequences are assembled into pairs and filtered. For each target region, all possible pairs of upstream and downstream candidates are generated. If pairs are designed for DECKO cloning (which utilizes the U6 promoter for the 5’ sgRNA gene, controlled by parameter *c*), an additional filter is applied: sgRNA pairs are rearranged as necessary, such that all output pairs have the first sgRNA starting with G, and any pairs where neither commences with G are removed. A combined score for the resulting pairs is computed. By default, this is the sum of the two individual sgRNA scores, but users may choose to define the pair score as the product of individual scores (parameter *sc*). Pairs are now filtered with a pair score threshold, and ranked by score (or, optionally, reversed rank by distance, parameter *r*). An optional “diversity” cutoff can be used to remove pairs such that no individual candidate sequence appears in more than a given fraction of returned pairs (parameter *v*). Finally, the program returns the top ranked pairs up to the maximum number specified by the user, *n*.

CRISPETa is implemented in Python and available for download from GitHub and the CRISPETa web-server (see availability below).

### Target features and mask files

All target sets and mask files were prepared in BED format, and obtained in April 2016. Coding genes were obtained from the Gencode v19 annotation, filtered for the “protein_coding” biotype (17).
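Returning briefly to the pair assembly and ranking steps described under “Details of CRISPETa Code”, the logic can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the CRISPETa implementation: the function name, argument names and the greedy interpretation of the “diversity” cutoff (each sequence may appear in at most `max_fraction * n` returned pairs) are hypothetical.

```python
from collections import Counter
from itertools import product


def build_pairs(upstream, downstream, method="sum", min_pair_score=0.0,
                decko=True, max_fraction=1.0, n=10):
    """Assemble, score, filter and rank sgRNA pairs.

    upstream / downstream: lists of (protospacer, efficiency_score).
    Returns up to n tuples (first_sgRNA, second_sgRNA, pair_score).
    """
    pairs = []
    for (seq_a, score_a), (seq_b, score_b) in product(upstream, downstream):
        first, second = seq_a, seq_b
        if decko:
            # The first sgRNA must start with G; swap the order if needed,
            # and drop the pair if neither sequence starts with G.
            if not first.startswith("G"):
                first, second = second, first
            if not first.startswith("G"):
                continue
        if method == "sum":
            pair_score = score_a + score_b
        else:  # "product"
            pair_score = score_a * score_b
        if pair_score >= min_pair_score:
            pairs.append((first, second, pair_score))

    # Rank by pair score, best first.
    pairs.sort(key=lambda p: p[2], reverse=True)

    # Assumed diversity rule: cap how often one sequence may be reused.
    limit = max(1, int(max_fraction * n))
    usage = Counter()
    selected = []
    for first, second, pair_score in pairs:
        if usage[first] >= limit or usage[second] >= limit:
            continue
        selected.append((first, second, pair_score))
        usage[first] += 1
        usage[second] += 1
        if len(selected) == n:
            break
    return selected
```

Ranking by distance (parameter *r*) is omitted here for brevity; it would only change the sort key.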
CTCF binding sites for GM12878 cells were downloaded from ENCODE data hosted in the UCSC Browser (18). Enhancers were obtained from Vista (19). Pre-miRNAs were obtained from miRBASE (20). Disease-associated SNPs were obtained from the GWAS database ([http://www.ebi.ac.uk/gwas/api/search/downloads/full](http://www.ebi.ac.uk/gwas/api/search/downloads/full)). Ultraconserved regions were obtained from UCNEbase (21). For human positive and negative masks we used DNaseI hypersensitive sites identified through genome-wide profiling in 125 diverse cell and tissue types by the ENCODE consortium (22) and RepeatMasker repetitive regions (23), respectively. To generate random intergenic locations, the entire span of all Gencode v19 genes (both coding and noncoding, introns and exons), in addition to 100 kb up- and downstream, were subtracted. Random locations were selected within the remaining regions.

### Off-target analysis

Off-target analysis was performed using Crispr-Analyser (16). We searched for all canonical PAM regions (NGG) in the genome and stored the 20nt that precedes each. Then, using the “search” and “align” options, we obtained the number of off-targets with 0, 1, 2, 3 and 4 mismatches for each unique 20mer. These data were stored in a MySQL database. Precomputed files containing this information for various genomes can be directly downloaded (see **CRISPETa availability** section). Downloadable files contain 6 comma-separated fields in this order: sequence of the sgRNA without the PAM sequence and the number of off-targets with 0, 1, 2, 3 and 4 mismatches for this sgRNA. These files can be used as input for the CRISPETa-MySQL module to generate the MySQL database.

### CRISPETa availability and webserver

CRISPETa can be run through the web-server ([http://crispeta.crg.eu](http://crispeta.crg.eu)) or locally. The software runs on python2.7. In order to run CRISPETa locally two additional programs are required: BEDtools and MySQL.
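To make the precomputed off-target file format described above concrete, the following sketch loads such a file (six comma-separated fields: sequence, then counts with 0 to 4 mismatches) into a database and looks up one protospacer. It is only an illustration: Python's built-in `sqlite3` is used as a stand-in for MySQL so that the example is self-contained, and the table name, column names and function names are hypothetical rather than those used by crispeta_mysql.py. The file is assumed to have no header row.

```python
import csv
import io
import sqlite3


def load_offtargets(handle, conn):
    """Load 'sequence,mm0,mm1,mm2,mm3,mm4' rows into a lookup table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS offtargets ("
        "seq TEXT PRIMARY KEY, mm0 INTEGER, mm1 INTEGER, "
        "mm2 INTEGER, mm3 INTEGER, mm4 INTEGER)"
    )
    rows = (
        (fields[0].upper(), *map(int, fields[1:6]))
        for fields in csv.reader(handle)
        if len(fields) == 6  # skip blank or malformed lines
    )
    conn.executemany(
        "INSERT OR REPLACE INTO offtargets VALUES (?, ?, ?, ?, ?, ?)", rows
    )
    conn.commit()


def lookup(conn, protospacer):
    """Return (mm0, mm1, mm2, mm3, mm4) or None if the 20mer is absent."""
    return conn.execute(
        "SELECT mm0, mm1, mm2, mm3, mm4 FROM offtargets WHERE seq = ?",
        (protospacer.upper(),),
    ).fetchone()
```

A typical use would be to open a downloaded file with `open(path, newline="")`, pass the handle to `load_offtargets`, and then query each candidate protospacer with `lookup`. Indexing on the sequence is what makes the per-candidate query fast compared with recomputing genome-wide alignments.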
Source code to run locally can be found on GitHub ([https://github.com/guigolab/CRISPETA](https://github.com/guigolab/CRISPETA)) and also in the “Get CRISPETa” section of the web-server. The source code consists of two scripts: CRISPETA.py, which executes the main pipeline described above, and crispeta_mysql.py, which helps users to create the off-target MySQL database. Two other files can be found within the source code: func.py, which contains all functions necessary to execute the two main scripts, and config.py, which stores the information needed to log in to MySQL.

### sgRNA scoring algorithm

CRISPETa uses the scoring method developed by Doench et al. (13), based on an experimentally trained logistic regression model employing 72 sequence features. The code was downloaded from [http://www.broadinstitute.org/rnai/public/analysis-tools/sgrna-design-v1](http://www.broadinstitute.org/rnai/public/analysis-tools/sgrna-design-v1).

### Benchmarking

A test target set contains 1000 random elements from each of the individual target annotations, for a total of 7000. Benchmarking analyses were run on a workstation running CentOS6, with 86.6 Gb of memory and 12 CPUs (Intel(R) Xeon(R) CPU E5649 @ 2.53GHz).

### DECKO2 design and molecular cloning

A detailed protocol for DECKO2 molecular cloning is available from Supplementary File 1. Selected sgRNA pairs were converted to overlapping series of 6 oligonucleotides (Figure 4B, Supplementary File 5) using a custom design spreadsheet (available as Supplementary File 2). Note that Oligos 3&4 do not vary between experiments. Oligos were synthesised commercially and combined at a final concentration of 0.1 μM, together with 100-200 ng of BsmBI-digested backbone pDECKO_mCherry in 10 μl volume, and 10 μl of 2x Gibson mix. The latter was prepared in house, according to the protocol described previously (1). We incubated the mixture at 50°C for 1 hour, and fast-transformed 2 μl of this into 50 μl of z-Stbl3 competent cells (prepared with Mix and Go *E. coli* transformation kit from Zymo Research, Cat. T3001). Resulting “intermediate” plasmids (corresponding to Figure 4B) were amplified and purified. The Insert-2 fragment (Figure 4C) was amplified from plasmid pDECKO_GFP (1) (Addgene ID XXX) using primers “Scaffold F” and “H1 R” (Supplementary File 4), and gel-purified. This was inserted, by Gibson assembly, into the intermediate plasmid that had previously been linearized by BsmBI digestion and column-purified. We sequence-verified these constructs with primers “Seq 3 F/R” (Supplementary File 4).

### Genotyping by PCR

gDNA was extracted with GeneJET Genomic DNA Purification Kit (Thermo Scientific). PCR was performed with primers flanking the deleted region (primers “out F/R” in Supplementary File 4).

### QC-PCR assay

gDNA was extracted with GeneJET Genomic DNA Purification Kit (Thermo Scientific) and quantitative real time PCR (qPCR) from 1.6 ng of purified gDNA was performed on a LightCycler 480 Real-Time PCR System (Roche). Primer sequences can be found in Table S3 and S4 of Supplementary File 4. Target sequence primers (Enhancer in F / Enhancer out R for enhancer, Exon in F / Exon out R for exon) were normalised to primers GAPDH F/R amplifying a distal, non-targeted region. Another non-targeting primer set, LdhA F/R, was treated in the same way. Data were normalised using the ΔΔCt method (24), incorporating primer efficiencies. The latter were estimated using a dilution series of gDNA, with efficiency calculated from the slope of the linear region only (Supplementary File 3). We noted a decrease in efficiency at high template densities.

### Cell culture and DECKO2 knockouts

HEK293T cells were grown in Dulbecco’s modified Eagle’s medium (DMEM, Life Technologies) and IMR90 cells in Eagle’s Minimum Essential Medium (EMEM, ATCC). Media were supplemented with 10% fetal bovine serum (FBS, Gibco) and 5% Penicillin-Streptomycin (Life Technologies).
Cells were maintained at 37°C in a humid atmosphere containing 5% CO2 and 95% air. Cells were transfected with Lipofectamine 2000 (Life Technologies) following the manufacturer’s protocol. For lentivirus production, pDECKO_mCherry plasmids were co-transfected into HEK293T cells with the packaging plasmids pVsVg (Addgene 8484) and psPAX2 (Addgene 12260). To create Cas9 stably-expressing cells, we transfected Cas9 plasmids and selected for more than 5 days with blasticidin at 10 μg/ml.

### Genomic deletion at low multiplicity of infection

For lentivirus production, pDECKO_mCherry plasmids (pDECKO_mCherry_TFRC_B) (3 μg) were co-transfected into HEK293T cells with the packaging plasmids pVsVg (Addgene 8484) (2.25 μg) and psPAX2 (Addgene 12260) (750 ng) in 10 cm dishes. Viral supernatant was collected after 48 hours and filtered through a 0.45 μm cellulose acetate syringe filter. We used between 0.5 and 1 ml of viral supernatant, along with polybrene at a final concentration of 10 μg/ml, to perform overnight infection of IMR90-Cas9BFP cells seeded at approximately 60% confluence in 6 well plates. Medium was changed the following day, and half of the cells were analyzed by flow cytometry to ascertain infection rate. Remaining cells were double selected with puromycin and blasticidin for 14 days. Cells were lysed with 50 μl of Lysis Buffer (25 mM NaOH, 0.2 mM EDTA) and heated at 95°C for 30 min (25). The reaction was inactivated with Tris Buffer (40 mM Tris-HCl) and lysates centrifuged for 5 min at 4000 rpm. For genotyping, we performed qPCR directly from cell lysates with primers “TFRC_B out F” and “TFRC_B in R” and normalized against “GAPDH F/R” (see Supplementary File 4).

## Results

### The CRISPETa pipeline for paired sgRNA design

We and others have recently demonstrated the cloning of paired CRISPR targeting constructs for deletion of genomic regions. This creates the need for a design pipeline to select optimal pairs of sgRNAs.
Our solution is the CRISPETa pipeline, whose principal steps are shown in Figure 1B. The guiding principles of CRISPETa are flexibility and scalability: the user has control over all aspects of the design process if desired (otherwise reasonable defaults are provided), and the design may be carried out on individual targets, or target libraries of essentially unlimited size. The full set of user-defined variables, and their default values, are shown in Table 1. We use here the standard term “protospacer” to designate the 20 bp of genomic DNA sequence preceding the PAM sequence (8), as distinct from the sgRNA sequence itself, composed of the protospacer sequence and the constant, scaffold region.

The CRISPETa workflow may be divided into three main steps: target region definition, protospacer selection, and sgRNA pair prioritisation (Figure 1B). Given a genomic target region or regions in BED format, CRISPETa first establishes pairs of “design regions” of defined length in which to search. Design regions may be separated from the target itself by “exclude regions”. The user may also specify “mask regions”: sgRNAs falling within the positive mask are prioritised, whereas those within the negative mask will be de-prioritized (although not removed altogether). Positive masks might include regions of DNaseI-accessible chromatin, while negative masks may be composed of, for example, repetitive regions or compact chromatin.

Using this information, the entire set of potential protospacers is defined. First, the design region sequence is extracted and searched for all possible 20mer sites followed by canonical *S. pyogenes* “NGG” PAM sites; these are the candidate protospacers. Candidates are considered with respect to two core metrics: their potential for off-target binding, and their predicted efficiency. Off-targeting, or the number of identical or similar sites with a given number of mismatches, is estimated using precomputed data for each genome.
This strategy increases the speed of CRISPETa dramatically. We created off-target databases for five commonly-studied species: human, mouse, zebrafish, *Drosophila melanogaster* and *Caenorhabditis elegans* (Table 2), varying widely in genome size (Figure 2A). The default off-targeting cutoff is set at (0:1, 1:0, 2:0, 3:x, 4:x), that is, sequences having a single perfect match in the genome (the site itself) and no other genomic site with ≤2 mismatches, regardless of the number of sites with 3 or 4 mismatches. At this default, 77% of candidate protospacers are discarded in human, compared to just 13% in *Drosophila,* reflecting the relative uniqueness and compactness of the latter (Figure 2B).

![Figure 2:](http://biorxiv.org/https://www.biorxiv.org/content/biorxiv/early/2016/05/22/052795/F2.medium.gif)

Figure 2: Benchmarking and performance. (A) Genome size and filtered protospacer density for the five species tested. (B) The fraction of protospacers passing filters of off-targeting, efficiency score, and both. The latter are defined as “filtered protospacers”, whose density is shown in (A). Data are displayed as a proportion of the total number of canonical PAM sequences in each genome. (C) The effect on library quality of modifying design variables. Y-axis denotes the percent of target regions, divided by: “successful”, where n=10 distinct sgRNA pair designs are returned per target; “intermediate” designs, where 00.01, paired *t* test).

Conventional PCR genotyping of transfected cells’ DNA showed amplification products consistent with target site deletion for all pDECKO constructs, but not for control cells (Figure 5C, left panels). QC-PCR analysis of independent biological replicates showed loss of ~40% of enhancer target sites for each of the four sgRNA pair designs (Figure 5C, right panels). A non-targeted genomic region was not affected (“Non-targeted”). Higher efficiencies were observed for the exon-targeting constructs, yielding >60% efficiency for the top two sgRNA pairs.
We did not observe a strong difference in the deletion efficiency between the four sgRNA pairs targeting the enhancer, although for the exon region, the lower-scoring two constructs displayed reduced efficiency. This underlines the value of using predicted efficiency scores in sgRNA selection, and supports the effectiveness of CRISPETa-predicted sgRNA pairs.

### Mutant RNA arising from genomic mutation

We next sought to verify that the engineered deletions in the *MALAT1* exon result in the expected changes to transcribed RNA. cDNA was generated from bulk cells treated with pDECKO vectors targeting the *MALAT1* exon. Given that cells were not selected, this sample should contain a mixture of RNA from both wild-type and mutated alleles. RT-PCR using primers flanking the targeted region amplified two distinct products, of sizes expected for wild-type and deleted sequence (Figure 6). The specificity of these PCR products was further verified by Sanger sequencing. Therefore, targeted deletions by CRISPETa are reproduced in the transcriptome, and may be used in future to dissect RNA functional elements.

![Figure 6:](http://biorxiv.org/https://www.biorxiv.org/content/biorxiv/early/2016/05/22/052795/F6.medium.gif)

Figure 6: Observation of mutated *MALAT1* RNA. RT-PCR was performed on RNA from bulk cells where the *MALAT1* exon region was deleted (sgRNA Pair 1, in two biological replicates), or control cells transfected with pDECKO_EGFP. Primers flanking the deleted region were used, and are expected to amplify fragments of the indicated sizes, depending on whether the RNA arises from a wild type or a deleted allele. Specificity was ensured by the exclusion of the reverse transcriptase enzyme in control reactions (“RT-”).

### Genomic deletion under screening conditions at low multiplicity of infection

In future, the DECKO_mCherry and CRISPETa tools may be used in library screening-type experiments (4).
The above experiments were carried out by transfection, meaning that each cell receives multiple copies of pDECKO plasmid. In contrast, library screening requires the integration of a single targeting sequence in each cell. This requirement is met at low multiplicity of infection (MOI) when ≤20% of cells are infected, a condition that can be conveniently monitored by mCherry fluorescence (29). Thus, in a final experiment we sought to test whether, under such conditions of single-copy integration, pDECKO_mCherry remains effective in genomic deletion. For these experiments, we selected IMR90 fibroblasts since they are not transformed and hence a more suitable model for phenotypic screening. Using a previously-validated pDECKO_mCherry lentivirus targeting the promoter of the *TFRC* gene (1), we infected cells at decreasing titres. By means of flow cytometry gated on mCherry fluorescence, we could monitor infection rate. Infected cells were cultured under antibiotic selection to create a pure population, then genotyped by PCR (Figure 7). We observed the presence of correctly mutated alleles in cells carrying single-copy pDECKO insertions. Thus, CRISPETa-designed libraries are likely to be suitable for pooled screens.

![Figure 7:](http://biorxiv.org/https://www.biorxiv.org/content/biorxiv/early/2016/05/22/052795/F7.medium.gif)

Figure 7: Efficacy of DECKO in screening conditions at low multiplicity of infection. (A) IMR90 cells stably expressing Cas9-BFP were infected with varying volumes of lentiviral supernatant carrying a pDECKO construct targeting the *TFRC* gene (TFRC_B construct, see (1)). Infection rates were estimated by flow cytometry monitoring fluorescence from the mCherry gene carried by pDECKO. In this example, infection rate is ~12% (see next panel). (B) Genotyping PCR carried out on genomic DNA from lentivirally-infected cells. The infection rate in the original sample is indicated below.
Note that antibiotic selection was subsequently used to remove non-infected cells. Primers flank the deletion site, and expected amplicon sizes for wild type (unmutated, “WT”) and deleted alleles are indicated.

## Conclusions

We have here presented a versatile and scalable design solution for CRISPR deletion projects. To our knowledge, CRISPETa is the first tool for selection of optimal sgRNA pairs. A key feature is its scalability, making it equally suitable for focussed projects involving single target regions, and screening projects involving thousands of targets. The user has a large degree of control over the design process. On-target efficiency is predicted using the latest, experimentally-informed design algorithm, while running speed is boosted by an efficient off-target calculation.

In the course of this work we developed an updated lentiviral CRISPR deletion tool. Compared to the original version, DECKO2 represents a more cost-effective method for individual target knockout. The series of short oligonucleotides required costs less to synthesise commercially than the single 165 nt sequence employed previously (1). DECKO2 also replaces the second ligation step of the original DECKO with Gibson assembly, further simplifying the protocol (27).

The QC-PCR technique presented here now allows one to quantify and compare the efficiency of CRISPETa designs. For the 8 sgRNA pairs in two regions that we tested, deletion efficiencies of ~40-60% were consistently observed. The induced deletions, when occurring within a transcribed region, are also observed in expressed RNA molecules. The suitability of this approach for screening is demonstrated by the fact that single-copy genomic insertions give rise to deletion of target regions.

CRISPR enables us to study the function of non-coding genomic elements in their endogenous cellular context for the first time.
The power of CRISPR lies both in its versatility and in its ready adaptation to large-scale screening approaches. The CRISPETa pipeline and experimental methods described here will, we hope, be useful for such studies.

Supplementary File 1: Extended DECKO2 cloning protocol.

Supplementary File 2: Design spreadsheet for creating DECKO2 oligonucleotides.

Supplementary File 3: Estimation of QC-PCR primer efficiencies.

Supplementary File 4: Oligonucleotide sequences.

Supplementary File 5: Detailed figure of 6-oligo Insert-1 cloning.

Supplementary File 6: Comparing efficiency of fluorescent Cas9 variants.

Supplementary File 7: Details of *MALAT1* sgRNA pairs.

## Acknowledgements

We thank members of the Guigo lab and the CRG Bioinformatics and Genomics Programme for many ideas and discussions, in addition to Carlo Carolis of the CRG Biomolecular Screening & Protein Technologies Unit. We thank John G. Doench and David E. Root (Broad Institute of MIT) for generous advice regarding the implementation of sgRNA efficiency predictions. We acknowledge the administrative support of Romina Garrido. We also acknowledge support of the Spanish Ministry of Economy and Competitiveness, ‘Centro de Excelencia Severo Ochoa 2013-2017’, SEV-2012-0208. This work was financially supported by the following grants: CSD2007-00050 from the Spanish Ministry of Science, grant SGR-1430 from the Catalan Government, grant ERC-2011-AdG-294653-RNA-MAPS from the European Community financial support under the FP7 and grant R01MH101814 by the National Human Genome Research Institute of the National Institutes of Health, to RG. Ramon y Cajal RYC-2011-08851 and Plan Nacional BIO2011-27220 to RJ.

* Received May 21, 2016.
* Accepted May 21, 2016.
* © 2016, Posted by Cold Spring Harbor Laboratory This pre-print is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), CC BY-NC 4.0, as described at [http://creativecommons.org/licenses/by-nc/4.0/](http://creativecommons.org/licenses/by-nc/4.0/) ## References 1. 1.Aparicio-Prat, E., Arnan, C., Sala, I., Bosch, N., Guigo, R. and Johnson, R. (2015) DECKO: Single-oligo, dual-CRISPR deletion of genomic elements including long non-coding RNAs. BMC genomics, 16, 846. [CrossRef](http://biorxiv.org/lookup/external-ref?access_num=10.1186/s12864-015-2086-z&link_type=DOI) [PubMed](http://biorxiv.org/lookup/external-ref?access_num=26493208&link_type=MED&atom=%2Fbiorxiv%2Fearly%2F2016%2F05%2F22%2F052795.atom) 2. 2.Vidigal, J.A. and Ventura, A. (2015) Rapid and efficient one-step generation of paired gRNA CRISPR-Cas9 libraries. Nature communications, 6, 8083. 3. 3.Jinek, M., East, A., Cheng, A., Lin, S., Ma, E. and Doudna, J. (2013) RNA-programmed genome editing in human cells. eLife, 2, e00471. [CrossRef](http://biorxiv.org/lookup/external-ref?access_num=10.7554/eLife.00471&link_type=DOI) [PubMed](http://biorxiv.org/lookup/external-ref?access_num=23386978&link_type=MED&atom=%2Fbiorxiv%2Fearly%2F2016%2F05%2F22%2F052795.atom) 4. 4.Shalem, O., Sanjana, N.E., Hartenian, E., Shi, X., Scott, D.A., Mikkelsen, T.S., Heckl, D., Ebert, B.L., Root, D.E., Doench, J.G. et al. (2014) Genome-scale CRISPR-Cas9 knockout screening in human cells. Science, 343, 84–87. [Abstract/FREE Full Text](http://biorxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6Mzoic2NpIjtzOjU6InJlc2lkIjtzOjExOiIzNDMvNjE2Ni84NCI7czo0OiJhdG9tIjtzOjM3OiIvYmlvcnhpdi9lYXJseS8yMDE2LzA1LzIyLzA1Mjc5NS5hdG9tIjt9czo4OiJmcmFnbWVudCI7czowOiIiO30=) 5. 5.Hart, T., Chandrashekhar, M., Aregger, M., Steinhart, Z., Brown, K.R., MacLeod, G., Mis, M., Zimmermann, M., Fradet-Turcotte, A., Sun, S. 
et al. (2015) High-Resolution CRISPR Screens Reveal Fitness Genes and Genotype-Specific Cancer Liabilities. Cell, 163, 1515–1526. [CrossRef](http://biorxiv.org/lookup/external-ref?access_num=10.1016/j.cell.2015.11.015&link_type=DOI) [PubMed](http://biorxiv.org/lookup/external-ref?access_num=26627737&link_type=MED&atom=%2Fbiorxiv%2Fearly%2F2016%2F05%2F22%2F052795.atom) 6. 6.Korkmaz, G., Lopes, R., Ugalde, A.P., Nevedomskaya, E., Han, R., Myacheva, K., Zwart, W., Elkon, R. and Agami, R. (2016) Functional genetic screens for enhancer elements in the human genome using CRISPR-Cas9. Nature biotechnology, 34, 192–198. [CrossRef](http://biorxiv.org/lookup/external-ref?access_num=10.1038/nbt.3450&link_type=DOI) [PubMed](http://biorxiv.org/lookup/external-ref?access_num=26751173&link_type=MED&atom=%2Fbiorxiv%2Fearly%2F2016%2F05%2F22%2F052795.atom) 7. 7.Ho, T.T., Zhou, N., Huang, J., Koirala, P., Xu, M., Fung, R., Wu, F. and Mo, Y.Y. (2015) Targeting non-coding RNAs with the CRISPR/Cas9 system in human cell lines. Nucleic acids research, 43, el7. 8. 8.Mojica, F.J., Diez-Villasenor, C., Garcia-Martinez, J. and Almendros, C. (2009) Short motif sequences determine the targets of the prokaryotic CRISPR defence system. Microbiology, 155, 733–740. [CrossRef](http://biorxiv.org/lookup/external-ref?access_num=10.1099/mic.0.023960-0&link_type=DOI) [PubMed](http://biorxiv.org/lookup/external-ref?access_num=19246744&link_type=MED&atom=%2Fbiorxiv%2Fearly%2F2016%2F05%2F22%2F052795.atom) [Web of Science](http://biorxiv.org/lookup/external-ref?access_num=000264515100009&link_type=ISI) 9. 9.Moreno-Mateos, M.A., Vejnar, C.E., Beaudoin, J.D., Fernandez, J.P., Mis, E.K., Khokha, M.K. and Giraldez, A.J. (2015) CRISPRscan: designing highly efficient sgRNAs for CRISPR-Cas9 targeting in vivo. Nature methods, 12, 982–988. 10. 10.Liu, H., Wei, Z., Dominguez, A., Li, Y., Wang, X. and Qi, L.S. (2015) CRISPR-ERA: a comprehensive design tool for CRISPR-mediated gene editing, repression and activation. 
Bioinformatics, 31, 3676–3678.
11. Naito, Y., Hino, K., Bono, H. and Ui-Tei, K. (2015) CRISPRdirect: software for designing CRISPR/Cas guide RNA with reduced off-target sites. Bioinformatics, 31, 1120–1123.
12. Heigwer, F., Kerr, G. and Boutros, M. (2014) E-CRISP: fast CRISPR target site identification. Nature methods, 11, 122–123.
13. Doench, J.G., Hartenian, E., Graham, D.B., Tothova, Z., Hegde, M., Smith, I., Sullender, M., Ebert, B.L., Xavier, R.J. and Root, D.E. (2014) Rational design of highly active sgRNAs for CRISPR-Cas9-mediated gene inactivation. Nature biotechnology, 32, 1262–1267.
14. Hsu, P.D., Scott, D.A., Weinstein, J.A., Ran, F.A., Konermann, S., Agarwala, V., Li, Y., Fine, E.J., Wu, X., Shalem, O. et al. (2013) DNA targeting specificity of RNA-guided Cas9 nucleases. Nature biotechnology, 31, 827–832.
15. Zhu, L.J., Holmes, B.R., Aronin, N. and Brodsky, M.H. (2014) CRISPRseek: a bioconductor package to identify target-specific guide RNAs for CRISPR-Cas9 genome-editing systems. PloS one, 9, e108424.
16. Hodgkins, A., Farne, A., Perera, S., Grego, T., Parry-Smith, D.J., Skarnes, W.C. and Iyer, V. (2015) WGE: a CRISPR database for genome engineering. Bioinformatics, 31, 3078–3080.
17. Harrow, J., Frankish, A., Gonzalez, J.M., Tapanari, E., Diekhans, M., Kokocinski, F., Aken, B.L., Barrell, D., Zadissa, A., Searle, S. et al. (2012) GENCODE: the reference human genome annotation for The ENCODE Project. Genome research, 22, 1760–1774.
18. ENCODE Project Consortium (2012) An integrated encyclopedia of DNA elements in the human genome. Nature, 489, 57–74.
19. Pennacchio, L.A., Ahituv, N., Moses, A.M., Prabhakar, S., Nobrega, M.A., Shoukry, M., Minovitsky, S., Dubchak, I., Holt, A., Lewis, K.D. et al. (2006) In vivo enhancer analysis of human conserved non-coding sequences. Nature, 444, 499–502.
20. Kozomara, A. and Griffiths-Jones, S. (2011) miRBase: integrating microRNA annotation and deep-sequencing data. Nucleic acids research, 39, D152–157.
21. Dimitrieva, S. and Bucher, P. (2013) UCNEbase-a database of ultraconserved non-coding elements and genomic regulatory blocks. Nucleic acids research, 41, D101–109.
22. Neph, S., Vierstra, J., Stergachis, A.B., Reynolds, A.P., Haugen, E., Vernot, B., Thurman, R.E., John, S., Sandstrom, R., Johnson, A.K. et al. (2012) An expansive human regulatory lexicon encoded in transcription factor footprints. Nature, 489, 83–90.
23. Tarailo-Graovac, M. and Chen, N.
(2009) Using RepeatMasker to identify repetitive elements in genomic sequences. Current protocols in bioinformatics / editorial board, Andreas D. Baxevanis … [et al.], Chapter 4, Unit 4.10.
24. Schmittgen, T.D. and Livak, K.J. (2008) Analyzing real-time PCR data by the comparative C(T) method. Nature protocols, 3, 1101–1108.
25. Bell, C.C., Magor, G.W., Gillinder, K.R. and Perkins, A.C. (2014) A high-throughput screening strategy for detecting CRISPR-Cas9 induced mutations using next-generation sequencing. BMC genomics, 15, 1002.
26. Canver, M.C., Bauer, D.E., Dass, A., Yien, Y.Y., Chung, J., Masuda, T., Maeda, T., Paw, B.H. and Orkin, S.H. (2014) Characterization of genomic deletion efficiency mediated by clustered regularly interspaced palindromic repeats (CRISPR)/Cas9 nuclease system in mammalian cells. The Journal of biological chemistry, 289, 21312–21324.
27. Gibson, D.G., Young, L., Chuang, R.Y., Venter, J.C., Hutchison, C.A., 3rd and Smith, H.O. (2009) Enzymatic assembly of DNA molecules up to several hundred kilobases. Nature methods, 6, 343–345.
28. Gutschner, T., Hammerle, M., Eissmann, M., Hsu, J., Kim, Y., Hung, G., Revenko, A., Arun, G., Stentrup, M., Gross, M. et al. (2013) The noncoding RNA MALAT1 is a critical regulator of the metastasis phenotype of lung cancer cells. Cancer research, 73, 1180–1189.
29. Sims, D., Mendes-Pereira, A.M., Frankum, J., Burgess, D., Cerone, M.A., Lombardelli, C., Mitsopoulos, C., Hakas, J., Murugaesu, N., Isacke, C.M. et al. (2011) High-throughput RNA interference screening using pooled shRNA libraries and next generation sequencing. Genome biology, 12, R104.
30. Pollard, K.S., Hubisz, M.J., Rosenbloom, K.R. and Siepel, A. (2010) Detection of nonneutral substitution rates on mammalian phylogenies. Genome research, 20, 110–121.