SUMMARY
Genetic screens are powerful tools for the functional annotation of genomes. In the context of multicellular organisms, interrogation of gene function is greatly facilitated by methods that allow spatial and temporal control of gene abrogation. Here, we describe a large-scale transgenic short guide (sg) RNA library for efficient CRISPR-based disruption of specific target genes in a constitutive or conditional manner. The library currently consists of more than 2600 plasmids and 1400 fly lines with a focus on targeting kinases, phosphatases and transcription factors, each expressing two sgRNAs under control of the Gal4/UAS system. We show that conditional CRISPR mutagenesis is robust across many target genes and can be efficiently employed in various somatic tissues, as well as the germline. In order to prevent artefacts commonly associated with excessive amounts of Cas9 protein, we have developed a series of novel UAS-Cas9 transgenes, which allow fine tuning of Cas9 expression to achieve high gene editing activity without detectable toxicity. Functional assays, as well as direct sequencing of genomic sgRNA target sites, indicate that the vast majority of transgenic sgRNA lines mediate efficient gene disruption. Furthermore, we conducted the largest fully transgenic CRISPR screen in any metazoan organism to date, which further supported the high efficiency and accuracy of our library and revealed many previously uncharacterized genes essential for development.
INTRODUCTION
The functional annotation of the genome is a prerequisite to gain a deeper understanding of the molecular and cellular mechanisms that underpin development, homeostasis and disease of multicellular organisms. Drosophila melanogaster is a particularly powerful genetic model system that has provided many fundamental insights into metazoan biology. Early experiments made use of naturally occurring mutations to provide the first evidence that genes are located on, and can be mapped to, chromosomes (Sturtevant, 1913; Bridges, 1914). Subsequent studies have used X-rays or chemicals to mutagenize the genome at high frequency. Combined with phenotyping of large numbers of individuals these can be used for forward genetic screens to systematically catalog the phenotypic effects of genetic perturbations. While such efforts in Drosophila have been extremely successful, for example revealing many of the genes underlying animal development (Nüsslein-Volhard and Wieschaus, 1980), identifying conserved regulators of cell proliferation (Udan et al., 2003; Wu et al., 2003) or providing first evidence for a genetic basis of behaviour (Konopka and Benzer, 1971), they are limited by the large number of individual organisms that are required to probe many or all genetic loci and difficulties in identifying causal genetic variants.
In contrast, reverse genetic approaches, such as RNA interference (RNAi), are gene-centric by design and allow the function of a large number of genes to be probed (Boutros and Ahringer, 2008; Mohr et al., 2014; Heigwer et al., 2018; Horn et al., 2011). Furthermore, RNAi reagents can be genetically encoded and used to screen for gene function with spatial and temporal precision (Dietzl et al., 2007; Kaya-Çopur and Schnorrer, 2016; Ni et al., 2009). In recent years tissue-specific RNAi screens have provided multiple new insights into the biology of multicellular animals (reviewed in Heigwer et al., 2018). However, RNAi is often limited by incomplete penetrance due to residual gene expression and can suffer from off-target effects (Perkins et al., 2015; Echeverri et al., 2006; Ma et al., 2006).
While genetic screens such as the ones described above have contributed enormously to our understanding of gene function, large parts of eukaryotic genomes remain uncharacterized or only poorly characterized (Dickinson et al., 2016; Brown et al., 2009; White et al., 2013). For example, in Drosophila only 20% of genes have associated mutant alleles (Kaufman, 2017). There is therefore a need to develop innovative approaches that complement and extend the currently available tools to gain a more complete understanding of the functions encoded by the various elements of the genome.
Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) - CRISPR-associated (Cas) systems are adaptive prokaryotic immune systems that have been adopted for genome engineering applications (Doudna and Charpentier, 2014; Wang et al., 2016). The prototypical Streptococcus pyogenes Cas9 endonuclease can be guided to genomic target sites by a single chimeric guide RNA (sgRNA) to introduce DNA double strand breaks (DSBs) upstream of a protospacer adjacent motif of the sequence NGG. Cells have evolved a plethora of DSB repair pathways, some of which are error prone and induce small insertions and deletions (indels) at the break point with high frequency (Chapman et al., 2012). It is therefore possible to alter the sequence of a given genetic element by expressing Cas9 and a specific sgRNA. A common application of CRISPR mutagenesis is to test the function of protein coding genes. To this end Cas9 is typically targeted to the beginning of the open reading frame (ORF), where out-of-frame mutations are more likely to have a functional impact. However, not all Cas9-mediated indel mutations abrogate gene function. To compensate for that, strategies have been developed to introduce several mutations in the same gene in parallel. The efficiency of such multiplexing strategies has been demonstrated in flies, mice, fish and plants, and several sgRNAs are often required to generate bi-allelic loss-of-function mutations in all cells (Port and Bullock, 2016; Xie et al., 2015; Yin et al., 2015).
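The targeting rule described above (a protospacer immediately upstream of an NGG motif) can be illustrated with a minimal Python sketch. The function name `find_cas9_sites`, the 20 nt protospacer length and the restriction to the given strand are assumptions made for illustration; this is not the design procedure used for the library.

```python
import re

def find_cas9_sites(seq, protospacer_len=20):
    """Return (start, protospacer, pam) for every NGG motif on the given strand.

    Illustrative only: scans one strand and ignores off-target considerations.
    """
    seq = seq.upper()
    pattern = r"(?=([ACGT]{%d})([ACGT]GG))" % protospacer_len
    sites = []
    # The zero-width lookahead allows overlapping candidate sites to be reported.
    for match in re.finditer(pattern, seq):
        sites.append((match.start(), match.group(1), match.group(2)))
    return sites

# Hypothetical toy sequence: 20 nt of 'A' followed by the PAM 'TGG'.
example = "A" * 20 + "TGG"
for start, protospacer, pam in find_cas9_sites(example):
    print(start, protospacer, pam)
```

A complete search would also scan the reverse complement, since Cas9 can target either strand.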
Probing gene function in multicellular organisms can be greatly facilitated by methods which enable spatial or temporal control of gene disruption. This makes it possible to study the function of genes that are essential for early development in later stages of life, to characterize in detail genes that play roles in multiple tissues or stages of development, or to effectively model diseases that arise through somatic mutations, such as cancer. In order to restrict CRISPR mutagenesis to defined cells or tissues in multicellular animals, Cas9 and sgRNAs have been delivered by targeted injection or packaged into viruses that have a specific tissue tropism (Friedland et al., 2013; Yamaguchi et al., 2018; Bäck et al., 2019; Weber et al., 2015). However, such strategies are only suitable for a relatively small subset of cell types and are not available in many organisms. In contrast, genetically encoded CRISPR components can be spatially restricted by the use of tissue-specific regulatory elements, a strategy that is available in any genetically tractable organism. However, Cas9 expression vectors with tissue-specific enhancers often display ‘leaky’ Cas9 expression in other tissues and poor control of CRISPR mutagenesis has been observed in multiple systems, including flies, mice and patient-derived xenografts (Dow et al., 2015; Chen et al., 2017; Hulton et al., 2019; Port and Bullock, 2016).
Here, we describe a large-scale resource for spatially restricted mutagenesis in Drosophila. We show that gene editing with Cas9 and two sgRNAs targeting the same gene under the control of Gal4 is robust across target genes, gives rise to a large fraction of cells containing gene knock-outs and displays tight spatial and temporal control. We developed a series of tunable Cas9 lines that allow gene editing with high efficiency and low toxicity independent of enhancer strength. These can be used with a growing library of sgRNA transgenes, which currently comprises over 1400 Drosophila strains, for systematic mutagenesis in any somatic tissue or the germline. Furthermore, we present the first large-scale transgenic CRISPR screen using this resource, which confirms its high efficiency and specificity and reveals multiple uncharacterized genes with essential, but unknown function.
RESULTS
Robust tissue-specific CRISPR mutagenesis
We set out to develop a large-scale resource that would allow systematic gene disruption with high spatial and temporal control. In Drosophila, tissue-specific expression of transgenes is most commonly performed via the binary Gal4/UAS system (Brand and Perrimon, 1993) and thousands of Gal4 lines with specific temporal and spatial expression patterns are publicly available. To harness this resource for tissue-specific CRISPR mutagenesis, UAS-Cas9 transgenes have been developed and combined with sgRNAs expressed from the ubiquitous U6 promoter (Port et al., 2014; Xue et al., 2014). However, this system frequently gives rise to mutations outside of the Gal4 expression domain (Figure 1A, Port and Bullock, 2016). Such ectopic mutagenesis is prevented by also expressing sgRNAs under UAS control as part of RNA polymerase II transcripts and flanked by tRNAs for subsequent excision (Port and Bullock, 2016; Xie et al., 2015, Figure 1A). Our previous proof-of-principle study introduced the conditional UAS-sgRNA vector pCFD6, but was restricted to testing its performance with two sgRNAs targeting the Wnt secretion factor Evenness interrupted (Evi, also known as Wntless or Sprinter; Port and Bullock, 2016; Bartscherer et al., 2006; Bänziger et al., 2006). We therefore asked whether this system is robust across target genes and tissues, a prerequisite to generate large-scale libraries of sgRNA strains targeting many or all Drosophila genes. To this end we created transgenic fly lines harboring a pCFD6 transgene encoding two sgRNAs targeting a single gene at two independent positions in the open reading frame. These were crossed to flies containing a UAS-cas9.P2 transgene and a tissue-specific Gal4 driver. We then analysed if mutations were efficiently induced and restricted to the appropriate cells by either staining for the gene product, observing a visual loss-of-function phenotype, or sequencing PCR amplicons spanning the sgRNA target sites.
We first targeted the Drosophila beta-Catenin homolog armadillo (arm). Wing imaginal discs of nub-Gal4 UAS-cas9.P2 pCFD6-arm2x animals lost immunoreactivity with an anti-Arm antibody specifically in the nub-Gal4 expression domain, demonstrating efficient and spatially restricted CRISPR mutagenesis (Figure 1B). Similarly, targeting the transcription factor senseless (sens) in the dorsal compartment of the wing imaginal disc using ap-Gal4, or the transmembrane protein smoothened in the posterior compartment with hh-Gal4, resulted in a loss of Sens or Smo staining in most, but not all, cells (Suppl. Fig. 1). To test tissue-specific CRISPR mutagenesis in a different tissue context, we set out to mutagenize Notch (N) in the Drosophila midgut, which is derived from the endoderm. We observed a strong increase in stem cell proliferation and an accumulation of cells with small nuclei when we induced expression of UAS-cas9.P2 and pCFD6-N2x specifically in intestinal stem cells of adult flies (Fig. 1C and Suppl. Fig. 2). This matches the described phenotype of N mutant clones in the midgut (Ohlstein and Spradling, 2006). Interestingly, we observed a qualitative difference between perturbation of N expression by RNAi, which only induces hyperplasia in female flies (Suppl. Fig. 2; Hudry et al., 2016; Siudeja et al., 2015), and N mutagenesis by CRISPR, which induces strong overgrowth in male and female midguts (Suppl. Fig. 2). Next, we generated a pCFD6-sgRNA2x line targeting neuralized (neur), a ubiquitin ligase involved in Notch signaling, and combined it with pnr-Gal4 UAS-cas9.P2. We observed a loss of sensory bristles on the thorax exclusively in a broad stripe centred around the dorsal midline, which is where pnr-Gal4 is expressed (Figure 1D). Similar results were obtained with a pCFD6 transgene encoding sgRNAs targeting the pigmentation gene yellow (y) (Suppl. Fig. 1).
We also tested tissue-specific CRISPR mutagenesis in the developing Drosophila eye, using GMR-Gal4 to drive expression of UAS-cas9.P2 and a pCFD6 plasmid encoding two sgRNAs targeting the eye pigmentation gene sepia (se). GMR-Gal4 UAS-cas9.P2 pCFD6-se2x animals had eyes of sepia colouration, in contrast to control animals (Figure 1E). Together these experiments indicate that using the Gal4/UAS system to drive expression of Cas9 and sgRNAs results in highly efficient and spatially restricted mutagenesis at various target loci and in different somatic tissues.
Next, we tested whether pCFD6-sgRNA2x also mediates efficient mutagenesis in the germline, where some UAS vectors are silenced (DeLuca and Spradling, 2018; Huang et al., 2018). This is a particularly important application, as it allows the creation of stable, sequence-verified mutant fly lines, which can be backcrossed to remove potential off-target mutations. We crossed previously described nos-Gal4VP16 UAS-Cas9.P1 flies (Port et al., 2014) to sgRNA strains targeting either neur, N, cut (ct), decapentaplegic (dpp) or Ras85D. Despite the fact that all five genes are essential for Drosophila development and act in multiple tissues, nos-Gal4 UAS-Cas9.P1 pCFD6-sgRNA2x flies were viable and morphologically normal, demonstrating tightly restricted mutagenesis. We then tested their offspring for CRISPR-induced mutations at the sgRNA target sites. All crosses segregated mutations at the target locus, with pCFD6-sgRNA2x targeting neur, N, ct and Ras85D passing on mutations to most or all analysed offspring (Figure 1F). Mutations were often found on both target sites, were frequently out-of-frame and included large deletions of 8 and 14kb between the sgRNA target sites. Flies carrying the pCFD6-dpp2x transgene had low fertility and transmitted only a single in-frame allele to 1/11 of analysed offspring, consistent with an important role of Dpp in the male germline (Kawase et al., 2004). These experiments demonstrate that sgRNA expression from pCFD6 mediates efficient and tightly restricted mutagenesis also in the germline to allow for germline transmission of loss-of-function mutations in essential genes.
Together these data suggest that tissue-specific CRISPR mutagenesis in Drosophila is robust across genes and tissues.
A series of tunable Cas9 transgenes to optimise activity and toxicity
The experiments described above were largely performed using UAS-cas9.P2 (Port and Bullock, 2016), which compared to UAS-cas9.P1 (Port et al., 2014), was designed to mediate lower expression levels to prevent toxicity previously observed when expressing Cas9 from the Gal4/UAS system (Port et al., 2014; Poe et al., 2019; Huynh et al., 2018). However, in combination with several Gal4 driver lines UAS-cas9.P2 still resulted in abnormal phenotypes or lethality. This effect was dosage dependent, as flies with one copy of UAS-cas9.P2 in combination with hh-Gal4 or nub-Gal4 did not display morphological abnormalities, but had wing defects or were non-viable when UAS-cas9.P2 was present in two copies. In contrast, transgenes such as act-cas9, where the Cas9 open reading frame is directly downstream of an endogenous promoter, express Cas9 protein at much lower levels than typically observed with the Gal4/UAS system and can be kept as healthy homozygous stocks (Suppl. Fig. 3).
To directly analyse if morphological defects and lethality result from excessive amounts of cell death, we expressed a fluorescent apoptosis sensor to visualize dying cells in the wing disc epithelium (Schott et al., 2017). Only a small number of dying cells was observed when the apoptosis sensor alone was expressed with nub-Gal4 (Suppl. Fig. 2B). In contrast, we observed a dramatic increase in the number of apoptotic cells when we co-expressed UAS-cas9.P2, highlighting the extent of toxicity arising from high Cas9 expression levels (Suppl. Fig. 2B).
We therefore sought to engineer UAS-cas9 vectors that would express lower levels of Cas9 when induced by Gal4. Since the expression level of UAS constructs strongly depends on the strength of the Gal4 driver, an ideal system would allow Cas9 levels to be fine-tuned depending on the Gal4 driver. We turned our attention to a method that uses upstream open reading frames (uORF) to predictably reduce the levels of translation of the main, downstream ORF, which has been used in Drosophila to reduce the expression of a toxic DNA methylase (Southall et al., 2013). In this system the length of the uORF is inversely correlated with the translation level of the downstream main ORF (Kozak, 2001; Luukkonen et al., 1995; Ferreira et al., 2013; Figure 2A). We created a series of six UAS-cas9 plasmids containing uORFs of different lengths, ranging from 33bp (referred to as UAS-uXSCas9), encoding the N-terminal 11 amino acids of EGFP, to a uORF encoding full-length EGFP (UAS-uXXLCas9, Figure 2A). These plasmids were integrated at the same genomic attP landing site, to exclude positional effects, and recombined with the nub-Gal4 driver. We then stained wing discs for Cas9 protein, imaged them under identical conditions and quantified the fluorescence intensity of the antibody staining (Figure 2B,C). nub-Gal4 UAS-uXSCas9 discs expressed high levels of Cas9 in the wing pouch (Figure 2B,C). Cas9 expression from plasmids that contained uORFs of increasing length was gradually reduced and was undetectable in nub-Gal4 UAS-uXLCas9 and nub-Gal4 UAS-uXXLCas9 wing discs (Figure 2B,C). We then crossed animals expressing Cas9 at different levels to the GC3Ai apoptosis sensor to test if lower expression levels would reduce Cas9-mediated toxicity. While nub-Gal4 UAS-uXSCas9 UAS-GC3Ai wing discs showed a strong increase in the number of apoptotic cells compared to nub-Gal4 UAS-GC3Ai discs, all other genotypes had a number of apoptotic cells similar to that of control animals (Figure 2B,D).
Furthermore, nub-Gal4 UAS-uXSCas9 animals were homozygous lethal at 25°C, while all other genotypes, containing longer uORFs, were homozygous viable without obvious phenotypes. Together this demonstrates that varying the length of uORFs is an effective strategy to systematically and predictably lower Cas9 expression levels and prevent toxicity in Drosophila.
Next, we asked which nub-Gal4 UAS-uCas9 variants still mediate efficient gene editing when paired with sgRNAs. To this end we crossed these animals to a previously described pCFD6-evi2x line (Port and Bullock, 2016). We then stained wing discs for endogenous Evi and quantified the amount of Evi staining in the nub-Gal4 expression domain. All genotypes led to a strong reduction of Evi protein in the nub-Gal4 domain compared to control animals, suggesting that they mediate efficient gene editing (Figure 2B,E). Surprisingly, even animals that expressed Cas9 from UAS-uXLCas9 or UAS-uXXLCas9 transgenes had multiple patches of evi mutant cells in all wing discs analysed (Figure 2B,E), despite the fact that in these genotypes Cas9 protein is undetectable by immunohistochemistry (Figure 2B,C). This highlights that mutagenesis can be induced by even minute amounts of Cas9 protein. All genotypes containing uORFs between 33bp and 159bp in length had comparable levels of evi mutagenesis, which typically led to a complete loss of Evi staining in the nub-Gal4 domain (Figure 2B,E).
Together, these experiments show that it is possible to substantially reduce Cas9 expression levels without compromising gene editing efficiency. Importantly, reducing the amount of Cas9 is sufficient to reduce the number of apoptotic cells to background levels and avoid organismal toxicity.
A toolbox for tissue-specific Cas9 expression
We focused our further efforts mainly on UAS-uMCas9, as our analysis suggests that in combination with Gal4 drivers that are either weaker or stronger than nub-Gal4 it would retain high activity and low toxicity. We created additional insertions of this plasmid at different attP landing sites to facilitate generation of Gal4 UAS-uMCas9 stocks (Figure 3A). We then started to build a collection of such fly lines, each having a Gal4 driver recombined on the same chromosome as the Cas9 transgene (Figure 3B). Such stocks can be crossed to transgenic sgRNA lines to induce conditional CRISPR mutagenesis in Gal4 expressing cells. We tested the spatial mutagenesis pattern for a number of novel Gal4 UAS-uMCas9 lines in the wing imaginal disc of third instar larvae (Figure 3C,D and Suppl. Fig. 4). To this end we crossed Gal4 UAS-uMCas9 flies to transgenic pCFD6-evi2x animals and visualized mutagenesis indirectly by examining Evi protein expression by immunohistochemistry. We also performed immunostaining for Cas9 protein, to visualize the Gal4 expression pattern at this stage. With some Gal4 drivers, such as ap-Gal4 (Fig. 3C) or cut-Gal4 (Suppl. Fig. 4), loss of Evi was only observed in cells that also express Cas9 in third instar wing discs. However, with several other Gal4 lines mutagenesis was also observed in cells that had no detectable Cas9 expression at this stage. For example, ptc-Gal4 UAS-uMCas9 expresses Cas9 in a narrow band of cells along the anterior-posterior boundary in third instar wing discs, but mutagenesis induced by this line is observed in most of the anterior compartment (Figure 3D). Mutagenesis in cells that have no detectable Cas9 in third instar discs was also observed with dpp-Gal4 and ser-Gal4 (Suppl. Fig. 4). Importantly, spatial mutagenesis patterns with these Gal4 lines were highly stereotyped, suggesting that they do not arise by leaky expression of CRISPR components (Suppl. Fig. 4).
Instead, they likely reflect either specific low level expression of Cas9 below the detection limit of immunohistochemistry or a broader expression of Gal4 at earlier developmental stages. This is expected to lead to broad mutagenesis patterns, as CRISPR mutagenesis can be readily observed at very low levels of Cas9 (see Figure 2) and causes permanent genetic alterations. The UAS-uCas9 series will be an ideal tool to modulate mutagenesis patterns in combination with Gal4 drivers that have graded expression levels. Harnessing Gal4 lines that show dynamic expression patterns during development for more tightly restricted mutagenesis will require additional regulatory mechanisms to temporally control Cas9 expression. This can be achieved through inhibition of Gal4 by a temperature-sensitive Gal80 repressor, as demonstrated by restriction of Notch mutagenesis to adult intestinal stem cells (Figure 1C, Suppl. Fig. 2). To allow temporal control at almost constant temperature we created a transgene that harbors a FRT-flanked GFP-Stop cassette between the UAS promoter and the uMCas9 expression cassette (UAS-FRT-GFP-FRT-uMCas9, Figure 3E). A brief pulse of Flp recombinase (from a hs-Flp transgene) can be used to excise the GFP-Stop cassette at the desired time and induce Cas9 expression. Using this approach Cas9 can be induced in all Gal4 expressing cells or only in a random subset, the latter approach resulting in fluorescently marked mosaics. Such mosaics can be a powerful method to analyze mutant and wildtype cells next to each other in the same tissue, as demonstrated by the mutagenesis of ct in the pouch of the wing imaginal disc (Figure 3E).
A large-scale transgenic sgRNA library
Next, we focused our efforts on the generation of a large-scale sgRNA resource to enable reverse-genetic CRISPR screens in defined cell types and developmental stages in Drosophila. Such screens require the use of negative controls, which should account for phenotypes arising from Cas9 expression and the induction of DNA damage. For this purpose we generated and validated three sgRNA lines targeting genes with highly restricted expression patterns (Figure 4A; Graveley et al., 2011), which can be used as negative controls in the majority of tissues where their target gene is not expressed. To allow systematic screening of functional gene groups we then designed sgRNAs against all Drosophila genes encoding transcription factors, kinases and phosphatases, as well as a number of other genes encoding fly orthologs of genes implicated in human pathologies (Figure 4B, see methods). We used CRISPR library designer (Heigwer et al., 2016) to compile a list of all possible sgRNAs without predicted off-target sites. We then selected sgRNAs depending on the position of their target site within the target gene. We chose sgRNAs targeting coding exons shared by all mRNA isoforms and located in the 5’ half of the open reading frame, where indel mutations are expected to have the largest functional impact. We then grouped sgRNAs in pairs, with each pair targeting sites typically separated by approximately 500 bp of coding sequence (Figure 4B). Next, we devised an efficient cloning protocol to insert defined sgRNA pairs into pCFD6. This utilized synthesized oligonucleotide pools, which allow cloning of hundreds to thousands of sgRNA plasmids in parallel in a single tube, followed by clonal selection of individual pCFD6-sgRNA2x plasmids and sequence validation (Figure 4C, see methods). This growing resource currently contains 2610 plasmids.
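The sgRNA selection criteria described above (no predicted off-targets, target site in an exon shared by all isoforms, location in the 5’ half of the open reading frame, and pairing at roughly 500 bp spacing) can be sketched in Python as follows. The `Guide` record, its field names and the `pick_pair` function are hypothetical and chosen for illustration; the sketch does not reproduce the CRISPR library designer software or the exact pairing procedure used here.

```python
from dataclasses import dataclass
from itertools import combinations

@dataclass(frozen=True)
class Guide:
    name: str
    cds_pos: int           # target position within the coding sequence (bp)
    in_all_isoforms: bool  # True if the targeted exon is shared by all isoforms
    off_targets: int       # number of predicted off-target sites

def pick_pair(guides, cds_len, spacing=500):
    """Pick the eligible guide pair whose spacing is closest to `spacing` bp."""
    eligible = [
        g for g in guides
        if g.off_targets == 0 and g.in_all_isoforms and g.cds_pos < cds_len / 2
    ]
    pairs = list(combinations(eligible, 2))
    if not pairs:
        return None  # fewer than two guides satisfy the criteria
    return min(pairs, key=lambda p: abs(abs(p[0].cds_pos - p[1].cds_pos) - spacing))

# Hypothetical candidates for a gene with a 1600 bp coding sequence.
candidates = [
    Guide("a", 100, True, 0),
    Guide("b", 600, True, 0),
    Guide("c", 250, True, 0),
    Guide("d", 900, True, 0),   # excluded: lies in the 3' half
    Guide("e", 610, True, 1),   # excluded: has a predicted off-target
]
best = pick_pair(candidates, cds_len=1600)
print([g.name for g in best])
```

Returning `None` when fewer than two guides qualify makes genes that cannot be covered under these criteria explicit rather than silently relaxing the filters.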
pCFD6-sgRNA plasmids were transformed into Drosophila using a pooled microinjection protocol (Bischof et al., 2013) and inserted at attP40 on the second chromosome (Figure 4D). Transgenic flies were genotyped to establish which sgRNA plasmid they carry and stable transgenic stocks were generated, which collectively we refer to as the ‘Heidelberg CRISPR Fly Design Library’ (short HD_CFD library). Currently, the library contains 1428 fly stocks targeting 1264 unique genes (Suppl. Table 1). These include 490/754 (65%) transcription factors, 181/230 (79%) protein kinases and 128/207 (62%) phosphatases (Figure 4E).
HD_CFD sgRNA lines mediate efficient mutagenesis and allow robust CRISPR screening
To test the on-target activity of HD_CFD sgRNA strains, we crossed a random selection of 28 HD_CFD lines to an act-cas9;;tub-Gal4/TM3 strain, which is expected to mediate ubiquitous mutagenesis in combination with active sgRNAs. We then extracted genomic DNA from flies expressing Cas9 and sgRNAs, amplified the genomic locus spanning the sgRNA target sites by PCR and sequenced the resulting amplicons. We performed this analysis by Sanger sequencing followed by Inference of CRISPR edits (ICE) analysis (Hsiau et al., 2019), as this frequently allowed both sgRNA target sites to be analysed on the same PCR amplicon, which is necessary to account for deletions between both sites. We found that the vast majority (26/28) of HD_CFD sgRNA lines resulted in gene editing on both target sites (Figure 5A). For 12/28 lines editing on both sites was inferred to be at least 50% and 23/28 reached this threshold on at least one target site. In contrast, only a single line (HD_CFD00032) resulted in no detectable gene editing at either sgRNA target site. This suggests that HD_CFD sgRNA lines mediate robust and efficient mutagenesis of target genes across the genome.
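The categories used above to summarise editing efficiency can be made explicit with a short Python sketch. The function `summarise_editing`, the line names and the editing fractions are all hypothetical toy values; they are not the ICE results reported in Figure 5A.

```python
def summarise_editing(lines, threshold=0.5):
    """Count lines per category; `lines` maps a name to (site 1, site 2) editing fractions."""
    summary = {"both_edited": 0, "both_above": 0, "one_above": 0, "none_edited": 0}
    for site1, site2 in lines.values():
        if site1 > 0 and site2 > 0:
            summary["both_edited"] += 1   # detectable editing at both target sites
        if site1 == 0 and site2 == 0:
            summary["none_edited"] += 1   # no detectable editing at either site
        if min(site1, site2) >= threshold:
            summary["both_above"] += 1    # both sites reach the threshold
        if max(site1, site2) >= threshold:
            summary["one_above"] += 1     # at least one site reaches the threshold
    return summary

# Hypothetical toy data, not measurements from this study.
toy = {
    "line_A": (0.9, 0.7),
    "line_B": (0.6, 0.2),
    "line_C": (0.0, 0.0),
    "line_D": (0.3, 0.0),
}
print(summarise_editing(toy))
```

Note that the categories overlap by construction: every line counted under `both_above` is also counted under `one_above`.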
Next, we performed a large-scale transgenic CRISPR screen. We crossed HD_CFD animals to act-cas9;;tub-Gal4/TM3 with the aim to induce mutations ubiquitously in the offspring and determined viability at five to seven days after eclosion. 290/639 (45%) of all crosses did not yield any viable offspring, while 269 (42%) lines produced viable adults and 53 (8%) lines resulted in lethality with incomplete penetrance (Figure 5B and Suppl. Table 2). In order to benchmark the performance of the screen, we manually annotated viability information based on genetic alleles stored in the Flybase database to determine which HD_CFD lines target genes known to be essential or non-essential during Drosophila development (see methods). This resulted in a list of 210 lines which target known essential genes and would be expected to result in lethality in combination with act-cas9;;tub-Gal4/TM3. Of those, 167 (79%) resulted in lethality, 20 (10%) were scored as semi-lethal, and 23 (11%) gave rise to viable adult offspring. Interestingly, there was a strong enrichment of genes known to play important roles and to be highly expressed during early embryonic development among the targets of sgRNA lines that produced false-negative results. Furthermore, sequencing the sgRNA target sites of some randomly selected false-negative lines revealed efficient gene editing on one or both sites in 3/3 lines (Suppl. Fig. 5), suggesting that false-negative results do not necessarily arise through inactive sgRNAs. Next, we analysed our data set for the occurrence of false positives, i.e. lines that target non-essential genes, but result in lethality. Among the 639 lines present in our screen, 54 target genes annotated as viable. Of those, 48 (89%) gave rise to viable adult offspring, one resulted in semi-lethal offspring and 5 (9%) produced no viable offspring.
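The benchmarking arithmetic above can be reproduced with a short Python sketch. The counts are taken from the text; the function name `rates` and the dictionary layout are choices made for illustration only.

```python
def rates(counts):
    """Convert outcome counts into fractions of the group total."""
    total = sum(counts.values())
    return {outcome: n / total for outcome, n in counts.items()}

# Lines targeting genes annotated as essential (210 lines in total).
essential = {"lethal": 167, "semi-lethal": 20, "viable": 23}
# Lines targeting genes annotated as viable (54 lines in total).
nonessential = {"viable": 48, "semi-lethal": 1, "lethal": 5}

for label, group in (("essential", essential), ("non-essential", nonessential)):
    print(label, sum(group.values()))
    for outcome, fraction in rates(group).items():
        print(f"  {outcome}: {fraction:.1%}")
```

In this framing, viable offspring from lines targeting essential genes correspond to false negatives, and lethality from lines targeting non-essential genes corresponds to false positives.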
False-positive results might arise due to off-target mutagenesis, mutations that affect neighboring genes or cis-elements located at the target-locus, or reflect incorrect annotations in the database. Of the five lines giving rise to false-positive results in our screen two target the same gene (Blos1), arguing against sgRNA-mediated off-target mutagenesis in this case. For the other three lines no second line with non-overlapping sgRNAs is currently available.
Screening for a well annotated phenotype such as lethality allowed us to benchmark our novel screening technology and revealed multiple lines targeting uncharacterized genes with putative essential functions (Suppl. Table 3). For example, sgRNA line HD_CFD558 targets CG9890, an evolutionarily conserved (55% amino acid similarity to the human ortholog) zinc finger protein of unknown function. Another interesting example is CG6470, which is targeted by HD_CFD557 and HD_CFD599 with independent sgRNAs. CG6470 encodes an uncharacterized zinc finger protein that despite its essential role is evolutionarily restricted to the genus Drosophila. These examples highlight the value of our lethality screen beyond benchmarking of our technology. However, lethality is a relatively undiscriminating phenotype, as animals mutant for genes performing unrelated functions often die with similar characteristics. Tissue-specific screens typically give rise to a richer set of phenotypic features and genes performing similar cellular functions often give rise to phenotypes with high similarity. To test this assumption we crossed several lines targeting genes associated with dpp/TGFb signaling with nub-Gal4 UAS-uMCas9 flies, which drive CRISPR mutagenesis in selected tissues, including cells giving rise to the adult wing. All these lines result in lethality in combination with a ubiquitous CRISPR system (Suppl. Table 2), but gave rise to viable adults in combination with nub-Gal4 UAS-uMCas9, highlighting the tight control of mutagenesis. Moreover, all lines resulted in offspring that had wings of abnormal size and morphology and faithfully recapitulated the known phenotypes of loss-of-function mutations of their target genes (Figure 5E). Together these results show that lines of the HD_CFD library can be used for systematic CRISPR screens in vivo and mediate relevant phenotypes with very high penetrance and specificity.
DISCUSSION
Here, we present a large-scale collection of transgenic sgRNA strains for conditional CRISPR mutagenesis in Drosophila. In combination with the associated toolbox of novel Cas9 constructs, the sgRNA lines mediate efficient mutagenesis with precise temporal and spatial control. This allows the rapid targeted disruption of genes in various contexts in the intact organism. Current tools used for this purpose are limited by incomplete abrogation of gene function, random nature of mutagenesis or lack of spatial and temporal control. The high performance of this resource relies on a) use of conditional sgRNA constructs to achieve a strict dependency of CRISPR mutagenesis on Gal4, b) tunable Cas9 expression to achieve high on-target activity with low toxicity, c) the use of two sgRNAs targeting independent positions in the same gene to increase the fraction of cells that harbor non-functional mutations in both alleles. We validate our library by conducting a fully transgenic CRISPR mutagenesis screen, to our knowledge the largest in any multicellular animal, which revealed 259 putative essential genes, of which 56 are poorly characterized.
Direct detection of induced mutations by sequencing and observation of the phenotypic outcome of mutagenesis in a large-scale screen both support the notion that the large majority of HD_CFD lines efficiently produce mutations at the target locus and induce specific phenotypes. Both assays suggest that the fraction of lines that produce no or only few on-target mutations is less than 10%, which compares favorably to current RNAi libraries available for conditional gene knock-down in Drosophila. However, the presence of mutations alone does not necessarily lead to the functional impairment of the target gene. CRISPR mutagenesis typically gives rise to small indels, which often do not impair the function of the encoded protein if they occur in-frame. While on average in-frame mutations occur with a frequency of only 33%, sequence microhomologies around the site of the induced DSB can lead to an overrepresentation of particular mutations (Allen et al., 2018; Shen et al., 2018). Furthermore, the effects of out-of-frame mutations have in some cases been shown to be ameliorated, for example by induced alternative splicing (Tuladhar et al., 2019) or genetic compensation (El-Brolosy et al., 2019; Ma et al., 2019). Lastly, cells with induced bi-allelic knock-out alleles might be removed by cell competition from tissues that also contain cells with wildtype alleles or functional mutations (Clavería and Torres, 2016). While all of these mechanisms are likely to exist in Drosophila and might limit CRISPR-induced phenotypes in some cases, the high penetrance and frequency of relevant phenotypes observed in the large-scale screen reported here suggest that they are not a major source of false-negative results in this context. This is likely at least in part due to the fact that our system uses two sgRNAs per target gene.
A striking feature of the transgenic CRISPR mutagenesis system is its high activity, with most sgRNA lines efficiently mediating mutagenesis and only minute amounts of Cas9 protein, below the detection limit of immunohistochemistry, required to induce DSBs. The small amount of Cas9 that is needed for efficient gene editing allows expression of Cas9 at low levels, which avoids toxicity without sacrificing on-target activity. Toxicity mediated by high levels of Cas9 has been previously described by us and others in Drosophila and has also been observed in various other species (Port et al., 2014; Yang et al., 2018; Jiang et al., 2014; Poe et al., 2019). It has recently been reported that Cas9 transgenic lines directly fused to tissue-specific enhancers can mediate conditional mutagenesis with reduced toxicity (Poe et al., 2019). However, the UAS-uCas9 system reported here has several advantages, such as the ability to harness the thousands of publicly available tissue-specific Gal4 lines, the possibility to modulate Cas9 expression levels independent of enhancer strength, and the availability of the temperature-sensitive Gal4 repressor Gal80 for additional temporal control of gene editing.
Our screen also suggests that CRISPR mutagenesis is relatively specific and causes only a small number of false-positive results. This is consistent with experiments in other in vivo models showing that Cas9-mediated cleavage often occurs with high precision in primary cells (Zuo et al., 2019). However, mutagenesis at spurious sites is likely to occur at some frequency, and other unintended consequences of Cas9-mediated DSBs, such as very large deletions and chromosomal rearrangements, have been documented in other systems (Kosicki et al., 2018). It is therefore important to control for potential artefacts, which in a screening setting can be best achieved by the use of a second sgRNA line targeting the same gene with independent sgRNAs. The HD_CFD library contains two independent sgRNA lines for some genes and other groups are in the process of creating similar and complementary resources (https://fgr.hms.harvard.edu/fly-in-vivo-crispr-cas; Meltzer et al., 2019). Eventually, candidates from CRISPR screens will need to be followed up by creating stable mutations by germline editing, which can be sequence-verified, backcrossed to remove unlinked secondary mutations and tested for complementation with a transgenic construct. Induction of mutations in the germline is highly efficient with our sgRNA strains, and UAS-uCas9 lines with low Cas9 expression will be valuable tools to allow the creation of heritable alleles even in genes that play essential roles in the germline itself.
While CRISPR and RNAi screens share a number of features, there also exist fundamental differences, which suggest that these are complementary tools for genome annotation. Most importantly, RNAi is limited by incomplete mRNA knock-down, and it has been shown that the majority of lines of a large-scale Drosophila RNAi library retain 25% or more of mRNA levels (Perkins et al., 2015). It is expected that this will frequently mask phenotypes, which non-functional mutations induced by CRISPR could reveal. However, CRISPR mutagenesis can be limited by other mechanisms, for example the induction of genetic mosaics and perdurance of pre-existing mRNAs. This latter point is illustrated by the fact that sgRNA lines giving false-negative results in our ubiquitous lethality screen are enriched for target genes that have high amounts of maternally contributed mRNA or have high expression levels during early embryogenesis. In the future, in circumstances where perdurance is limiting, CRISPR mutagenesis and RNAi might be used in combination to achieve yet more penetrant phenotypes.
Two decades after the publication of the genome sequence of humans, mice, flies, worms and many other organisms, the functional annotation of these genomes is still far from complete. CRISPR-Cas genome editing is accelerating the rate at which new gene functions are described. The resources described here will facilitate context-dependent functional genomics in Drosophila. New insights into the function of the fly genome will inform the functional annotation of the human genome, reveal conserved principles of metazoan biology and suggest control strategies for insect disease vectors.
MATERIAL AND METHODS
Plasmid construction
PCRs were performed with the Q5 Hot-start 2x master mix (New England Biolabs, NEB) and cloning was performed using the In-Fusion HD cloning kit (Takara Bio) or restriction/ligation dependent cloning. Newly introduced sequences were verified by Sanger sequencing. Oligonucleotide sequences are listed in Suppl. Table 4.
UAS-uCas9 plasmids
The UAS-uCas9 series of plasmids was generated using the pUASg.attB plasmid backbone (Bischof et al., 2013). The plasmid was linearized with EcoRI and XhoI and sequences coding for mEGFP(A206K) and hCas9-SV40 3’UTR were introduced by In-Fusion cloning using standard procedures. Coding sequences for mEGFP(A206K) were ordered as a gBlock from Integrated DNA Technologies (IDT) and amplified with primers mEGFPfwd and mEGFPrev (Suppl. Table 4). The sequence coding for SpCas9 and an SV40 3’UTR was PCR amplified from plasmid pAct-Cas9 (Port et al., 2014) with primers Cas9SV40fwd and Cas9SV40rev. Both PCR amplicons and the linearized plasmid backbone were assembled in a single reaction to generate plasmid UAS-uXXLCas9. UAS-uCas9 plasmids with shorter uORFs were generated by PCR amplification using UAS-uXXLCas9 as template and the common fwd primer uCas9fwd in combination with rev primers binding at various positions in the mEGFP ORF (uXSCas9rev for UAS-uXSCas9; uSCas9rev for UAS-uSCas9; uMCas9rev for UAS-uMCas9; uLCas9rev for UAS-uLCas9; uXLCas9rev for UAS-uXLCas9). PCR products were circularized by In-Fusion cloning and the sequence between the hsp70 promoter and the attP site was verified by Sanger sequencing. The UAS-uCas9 plasmid series and the full sequence of each plasmid will become available from Addgene (Addgene plasmids 127382-127387).
UAS-FRT-GFP-FRT-uMCas9
To generate UAS-FRT-GFP-FRT-uMCas9, plasmid UAS-Cas9.P2 (Port and Bullock, 2016) was digested with EcoRI and the plasmid backbone was gel purified. The FRT-GFP-FRT cassette was ordered as two separate gBlocks from IDT (GFPflipout5 and GFPflipout3), which were individually PCR amplified with primers GFPflipout5fwd and GFPflipout5rev or GFPflipout3fwd and GFPflipout3rev and gel purified. The two amplicons were mixed at equimolar ratios and fused by extension PCR, adding primers GFPflipout5fwd and GFPflipout3rev after 8 PCR cycles for an additional 25 cycles. The final FRT-GFP-FRT cassette was gel purified. The uMCas9EcoRI fragment was PCR amplified from plasmid UAS-uMCas9 with primers uMCas9EcoRIfwd and uMCas9EcoRIrev and gel purified. The plasmid backbone, FRT-GFP-FRT cassette and uMCas9EcoRI fragment were assembled by In-Fusion cloning and the sequence from the first FRT site to the end of Cas9 was verified by Sanger sequencing. The UAS-FRT-GFP-FRT-uMCas9 plasmid and the full sequence will become available from Addgene (Addgene plasmid 127388).
sgRNA design
All possible sgRNA sequences targeting all transcription factors, kinases, phosphatases and a number of other - mostly disease relevant - genes in the D. melanogaster genome version BDGP6 were identified using the CRISPR library designer (CLD) software version 1.1.2 (Heigwer et al., 2016). CLD excludes sgRNA sequences that have predicted off-target sites elsewhere in the genome. The resulting pool of sequences was further filtered according to additional criteria. Specifically, sequences with BbsI and BsaI restriction sites were excluded. In addition, sequences containing stretches of 4 or more identical nucleotides were removed from the pool. Two pairs of sgRNAs targeting each gene were then selected using a random sampling approach. For each gene, up to 10,000 pairs of sgRNA sequences were selected at random from the pool of available sequences. Each sequence pair was then evaluated according to a custom scoring function. In order to preferentially select sgRNA pairs that target constitutive exons, the scoring function awarded bonus points for each transcript targeted by either of the sgRNAs. Additional bonus points were given to sgRNAs targeting the first half of the gene, with a further bonus for short distances to the gene’s transcription start site. To avoid selecting pairs of overlapping sgRNAs that could potentially interfere with each other’s activity, sgRNA pairs that were less than 75 bp apart from each other were strongly penalized. Further, sgRNAs targeting the gene within 500 bp of each other were penalized. This was done to avoid functional protein products in cases where the second sgRNA might correct an out-of-frame mutation introduced by the first sgRNA. Finally, we penalized sgRNAs with predicted off-target effects according to CLD. The two top-scoring pairs for each gene were selected for the HD_CFD library.
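The filtering and pair-scoring steps described above can be illustrated with a minimal Python sketch. Only the restriction-site and homopolymer filters, the sampling of up to 10,000 pairs and the 75 bp and 500 bp distance thresholds are taken from the description. The `Guide` record, the function names and all numerical weights are assumptions made for illustration, since the actual scoring function and its parameters are not specified here.

```python
import random
import re
from dataclasses import dataclass
from itertools import combinations

# BbsI (GAAGAC) and BsaI (GGTCTC) recognition sites plus reverse complements.
RESTRICTION_SITES = ("GAAGAC", "GTCTTC", "GGTCTC", "GAGACC")
# Stretch of 4 or more identical nucleotides.
HOMOPOLYMER = re.compile(r"A{4,}|C{4,}|G{4,}|T{4,}")


@dataclass(frozen=True)
class Guide:
    seq: str            # protospacer sequence
    pos: int            # target position relative to the TSS (bp)
    n_transcripts: int  # number of annotated transcripts targeted
    off_targets: int    # predicted off-target sites (hypothetical field)


def passes_filters(guide):
    """Reject guides with BbsI/BsaI sites or homopolymer stretches."""
    seq = guide.seq.upper()
    if any(site in seq for site in RESTRICTION_SITES):
        return False
    return HOMOPOLYMER.search(seq) is None


def score_pair(a, b, gene_length):
    """Score one sgRNA pair; all weights are illustrative assumptions."""
    score = 0.0
    for g in (a, b):
        score += g.n_transcripts                 # favour constitutive exons
        if g.pos < gene_length / 2:
            score += 1.0                         # first half of the gene
        score += 1.0 - min(g.pos, gene_length) / gene_length  # close to TSS
        score -= 5.0 * g.off_targets             # predicted off-targets
    distance = abs(a.pos - b.pos)
    if distance < 75:
        score -= 100.0                           # strong penalty
    elif distance < 500:
        score -= 10.0                            # milder penalty
    return score


def pick_best_pairs(guides, gene_length, n_pairs=2, max_samples=10_000, seed=0):
    """Sample up to max_samples pairs at random and return the top scorers."""
    pool = [g for g in guides if passes_filters(g)]
    pairs = list(combinations(pool, 2))
    if len(pairs) > max_samples:
        pairs = random.Random(seed).sample(pairs, max_samples)
    pairs.sort(key=lambda p: score_pair(p[0], p[1], gene_length), reverse=True)
    return pairs[:n_pairs]
```

In this sketch the distance penalties are additive, so a pair that is closer than 75 bp can in principle still be selected if no better pair exists for a gene with few candidate sgRNAs; whether the original implementation behaves this way is not stated.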
sgRNA library cloning
sgRNA pairs were cloned into BbsI digested pCFD6 (Port and Bullock, 2016) following a two-step pooled cloning protocol. Oligonucleotide pools were ordered from Twist Biosciences and Agilent Technologies. Each oligonucleotide contained two sgRNA protospacer sequences targeting the same gene separated by a BsaI restriction cassette. Furthermore, oligos contained sequences at either end for PCR amplification and BbsI sites at the 5’ end of the first and 3’ end of the second protospacer. An annotated example oligo is shown in Suppl. Table 4. Oligo pools were resuspended in sterile dH2O and amplified by PCR with primers Libampfwd and Libamprev, followed by BbsI digestion and gel purification. Digested oligo pools were then ligated into BbsI digested pCFD6 plasmid backbone, transformed into chemically competent bacteria and plated on agarose plates containing Carbenicillin. After incubation overnight at 37°C transformed bacteria were resuspended and plasmid DNA was extracted and digested with BsaI. Next, the sgRNA core sequence and tRNA required between the two protospacers, but not encoded on the oligos, were introduced. These were PCR amplified from pCFD6 using primers Core_tRNAfwd and Core_tRNArev. PCR amplicons were digested with BsaI and ligated into the BsaI digested pCFD6 plasmid pool containing the library oligos, transformed into chemically competent bacteria and plated on agarose plates containing Carbenicillin. The next day single colonies were picked and used to inoculate liquid cultures. The following day plasmid DNA was extracted and the sgRNA cassette was sequenced with primer pCFD6seqfwd2 to determine which oligo was inserted and to verify the sequence. Individual sequence verified pCFD6-sgRNA2x plasmids were stored at −20°C and make up the HD_CFD plasmid library.
Drosophila strains and culture
Transgenic Drosophila strains used or generated in this study are listed in Suppl. Table X. Unless specified otherwise flies were kept at 25°C with 50±10% humidity with a 12h light/12h dark cycle.
Transgenesis
Transgenesis was performed with the PhiC31/attP/attB system and plasmids were inserted at landing site (P{y[+t7.7]CaryP}attP40) on the second chromosome. Additional insertions of UAS-uMCas9 were generated at (M{3xP3-RFP.attP}ZH-51D) on the second chromosome and (M{3xP3-RFP.attP}ZH-86Fb) and (PBac{y+-attP-3B}VK00033) on the third chromosome. Microinjection of plasmids into Drosophila embryos was carried out using standard procedures either in house, or by the Drosophila Facility, Centre for Cellular and Molecular Platforms, Bangalore, India (http://www.ccamp.res.in/drosophila) or by the Fly Facility, Department of Genetics, University of Cambridge, UK (www.flyfacility.gen.cam.ac.uk/). Transgenesis of sgRNA plasmids was typically performed by a pooled injection protocol, as previously described (Bischof et al., 2013). Briefly, individual plasmids were pooled at equimolar ratio and DNA concentration was adjusted to 250 ng/μl in dH2O. Plasmid pools were microinjected into y[1] M{vas-int.Dm}ZH-2A w[*]; (P{y[+t7.7]CaryP}attP40) embryos, raised to adulthood and individual flies crossed to P{ry[+t7.2]=hsFLP}1, y[1] w[1118]; Sp/CyO-GFP. Transgenic offspring was identified by orange eye color and individual flies crossed to P{ry[+t7.2]=hsFLP}1, y[1] w[1118]; Sp/CyO-GFP balancer flies. In the very rare case that a plasmid stably inserted at a genomic locus different than the intended attP40 landing site, this typically resulted in a noticeably different eye colouration and such flies were discarded.
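The pooling arithmetic for an injection mix can be sketched as follows. The sketch assumes that all library plasmids are of essentially the same size, so that equimolar pooling reduces to pooling equal masses; the function name, the mass taken per plasmid and the example stock concentrations are hypothetical. Only the final concentration of 250 ng/μl is taken from the text.

```python
def pooled_injection_mix(stock_ng_per_ul, mass_per_plasmid_ng=500.0,
                         target_ng_per_ul=250.0):
    """Return (volume of each stock in ul, volume of dH2O in ul).

    Assumes plasmids of equal size, so equal mass means equimolar.
    """
    # Volume of each stock that contains the chosen mass of plasmid.
    volumes = [mass_per_plasmid_ng / conc for conc in stock_ng_per_ul]
    total_mass = mass_per_plasmid_ng * len(stock_ng_per_ul)
    # Final volume at which the pooled DNA reaches the target concentration.
    final_volume = total_mass / target_ng_per_ul
    water = final_volume - sum(volumes)
    if water < 0:
        # The pool is already more dilute than the target concentration.
        raise ValueError("stocks too dilute for the target concentration")
    return volumes, water


# Example: two stocks at 500 and 1000 ng/ul, 500 ng of each plasmid.
volumes, water = pooled_injection_mix([500.0, 1000.0])
```

With these example values, 1.0 μl and 0.5 μl of the two stocks are combined with 2.5 μl dH2O, giving 4 μl at 250 ng/μl.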
Genotyping of sgRNA flies
Transgenic flies from pooled plasmid injections were genotyped to determine which plasmid was stably integrated into their genome. If transgenic flies were male or virgin female, animals were removed from the vials once offspring was apparent and prepared for genotyping. In the case of mated transgenic females, genotyping was performed in the next generation after selecting and crossing a single male offspring, to prevent genotyping females fertilised by a male transgenic for a different construct. Single flies were collected in PCR tubes containing 50 μl squishing buffer (10 mM Tris-HCl pH8, 1 mM EDTA, 25 mM NaCl, 200 μg/ml Proteinase K). Flies were disrupted in a Bead Ruptor (Biovendis) for 20 sec at 30 Hz. Samples were then incubated for 30 min at 37°C, followed by heat inactivation for 3 min at 95°C. 3 μl of supernatant was used in 30 μl PCR reactions with primers pCFD6seqfwd2 and pCFD6seqrev2. PCR amplicons were analysed by Sanger sequencing with primer pCFD6seqrev2.
Selection of lethal and viable target genes
Genes considered ‘known lethal’ or ‘known viable’ were chosen based on information available in FlyBase (release FB2018_1). For each gene report we manually reviewed the lethality information available in the phenotype category. We did not consider information based on RNAi experiments, as these typically were performed with tissue-restricted Gal4 drivers and residual expression might mask gene essentiality. Annotations of viability in FlyBase are heavily skewed towards lethal genes, likely reflecting the uncertainty in many cases whether a viable phenotype reflects residual gene activity of a particular allele.
Immunohistochemistry
Immunohistochemistry of wing imaginal discs was performed using standard procedures. Briefly, larvae were dissected in ice cold PBS and fixed in 4% paraformaldehyde in phosphate buffered saline (PBS) containing 0.05% Triton-X100 for 25 min at room temperature. Larvae were washed three times in PBS containing 0.3% Triton-X100 (PBT) and then blocked for 1h at room temperature in PBT containing 1% heat-inactivated normal goat serum. Subsequently, larvae were incubated with primary antibody (mouse anti-Cas9 (Cell Signaling) 1:800; mouse anti-Cut (DSHB, Gary Rubin) 1:30; guinea pig anti-Sens (Boutros lab, unpublished) 1:300; rabbit anti-Evi (Port et al., 2008) 1:800) in PBT overnight at 4°C. The next day, samples were washed three times in PBT for 15 min and incubated for 2 h at room temperature with secondary antibody (antibodies coupled to Alexa fluorophores, Invitrogen) diluted 1:600 in PBT containing Hoechst dye. Samples were washed three times for 15 min in PBT and mounted in Vectashield (Vectorlabs).
Image acquisition and processing
Images were acquired with a Zeiss LSM800, Leica SP5 or SP8 or a Nikon A1R confocal microscope in the sequential scanning mode. Samples that were used for comparison of antibody staining intensity were recorded in a single imaging session. Image processing and analysis were performed with FIJI (Schindelin et al., 2012). For the comparative analysis of anti-Cas9, GC3Ai and anti-Evi fluorescence intensities presented in Figure 2, raw image files were used to select the wing pouch area and measure the average fluorescence intensity.
Sequence analysis of CRISPR-Cas9 induced mutations
To determine the mutational status at each sgRNA target site, the locus was PCR amplified and PCR amplicons were subjected to sequencing. To extract genomic DNA, flies were treated as described above under ‘Genotyping of sgRNA flies’. Primers to amplify the target locus were designed to hybridize 250-300 bp 5’ or 3’ to the sgRNA target site and are listed in Suppl. Table 4. PCR products were purified using the PCR purification Kit (Qiagen) according to the manufacturer’s instructions and sent for Sanger sequencing. While Sanger sequencing is less accurate and quantitative than deep sequencing of amplicons on, for example, the Illumina platform, it typically allows both sgRNA targets to be covered on a single amplicon, which is necessary to account for mutations that result in deletions of the intervening sequence. In cases where this was not possible, for example due to the presence of a large intron between the target sites, each site was analysed on a separate PCR amplicon. To account for deletions in these cases, additional PCR reactions containing the distal fwd and rev primers were included. Sequencing chromatograms were visually inspected for sequencing quality and presence of the sgRNA target site and analysed by Inference of CRISPR Edits (ICE) analysis (Hsiau et al., 2019).
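The amplicon logic described above can be made concrete with a short sketch. It assumes a single cut position per sgRNA target site and a clean deletion that removes exactly the sequence between the two cut sites; real repair outcomes vary. The function name, the coordinate convention and the 1,000 bp cut-off for a single Sanger amplicon are illustrative assumptions, whereas the primer distance of 250-300 bp from the target site is taken from the text.

```python
def amplicon_plan(cut1, cut2, flank=250, max_single_amplicon=1000):
    """Expected amplicon lengths for the distal fwd and rev primers.

    cut1, cut2: coordinates (bp) of the two expected Cas9 cut sites.
    flank: distance (bp) from each cut site to the outer primer end.
    """
    left, right = sorted((cut1, cut2))
    fwd_start = left - flank    # outer end of the distal fwd primer
    rev_end = right + flank     # outer end of the distal rev primer
    wildtype_len = rev_end - fwd_start
    # A clean deletion removes the sequence between the two cut sites.
    deletion_len = wildtype_len - (right - left)
    return {
        "wildtype_len": wildtype_len,
        "deletion_len": deletion_len,
        # False suggests one amplicon per site plus a distal-primer PCR.
        "single_amplicon": wildtype_len <= max_single_amplicon,
    }


close_sites = amplicon_plan(1000, 1300)
intron_between = amplicon_plan(5000, 1000)
```

For target sites 300 bp apart the wild-type product is 800 bp and a clean deletion yields 500 bp, so both can be read on one amplicon. For sites separated by 4 kb the wild-type product exceeds the assumed cut-off, but the distal primer pair still yields a 500 bp product when the intervening sequence is deleted, which is the rationale for the additional PCR reactions.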
Author Contributions
F.P. conceived and supervised the study, performed and analysed experiments and wrote the paper; C.S., J.F., B.P. and K.R. performed experiments; B.R. and F.H. designed the sgRNA library; C.S., M.S., C.B., A.H., K.K., R.M., L.S. and L.V. generated the sgRNA library; E.V. generated essential IT infrastructure; M.B. conceived and supervised the study, acquired funding and wrote the paper.
Material Availability
All materials are available upon request. Transgenic fly lines will be distributed through a public stock center. Cas9 plasmids will be made available through Addgene (Addgene plasmids 127382-127388).
Acknowledgements
We would like to thank Lelia Wagner for technical assistance and David Ish-Horvitz, Tony Southall and Norbert Perrimon for discussions. We would like to thank Sandra Müller (Teleman lab, DKFZ) for microinjections and Kadri Oras and Simon Collier (Fly Facility, University of Cambridge) and Deepti Trivedi Vyas (Drosophila Facility, NCBS, Bangalore) for Drosophila transgenesis. We are grateful for support by the DKFZ Light Microscopy Core Facility, the Zeiss Application Center at DKFZ and the Nikon Imaging Center at Heidelberg University. Work in the lab of M.B. is in part supported by grants from the European Research Council (ERC) and the DFG (SFB/TRR186, SFB1324).