Abstract
Site-specific insertion of DNA into endogenous genes (knock-in) is a powerful method to study gene function. However, traditional methods for knock-in require laborious cloning of long homology arms for homology-directed repair. Here, we report a simplified method in Drosophila melanogaster to insert large DNA elements into any gene using homology-independent repair. This method, known as CRISPaint, employs CRISPR-Cas9 and non-homologous end joining (NHEJ) to linearize and insert donor plasmid DNA into a target genomic cut site. The inclusion of commonly used elements such as GFP on donor plasmids makes them universal, abolishing the need to create gene-specific homology arms and greatly reducing user workload. Using this method, we show robust gene-specific integration of donor plasmids in cultured cells and the fly germ line. Furthermore, we use this method to analyze gene function by fluorescently tagging endogenous proteins, disrupting gene function, and generating reporters of gene expression. Finally, we assemble a collection of donor plasmids for germ line knock-in that contain commonly used insert sequences. This method simplifies the generation of site-specific large DNA insertions in Drosophila cell lines and fly strains, and better enables researchers to dissect gene function in vivo.
Summary We report a new homology-independent genomic knock-in method in Drosophila to insert large DNA elements into any target gene. Using CRISPR-Cas9 and non-homologous end joining (NHEJ), an entire donor plasmid is inserted into the genome without the need for homology arms. This approach eliminates the burden associated with designing and constructing traditional donor plasmids. We demonstrate its usefulness in cultured cells and in vivo to fluorescently tag endogenous proteins, generate reporters of gene expression, and disrupt gene function.
Introduction
Insertion of DNA into the animal genome is a powerful method to study gene function. This approach is multipurpose, and can be used to visualize protein localization (Crivat and Taraska 2012; Hasse et al. 2016; Kanca et al. 2017), to disrupt gene function (Housden et al. 2017), to assay gene expression (Brand and Perrimon 1993; Ivics et al. 2009; Bouabe and Okkenhaug 2013), or to purify endogenous proteins (Kimple et al. 2013). Furthermore, the ability to insert large DNA elements such as promoters, protein coding sequences, or entire genes into the genome offers researchers endless options for genome modification. Drosophila melanogaster is an excellent animal model to analyze gene function because of its many genetic tools, fast generation time, and amenability to in vivo analysis (Venken et al. 2016; Korona et al. 2017; Bier et al. 2018).
The two most commonly used methods in Drosophila to knock-in DNA into endogenous genes involve either transposable elements or homology-directed repair (HDR). Transposable DNA elements are inserted randomly into the genome by a transposase enzyme (Bellen et al. 2011), and cannot be used to target a user-specified gene. In contrast, HDR is used to insert DNA into a specific genomic location by cleavage at the genomic locus and precise homologous recombination of the DNA insert into the genome (Bier et al. 2018). Circular plasmids are commonly used as donor DNA for HDR because they can carry a large DNA insert (≤10kb), and homology arms corresponding to the target locus are added by traditional cloning techniques. While HDR is a useful method, the design and construction of unique plasmid donors for each gene is laborious. As a cloning-free alternative, synthesized single-stranded DNA (ssDNA) with short homology arms (∼50-100 bp each) (Bier et al. 2018) or long ssDNAs of ≤2kb (Quadros et al. 2017) can be used as donors. However, ssDNA donors are limited to relatively small insertions such as epitope tags, and, like HDR plasmid donors, must be designed and produced for each gene that is targeted. Therefore, there is a need for easier, faster, and cheaper alternatives to knock-in large DNA elements into the Drosophila genome.
It was recently shown that large DNA elements could be knocked into a specific target locus without homology arms, known as homology-independent insertion (Cristea et al. 2013; Maresca et al. 2013; Auer et al. 2014; Katic et al. 2015; Lackner et al. 2015; Schmid-Burgk et al. 2016; Suzuki et al. 2016; Katoh et al. 2017). In this method, simultaneous cutting of a circular donor plasmid and a genomic target-site by a nuclease such as Cas9 results in integration of the linearized donor plasmid into the genomic cut site by non-homologous end joining (NHEJ). This removes the need to construct homology arms, and only requires cloning or synthesizing a gene-specific single guide RNA (sgRNA). Furthermore, this approach is modular, since donor plasmids containing common insert sequences (e.g. GFP) can be targeted to different genomic locations and are thus “universal”. Generating knock-ins by homology-independent insertion has been successfully applied in human cell lines (Cristea et al. 2013; Maresca et al. 2013; Lackner et al. 2015; Schmid-Burgk et al. 2016; Katoh et al. 2017), mouse somatic cells (Suzuki et al. 2016), zebrafish (Auer et al. 2014), and C. elegans (Katic et al. 2015). However, this approach has not yet been applied in Drosophila.
Here, we show that homology-independent insertion functions effectively in Drosophila by using the CRISPaint method. We first characterize this method in cultured S2R+ cells, showing that a universal mNeonGreen donor plasmid can be used to fluorescently tag endogenous proteins at their C-terminus. We then demonstrate that this approach works in vivo, using a universal T2A-Gal4 donor plasmid in the fly germ line to obtain fly lines with insertions in a number of characterized genes. We show that these insertions can be used as expression reporters for the target gene and to generate loss-of-function phenotypes. Finally, we present a collection of different universal donor plasmids for the purpose of enabling the Drosophila research community to employ this method for their specific uses.
Materials and Methods
Plasmid cloning
pCFD3-frame_selector_(0, 1, or 2) plasmids (Addgene #s 127553-127555) were cloned by ligating annealed oligos encoding sgRNAs that target the CRISPaint target site (Schmid-Burgk et al. 2016) into pCFD3 (Port et al. 2014), which contains the Drosophila U6:3 promoter.
Additional sgRNA-encoding plasmids were generated by the TRiP (https://fgr.hms.harvard.edu/) or obtained from Filip Port (Port et al. 2015). sgRNA plasmids targeting CDS close to the stop codon were GP07595 (Act5c), GP07596 (His2Av), GP07609 (alphaTub84B), and GP07612 (Lam). sgRNA plasmids targeting CDS close to the start codon were GP06461 (wg), GP02894 (FK506-bp2), GP05054 (alphaTub84B), GP00225 (esg), GP00364 (Myo1a), GP00400 (btl), GP00583 (Mhc), GP01881 (hh), GP03252 (Desat1), GP05302 (ap), pFP545 (ebony), and pFP573 (ebony). These sgRNAs were cloned into pCFD3, with the exception of those targeting esg, Myo1a, btl, and Mhc, which were cloned into pl100 (Kondo and Ueda 2013).
pCRISPaint-T2A-Gal4-3xP3-RFP (Addgene # 127556) was constructed using Gibson assembly (E2611, NEB) of three DNA fragments: 1) Gal4-SV40-3xP3-RFP, PCR amplified from pHD-Gal4-DsRed (Xu et al. 2019 submitted PNAS; Gratz et al. 2014); 2) a linear plasmid backbone generated by digesting pWalium10-roe (Perkins et al. 2015) with AscI/SacI; and 3) a synthesized double-stranded DNA fragment (gBlock, IDT) encoding the CRISPaint target site, linker sequence, T2A, and ends that overlap the other two fragments.
pCRISPaint-T2A-ORF-3xP3-RFP donor plasmids (Addgene #s 127557-127565) were cloned by PCR amplifying the ORFs and Gibson cloning into pCRISPaint-T2A-Gal4-3xP3-RFP cut with NheI/KpnI. ORF sequences were amplified from templates as follows: sfGFP [amplified from pUAS-TransTimer (He et al. 2019 submitted)], LexGAD [amplified from pCoinFLP-LexGAD/Gal4 (Bosch et al. 2015)], QF2 (amplified from Addgene #80274), Cas9-T2A-GFP (amplified from a template kindly provided by Raghuvir Viswanatha), FLPo (amplified from Addgene #24357), Gal80 (amplified from Addgene #17748), Nluc (amplified from Addgene #62057), Gal4DBD (amplified from Addgene #26233), and p65 (amplified from Addgene #26234).
pCRISPaint-sfGFP-3xP3-RFP (Addgene # 127566) was cloned by PCR amplifying sfGFP coding sequence and Gibson cloning into pCRISPaint-T2A-Gal4-3xP3-RFP cut with NotI/KpnI.
pCRISPaint-CRIMIC_phase_(0,1, or 2) (Addgene #s 127567-127569) donor plasmids were cloned by ligating annealed oligos containing the CRISPaint target site into CRIMIC [pM37, (Lee et al. 2018)] (frames 0,1,2) cut with NsiI.
pCRISPaint-TGEM_phase_(0,1,or 2) (Addgene #s 127570-127572) donor plasmids were cloned by ligating annealed oligos containing the CRISPaint target site into T-GEM (Diao et al. 2015) (frames 0,1,2) cut with AgeI/NotI.
See Supplemental Table 6 for oligo and dsDNA sequences and Addgene for plasmid sequences.
Cell culture
Drosophila S2R+ cells stably expressing Cas9 and a mCherry protein trap in Clic (known as PT5/Cas9) (Viswanatha et al. 2018) were cultured at 25°C using Schneider’s media (21720-024, ThermoFisher) with 10% FBS (A3912, Sigma) and 50 U/ml penicillin-streptomycin (15070-063, ThermoFisher). S2R+ cells were transfected using Effectene (301427, Qiagen) following the manufacturer’s instructions. Plasmid mixes were composed of sgRNA-expressing plasmids (see above) and pCRISPaint-mNeon-PuroR (Schmid-Burgk et al. 2016). Cells were transfected with plasmid mixes in 6-well dishes at 1.8×10⁶ cells/ml, split at a dilution of 1:6 after 3-4 days, and incubated with 2 µg/ml Puromycin (540411 Calbiochem). Every 3-5 days, the media was replaced with fresh media containing Puromycin until the cultures became confluent (∼12-16 days). For single-cell cloning experiments, cultures were split 1:3 two days before sorting. Cells were resuspended in fresh media, triturated to break up cell clumps, and pipetted into a cell-straining FACS tube (352235 Corning). Single cells expressing mNeonGreen were sorted into single wells of a 96-well plate containing 50% conditioned media and 50% fresh media using an Aria-594 instrument at the Harvard Medical School Division of Immunology’s Flow Cytometry Facility. Once colonies were visible by eye (3-4 weeks), they were expanded and screened for mNeonGreen fluorescence.
Fly genetics and embryo injections
Flies were maintained on standard fly food at 25°C. Fly stocks were obtained from the Perrimon lab collection or Bloomington Stock center (indicated with BL#). Stocks used in this study are as follows: yw (Perrimon Lab), yw/Y hs-hid (BL8846), yw; nos-Cas9attP40/CyO (derived from BL78781), yw;; nos-Cas9attP2 (derived from BL78782), yw; Sp hs-hid/CyO (derived from BL7757), yw;; Dr hs-hid/TM3,Sb (derived from BL7758), UAS-2xGFP (BL6874), wg1-17/CyO (BL2980), wg1-8/CyO (BL5351), Df(2L)BSC291/CyO (BL23676), Mhc[k10423]/CyO (BL10995), Df(2L)H20/CyO (BL3180), Df(2L)ED8142/SM6a (BL24135), hh[AC]/TM3 Sb (BL1749), Df(3R)ED5296/TM3, Sb (BL9338), esgG66/CyO UAS-GFP (BL67748), Df(2R)Exel6069/CyO (BL7551).
For embryo injections, each plasmid was column purified (Qiagen) twice, eluted in injection buffer (100 µM NaPO4, 5 mM KCl), and adjusted to 200ng/µl. Plasmids were mixed equally by volume, and mixes were injected into Drosophila embryos using standard procedures. For targeting genes on Chr. 2, plasmid mixes were injected into yw;; nos-Cas9attP2 embryos. For targeting genes on Chr. 3, plasmid mixes were injected into yw; nos-Cas9attP40/CyO embryos. Approximately 500 embryos were injected for each targeted gene.
Injected G0 flies were crossed with yw. We used yw/Y hs-hid to facilitate collecting large numbers of virgin flies by incubating larvae and pupae at 37°C for 1hr. G1 flies were screened for RFP expression in the adult eye on a Zeiss Stemi SVII fluorescence microscope. G1 RFP+ flies were crossed with the appropriate balancer stock (yw; Sp hs-hid/CyO or yw;; Dr hs-hid/TM3,Sb). G2 RFP+ males that were yellow- (to remove the nos-Cas9 transgene) and balancer+ were crossed to virgins of the appropriate balancer stock (yw; Sp hs-hid/CyO or yw;; Dr hs-hid/TM3,Sb). G3 larvae and pupae were heat shocked at 37°C for 1hr to eliminate the hs-hid chromosome, generating a balanced stock (e.g. yw; [RFP+]/CyO).
Imaging
S2R+ cells expressing mNeonGreen were plated into wells of a glass-bottom 384-well plate (6007558, PerkinElmer). For fixed cell images, cells were incubated with 4% paraformaldehyde for 30min, washed three times for 5min each with PBS containing 0.1% Triton X-100 (PBT), stained with 1:1000 DAPI (D1306, ThermoFisher) and 1:1000 phalloidin-TRITC (P1951, Sigma), and washed with PBS. Plates were imaged on an IN Cell Analyzer 6000 (GE) using a 20x or 60x objective. Time-lapse videos of live mNeonGreen-expressing single-cell cloned lines were obtained by taking an image every minute using a 60x objective. Images were processed using Fiji software.
Wing imaginal discs from 3rd instar larvae were dissected in PBS, fixed in 4% paraformaldehyde, and permeabilized in PBT. For Wg staining, carcasses were blocked for 1hr in 5% normal goat serum (S-1000, Vector Labs) at room temp, and incubated with 1:50 mouse anti-wg (4D4, DSHB) primary antibody and 1:500 anti-mouse 488 (A-21202, Molecular Probes) secondary antibody. Primary and secondary antibody incubations were performed at 4°C overnight. All carcasses were stained with DAPI and phalloidin-TRITC, and mounted on glass slides with vectashield (H-1000, Vector Laboratories Inc.) under a coverslip. Images of mounted wing discs were acquired on a Zeiss 780 confocal microscope.
Larvae, pupae, and adult flies were imaged using a Zeiss Axio Zoom V16 fluorescence microscope.
Quantification of mNeonGreen expressing S2R+ cells
For FACS-based cell counting, we collected cultures from each gene knock-in experiment before and after puromycin selection. Pre-selection cultures were obtained by collecting 500µl of culture 3-4 days after transfection. Post-selection cultures were obtained after at least 2 weeks of puromycin incubation. Non-transfected cells were used as a negative control. 100,000 cells were counted for each sample and FlowJo software was used to analyze and graph the data. FSC-A vs GFP-A was plotted and we defined mNeonGreen+ cells by setting a signal intensity threshold at which <0.02% of negative control cells are counted as positive due to autofluorescence.
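The gating rule described above can be illustrated with a minimal Python sketch. It is a one-dimensional simplification and not the FlowJo procedure used in this study: it assumes that per-cell GFP-A intensities have been exported as plain lists of numbers, and the function names `gate_threshold` and `percent_positive` are hypothetical.

```python
def gate_threshold(control_intensities, max_per_10000=2):
    """Return an intensity cutoff such that fewer than `max_per_10000`
    in 10,000 negative-control cells (0.02% by default) lie strictly above it."""
    n = len(control_intensities)
    if n == 0:
        raise ValueError("need at least one control cell")
    if not 1 <= max_per_10000 <= 10000:
        raise ValueError("max_per_10000 must be between 1 and 10000")
    # Largest count k with k / n < max_per_10000 / 10000, using integer math
    # to avoid floating-point rounding at the boundary.
    allowed = (max_per_10000 * n - 1) // 10000
    ranked = sorted(control_intensities, reverse=True)
    # At most `allowed` control cells are strictly brighter than ranked[allowed].
    return ranked[allowed]


def percent_positive(sample_intensities, threshold):
    """Percentage of cells in a sample whose intensity exceeds the cutoff."""
    if not sample_intensities:
        raise ValueError("need at least one cell")
    positive = sum(1 for value in sample_intensities if value > threshold)
    return 100.0 * positive / len(sample_intensities)
```

With 100,000 control cells, as counted per sample here, the cutoff is the 20th-brightest control value, so at most 19 control cells (0.019%) fall above it.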
For microscopy-based cell counting, the number of mNeonGreen cells was quantified by analyzing confocal images in Fiji using the manual Cell Counter Plugin (model). For transfected cells, 6 fields containing at least 200 cells were quantified (i.e. n=6). For puro-selected cells, 3 fields containing at least 200 cells were quantified (i.e. n=3).
Western blotting
Single cell-cloned cell lines were grown until confluent and 1ml of resuspended cells was centrifuged at 250g for 10min. The cell pellet was resuspended in 1ml ice cold PBS, re-centrifuged, and the pellet was lysed in 250µl 2x SDS-Sample buffer and boiled for 5min. 10µl of lysate was loaded on a 4-20% Mini-Protean TGX SDS-PAGE gel (4561096, BioRad) and transferred to PVDF membrane (IPFL00010, Millipore). Membranes were blocked in 5% non-fat dry milk, incubated with primary antibody, either anti-mNeonGreen (1:1000, ChromoTek 32F6) or hFAB™ Rhodamine Anti-Actin (12004164 BioRad), and then with 1:3000 anti-mouse HRP secondary antibody (NXA931, Amersham). Blots were imaged using ECL (34580, ThermoFisher) on a ChemiDoc MP Imaging System (BioRad).
PCR, sequencing, and sgRNA cutting assays
S2R+ cell genomic DNA was isolated using QuickExtract (QE09050, Lucigen). Fly genomic DNA was isolated by grinding a single fly in 50µl squishing buffer (10 mM Tris-Cl pH 8.2, 1 mM EDTA, 25 mM NaCl) with 200µg/ml Proteinase K (3115879001, Roche), incubating at 37°C for 30 min, and 95°C for 2 minutes. PCR was performed using Taq polymerase (TAKR001C, ClonTech) when running DNA fragments on a gel, and Phusion polymerase (M-0530, NEB) was used when DNA fragments were sequenced. DNA fragments corresponding to mNeonGreen or T2A-Gal4 insertion sites were amplified using primer pairs where one primer binds to genomic sequence and the other primer binds to the insert. For amplifying non-knock-in sites, we used primers that flank the sgRNA target site. Primer pairs used for gel analysis and/or Sanger sequencing were designed to produce DNA fragments <1kb. Primer pairs used for next-generation sequencing of the insertion site were designed to produce DNA fragments 200-280bp. DNA fragments were run on a 1% agarose gel for imaging or purified on QIAquick columns (28115, Qiagen) for sequencing analysis. See Supplemental Table 6 for oligo sequences.
Sanger sequencing was performed at the DF/HCC DNA Resource Core facility and chromatograms were analyzed using Lasergene 13 software. Next-generation sequencing was performed at the MGH CCIB DNA Core. Fastq files were analyzed using CRISPResso2 (Clement et al. 2019) by entering the PCR fragment sequence into the exon specification window and setting the window size to 10 bases. Quantification of insertion types (seamless, in-frame in/del, and frameshift in/del) was taken from the allele plot and frameshift analysis outputs of CRISPResso2. The small proportion of “unmodified” reads that were not called by frameshift analysis were not included in the quantification.
T7 endonuclease assays (M0302L, NEB) were performed following the manufacturer’s instructions.
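The three insertion categories can be made concrete with a short Python sketch. This is not CRISPResso2 code and does not reproduce its output format; it assumes that a seamless junction is in frame by design, so that any in/del whose net length change is a multiple of 3 preserves the frame. The function name `classify_junction` is hypothetical.

```python
from collections import Counter


def classify_junction(deleted_bp, inserted_bp):
    """Classify a genome-donor junction by the bases lost and gained at the cut.

    Assumes the seamless junction is in frame, so the frame is preserved
    whenever the net length change is divisible by 3.
    """
    if deleted_bp < 0 or inserted_bp < 0:
        raise ValueError("base counts must be non-negative")
    if deleted_bp == 0 and inserted_bp == 0:
        return "seamless"
    net_change = inserted_bp - deleted_bp
    return "in-frame in/del" if net_change % 3 == 0 else "frameshift in/del"


# Tally hypothetical (deleted, inserted) junctions from a set of reads.
example_junctions = [(0, 0), (3, 0), (1, 0), (0, 2), (6, 3)]
tally = Counter(classify_junction(d, i) for d, i in example_junctions)
```

In this made-up example the tally is one seamless read, two in-frame in/dels, and two frameshift in/dels.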
Data availability
Donor plasmids and frame selector sgRNA plasmids will be deposited at Addgene. Fly strains, S2R+ cell lines, and sequence data are available on request. Oligo and dsDNA sequences are listed in Supplemental Table 6.
Results
To test homology-independent knock-in in Drosophila, we implemented a strategy known as CRISPaint (Schmid-Burgk et al. 2016). This system is used to insert a protein tag or reporter gene into the coding sequence of an endogenous gene. Although it was originally designed for mammalian cell culture, CRISPaint has several advantages for use in Drosophila. First, this system uses CRISPR-Cas9 to induce double-strand breaks (DSBs), which is known to function efficiently in Drosophila cultured cells (Bottcher et al. 2014; Viswanatha et al. 2018) and the germ line (Kondo and Ueda 2013; Ren et al. 2013; Yu et al. 2013; Bassett et al. 2014). Second, its use of a frame-selector gRNA target site makes insertion into the appropriate translation frame simple and modular (see below). Third, a collection of existing CRISPaint donor plasmids (Schmid-Burgk et al. 2016) containing common tags (e.g. GFP, RFP, Luciferase) are seemingly compatible for expression in Drosophila.
The CRISPaint system works by introducing three components into Cas9-expressing cells: 1) a single guide RNA (sgRNA) targeting a genomic locus; 2) a donor plasmid containing an insert sequence; and 3) a frame-selector sgRNA targeting the donor plasmid (Figure 1A). This causes simultaneous cleavage of the genomic locus and donor plasmid, leading to the integration of linearized donor into the genomic cut site by non-homologous end joining (NHEJ). To ensure that the donor plasmid inserts in-frame with the endogenous gene, one of three frame-selector sgRNAs is used. Importantly, these frame-selector sgRNAs do not target the Drosophila genome.
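As a rough illustration of why three frame selectors are needed, the reading-frame phase at a genomic cut depends on how many coding nucleotides lie upstream of it, and only three phases are possible. The Python sketch below assumes that Cas9 cuts 3 bp upstream of the PAM, that the 20-nt protospacer lies on the sense strand within a single exon, and that positions are 0-based indices in the spliced coding sequence. The function names are hypothetical, and the mapping from phase to a particular frame-selector plasmid (0, 1, or 2) is not encoded here; it should be checked against the donor sequence.

```python
PROTOSPACER_LENGTH = 20   # nucleotides 5' of the PAM
CUT_OFFSET_FROM_PAM = 3   # assumed cut position: 3 bp upstream of the PAM


def cds_bases_upstream_of_cut(protospacer_start):
    """Number of coding bases 5' of the cut for a sense-strand protospacer.

    `protospacer_start` is the 0-based coding-sequence index of the first
    protospacer base.
    """
    if protospacer_start < 0:
        raise ValueError("protospacer_start must be non-negative")
    return protospacer_start + PROTOSPACER_LENGTH - CUT_OFFSET_FROM_PAM


def cut_phase(protospacer_start):
    """Codon phase at the cut: 0 = between codons, 1 or 2 = within a codon."""
    return cds_bases_upstream_of_cut(protospacer_start) % 3
```

For example, a protospacer starting at coding index 10 leaves 27 coding bases upstream of the cut, so the cut falls between codons (phase 0), whereas one starting at index 0 leaves 17 bases (phase 2).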
Homology-independent insertion functions efficiently in Drosophila S2R+ cells to produce endogenous protein tags
To test if the CRISPaint method functions in Drosophila, we set out to replicate the findings of Schmid-Burgk et al. (2016) in cultured S2R+ cells by genetically tagging endogenous proteins at their C-terminus. To accomplish this, we generated plasmids expressing frame-selector sgRNAs (frame 0, 1, or 2) under the control of Drosophila U6 sequences (Port et al. 2015) (Figure 1A). In addition, we generated plasmids expressing sgRNAs that target the 3’ coding sequence of endogenous Drosophila genes. We chose to target Actin5c, His2Av, alphaTub84B, and Lamin because these genes are expressed in S2R+ cells (Hu et al. 2017) and encode proteins with known subcellular localization (actin filaments, chromatin, microtubules, and nuclear envelope, respectively). As the donor plasmid, we used pCRISPaint-mNeonGreen-T2A-PuroR (Schmid-Burgk et al. 2016), which contains a frame-selector sgRNA target site upstream of coding sequence for the fluorescent mNeonGreen protein and Puromycin resistance protein (PuroR) linked by a cleavable T2A peptide sequence. Importantly, only integration of the donor plasmid in-frame with the target gene coding sequence will result in translation of mNeonGreen-T2A-PuroR (Figure 1A).
We transfected Cas9-expressing S2R+ cells (Viswanatha et al. 2018) with a mix of three plasmids: pCRISPaint-mNeonGreen-T2A-PuroR donor, target-gene sgRNA, and the appropriate frame-selector sgRNA (Figure 1A, Supplemental Table 1). As an initial method to detect knock-in events, we used PCR to amplify the predicted insertion sites from transfected cells. Using primers that are specific to the target gene and mNeonGreen sequence, we successfully amplified gene-mNeonGreen DNA fragments for all four genes (Figure 1B). Furthermore, next-generation sequencing of these amplified fragments revealed that 34-50% of sense-orientation insertions are in frame with the target gene (Figure 1B, Supplemental Figure 1, Supplemental Table 1).
Next, we measured mNeonGreen fluorescence in transfected S2R+ cells as a more direct method of quantifying the frequency of in-frame knock-ins. Flow cytometry-based cell counting of transfected cells revealed that the percentage of mNeonGreen+ cells ranges from 0.19-2.4% (Figure 1C, Supplemental Table 1), in agreement with published results in human cultured cells (Schmid-Burgk et al. 2016). These results were confirmed by confocal analysis of transfected cells, which showed mNeonGreen fluorescence in a small subset of cells (Figure 1C). Analysis of confocal images of Act5c and His2Av samples showed that 3.2% and 2.4% of transfected cells, respectively, expressed mNeonGreen (Figure 1C, Supplemental Table 1), which roughly agreed with flow cytometry cell counting. Finally, mNeonGreen localized to the expected subcellular compartments, most obviously observed by His2Av-mNeonGreen and Lam-mNeonGreen co-localization with the nucleus, and Act5c-mNeonGreen and alphatub-mNeonGreen exclusion from the nucleus (Figure 1C). These results suggest that a significant number of transfected S2R+ cells received an in-frame insertion of mNeonGreen at the C-terminus of the target protein using the CRISPaint homology-independent insertion method.
For knock-in cells to be useful in experiments, it is important to derive cultures where most cells, if not all, carry the insertion. Therefore, we enriched for in-frame insertion events using Puromycin selection (Figure 1D). After a two-week incubation of transfected S2R+ cells with Puromycin, flow-cytometry and confocal analysis revealed that most cells express mNeonGreen and exhibit correct subcellular localization (Figure 1E, Supplemental Table 1). For alphaTub84B, cell counting by flow-cytometry greatly underestimated the number of mNeonGreen+ cells counted by confocal analysis, likely because the mNeonGreen expression level was so low. Therefore, Puromycin selection is a fast and efficient method of selecting for mNeonGreen expressing knock-in cells after transfection.
A subset of cells in Puro-selected cultures had no mNeonGreen expression or unexpected localization (Figure 1E). Since each culture is composed of different cells with independent insertion events, we used FACS to derive single-cell cloned lines expressing mNeonGreen for further characterization (Figure 2A). At least 14 single-cell cloned lines were isolated for each target gene and imaged by confocal microscopy. Within a given clonal culture, every cell exhibited the same mNeonGreen localization (Figure 2B), confirming our single-cell cloning approach and demonstrating that the insertion is genetically stable over many cell divisions. Importantly, while many clones exhibited the predicted mNeonGreen localization, a subset of the clonal cell lines displayed an unusual localization pattern (Figure 2B). For example, three Act5c-mNeonGreen clones had localization in prominent rod structures, and 12 Lamin-mNeonGreen clones had asymmetric localization in the nuclear envelope (Figure 2B). In addition, some clones had diffuse mNeonGreen localization in the cytoplasm and nucleus (Figure 2B).
To better characterize the insertions in single cell-cloned lines, we further analyzed three clones per gene (12 total), selecting different classes when possible (correct localization, unusual localization, and diffuse localization) (Supplemental Table 2). Using PCR amplification of the predicted insertion site (Figure 1A, Figure 2C) and sequencing of amplified fragments (Supplemental Table 2), we determined that all clones with correct or unusual mNeonGreen localization contained an in-frame insertion of mNeonGreen with the target gene. In contrast, we were unable to amplify DNA fragments from the expected insertion site in clones with diffuse mNeonGreen localization (Figure 2C). Western blotting of cell lysates confirmed that only clones with in-frame mNeonGreen insertion express fusion proteins that match the predicted molecular weights (Figure 2D). All together, these results suggest that clones with correct mNeonGreen localization are likely to contain an in-frame insert in the correct target gene.
Since S2R+ cells are polyploid (Lee et al. 2014), clones expressing mNeonGreen could bear one or more insertions. Furthermore, in/dels induced at the non-insertion locus could disrupt protein function. To explore these possibilities, we amplified the non-insertion locus in our single-cell cloned lines and used Sanger and next-generation sequencing to analyze the DNA fragments (Figure 2C, Supplemental Table 2). For each gene, we could find in/dels occurring at the non-insertion sgRNA cut site. For example, we could distinguish four distinct alleles in clone B11: a 3bp deletion, a 2bp deletion, a 1bp deletion, and a 27bp deletion. In addition, we identified an unusual mutation in clone C6, where a 1482bp DNA fragment inserted at the sgRNA cut site, which corresponds to a region from alphatub84D. We assume that this large insertion was caused by homologous recombination, since alphatub84D and alphatub84B share 92% genomic sequence identity (Flybase). For Act5c-mNeonGreen clones A5 and A19, numerous in/del sequences were found, suggesting this region has an abnormal number of gene copies. We were unable to amplify a DNA fragment from Lam-mNeonGreen D9, despite follow-up PCRs using primers that bind genomic sequence further away from the insertion site (not shown).
One useful application of cell lines with fluorescently tagged endogenous proteins is to track their localization over time. Therefore, we used live confocal imaging of our single-cell cloned lines to capture mNeonGreen localization during cell division (Figure 2E, Supplemental Videos). Time-lapse images of dividing cells showed that Act5c-mNeonGreen localized to rod structures that asymmetrically or symmetrically distribute into daughter cells, His2Av-mNeonGreen localized to chromosomes that segregate into daughter cells, Lam-mNeonGreen showed disassembly and reassembly at the nuclear envelope, and alphaTub84B-mNeonGreen localized to mitotic spindles. These results demonstrate the usefulness of knock-in Drosophila cell lines to track the dynamic localization of endogenous proteins.
In vivo germ line knock-in of T2A-Gal4 into endogenous genes using homology-independent insertion
We next tested if homology-independent insertion could function in the Drosophila germ line for the purpose of generating knock-in fly strains. Compared to cultured cells, the isolation of flies bearing insertions that are in-frame with endogenous genes required additional considerations. As opposed to antibiotic selection, visible markers are commonly used to identify transgenic animals. In addition, since some genes are expressed at low levels, target gene expression of an inserted reporter element may be insufficient to identify in-frame insertion events.
To overcome these issues, we constructed the donor plasmid pCRISPaint-T2A-Gal4-3xP3-RFP (Figure 3A). This donor contains a frame-selector sgRNA target site upstream of the reporter gene T2A-Gal4, which encodes a form of the transcription factor Gal4 that is cleaved from tagged endogenous protein (Diao and White 2012). Insertion of this element in-frame with genomic coding sequence would result in Gal4 translation, which can be detected using a UAS-reporter transgene (Brand and Perrimon 1993). In addition, this donor plasmid contains a 3xP3-RFP selectable marker gene that expresses bright red fluorescence in Drosophila larval tissues and the adult eye (Berghammer et al. 1999; Gratz et al. 2014) (Figure 3B). Importantly, the expression of 3xP3-RFP is not dependent on in-frame insertion with the target gene.
Next, we tested for pCRISPaint-T2A-Gal4-3xP3-RFP insertion into 11 endogenous genes (Supplemental Table 3). These genes were selected based on their known expression pattern, expression levels, or loss of function phenotype. Furthermore, we targeted pCRISPaint-T2A-Gal4-3xP3-RFP to insert into the 5’ portion of the coding sequence (Figure 3A). This insertion location is designed to disrupt the protein product by premature truncation. Plasmid mixes were injected into nos-Cas9 embryos, the resulting G0 progeny were outcrossed to yw, and G1 adults were screened for RFP fluorescence (Figure 3C). Each RFP+ founder fly was outcrossed to an appropriate balancer stock to establish a stable line. Figure 3D and Supplemental Table 3 show the integration efficiency results for each gene and Supplemental Table 4 has information on each balanced RFP+ line. From these data, we find that the frequency of G0 crosses yielding RFP+ G1 progeny varies between 5% and 21% (Figure 3D, Supplemental Table 3). For example, when targeting ebony with pFP545, 3 out of 16 G0 crosses produced ≥1 RFP+ G1 flies. Therefore, the pCRISPaint-T2A-Gal4-3xP3-RFP donor can insert into the genome of germ line cells in a homology-independent manner.
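The integration frequency quoted above is simply the fraction of G0 crosses that gave at least one RFP+ G1 fly. A minimal Python sketch of that arithmetic follows; the function name `founder_rate_percent` is hypothetical and the code is only an illustration of the calculation, not part of the analysis pipeline.

```python
def founder_rate_percent(rfp_positive_g0_crosses, total_g0_crosses):
    """Percentage of G0 crosses that yielded at least one RFP+ G1 fly."""
    if total_g0_crosses <= 0:
        raise ValueError("total_g0_crosses must be positive")
    if not 0 <= rfp_positive_g0_crosses <= total_g0_crosses:
        raise ValueError("positive crosses must be between 0 and the total")
    return 100.0 * rfp_positive_g0_crosses / total_g0_crosses


# ebony targeted with pFP545: 3 of 16 G0 crosses produced RFP+ G1 progeny.
ebony_pfp545_rate = founder_rate_percent(3, 16)  # 18.75
```

The ebony pFP545 value of 18.75% falls within the 5-21% range reported across the targeted genes.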
To gain insight into the genomic location of the insertions in our RFP+ lines, we first analyzed them by simple genetic crosses. During the fly stock balancing process, we determined that each insertion was located on the intended chromosome (Supplemental Table 4). In addition, flies that were homozygous for the insertion exhibited known phenotypes. For example, homozygous insertions in ebony produced flies with dark cuticle pigment (Figure 3E). Furthermore, flies with insertions targeting wg, Mhc, hh, and esg were homozygous lethal, which is consistent with known loss of function mutations in these genes (Supplemental Table 4). To test if the lethality of flies with homozygous insertions was due to on- or off-target gene disruption, we performed complementation tests by crossing RFP+ insertion lines with lines containing a known loss of function allele or genomic deletion spanning the gene. In all cases tested, trans-heterozygous combinations were lethal (Supplemental Table 4). Together, these results suggest that the pCRISPaint-T2A-Gal4-3xP3-RFP donor plasmid inserted into the intended target genes.
For T2A-Gal4 to be expressed by the target gene, the linearized donor plasmid must insert in the sense orientation relative to the target gene and in-frame with the coding sequence. As an initial screen for such events, we crossed RFP+ lines to a UAS-GFP line and assayed progeny for fluorescence. Through this approach, we identified Gal4-expressing lines for ebony, Myo1a, wg, and Mhc (Figure 4A, Supplemental Tables 3, 4). wg-T2A-Gal4 (#1 and #4), Mhc-T2A-Gal4 (#1 and #2), and Myo1a-T2A-Gal4 (#1) insertions express in the imaginal disc, larval muscle, and larval gut, respectively (Figure 4A), which matches the known expression patterns for these genes. Furthermore, wg-T2A-Gal4 #1 and #4 insertions were expressed in a distinctive Wg pattern in the wing disc pouch (Figure 4B). The expression pattern of ebony is less well understood. We find that ebony-T2A-Gal4 pFP545 #2 is expressed in the larval brain (Figure 4A) and throughout the pupal body (Figure 4C), which is consistent with a previous study (Hovemann et al. 1998). However, ebony-T2A-Gal4 pFP545 #2 is also expressed in the larval tracheal openings (Figure 4A), indicating that ebony may play a role in this tissue.
Next, we analyzed the insertion orientation and sequence structure in the RFP+ lines that express Gal4. We PCR amplified a region flanking the predicted insertion site from genomic DNA using primer pairs to distinguish sense and anti-sense insertions (Supplemental Figure 2). All RFP+ lines with Gal4 expression had insertions that were in the sense orientation (Supplemental Table 4, Supplemental Figure 2). Sequencing the resulting PCR fragments showed that the insert was present at the sgRNA cut site and that each insertion contained an in/del between the target gene and T2A-Gal4 sequence (Figure 4D). For example, ebony-T2A-Gal4 pFP545 #2 contains a 15bp genomic deletion that is predicted to keep T2A-Gal4 in-frame with ebony. Similarly, wg-T2A-Gal4 #1 contains an in-frame 45bp deletion and 21bp insertion. Remarkably, wg-T2A-Gal4 #4 contains a frameshift in/del (Figure 4D), yet still expresses Gal4 in the Wg pattern, albeit at significantly lower levels than wg-T2A-Gal4 #1 (Figure 4B). Likewise, Mhc-T2A-Gal4 #1 and #2 and Myo1a-T2A-Gal4 #1 each have in/dels that put T2A-Gal4 out of frame with the target gene coding sequence. These findings confirm that our Gal4-expressing lines have T2A-Gal4 inserted in the correct gene and orientation, but that in-frame insertion with the target gene is not necessarily a requirement.
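The frame arithmetic behind the ebony and wg examples can be illustrated with a short Python sketch. It is not part of the analysis described here: the function names are hypothetical, and the sketch assumes that, in the absence of any in/del, T2A-Gal4 would join the target coding sequence in-frame, so that only the net change in length at the junction matters.

```python
def net_length_change(deleted_bp: int, inserted_bp: int = 0) -> int:
    """Net change in junction length: inserted bases minus deleted bases."""
    return inserted_bp - deleted_bp


def indel_keeps_frame(deleted_bp: int, inserted_bp: int = 0) -> bool:
    """Return True if the in/del preserves the reading frame.

    Assumption (not from the text): without the in/del the insert would
    be in-frame, so the frame is kept when the net change is a multiple of 3.
    """
    return net_length_change(deleted_bp, inserted_bp) % 3 == 0


# In/dels reported in the text for two Gal4-expressing lines.
print(indel_keeps_frame(15))      # ebony-T2A-Gal4 pFP545 #2: 15bp deletion -> True
print(indel_keeps_frame(45, 21))  # wg-T2A-Gal4 #1: 45bp deletion + 21bp insertion -> True
```

Both reported in/dels change the junction length by a multiple of three (-15bp and -24bp), which matches their description as in-frame.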
To better characterize the insertion events in our collection of RFP+ lines, we analyzed those that did not produce fluorescence when crossed with UAS-GFP (Supplemental Table 4, Supplemental Figures 2,3). Using PCR and sequencing analysis, we found that some lines contained insertions in the correct target site but in the anti-sense orientation. In addition, we identified lines with insertions that were in the sense orientation but out of frame relative to the target gene. Unexpectedly, we found that wg-T2A-Gal4 #6 contained a sense orientation in-frame insertion. Yet, unlike wg-T2A-Gal4 #1, wg-T2A-Gal4 #6 does not express Gal4. Importantly, our molecular analysis of every independently isolated RFP+ line (20 in total) revealed that each contained an insertion in the intended target site (Supplemental Table 4, Supplemental Figures 2,3).
We did not obtain RFP+ insertions when targeting ap, alphaTub84B, btl, or Desat1. Therefore, we investigated whether the sgRNAs targeting these genes were functional. All 4 sgRNAs used for germ line knock-ins have an acceptable efficiency score of >5, with the exception of the sgRNA targeting btl (Supplemental Table 5). We tested whether the sgRNAs were functional in transfected S2R+ cells by performing a T7 endonuclease assay that detects in/dels at the cut site. This test revealed that sgRNAs targeting ap, alphaTub84B, and btl can cut at the target site, whereas the results with Desat1 were inconclusive (Supplemental Figure 4A). As an alternative functional test, we used PCR to detect knock-in events in S2R+ cells transfected with the pCRISPaint-T2A-Gal4-3xP3-RFP donor plasmid. This showed that sgRNAs targeting ap, alphaTub84B, btl, and Desat1 can successfully knock-in pCRISPaint-T2A-Gal4-3xP3-RFP (Supplemental Figure 4B). Finally, we sequenced the sgRNA target sites in the nos-Cas9 fly strains and found a SNP in the btl sgRNA binding site (not shown). The 10 remaining sgRNAs had no SNPs in the target site. In summary, we conclude that the sgRNAs targeting ap, alphaTub84B, btl, and Desat1 are able to induce cleavage at their target site in S2R+ cells, but that the sgRNA targeting btl will not function in the germ line using our nos-Cas9 strains.
A resource of CRISPaint donor plasmids for germ line knock-ins in Drosophila
To facilitate the insertion of other sequences using the CRISPaint insertion method, we generated 10 additional donor plasmids based on the same architecture as pCRISPaint-T2A-Gal4-3xP3-RFP (Figure 5A). These include T2A-containing donors with sequence encoding the alternative binary reporters LexGAD, QF2, and split-Gal4, as well as Cas9 nuclease, FLP recombinase, Gal80 repression protein, NanoLuc luminescence reporter, and super-folder GFP. Like T2A-Gal4, these can be used to insert at 5’ coding sequence, capturing endogenous gene expression and generating a loss of function allele. In addition, we generated pCRISPaint-sfGFP-3xP3-RFP, which can be used to insert into 3’ coding sequence, generating a C-terminal GFP fusion protein.
Several groups have demonstrated that coding sequence containing a splice acceptor (SA) and inserted in a gene intron can produce a protein trap with the preceding coding exon (Morin et al. 2001; Venken et al. 2011). Recently, two studies produced SA-T2A-Gal4 donor plasmids for intron insertion by HDR, called CRIMIC and T-GEM (Diao et al. 2015; Lee et al. 2018). Therefore, we modified these two plasmids to contain a CRISPaint target site upstream of the splice acceptor (Figure 5B).
Discussion
The insertion of large DNA elements into the genome by HDR requires a great deal of expertise and labor for the design and construction of donor plasmids. Some groups have developed strategies to improve the efficiency and scale at which homology arms are cloned into donor plasmids (Housden et al. 2014; Gratz et al. 2015), but the root problem still remains. Furthermore, each new gene-specific donor plasmid requires the same investment for its construction but is only used once to achieve the desired knock-in. For these reasons, we believe that the current methods for knock-in by HDR may act as a barrier to the widespread adoption of knock-in by the Drosophila community.
In this study, we addressed these challenges by demonstrating that large DNA elements can insert into the Drosophila genome by a homology-independent mechanism, using the previously established CRISPaint system. This approach has two major advantages over HDR. 1) No construction of a donor plasmid is necessary, as long as a suitable CRISPaint-compatible donor plasmid already exists. The only unique reagent needed is an sgRNA that targets the endogenous gene (also required for HDR). Cloning sgRNAs into expression plasmids, such as pCFD3 (Port et al. 2014), is simple, fast, inexpensive, and works nearly every time. Furthermore, the availability of sgRNA-encoding plasmids from public resources (e.g. TRiP, Addgene), and synthesized sgRNA from commercial companies, means that researchers can increasingly order their sgRNAs. 2) CRISPaint-compatible donor plasmids are “universal” and thus modular. For example, different genes can be targeted by the same CRISPaint donor plasmid, and different CRISPaint donor plasmids can be targeted to the same gene. Publicly available collections of CRISPaint donor plasmids [(Schmid-Burgk et al. 2016), this study] ensure that researchers only need to select their insert of choice. Indeed, the CRISPaint donor plasmids originally used for mammalian cell culture also function in Drosophila S2R+ cells (Figure 1) and the 3xP3-RFP marker in our germ line donor plasmids is compatible with other insects (Berghammer et al. 1999).
An important step in obtaining correctly targeted knock-ins is molecularly validating the candidate insertions. Confirming an HDR insertion requires amplifying a large DNA fragment (∼1.5kb-2kb) that encompasses part of the insert, an entire homology arm, and a portion of genomic sequence flanking the homology arm. This is necessary to verify that the donor did not insert off-target. These PCRs can sometimes fail or give inconclusive results due to the large fragment size. In contrast, CRISPaint knock-ins are easier to characterize by PCR analysis and sequencing because the amplified region is relatively small (∼200-800bp) (Figure 1B, Figure 2C, Supplemental Figures 2,4). However, CRISPaint knock-ins require more work to screen since they can insert in two directions and in/dels occur at the insertion site. When possible, we recommend that researchers select for insert expression before molecular validation.
In this study, we generated knock-ins by inserting the entire linearized CRISPaint donor plasmid into the target gene. Since the backbone contains bacterial sequences, it may cause transgene silencing or impact neighboring gene expression (Chen et al. 2004; Suzuki et al. 2016). However, we note that thousands of transgenic fly lines contain bacterial sequences from phiC31 integration (Perkins et al. 2015) with no reports of ill effects. Another issue is that insertion of the entire plasmid restricts the design of gene-tagging events to only append the insert 3’ to the target insertion site. Different groups have used approaches that address these issues, such as providing donor plasmids as mini-circles (Schmid-Burgk et al. 2016; Suzuki et al. 2016), cutting the donor plasmid twice to liberate the insert fragment (Lackner et al. 2015; Suzuki et al. 2016), or using PCR amplified inserts (Manna et al. 2019 BioRxiv). The first two modifications could in theory be made to our germ-line donor plasmids (e.g. pCRISPaint-T2A-Gal4-3xP3-RFP), but for this study we opted to establish the simplest protocol possible. Furthermore, we reasoned that cutting the donor twice would give rise to two donor fragments and that this could reduce knock-in efficiency.
Using the CRISPaint method in S2R+ cells, we readily identified cell lines with endogenous proteins tagged with mNeonGreen at their C-terminus (Figure 2, Supplemental Table 2). However, some lines exhibited unusual or unexpected protein localization. In clones D6 and D9, Lamin-mNeonGreen localizes to the nuclear envelope, but in D9 this localization is enriched asymmetrically in the direction of the previous plane of cell division. Since these two clones contain the same seamless mNeonGreen insertion, we speculate that mutations at non-knock-in loci account for this difference. Indeed, clone D6 contained an in-frame 3bp deletion at the non-knock-in locus, likely retaining wild-type function, whereas D9 had no remaining non-knock-in locus. We saw a similar pattern for clones A3 and A5, where both had seamless mNeonGreen insertions in Actin5c, but clone A5 exhibited distinct rod structures. Finally, alphaTub84B-mNeonGreen fluorescence and protein levels were extremely low in all cell lines, despite alphaTub84B being highly expressed in S2R+ cells (Hu et al. 2017). We speculate that the alphaTub84B-mNeonGreen fusion protein is unstable; indeed, previous studies in other organisms have highlighted problems with C-terminal tagging of alpha-Tubulin (Carminati and Stearns 1997). Similarly, C-terminal tags can disrupt Lamin and Actin function (Davies et al. 2009; Nagasaki et al. 2017). These findings illustrate the need for experimenters to consider the existing knowledge of the protein when generating C-terminal protein fusions, and to carefully screen individual single cell cloned lines.
We constructed a CRISPaint-compatible T2A-Gal4 donor plasmid for use in the fly germ line and successfully identified insertion lines. Our knock-in efficiency, defined by the percentage of injected G0 flies that give RFP+ progeny, ranged from 5-21% (Figure 3B, Supplemental Table 3), which is roughly similar to knock-in efficiencies observed when using HDR [5-22% (Gratz et al. 2014), 46-88% (Port et al. 2015), 7-42% (Gratz et al. 2015)]. In addition, all 20 of our RFP+ fly lines, which encompass 8 different sgRNA target sites, contain an insertion at the correct location. However, we do not rule out the possibility of a second-site off-target event on the same chromosome.
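As a worked illustration of the efficiency definition above, the percentage can be computed as follows. This is a minimal Python sketch: the function name `knock_in_efficiency` and the example counts are hypothetical and are not taken from Supplemental Table 3.

```python
def knock_in_efficiency(g0_with_rfp_progeny: int, g0_screened: int) -> float:
    """Percentage of screened G0 flies that gave at least one RFP+ progeny."""
    if g0_screened <= 0:
        raise ValueError("g0_screened must be a positive integer")
    if not 0 <= g0_with_rfp_progeny <= g0_screened:
        raise ValueError("g0_with_rfp_progeny must lie between 0 and g0_screened")
    return 100.0 * g0_with_rfp_progeny / g0_screened


# Hypothetical counts for illustration only (not data from this study):
# 4 of 40 screened G0 crosses produced RFP+ progeny.
print(knock_in_efficiency(4, 40))  # 10.0
```

Because each G0 fly is scored only once, this measure counts independent founders rather than the total number of RFP+ progeny recovered.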
To obtain T2A-Gal4 insertions that express Gal4 under the control of the target gene (Figure 4), it was necessary to screen multiple independently derived insertions, due to the in/dels that occur at the insertion site and the two insertion orientations. However, we found for some genes the overall efficiencies were too low to obtain a successful Gal4-expressing line (hh, esg, FK506-bp2), or we did not obtain any RFP+ insertions (ap, alphaTub84B, btl, and Desat1). Additional steps could be taken to improve insertion efficiency, such as optimization of the injected plasmid concentrations, increasing the number of injected embryos, or simply reattempting with a different sgRNA. It is also possible that certain insertions are toxic to cells/animals during G0 germ-line development or in G1 progeny.
There were three unexpected findings with our germ-line insertions. First, some Gal4-expressing lines had T2A-Gal4 inserted out of frame relative to the target gene. We speculate that this may be the result of ribosome frameshifting (Ketteler 2012), an internal ribosome entry site (IRES) (Komar and Hatzoglou 2005), or the presence of alternative open reading frames (altORFs) (Mouilleron et al. 2016). However, we find no obvious evidence of these mechanisms by analyzing the sequence flanking wg, Mhc, and Myo1a insertion sites (not shown). Ultimately, we consider this a fortuitous effect as long as Gal4 is expressed in the correct pattern. Second, we found an in-frame insertion in wg (#6) that does not express Gal4. This finding highlights the importance of screening RFP+ insertions for Gal4 expression when possible. Third, we found that, unlike in cell culture, all of our RFP+ fly lines contain in/dels at the insertion site. Germ cells are known to differ in their NHEJ mechanisms compared to somatic cells (Preston et al. 2006; Ahmed et al. 2015), but it is not clear why this would reduce the frequency of seamless insertions. Perhaps genetic or chemical manipulation of NHEJ regulators during embryo injection could address this issue in the future. This finding also suggests that the CRISPaint frame-selector approach may not be as useful in the fly germ line as it is in cell culture.
Our collection of donor plasmids (Figure 5) provides many options for inserting protein-coding sequence into target genes. However, other uses for homology-independent knock-in can be imagined, such as inserting enhancer sequences (e.g. UAS) upstream of endogenous genes to induce their overexpression (Rorth 1996), a reporter gene near non-coding regulatory sequences to capture the transcriptional expression pattern of neighboring genes (Brand and Perrimon 1993), entire genes into intergenic sequence (Sadelain et al. 2011), or sequences to be used for labeling DNA loci (Robinett et al. 1996). Furthermore, the donor plasmids described in this study could be used to simply knock out endogenous genes with a selectable marker. Indeed, all of our mNeonGreen-expressing single cell cloned lines contain mutations in the non-knock-in locus and our fly germ line insertions produced loss of function phenotypes. This approach could greatly increase the efficiency of selecting knock-out alleles, which are traditionally identified by laborious PCR-based screening for frameshift in/dels. We also note that, similar to the T2A-Gal4 reporters in vivo, cell lines could be targeted with translational reporters such as NanoLuciferase or GFP. Finally, since our collection of CRISPaint donor plasmids contains restriction enzyme sites that flank the insert sequence, the plasmids are also useful as parental vectors for constructing traditional HDR donor plasmids.
In summary, our homology-independent knock-in approach enables researchers to focus more effort on screening for correct insertions in cells or flies than on designing and constructing donor plasmids. Furthermore, the techniques required for screening knock-ins are less specialized than those for constructing donor plasmids, making this trade-off potentially attractive for labs with less molecular biology expertise or fewer resources. Therefore, we hope that the simplicity of this method will put knock-in technology into the hands of more researchers.
Acknowledgements
We thank Jonathan Schmid-Burgk for advice and the CRISPaint-mNeonGreen donor plasmid, Claire Hu and the TRiP for sgRNA design and construction, Ben Ewen-Campen for valuable comments on the manuscript, Stephanie Mohr, Oguz Kanca, and Hugo Bellen for helpful discussions, Raghuvir Viswanatha for the Cas9-T2A-EGFP template sequence, and Rich Binari and Cathryn Murphy for general assistance. J.A.B. was supported by the Damon Runyon Foundation. This work was supported by NIH grants R01GM084947, R01GM067761, R24OD019847. N.P. is an investigator of the Howard Hughes Medical Institute.