Abstract
Genome recoding will provide a deeper understanding of genetics and transform biotechnology. We bypass the reliance of previous genome recoding methods on site-specific enzymes and demonstrate a rapid recombineering-based strategy for writing genomes by Stepwise Integration of Rolling Circle Amplified Segments (SIRCAS). We installed the largest number of codon substitutions in a single organism yet published, creating a strain of Salmonella typhimurium with 1557 leucine codon changes across 200 kb of the genome.
The next widely anticipated breakthrough in genetic engineering is the ability to rapidly rewrite the genomes of industrially relevant microbes, plants, and animals. Rewriting entire genomes will deepen our understanding of the genetic code and dramatically transform human health, food and energy production, and our environment1–5. A major challenge identified by the Genome Project-Write consortium is the efficiency of building and testing large modified genomes1. In particular, genome recoding involves synonymous replacement of all instances of specific codons throughout an entire genome2, requiring efficient assembly of large constructs containing thousands of designed base changes3. New foundational technologies are therefore crucial for accelerating the pace of genome synthesis and modification.
No effective strategy has previously been demonstrated for recoding large contiguous genomic regions with high-frequency codon substitutions. Recent efforts in Escherichia coli recoded independent 20-50 kb genomic regions using site-specific integrases3 or Cas9 endonucleases6, both of which require additional time-consuming cloning steps. Alternative editing methods such as multiplex automated genome editing7,8 do not offer sufficient throughput when thousands of codon changes must be made. Landmark work on constructing a synthetic yeast genome4 is enabled by the high efficiency of native homologous recombination, which is specific to yeast and does not apply to most other organisms.
Our rapid genome recoding approach leverages direct iterative recombineering9,10 and bypasses the reliance on site-specific enzymes. In this work, we use Stepwise Integration of Rolling Circle Amplified Segments (SIRCAS) to accumulate 1557 codon changes in 176 genes across 200 kb of the Salmonella enterica serovar Typhimurium LT2 genome (Figure 1). This strain contains the largest number of codon substitutions in a single recoded organism published to date. A recoded designer S. typhimurium could have diagnostic and therapeutic applications in the human body11–13.
To demonstrate the power of our recoding approach, we replaced leucine codons due to their high frequency and redundancy in the Salmonella genome14. We computationally generated a recoded S. typhimurium genome in which all 33229 TTA/TTG leucine codons were replaced with synonymous CTA/CTG codons (Supplementary Note 1). In addition to recoding, 387 kb of pseudogenes, mobile elements and pathogenicity islands were removed to reduce genome size and instability, and 754 restriction sites for the enzyme LguI were removed to facilitate downstream cloning. From this design, we constructed 16 recoded segments (A1-A13 and B1-B3) constituting a 154 kb region A and a 46 kb region B (Figure 1a, Supplementary Note 2 and Data 1). Each segment contained 10-25 kb of recoded DNA, a selection marker, and 1 kb flanking homology regions for integration. The 10-25 kb size range was chosen to simplify construction, decrease the likelihood of unwanted internal recombination events, and minimize the cost of fixing an error in any one segment. Each segment was assembled in yeast from commercially synthesized 2-4 kb DNA fragments (Figure 1b)15. SIRCAS uses a marker swapping approach alternating between chloramphenicol and kanamycin selection (Figure 1c) for a simple phenotypic readout, similar to the strategy for building chromosomes in S. cerevisiae4.
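The core substitution in this design, replacing TTA/TTG with CTA/CTG, can be illustrated with a minimal Python sketch. This is not the authors' design pipeline (which is described in Supplementary Note 1). The function name `recode_cds` and the toy sequence are hypothetical, and the sketch assumes a single in-frame coding sequence with no overlapping genes or embedded regulatory elements.

```python
# Synonymous leucine swaps named in the text: TTA -> CTA and TTG -> CTG.
LEU_SWAPS = {"TTA": "CTA", "TTG": "CTG"}


def recode_cds(cds):
    """Recode one in-frame coding sequence.

    Returns (recoded sequence, number of codons changed).
    Hypothetical helper for illustration only.
    """
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    # Split into codons so that only in-frame TTA/TTG triplets are touched.
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    recoded = [LEU_SWAPS.get(codon, codon) for codon in codons]
    changed = sum(1 for old, new in zip(codons, recoded) if old != new)
    return "".join(recoded), changed


# Toy ORF invented for this example: Met-Leu-Leu-Phe-Stop.
toy = "ATGTTATTGTTTTAA"
new_seq, n_changed = recode_cds(toy)
print(new_seq, n_changed)  # ATGCTACTGTTTTAA 2
```

Working codon by codon rather than by plain string replacement leaves out-of-frame matches alone: in the toy sequence, the TTA that spans the last two codons (TTT TAA) is not altered.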
We bypassed the use of bacterial plasmids and associated cloning steps by amplifying each recoded segment directly from yeast using rolling circle amplification16 and linearizing the resulting concatemer by LguI digestion to obtain microgram quantities of DNA for direct integration (Figure 1b). Carrying each segment on a bacterial plasmid would also have required negative selection against the plasmid backbone to distinguish the desired integration event from persistence of the plasmid as an extrachromosomal replicative element3,6.
To create an S. typhimurium strain with high recombination efficiency, we constructed an integrated recombineering element (IRE) containing the lambda Red genes under arabinose-inducible control9 (Supplementary Note 3). Targeted integration of the IRE into the hsd locus simultaneously removed the native hsd restriction system17, which could otherwise have impeded transformation (Figure 1a).
We successfully assembled 200 kb of recoded genome in a single strain of S. typhimurium. Hundreds of marker-positive colonies were typically obtained after each round of SIRCAS (Supplementary Figure 1). No colonies were obtained when arabinose or DNA was omitted. The proportion of colonies with correctly swapped markers ranged from 3% to 41% (median 14%, Supplementary Figure 1 and Table 1), presumably due to differences in marker integration locus18, as well as the size and content of the recoded DNA. Between each round of SIRCAS, we briefly checked for unwanted internal recombination events with Sanger sequencing. In 83% of all Sanger sequenced colonies over 16 rounds of SIRCAS, unwanted recombination events were not observed (Supplementary Table 2). Each round of SIRCAS required two days to complete (Supplementary Table 3), and only correct recombinants were used for further rounds of SIRCAS. After recoded regions A and B were assembled in two S. typhimurium strains, a conjugation step transferred region A into the strain carrying region B19 (Supplementary Note 4). Whole genome sequencing of the final strain confirmed perfect recombination of each recoded segment across the entire 200 kb.
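The marker-swap readout used to screen colonies after each round can be sketched in Python as follows. This is an illustrative model only. The names `swap_plan` and `is_correct_swap`, the choice of chloramphenicol (`cat`) as the first marker, and the assumption that the first round has no marker to lose are not taken from the source.

```python
# Alternating markers: each round gains one marker and loses the other.
OTHER = {"cat": "kan", "kan": "cat"}  # chloramphenicol / kanamycin resistance


def swap_plan(segments, first="cat"):
    """Return (segment, marker gained, marker lost) for each round.

    Assumption: the first round has no previous marker to lose.
    """
    plan = []
    gained, lost = first, None
    for segment in segments:
        plan.append((segment, gained, lost))
        gained, lost = OTHER[gained], gained
    return plan


def is_correct_swap(resistant_to, gained, lost):
    """A colony passes if it carries the new marker and has lost the old one."""
    return gained in resistant_to and (lost is None or lost not in resistant_to)


for segment, gained, lost in swap_plan(["A1", "A2", "A3"]):
    print(segment, "gain:", gained, "lose:", lost)

# A colony resistant to both antibiotics in round 2 fails the screen,
# because the previous marker should have been replaced.
print(is_correct_swap({"kan", "cat"}, gained="kan", lost="cat"))  # False
```

A colony that keeps the old resistance alongside the new one is treated as an incorrect recombinant, which matches the idea of scoring only colonies with correctly swapped markers.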
We analyzed the error rates of SIRCAS, comparing commercial synthetic DNA obtained from clonal and non-clonal sources (Supplementary Note 5 and Data 1). Within the 156 kb of recoded regions that were written with clonal sequence-verified DNA, an overall error rate of 1 in 20000 was observed (7 point mutations and one leucine codon reversion in 156 kb). In comparison, Ostrov et al.3 reported an error rate of 1 in 5000 (an average of 9.7 mutations and 0.6 codon reversions in 50 kb). For the 44 kb of recoded regions written using non-clonal DNA, an overall error rate of 1 in 860 was found (51 errors, primarily single point deletions). Use of clonal DNA for SIRCAS is therefore preferable, producing an error rate that is competitive with that of other genome recoding methods.
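The quoted rates follow from dividing each region length by its error count. The short Python check below reproduces that arithmetic. It assumes 1 kb = 1000 bp and takes the 156 kb clonal region as the denominator for the clonal DNA. The helper name `one_in` is hypothetical.

```python
def one_in(errors, region_bp):
    """Return N such that the error rate is 1 error per N base pairs."""
    return region_bp / errors


# Clonal DNA: 7 point mutations + 1 codon reversion over 156 kb.
clonal = one_in(7 + 1, 156_000)
# Non-clonal DNA: 51 errors over 44 kb.
non_clonal = one_in(51, 44_000)
# Ostrov et al.: on average 9.7 mutations + 0.6 reversions per 50 kb.
ostrov = one_in(9.7 + 0.6, 50_000)

print(round(clonal), round(non_clonal), round(ostrov))  # 19500 863 4854
```

These values round to the reported figures of roughly 1 in 20000, 1 in 860 and 1 in 5000.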
Despite the vast number of changes introduced by recoding, doubling times were similar across all strains at 37 °C in LB (Figure 2, Supplementary Note 6). This result demonstrates that the Salmonella genome is amenable to large-scale recoding. Sequencing did not reveal any compensatory mutations in non-recoded regions of the genome (Supplementary Data 1).
In summary, SIRCAS is a rapid genome recoding method that does not require site-specific enzymes. By integrating 20 kb recoded segments every two days, it is possible to recode 300 kb in a single strain in one month, and an entire 4.5 Mb Salmonella genome in 16 parallel strains over the same period. With this recoding method, it should be possible to achieve genetic containment20 of engineered Salmonella for therapeutic applications, and to create strains for expressing proteins containing unnatural amino acids21. Beyond recoding, the use of SIRCAS to rapidly and precisely rewrite entire gene clusters enables interrogation of Salmonella genetics on a scale that is not possible with traditional techniques. Importantly, SIRCAS is not limited to Salmonella, but should be applicable in any recombineering-compatible host, and has utility for many large-scale genome engineering applications, such as rapid integration of entire de novo designed biosynthetic pathways into industrial production strains.
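The throughput estimate can be checked with a few lines of Python. This is a back-of-the-envelope sketch that assumes every round succeeds on the first attempt, that the genome is split evenly across strains, and that the time needed to merge regions by conjugation is ignored. The function name `days_needed` is hypothetical.

```python
import math


def days_needed(target_kb, segment_kb=20, days_per_round=2):
    """Days to recode target_kb by sequential rounds of fixed-size segments."""
    rounds = math.ceil(target_kb / segment_kb)
    return rounds * days_per_round


# Single strain: 300 kb in 20 kb steps, two days per step.
print(days_needed(300))  # 30

# Whole genome: 4.5 Mb divided evenly across 16 parallel strains.
per_strain_kb = 4500 / 16  # 281.25 kb per strain
print(days_needed(per_strain_kb))  # 30
```

Under these assumptions both cases come to 15 rounds, or about one month, which is consistent with the estimate in the text.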
METHODS
Methods are described in the supplementary methods file.
AUTHOR CONTRIBUTIONS
The overall study was conceived by PAS and JCW. Experimental design was conducted by YHL, FS, JK, MAPK and AR. The computational design of the recoded genome and other computational analyses were performed by MAPK, AR and YHL. Assembly and RCA of DNA constructs was performed by YHL, FS, CAH, ES and DL. The protocols for DNA assembly and RCA were established through guidance and troubleshooting from MTW, DGG and YAC. Genomic integration (SIRCAS) was conducted by YHL and CAH. Conjugation experiments were performed by JK. Sequence analysis was conducted by YHL and FS. Growth data for recoded strains was obtained by YHL. All authors contributed to writing the manuscript.
FUNDING
This project was supported by funding from DARPA (BRICS HR0011-15-C-0094). YHL acknowledges the Wellcome Trust for the award of a Sir Henry Wellcome postdoctoral fellowship (107402/Z/15/Z). AR acknowledges funding from a DOE CSGF fellowship (DE-FG02-97ER25308).
COMPETING FINANCIAL INTERESTS
DGG is co-founder of SGI-DNA and Vice President of DNA Technologies at Synthetic Genomics, Inc., and MTW is an employee of Synthetic Genomics, Inc.
ACKNOWLEDGEMENTS
We thank Eric Kofoid and Natalie Duleba from the laboratory of Prof. John Roth (Department of Microbiology, University of California Davis) for the kind gift of the recombineering Salmonella strain TT22971 containing plasmid pKD46. We also thank Prof. George Church (Department of Genetics, Harvard Medical School) and members of his laboratory for general discussions about genome recoding.