Abstract
Mutation accumulation experiments followed by whole-genome sequencing have revealed that for several bacterial species the rate of base-pair substitutions is not constant across the chromosome but varies in a wave-like pattern symmetrical about the origin of replication. The experiments reported here demonstrate that in Escherichia coli several interacting factors determine the wave. Perturbing replication timing, progression, or the structure of the terminus disrupts the pattern. Biases in error-correction by proofreading and mismatch repair are major factors. The activities of the nucleoid binding proteins, HU and Fis, are important, suggesting that mutation rates increase when highly structured DNA is replicated. These factors should apply to most bacterial, and possibly eukaryotic, genomes, and imply that different areas of the genome evolve at different rates.
Author Summary In several species of bacteria the rate of single base-pair mutations is not constant across the genome, but varies in a wave-like pattern that is symmetrical about the origin of replication. Using Escherichia coli as our model system, we show that this wave is determined by the timing and progression of replication and the structure of the region where replication terminates. In addition, biases in error correction and the three dimensional structure of the DNA also are important. These factors should apply to most bacterial, and possibly eukaryotic, genomes, and imply that different areas of the genome evolve at different rates.
Introduction
Recent studies of mutations accumulated non-selectively across bacterial chromosomes have revealed that rates of base-pair substitutions (BPSs) vary 2-to 4-fold in a wave like pattern that is mirrored in the two independently replicating halves of the chromosome. Such symmetrical patterns have been observed in mismatch-repair (MMR) defective strains of Escherichia coli (1), Vibrio fischeri, V. cholerae (2–4), Pseudomonas fluorescens (5), and P. aeruginosa (6). Such variation in mutation rates may affect the rate at which genes in different regions of the chromosome evolve, and may exert selective pressure on gene placement. Yet the causes of this variation are not known.
The fidelity of DNA replication, which, in E. coli, is about 1 mistake in 1000 generations (7), is determined by the intrinsic accuracy of the DNA polymerase and error correction by proofreading and mismatch repair (reviewed in (8, 9). In E. coli, proofreading is performed by epsilon, a subunit of the DNA polymerase III holoenzyme. If the polymerase inserts the incorrect base, epsilon’s 5’ to 3’ exonuclease activity degrades a few bases of the new strand and polymerase then re-synthesizes it. The accuracy of DNA synthesis is improved about 4000-fold by proofreading (10). Mismatch repair is performed by three proteins, MutS, MutL, and MutH. MutS recognizes a mismatch and recruits MutL. Together they find a nearby GATC site in which, in E. coli, the A is methylated by the Dam methylase. Because methylation lags behind replication, unmethylated As identify the “new”, and presumably error-containing, DNA strand. MutH is recruited by MutSL to the GATC site and activated to nick the unmethylated DNA strand, which is then degraded past the mismatch by the concerted activity of the UvrD helicase and one of four exonucleases. Pol III then re-synthesizes the strand. MMR improves the accuracy of DNA replication 100 to 200 fold (11).
In our previous study of the BPS density pattern in MMR-defective E. coli (1), we correlated mutation rates to the chromosomal sites that are affected by two nucleoid-associated proteins (NAPs), HU and Fis. We suggested that when the replication fork encounters regions of the chromosome with high superhelical density due to the binding of these NAPs, the mutation rate increases. An alternative explanation ties mutation rates to replication timing (2, 3). An intriguing hypothesis is that mutation rates vary in concert with fluctuations in dNTP concentration when the replication origin fires repeatedly during rapid growth (2).
In the work presented here, we investigate further the causes of the wave-like pattern of BPS rates. MMR-defective strains of E. coli additionally defective for other activities were used in mutation accumulation (MA) experiments and the mutations identified by whole genome sequencing (WGS). We also investigated the effects of different growth conditions on BPS rates. We conclude that the BPS density pattern does not have a single cause, but is the result of several factors affecting DNA replication, repair, and chromosome structure. In addition, we report that a MMR-defective Bacillus subtilis also has a wave-like BPS density pattern that is symmetrical about the origin of replication.
Results
The base-pair substitution (BPS) density pattern in mismatch repair-defective strains
In a previous paper (1) we demonstrated that the density of BPS that accumulated across the chromosome in a MMR-defective strain during an MA experiment fell into a wave-like pattern that was symmetrical about the origin. We have since performed nine additional MA experiments with MMR mutant strains, each experiment resulting in the accumulation of over 1000 BPSs, for a total of 30,061 BPSs (11). As shown in Figure 1A, the wave-like BPS density pattern was reproducible among the 10 experiments. Also as shown in Figure 1A, the mutational density pattern does not exactly match the non-interacting chromosomal macrodomains (MCs) as defined by Valens et al, 2004 (12). In particular, the wave pattern of BPSs is symmetrical about the origin whereas the Ori MC is not.
Following the analysis of Dillon et al, 2018 (2), we computed the wavelet coherence of the collected MMR− data taken in the clockwise and the counterclockwise directions around the chromosome; supplementary Figure S1A shows that, except for some asymmetry at the midpoint, the wave is symmetrical across the chromosome. The coherence is greatest between 800 to 1,600 Kb per cycle (= 8 to 16 bins per cycle), which is a similar result to that found by Dillon et al, 2018 (2). We also computed the wavelet coherence of the collected MMR− data against each of the experimental results discussed below, and these graphs are also given in the supplementary figures In the analyses below, the combined data from the 10 experiments with MMR-defective strains are used as the standard to which the results obtained in other genetic backgrounds are compared.
The spectrum of BPSs in the MMR-defective strains is dominated by A:T transitions at 5’NAC3’/3’NTG5’ sites (11). To test if the mutational density pattern was simply due to the distribution of these sites, we removed all the A:T transitions at 5’NAC3’/3’NTG5’ sites from the data set. As shown in supplementary Figures S5A, S6A, and Tables 1 and 2, although the mutation rate was reduced by 60% when these BPSs were removed, the remaining 12,542 BPSs fell into the same wave pattern. Thus, the variation in BPS rates must be reflective of regions of the chromosome and not the distribution of the hotspot DNA sequences.
Transcription
One obvious hypothesis is that the mutational density pattern reflects transcriptional patterns. Mutation rates have been reported to be both increased (13) and decreased (14) by high levels of transcription. In a previous paper (7) we reported that highly expressed genes had normal mutation rates, a finding that was confirmed in a recent study using deep sequencing to detect mutations (15). To determine if, nonetheless, transcription influences the wave pattern, we used RNA-Seq to quantitate the RNA levels in a ΔmutL mutant strain in the lag, exponential, and stationary phases of growth. These results should be representative of the cells in colonies during our MA experiments. We then compared these results to the BPS density pattern by binning the RNA-Seq results into the same bins as used for mutational data. As shown in Figure 1B and supplementary Figure S1B, there was no similarity in the two data sets.
Because ribosomal operons are homologous, we could not call SNPs in these genes. The RNA reads from the genes in ribosomal operons were also removed from the RNA-Seq data, but their positions have been indicated in Figure 1B. Interestingly, bin 35, which includes the ribosomal rrnG operon, has a high number of RNA reads even in the absence of reads from the rrnG genes. This high level of expression is almost exclusively due to the ssrA gene, which encodes transfer-messenger RNA (16), that is highly expressed under all three conditions (the RNA-Seq data will be further analyzed in a subsequent paper). The BPS density pattern of several of the strains that will be discussed below tends to reach a minimum in this general area, but it is as often at bin 33 as at bin 35. Given that the bin size is 100 Kb, it seems unlikely that the high level of transcription level of ssrA, located in the middle of bin 35, is causing these patterns.
Replication Initiation
The BPS density pattern is centered on oriC, the origin of replication, and over many experiments in different genetic backgrounds the pattern around the origin has proved to be stable. In both replichores BPS rates decline to a minimum about 300 Kb from the origin and then increase until a peak is reached about 900 Kb from the origin. Mutation rates then fall again and reach a minimum about 3/5th of the distance along each replichore.
We tested whether replication initiation was responsible for the maintenance of this pattern by performing MA experiments with strains with errant replication start sites. The rnhA gene encodes RNase H1, which degrades RNA-DNA hybrids; in the absence of RNase H1, persistent R-loops can initiate aberrant DNA replication and disrupt normal fork migration (17, 18). But an MA experiment with a ΔrnhA ΔmutL mutant strain showed no difference in the BPS density pattern from that of the ΔmutL mutant strain (Figure 1C, Supplementary Figure S1C, Tables 1 and 2), indicating that aberrant replication Initiation does not influence where mutations occur, at least not when a powerful oriC is present.
To further test the influence of replication initiation on the mutational density pattern, we performed MA experiments with strains that have a 5.1 Kb region containing oriC moved to the midpoint of the right replichore, where it is called oriZ (19). These strains are derived from E. coli K12 strain AB1157, instead of MG1655 strain, the ancestor of our MA strains, and have a large inversion in the right replichore that relieves the head-on collision between replication initiating at oriZ and transcription of the rrnCABE operon (20). As a control we created an AB1157 ΔmutL mutant strain, which had the similar wave-like BPS density pattern as our MMR− strains, indicating that the pattern is intrinsic to E. coli K12, not specific to sub-strain MG1655 (Supplementary Figures S5B, S6B, Tables 1 and 2). We performed MA experiments on ΔmutL derivatives of two additional strains: one with both oriZ and oriC (WX320 ΔmutL), and one with only oriZ (WX340 ΔmutL). The strain that contained two origins had a similar BPS density pattern as the MMR− strains (Figure 1D, Supplementary Figure S1D, Tables 1 and 2), suggesting that, under our conditions, firing of oriZ could not overcome the influence of oriC. However, the strain containing only oriZ showed a decrease in BPS rate in the 200Kb-area surrounding the new origin, similar to that normally observed about oriC (Figure 1E, Supplementary Figure S1E, Tables 1 and 2).
SeqA
As mentioned above, in E. coli the adenines in GATC sites are methylated by the Dam methylase. The SeqA protein binds to hemimethylated GATC sites, many of which are clustered around OriC, and, by so doing, SeqA occludes the replication initiation protein, DnaA, hindering origin firing. Sequestering of the origin persists for about one third of a generation; however, the mechanism of relief is not clear. In the absence of SeqA, unregulated initiation presumably results in over-replication, at least when cells are rapidly growing in rich medium (21). Downstream events, such as replication fork collapse, add to the phenotypes of seqA mutant cells (22).
Loss of SeqA affects chromosomal structure in areas distant from OriC. By binding to hemimethylated DNA, SeqA forms complexes behind the replication fork as it progresses around the chromosome (21). In addition, SeqA binds to areas of the chromosome with closely spaced GATC sites, as well as to particular genes regulated by GATC methylation (23). In the absence of SeqA the superhelicity of the chromosome increases, the nucleoid condenses (24), and transcription is altered (25).
To test whether SeqA affects the BPS density pattern, we performed an MA experiment with a ΔmutL ΔseqA mutant strain. As shown in Figure 1F, Supplementary Figure S1F, and Tables 1 and 2, loss of SeqA somewhat amplified the BPS density pattern of the right replichore, but the pattern was still highly correlated to that of right replichore of the MMR-defective strains. However, the pattern in the left replichore was disrupted in the ΔseqA ΔmutL mutant strain. Based on chromatin immunoprecipitation analysis, this area of the left replichore is not targeted by SeqA to a greater extent than the same area of the right replichore (23, 26), suggesting that the disruption of the mutational pattern is not due to loss of binding by SeqA. As shown in Figure 1B, bins 44, 38, 37, and 35 contain a number of highly expressed genes; in addition, bins 42 and 35 contain highly transcribed ribosomal RNA genes. Thus, we hypothesize that loss of SeqA makes the replication machinery particularly susceptible to interference by transcription, disrupting the mutational pattern. If this hypothesis is correct, the interference apparently makes replication more accurate, perhaps by slowing the speed of DNA polymerase.
Replication Fork Progression
In a recent report, Dillon et al., 2018 (2), hypothesized that the periodic variation in mutation rates across the chromosome was tied to the timing of DNA replication. Cellular levels of dNTPs are controlled by ribonucleotide reductase (RNR), the expression of which increases when the origin fires (27, 28) }. In fast-growing bacteria in which new rounds of replication are initiated before cell division, the levels of dNTPs should be high when each origin fires, but then fall as progression of the multiple forks dilute the dNTPs. High levels of dNTPs are predicted to increase the probability of misincorporation, and thus the mutation rate, whereas low levels of dNTPs should slow replication and improve fidelity. This systematic fluctuation in dNTP levels could account for the pattern of mutational density across the chromosome (2).
This hypothesis can be tested by examining the wave pattern from several experiments that we have already published (11). The number of replicating chromosomes is a positive function of the cell’s growth rate (29). When ΔmutL or ΔmutS mutant strains were grown on glucose minimal medium, which reduces the growth rate about 2-fold relative to growth on LB, the BPS density pattern became chaotic (Figure 2A, Supplementary Figures S2A, S5C, S6C, Tables 1 and 2). Supplementing the minimal medium with just enough LB to increase the growth rate to normal restored the wave-like BPS density pattern (Figure 2B, Supplementary Figure S2B, Tables 1 and 2). Growing the cells on diluted LB, on which the growth rate was the same as on minimal glucose medium, also preserved the wave-like BPS density pattern (Figure 2C, Supplementary Figure S2C, Tables 1 and 2). When the cells were grown at a lower temperature, which also reduced the growth rate 2-fold, the overall shape of the BPS density pattern was retained, but the increases in BPS rates that normally peak at about 900 Kb from each side of the origin were shifted about 200 Kb further from OriC (Figure 2D, Supplementary Figure S2D, Tables 1 and 2). In addition, the magnitude of the fluctuation of the BPS rate across the chromosome was doubled. Thus, it appears that growth rate per se is not a major determinant of the BPS density pattern, but other factors, such as the composition of the growth medium, are also important. For example, the expressions and DNA binding characteristics of nucleoid associated proteins (NAPs) are different under different growth conditions (30–32)
In E. coli growing aerobically dNTP levels are regulated by a Class Ia RNR encoded by the nrdA and nrdB genes. Although most of this regulation appears to be due to DnaA interactions (28), the NrdR repressor, which regulates a poorly expressed Class Ib RNR, also regulates nrdAB transcription; loss of NrdR results in increased expression of RNR throughout the cell cycle (33). Increased RNR should result in increased dNTP levels, and, indeed, the ΔnrdR ΔmutL mutant strain had twice the mutation rate as the MMR− strains, as expected when dNTP levels are high (Supplementary Table S1). As shown in Figure 2E, Supplementary Figure S2E, Tables 1 and 2, loss of NrdR did not change the basic BPS density pattern, but the peak rate was shifted away from the origin about 200 Kb on each side. This pattern was similar to that observed when cells were grown at low temperature, as described above.
Replication fork progression is aided by the accessory replication helicase, Rep, and, in its absence, the time required for chromosome duplication is doubled (34). Rep removes proteins bound to the DNA in front of the fork (34, 35). While these nucleoproteins are primarily transcription complexes (36, 37), Rep could also free the DNA of blocking NAPs. In addition, Rep aids in restarting replication forks after they stall or collapse (38, 39). As shown in Figure 2F, Supplementary Figure S2F, Tables 1 and 2, with the exception of a region close to the origin, loss of Rep disrupted the BPS density pattern across the chromosome, suggesting that slowing or stalling the fork results in a distribution of BPS that is essentially random.
Replication Termination
The results from almost all the strains tested show an increase in the BPS rate in the region where replication terminates. The pattern of this increase varies somewhat among experiments. Usually there are two unequal peaks, as shown in Figure1A, but in some experiments these peaks are better defined and of equal heights and occasionally there is just one peak. We do not know the source of this variation, but suspect it is simply random noise.
Replication terminates approximately 180° from the origin in a 1200 Kb region bounded by replication pause (Ter) sites; this region extends from bin 18 to bin 31 in our figures. The anti-helicase Tus protein binds to the Ter sites and allows each replication fork to enter but not to exit, creating a replication fork “trap”, within which the two forks fuse and the chromosome dimer is resolved (40). To determine if the interaction of replication forks with Tus contributes to the increased mutation rate within this region, we performed an MA experiment on a Δtus ΔmutL mutant strain. As shown in Figure 2G, Supplementary Figure S2G, Tables 1 and 2, loss of Tus did not affect the BPS density pattern.
The Ter macrodomain (MD) extends from 1200 Kb to 2200 Kb (12), which is roughly from bin 20 to bin 28 in our figures. The structure of the Ter MD is maintained by the MatP protein, which binds to 23 matS sites within this region (41). In the absence of MatP, the Ter MD is disorganized, the DNA is less compact, and the Ter MD segregates too early in the cell cycle and fails to localize properly at midcell (41, 42). Because the mobility of the Ter MD is increased in the absence of MatP, DNA interactions across MD barriers can occur in ΔmatP mutant cells (41).
As shown in Figure 2H, Supplementary Figure S2H, Tables 1 and 2, loss of MatP caused a severe disruption of the BPS density pattern. The mutation rates in the Ter MD were depressed whereas new peaks appeared on either side of the Ter MD in the Right and Left MDs. Interestingly, the BPS pattern near the origin was maintained in the right but not in the left replichore.
Recombination and the SOS response
Homologous recombination is intimately connected to replication. As replication proceeds, various blocks, such as DNA lesions, transcription complexes, and DNA secondary structures, can cause the replisome to pause and to eventually disassemble. This potentially lethal event is prevented by recombination, which can repair and restart the replisome (43). In addition, the termination region is subject to hyperrecombination (44, 45) particularly in the region bounded by TerA and TerB (our bins 21 to 24), named the terminal recombination zone (TRZ) (46, 47).
Elimination of E. coli’s major recombinase, RecA, had a modest effect on the mutational density pattern. As shown in Figure 3A, Supplementary Figures S3A, S5D, S6D, Tables 1 and 2, in the recA mutL and recA mutS mutant strains, BPS rates declined in bins 21 to 25, corresponding fairly well to the TRZ described above. However, the hyperrecombination within the TRZ is dependent on the recombination pathway defined by participation of the RecBCD complex. When we eliminated RecB, the mutational density pattern did not phenocopy that seen when RecA was absent (Supplementary Figures S5E, S6E, Tables 1 and 2). Either our protocol is not sensitive enough to detect an effect of loss of RecB, or another recombination pathway, e.g. RecFOR (48), is sufficient to maintain the mutation rate in the region.
In addition to its role in recombination, RecA is also a master regulator of the SOS response to DNA damage, which includes the induction of two error-prone DNA polymerases, DNA Pol IV and V. To test whether these polymerases are involved in determining the BPS density pattern, we performed an MA experiment on a strain deleted for the genes that encode Pol IV, dinB, and PolV, umuDC. As shown in Figure 3B,Supplementary Figure S3B, Tables 1 and 2, the BPS density pattern in the mutL dinB umuDC mutant strain was not significantly different than the MMR− pattern. The genes of the SOS response are repressed by the LexA protein; the lexA3 allele encodes a super-repressor LexA protein that prevents the SOS genes from being induced (49). When this allele was present the BPS density pattern was also unaffected (Figure 3C, Supplementary Figure S3C, Tables 1 and 2). Thus, the SOS response appears to play no role in determining the pattern of BPSs across the chromosome.
Nucleoid Associated Proteins
In a previous report (1), we found that the BPS density pattern of a ΔmutL strain was correlated with the density of genes activated by the HU protein and repressed by the Fis protein. Combining these two factors in a linear correlation model accounted for 33% of the variation in the mutational data. HU constrains supercoils and compacts the DNA into nucleosome-like particles; Fis also constrains supercoils but, in addition, bends the DNA (31). While both of these NAPs affect transcription, the general lack of correlation of the BPS rate with transcriptional levels (7); also see above) led us to hypothesize that mutation rates across the chromosome were correlated not with transcription per se, but with areas of high DNA structure (1). To further test this hypothesis we preformed MA experiments with MMR− mutant strains also defective for each of a number of NAPs.
HU exists as a dimer of its two subunits, HUα and HUβ, encoded by the paralogous genes hupA and hupB, respectively, in the three possible configurations. While loss of both subunits confers a severe growth defect, loss of only one has little consequence during a normal growth cycle, suggesting they can substitute for each other. HUαβ is the dominant form over most of the cell cycle, but significant amounts of HUα2 are found during lag phase and early exponential phase, and HUβ2 is prominent in stationary phase (30). Chromatin immunoprecipitation sequencing (ChIP-Seq) results revealed that HU binds non-specifically to the chromosome and the DNA binding patterns of the three dimers appear to be identical (50).
We performed MA experiments with both ΔhupA ΔmutL and ΔhupB ΔmutL mutant strains. As shown in Figures 3D, 3E, Supplementary Figures S3D, S3E, Tables 1 and 2, although loss of hupB appeared to affect the BPS density pattern, the differences from the MMR− pattern were not significant. However, loss of hupA depressed mutation rates across the chromosome, particularly in the terminus region, while creating new peaks on either side of the Ter MD.
Fis has many binding sites across the chromosome, but, based on data at RegulonDB, Sobetzko et al. (2012) (51) reported more Fis binding sites in the origin region. However both Chromatin immunoprecipitation plus microarray analysis (ChIP-chip) and ChIP-Seq studies found the density of Fis binding to be more-or-less constant across the chromosome (32, 52). Loss of Fis in both Δfis ΔmutL and Δfis ΔmutS mutant strains tended to flatten the BPS density pattern across the chromosome except in the region around the origin (Figure 3F, Supplementary Figures S3F, S5F, S6F, Tables 1 and 2).
The NAP HNS binds to DNA at its high-affinity binding sites and then spreads by oligomerization along A:T rich regions of DNA. Bridging between HNS-DNA complexes condenses the DNA into a few clusters per chromosome (53–55). However, as shown in Figure 3G, Supplementary Figure S3G, Tables 1 and 2), loss of HNS had little effect on the BPS density pattern, and, thus, the long-range structures produced by HNS appear not to affect BPS rates.
The DPS protein accumulates in stationary phase cells, condenses the nucleoid into a crystalline-like state, and protects the DNA from oxidative and other damage (56). Despite this radical physical change, loss of DPS had little effect on the BPS density pattern Figure 3H, Supplementary Figure S3H, Tables 1 and 2). Of course, we do not know the degree to which cells in stationary phase contribute to the BPS rates under our experimental conditions.
Proofreading
As mentioned above, epsilon is the proofreading subunit of the DNA polymerase III holoenzyme. The mutD5 allele encodes an epsilon protein that is inactive for proofreading, and strains carrying this allele have a mutation rate 4000-fold greater than that of wild-type strains, and 35-fold greater than that of MMR-defective strains (10). As shown in Figure 4A, Supplementary Figure S4A, Tables 1 and 2, when proofreading was inactive but MMR was active, the BPS density pattern was less dramatic than when MMR was inactive and proofreading was active; but, nonetheless, the wave pattern was basically the same. When both MMR and proofreading were inactive, which reveals the mutations solely due to replication errors, the BPS density pattern was nearly flat but around the origin it retained significant correlations to the patterns of both the MMR-defective and mutD5 mutant strains (Figure 4B, Supplementary Figure S4B, Tables 1 and 2). Thus, the pattern of BPS on both sides of the origin appears to be established by replication errors and then elsewhere across the chromosome the density pattern is largely due to differential error-correction by both proofreading and MMR.
Wild-type
The mutational density wave patterns evident in our data, and in data from other bacteria (2, 3), were obtained when MMR was inactive. Thus, the question arises: does the pattern appear in wild-type strains? It is difficult to answer this question because mutation rates in wild-type strains are so low (in E. coli, 120-fold lower than that of MMR-defective strains (7, 11) that enormous experiments would have to be conducted in order to accumulate enough mutations to approach statistical confidence. In a recent study we compared the mutation rates and spectra of a number of E. coli strains defective in various DNA repair activities; of these, the results from seven strains were indistinguishable from those of the wild-type parent (57). By combining the mutations from these strains, we achieved 1933 BPSs (11), enough to expect to see a wave pattern if it existed. As is evident in Figures 4C, 4D, Supplementary Figures S4C, S4D, Tables 1 and 2, these BPSs did not create fall into a recognizable pattern. Indeed, the pattern from the wild-type strains appears to be random; the variance to mean ratio of the binned mutations is 1.4, indicating the values are not disperse, and the MatLab “runstest”, a test for runs, returns a P value of 0.58, also indicating that the bin values are random. In the wild-type strain both MMR and proofreading are active, and, while these two activities have similar correction biases (Figure 4A), proofreading is much more powerful, producing the pattern seen in the MMR-defective strains. Although we cannot conclude that the mutational density pattern in the wild-type strain is other than random, it does have similarity to both the patterns seen in the MMR-defective strains and in the mutD5 mutant strain, particularly around the terminus (Figures 4C, 4D, Supplementary Figures S4C, S4D and Tables 1 and 2).
Bacillus subtilis
In additional to E. coli strains, symmetrical mutational density patterns have been demonstrated in MMR-derivatives of Vibrio fischeri, V. cholera (2–4), Pseudomonas fluorescens (5), and P. aeruginosa (6). Here we add Bacillus subtilis to this list. As shown in Figure 4E, Supplementary Figure S4E, Tables 1 and 2, the BPS mutation rates in a B. subtilis mutS::Tn10 mutant strain fell into a wave like pattern that was symmetrical about the origin. Although similar in shape, the pattern was significantly different from that of E. coli (Figure 4E, Supplementary Figure S4F, Tables 1 and 2). However, as in E. coli, the BPS rate appeared to increase in the terminus region, which in B. subtilis is not 180° from the origin and corresponds to bins 20-24 in Figure 4E.
Discussion
In this report we have examined a number of factors that could be responsible for establishment and maintenance of the symmetrical wave-like BPS density pattern across the chromosome. In broad terms these factors were: transcription; DNA replication initiation, progression, and termination; recombination and the SOS response to DNA damage; the binding of nucleoid-associated proteins; and, error-correction by MMR and proofreading. As discussed above, we found that transcription and the SOS response had little effect, and the effect of recombination was modest and confined to the terminal region. We discuss the more significant, factors in greater detail here.
DNA replication initiation, progression, and termination
Providing additional replication origins, either by eliminating RNase H1 or by inserting an ectopic oriC (oriZ), did not disrupt the wave (Figure 1C and 1D). However, when oriZ was the only origin of replication, the region of depressed BPS rate that surrounds oriC when it is in the normal position was re-established about the new origin (Figure 1E). Note that only 5.1 Kb of DNA containing oriC was relocated (19), whereas the region of reduced mutation rate is about 200 Kb; thus, the mutation rate is not determined just by the DNA sequence surrounding the origin. We hypothesize that the process of replication initiation protects the DNA from damage and/or newly established replication forks have a low error-rate. In addition, the BPS mutation rate was increased for about 1000 Kb (10 bins) on either side the new origin so that it resembled the same region about the normal origin. The size of this area is close to the same size detected as “interacting zones” around oriZ (58). However, the mutational density pattern across the rest of the chromosome did not re-establish itself to be symmetrical about oriZ, but remained symmetrical about the absent oriC. Thus, other factors must be important at distant regions. We can also conclude that the overall structure of the BPS pattern across the chromosome is not determined by active replication initiation per se, but may have evolved in response to replication initiation.
Both V. fischeri and V. cholera have multiple circular chromosomes of different sizes; by comparing the mutational density patterns of these chromosomes, Dillon et al, 2018 (2) identified the timing of replication as a significant determinant of the mutational density patterns. As mentioned above, they suggested that the pattern could be the result of variations in the levels of dNTPs as origins fire during rapid growth. Our results provide partial support for this hypothesis, but also indicate that growth on rich medium, not growth rate per se, is a significant determinant of the mutational density pattern, possibly because of effects on the expression and DNA binding of HU and Fis (see below). Our results with a ΔnrdR mutant strain (Figure 2E) also show that dNTP levels are important, but, again, other factors are driving the overall wave pattern. In addition, the results with the Δrep mutant strain (Figure 2F) indicate that radically interrupting the progression of the replication fork disrupts the mutational density pattern.
After declining to a local minimum about 3/5th of the distance along each replichore, mutation rates rise in the terminus region (Figure 1A). We originally hypothesized that this increase was due to collisions of the replication complexes with the Tus anti-helicase, most of which would take place in the region between bins 18 and 31 (1). However, elimination of Tus had no effect on the wave pattern (Figure 2G, Supplementary Figure S2G), refuting this hypothesis. But, eliminating MatP, the protein that maintains the structure of the Ter MD, reduced mutation rates in the Ter MD, suggesting that replication of highly structured DNA is error-prone (Figure 2H, Supplementary Figure S2H). Interestingly, in ΔmatP mutant cells mutation rates increased in the Right and Left MCs, suggesting that in the absence of MatP, these adjacent regions of DNA gained structure. The mechanism by which MatP structures the Ter MC is not clear (59, 60), but it probably involves supercoils which, when unconstrained, could migrate into adjacent areas. However, two studies have found that loss of MatP increases the mobility and long-range interactions of adjacent DNA (41, 61). Resolution of these conflicts will require further experimentation.
Nucleoid-associated proteins
Our previous results predicted that the NAPs HU and Fis should play a role in establishing or maintaining the wave-pattern of BPS, but HNS should not (1). The results presented here confirmed that prediction. In addition, we also found that Dps had no effect on the BPS density pattern. Because the NAPs affect gene expression in various ways, we cannot conclude that DNA binding by the NAPs themselves is responsible for the mutational pattern. And, indeed, we found no significant positive correlations between the BPS pattern and published binding sites of the NAPs, although the location of the binding sites themselves vary widely among published results (e.g. see (32, 50) and the data in RegulonDB (62). Nonetheless, we favor the hypothesis that structuring of the DNA by the NAPs, directly or indirectly, contributes to the BPS pattern.
The local effect of HU binding is to bend the DNA, but dimer-dimer interactions produce higher-order HU-DNA complexes that can constrain negative supercoils (63, 64). The analysis of the effect of HU on the mutational density pattern is also complicated by the variation in the cellular concentrations of the three forms with growth cycle (30). Although the two subunits can compensated for each other for viability, the affinities of three forms of HU for various DNA structures (linear, nicked, and gapped) differ (65). Both HUα2 and HUαβ can constrain supercoils, but HUβ2 apparently cannot, at least in vitro (30). Since HUβ is not stable (66), the phenotype of a ΔhupA mutant strain may simply be that of low cellular HU concentration.
While loss of HUβ had little effect on the mutational density pattern (Figure 3D, Supplementary Figure S3D), loss of HUα reduced the overall BPS rate by 33% and also changed the pattern (Figure 3E, A Supplementary Figure S3E). In the ΔhupA ΔmutL mutant strain, mutation rates were low in the terminus region but elevated in the Right and Left MCs, very similar to the pattern seen in the ΔmatP ΔmutL mutant strain (Figure 2H). The low mutation rate in the Ter MD corresponds well with the density of genes activated in wild-type cells but not in HU-deleted cells (67). HU regulates transcription by modifying the DNA superhelical density, suggesting that, like MatP, HU increases the structure of the DNA in the terminal region, causing DNA replication to become error-prone.
The mutational pattern in the ΔhupA ΔmutL mutant strain is also similar to that of the ΔrecA ΔmutL strain (Figure 3A). An mutation accumulation experiment with a ΔhupA ΔrecA ΔmutL mutant strain resulted in a pattern similar to both single mutant strains, and so was not informative (data not shown).
Fis is a major transcriptional regulator, either activating or repressing, directly or indirectly, nearly a thousand genes (52). At some promotors Fis acts as a classic transcriptional regulator by interacting with RNA polymerase, but at other promotors Fis affects gene expression by altering the DNA superhelical density (68). The Fis DNA binding sequence is degenerate and the estimated number of binding sites identified by sequence analysis across the genome varies widely among studies. The results of ChIP analyses also vary: the number of DNA regions associated with Fis in different studies ranges from 200 to 1500, which may reflect the effects of different growth conditions (32, 52, 69).
The loss of Fis nearly doubled the overall BPS rate, but flattened the wave pattern outside of the origin region (Figure 3F, Supplementary Figures S5F, S6F). As mentioned above, the mutational density pattern MMR− strains is correlated to the density of genes activated in a Δfis mutant strain as reported by Blot et al, 2006 (70). This correlation is particularly strong in bins 14 to 33 (ρ = 0.62, P = 0.003), which corresponds to the area flattened in the Δfis ΔmutL and Δfis ΔmutS mutant strains. However, no correlation exists between our mutational data and genes found to be responsive to Fis in a recent study (32). Clearly more studies are needed to resolve these conflicts.
Error-correction
Assuming that the BPS recovered from the mutD5 ΔmutL mutant strain are due to intrinsic errors made by DNA polymerase, we conclude that the polymerase is accurate close to the origin, then becomes increasingly less accurate as replication proceeds to about 1/3 of the replichore, at which point accuracy increases again (Figure 4B). The mutational pattern in the MMR-defective strains must reflect the biased ability of proofreader to correct these polymerase errors. Thus, proofreading is effective close to the origin, but then declines for about 1/3 of the replichore, increases in the Right and Left MCs, but declines again in the terminus region (Figures 1A, 4B). Mutations in the mutD5 mutant strain are due to the failure of MMR to correct the polymerase errors that survive in the absence of proofreading. Since the mutational pattern in the mutD5 mutant strain mimics that of the MMR-defective strains (Figure 4A, Supplementary Figure S4A, Tables 1 and 2), error correction by MMR, although much less powerful, apparently has the same biases as error correction by proofreader.
Given the above considerations, we would expect the mutational density pattern in the wild-type parental strain to mimic that of the MMR-defective strains. However, the nearly 2000 BPS accumulated in eight strains with wild-type mutational phenotypes appear randomly distributed across the chromosome (Figures 4C, 4D). One explanation for this discrepancy is that when both proofreading and MMR are active, the mutation rate is reduced to the point that other, weaker DNA repair and mutagenic activities obscure the underlying pattern. Although not significantly correlated, the wild-type BPS pattern is similar to the MMR− pattern, particularly around the terminus, supporting this hypothesis. Alternatively, although the eight strains that we combined to produce 2000 BPS have the same mutation rates, they may have different wave patterns that tend to negate each other and obscure the underlying wild-type pattern.
Conclusions
By applying mutation accumulation followed by whole-genome sequencing to bacterial strains mutant in various activities we have determined that the most important factors determining the symmetrical pattern of BPS rates across the chromosome are: the initiation, progression, and termination of DNA replication, replication error-correction, and chromosome structure. Because of the conservation of these factors, our results should apply to most bacteria, and possibly eukaryotes, and imply that different regions of the genome evolve at different rates.
Methods
Bacterial Strains and Media
The strains used in this study are listed in Supplementary Table S1 and the methods of their construction are in Supplementary file: Materials and Methods. Standard media and antibiotics were used (see Supplementary file: Material and Methods).
Mutation accumulation experiments
The complete MA procedure has been extensively described (7, 11, 57). Details are given in the Supplementary file: Material and Methods
Genomic DNA Preparation, Library construction, Sequencing, and Sequence Analysis
Details are given in the Supplementary file: Material and Methods.
RNA sequencing
The E. coli strain PFM144, which is PFM2 ΔmutL (11), was grown in LB and aliquots collected during lag (OD = 0.022), log (OD = 0.3), and stationary (OD = 1.5) phase. The number of cells collected was kept constant for each growth phase. Total RNA was extracted, DNA and rRNA were removed, and libraries constructed and sequenced as detailed in the Supplementary file: Material and Methods. Three biological triplicates were prepared for each growth phase.
A complete analysis of the RNA-Seq results will be the subject of a subsequent report. For this report the numbers of RNA-Seq reads for each condition were first normalized to the number of reads mapped to the gene holD, which was determined by rtPCR to be expressed at the same level in all phases of growth. The means of the normalized RNA reads from the triplicates were then binned into the same bins used for the mutational analysis. A fourth-order Daubechies wavelet transform was performed on the binned RNA-Seq reads as described for the mutational data (1).
Statistical Analysis
To obtain the BPS density patterns, the numbers of BPSs were binned into 46 bins, each 100 Kb long, as described (1). A fourth-order Daubechies wavelet transform was performed on the binned mutation data as described (1). For presentation in the figures, these results were converted into rates by dividing the number of BPS by the appropriate number of generations. Pearson’s product-moment correlation coefficient, ρ, was used to evaluate the correlations between the binned BPS data (Table 1). Spearman’s nonparametric correlation coefficient was also computed for a few data sets, but gave similar results. To account for multiple comparisons, p values were adjusted using the Benjamini-Hochberg method (71) with the false discovery rate set at 25%, implemented with the MatLab R2018a “mafdr” command. Because comparisons using the data from the same strain are not independent, this adjustment was made separately for each column in Tables 1 and 2.
To further compare the BPS density patterns between two data sets, wavelet coherence was calculated and plotted using the MatLab R2018a “wcoherence” command. While the Daubechies wavelet provides a good visual representation of the binned data, it is not continuous and thus not easily adapted for wavelet coherence analysis. The MatLab program first converts the binned data to Morlet wavelets and then computes the coherence between two of these wavelets. We chose to analyze the data with wavelet coherence because it gives a measure of the correlation between the signals (displayed as colors in the figures) (72, 73). In addition, the MatLab wavelet coherence plot indicates, as a dashed curve, the ‘cone of influence’ within which results are free of artifactual edge effects (72). The relative phase-lag between the two signals is indicated by small arrows: arrows pointing right indicate in-phase, arrows pointing left indicate 180° out-of-phase, and arrows pointing in other directions indicate the various degrees in between. Because the MatLab program assumes a frequency-time series, the X axis of the plot is cycles/sample and the Y axis is time; we converted these to bins/cycle (on an inverted scale) and bins, respectively.
Declarations
Ethics approval and consent to participate
Not applicable
Consent for publication
Not applicable
Availability of data and material
The sequences and SNPs reported in this paper have been deposited with the National Center for Biotechnology Information Sequence Read Archive https://trace.ncbi.nlm.nih.gov/Traces/sra/ (accession no. in progress) and in the IUScholarWorks Repository (hdl.handle.net/2022/20340). Bacterial strains are available upon request
Competing interests
The authors declare no competing interests
Funding
This research was supported by US Army Research Office Multidisciplinary University Research Initiative (MURI) Award W911NF-09-1-0444 to P.L.F. and H.T., the National Institutes of Health T32 GM007757 to B.A.N., and the US Army Undergraduate Research Apprenticeship Program to J. H. and S. R
Authors’ contributions
B.A.N. and P.L.F. designed the research, analyzed the data, and wrote the paper. B.A.N executed the experiments. W.M., H.L., and H.T. performed the bioinformatic analyses. P.L.F. and H.T. supplied resources. All authors read and approved the final manuscript.
Supplemental Material and Methods
Bacterial strains and media
The strains used in this study are listed in Supplemental Table S1. All E. coli strains were derived from PFM2 (1) or AB1157 (2). The oriC+ oriZ+ and ΔoriC oriZ+ strains were a gift from Rodrigo Reyes-Lamothe (McGill University). The mutD5 allele was obtained from Roel Schaaper (NIEHS). The deletion mutations originated in the Keio collection (3) and were moved by P1 phage transduction (4); the Knr element was removed by using FLP recombination (5). The deletions were confirmed by PCR analysis using the oligonucleotides listed in Supplemental Table S2. The B. subtilis strains were derived from the undomesticated ancestral strain NCIB3610 and were a gift from M.A. Konkol and D.B. Kearns (Indiana University).
Rich medium was Miller Luria Broth (LB) (Difco; BD); minimal medium was Vogel-Bonner minimal medium (VB min) with 0.2% glucose (6). When required, antibiotic concentrations were: carbenicillin (Carb), 100 μg/ml; kanamycin (Kn), 50 μg/ml; nalidixic acid (Nal), 40 μg/ml; chloramphenicol (Cam), 30 μg/ml; and, rifampicin (Rif), 100 μg/ml. Half of these concentrations were used in minimal medium.
Estimation of mutation rates from fluctuation assays
Mutation rates were determined as described (7), using mutation to NalR or RifR. The Ma-Sandri-Sarkar maximum likelihood method was used to calculate the mutation rates by using the FALCOR web tool found at www.mitochondria.org/protocols/FALCOR.html (8).
Mutation accumulation experiments
The MA procedure has been described (1, 9, 10). The MA lines originated from single colonies isolated from a founder colony, obtained by streaking from a freezer stock onto agar plates of the medium to be used in the MA experiment. After incubation overnight at the experimental temperature, one well isolated colony was excised from the agar plate, soaked for 30 minutes in 0.85% NACL + 0.01% gelatin, and then vortexed for 60 seconds. Appropriate dilutions for obtaining well-isolated colonies were then plated onto the appropriate agar plates at the appropriate temperature to start MA lines. Plates were incubated at 37°C for most experiments, or at 28°C for the experiment at low temperature. Each MA line was periodically streaked for a single colony: on LB and supplemented VB min agar plates at 37°C, this was done daily; on VB min and diluted LB agar plates at 37°C, and on LB plates at 28°C, this was done every 48 hours. The number of passes required was determined by the preliminary mutation rate obtained from a fluctuation assay. The parameters of the MA experiment including the number of lines used for each strain and the total number of generations per experiment are given in Supplemental Table S3.
Estimation of generations
The method to estimate that number of generations undergone in each MA experiment is described (1, 9). The diameter of the single colonies streaked was recorded daily, and then the number of cells in colonies of different diameters was determined for each experiment as described (1). The daily colony diameters were converted to generations and the results summed.
Genomic DNA preparation, library construction, and sequencing
Genomic DNA was isolated using PureLink Genomic DNA purification kit (Invitrogen) from a predetermined amount of overnight LB culture inoculated from the freezer stocks made after the last passage of the MA line. DNA concentration was measured on an Epoch Microplate Spectrophotometer (BioTek Instruments, Inc.). The identity of the lines was confirmed before library construction by verifying the presence of the expected gene deletions in the gDNA with diagnostic PCR using the oligonucleotides in Supplemental Table S2. Libraries were made by the Indiana University Center for Genomics and Bioinformatics and were sequenced using the Illumina HiSeq 2500 platform at the University of New Hampshire Hubbart Center for Genome Studies or the Illumina NextSeq platform at the Indiana University Center for Genomics and Bioinformatics.
Sequence analysis
SNP calling is described (1). The reference genome used for E. coli was NCBI reference sequence NC_000913.2, and for B. subtilis was GenBank: CP020102.1. The Illumina reads were aligned to the referenced genomes using the Burrows-Wheeler short-read alignment tool, BWA version 0.7.9 (11).
Poor sequence coverage resulted in some MA lines being eliminated. Cross-contamination can occur during streaking, resulting in lines with identical mutations. If two lines shared over 50% of their mutations then one of the lines was dropped from further analysis. If lines shared less than 50%, then each shared mutation was assigned to one of the lines and dropped from the others. If lineage could be established the shared mutation was assigned accordingly, otherwise the mutation was assigned randomly.
RNA sequencing
The E. coli strain PFM144, which is PFM2 ΔmutL (10), was grown in LB and aliquots collected during lag (OD = 0.022), log (OD = 0.3), and stationary (OD = 1.5) phase. The number of cells collected was kept constant for each growth phase. Cells were pelleted at 10,000gs for 10 min, 1ml of medium was added and the cells were pelleted again at 10,000gs for 2 min. The pellets were then flash frozen in liquid nitrogen and stored at −80°. Three biological triplicates were prepared for each growth phase.
RNA was extracted using FastRNA Pro Blue kit (MP Biomedicals). DNA was removed by using TurboDNase (Ambion). DNA removal was confirmed with diagnostic PCR using primer pair “fis forward” and “fis reverse” (Supplemental Table S2). rRNA was removed using RiboMinus magnetic beads kit (Invitrogen). RNA concentration and purity was assessed with an Epoch Microplate Spectrophotometer (BioTek Instruments, Inc.). Libraries were made by the Indiana University CGB and were sequenced at UT Health Science Center on Illumina HiSeq 2500 platform.
Acknowledgements
We thank H. Bedwell-Ivers, C. Coplen, M. Durham, J. Eagan, N. Gruenhagen, J. A. Healy, N. Ivers, C. Klineman., H. Lee, E. Popodi, I. Rameses, S. Riffert, H. Rivera, D. Simon, K. Smith, J. Townes, L. Tran, and L. Whitson for technical help. Bacterial strains were kindly provided by R. Schaaper, R. Reyes-Lamothe, D. Kearns, M. Konkol, and The National BioResource Project at the (Japanese) National Institute of Genetics. We also thank S. E.Bell and X. Wang for technical help and discussions.
Footnotes
Author email addresses: Patricia L. Foster, plfoster{at}indiana.edu, Brittany A. Niccum, brittanyniccum{at}gmail.com, Heewook Lee, heewookl{at}cs.cmu.edu, Wazim MohammedIsmail, wazimoha{at}indiana.edu, Haixu Tang, hatang{at}indiana.edu