Abstract
Background Replication stress (RS) gives rise to DNA damage that threatens genome stability. RS can originate from different sources that stall replication by diverse mechanisms. However, the mechanism underlying how different types of RS contribute to genome instability is unclear, in part due to the poor understanding of the distribution and characteristics of damage sites induced by different RS mechanisms.
Results We use ChIP-seq to map γH2AX binding sites genome-wide caused by aphidicolin (APH), hydroxyurea (HU), and methyl methanesulfonate (MMS) treatments in human lymphocyte cells. Mapping of γH2AX ChIP-seq reveals that APH, HU, and MMS treatments induce non-random γH2AX chromatin binding at discrete regions, suggesting that there are γH2AX binding hotspots in the genome. Characterization of the distribution and sequence/epigenetic features of γH2AX binding sites reveals that the three treatments induce γH2AX binding at largely non-overlapping regions, suggesting that RS may cause damage at specific genomic loci in a manner dependent on the fork stalling mechanism. Nonetheless, γH2AX binding sites induced by the three treatments share common features including compact chromatin, coinciding with larger-than-average genes, and depletion of CpG islands and transcription start sites. Moreover, we observe significant enrichment of SINEs in γH2AX sites in all treatments, indicating that SINEs may be a common barrier for replication polymerases.
Conclusions Our results identify the location and common features of genome instability hotspots induced by different types of RS, and help in deciphering the mechanisms underlying RS-induced genetic diseases and carcinogenesis.
Introduction
Faithful and complete DNA replication is vital for cell survival and genetic transmission. Replication fork progression is constantly challenged and may be stalled by environmental insults and endogenous stress arising from normal cellular metabolism, leading to replication stress (RS) [1-3]. These challenges can arise from various genotoxic mechanisms, such as depletion of nucleotide pools, deficiency of replication complex, conflicts between replication and transcription, R-loop formation, DNA damage, and others (reviewed in [3]). Replisomes need to overcome these obstacles in order to complete DNA replication in a timely and accurate manner.
Fork stalling elicits the activation of the ATM- and Rad3-related (ATR) kinase, a member of the phosphoinositide 3-kinase (PI3K)-like protein kinase [4]. ATR activation arrests cell cycle, promotes fork stability to prevent fork collapse, and regulates DNA repair pathways to rescue stalled forks. One of the critical downstream target of ATR is histone H2AX [5]. Phosphorylation of H2AX at the serine residue 139 (γH2AX) by ATR is an early event in response to fork stalling [6]. Once phosphorylated, γH2AX marks stalled forks prior to DSB formation [6], presumably setting up a favorable chromatic environment that facilitates the recruitment of fork repair proteins to stalled sites. γH2AX also accumulates at break sites after fork collapse [6-8], consistent with its function in double-strand break (DSB) repair. The importance of γH2AX in fork rescue is supported by the yeast study demonstrating that a mutant of the HTA gene that abrogates γH2A (γH2AX ortholog in yeast) confers hypersensitivity to camptothecin, a potent inhibitor of the topoisomerase I that causes the collisions between topoisomerase-DNA complex and replication forks and therefore stalls replication [9]. The same mutant only shows mild sensitivivity to ionizing radiation, suggesting that γH2AX is particularly important in rescuing stalled replication.
Fragile sites (FSs) refer to chromosomal loci that are prone to breakage upon RS. They are hotspots for genome instabilities including sister chromatid exchanges, deletions, translocations, and intra-chromosomal gene amplifications [10-15], and their instability is frequently involved in early stages of tumorigenesis [16, 17]. Due to the importance of FSs in genome stability and carcinogenesis, several methods have been developed to analyze the genome-wide distribution and characteristics of FSs. While early studies used conventional cytogenetic method (G-banding) to map FSs to regions that span megabases in human chromosomes [14, 17, 18], employment of recent sequencing technologies has allowed for fine mapping of FSs sensitive to aphidicolin (APH), hydroxyurea (HU), or ATR inhibition in various human cell lines and murine B lymphocytes [7, 19-21]. An approach using direct in situ break labeling, enrichment on streptavidin and next-generation sequencing (BLESS) has identified >2,000 APH-sensitive regions (ASRs) in HeLa cells and revealed that ASRs contain significant enrichment in satellites of alpha-type repeats in pericentromeric and centromeric regions, as well as in the large transcribed gene regions [19]. Another distinct group of FSs known as early replication fragile sites (ERFSs) have been identified in murine B lymphocytes using RPA and γH2AX ChIP-seq. ERFSs are induced predominantly in early replicating and actively transcribed gene clusters. ERFSs contain high densities of replication origins, have high GC content and open chromatin configuration, and are also gene rich [7, 22]. Nucleotide-resolution analysis of chromosome damage sites has been established with end-seq and found that long (>20bp) poly(dA:dT) tracts are prone to HU-induced fork collapse in mouse splenic B cells [21]. Finally, RPA ChIP-seq has identified over 500 high-resolution ATR-dependent fork collapse sites in mouse embryonic fibroblast cells, which are enriched in microsatelite repeats, hairpin-forming inverted retrotransposble elements and quasi-palindromic AT-rich minisatelite repeats, suggesting that structure-forming repeats are also DNA sequence prone to produce fork collapse [20]. However, it is worth noting that FS breakage displays cell and tissue type-specificity [23, 24], and thus it is difficult to directly compare FS location and features measured in data derived from various cell types from different organisms.
In this study, we hypothesized that different fork stalling mechanisms may stall fork at different loci and induce or exacerbate fragilities at different sequences in the genome. This, in turn, would affect the regulation and expression of different sets of genes residing within/near the fragile loci in a manner dependent on the fork stalling mechanism. For instance, fork stalling can be induced by collision between replication and transcription in large genes, R-loop formation or other replication stressors. Due to cell type and tissue specificity of FS breakage [23, 24], this hypothesis needs to be tested in a cell type-specific manner. Here, we used ChIP-seq to map and characterize γH2AX binding sites induced by three distinct fork stalling mechanisms in one human lymphocyte cell line. The lymphocyte cell line was chosen because historically FSs have been primarily studied in cultured lymphocytes and lymphoblastoid cells. Although γH2AX spreads to large regions and its binding sites may not reflect the exact location of broken sites, mapping and characterizing γH2AX binding may still reveal important information on fragile genomic loci. Three commonly used fork stalling agents were used, namely APH, HU, and methyl methanesulfonate (MMS). APH is a DNA polymerase α inhibitor, HU is the ribonucleotide reductase inhitor that depletes nucleotide pool, and MMS is thought to stall fork progression by binding to and methylating DNA. Our γH2AX ChIP-seq mapping reveals that APH, HU, and MMS treatments induce non-random γH2AX chromatin binding at discrete regions, suggesting that there are γH2AX binding hotspots in the genome. The three treatments induce γH2AX binding at largely non-overlapping regions, supporting that different fork stalling mechanisms likely cause fork stalling at different genomic loci. We also find that γH2AX binding hotspots are depleted from CpG islands (CGIs) and transcription start sites (TSSs), but are enriched at compact chromatin regions. In addition, significant enrichment of SINEs is found in γH2AX sites in all treatments, indicating that SINEs may be a common barrier for replication polymerases. Our results provide novel insights into γH2AX binding specificity in the human genome in response to different DNA replication stressors, which will help in deciphering the mechanisms underlying carcinogenesis and RS-induced genetic diseases.
Results
Mapping of γH2AX binding sites induced by APH, HU, MMS with ChIP-seq
Prior to ChIP-seq, we tested the specificity of γH2AX antibody to ensure high specificity of ChIP (Suppl Fig. S1). Exponentially growing cells were treated with APH (0.3 μM), HU (2 mM), and MMS (200 μM) for 24 hrs to induce RS using conditions widely reported in literatures [25-30]. Following treatment, cells were crosslinked, lysed, and DNA was sonicated to 100-500 bp. Immunoprecipitation was then performed to pull down γH2AX-bound DNA, and ChIP DNA was used for library construction and Illumina sequencing (Fig. 1A). To ensure reproducibility, two independent biological replicates were carried out, and peak calling and alignment were performed for each replicate. Since it is known that γH2AX binding to DNA spreads into large regions, broad peaks were called using MACS2 broad peak calling program [31]. Signals from ChIP samples were normalized to pre-ChIP input signals, and ChIP-seq peaks with p values of <10-3 were selected for further analysis. Spearman correlation coefficient between untreated and treated samples were conducted. The coefficient between replicates in each treatment was ≥ 0.9 (Fig. 1B and Suppl Fig. S2), suggesting the high reproducibility of γH2AX binding and a high confidence of ChIP-seq data. Snapshots of ChIP-seq peaks in each treatment are shown in Fig. 1C and Suppl Fig. S3. We observed that ChIP-seq peaks in both untreated and treated samples showed a nonrandom distribution pattern (Fig. 1C and Suppl Fig. S3), suggesting that these γH2AX binding sites may represent genome instability hotspots sensitive to RS.
γH2AX binding sites induced by different stressors share little overlap
About 4,700 γH2AX binding sites were identified in the untreated sample, indicating a high level of spontaneous DNA damage in this cell line. Compared to other cell lines, GM07027 displayed a high level of endogenous γH2AX expression (Suppl Fig. S4A). We identified ∼18,000, ∼80,000, ∼12,000 γH2AX binding sites in APH, HU, MMS treated samples, respectively (Fig. 1D). Because HU induced four to seven times as many significant peaks as other treatments, we then checked whether such high peak number was due to the high level of damage induced by HU. As shown in Suppl Figs. S4B and S4C, HU and MMS induced comparable γH2AX amount and caused similar reduction in cell proliferation. Similarly, all treatments induced comparable levels of CHK1 phosphorylation at S317, a marker for ATR activation, and enriched S phase cell population (Suppl Fig. S5). However, MMS induced the fewest γH2AX peaks, suggesting that the heterogeneity of ChIP-seq peaks produced from the three drug treatments was unlikely caused by dose effect of the stressors. Although the APH treatment condition resulted in a lower level of damage (Suppl Fig. S4), increasing APH concentration completely blocked replication (data not shown), and thereby could not be used to study RS.
We observed little overlap between APH (6.4%) and MMS (9.3%) data sets. HU treated sample contained regions shared with all other stressors, but this overlap only accounted for a small portion of the HU data set due to the number of peaks (6.2% of overlap with APH treatment and 4% of overlap with MMS treatment) (Fig. 1D). Taken together, this suggests that γH2AX binds at specific genomic regions in a manner likely dependent on the fork stalling mechanisms.
γH2AX binding is enriched in large genes and regions encoding long transcripts
Our results showed that γH2AX binding was enriched at genes longer than the genomic median, regardless of the stressor (Fig. 2A and Suppl. Fig. S6, Kruskal Wallis with post hoc paired Wilcoxan signed rank test, p < 2 × 10-16). This result supports that large genes/transcripts have the potential to stall replication under RS induced by different treatments, presumably because replication machinery more likely collides with RNA polymerases transcribing long genes [10]. Interestingly, while HU induced γH2AX enrichment at genes longer than the genomic average, such enrichment was found at shorter genes when compared to APH or MMS treated samples (Fig. 2A, Suppl Fig. S6), indicating that HU treatment may sensitize shorter genes to breakage.
γH2AX is enriched at CFSs under exogenous genotoxin treatment
Common fragile sites (CFSs) are specific chromosomal regions that are prone to break under APH-induced RS. They are present in all individuals, are characterized by gene poor, heterochromatic, late replicating, non-B-form DNA structures like hairpins [15, 32-35]. CFSs are not precisely mapped breaks, but rather are megabase regions defined by G-banding using APH treated lymphocyte metaphase spreads [14]. Using permutation analysis, we compared γH2AX enrichment at consensus CFS G-band positions (Suppl Table S1). We found that CFSs accumulated γH2AX at a low level in the absence of RS and breakage was further enhanced with exogenous genotoxic stress (Fig. 2B and 2C). While CFSs were originally described under APH treated conditions, we found that both HU and MMS could induce significant γH2AX enrichment when compared with untreated samples (Fig. 2C). This result confirms previous findings that RS may preferentially cause damage at regions containing CFSs, and that these regions may be sensitive to a wide variety of stressors.
Sequence features in γH2AX binding regions
It is thought that repetitive sequences are intrinsic barriers of replication machinery and replication forks are prone to stall at repetitive regions [36]. Thus, we analyzed ChIP-seq peaks in the context of repetitive genomic elements using the RepeatMasker data set [37]. In addition to areas of low complexity (defined as >100 nt stretch of >87% AT or 89% GC, and >30 nt stretch with >29 nt poly(N)n, N denotes any nucleotide, and those containing short tandem repeats [37]), we also looked at γH2AX accumulation in the context of common transposable elements: SINEs (short interspersed nuclear elements), LINEs (long interspersed nuclear elements), DNA transposons, and LTRs (long terminal repeats). No enrichment at regions of low complexity was observed. Instead, we observed significant enrichment at SINEs genome-wide in all samples (Fig. 3). SINEs are 80-500 bp nonautonomous elements in the genome, with 3’ ends often composed of simple repeats like poly-dA, poly-dT, or tandem array of 2-3 bp unit [38]. A recent study identifies that poly (dA:dT) tracts are natural replication barriers and are a common cause for DNA breakage in HU-treated mouse B-lymphocytes [21], and SINEs are significantly enriched in early replicating fragile sites identified in HU-treated mouse B-lymphocytes [7]. Another study shows that repetitive DNA sequences that give rise to non-B-form structures impede DNA replication [20]. The enrichment of SINEs but not simple repeats in γH2AX binding indicate that in addition to the 3’ poly (dA:dT), abundant transposable elements in SINEs may contain features prone to non-B-form structure formation that make SINEs particularly susceptible to fork stalling.
Compared to untreated sample, SINEs, LINEs, simple repeats, and DNA transposons were enriched in γH2AX binding sites under HU treatment, while LTRs and simple repeats were reduced in MMS treatment. Binding patterns in APH treated sample did not significantly differ from untreated cells in any repetitive elements (Suppl Fig. S7). Future studies using a high-resolution sequencing method will be helpful to pinpoint sequence composition and features under different replication stress inducers.
Epigenetic features in γH2AX binding regions
Poor replication initiation has been proposed to cause instabilities [35]. Given that replication timing and initiation can be epigenetically controlled rather than directed by specific sequence motif [12, 39], we examined common epigenetic marks including H3K9Ac, H3K4me3, H3K27me3, and H3K9me3 that modulate chromatin structures at γH2AX binding sites. H3K9Ac and H3K4me3 are euchromatic marks and are tightly associated with active transcription and histone deposition, while H3K27me3 and H3K9me3 are found mainly at inactive gene promoters and are associated with compact chromatin [40]. After aligning γH2AX ChIP-seq peaks with histone modification ChIP-seq datasets from human B-lymphoblastoids [GSM733677 (H3K9ac), GSM733708 (H3K4me3), GSM945196 (H3K27me3), GSM733664 (H3K9me3)], we found depletion in γH2AX at H3K9Ac and H3K4me3 marks, and enrichment in all samples at H3K27me3 and H3K9me3 marks (Fig. 4), suggesting that γH2AX sites induced by the three stressors coincide with more compact chromatin regions.
Depletion of CGIs and TSSs in γH2AX binding regions
CGIs are DNA elements with high CpG content. Roughly 50% of these regions are associated with gene expression regulation, and can be located at or near TSSs [41-43]. Early studies have shown a strong association of replication initiation and CGIs in mammalian genomes, with half of origins residing within or near CGIs [44, 45]. Replication origin activity is also significantly enriched at and around TSSs [46, 47]. Thus, we next examined the relationship between γH2AX binding and CGIs and TSSs in our samples. Using permutation analysis, we searched for enriched or depleted binding at CGIs genome-wide and found that γH2AX did not associate with CGIs. Rather, these regions were noticeably unbound (Fig. 5A). Similarly, we found consistent local depletion of γH2AX at TSSs (Fig. 5B), while no depletion or enrichment at transcription termination sites (TTS) or gene bodies was observed (Fig. 5C and Suppl Fig. S8). Together with the enrichment of γH2AX binding at more compact chromatic regions (Fig. 4), our data suggest that γH2AX tends to bind to transcriptionally inactive regions upon fork stalling.
Discussion
While γH2AX binding to DSBs has been mapped and profiled in high-resolution [48], systematic characterization and comparison of γH2AX chromatin binding in response to RS is lacking. This is further complicated by the fact that fork stalling can be induced by a diverse of mechanisms, and FS instability also displays cell- and tissue-type specificity. In this study, we generated a large set of γH2AX binding data from a single human cell line treated with three genotoxins that stall replication with distinct mechanisms. This study design allows us to directly compare γH2AX binding under different RS conditions, revealing a number of notable features of γH2AX binding in response to fork stalling.
We find that only a small portion of γH2AX binding sites resulting from MMS (9.3%) and APH (6.4%) treatment overlap, suggesting that the two different fork stalling mechanisms produce RS-sensitive damage hotspots at discrete locations. This is not completely unexpected, since these two chemicals induce RS with distinct mechanisms. APH inhibits DNA polymerases α and slows DNA polymerization during replication, generating stretches of single stand DNA at stalled forks [14, 16, 49]. Thus, APH is expected to cause forks to stall or collapse at vulnerable regions containing natural barriers for DNA polymerases. These regions likely require additional efforts to avoid the pausing or dissociation of polymerases. Consistently, several studies have shown that specialized DNA polymerases, including Pol η, Pol ζ, and Pol κ that facilitate DNA synthesis and promote the stability of APH-inducible FSs [50-53]. In contrast, RS induced by the DNA methylating agent MMS is more complex. Although MMS is capable of reacting with a number of nucleophilic sites on DNA including ring nitrogen and exocyclic oxygen on purines and pyrimidines, the reactivity towards electrophiles varies substantially by the position of the nucleotide, whether the nucleotide is at the major or minor groove, and whether the DNA is single or double stranded [54]. Consequently, it is difficult to pinpoint where the methylation adducts are formed. HU reduces or depletes the overall cellular nucleotide pool, and therefore is expected to stall all DNA synthesis and impact replication more globally. In agreement with this view, we find that HU induces several times more damage sites than other treatments. HU induces γH2AX binding hotspots at regions overlapping with APH or MMS treated samples, but this overlap only accounts for a small portion of data set due to the large peak numbers.
We observed that SINEs are enriched in γH2AX binding sites induced by all three treatments (Fig. 3), suggesting that SINEs may contain features that easily stall DNA polymerases. One such feature may be the poly (dA:dT) tracts at the 3’ end of SINEs, which have been implicated as a natural replication barriers and is a common cause for DNA breakage in murine lymphocytes [21]. Cumulating evidence indicates that SINEs regulate gene expression, affect chromatin structure, and are involved in genome rearrangement [55, 56], and therefore they have been implicated in many diseases including cancer [57]. It will be interesting to investigate the potential role of RS-induced SINE instability in disease development.
Despite different localizations of γH2AX binding, we find that they share a few obvious common features. First, all three conditions induce γH2AX binding at regions with the median transcript length longer than the median human transcript size (Fig. 2), indicating that regions with large transcripts are prone to break under RS. It has been shown that transcription of large genes often requires more than one complete cell cycle to complete. Collisions of transcription machinery with a replication fork and the formation of R-loops impede fork movement, causing FS instability [10]. Thus, our results reinforce transcription/replication collision as a crucial theme causing RS regardless of the RS mechanism.
In addition to increased binding at long genes, we also find that APH, HU, and MMS-induced γH2AX binding shows depletion of H3K9Ac and H3K4me3 marks, while being slightly enriched with H3K27me3 (Fig. 4), suggesting that chromatin within FSs may be more compact than non-fragile regions. It has been postulated that epigenetic feature regulates replication density and timing, with compact chromatic regions being poorly represented at replication initiation regions [12, 39]. In support of this, previous report shows that the six most break-prone human CFSs display an epigenetic pattern of histone hypoacetylation [11]. The same study also examines H3K9Ac acetylation pattern of large genes and find that acetylation coverage of large genes is substantially lower than that of the human genome on average. Our results therefore extend this finding to genome-wide FSs and support that compact chromatin may be a common epigenetic feature contributing to FS instability.
Previous research suggests that unprogrammed formation of R-loops impairs fork progression, causing fork stalling that contributes to DSB formation [58, 59]. A recent study has reported widespread R-loop formation at unmethylated CGI promoters in the human genome [60]. Therefore, our observation that γH2AX peaks flank but are not located at CGIs and TSSs is somewhat surprising (Fig. 5). In order to explain this observation, it is worth revisiting studies of mapping γH2AX distribution after DSB induction. DSBs trigger H2AX phosphorylation over large domains (0.5 to 2 Mb) surrounding the DSB [48]. Anti-correlation between RNA Pol II occupancy and γH2AX enrichment has been observed in both S. cerevisiae and the human U2OS cell line [48, 61], suggesting that TSSs and promoter regions may be particularly resistant to either the establishment or maintenance of H2AX phosphorylation. In addition, γH2AX enrichment at transcriptionally repressed genes seems to be dependent on HDACs [61]. Thus, it is highly likely that specialized chromatin structures at TSSs and CGIs prevent γH2AX accumulation despite R-loop formation. It will be interesting to determine the role of γH2AX depletion and specialized chromatin in stabilizing stalled forks at TSSs.
In conclusion, our study demonstrates that different types of replication stresses produce γH2AX binding at non-overlapping loci. By characterizing sequence and epigenetic features of these loci, our analysis provides a global view of the characteristics of genomic regions sensitive to various replication stress conditions. It is conceivable that cells may use different molecular mechanisms involving different protein molecules and repair pathways to rescue forks stalled at different types of fragile sequences. Since chromosome rearrangements found in cancer cells often result from genome instability caused by RS, deciphering the molecular mechanisms protecting RS-induced genome stability represents an important issue in the field.
Materials and Methods
Cell culture
Human B-lymphocyte cell line (GM07027) was obtained from Coriell Institute. 174xCEM was obtained from American Type Culture Collection (ATCC). GM07027 and 174xCEM lymphocyte cells were cultured in suspension and passaged in the RPMI1640 medium (Life Technologies) supplemented with 2 mM L-glutamine and 15% fetal bovine serum (Atlanta Biologicals) at 37°C under 5% CO2. HeLa and HEK293T cells (ATCC) were cultured in DMEM media supplemented with 10% cosmic calf serum (ThermoFisher) at 37 °C containing 5% CO2. No antibiotics were used to avoid possible antibiotics-induced stress. Cell cycle was detected by analyzing DNA contents using a Beckman Coulter EPICS XL™ flow cytometer.
ChIP-seq sample preparation
Cells were treated with 2 mM HU, 0.3 μM APH or 200 μM MMS for 24 hrs, collected by centrifugation, resuspended in PBS and crosslinked with 1% formaldehyde for 15 min at r.t.. Crosslinking was stopped by 0.2 M glycine, cells were centrifuged, resuspended in lysis buffer (50 mM Tris-HCl pH8.0, 1% Triton X-100, 1% SDS, protease inhibitor cocktail containing 1 mM AEBSF, 0.3 µM aprotinin, 50 µM bestatin, 10 µM E-64, 10 µM leupeptin, 5 µM pepstain and 1 mM PMSF), sonicated on ice for 10 times in 10 s pulses to obtain DNA fragments 100-500 bp in length, and centrifuged again at 4 °C for 10 min at 20,000 g. The supernatant was then diluted with four volumes of dilution buffer (0.01% SDS, 1.1% Triton X-100, 1.2 mM EDTA, 16.7 mM Tris-HCl pH8.0, 150 mM NaCl, protease inhibitors) and precleared with protein A beads (Roche) at 4 °C for 1 hr. Precleared lysates were incubated with anti-γH2AX (Active Motif, #39117) at 4 °C for overnight, followed by the addition of Protein A beads. After additional 3 hr incubation at 4 °C, beads were washed sequentially with 1 ml of buffer A (0.1% SDS, 1% Triton X-100, 2 mM EDTA, 20 mM Tris-HCl pH8.0, 150 mM NaCl), buffer B (0.1% SDS, 1% Triton X-100, 2 mM EDTA, 20 mM Tris-HCl pH8.0, 500 mM NaCl), buffer C (250 mM LiCl, 1% NP-40, 1% Na-deoxycholate, 1mM EDTA, 10 mM Tris-HCl pH8.0), and buffer D (1mM EDTA, 10 mM Tris-HCl pH8.0) for 5 min at 4 °C with rotation. Beads were then washed with buffer D again for 5 min, and eluted with 300 μl elution buffer (1% SDS, 100 mM NaHCO3) at 50 oC for 15 min. Elutes were reverse crosslinked in 200 mM NaCl and 20 μg Protease K at 65 °C overnight. DNA was then precipitated by ethanol and precipitated DNA was used for ChIP-seq library construction.
NGS library preparation and sequencing
Libraries were prepared according to Illumina’s TruSeq® ChIP Sample Preparation Guide (Part# 15023092 Rev. B). Briefly, ChIP DNA was end-repaired using a combination of T4 DNA polymerase, E. coli DNA Pol I large fragment (Klenow polymerase) and T4 polynucleotide kinase. The blunt, phosphorylated ends were treated with Klenow fragment (32 to 52 exo minus) and dATP to yield a protruding 3-‘A’ base for ligation of Illumina’s adapters which have a single ‘T’ base overhang at the 3’ end. After adapter ligation, DNA fragments with sizes of 250-300 bp were selected on 2% agarose gels and were PCR amplified with Illumina primers for 18 cycles. The libraries were captured on an Illumina flow cell for cluster generation and sequenced on HiSeq 2500 (Illumina) with paired-end 100 bp read length following the manufacturer’s protocols. For each genotoxin, two independent treatments were performed, followed by independent ChIP experiments. This resulted in a total of eight ChIP samples (untreated, APH, HU, MMS) that were sequenced simultaneously.
ChIP-seq reads processing and sequence analysis
Prior to sequence analysis, adaptor sequences in reads were trimmed. Paired-end reads in fastq format were aligned to the GRCh38 reference genome using Bowtie2 default settings [62]. Reads were checked for quality control using Samtools [63], and reads below q40 were removed. PCR duplicates were also removed. Following alignment, broad peaks were called using MACS2 peak-calling program [31] (with settings --broad --no-model, --broad-cutoff 10e-3 -p) to give the final peak list per replicate. Shift size was determined using gel quantification from library quality controls. Shift sizes were determined to be: APH-treated replicate 1: 251; APH-treated replicate 2: 257; HU-treated replicate 1: 248; HU-treated replicate 2: 243; MMS-treated replicate 1: 222; MMS-treated replicate 2: 241; Untreated replicate 1: 214; Untreated replicate 2: 229. Blacklisted regions were removed from analysis [64]. Reproducibility between replicates was assessed using Spearman Rank Correlation of tags per 1,000 bp bin.
Enrichment or depletion of γH2AX ChIP-seq peaks in repetitive elements, CGIs, and CFSs were assessed using 1000 iteration permutation analysis with the regioneR Bioconductor package [65]. Repetitive elements were defined by RepeatMasker [37], which uses RepBase Update, the database of repetitive sequences throughout multiple species to define repetitive sequences [66]. This database contains transposable elements (SINES, LINES, DNA-transposons, and LTRs), and non-mobile DNA repeat elements which include the canonical TTAGGG telomere sequence (simple repeats/microsatellites), regions of low complexity such as the known fragile poly-T motif, and (x)RNA sources found throughout the genome. Positions and categories of repetitive elements were obtained from the RepeatMasker data set [37]. Positions of CGIs were obtained from the CGI track in the UCSC Genome Browser [42]. CFSs in human lymphocytes [18, 67, 68] were sorted using the G-band positions from the UCSC Chromosome band track [69, 70]. The NCBI RefSeq dataset was used for gene lengths, TSS, and TTS analyses [71]. Gene length was analyzed using a Kruskal-Wallis test and post-hoc paired Wilcoxon signed-rank test with a Holm-Bonferroni correction for family-wise error. Graphs for gene length were generated using the ggplot2 R package [72]. Graphs for ChIP-seq data relationships to TSS and histone marks were generated using Deeptools2 [73]. Histone mark data was taken from GSM733677 (H3K9ac), GSM733708 (H3K4me3), GSM945196 (H3K27me3), GSM733664 (H3K9me3) [64, 74]. Sample data was realigned to hg19 using identical Bowtie2 settings prior to comparison with histone marks. COSMIC [75, 76] was used for cancer gene analysis, and Gene Consortium database [77, 78] was used for gene ontology analysis.
Declarations
Ethics approval and consent to participate
N/A
Consent for publication
N/A
Availability of data and materials
All ChIP-seq data generated or analyzed during this study have been deposited to GEO, but are not publicly available until the manuscript is published.
Competing interests
None.
Funding
This work was supported by Illumina pilot grant and NIH R01GM112864 to W.C., and in part by NSF MRI grant 1532271 to the Center for Institutional Research Computing at WSU.
Author contributions
XL performed ChIP. MC analyzed ChIP-seq sequences. XL and MC assembled figures. XL, MC, WC wrote the manuscript.
Acknowledgements
We thank Dr. Ben Liu at the Genomics Core for assistance in sequence analysis, and Ms. Elizabeth Everson in flow cytometry analysis.