Nucleotide excision repair is impaired by binding of transcription factors to DNA

Radhakrishnan Sabarinathan; Loris Mularoni; Jordi Deu-Pons; Abel Gonzalez-Perez; Núria López-Bigas

doi:10.1101/028886

ABSTRACT

Somatic mutations are the driving force of cancer genome evolution¹. The rate of somatic mutations varies greatly across the genome due to chromatin organization, DNA accessibility and replication timing^2-5. However, whether other variables, such as DNA-binding proteins, influence the mutation rate locally is unknown. Here we demonstrate that the rate of somatic mutations in melanoma tumors is highly increased at active transcription factor binding sites (TFBS) and nucleosome-embedded DNA, compared to their flanking regions. Using recently available excision-repair sequencing (XR-seq) data⁶, we show that the higher mutation rate at these sites is caused by a decrease in the level of nucleotide excision repair (NER) activity. Therefore, our work demonstrates that DNA-bound proteins interfere with the NER machinery, which results in an increased rate of mutations at their binding sites. This finding has important implications for our understanding of mutational and DNA repair processes and for the identification of cancer driver mutations.

The accumulation of somatic mutations in cells results from the interplay of mutagenic processes, both endogenous and exogenous, and mechanisms of DNA repair. Recent efforts to sequence the whole genome of tumor samples from different tumor types^7,8 have shed light on this interplay. On the one hand, mutational signatures associated with various tumorigenic mechanisms have been identified across cancer types⁹; on the other, genomic features such as chromatin organization, DNA accessibility, and DNA replication timing^2-5 have been associated with the variation of somatic mutation rates at the megabase scale. Two recent studies proposed a causal relationship between the accessibility of chromosomal areas to the DNA repair machinery and their mutational burden. Supek and Lehner, 2015¹⁰ point to variable repair of DNA mismatches as the basis of the megabase-scale variation of somatic mutation rates across the human genome. Polak et al. 2014⁴ found lower somatic mutation rates at DNase-I hypersensitive sites (DHS) in cell lines and primary tumors than at their flanking regions and the rest of the genome, and attributed this difference to higher accessibility to the global genome repair machinery. Similarly, nucleosome occupancy has been linked to regional mutation rate variation between nucleosome-bound DNA and linker regions^11-13, while two recent studies found a relation between transcription factor binding sites (TFBS) and nucleotide substitution rates. Reijns et al. 2015¹⁴ detected increased levels of nucleotide substitutions around TFBS in the yeast genome, which were attributed to DNA-binding proteins acting as partial barriers to the polymerase-delta-mediated displacement of polymerase-alpha-synthesized DNA. Katainen et al. 2015¹⁵ found that CTCF/cohesin-binding sites are frequently mutated in colorectal tumors and in a small subset of tumors of other cancer types, and suggested that these mutations are probably caused by challenged DNA replication under aberrant conditions.

To elucidate the impact of DNA-binding proteins on DNA repair, we analyzed the somatic mutation rate at TFBS in the genomes of 38 primary melanoma samples sequenced by TCGA¹⁶. We found that the mutation rate was approximately five times higher in active TFBS, i.e., those overlapping DHS (Fig. 1a) than in their flanking regions (P < 2.2 × 10⁻⁶, chi-square test). We determined that this elevated mutation rate could not be explained by the sequence context (Fig. 1a), and that it did not occur at inactive TFBS (Fig. 1a and Extended Data Fig. 1), indicating that it is directly related to the protein bound to DNA. Furthermore, this enrichment for mutations appeared at the active binding sites of most transcription factors (TFs) (Fig. 1b, Extended Data Fig. 2 and Supplementary Table 1); the signal was discernible in most analyzed melanoma samples (Fig. 1c and Supplementary Table 2), and it increased with genome-wide mutation rate.

Figure 1. Elevated mutation rate at TFBS in melanomas.

a, Mutation rates are approximately five-fold higher within active TFBS (those overlapping DHS in melanocytes) than in flanking regions (red line). In contrast, non-active TFBS (those not overlapping DHS in melanocytes) do not show increased mutation rates (green line). The high increase in mutation rate is not explained by sequence context; black lines show the expected mutation rate per position when distributing all observed mutations in the region according to the probability of mutations in different trinucleotide contexts. b, A significant increase in mutation rate in TFBS compared to flanking regions is observed for most individual transcription factors and (c) in most of the individual melanoma samples. The log2 fold change (FC) on the x-axis indicates whether the mutation rate in TFBS is higher (positive FC) or lower (negative FC) than expected, and the corresponding significance value (from the chi-square test) is shown on the y-axis for each transcription factor. d, The contribution of C>T mutations to mutational density is higher than that of the other mutation types. The zero coordinate on the x-axis corresponds to the TFBS mid-point, and the magenta line above it represents the average size of TFBS.

Figure 2. Mutation rate at distal TFBS and DHS sites.

a, Mutation rate in distal TFBS, which are at least 5 kb away from transcription start sites. Similar to proximal TFBS as shown in Fig. 1a, the mutation rate is elevated at the center of core TFBS compared to the flanks. In addition, periodic peaks of mutation rate in the flanking regions of binding sites correlate well with nucleosome positioning (green line). This is further supported by the autocorrelation analysis (bottom panel), which shows that the periodic peaks are observed at a distance of ~146bp; this coincides well with the size of the DNA wrapped in nucleosomes, which are separated by short stretches of linker DNA. The zero coordinate on the x-axis corresponds to the TFBS mid-point. b, Mutation rate centered on DHS in melanomas (top panel). In the subset of DHS regions outside promoter regions and not overlapping TFBS (DHS-noPromoter-noTFBS), the peak in mutation rate disappears and only a valley is observed (middle panel). In the subset of DHS regions in promoters overlapping TFBS (DHS-Promoters-TFBS), the peak in mutation rate is more apparent. The actual mutation rate values are shown in light red and the best-fit spline is shown in dark red. The zero coordinate on the x-axis corresponds to the DHS peak mid-point, and the magenta line above it represents the average size of DHS (~150nts). The barplot at the right of each panel compares the mutation rate in the DHS and the flank for each group of regions.

Most somatic mutations in melanocytes are caused by exposure to ultraviolet (UV) radiation⁹. UV radiation causes specific DNA lesions or DNA photoproducts, namely cyclobutane pyrimidine dimers (CPDs) and (6-4) pyrimidine–pyrimidone photoproducts ((6-4)PPs), at the sites of dipyrimidines¹⁷. As expected, C>T (G>A) mutations predominated over other nucleotide changes in melanomas (Fig. 1d), both within TFBS and at their flanks. The elevated mutation rate at TFBS could be explained by either faulty DNA repair or a higher probability of UV-induced lesions^18,19 at protein-bound DNA.

Next, we focused on active TFBS in regions distal from transcription start sites, and again found an increased mutation rate at binding sites, flanked by periodic peaks of mutation rate observed at a distance of ~146bp, which coincides well with the size of the DNA wrapped in nucleosomes. When we superimposed the nucleosome positioning signals from ENCODE²⁰ and these mutation rate peaks, we verified that their positions matched perfectly (Fig. 2a). Furthermore, we found that the peak of mutation rate observed at the center of DHS regions occurred exclusively at TFBS located within promoter regions (DHS-Promoters-TFBS), and was absent from DHS-noPromoter-noTFBS (Fig. 2b). This corroborated that, whatever the process causing the increase in mutation rate, it requires proteins to be bound to the DNA.

We then asked whether the higher mutation rate in TFBS and nucleosomes was caused by reduced accessibility of the protein-bound DNA to the NER machinery. Non-repaired nucleotides would be bypassed by polymerases carrying out translesion DNA synthesis, thus resulting in mutations²¹. To test this, we assembled nucleotide-resolution maps of the NER activity on the two products of UV-induced DNA damage, CPDs and (6-4)PPs, generated by Hu et al., 2015 using XR-seq in irradiated skin fibroblasts⁶. In XR-seq, the excised ∼30-mer around the site of damage generated during nucleotide excision repair is isolated and subjected to high-throughput sequencing. When we analyzed the genome-wide signal of this NER map, we found a strong decrease in the amount of CPD and (6-4)PP repair at the center of TFBS (Fig. 3a and Extended Data Fig. 3), compared to their flanking regions. The decrease was apparent both in wild-type cells (NHF1) and in CS-B mutant cell lines, which lack transcription-coupled repair⁶ (Fig. 3a and Extended Data Fig. 3), and it appeared at the binding sites of individual transcription factors (Extended Data Fig. 4). Moreover, we found that the level of DNA excision repair (and the mutation rate) at TFBS correlated with the strength of their binding (Fig. 3b and Extended Data Fig. 5). We concluded from these observations that the higher mutation rate observed at active TFBS is caused by a decrease of the NER activity.

Figure 3. Regions around TFBS show a decrease in nucleotide excision repair.

a, Mutation rate around TFBS is plotted (red line) alongside the average repair of UV-light-induced DNA damage, cyclobutane pyrimidine dimers (CPD), in wild-type and CS-B mutant cell lines (blue line). A sharp decrease in nucleotide excision repair is evident at the core TFBS at both proximal and distal sites. b, The level of nucleotide excision repair (and the resulting mutation rate) in TFBS correlated with the strength of the binding of the transcription factor to its site. The binding sites were classified into four quartiles (Low to High) using the ChIP-seq read coverage, which reflects the strength of binding or occupancy (as in Reijns et al., 2015¹⁴). The binding sites in the “High” quartile (last panel) show higher mutation rates at the center (correlating with the lower repair) compared to the “Low” quartile (first panel). The zero coordinate on the x-axis corresponds to the TFBS mid-point, and the magenta line above it represents the average size of TFBS.

Figure 4. Model showing the mutation rate and repair rate in TFBS and nucleosome sites.

The model shows that the accessibility of the DNA to the nucleotide excision repair (NER) machinery directly determines the distribution of mutational density at the nucleotide scale. Lower NER activity is observed at the TFBS bound region (within DHS region) and the nucleosome positions in the flank, compared to the nucleosome free regions (DHS and linkers). This NER activity corroborates the observed high mutation rate in transcription factor and nucleosome bound regions.

A previous study related higher DNA repair activity inside DHS than outside DHS to greater accessibility to the repair machinery⁴. By specifically deconvoluting the signal of mutation rate within DHS, our work goes a step further to show that bound TFs at the center of DHS actually hinder DNA repair. This interplay of greater NER at DHS and lower NER at TF-bound sites in their center results in a volcano-shaped pattern of NER activity around the TFBS, with a strong depletion exactly at its center flanked by two mountains in the DHS area around it (Fig. 3). The volcano shape is more pronounced at distal TFBS, those that occur distant from transcription start sites (TSS) (Fig. 3a), which may be explained by the presence of shorter regions of open chromatin surrounded by compacted DNA. Moreover, a periodicity in NER activity is observable for the first nucleosomes around TFBS (Fig. 3a), which matches the previously noted periodic variation of the mutation rate. Also consistent with the mutation rate pattern, the signal of decreased NER activity is clearer at the center of DHS-Promoters-TFBS, exactly at the position of the TFBS (Extended Data Fig. 6c). These results demonstrate that repair activity in DHS regions is in general higher than in non-DHS regions, supporting previous observations⁴. However, this activity is specifically impaired at sites with bound transcription factors.

NER consists of two pathways: global repair, which targets lesions in a genome-wide manner, and transcription-coupled repair, which recognizes lesions within transcribed regions¹⁷. These pathways differ in the initial steps of damage recognition, although they share the core components that excise damaged regions. To discern the effect of DNA-bound TFs on transcription-coupled NER, we focused on transcribed regions centered at TFBS at least 200 bp downstream of TSS, and plotted together mutation rate and XR-seq data in XP-C cells, which only have transcription-coupled repair⁶. Mutation rate is also increased at the center of transcribed TFBS, and the volcano shape of repair rate in XP-C cells is apparent for TFs bound to either the template or the non-template strand (Extended Data Fig. 7). This result demonstrates that the decrease in NER caused by bound TFs results from impairment of both NER pathways.

NER recognizes and repairs other DNA lesions besides those induced by UV light, such as DNA adducts induced by smoking-related carcinogens (e.g. benzo[a]pyrene diol epoxide)²². We therefore hypothesized that the conclusion we had drawn from the observations made in melanomas could be extended to other tumor types. We observed higher mutation rates at TFBS in lung adenocarcinomas and lung squamous cell carcinomas, in particular for C>A variants, which correspond to the mutations caused by tobacco smoking⁹ (Extended Data Fig. 8). In contrast, no increase of the mutation rate in TFBS is observed in colon adenocarcinomas, where NER activity is not expected to play a major role in the mutational process.

Two previous studies have described abnormal mutation rates in connection with a group of DNA-bound TFs in yeast¹⁴ and with CTCF/cohesin sites in a subset of colorectal tumors¹⁵. However, in contrast to our results, in neither of these studies were the peaks of mutation rate attributed to impairment of NER resulting from bound proteins. In the former, higher mutation rates at specific TFBS were related to polymerase-delta-mediated displacement of polymerase-alpha-synthesized DNA during replication. In the latter, the higher mutation rate at CTCF/cohesin sites in a subset of colorectal tumors was attributed to challenged DNA replication under aberrant conditions.

In summary, our results demonstrate that the accessibility of the DNA to the NER machinery directly determines the distribution of mutational density at the nucleotide scale. The increased repair in freely accessible, nucleosome-free DNA around TFBS and the decline in repair efficiency exactly at TFBS produce a lower mutation rate in the periphery of DHS sites and a higher mutation rate at their center (Fig. 4). Moreover, periodic signals of higher mutation rate and lower NER in closed chromatin regions coincide with nucleosome occupancy, suggesting that nucleosomes produce the same type of impairment to NER.

These findings have strong implications for our basic understanding of the mechanisms of DNA repair in human cells, as well as for the study of tumor evolution and cancer-associated somatic mutations. They indicate that most mutations in TFBS accumulate due to faulty repair at these sites. Therefore, methods designed to identify potential somatic driver mutations in non-coding regions, which typically exploit the mutational patterns of genomic elements, must construct models of the background mutation rate that accurately take into account the increased mutation density at TFBS due to faulty repair.

Methods

Mutation data

Whole-genome somatic mutations of 38 skin cutaneous melanomas (SKCM), 46 lung adenocarcinomas (LUAD), 45 lung squamous cell carcinomas (LUSC), and 42 colorectal adenocarcinomas (CRC) identified by TCGA were obtained from Fredriksson et al., 2014¹⁶. As suggested by the authors of that paper, we considered in our analyses only single nucleotide substitutions with a minimum variant frequency of 0.2 and which do not overlap dbSNP entries (v138). The total number of mutations of each cancer type passing these thresholds is listed in Extended Data Table 1. We separated CRC samples into two groups: hypermutated (with mutations of the DNA polymerase epsilon (POL-E) gene; n = 8 samples) and hypomutated (the rest; n = 34 samples).
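The filtering criteria above can be expressed as a short sketch. The record layout (`Mutation`), the function name `filter_mutations`, and the representation of dbSNP as a set of (chromosome, position) pairs are illustrative assumptions, not the format of the original data files.

```python
from typing import Iterable, List, NamedTuple, Set, Tuple


class Mutation(NamedTuple):
    """Hypothetical record for one somatic call (layout is an assumption)."""
    chrom: str
    pos: int
    ref: str
    alt: str
    variant_freq: float


def filter_mutations(mutations: Iterable[Mutation],
                     dbsnp_sites: Set[Tuple[str, int]],
                     min_freq: float = 0.2) -> List[Mutation]:
    """Keep single nucleotide substitutions with variant frequency >= min_freq
    that do not coincide with a known dbSNP position."""
    kept = []
    for m in mutations:
        is_snv = len(m.ref) == 1 and len(m.alt) == 1 and m.ref != m.alt
        if is_snv and m.variant_freq >= min_freq and (m.chrom, m.pos) not in dbsnp_sites:
            kept.append(m)
    return kept
```

For example, a call with variant frequency 0.1, a call at a dbSNP position, and a deletion would all be discarded, leaving only well-supported novel substitutions.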

Genomic elements

The genomic coordinates of transcription factor binding sites (TFBS), i.e., TF motif matches under ChIP-seq peak regions, were obtained from ENCODE²⁰. These comprised the binding sites of 109 transcription factors (TF) as used in Khurana et al., 2013²³. We also obtained from ENCODE predicted binding sites of 52 transcription factors which are not supported by ChIP-seq peaks (termed unbound TFBS). In addition, we obtained the binding sites of 32 TFs used in Reijns et al., 2015¹⁴. We treated the latter as an independent data set, and following the authors of the original paper¹⁴, we clustered the TFBS into quartiles according to the binding strength or occupancy of the TFs at their sites, quantified through ChIP-seq read coverage.

As promoters, we considered the DNA sequences up to 2.5kb upstream of transcription start sites (TSS) of all protein coding genes in GENCODE²⁴ (v19). Promoter regions overlapping coding sequences (CDS) or untranslated regions (UTRs) were excluded. We classified TFBS as either proximal (i.e., overlapping these upstream promoters) or distal (i.e., those located in intergenic regions, with no annotated TSS, as per GENCODE v19, within 5kb distance on either side). A third group of TFBS was composed of those located downstream of TSS (between +200bp and +500bp) and which do not overlap with the upstream 2.5kb promoter regions, i.e., TFBS in transcribed regions.

All TFBS overlapping DNase I hypersensitive sites (DHS) identified by the Epigenome Roadmap project²⁵ in primary cell types most closely matching the cell of origin of each tumor type (see below) were considered active. We considered only DHS sites identified by the Hotspot algorithm (narrowPeaks at FDR 1%), which are typically 150nts long. For each cancer type, the matching primary cell type was selected based on the recent study by Polak et al., 2015⁵ (Extended Data Table 1). We chose the DHS from primary cell types (from the Epigenome Roadmap project) instead of cell lines (from ENCODE), because the chromatin features of the cell of origin of a tumor have been shown to correlate better with its mutation profile than those of matched cancer cell lines⁵. However, we selected the TFBS detected by ENCODE in cell lines (see above) due to the lack of TF binding site annotations in primary cells analyzed by the Epigenome Roadmap project²⁵.

We then classified the TFBS in the samples of each tumor type as active or inactive based on their overlap, or lack thereof, with DHS regions (minimum 1bp) of the matched primary cell type. Unbound TFBS (see above), which do not overlap with TF peaks or DHS regions, were considered inactive TFBS and used as a negative control to compare with the active TFBS (in Extended Data Fig. 1). All genomic coordinates of TFBS used in this study as part of any aforementioned category are available at http://bg.upf.edu/tfbs.
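The active/inactive rule described above amounts to an interval-overlap test with a 1bp minimum. The sketch below illustrates that rule with a plain linear scan; the analyses themselves relied on BEDTools for overlaps, so this is only a stand-in. Half-open, BED-style coordinates and the names `overlap_bp` and `classify_tfbs` are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # half-open [start, end), BED-style (assumed)


def overlap_bp(a_start: int, a_end: int, b_start: int, b_end: int) -> int:
    """Number of base pairs shared by two half-open intervals."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))


def classify_tfbs(tfbs: Dict[str, Tuple[str, int, int]],
                  dhs: Dict[str, List[Interval]],
                  min_overlap: int = 1) -> Dict[str, str]:
    """Label each TFBS 'active' if it shares at least min_overlap bp with any
    DHS on the same chromosome, and 'inactive' otherwise."""
    labels = {}
    for name, (chrom, start, end) in tfbs.items():
        active = any(overlap_bp(start, end, s, e) >= min_overlap
                     for s, e in dhs.get(chrom, []))
        labels[name] = "active" if active else "inactive"
    return labels
```

With half-open coordinates, a TFBS that merely abuts a DHS (shares a boundary but no base) is labeled inactive, which matches the 1bp minimum.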

Mutation rate estimation

In order to compare the mutation rate in TFBS to that of their neighboring regions, we considered flanking stretches of 1000 nucleotides at both sides of the TFBS mid-point. To exclude regions that could bias the mutation rate analyses, prior to mapping the somatic mutations to these selected 2001-nt windows, we filtered out any regions overlapping a) coding sequences, and b) UCSC Browser blacklisted regions (Duke and DAC), which are often misaligned to sites in the reference assembly, and regions of low unique mappability of sequencing reads (“CRG Alignability 36' Track”²⁶, score < 1) (http://genome.ucsc.edu/cgi-bin/hgFileUi?db=hg19&g=wgEncodeMapability). In addition, regions that overlap other TFBS within flanking regions (immediately upstream or downstream of the TFBS) were excluded. The resulting filtered windows of each TFBS were then aligned (taking as reference the TFBS centers), and the mutation rate of every column i within the window was calculated as the total number of mutations mapped to nucleotides in column i divided by the total number of nucleotides observed in column i (after filtering). We computed this mutation rate for each TF separately, as well as globally for all TFs. In the latter case, prior to the calculation, we removed any repeated chromosomal positions (from different TFs) observed in a column.
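As an illustration of the per-column calculation described above, the following sketch aligns windows on their mid-points and divides mutation counts by the number of retained nucleotides in each column. The data structures (a dictionary of mutation counts per position and a set of excluded positions) and the name `column_mutation_rate` are assumptions; strand orientation of motifs is not handled here.

```python
from typing import Dict, List, Set, Tuple

Position = Tuple[str, int]  # (chromosome, coordinate)


def column_mutation_rate(midpoints: List[Position],
                         mutation_counts: Dict[Position, int],
                         excluded: Set[Position],
                         flank: int = 1000) -> List[float]:
    """Mutation rate per aligned column: mutations in the column divided by
    the number of nucleotides in the column that survive filtering."""
    width = 2 * flank + 1
    n_mut = [0] * width
    n_nt = [0] * width
    seen = set()  # (column, position) pairs, so repeated positions count once
    for chrom, mid in midpoints:
        for col in range(width):
            key = (chrom, mid - flank + col)
            if key in excluded or (col, key) in seen:
                continue
            seen.add((col, key))
            n_nt[col] += 1
            n_mut[col] += mutation_counts.get(key, 0)
    # Columns with no retained nucleotide are reported as NaN.
    return [m / n if n else float("nan") for m, n in zip(n_mut, n_nt)]
```

Because `mutation_counts` pools mutations across tumors, a column's rate can exceed 1 in small toy inputs; with genome-scale data the denominators are far larger than the counts.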

In the case of the analysis centered on DHS, we considered flanking stretches of 1000 nucleotides at both sides of the DHS peak center and followed the same steps mentioned above to filter mutations and to compute the mutation rate.

Background mutation rate estimation

In order to check whether the mutation rate observed at each position was expected given the local sequence context, we randomly introduced into each window the same number of mutations observed in it, following the probability of occurrence of each mutation according to its tri-nucleotide context. We computed the probability of occurrence of all 96 possible tri-nucleotide changes in each cancer type based on the total number of observed mutations in all its samples. We also computed separate probabilities of occurrence of all 96 tri-nucleotide changes in active and inactive TFBS from the mutations observed in each category. The mutation rate of each randomly generated set of changes was computed for each column as explained above. This procedure was repeated 1000 times to compute the mean random mutation rate of every column in the window.
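A minimal sketch of this randomization is given below, under several simplifying assumptions: the 96 change probabilities are collapsed into one weight per pyrimidine-centered tri-nucleotide context (summing over the three possible substitutions), every window is supplied as a sequence with one extra base on each side, all windows have the same length, and no positions are filtered. The function names are illustrative.

```python
import random
from typing import Dict, List, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical_context(tri: str) -> str:
    """Return the tri-nucleotide on the strand where its center is C or T."""
    if tri[1] in "CT":
        return tri
    return tri.translate(_COMPLEMENT)[::-1]


def expected_mutation_rate(windows: List[Tuple[str, int]],
                           context_weight: Dict[str, float],
                           n_iter: int = 1000,
                           seed: int = 0) -> List[float]:
    """windows holds (sequence, n_observed) pairs; each sequence spans the
    window plus one flanking base per side. Returns the mean number of
    simulated mutations per column, averaged over iterations and windows."""
    width = len(windows[0][0]) - 2
    rng = random.Random(seed)
    counts = [0] * width
    for seq, n_obs in windows:
        weights = [context_weight.get(canonical_context(seq[i:i + 3]), 0.0)
                   for i in range(width)]
        if n_obs == 0 or sum(weights) == 0:
            continue  # nothing to place, or no mutable context in the window
        for _ in range(n_iter):
            # Place the observed number of mutations according to context weights.
            for col in rng.choices(range(width), weights=weights, k=n_obs):
                counts[col] += 1
    return [c / (n_iter * len(windows)) for c in counts]
```

Sampling is done with replacement, so a position can receive more than one simulated mutation, as can happen when mutations from several tumors are pooled.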

Enrichment analysis

To determine whether TFBS are enriched for mutations compared to the immediate flanking region, we compared the ratio of the total number of mutations to the total number of nucleotide positions within the TFBS region (-15 to 15nts) with that of the flanking region (16 to 1000nts on either side) using a chi-squared test. We performed this test for all transcription factors and for each individual tumor, and corrected the resulting p-values for multiple testing using the Benjamini-Hochberg procedure²⁸. In addition, we computed the fold change of mutation rates through the expected frequencies obtained from the chi-squared tests. Both the fold change and the adjusted p-values are shown in Figure 1b-c.
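The test described above can be sketched as a 2×2 comparison of mutated versus non-mutated positions in the TFBS and in the flank. This sketch uses Pearson's statistic without continuity correction, which is an assumption since the text does not say whether a correction was applied; for one degree of freedom the p-value equals erfc(sqrt(chi2/2)). The function names are illustrative, and the code assumes non-zero counts in every cell.

```python
import math
from typing import List, Tuple


def tfbs_enrichment(mut_tfbs: int, nt_tfbs: int,
                    mut_flank: int, nt_flank: int) -> Tuple[float, float]:
    """Return (log2 fold change of observed vs expected TFBS mutations,
    chi-squared p-value with 1 degree of freedom)."""
    observed = [[mut_tfbs, nt_tfbs - mut_tfbs],
                [mut_flank, nt_flank - mut_flank]]
    row = [sum(r) for r in observed]
    col = [observed[0][j] + observed[1][j] for j in range(2)]
    grand = sum(row)
    expected = [[row[i] * col[j] / grand for j in range(2)] for i in range(2)]
    chi2 = sum((observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
               for i in range(2) for j in range(2))
    p_value = math.erfc(math.sqrt(chi2 / 2))  # survival function, df = 1
    log2_fc = math.log2(mut_tfbs / expected[0][0])
    return log2_fc, p_value


def benjamini_hochberg(pvals: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

A positive log2 fold change indicates more mutations in the TFBS than expected from the pooled rate, matching the sign convention used in Figure 1b-c.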

Nucleotide excision repair data

The genome-wide maps of nucleotide excision repair of two types of UV-induced damage, cyclobutane pyrimidine dimers (CPD) and (6-4) pyrimidine-pyrimidone photoproducts ((6-4)PP), available for three different cell lines –i) wild-type NHF1 skin fibroblasts, ii) XP-C mutants, lacking the global repair mechanism, and iii) CS-B mutants lacking transcription-coupled repair– were obtained from Hu et al., 2015⁶. The dataset contains normalized read counts for fixed steps of 25bp across the genome, for the forward and reverse strands separately. We kept these for our analyses and also generated strand independent data as the average of normalized read counts from both strands for every nucleotide position. These average read counts were mapped to the TFBS centered windows (2001bp), filtered and aligned to the TFBS mid-point as described above. We computed the average repair rate for each column i of these windows as the total number of average read counts mapped to the nucleotides in the column i divided by the total number of nucleotides in the column i, as described above for the mutation rate.
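The strand-independent repair signal and its per-column average can be sketched as follows. The sketch assumes a single chromosome, 0-based coordinates, binned signals stored as dictionaries keyed by bin index (position divided by the step size), and no filtering of positions; none of these details come from the original data format.

```python
from typing import Dict, List

STEP = 25  # bin size of the repair maps, in bp


def signal_at(bins: Dict[int, float], pos: int, step: int = STEP) -> float:
    """Value of a fixed-step signal at one nucleotide (0.0 if the bin is absent)."""
    return bins.get(pos // step, 0.0)


def column_repair_rate(midpoints: List[int],
                       forward: Dict[int, float],
                       reverse: Dict[int, float],
                       flank: int = 1000,
                       step: int = STEP) -> List[float]:
    """Average strand-independent repair signal per aligned column."""
    width = 2 * flank + 1
    total = [0.0] * width
    for mid in midpoints:
        for col in range(width):
            pos = mid - flank + col
            # Strand-independent value: mean of forward and reverse signals.
            total[col] += (signal_at(forward, pos, step)
                           + signal_at(reverse, pos, step)) / 2
    return [t / len(midpoints) for t in total]
```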

Nucleosome signals

Genome-wide nucleosome positioning signals (density graph) of ENCODE cell line GM12878 (lymphoblastoid cell line) were downloaded via the UCSC genome browser (http://hgdownload.cse.ucsc.edu/goldenpath/hg19/encodeDCC/wgEncodeSydhNsome/). We then mapped them to the TFBS centered windows, and similar to mutation and repair rates, we computed the average signal per column i of the window as the sum of signal values mapped to the nucleotides in column i divided by the total number of nucleotides in column i.

Computational and statistical tools

BEDTools utilities²⁹ were used to carry out operations such as extensions or overlaps in the various analyses of genomic features (TFBS/DHS), as well as to map somatic mutations to genomic features. All curve fittings shown in figures (best-fit spline) were performed using the smooth.spline function from R³⁰ (v3.0). The autocorrelation was computed using the acf function from the statsmodels Python package (http://statsmodels.sourceforge.net/).
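To make the autocorrelation step concrete, the sketch below implements the standard (biased) sample autocorrelation in plain Python and reports the lag of its first local maximum, which is one simple way to read off a period such as the nucleosome spacing. It is a dependency-free stand-in for the statsmodels `acf` function used in the analysis; the helper `first_peak_lag` is a hypothetical addition for illustration.

```python
from typing import List, Optional


def autocorrelation(x: List[float], nlags: int) -> List[float]:
    """Sample autocorrelation for lags 0..nlags (biased estimator:
    each lag's sum of products is divided by the lag-0 sum of squares)."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    var = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / var
            for k in range(nlags + 1)]


def first_peak_lag(r: List[float]) -> Optional[int]:
    """Smallest lag k >= 1 at which the autocorrelation has a local maximum."""
    for k in range(1, len(r) - 1):
        if r[k] > r[k - 1] and r[k] >= r[k + 1]:
            return k
    return None
```

Applied to a per-column mutation rate profile, a first peak near the nucleosome repeat length would correspond to the periodicity shown in the bottom panel of Fig. 2a.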

ACKNOWLEDGEMENTS

We acknowledge funding from the Spanish Ministry of Economy and Competitiveness (grant number SAF2012–36199), the Marató de TV3 Foundation, and the Spanish National Institute of Bioinformatics (INB). R.S. is supported by an EMBO Long-Term Fellowship (ALTF 568–2014) co-funded by the European Commission (EMBOCOFUND2012, GA-2012–600394) with support from Marie Curie Actions. A.G.-P. is supported by a Ramón y Cajal contract.

References

1. Yates, L. R. & Campbell, P. J. Evolution of the cancer genome. Nat. Rev. Genet. 13, 795–806 (2012).
2. Schuster-Böckler, B. & Lehner, B. Chromatin organization is a major influence on regional mutation rates in human cancer cells. Nature 488, 504–507 (2012).
3. Lawrence, M. S. et al. Mutational heterogeneity in cancer and the search for new cancer-associated genes. Nature 499, 214–218 (2013).
4. Polak, P. et al. Reduced local mutation density in regulatory DNA of cancer genomes is linked to DNA repair. Nat. Biotechnol. 32, 71–75 (2014).
5. Polak, P. et al. Cell-of-origin chromatin organization shapes the mutational landscape of cancer. Nature 518, 360–364 (2015).
6. Hu, J., Adar, S., Selby, C. P., Lieb, J. D. & Sancar, A. Genome-wide analysis of human global and transcription-coupled excision repair of UV damage at single-nucleotide resolution. Genes Dev. 29, 948–960 (2015).
7. ICGC, Hudson, T. J. et al. International network of cancer genome projects. Nature 464, 993–998 (2010).
8. Chang, K. et al. The Cancer Genome Atlas Pan-Cancer analysis project. Nat. Genet. 45, 1113–1120 (2013).
9. Alexandrov, L. B. et al. Signatures of mutational processes in human cancer. Nature 500, 415–421 (2013).
10. Supek, F. & Lehner, B. Differential DNA mismatch repair underlies mutation rate variation across the human genome. Nature 521, 81–84 (2015).
11. Hara, R., Mo, J. & Sancar, A. DNA Damage in the Nucleosome Core Is Refractory to Repair by Human Excision Nuclease. Molecular and Cellular Biology 20, 9173–9181 (2000).
12. Yazdi, P. G. et al. Increasing Nucleosome Occupancy Is Correlated with an Increasing Mutation Rate so Long as DNA Repair Machinery Is Intact. PLoS ONE 10, e0136574 (2015).
13. Tolstorukov, M. Y., Volfovsky, N., Stephens, R. M. & Park, P. J. Impact of chromatin structure on sequence variability in the human genome. Nat Struct Mol Biol 18, 510–515 (2011).
14. Reijns, M. et al. Lagging-strand replication shapes the mutational landscape of the genome. Nature 518, 502–506 (2015).
15. Katainen, R. et al. CTCF/cohesin-binding sites are frequently mutated in cancer. Nat. Genet. 47, 818–821 (2015).
16. Fredriksson, N. J., Ny, L., Nilsson, J. A. & Larsson, E. Systematic analysis of noncoding somatic mutations and gene expression alterations across 14 tumor types. Nat. Genet. 46, 1258–1263 (2014).
17. Marteijn, J. A., Lans, H., Vermeulen, W. & Hoeijmakers, J. H. J. Understanding nucleotide excision repair and its roles in cancer and ageing. Nat Rev Mol Cell Biol 15, 465–481 (2014).
18. Tornaletti, S. & Pfeifer, G. P. UV Light as a Footprinting Agent: Modulation of UV-induced DNA Damage by Transcription Factors Bound at the Promoters of Three Human Genes. Journal of Molecular Biology 249, 714–728 (1995).
19. Gale, J. M., Nissen, K. A. & Smerdon, M. J. UV-induced formation of pyrimidine dimers in nucleosome core DNA is strongly modulated with a period of 10.3 bases. Proceedings of the National Academy of Sciences 84, 6644–6648 (1987).
20. ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012).
21. Goodman, M. F. & Woodgate, R. Translesion DNA Polymerases. Cold Spring Harbor Perspectives in Biology 5, (2013).
22. Nouspikel, T. DNA Repair in Mammalian Cells. Cell. Mol. Life Sci. 66, 994–1009 (2009).
23. Khurana, E. et al. Integrative Annotation of Variants from 1092 Humans: Application to Cancer Genomics. Science 342, 1235587–1235587 (2013).
24. Harrow, J. et al. GENCODE: the reference human genome annotation for The ENCODE Project. Genome Res. 22, 1760–1774 (2012).
25. Kundaje, A. et al. Integrative analysis of 111 reference human epigenomes. Nature 518, 317–330 (2015).
26. Derrien, T. et al. Fast computation and applications of genome mappability. PLoS One 7, e30377 (2012).
27. Stouffer, S. A., Suchman, E. A., DeVinney, L. C., Star, S. A. & Williams, R. M. Jr. The American Soldier, Vol. 1: Adjustment during Army Life. Princeton University Press, Princeton (1949).
28. Benjamini, Y. & Hochberg, Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. Roy. Stat. Soc. 57, 289–300 (1995).
29. Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010).
30. R Core Team (2014). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria.