Abstract
All normal somatic cells are thought to acquire mutations. However, characterisation of the patterns and consequences of somatic mutation in normal tissues is limited. Uterine endometrium is a dynamic tissue that undergoes cyclical shedding and reconstitution and is lined by a gland-forming epithelium. Whole genome sequencing of normal endometrial glands showed that most are clonal cell populations derived from a recent common ancestor with mutation burdens differing from other normal cell types and manyfold lower than endometrial cancers. Mutational signatures found ubiquitously account for most mutations. Many, in some women potentially all, endometrial glands are colonised by cell clones carrying driver mutations in cancer genes, often with multiple drivers. Total and driver mutation burdens increase with age but are also influenced by other factors including body mass index and parity. Clones with drivers often originate during early decades of life. The somatic mutational landscapes of normal cells differ between cell types and are revealing the procession of neoplastic change leading to cancer.
Introduction
Acquisition of mutations is a ubiquitous and essential feature of the cells of living organisms. Although there has been comprehensive characterisation of the somatic mutation landscape of human cancer1-3, understanding of patterns of somatic mutation in normal cells is limited. In large part this has been due to the challenge of detecting somatic mutations in normal tissues and several strategies have recently been developed to address this including sequencing of in vitro derived clonal cell populations from normal tissues4-8, sequencing small biopsies containing limited numbers of microscopic clones9,10, sequencing microscopically distinguishable structural elements which are clonal units11,12, highly error corrected sequencing13,14 and sequencing single cells15,16. Together, these have begun to reveal differing mutation burdens between different cell types, their patterns of acquisition over time and the signatures of the mutational processes generating them. They have also shown that, in normal tissues, clones of normal cells with “driver” mutations in cancer genes are present. In the glandular epithelium of the colon these are relatively uncommon12 but, in the squamous epithelia of the skin9 and oesophagus10 and other tissues, such as the blood17-21, clones carrying drivers can constitute substantial proportions of normal cells present after middle age.
The factors determining differences in mutation landscape between normal cell types are incompletely understood. However, they plausibly include the intrinsic structural and physiological features of each tissue. Endometrium is a uniquely dynamic tissue composed of a stromal cell layer invaginated by a contiguous glandular epithelial sheet covering the luminal surface. It adopts multiple different physiological states during life including premenarche, menstrual cycling, pregnancy, and postmenopause. During reproductive years it undergoes cyclical breakdown, shedding, repair and remodelling in response to oscillating levels of oestrogen and progesterone which entail iterative restoration of the contiguity of the interrupted glandular epithelial sheet that is effected by stem cells within basal glands retained after menstruation22-25.
Characterisation of the mutational landscapes of normal tissues is beginning to provide comprehensive understanding of the succession of intermediate neoplastic stages between normal cells and cancers originating from them. There are two major histological classes of endometrial carcinoma26,27. Type I, endometrioid carcinoma, is commoner with the main known risk factor being extent of oestrogen exposure, influenced by early menarche, late menopause and body mass index (BMI)27,28. Type II, including serous and clear cell carcinomas, occurs in older women with smoking, age and elevated BMI as recognised risk factors29. Commonly mutated cancer genes include PTEN, TP53, PIK3CA, KRAS, ARID1A, FBXW7 and PIK3R130 and subsets of endometrial cancer carry large numbers of base substitution and/or small insertion and deletion (indel) mutations due to defective DNA mismatch repair, polymerase epsilon/delta mutations, or large numbers of copy number changes and genome rearrangements26,31.
Recent studies using exome and targeted sequencing have revealed the presence of driver mutations in known cancer genes in a high proportion of endometrial glands in endometriosis11,32,33, and also in eutopic normal endometrial epithelium11. Here, by whole genome sequencing, we have further characterised the mutational landscape of normal endometrial epithelium, explored how it is influenced by age, BMI and parity, estimated the age of driver mutations and the relationship of clonal evolution to glandular architecture.
Results
Samples and sequencing
Using laser capture microdissection (LCM), 215 histologically normal endometrial glands were isolated from 18 women aged 19 to 81 years. The samples were from biopsies taken for infertility assessments (6), hysterectomies for benign non-endometrial pathologies (2), residual tissues from transplant organ donors (6) and autopsies after death from non-gynaecological causes (4). DNA from each gland was whole genome sequenced using a library-making protocol modified to accommodate small amounts of input DNA12. The mean sequencing coverage was 28-fold and only samples with >15-fold coverage were included in subsequent analyses (n=182, Supplementary Table 1, Supplementary Results 1). Somatic mutations in each gland were determined by comparison with whole genome sequences from pieces of uterus, cervix or Fallopian tube from the same individuals. From each of 18 glands two separate samples were obtained and subjected to independent DNA extraction, library preparation and whole genome sequencing. Using these biological “near-replicates” the mean sensitivity of somatic mutation variant calling was estimated at >86% (range 70% to 95%) (Methods).
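The Methods are not reproduced in this section, so the following is only a minimal sketch of one way near-replicates can yield a sensitivity estimate. It assumes that a mutation called in one replicate is truly present in the gland, so that the fraction of those mutations recovered in the other replicate approximates the sensitivity of the second call set. The function name and the example mutation identifiers are hypothetical.

```python
def replicate_sensitivity(calls_a, calls_b):
    """Estimate calling sensitivity of each replicate from the other's calls.

    Assumption (not from the paper): every call in one replicate is a true
    mutation, so the share of them recovered in the other replicate
    approximates that replicate's sensitivity.
    """
    a, b = set(calls_a), set(calls_b)
    if not a or not b:
        raise ValueError("both replicates need at least one call")
    shared = len(a & b)
    sens_a = shared / len(b)  # share of B's calls also found in A
    sens_b = shared / len(a)  # share of A's calls also found in B
    return sens_a, sens_b


# Hypothetical mutation identifiers (chromosome:position:ref>alt)
rep_a = {"1:100:C>T", "2:200:G>A", "3:300:C>A", "4:400:T>C"}
rep_b = {"1:100:C>T", "2:200:G>A", "3:300:C>A", "5:500:A>G"}
sens_a, sens_b = replicate_sensitivity(rep_a, rep_b)
print(sens_a, sens_b)
```

Because the two samples of a gland may not contain exactly the same cell populations, an estimate of this kind is likely to be conservative.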
Clonality of endometrial glands
To assess whether endometrial glands are clonal populations derived from single recent ancestor cells the variant allele fractions (VAFs) of somatic mutations were examined. Most somatic mutations are heterozygous. Heterozygous mutations present in all cells of a population derived from a single ancestor will have VAFs of 0.5 whereas somatic mutations in cell populations derived from multiple ancestors will have lower VAFs or be undetectable by standard mutation calling approaches. 90% (163/182) of microdissected endometrial glands showed distributions of base substitution VAFs with peaks between 0.3 and 0.5 indicating that each consists predominantly of a cell population descended from a single epithelial progenitor stem cell with contamination by other cells potentially including endometrial stromal cells, inflammatory cells and epithelial cells from other glands (Fig. 1, Supplementary Results 1). Similar VAF distributions were observed for small insertions and deletions (indels). Subsequent analyses (see below) have demonstrated that many endometrial glands carry “driver” mutations in known cancer genes. However, endometrial glands exhibited clonality irrespective of the presence of driver mutations with, for example, somatic mutations in all 10 glands from a 19-year-old individual (PD37506) having a median VAF >0.3 but no driver mutations identified (Extended Data Fig. 1a, b). Thus, colonisation of endometrial glands by descendants of single endometrial epithelial stem cells is not contingent on growth selective advantage provided by driver mutations and may occur by a process analogous to genetic drift, as proposed for other tissues34,35.
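The relationship between VAF and clonality used above can be sketched as follows. For a heterozygous mutation at a diploid locus with no copy number change, the VAF is half the fraction of sampled cells that carry it, so a VAF peak of 0.3 corresponds to roughly 60% of cells and a peak of 0.5 to a fully clonal sample. The function names and the example VAF values are hypothetical.

```python
from statistics import median


def clonal_fraction(vaf, copies_mutated=1, ploidy=2):
    """Fraction of cells carrying a mutation, given its VAF.

    Assumes a locus with `ploidy` copies per cell, of which `copies_mutated`
    carry the variant in mutant cells, and no copy number change.
    """
    return min(1.0, vaf * ploidy / copies_mutated)


def gland_summary(vafs):
    """Return the median VAF of a gland and the implied clonal fraction."""
    m = median(vafs)
    return m, clonal_fraction(m)


# Hypothetical VAFs for one microdissected gland
example_vafs = [0.28, 0.30, 0.31, 0.32, 0.35]
median_vaf, fraction = gland_summary(example_vafs)
print(median_vaf, fraction)
```

Under these assumptions the remaining cells in the sample would be contaminating stromal, inflammatory or neighbouring epithelial cells.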
Mutation burdens
The somatic mutation burdens in normal endometrial glands from the 18 individuals ranged from 225 to 2890 base substitutions (mean 1324) and 3 to 243 indels (mean 85) (Fig. 2a, b). In large part this variation was attributable to the ages of the individuals with a linear accumulation of ∼28 base substitutions per gland per year during adult life (linear mixed-effect model, SE = 3.1, P = 1.061e-07) (Supplementary Results 2). However, the possibilities of lower mutation rates premenarche and postmenopause cannot be excluded. The potential influences of BMI, a known risk factor for endometrial cancer, and the presence of driver mutations on mutation burden were also examined. An additional 20 substitutions were acquired with each unit of BMI (SE = 8, P = 2.330e-02). Therefore, the association between elevated BMI and increased endometrial cancer risk may, at least partially, be mediated by this additional mutation burden induced by BMI in normal endometrial epithelial stem cells. Positive driver mutation status conferred an addition of ∼177 substitutions (SE = 45.7, P = 1.632e-04). The basis of this correlation is unclear. It is conceivable that an elevated total mutation load increases the chances of including, by chance, a driver. It is also plausible, however, that drivers engender biological changes, for example elevated cell division rates, that result in higher overall mutation loads. There was no obvious correlation between parity and mutation burden.
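As a rough worked example, the reported fixed-effect coefficients can be combined to give the expected difference in substitution burden between two glands. This sketch assumes the effects are additive with no interactions, and it ignores the model intercept and the random effects, neither of which is given in this section. The function name is hypothetical.

```python
# Approximate coefficients reported in the text (substitutions per gland)
PER_YEAR = 28          # per year of age
PER_BMI_UNIT = 20      # per unit of BMI
DRIVER_POSITIVE = 177  # driver-positive versus driver-negative gland


def expected_burden_difference(delta_age_years, delta_bmi=0, driver_gained=False):
    """Expected extra substitutions for a gland that differs from a reference
    gland by the given age, BMI and driver status (additive sketch only)."""
    return (PER_YEAR * delta_age_years
            + PER_BMI_UNIT * delta_bmi
            + (DRIVER_POSITIVE if driver_gained else 0))


# A gland from a woman 10 years older, 5 BMI units heavier, carrying a driver
print(expected_burden_difference(10, 5, True))
```

The standard errors quoted above indicate that any such point estimate carries considerable uncertainty.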
In addition to endometrial glands, nearby normal endocervical glands were microdissected from one individual (PD37506). There was a ∼2-fold lower somatic mutation burden in endocervical than endometrial glands (Extended Data Fig. 3). This may reflect the absence, in endocervical glands, of the cyclical process of loss and regeneration that occurs in endometrial glands.
Mutational signatures
To explore the underlying processes of somatic mutagenesis operative in normal endometrial epithelial cells mutational signatures were analysed. Three previously described single base substitution (SBS) mutational signatures were identified in all endometrial glands (Extended Data Fig. 2): SBS1, predominantly characterised by NCG>NTG mutations and likely due to spontaneous deamination of 5-methylcytosine; SBS5, a relatively featureless, ‘flat’ signature of uncertain cause; SBS18, predominantly characterised by C>A substitutions and possibly due to reactive oxygen species36. Overall, the mean signature exposures per gland were 0.22 for SBS1, 0.59 for SBS5 and 0.17 for SBS18; interestingly, glands from one donor with a history of recurrent missed miscarriage (RMM) showed much higher mean SBS18 exposure (0.35) compared to the rest of the cohort. There were approximately 2.7-fold more SBS5 than SBS1 mutations (SD 0.42). A positive linear correlation with age for the mutation burden attributable to all three signatures was observed (Fig. 2d, e, f). To ascertain the periods during which different mutational processes operate, phylogenetic trees of endometrial glands were constructed for each individual using somatic mutations (Figs. 3, 4). These revealed that the mutational processes underlying the three signatures are active throughout life. With respect to small indels, composite mutational spectra for each donor were generated and were similar across ages; however, due to the relative sparsity of indels in normal endometrial glands, formal signature extraction was not performed (Extended Data Fig. 3).
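To illustrate how signature exposures translate into mutation counts, the sketch below splits a gland's substitution burden according to the mean exposures quoted above. It is an approximation only: the reported exposures sum to 0.98 because of rounding, and the ratio of mean exposures is not necessarily identical to the mean of per-gland ratios. The function name is hypothetical.

```python
# Mean signature exposures per gland, as reported in the text
MEAN_EXPOSURE = {"SBS1": 0.22, "SBS5": 0.59, "SBS18": 0.17}


def attribute_mutations(total_substitutions, exposures=MEAN_EXPOSURE):
    """Split a substitution burden across signatures in proportion to exposure."""
    return {sig: total_substitutions * frac for sig, frac in exposures.items()}


# Ratio of SBS5 to SBS1 exposure, rounding to about 2.7
sbs5_to_sbs1 = MEAN_EXPOSURE["SBS5"] / MEAN_EXPOSURE["SBS1"]

# Applied to the mean burden of 1324 substitutions per gland
per_signature = attribute_mutations(1324)
print(round(sbs5_to_sbs1, 1), per_signature)
```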
Somatic copy number changes and structural variants (genome rearrangements) were found in only 27 out of 182 (15%) normal endometrial glands (Fig. 2c, Supplementary Results 3). These included copy number neutral loss of heterozygosity (cnn LOH) in six glands, whole chromosome copy number increases in one and structural variants in eighteen (12 large deletions, six tandem duplications and nine translocations). The majority of glands showed a single change. However, one of two glands carrying a TP53 mutation (see below) exhibited nine structural variants, indicating that genomic instability caused by defective DNA maintenance occurs in normal cells.
Driver mutations
To identify genes under positive selection a statistical method based on the observed:expected ratios of non-synonymous:synonymous mutations was used30. Eleven genes showed evidence of positive selection in the 182 normal endometrial glands: PIK3CA, PIK3R1, ARHGAP35, FBXW7, ZFHX3, FOXA2, ERBB2, CHD4, KRAS, SPOP and ERBB3 (Supplementary Results 4). All were present in a set of 369 genes previously shown to be under positive selection in human cancer30. In addition, four different truncating mutations (and no other mutations) were observed in the progesterone receptor gene (PGR). Although these did not attain standard significance levels the biological role that progesterone plays in normal endometrium as an antagonist of oestrogen driven proliferation raises the possibility that these inactivating mutations confer growth advantage. To comprehensively identify drivers in the 182 endometrial glands, mutations with the characteristics of drivers in each of the 369 genes were sought (Methods).
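The idea behind an observed:expected ratio of non-synonymous to synonymous mutations can be sketched in a few lines. This is a deliberately simplified illustration, not the cited method, which additionally has to account for factors such as sequence context and gene composition and to assess statistical significance. The function name and all counts are hypothetical.

```python
def dnds_ratio(obs_nonsyn, obs_syn, exp_nonsyn, exp_syn):
    """Observed non-synonymous:synonymous ratio divided by the ratio expected
    under neutrality. Values above 1 suggest positive selection."""
    if obs_syn == 0 or exp_syn == 0 or exp_nonsyn == 0:
        raise ValueError("synonymous counts and expected counts must be non-zero")
    observed = obs_nonsyn / obs_syn
    expected = exp_nonsyn / exp_syn
    return observed / expected


# Hypothetical gene: 12 non-synonymous and 2 synonymous mutations observed,
# with 3 non-synonymous expected per synonymous mutation under neutrality
example_ratio = dnds_ratio(12, 2, 3, 1)
print(example_ratio)
```

A ratio of this kind is only interpretable alongside a significance test, since genes with few mutations can show large ratios by chance.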
163 driver mutations were found in normal endometrial glands from 17/18 women (Supplementary Results 5). The youngest carrier was a 24 year old (PD40535) with a KRAS G12D mutation in 1/7 glands sampled. 58% (105/182) of endometrial glands carried at least one driver mutation, 19% (35/182) carried at least two and 3% (5/182) carried at least four drivers. Remarkably, in four women, aged 34 (19 glands), 44 (11 glands), 60 (14 glands) and 81 (5 glands), all glands analysed carried driver mutations suggesting that the whole endometrium had been colonised by microneoplastic clones (Figs. 3, 4). The fraction of endometrial glands carrying a driver (Fig. 2g), the average number of drivers per gland (Fig. 2h) and the number of different drivers in each individual (corrected for number of glands sampled) (Fig. 2i) all positively correlated with age of the individual. However, there were sufficient outliers from this age correlation to suggest that other factors influence colonisation of the endometrium by driver carrying clones. Using a generalised linear mixed effect model, we found that age has a positive association with the accumulation of driver mutations (coefficient = 0.0336, SE = 0.0131), while parity has a negative association (coefficient = -0.330, SE = 0.117) (Supplementary Results 6 and 7).
Driver mutations in both recessive (tumour suppressor genes) and dominant cancer genes were found. PIK3CA was the most frequently mutated cancer gene, with at least one missense mutation in 61% (11/18) of women and five different mutations found in two women (Figs. 3 and 4, Extended Data Fig. 4). Most truncating driver mutations in recessive cancer genes (including in ZFHX3, ARHGAP35 and FOXA2 which showed evidence of selection in normal endometrial glands, see above) were heterozygous without evidence of a mutation inactivating the second, wild type allele. Therefore, haploinsufficiency of these genes appears sufficient to confer growth advantage in normal cells. Nevertheless, further inactivating mutations, including copy number neutral LOH of the wild type allele and truncating mutations, in the same genes in other glands indicate that additional advantage is conferred by complete abolition of their activity (notably for ZFHX3 in the 60 year old, Fig. 3 and Extended Data Fig. 5). Driver mutations were found in genes encoding growth factor receptors (ERBB2, ERBB3, FGFR2), components of signal transduction pathways (HRAS, KRAS, BRAF, PIK3CA, PIK3R1, ARHGAP35, RRAS2, NF1, PPP2R1A, PTEN), pathways mediating steroid hormone responses (ZFHX3, FOXA2, ARHGAP35), pathways mediating WNT signalling (FBXW7) and proteins involved in chromatin function (KMT2D, ARID5B). Many different combinations of mutated cancer genes were found in individual glands.
Driver mutations were placed on the phylogenetic trees of somatic mutations constructed for each individual and, by assuming a constant somatic mutation rate during life, the time of occurrence of a subset was estimated (Methods). Some driver mutations occurred early in life. These included a KRAS G12D mutation in three glands from a 35 year old and a PIK3CA mutation in two glands from a 34 year old, which are both likely to have arisen during the first decade. A pair of drivers in ZFHX3 and PIK3CA, co-occurring in six glands from a 60 year old, was also acquired during the first decade indicating that driver associated clonal evolution begins early in life. There was evidence, however, for continuing acquisition and clonal expansion of driver mutations into the third and fourth decades and further accumulation beyond this period is not excluded.
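The timing argument can be sketched as a simple proportion. Assuming a constant mutation rate from birth, the age by which a branch of the phylogenetic tree had ended is the donor's age scaled by the share of root-to-tip mutations accumulated up to that point; a driver lying on that branch must have arisen no later than this. The sketch ignores mutations acquired before birth and any correction for calling sensitivity, and the Methods may differ. The function name and counts are hypothetical.

```python
def estimated_age_at_branch_end(mutations_to_branch_end, mutations_root_to_tip,
                                donor_age_years):
    """Latest age (years) at which mutations on a branch were acquired,
    assuming a constant somatic mutation rate throughout life."""
    if mutations_root_to_tip <= 0:
        raise ValueError("root-to-tip mutation count must be positive")
    if not 0 <= mutations_to_branch_end <= mutations_root_to_tip:
        raise ValueError("branch count must lie between 0 and the root-to-tip count")
    return donor_age_years * mutations_to_branch_end / mutations_root_to_tip


# Hypothetical example: a driver on a branch ending after 200 of the 1200
# mutations found in a gland from a 60 year old
example_age = estimated_age_at_branch_end(200, 1200, 60)
print(example_age)
```

If mutation rates are in fact lower before menarche, as the text notes cannot be excluded, estimates of this kind would place early events too early in life.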
Phylogeography of mutations within the endometrium
Phylogenetically closely related glands were often in close physical proximity within the endometrium (Fig. 3). In phylogenetic clusters for which the mutation catalogues were almost identical, this may simply reflect multiple sampling of a single tortuous gland weaving in and out of the plane of section, rather than distinct glands with their own stem cell populations (e.g. glands C5 and E5, Figs. 3a, c). For other phylogenetic clusters, the different branches within the clade have diverged substantially, sometimes acquiring different driver mutations, and therefore are likely derived from different stem cell populations. In such instances phylogenetically related glands can range over distances of multiple millimetres suggesting that their clonal evolution has entailed capture and colonisation of extensive zones of endometrium (e.g. glands C1, A2, B1, H2, A3, B3, Figs. 3b, d). Conversely, many glands in close physical proximity are phylogenetically distant (e.g. glands E1 and G2, Figs 3a, c), indicating that the cell populations have remained isolated from each other.
Normal endometrium compared to other cells
Endometrial cells exhibit lower mutation rates than normal skin epidermal9, colorectal4,12, small intestinal4,12 and liver cells4, similar burdens to oesophageal cells10 and higher rates than skeletal muscle cells7 (Extended Data Fig. 7). Of the mutational signatures found in endometrial cells, SBS1 and SBS5 are found in all other cell types37. However, the SBS1 mutation rate is higher in colorectal and small intestinal epithelial cells whereas the SBS5 mutation rate is higher in liver cells4. SBS18 has also been found ubiquitously in colonic crypts12.
The prevalence of driver mutations is substantially different in different normal cell types. In colon, like the endometrium a tissue with glandular architecture, only ∼1% crypts (glands) in 60 year old individuals carry a driver mutation12 compared to the much higher fractions (up to 100%) in the endometrial glands of 60 year old women. The biological basis of this difference is unclear but is unlikely to be the difference in total mutation burden, which is lower in the endometrium than the colon.
Endometrial cancers exhibit higher mutation loads than normal endometrial cells, both for base substitutions (∼5-fold; medians of 1346 and 7330 substitutions in normal endometrium and endometrial cancer respectively; Mann-Whitney test, P = 7.629e-06; Fig. 5a) and for indels (Fig. 5b), and these differences also pertain to normal endometrial cells with driver mutations. In most endometrial cancers these differences are attributable to higher mutation burdens of the ubiquitous base substitution and indel mutational signatures. In addition, however, the very high mutation loads of the subsets of endometrial cancers with DNA mismatch repair deficiency and polymerase epsilon/delta mutations were not seen in normal endometrial cells. Differences between endometrial cancers and normal cells were even more marked for structural variants and copy number changes (median number zero in normal endometrial cells and ∼23 in endometrial cancers38) and this again pertained to normal endometrial cells with drivers.
There were also differences in the repertoire of cancer genes in which driver mutations were found (Fig. 5d, e, f, Supplementary Results 4 and 8). Notably, mutations in PTEN, CTCF, CTNNB1 and ARID1A in endometrioid and in TP53 in serous endometrial cancer accounted for higher proportions of driver mutations than in normal endometrial cells. It is possible that PTEN, ARID1A, TP53 and CTCF require biallelic mutation to confer growth advantage and this may account for their lower prevalence in normal cells. However, heterozygous mutations in PTEN and TP53 were found, albeit rarely and restricted to the two oldest individuals studied (69 and 81 years old), and this explanation would not account for the relative deficit of CTNNB1 mutations. Overall, the results suggest that driver mutations in some cancer genes may be relatively effective at enabling stem cell colonisation of normal tissues but confer limited risk of conversion to invasive cancers. Conversely, others may require biallelic mutation and/or confer limited advantage in colonising normal tissues but are relatively effective at conversion to malignancy.
Discussion
This study of normal endometrial epithelium, together with recent studies of other normal cell types4,5,9-12,17,18, is revealing the landscape of somatic mutations in normal human cells. The landscape is characterised by different somatic mutation rates in different cell types that, for the most part, are generated by a limited repertoire of ubiquitous mutational processes generating base substitutions, small indels, genome rearrangements and whole chromosome copy number changes. These processes exhibit more or less constant mutation rates during the course of a lifetime resulting in essentially linear accumulation of mutations with age. However, the influences of BMI and the presence of driver mutations on mutation burden in endometrial epithelium indicate that additional factors can modulate their mutation rates. The reasons for the different mutation rates of ubiquitous signatures in different tissues are unclear. For SBS1, which is likely due to deamination of 5-methylcytosine, the differences may be related to the number of mitoses a cell has experienced. Additional mutational signatures which are present only in some cells, only in some cell types and/or are intermittent also operate in normal cells, supplementing the mutation load contributed by ubiquitous signatures. These additional signatures include those due to exposures such as ultraviolet light in skin9, APOBEC mutagenesis in occasional colon crypts and other signatures of unknown cause in normal colon epithelium12.
A small subset of mutations generated by these mutational processes have the properties of driver mutations. The total somatic mutation rate is lower in endometrial than colonic epithelial stem cells and thus the rate of generation of driver mutations is also likely to be lower. However, numerous cell clones with different driver mutations, some carrying multiple drivers, colonise much of, and in some cases potentially all of, the normal endometrial epithelium in most women. This is in marked contrast to the colon where just 1% of normal crypts in middle-aged individuals carry a driver12. This dramatic difference may be due to intrinsic differences in physiology between endometrium and colon. In the endometrium, the cyclical process of tissue breakdown, shedding and remodelling iteratively opens up denuded terrains for pioneering clones of endometrial epithelial cells with drivers to preferentially colonise compared to wild type cells. By contrast, in the colon the selective advantage of a clone with a driver is usually confined to the small siloed population of a single crypt, with only occasional opportunities for further expansion. Thus, the endometrium in some respects resembles more the squamous epithelia of skin and oesophagus in which cell clones derived from basal cells directly compete against each other for occupancy of the squamous sheet and in which substantial proportions of such sheets become colonised over a lifetime by normal cell clones carrying driver mutations9,39. Although this rampant colonisation by driver clones in endometrium progresses with age, it is already well advanced in some young women, and parity apparently has an inhibitory effect on it, indicating that multiple factors influence its progression. 
More extensive studies of the mutational landscape in normal endometrium are required to better assess how pregnancy, the premenarchical and postmenopausal states, hormonal contraceptive use and hormone replacement therapies influence it and also the potential impact it has on pregnancy and fertility.
The burdens of all mutation classes are lower in normal endometrial cells, including those with drivers, than in endometrial cancers. However, these differences are most marked for structural variants/copy number changes and for the extreme base substitution/indel hypermutator phenotypes due to DNA mismatch repair deficiency and polymerase delta/epsilon mutations which were not found in normal endometrium. The results therefore suggest that in endometrial epithelium, and in other tissues thus far studied including colon, oesophagus and skin, normal mutation rates are sufficient to generate large numbers of microneoplastic clones with driver mutations behaving as normal cells, but that acquisition of an elevated mutation rate and burden is associated with further evolution to invasive cancer. Given that the endometrial epithelium is extensively colonised by clones of normal cells with driver mutations in middle-aged and older women and that the lifetime risk of endometrial cancer is only 3%40, this conversion from microneoplasm to symptomatic malignancy appears to be extremely rare. Driver mutations in normal endometrium often appear to arise and initiate clonal expansion early in life. It is therefore plausible that some neoplastic clones ultimately manifesting as cancer were initiated during childhood, although the fraction to which this might apply is unclear.
This study has added endometrial epithelial cells to the set of normal cell types in which the landscape of somatic mutations has been characterised. However, most normal tissues have not been investigated in this way. The outcomes of the current studies, showing differences in mutation burdens, mutational signatures and prevalence of driver mutations, mandate a systematic characterisation of the somatic mutation landscape in all normal human cell types.
AUTHOR CONTRIBUTIONS
MRS and LM designed the study and wrote the manuscript with contributions from all authors. KSP, CAID, JJB, KM, MJL and LM obtained samples. PE and LM devised the protocol for laser-capture microscopy, DNA extraction and sequencing of endometrial glands. LM prepared sections, reviewed histology, micro-dissected and lysed endometrial glands. YH assisted with tissue processing and section preparation. LM performed data curation and analysis with help from DL, THHC, MAS, KD, JN, PST, SFB, HLS, and RR. THHC reconstructed phylogenetic trees. MAS devised filters for substitutions and structural variants. DL, FM and SM assisted with signature analyses. IM assisted with statistical and dN/dS analyses. PJC oversaw statistical analyses and performed analysis of structural variants. MRS supervised the study.
ACKNOWLEDGEMENTS
This work was supported by the Wellcome Trust. LM is a recipient of a CRUK Clinical PhD fellowship (C20/A20917) and Pathological Society of Great Britain and Ireland Trainee Small Grant (Grant Reference No 1175). SFB was supported by the Swiss National Science Foundation (P2SKP3-171753 and P400PB-180790). MAS is supported by a Rubicon fellowship from NWO (019.153LW.038).
We thank Laura O’Neil, Calli Latimer and Paul Scott for technical support; Feran Nadeu and Jingwei Wang for their advice on mutational signature extraction; Thomas J Mitchell, Nicola Roberts and Andrew R.J. Lawson for their assistance with data analysis. We are also grateful to the Cambridge Biorepository for Translational Medicine for the provision of samples from deceased transplant organ donors.