Summary
Genetic diversity plays a central role in tumor progression, metastasis, and resistance to treatment. Experiments are shedding light on this diversity at ever finer scales, but interpretation is challenging. Using recent progress in numerical models, we simulate macroscopic tumors to investigate the interplay between global growth dynamics, microscopic composition, and circulating tumor cell cluster diversity. We find that modest differences in growth parameters can profoundly change microscopic diversity. Simple outward expansion leads to spatially segregated clones, as expected, but modest cell turnover can result in mixing at the microscopic scale, consistent with experimental observations. We validate our models against multi-region sequencing data from a hepatocellular carcinoma patient and propose that deep multi-region sequencing is well-powered to distinguish between leading models of cancer evolution. The genetic composition of circulating tumor cell clusters, which can be obtained from non-invasive blood draws, is therefore informative about tumor evolution and its metastatic potential.
Highlights
Numerical and theoretical models show how front expansion, selection, and mixing interact to shape tumor heterogeneity.
Cell turnover increases intratumor heterogeneity.
Simulated circulating tumor cell clusters and microbiopsies exhibit substantial diversity.
Simulations suggest attainable sampling schemes able to distinguish between prevalent tumor growth models.
Introduction
Most cancer deaths are due to metastasis of the primary tumor, which complicates treatment and promotes relapse (Nguyen, Bos, and Massagué 2009; Eccles and Welch 2007; Holohan et al. 2013). Circulating tumor cells (CTCs) are bloodborne enablers of metastasis that can be isolated and genetically characterized (Massagué and Obenauf 2016; Aceto et al. 2014). Counts of single CTCs have been used to predict tumor progression (Cristofanilli, Budd, et al. 2004; Cristofanilli, Hayes, et al. 2005) and to monitor curative and palliative therapies in breast (Rack et al. 2014; Hayes et al. 2006) and lung cancers (Maheswaran et al. 2008). CTCs have also been isolated in clusters of 2–30 cells (Marrinucci et al. 2012). These CTC clusters, though rare, are associated with more aggressive metastatic cancer and poorer survival in mouse models and in patients with breast and prostate cancer (Aceto et al. 2014).
Cellular growth within tumors follows Darwinian evolution, with sequential accumulation of mutations and selection resulting in subclones of different fitness (Nowell 1976; Greaves and Maley 2012). Certain classes of mutations are known to give cancer cells advantages beyond local growth rate. For example, mutations in ANGPTL4 in breast tumors do not appear to provide a growth advantage to cells in the primary tumor; however, they enhance metastatic potential to the lungs (Padua et al. 2008). Similarly, breast tumors are more likely to metastasize to the lung or brain if they acquire mutations in TGFβ or ST6GALNAC5, respectively (Bos et al. 2009; Padua et al. 2008). These genes are referred to as metastasis progression genes or metastasis virulence genes (Nguyen, Bos, and Massagué 2009; Nguyen and Massagué 2007).
Mutations, including those in metastasis progression and virulence genes, are not uniformly distributed in the tumor. Tumors show substantial intratumor heterogeneity (ITH) (Navin et al. 2010; Sottoriva et al. 2015; McGranahan and Swanton 2015), in which subclones carry private mutations that can lead to subclonal phenotypes (Yates et al. 2015; J. Zhang et al. 2014; Gerlinger, Horswell, et al. 2014). A high degree of ITH allows tumors to explore a wide range of phenotypes, which may give rise to a few cancer cells with a metastatic phenotype early in tumor growth. Additionally, ITH can contribute to therapy resistance and relapse (Hiley et al. 2014; Holohan et al. 2013). Studying ITH is therefore important for understanding cancer progression and improving therapeutic and prognostic decisions (Hiley et al. 2014; Jamal-Hanjani, Hackshaw, et al. 2014; Alizadeh et al. 2015). To capture the complete mutational spectrum of a primary tumor, several study designs, collectively known as multiregion sequencing, divide the tumor into regionally representative samples (Gerlinger, Rowan, et al. 2012; Gerlinger, Horswell, et al. 2014; J. Zhang et al. 2014; Yates et al. 2015).
Next-generation sequencing (NGS) of single CTCs has shown that they have similar genetic composition to both the primary and metastatic lesions (Heitzer et al. 2013), and can therefore be used as a non-invasive liquid biopsy to study tumors and tumor heterogeneity, monitor response to therapy, and determine patient-specific course of treatment (Powell et al. 2012; Heitzer et al. 2013; Krebs et al. 2014; Hodgkinson et al. 2014).
Here we ask whether genetic heterogeneity within individual circulating tumor cell clusters can be informative about solid tumor progression. Because CTC clusters are thought to originate from neighboring cells in the tumor (Aceto et al. 2014), heterogeneity within CTC clusters is closely related to cellular-scale genetic heterogeneity within tumors. We therefore interpret our simulation results as informative about both micro-biopsies and circulating tumor cell clusters.
We used an extension of the simulator of Waclaw et al. (Waclaw et al. 2015) to study the interplay of tumor dynamics, CTC cluster diversity, and metastatic outlook. We show that fine-scale tumor heterogeneity, and therefore CTC cluster composition, depends sensitively on the tumor growth dynamics and sampling location. Simulated data are consistent with recent sequencing experiments, but slightly finer sampling would provide stringent tests that distinguish between state-of-the-art models. These findings further reinforce the utility of fine-scale tumor profiling and CTC clusters as clinical tools for characterizing tumors and clinical outlook (Mateo et al. 2014; Ignatiadis, Lee, and Jeffrey 2015).
Simulation Model
To simulate the growth of solid tumors, we use TumorSimulator (Waclaw et al. 2015). The software can simulate a tumor containing $10^8$–$10^9$ cells, or roughly 2 cubic centimeters, in 24 core-hours. The tumor consists of cells that occupy points in a 3D lattice. Cells do not move in this model: The tumor evolves through cell division and death. Empty lattice sites are assumed to contain normal cells, which are not modelled in TumorSimulator.
Each cell carries a list of genetic alterations representing single nucleotide polymorphisms (SNPs), each of which is either a passenger or a driver. Driver mutations increase the growth rate by a factor $1 + s$, where $s \ge 0$ is the average selective advantage of a driver mutation.
The simulation begins with a single cell that already has unlimited growth potential. Tumor growth then proceeds by repeatedly selecting a mother cell at random, which divides with probability $b_0(1 + s)^k$, where $b_0$ is the initial birth rate and $k$ is the number of driver mutations it carries. New cells receive new passenger and driver mutations drawn from two independent Poisson distributions parameterized by two mutation rates. The mother cell dies with a probability proportional to the death rate, $d$. Further details of the algorithm are described in Supplemental Methods. Values of $b_0$ and $s$ are selected as in Waclaw et al. 2015. The mutation rates are selected either as in Waclaw et al. 2015, to facilitate comparisons between simulations, or as in Ling et al. 2015, to match empirical observations.
We consider three turnover scenarios corresponding to three models for the death rate $d$: (i) no turnover ($d = 0$), corresponding to simple clonal growth (Hallatschek et al. 2007); (ii) surface turnover ($d(x, y, z) > 0$ only if $(x, y, z)$ is on the surface), corresponding to a quiescent core model (Shweiki et al. 1995); and (iii) turnover ($d > 0$ everywhere), a model favored in Waclaw et al. 2015 to explore ITH.
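The growth rules above can be illustrated with a toy lattice model. This is a minimal sketch, not TumorSimulator's code: the six-neighbour cubic lattice, capping the division probability at 1, using $d$ directly as a per-step death probability, treating a cell as "on the surface" when it has an empty neighbour, giving new mutations to both daughters, restarting after extinction, and all default parameter values are assumptions made for illustration.

```python
import numpy as np

# Assumption: six nearest neighbours on a cubic lattice.
OFFSETS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def neighbours(site):
    x, y, z = site
    return [(x + dx, y + dy, z + dz) for dx, dy, dz in OFFSETS]


def simulate(target_size, b0=0.69, s=0.1, d=0.0, turnover="none",
             mu_passenger=1.0, mu_driver=0.01, seed=0, max_steps=10**7):
    """Toy lattice birth-death growth (hypothetical parameters).

    Returns a dict: lattice site -> (number of drivers k, frozenset of mutation ids).
    turnover is "none", "surface" or "full".
    """
    rng = np.random.default_rng(seed)
    sites, index, genome = [], {}, {}
    next_id = 0

    def add(site, g):
        index[site] = len(sites)
        sites.append(site)
        genome[site] = g

    def remove(site):
        # O(1) removal: move the last site into the freed slot.
        i = index.pop(site)
        last = sites.pop()
        if last != site:
            sites[i] = last
            index[last] = i
        del genome[site]

    def mutate(k, muts):
        # New passengers and drivers from two independent Poisson draws.
        nonlocal next_id
        n_pass = int(rng.poisson(mu_passenger))
        n_drv = int(rng.poisson(mu_driver))
        ids = frozenset(range(next_id, next_id + n_pass + n_drv))
        next_id += n_pass + n_drv
        return k + n_drv, muts | ids

    add((0, 0, 0), (0, frozenset()))
    for _ in range(max_steps):
        if len(sites) >= target_size:
            break
        mother = sites[rng.integers(len(sites))]
        k, muts = genome[mother]
        empty = [n for n in neighbours(mother) if n not in genome]
        # Division with probability b0 (1 + s)^k (capped at 1) into a free neighbour;
        # both daughters (new cell and mother) receive new mutations.
        if empty and rng.random() < min(1.0, b0 * (1 + s) ** k):
            add(empty[rng.integers(len(empty))], mutate(k, muts))
            genome[mother] = mutate(k, muts)
        # Death probability under the three turnover scenarios.
        on_surface = any(n not in genome for n in neighbours(mother))
        d_here = {"none": 0.0, "surface": d if on_surface else 0.0, "full": d}[turnover]
        if rng.random() < d_here:
            remove(mother)
        if not sites:  # extinction: restart from one fresh cell (assumption)
            add((0, 0, 0), (0, frozenset()))
    return genome
```

For example, `simulate(300, d=0.2, turnover="full")` grows a 300-cell toy tumor in which interior deaths open vacancies that neighbouring cells can refill, which is the mechanism that lets cells from nearby clones mix.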
Results
Global composition
To determine the effect of the growth dynamics on global intratumor heterogeneity, we first consider the allele frequency spectra for different turnover models (Fig 1, S1). In all cases, a majority of driver and passenger genetic variants are at frequency less than 1%, as expected from theoretical and empirical observations (Wang et al. 2014). Passenger mutations represent the bulk of ITH, consistent with the theoretical and experimental evidence that neutral evolution drives most ITH (Williams et al. 2016). For simulations with low to moderate death rate, d ∈ {0.05, 0.1, 0.2}, we find that the frequency spectra are very similar across the three turnover models (Fig 1, S1): A low death rate has little impact on the global composition of a tumor.
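Computing an allele frequency spectrum from simulated cells only requires counting, for each mutation, the fraction of cells that carry it. A minimal sketch, assuming each cell's mutations are stored as a set of integer ids (a hypothetical input format, not TumorSimulator's output format); `allele_frequencies` and `fraction_below` are illustrative helper names.

```python
from collections import Counter

import numpy as np


def allele_frequencies(genomes):
    """Map each mutation id to the fraction of cells carrying it.

    `genomes` is an iterable of sets of mutation ids, one set per cell.
    """
    genomes = list(genomes)
    counts = Counter(m for g in genomes for m in g)
    n_cells = len(genomes)
    return {m: c / n_cells for m, c in counts.items()}


def fraction_below(freqs, cutoff=0.01):
    """Share of variants whose cell frequency is below `cutoff` (default 1%)."""
    values = np.fromiter(freqs.values(), dtype=float)
    return float(np.mean(values < cutoff))
```

With four cells `[{1, 2}, {1, 3}, {1}, {1, 4}]`, mutation 1 has frequency 1.0 and the three private mutations have frequency 0.25, so three quarters of the variants fall below a 50% cutoff.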
When the death rate is increased to d = 0.65, as in Waclaw et al. 2015, the different models produce distinct frequency spectra (Fig 1b). As in Waclaw et al. 2015, we find that the number of high-frequency drivers is higher in the turnover model than in the no turnover model. Whereas Waclaw et al. interpreted this observation as an indication that turnover reduces diversity, we find that diversity is in fact increased for all types of variants and at all frequencies. The number of somatic mutations in the turnover model is 3.4 times higher than in the surface turnover model and 6.2 times higher than in the no turnover model. This is primarily due to the higher number of cell divisions required to reach a given tumor size when cell death occurs throughout the tumor (Table S1). The death rate d = 0.65 used by Waclaw et al. is a staggering 95% of the birth rate. As a result, the turnover model requires 8.3 times more cell divisions than the no turnover model to reach a given size, and the surface turnover model requires 4 times more (Table S1).
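As a rough, non-spatial reference point for the divisions argument above, consider a well-mixed birth–death process with per-cell birth rate b and death rate d. Each division is followed on average by d/b deaths, so growing from 1 to N cells takes about $(N-1)\,b/(b-d)$ divisions; with d at 95% of b, that is about 20 times more divisions than without death. The lattice models in Table S1 report smaller factors (8.3× and 4×), plausibly because division and death are spatially constrained there, so this sketch only illustrates the direction of the effect. The helper names are hypothetical, and restarting after extinction is an assumption.

```python
import numpy as np


def expected_divisions(n_target, b, d):
    """Mean-field number of divisions to grow from 1 to n_target cells (requires b > d)."""
    return (n_target - 1) * b / (b - d)


def simulated_divisions(n_target, b, d, seed=0):
    """Count divisions in a well-mixed birth-death jump chain.

    Each event is a division with probability b / (b + d), otherwise a death.
    After extinction we restart from one cell (an assumption), so divisions
    in failed lineages are still counted.
    """
    rng = np.random.default_rng(seed)
    divisions, n = 0, 1
    while n < n_target:
        if rng.random() < b / (b + d):
            n += 1
            divisions += 1
        else:
            n -= 1
            if n == 0:
                n = 1
    return divisions
```

Without death every division adds one cell, so `simulated_divisions(500, 1.0, 0.0)` returns exactly 499; with `d = 0.5 * b` the count roughly doubles, in line with `expected_divisions`.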
We find a large excess of rare variants compared to most previous analytical models of tumor evolution. The Wright-Fisher model for a constant-sized population (the "standard neutral model") predicts that the distribution ϕ(f) of mutations with frequency f decays as $f^{-1}$. Recently published tumor models that account for exponential population growth in a coalescent or branching process framework (Ohtsuki and Innan 2017) predict $\phi(f) \sim f^{-1}$ to $\phi(f) \sim f^{-2}$, depending on model parameters.
Here we observe that, for variants above 1% in frequency, $\phi(f) \simeq f^{-2.5}$. The tumor model studied here departs from these previous models in three ways: the rate of population growth, the presence of selection, and differential growth in the core and edge of the tumor. Selection itself has a weak effect on the scaling behavior (Fig S2), and the different turnover models exhibit similar scaling, suggesting that the overall growth rate is not the culprit. We find that differential growth across the tumor explains most of the discrepancy. In fact, a simple deterministic and neutral geometric model with differential growth accurately predicts the observed decay $\phi(f) \sim f^{-2.5}$ (Figs 1 and S2).
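Exponents such as these can be estimated from a list of variant frequencies by logarithmic binning and a log-log linear fit. This is a generic sketch, not necessarily the fitting procedure used for Fig 1; the bin count and the 1% lower cutoff are illustrative choices. A synthetic check draws frequencies from a density proportional to $f^{-2.5}$ on $[0.01, 1]$ by inverse-transform sampling.

```python
import numpy as np


def fit_power_law_exponent(freqs, f_min=0.01, f_max=1.0, n_bins=20):
    """Estimate gamma in phi(f) ~ f^-gamma from variant frequencies.

    Frequencies are binned on a log scale; the density in each bin
    (count / bin width) is regressed on the geometric bin centre in log-log space.
    """
    freqs = np.asarray(freqs, dtype=float)
    edges = np.logspace(np.log10(f_min), np.log10(f_max), n_bins + 1)
    counts, _ = np.histogram(freqs, bins=edges)
    density = counts / np.diff(edges)
    centres = np.sqrt(edges[:-1] * edges[1:])
    keep = counts > 0  # empty bins have no defined log-density
    slope, _ = np.polyfit(np.log(centres[keep]), np.log(density[keep]), 1)
    return -slope


def sample_power_law(n, gamma, f_min, rng):
    """Inverse-transform sampling from a density proportional to f^-gamma on [f_min, 1]."""
    u = rng.random(n)
    e = 1.0 - gamma
    return (f_min**e - u * (f_min**e - 1.0)) ** (1.0 / e)
```

Dividing counts by bin width and using geometric bin centres recovers the exponent exactly for the expected counts of a pure power law, which is why log-spaced bins are used here rather than linear ones.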
A geometric model
Here we model the tumor as a continuously growing sphere in which only surface cells divide. If a mutation appears in a cell at the surface of the tumor when the tumor has radius r, we suppose that this mutation occupies a cross-section area a² of the tumor surface. It therefore occupies a fraction $a^2/(4\pi r^2)$ of the surface of the tumor at that point. If the tumor grows radially outwards, the descendants of this cell occupy the same fraction of the space yet to be occupied, and the mutation itself will occupy a fraction of the final tumor (of radius R) given by the volume of a spherical cone with its tip removed:

$$f(r) = \frac{a^2}{4\pi r^2}\,\frac{R^3 - r^3}{R^3}.$$

We can then integrate over all possible radii r where mutations occur. The density ρ(r) of mutations occurring at radius r is proportional to the density of cells at that radius, $\rho(r) = 4\pi r^2 \mu / a^3$, with µ the mutation rate per cell. The frequency spectrum is therefore

$$\phi(f) = \int_0^R \rho(r)\, \delta\big(f - f(r)\big)\, dr.$$

If we focus on common mutations, which occurred at r ≪ R, we can approximate $f(r) \approx a^2/(4\pi r^2)$, leading to

$$\phi(f) = \rho(r)\left|\frac{dr}{df}\right| \approx \frac{\mu}{4\sqrt{\pi}}\, f^{-5/2}.$$

We show in Supplemental Methods that a model accounting for stochastic fluctuations in the early reproductive success of a mutation preserves this scaling behavior, but with an overall scale factor α that depends on details of the growth model, i.e.

$$\phi(f) \simeq \alpha\, f^{-5/2}.$$
Fig 1 shows the agreement between simulation results and the geometric model with α = 42. Variants at less than 1% frequency follow a distinct power law that is closer to the $\phi(f) \sim f^{-2}$ described in (Ohtsuki and Innan 2017).
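To make the geometric model concrete, the sketch below evaluates its exact spectrum, obtained by changing variables from birth radius to frequency, and compares it with the small-radius power law. It assumes the reading of the model in which a mutation born at radius r in a tumor of final radius R occupies a fraction $f(r) = \frac{a^2}{4\pi r^2}\frac{R^3 - r^3}{R^3}$ of the tumor and mutations arise at density $\rho(r) = 4\pi r^2 \mu / a^3$; the function names and the defaults µ = a = 1 are illustrative.

```python
import numpy as np


def fraction_occupied(r, R, a=1.0):
    """f(r): share of a final tumor of radius R taken by a mutation born at radius r."""
    return a**2 / (4 * np.pi * r**2) * (R**3 - r**3) / R**3


def spectrum_exact(r, R, mu=1.0, a=1.0):
    """phi(f) evaluated at f = f(r), via rho(r) / |df/dr|."""
    rho = 4 * np.pi * r**2 * mu / a**3               # mutations per unit radius
    dfdr = -a**2 / (4 * np.pi) * (2 / r**3 + 1 / R**3)
    return rho / np.abs(dfdr)


def spectrum_asymptotic(f, mu=1.0):
    """Small-r limit: phi(f) ~ mu / (4 sqrt(pi)) * f^(-5/2)."""
    return mu / (4 * np.sqrt(np.pi)) * f**-2.5


R = 1000.0
x = np.array([0.001, 0.01, 0.5])  # birth radius as a fraction of R
r = x * R
ratio = spectrum_exact(r, R) / spectrum_asymptotic(fraction_occupied(r, R))
```

The ratio has the closed form $2(1 - x^3)^{5/2}/(2 + x^3)$ with $x = r/R$: it is essentially 1 for early mutations and drops to about 0.67 at $r = R/2$, so the $f^{-5/2}$ law describes only common mutations born deep inside the final tumor.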
Cluster diversity depends on sampling position and turnover rate
To study the effect of cluster size, position of origin, and evolutionary model on CTC cluster composition, we sampled groups of cells across tumors (see CTC cluster synthesis for details). To assess genetic heterogeneity within clusters, we consider the number of distinct somatic mutations, S(n), among cells in clusters of size n.
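A cluster of neighbouring cells and its diversity S(n) can be sketched as follows. This assumes a cluster is a seed cell plus its n − 1 nearest occupied lattice sites by Euclidean distance, which may differ from the CTC cluster synthesis procedure used here; `genomes` maps lattice coordinates to sets of mutation ids, and both function names are hypothetical.

```python
import numpy as np


def sample_cluster(genomes, seed_site, n):
    """Return the n occupied sites closest to seed_site (the seed included)."""
    keys = list(genomes.keys())
    coords = np.array(keys, dtype=float)
    dist = np.linalg.norm(coords - np.asarray(seed_site, dtype=float), axis=1)
    order = np.argsort(dist, kind="stable")[:n]
    return [keys[i] for i in order]


def distinct_mutations(genomes, cluster):
    """S(n): number of distinct somatic mutations carried by the cells of a cluster."""
    return len(set().union(*(genomes[s] for s in cluster)))
```

On a toy row of cells `{(0,0,0): {1}, (1,0,0): {1, 2}, (2,0,0): {1, 2, 3}, (10,0,0): {9}}`, the two-cell cluster seeded at the origin has S = 2, while taking all four cells gives S = 4.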
As expected, we find that larger CTC clusters have more somatic mutations (Fig 2, S3). By contrast with global diversity patterns, we find that moderate turnover has a profound impact: Clusters from models with low turnover have many more somatic mutations than those from the no turnover model (Fig 2a,b). Surface turnover has little effect on cluster diversity (Fig S3).
Fig 2 also shows the relationship between a CTC cluster's shedding location (i.e. its distance to the tumor center-of-mass when it was sampled) and its genetic content. No turnover and surface turnover models show similar trends of increasing diversity with distance (Fig S3). Full turnover models show the opposite trend of decreasing diversity with distance in clusters of intermediate size (Fig 2b-d and S4 for d = 0.1, 0.2, and 0.65, respectively). However, these trends reverse again when considering large clusters with thousands of cells (Fig 3).
Comparison with multi-region sequencing data
We did not have access to large-scale sequencing data for micro-biopsies. To validate the predictions of our model, we therefore used multi-region sequencing data from a hepatocellular carcinoma (HCC) patient presented in (Ling et al. 2015) (Fig 3a). The HCC data contained 23 sequenced samples from a single tumor, each with ≈20,000 cells; we therefore used our sampling scheme to produce 23 simulated biopsies of comparable size (20,000 cells). Distance measurements were made using ImageJ (Schneider, Rasband, and Eliceiri 2012) on Fig S1 from Ling et al. 2015. Since Ling et al. (2015) could only reliably call variants at more than 10% frequency, we used a similar frequency cutoff in our simulations. Like the model without turnover (Fig 3c), the HCC data do not show a clear spatial trend (Fig 3a), whereas the model with turnover predicts a detectable trend at comparable sample size (Fig 3d). However, with this sample size we have little statistical power to distinguish between the models.
We therefore investigated the study design that would be needed to effectively distinguish between the different models proposed here. Based on our simulations, power depends on cluster size, number of clusters sampled, and the choice of frequency cutoff. Interestingly, although spatial trends in diversity are undetectable in large clusters when variants at all frequencies are counted (Fig S5), they are restored when we impose a frequency cutoff (Fig 3c, d): The large number of rare, recent variants overwhelms the signal about early tumor evolution carried by older, common variants.
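Applying a frequency cutoff to a simulated sample amounts to counting only mutations carried by at least a given fraction of its cells. A minimal sketch, assuming each cell's mutations are a set of ids and that the cutoff applies to the cell fraction; a variant allele frequency would be half as large for a heterozygous mutation, so matching a sequencing threshold may require adjusting the cutoff. The function name is hypothetical.

```python
from collections import Counter


def distinct_mutations_above(genomes_in_sample, cutoff=0.10):
    """Number of distinct mutations whose within-sample cell fraction is >= cutoff.

    `genomes_in_sample` is a list of mutation-id sets, one per cell.
    """
    n_cells = len(genomes_in_sample)
    counts = Counter(m for g in genomes_in_sample for m in g)
    return sum(1 for c in counts.values() if c / n_cells >= cutoff)
```

In a ten-cell sample where mutation 1 is clonal, mutation 2 is in two cells and four further mutations are private, a 15% cutoff keeps only the two shared mutations, discarding exactly the rare, recent variants that blur spatial trends.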
Overall, the trends observed in Fig 2 are barely detectable with the current sample size but could be detected with modest increases (Fig 3b). For biopsies containing tens of thousands of cells, ≈40 spatially distributed samples are needed, roughly twice the size of the HCC dataset. Alternatively, ≈30 samples of small clusters (23–30 cells) are needed to reliably detect spatial patterns. Furthermore, intermediate-sized clusters show trends opposite to those of both small and large clusters in the different models (Fig 3b and S6). Sequencing small clusters may therefore increase our power to discriminate between leading models.
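The logic of such a power analysis can be sketched generically: generate many replicate studies with a given number of samples, test each for a distance–diversity trend, and report the fraction of replicates in which the trend is detected. The sample sizes quoted above come from simulated tumors; this sketch instead uses a hypothetical linear trend with Gaussian noise and a permutation test on the Pearson correlation, so its numbers are illustrative only.

```python
import numpy as np


def permutation_pvalue(x, y, n_perm, rng):
    """Two-sided permutation p-value for the Pearson correlation of x and y."""
    r_obs = np.corrcoef(x, y)[0, 1]
    perms = np.array([np.corrcoef(x, rng.permutation(y))[0, 1] for _ in range(n_perm)])
    return (1 + np.sum(np.abs(perms) >= abs(r_obs))) / (n_perm + 1)


def estimate_power(n_samples, slope, noise_sd, n_rep=200, n_perm=200, alpha=0.05, seed=0):
    """Fraction of replicate studies in which a distance-diversity trend is detected.

    Hypothetical data model: diversity = slope * distance + Gaussian noise,
    with distances drawn uniformly on [0, 1].
    """
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_rep):
        x = rng.uniform(0.0, 1.0, n_samples)
        y = slope * x + rng.normal(0.0, noise_sd, n_samples)
        hits += int(permutation_pvalue(x, y, n_perm, rng) < alpha)
    return hits / n_rep
```

Replacing the linear data model with diversities measured on simulated biopsies or clusters, and sweeping `n_samples`, gives the kind of curve used to find the smallest study that reliably separates two growth models.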
CTC clusters derived from turnover models are more likely to contain virulent mutations
Metastasis is an inefficient process (Massagué and Obenauf 2016): Most CTCs are eliminated from the circulatory system or fail to survive in the new microenvironment. We hypothesize that the genetic composition of CTC clusters influences the likelihood of implantation into a new microenvironment. More specifically, genetic heterogeneity within a cluster may contribute to implantation by increasing the likelihood that a metastasis progression mutation is present. If a cluster has S somatic mutations, and each mutation independently has a small probability p ≪ 1 of falling in a metastasis progression or virulence gene, the probability of having at least one such mutation is $1 - (1 - p)^S$, which is approximately $Sp$ when $Sp \ll 1$.
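A quick numerical check shows where the linear approximation holds. The value p = 10⁻⁴ below is an arbitrary illustrative choice, not an estimate from data.

```python
def prob_at_least_one(S, p):
    """Exact probability that at least one of S independent mutations is virulent."""
    return 1.0 - (1.0 - p) ** S


def prob_linear(S, p):
    """First-order approximation S * p, accurate only when S * p << 1."""
    return S * p


p = 1e-4
small = (prob_at_least_one(10, p), prob_linear(10, p))        # ~0.0009996 vs 0.001
large = (prob_at_least_one(20_000, p), prob_linear(20_000, p))  # ~0.865 vs 2.0
```

For a few-cell cluster the two agree closely, so the probability grows almost linearly with diversity; for a biopsy-sized set of mutations the linear form exceeds 1 and the exact expression saturates.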
Diverse CTC clusters do not carry more virulent mutations, on average, than homogeneous ones, but they are more likely to carry some virulent mutations because of the increased diversity. Unless implantation probability is exactly proportional to the number of cells carrying virulent mutations in a cluster, which seems unlikely, diversity will impact implantation rate.
To quantify how much more likely CTC clusters are than single CTCs to carry metastatic progression genes, we compute the relative increase in the number of distinct somatic mutations in a CTC cluster versus a single CTC, which we term the cluster advantage, A(n). To disentangle the contributions from microscopic and macroscopic diversity, as well as cluster size effects, we compute the cluster advantage for clusters composed of neighboring cells, as well as for random sets of cells sampled across the tumor (Fig 4).
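The comparison between neighbouring clusters and random sets can be sketched as below. The exact form of A(n) is an assumption here: the sketch reads "relative increase" as the ratio of the mean S(n) over sampled clusters to the mean number of mutations in a single cell, S(1). Clusters are a random seed cell plus its nearest occupied sites; `genomes` maps lattice coordinates to sets of mutation ids, and all names are hypothetical.

```python
import numpy as np


def distinct_count(genomes, sites):
    """Number of distinct mutations among the given cells."""
    return len(set().union(*(genomes[s] for s in sites)))


def cluster_advantage(genomes, n, n_draws=100, neighbours=True, seed=0):
    """Hypothetical A(n) = mean S(n) / mean S(1).

    neighbours=True: a random seed cell plus its n - 1 nearest occupied sites.
    neighbours=False: n cells drawn uniformly at random across the tumor.
    Assumes cells carry at least one mutation on average (S(1) > 0).
    """
    rng = np.random.default_rng(seed)
    keys = list(genomes)
    coords = np.array(keys, dtype=float)
    s1 = np.mean([len(genomes[k]) for k in keys])
    totals = []
    for _ in range(n_draws):
        i = rng.integers(len(keys))
        if neighbours:
            dist = np.linalg.norm(coords - coords[i], axis=1)
            idx = np.argsort(dist, kind="stable")[:n]
        else:
            idx = rng.choice(len(keys), size=n, replace=False)
        totals.append(distinct_count(genomes, [keys[j] for j in idx]))
    return np.mean(totals) / s1
```

Two limiting cases frame the results: if every cell carries only private mutations, A(n) = n for both sampling schemes, and if all cells are genetically identical, A(n) = 1. Real tumors fall between these extremes, with spatial structure pulling neighbouring clusters toward the identical case.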
Whereas randomly sampled sets of cells show a similar and almost linear increase of the cluster advantage with sample size, cell clusters show more variability. Turnover models have the highest cluster advantage, followed by the surface turnover model and the no turnover model (Fig 4). Higher turnover increases the cluster advantage (Fig S7). Even low turnover, with a death rate of d = 0.05, doubles the cluster advantage compared to the no turnover and surface turnover models (Fig S7).
Discussion
Even though the results of our simulations are consistent with Waclaw et al. at the tumor-wide level (Waclaw et al. 2015), we reach opposite conclusions about the effect of cell turnover on genetic diversity. Waclaw et al. argued that turnover reduces diversity based on the observation that more high-frequency variants were observed in the tumor with turnover: A small number of clones make up a larger proportion of the tumor. Even though we can reproduce the observation, we find that turnover models in fact vastly increase diversity according to more conventional metrics, for example by increasing the number of somatic mutations (by ≈ 5.9×) across the frequency spectrum. Both the increase in dominant clone frequency and increased overall diversity have the same simple origin: A tumor model with turnover requires more cell divisions to reach a given size. An early driver mutation has more time to realize a selective advantage and occupy a high fraction of the tumor, but carrier cells are also more likely to accumulate new mutations along the way leading to increased diversity (Fig 1 and Table S1).
The impact of turnover on cellular heterogeneity is particularly pronounced when considering small cell clusters. These fine-scale patterns, observed in Figs 2 and S3, can be interpreted by considering the expansion dynamics of each model and their impact on cell division and mixing. In all turnover models, the number of somatic mutations in a given cell is ≈ 2.75× higher at the edges than at the center of the tumor, reflecting the higher number of divisions to reach the edge: The center of the tumor is occupied early, which slows down cell division.
In the no turnover and surface turnover models, cell clusters show the same overall pattern of additional diversity at tumor edge. In the turnover model, however, we observe the opposite pattern: Even though edge cells still carry the most mutations, core clusters are now more diverse than edge clusters.
Turnover increases the number of somatic mutations by increasing the number of cell divisions required to reach a given size, especially in the core. For example, core cells in the model with d = 0.2 have ≈ 3.99 somatic mutations, compared to ≈ 1.83 for the no turnover model. This effect is somewhat weaker for edge cells, leading to a modest spatial trend: Without turnover, the number of somatic mutations per cell is 3.5 times higher at the edge than in the core, and the ratio is reduced to 2.2 when turnover is present (d = 0.2).
More importantly for diversity, turnover allows for mixing of cells from nearby clones (Fig 5c). This mixing has a smaller effect at the edge of the tumor, where the range expansion produces serial bottlenecks which reduce the effective population size relative to the tumor core. For moderate cluster sizes, this differential mixing effect overwhelms the "number of divisions" effect, and core clusters are much more diverse than edge clusters, producing distinctive gradients of diversity.
The difference in somatic diversity between single CTCs and CTC clusters, measured through the cluster advantage, follows the expected law of diminishing returns: The larger the cluster, the fewer distinct mutations it carries per cell. However, the trends vary by growth model and cluster origin. Cell mixing afforded by turnover reduces the similarity between neighboring cells and increases the cluster advantage.
Under the assumption that the presence or absence of a metastatic progression allele modulates the metastatic potential of tumor cell clusters, the proportion of metastatic lesions derived from circulating tumor cell clusters is highest in the turnover model. We can think of this as interference occurring between cells within a cluster. Alternatively, this is an illustration of the advantage of not putting all one's eggs in one basket, applied to tumor metastasis: Assuming that cluster implantation has a chance component, mixing increases the likelihood that at least one cell carrying a virulence mutation makes it to a hospitable site. Such an effect should be robust to details of the growth model.
In experiments, CTC clusters derived from primary breast and prostate tumors produced more aggressive metastatic tumors than single CTCs (Aceto et al. 2014). This is likely due to differences in the mechanical properties of the cluster or to the creation of a locally favorable environment by the cluster, rather than to genetic differences. However, the present analysis suggests that this advantage can be enhanced by diversity within the cluster.
Both fine-scale mixtures of cell phenotypes and clonally constrained mutations have been observed experimentally in tumors (Yates et al. 2015; Navin et al. 2010). Similarly, multi-region sequencing revealed high tumor heterogeneity in clear cell renal carcinoma (ccRCC) (Gerlinger, Horswell, et al. 2014), but low levels in lung adenocarcinomas (J. Zhang et al. 2014). This strongly suggests that the amount of migration and mixing varies substantially across tumors, with ccRCC data being better described by a model with turnover, whereas lung adenocarcinoma data more closely resembles a model with low or no turnover.
Distinguishing between migration effects, turnover effects, and tumor growth idiosyncrasies is extremely challenging. Among the limitations of our model, we note the assumption of spherical tumor shape and the absence of complex physical constraints (which HCC tumors may experience). Another limitation of the present model is the rigid computational grid, which prevents cells from pushing each other out of the way and thereby constrains growth in the center of the tumor. This constraint plays a role in reducing diversity at the center of the tumor, but it may not be realistic in the earlier stages of tumor growth.
The importance of such effects is largely unknown, and it is likely to vary between tumors and tumor types. Fortunately, we have shown that we are at the cusp of being able to test such models quantitatively. A sampling experiment with twice as many samples as were collected in the HCC patient studied above would enable us to either validate or reject the current state-of-the-art models (Fig 3b), and sequencing of small clusters would further allow us to discriminate between the different models studied here.
Data collection schemes including the lung TRACERx study (Jamal-Hanjani, Hackshaw, et al. 2014; Jamal-Hanjani, Wilson, et al. 2017) will help us put the state-of-the-art models to the test and identify such important parameters of tumor growth. Given our power analysis, we find that sequencing small contiguous cell clusters provides a richer picture of tumor dynamics compared to larger biopsies, with little to no loss in power, assuming that few-cell sequencing can be performed accurately.
This work set out to answer two simple questions: First, should we expect substantial heterogeneity at the cellular scale within tumors and within circulating tumor cell clusters? The answer to the first question is most likely yes, as even the models with no turnover exhibit measurable cluster heterogeneity.
The second question was whether this heterogeneity, sampled through liquid biopsies or multi-region sequencing, is informative about tumor dynamics. Given that state-of-the-art models produce very different predictions about the level of cluster heterogeneity, the answer is also positive. This work identified some of the key factors that determine cluster diversity, especially the interaction between range expansion, cell turnover, and mixing. Even if no diversity were observed at all in CTC clusters, it would enable us to reject the present models in favor of models including additional biological factors that favor the clustering of genetically similar cells. Measuring diversity, or the lack of diversity, within circulating tumor cell clusters or fine-scale multi-region sequencing is therefore a promising tool for both fundamental and medical oncology.
Author Contributions
Conceptualization, S.G.; Methodology, S.G.; Software, Z.A.; Investigation, Z.A. and S.G.; Writing – Original Draft, Z.A.; Data Curation, Z.A.; Writing – Review & Editing, Z.A. & S.G.; Visualization, Z.A.; Funding Acquisition, Z.A. and S.G.; Resources, S.G.; Supervision, S.G.
Acknowledgments
We thank Julien Jouganous, Hamid Nikbakht, Yasser Riazalhosseini, and Robert Sladek for useful discussions. This research was made possible thanks to a Canadian Institutes of Health Undergraduate Research Award in computational biology, funding reference numbers 139962 and 145987 and Frederick Banting and Charles Best Canada Graduate Scholarship. This research was undertaken, in part, thanks to funding from the Canada Research Chairs program and a Sloan research fellowship.
Footnotes
* simon.gravel{at}mcgill.ca