ABSRACT
Large and non-recombining genomes are prone to accumulating deleterious mutations faster than natural selection can purge (Muller’s ratchet). A possible consequence would then be the extinction of small populations. Relative to most single-cell organisms, cancer cells, with large and non-recombining genomes, could be particularly susceptible to such “mutational meltdown”. Curiously, deleterious mutations in cancer cells are rarely noticed despite the strong signals in cancer genome sequences. Here, by monitoring single-cell clones from HeLa cell lines, we characterize deleterious mutations that retard cell proliferation. The main mutational events are copy number variations (CNVs), which happen at an extraordinarily high rate of 0.29 events per cell division. The average fitness reduction, estimated to be 18% per mutation, is also very high. HeLa cell populations therefore have very substantial genetic load and, at this level, natural population would likely experience mutational meltdown. We suspect that HeLa cell populations may avoid extinction only after the population size becomes large. Because CNVs are common in most cell lines and cancer tissues, the observations hint at cancer cells’ vulnerability, which could be exploited by therapeutic strategies.
INTRODUCTION
Cancer initiation requires tumor cells to obtain several key traits, such as sustained proliferative signaling, resistance to cell death, and accelerated cell growth leading to a competitive advantage over slow-growing neighboring cells (1, 2). Accelerated proliferation of cancer cells inevitably increases their underlying rate of mutations (2, 3), copy number variations (CNVs) that affect a larger fraction of cancer genomes (4-7). It has been shown that positive selected CNVs (drivers) have critical roles in activating oncogenes and in inactivating tumor suppressors (6, 8-14). While many other mutations, including CNVs, may be fitness-neutral and are termed passenger mutations (15, 16), there in fact exist a larger class of mutations that reduce cancer cell proliferation and slow cell growth (17, 18). This class of deleterious mutations has been referred to as “negative drivers” (18).
Deleterious mutations cannot be efficiently purged by natural selection in populations with no recombination. The phenomenon is referred to as Muller’s ratchet, a moving mechanism that is used to indicate the irreversible accumulation of deleterious mutations (19, 20). Since recombination, which can enhance the efficacy of selection against deleterious mutations (20, 21), is absent in the clonally-reproducing cancer cells, accumulation of deleterious mutations can collectively exert a noticeable effect on fitness (22). Deleterious mutations may have a variety of effects on tumorigenesis by shaping cancer growth and intra-tumor variation. If the incidence of deleterious mutations is low or negligible, cancer progression can be easily described by relatively simple mathematical models and genetic variation within tumors can be treated as a function of population size. At another extreme, if deleterious mutations are prevalent, the majority of cancer cells would be defect with limited proliferative ability and only a small fraction of cells in a tumor would be capable of constantly dividing. Accordingly, the proportion and the effect size distributions of deleterious mutations must not be overlooked and may be key parameters to model tumorigenesis.
Guided by previous experience in population genetics, most studies in cancer focus on slightly deleterious mutations as they should be able to evade purifying selection and accumulate, thus influencing long-term tumor progression (23). Because highly deleterious mutations should be rapidly weeded out by purifying selection, it is generally assumed that these mutations have little to do with long-term cancer progression. Thus, the most common mutations, either fixed or at high frequency, that are frequently cited in cancer genomic studies are usually advantageous for individual cell fitness. Such practical limitations ensure that deleterious mutations are mostly undetected in empirical studies. Indeed, even the relatively comprehensive TCGA pan-cancer data set yielded only a small number of deleterious mutations, highlighting gaps in our knowledge about this important class of variants (18).
Direct measurements of individual deleterious mutation effects in vivo are challenging. However, an assessment of their collective action is possible, given a system that generates such variants at an appreciable rate. HeLa cells present such as system as their genomes are highly variable. In addition, this cell line has attractive features for our purposes since it has been extensively cultured and exhibits a short doubling time and aggressive growth. We examined variation in growth rate among individual HeLa cells by monitoring clones from a common ancestral HeLa cell population. We then estimated deleterious mutation rate and the average fitness decrease per mutation by performing computer simulations of cell growth. Our observations suggest that the differences in growth rate among cell clones are heritable, and CNV is the major cause of proliferative fitness reduction in HeLa cells. HeLa cells constantly produce high numbers of deleterious mutations during growth. We discuss the implications of our findings for modeling tumorigenesis.
RESULTS
Genetic variation in growth rate in a HeLa cell population
To ensure genomic homogeneity of the initial population, we first established a HeLa cell line (E6) derived from an ancestral cell line (JF) (Fig. S1). When E6 population size reached approximately 5 × 104 cells (15∼16 divisions), five single-cell clones were generated from E6 and established in culture. When the clone cultures reached 106 ∼ 107 cells, we measured their growth rates using the MTT assay. The B8 and G3 clones showed clearly higher growth rates than E3, E7 and G2, suggesting that cells within E6 were heterogeneous (Fig. 1A). Since these clones were all descendants of E6 which originated from a single cell and experienced only 15 - 16 divisions, our results indicate that heterogeneity in growth rate can be generated in a very short period of time in cancer cells.
To test whether variation in growth rate among clones is heritable, we isolated 39 cells from B8 (fast growing clone) and 40 cells from E3 (slow growing clone), and monitored their growth from a single cell for seven days. Approximately 30% of B8 and 50% of E3 cells died out within seven days (Fig. S2), due to either damage caused during cell isolation or genetic defects. Furthermore, most cell lines with growth rates < 0.6 died within 2 months. Thus, only 50% of B8 and 27% of E3 cells survived for more than two months.
While the growth rates of cells from a single origin varied greatly, the progeny of the fast-growing clone (B8) grew faster on average than those of the slow growing clone (E3)(Fig. 1B). Mean cell number among the B8 progeny was 62.0, while it was 17.3 among E3-derived cell lines. We further drew approximately 100 cells from one fast-(progeny D) and one slow-(progeny C) (Fig. S2) growing derivative of B8 and monitored growth of each cell clone for seven days. As expected, cells from the fast-growing progeny had a higher mean growth rate than that from the slow-growing descendant (p = 2 x 10−11, KS test; Fig. 1C).
If mutations that slow cell proliferation frequently arise in cancerous cell populations, we would expect a decrease in proliferation rates and an increase in among-cell growth speed variance as the population is maintained. To test this, we monitored growth of single cells that were randomly drawn from cell populations at different time points. We first set up six single-cell clones from B8 (Table S1). After cell numbers reached 100 - 500 (8-10 cell divisions), 20 - 30 % of the cells from each population were randomly drawn and separated into single cells. The growth of these isolates was monitored for eight days. This step was repeated when the size of the six populations exceeded 5,000 cells. In all six cases, the average growth rate of cells drawn at the first time point was higher than those from the second time point (Fig. S3). In addition, in four out of six cases, variation in growth rate was higher at the second than at the first time point (t-test).
To test whether the slow-growing cells would be outcompeted by their fast-growing counterparts, we performed a competition assay. The fast- and slow-dividing cells were co-cultured in different ratios and their proportions in populations were monitored weekly using the Discosoma sp. red fluorescence protein (DsRed) and enhanced green fluorescence protein (eGFP) over time using flow cytometry. We ran the experiment for eight weeks. Regardless of the initial proportions, the fast-dividing cells always outgrew the slow-dividing cells in our co-culture assays over time (Fig. 1D and E), suggesting that fast cells indeed possessed higher fitness than the slow ones.
Fig. 1 demonstrates that a cancer cell population can generate heterogeneity in growth rate within several cell replications, even starting from a single cell. Moreover, the majority of these changes is heritable and reduces the fitness of cancer cells (defined as proliferation rate), suggesting that fitness reduction we observe is largely genetically determined.
Cell growth rate is associated with CNV number
To study the genetic basis of growth rate heterogeneity among our cancer cell lines, we assessed genomic variation in E6 and five of its descendant clones by constructing a digital copy-number profile based on next generation sequencing reads. We focused on copy number variation because single-nucleotide mutation rates are too slow to produce significant sequence variation during our short-duration culturing experiments. Most clone genomes are triploid (Fig. S4), and only a small number harbors more than four or fewer than two haploid genomes, consistent with a previous study (24).
The three slow-growing clones showed a clear increase in CNV copy number compared to the E6 parental line (Fig. 2, Table S2), while the two fast-growing clones were more similar to E6, suggesting that that most CNVs are deleterious. As E6 experienced only 15 - 16 divisions before its five descendant clones were generated, above results also indicate that CNVs can be generated in a very short period of time. To test this hypothesis, 11 clones derived from B8 with different growth rates between day 1 to 8 were picked for further analyses (Fig. 3A). The growth rate of each clone was measured again by RTCA iCELLigence when the population reached approximately 106 cells (after 20 - 30 cell divisions). The results were highly correlated (R2 = 0.713, p-value < 0.01; Fig. 3B) with the previous eight day measurements, demonstrating that variation in growth rates among clones was consistent and not due to stochastic fluctuation at different stages.
We picked about 20 cells from each of the 11 clones and counted their chromosome numbers. The karyotypes ranged from 38 to 113 chromosomes, with most (72%) cells harboring between 55 and 70 chromosomes (Fig. 3C). Therefore, despite single-cell origin, the progeny quickly generated aneuploidy within only 20-30 cell divisions, again illustrating frequent cytogenetic change in cancer cells.
Although chromosome numbers varied among clones, their distribution by combing all clones reconstituted the chromosome distribution of their ancestor B8. The average number of chromosomes in the fast-growing group (62.5) was similar (p = 0.56, t test) to the B8 clone (61.1), whereas the slow growing group (66.5) showed significantly greater chromosome numbers than the fast growing (p < 0.01) and the ancestral (p = 0.04) clones (Fig. 3D). Consequently, Fig. 2 and 3 suggest that cancer cells exhibit a very high rate of CNV generation and most of these CNVs are deleterious, reducing the cells’ proliferative ability.
Rate of deleterious mutations
To further study how fast the difference in proliferation rate can be generated, we picked 12 cells from B8, allowed them to divide once, and isolated each daughter cell in a well of a 96-well plate. We then monitored the growth of each of the 12 pairs of cells for four days (Fig. 4A). Four cell pairs lost at least one of the daughters in the first four days, probably due to injury during preparation. Of the remaining eight pairs, two (h and i, Fig. 4A) exhibited different proliferation rates between the daughter cell lineages. The result suggests that there is approximately one deleterious mutation in every four cell divisions in this cancer cell line.
Such a high deleterious mutation rate implies that many cells at day four would carry deleterious mutations which reduce cell proliferation. To test this, each descendant of clone f was harvested and re-suspended into a 96-well plate. We monitored growth of these progeny for another four days. As expected, the majority of the surviving clones exhibited slower growth than their ancestor f (Fig. 4B).
Modeling population growth attenuation
Rough estimates from our experiments so far suggest extremely high mutation rates in cancer cell lines. To understand mutation accumulation rates in tumors, we need quantitative estimates of deleterious mutation rates and effect distributions. Therefore, we constructed and applied a simple model of cell growth and the mutation accumulation process (see Materials and Methods). Let Nt be the population size at day t, where N0 = 1, and Rt be the population growth rate at day t. For each generation, a proportion of cells (μ) generates new mutations which decrease their growth rate. The mean deleterious effect of a mutation is d. We have where R0 is the net growth rate at day 0.
To estimate parameters of this model (R0, μ, and d) we randomly drew 18 single cells from B8 and monitored their growth for 7-8 days (Table S3). We divided these newly-derived cell lines into fast- (cell number > 100) and slow-growing (cell number < 100) groups. We then conducted computer simulations to evaluate the parameters that best fit the observed data (see Materials and Methods).
Using fast-growing cell data, we estimate posterior mean of R0 = 2.37 ([2.22, 2.52]), μ = 0.29 ([0.26, 0.30]), and d = 0.18 ([0.17, 0.20]). Estimates from slow-growing cells (R0 = 2.00 ([0.25, 0.33]), μ = 0.29 ([0.16, 0.20]), and d = 0.18 ([2.22, 2.52])) are similar, except that the initial growth rate is slower, as expected. The deleterious mutation rate (μ) of 0.29 (0.26, 0.33) suggests that there is approximately one deleterious mutation for every 3 - 4 cell divisions, which is very close to our rough non-parametric estimates from daughter-cell divisions (see Fig. 4A). Since μ is scaled per cell division, the product of μ and d reflects fitness change per generation. Our estimates indicate that the HeLa cells experience a 5% (4%, 6%) reduction in fitness for every generation (25, 26) (Table S3). Using point estimates of μ (0.29) and d (0.18), we fitted our model to the growth rates observed in a range of experimental data (Fig. S2) and estimated initial growth rates (R0). Only cell lines that showed monotonic increase in cell numbers were considered. The estimation of R0, ranging from 1.64 to 2.54 in 43 sets of experimental data (Table S4), suggested that the ancestral cell of slowest growing lineage had accumulated about 2.2 (log (1.64 / 2.54) / (1 – 0.18)) more deleterious mutations than the ancestor of fastest growing lineage.
Interestingly, while R0 shows substantial variation among cell lines, estimates of μ and d are similar, possibly indicative of an intrinsic property of rapid cancer cell division. To further test this, we estimated the R0, μ and d in tumor cell lines from different cancer types by the same method (Table 1). Although the R0 are considerably variable, μ and d are generally consistent among cell lines, implying that a fitness reduction (μ × d) in a variety of cancer cells is close to our initial estimate of 5% per generation.
Discussion
We performed multiple single-cell progeny assays using HeLa cell lines and extensively genotyped the clonal populations, focusing on copy number variants (CNVs). We fitted a growth model to these data to estimate distributions of mutation effects on fitness as measured by proliferative capacity. We show that CNV deleterious mutations appear at the rate that falls in the range of 0.26 - 0.33 per genome per generation. Our results demonstrate that accumulated passengers are deleterious to cancer cells because they reduce cell proliferation. We estimate that the deleterious effect of 0.18 ([0.175, 0.187], 95% confidence interval) is twice higher than previous results (17, 27). While at first this may seem to be unreasonably high, the mutations we identified would not be seen in cancer cell populations within tumors as the vast majority of these variants are too deleterious to appear in the earlier data sets.
Nevertheless, the observation that a significant portion of cancerous cells are born with a low enough proliferative capacity that they are rapidly lost during tumor growth suggests that there is a trade-off between cancer progression and genome instability. While cancer cells may benefit from generating genetic variation paving the way for tumor progression, accumulation of deleterious mutations would eventually be intolerable and lead to cell population meltdown. The tumor population must thus strike a delicate balance to maintain heterogeneity and at the same time curb relentlessly increasing passenger mutations. Exploiting this fact may help develop a new therapeutic regime for cancer (17, 23). This can be done by either increasing the overall mutation rate (μ) or the deleterious effect of passengers (d). For example, a 10% increase in μ may result in a more than 50% population reduction in 20 days (Fig. S5). Practically, mutation rate can be increased by targeting the DNA repair system (28) or by regulating DNA replication stress (29).
However, for these strategies to be effective, an additional layer of cancer biology needs to be considered. It has been shown by mathematical modeling that a high incidence of defective cells and cell death may not be disadvantageous for tumor growth, but in fact facilitate tumor progression (30). That is because a high rate of proliferation-reducing or cell death-inducing spontaneous mutation can lead to cancer dynamics that enable sufficient stem-like cell divisions to enrich the pool of cells with high proliferation rate and drive population expansion.
Therapies that directly target cell proliferation should be considered. Since in order to maintain high proliferation rates, rapidly proliferating cells need to increase their translational capacity and are dependent on high rates of ribosome biogenesis (31, 32). Thus, inhibition of ribosome biogenesis could be a selective approach to cancer therapy (33, 34). More importantly, this effect is enhanced in cells with higher proliferation rates, but less so in cells with lower proliferation rates.
Materials and Methods
Cell culture and Single cell isolation
Cancer Cell lines, including HeLa cell lines, Human prostate cancer cell line PC3, Human Rhabdomyosarcoma cell line A204 and Huma malignant melanoma cell line A375, were obtained from the Culture Collection of the Chinese Academy of Sciences, Shanghai, China. Cells were cultured in DMEM Medium for HeLa cell, PC3 and A375, in medium of McCoy’s 5A for A204 (Life Technologies) and supplemented with 10% fetal bovine serum (FBS) (Gibico, Life Technologies), 100 U/ml of penicillin, and 100 μg/ml of streptomycin at 37°C with 5% CO2. Cells were trypsinized using 0.05% trypsin at room temperature for 3 min. Cell counting was performed with a hemocytometer. The cell suspensions were diluted with medium to the final concentration of 1 cell per 100 μl. Single cells were seeded into each well of a 96-well cell culture plate and maintained at 37°C with 5% CO2. After incubating for 12 hours, during which the cells attached to the wells, the single cell isolation was visually confirmed by microscopy. The number of cells in each well was counted every day. When the cell colony became sufficiently large, cells were transferred to 24-well plates, and subsequently transferred to 6-well plates and 10-cm culture dishes.
Karyotype analysis
The cultured cells in a stage of active division were treated with colchicine (200 ng/ mL) for 1 hour at 37°C, then harvested and resuspended in 0.07M KC1 for 30min, and slowly added 10 drops of Carnoy’s fixative (3:1, methanol:acetic acid). The suspended cell was dropped onto a slide, dried the slide rapidly, and stained with 4% DAPI for 5 min. At least twenty cells were spread out in metaphase for karyotyping. Evaluation of interphase nuclei was performed by OLYMPUS BX51 fluorescence microscope. Photographs were taken by a CCD camera with 40 or 100 times objective. Image-Pro Plus software was used for digital image acquisition in the TIFF format, pseudocoloring and merging.
MTT Assay
MTT assay was used to measure cell proliferation. Cells were suspended and seeded at the concentration of ∼700 cells/100μl/well in 96 well plate. A volume of 20 μl dissolved MTT was pipetted into each well. After being incubated for 4 hours at 37°C in a humidified CO2 incubator, the media was removed and 200 μl sterile DMSO was added to each well. The absorbance values were then read at 570 nm with a microplate spectrophotometer. The number of living cells was estimated based on absorbance values.
Real-time cell analysis assay (RTCA)
Cells in 10% FBS media were seeded at a density of 5000 per well into a 16-well E-plate (ACEA) and incubated for 72 hours at 37 °C in a humidified CO2 incubator. The impedance for each well was monitored by the RTCA iCELLgence™ system (ACEA Biosciences Inc) every 70 minutes. Relative impedance signal level (represented as “cell index” in manufacturer’s software) that indicated the number of cells was analyzed using the RTCA Software 1.2 program (Roche Diagnostics). The cell growth curves were automatically recorded on the iCELLigence system in real time. The cell index was followed for 3 days.
Measurement of growth rate
The single cell clones that we generated for the growth rate calculation were in the log phase, the period of exponential increase in the cell number. During this phase, where N(t) is the total number or the concentration of cells at time t, and k is a constant coefficient. To obtain a linear function, the cell numbers were converted to logarithms to the base of Euler’s number e. The least-squares method (LSM) was used to fit the linear regression in which the slope (k) of the regression line would be the growth rate.
Whole genome sequencing and analysis of copy number variation
Genomic DNA was extracted from 106 HeLa cells using QiagenAllpre DNA/RNA Mini Kit (Qiagen). The genomic DNA (3 μg) was fragmented by Adaptive Focused Acoustics on a Covaris E120 (Covaris Inc). The range of product size was from 250bp to 350bp. The fragmented DNA was purified by Qiaquick PCR purification column and quantified on 2100 Bioanalyzer by using the Agilent DNA 1000 kit (Agilent Technologies, Palo Alto, CA, USA). The whole genome libraries were constructed by IlluminaTruseq DNA sample preparation kit according to the manufacturer’s instructions. Whole genome sequencing was performed using the Illumina Hiseq 2000 in the Beijing Institute of Genomics (BIG, Beijing, China). Reads were mapped to the human reference genome (hg18) using BWA software (Version 0.4.9) with default parameters (35). The aligned reads were used as input of the Control-FREEC software (Control-FREEC v10.3; http://bioinfo-out.curie.fr/projects/freec/) for characterizing the large-scale chromosomal and segmental duplication and deletion events (> 107 bp) in all those samples (Fig S4). FREEC was run with the following parameters: window size, 100kb; step size, 50 kb; contaminationAdjustment = TRUE; noisyData = TRUE; BAF calculation inactivated. The segmental copy number gain and lose of each descendant clone was profiled against the E6 genome by preforming the FREEC with the same parameters mentioned above. The relative ratios and breakpoints of copy number gain and lose against the E6 were calculated and identified for the 5 descendant clones (Fig 2A, Table S2).
eGFP and DsRed transfection
Cells with the stable expression of eGFP and DsRed were constructed by transfecting with lentivirus vectors, plenti6.3-MCS-IRES-eGFP and plenti6.3-MCS-IRES-DsRed (Life Technologies), which contained a blasticidin resistance gene and an enhanced green fluorescent protein (eGFP) or a discosoma sp. red fluorescent protein (DsRed) sequence. The expression of eGFP and DsRed were used for labeling and distinguishing the two cells in the competition assay. The vectors were packaged into the lentivirus particles with infectious activity (Invitrogen). Before transfection, 2×105 cells per well were incubated with DMEM in a 6-well plate. After incubating for 24 hours, the medium was replaced by the transfection medium that contained the lentivirus particles and polybrene with the concentration of 8 μg/ml. The multiplicity of infection (MOI) value was 3. After transfecting for 24 hours, the cells were washed three times with PBS, and cultured in the DMEM medium with blasticidin of 10 μg/ml for at least 4 weeks in order to select cells that stably express eGFP and DsRed.
Isolation of fast dividing and slow dividing cells
The CellTrace™ Violet Cell Proliferation Kit, for flow cytometry (Thermo Fisher, C34557) was used for isolating the fast growing and slow growing cells. Cells were labeled by the cell-dye following the CellTrace™ Violet Cell Proliferation Kit workflow after cell cycle synchronization arrested at G1/S phase (double thymidine block). The labeled cells were continuously cultured for 7 days. At the 7th day after labeling, the cells were detached by trypsin-EDTA solution and suspended on culture medium. The BD Influx flow cytometer (BD) was used to isolate the fast growing and slow growing cells. The top 10% cells with strong fluorescence signals were sorted as the slow growing cells and bottom 10% cells with weak fluorescence signal were sorted as the fast growing cells. Analysis was completed using the BD Influx flow cytometer with 405 nm excitation and a 450/40 nm bandpass emission filter.
Competition assay for slow growing cell and fast growing cells
Fast-dividing cells with stable eGFP expression (or DsRed expression) and the slow-dividing cells with DsRed expression (or eGFP expression) were mixed and co-cultured at different initiation ratios (1:1, 2:8 and 8:2). The proportions of the two cell populations in the mixture were monitored by fluorescent intensity of DsRed and eGFP every 3 days by using flow cytometry.
Modeling population dynamic of cell expansion
We construct and apply a simple model of cell growth and mutation accumulation process. We define Nt as population size at the day t, where N0 = 1, and Rt as the growth rate of population at day t. In this case, Rt > 1 represents population increase and Rt < 1 means population decrease. For each division, a proportion of cells (μ) generate new mutations which decrease growth rate of the cells. The deleterious effect of a mutation is d. Nt is denoted by
Rt and Rt-1 are given by a recursive function
Iterating (S2) t times, Rt can be expressed as where R0 is the growth rate at day 0. Substitute (S3) into (S1) yields:
Since the experiments start from single cell isolation (No = 1), Nt can be derived as
Estimate R0 and fitness reduction μd
In (S5) Nt (t = 1,…, 7) can be obtained from observed data, so R0 and μd can be expressed by Nt (t = 1,…, 7). We then divide Nt-1 by Nt and divide Nt by Nt+1
Combine two formulas in (S6) and eliminate R0, the fitness reduction μd can be calculated as
Using linear regression model with the cell number from 8 days, the fitness duction μd can be approximated as
To calculate R0, we substitute (S7) into (S6). Then R0 can be expressed as
Cell growth simulation
We extended the previously described genetic load to simulate the cell growth with accumulation of deleterious mutations. The simulation begins with a single cell. Each day, cells attempt to divide and produce additional cells. The initial growth rate, R0 (R0 ϵ [1.0,3.5]), follows a Poisson distribution. For each cell replication at day t, the number of growth rate (Rt) which is randomly generated followed by mean growth rate Rt. Rt > 1 represents cell proliferation, while Rt = 0 means cell death. During replication, a proportion of cells (μ) generates deleterious mutations. The mutation accumulation in cells leads to the decrease in growth rate. The average deleterious effect of a mutation is d. Therefore, the growth rate of a cell population is reducing as a function of time. The computational model is qualitatively similar to our mathematical model, but considers much more complicated conditions, i.e., the fluctuations of the parameters (Rt, μ and d) caused by their distributions.
When fixed R0, μ and d are given, we can monitor the cell number every day using the simulation model. In summary, we cultured 106 cell lines in the simulation with different combinations of R0, μ and d. Compared the 106 results with observed data with HeLa cell lines, we used the Approximate Bayesian Computation (ABC) method to select optimal groups. The mean number of the selected groups represent the estimated R0, μ and d
Inference of R0, μ and d by Approximate Bayesian Computation (ABC).
Due to the complexity of the parameter space, we used Approximate Bayesian Computation (ABC) method by comparing the simulated 106 cell lines with observed data with HeLa to estimate R0, μ and d. ABC algorithms are a group of methods for performing Bayesian inference without the need for explicit evaluation of the model likelihood function. The algorithms can be used with implicit computer models that generate sample data sets rather than likelihoods (36, 37). By using ABC we can compute the posterior probability distribution of a multivariate parameter Θ (Θ ={R0, μ, d}). A parameter value Θi is sampled from its prior f(Θ) distribution f(Θ) to simulate a dataset S(Θi), for i = 1,…, 106. A set of summary statistics, the value that calculated from the data to represent the maximum amount of information in the simplest possible form, S(Θi) is computed from the simulation. By using a distance function p, we calculated the distance between S(Θi) and observed data Sobs. If the distance between S(Θi) and Sobs is less than a given threshold, the parameter value Θi is accepted. In order to set the threshold that which simulations are accepted, we provide the tolerance rate ϵ, which is defined as the percentage of accepted simulation (38, 39).
The ABC inference scheme is:
Sample a candidate parameter Θi ={R0, μ, d} from the prior distribution f(Θ);
Simulate the growth process of the cell line Θi and calculate the summary statistics; Si
Compare the simulated dataset S(Θi), with the observed data Sobs, using a distance p function and tolerance rate ϵ, if ρ(S(Θi) Sobs) < ϵ, accept Θi;
Go to 1.
Here the summary statistics S(Θi)= {simulated cell number of 7 days}. The observed summary statistics Sobs={observed 7 days’ cell number from experiment}. The prior distribution f(Θ) in our model was R0 ∼ Uniform [1.0,3.5], μ∼Uniform [0.05,0.55] and ∼Uniform [0.01,0.40] The distance function p we set was the Euclidean distance and the tolerance rate ϵ we set was 0.1%. As a whole 1,000 groups of Θi were accepted.
By using different clones from B8 as Sobs, we finally calculated the parametric ranges of rapid proliferated cells were R0f ϵ [2.22, 2.52], μf ϵ [0.26, 0.30], and df ϵ [0.17,0.20]. Meanwhile, the parametric ranges of defected cells were R0d ϵ [1.67,2.22], μd ϵ[0.28,0.31] and dd ϵ.[0.17,0.20].
Acknowledgements
We thank Zhenzhen Liu for the comments and suggestions during the preparation of the manuscript. We thank Xin Wu and Zhipeng Wu for their helps in the artwork of figures.We thank Fang Yang, Chungyan Li and Encheng Dong for their helps in experiments and other support. This study was supported by the Natural Science Foundation of China (91531305 to X.L.), the Strategic Priority Research Program of the Chinese Academy of Sciences (XDB13040300 to X.L. and C.-I.W.), Ministry of Science and Technology, Taiwan (105-2918-I-002 -014 - and 105-2628-B-002 -015 -MY3 to H.Y.W.), Natural Science Foundation grant (31671370, 31301093, 11201224 and 11301294 to T.Z.), the Youth Innovation Promotion Association of Chinese Academy of Sciences (2201080 to T.Z.), National Key Basic Research Program (973 Program) of China (2014CB542006 to C.-I.W.), the Key Research Program of the Chinese Academy of Sciences (KJZD-EW-L14 to X.L.).
Footnotes
↵* hurngyi{at}ntu.edu.tw (HYW); luxm{at}big.ac.cn (XL).