ABSTRACT
Evidence that some influential biomedical results cannot be recapitulated has increased calls for data that are findable, accessible, interoperable, and reusable (FAIR). Here, we study factors influencing the reproducibility of a prototypical cell-based assay: responsiveness of cultured cell lines to anti-cancer drugs. Such assays are important for drug development, mechanism of action studies, and patient stratification. This study involved seven research centers comprising the NIH LINCS Program Consortium, which aims to systematically characterize the responses of human cells to perturbation by gene disruption, small molecule drugs, and components of the microenvironment. We found that factors influencing the measurement of drug response vary substantially with the compound being analyzed and thus, the underlying biology. For example, substitution of a surrogate assay such as CellTiter-Glo® for direct microscopy-based cell counting is acceptable in the case of neratinib or alpelisib, but not palbociclib or etoposide. Uncovering and controlling for such context sensitivity requires systematic measurement of assay robustness in the face of biological variation, which is distinct from assay precision and sensitivity. Conversely, validating assays only over a narrow range of conditions has the potential to introduce serious systematic error in a large dataset spanning many compounds and cell lines.
INTRODUCTION
The goal of making biomedical data more findable, accessible, interoperable, and reusable (the FAIR principles (Wilkinson, Dumontier et al. 2016)) as well as reports from industry that call into question the reproducibility of published data (Arrowsmith 2011, Prinz, Schlange et al. 2011, Begley and Ellis 2012, Baker 2016) have increased interest in data reliability (Errington, Iorns et al. 2014, Morrison 2014). However, only a limited number of studies (https://f1000research.com/channels/PRR) such as the Science Exchange Reproducibility Initiative (http://validation.scienceexchange.com/#/reproducibility-initiative) have addressed the issue through new experiments, and the results of such reproducibility experiments have themselves been controversial (eLIFE-Editorial 2017, Ioannidis 2017, Nature-Editorial 2017, Nosek and Errington 2017). In this paper, we investigate a prototypical class of cell-based assays, rather than a specific scientific result, through collaboration among multiple geographically dispersed laboratories. This work was carried out as part of our participation in the NIH Library of Network-Based Cellular Signatures Program (LINCS) consortium (http://www.lincsproject.org/). The overall goal of LINCS is to generate broadly useful datasets characterizing the responses of cells to perturbation by small molecule drugs, components of the microenvironment, and gene depletion or overexpression. For such a resource to be useful, it must be reproducible.
The assay studied by LINCS Centers in this paper involves measuring the responses of tissue culture cells to small molecule anti-cancer drugs across a dose range. Drug-response assays in cultured cells are widely used in preclinical pharmacology (Cravatt and Gottesfeld 2010, Schenone, Dancik et al. 2013) and the study of cellular pathways (Barretina, Caponigro et al. 2012, Garnett, Edelman et al. 2012, Heiser, Sadanandam et al. 2012). In the case of anti-cancer drugs, cells are exposed to drugs or drug-like compounds for several days (commonly three days) and the number of viable cells is then determined, either by direct counting using a microscope or using a surrogate assay such as CellTiter-Glo® (Promega), which measures ATP levels in a cell lysate. With some important caveats, the amount of ATP in a lysate from a single microtiter well is proportional to the number of viable cells in that well (Tolliday 2010). Several large-scale datasets describing the responses of hundreds of cell lines to libraries of anti-cancer drugs have recently been published (Barretina, Caponigro et al. 2012, Garnett, Edelman et al. 2012, Seashore-Ludlow, Rees et al. 2015, Haverty, Lin et al. 2016), but their reproducibility and utility are being debated (Haibe-Kains, El-Hachem et al. 2013, Consortium and Consortium 2015, Safikhani, Freeman et al. 2015, Bouhaddou, DiStefano et al. 2016).
Our approach was simple: five experimentally-focused LINCS Data and Signature Generation Centers measured the sensitivity of the widely-used, non-transformed MCF 10A mammary epithelial cell line to eight small molecule drugs with different targets and mechanisms of action. The LINCS Data Coordination and Integration Center (DCIC) then processed and visualized the data working with LINCS Center 1. LINCS investigators have established that the way in which drug response is conventionally calculated from cell viability assays is confounded by variability in rates of cell proliferation arising from changes in plating density, fluctuation in media composition, and intrinsic differences in cell division times(Hafner, Niepel et al. 2016, Hafner, Niepel et al. 2017). We corrected for these and other known confounders using the growth rate inhibition (GR) method(Hafner, Niepel et al. 2016, Hafner, Niepel et al. 2017, Niepel, Hafner et al. 2017), thereby focusing the current study on other sources of irreproducibility that remain poorly understood. Individual Centers were provided with identical aliquots of MCF 10A cells, drugs, and media supplements, as well as a common experimental protocol and data analysis procedure. Some variation in methods was inevitable, because not all laboratories had access to the same instrumentation or the same level of technical expertise; in our view, this is a positive feature of the study because it more fully replicates “real-world” conditions. In initial experiments, we observed substantial center-to-center variation. We then performed systematic studies to identify those factors with the largest impact on the measurement of drug response, and distributed this information to other centers to improve experimental and analytical procedures. We found that irreproducibility arose from a subtle interplay between experimental and computational methods and poorly understood sources of biological variation. 
Thus, a sustained commitment to characterizing and controlling for this variability was necessary to obtain reproducible data from different sites.
RESULTS
Measuring drug responses in collaboration
To establish the single-Center precision of dose-response assays, Center 1 performed multiple replicates of a 13-point dose-response assay with MCF 10A cells and the MEK1/2 kinase inhibitor Trametinib at concentrations between 0.33 nM and 1 µM. Both technical and biological replicates were performed (Figure 1): for technical replicates, multiple drug dilution series were assayed on a single microtiter plate at a single time. For biological replicates, three sets of assays were performed on successive days in different plates; each biological replicate involved three technical replicates. In all cases, viable cell number was determined by differentially staining live and dead cells, collecting fluorescence images from each well, segmenting images using machine vision software, and then counting all viable cells in a well(Hafner, Niepel et al. 2016, Niepel, Hafner et al. 2017). Sigmoidal curves were fitted to the data and four response metrics derived: these measured potency (GR50), maximal efficacy (GRmax), slope (Hill Coefficient or hGR) of the dose response curve, and the integrated area over this curve (GRAOC)(Hafner, Niepel et al. 2016). Fitting procedures and response metrics have been described in detail previously(Hafner, Niepel et al. 2016, Hafner, Niepel et al. 2017) (Supplemental Figure 1), and all routines and data can be accessed on-line or via download at http://www.grcalculator.org/.
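As a rough illustration of the quantities named above, the following sketch computes a GR value from three cell counts and evaluates a sigmoidal dose-response curve. It assumes the definitions given in the cited GR publications (Hafner, Niepel et al. 2016); the function names, argument names, and any example numbers are hypothetical and are not taken from the routines distributed at http://www.grcalculator.org/.

```python
import math

def gr_value(x_treated, x_control, x_start):
    """GR value from cell counts (assumed definition from the cited GR papers).

    x_start is the count at the time of drug addition; x_treated and
    x_control are counts at the endpoint. GR = 1 means no effect, GR = 0
    means complete cytostasis, and GR < 0 means net cell loss.
    """
    rate_ratio = math.log2(x_treated / x_start) / math.log2(x_control / x_start)
    return 2.0 ** rate_ratio - 1.0

def gr_sigmoid(conc, gr_inf, gec50, h_gr):
    """Sigmoidal GR curve: 1 at low dose, approaching gr_inf at high dose."""
    return gr_inf + (1.0 - gr_inf) / (1.0 + (conc / gec50) ** h_gr)

def gr50_from_fit(gr_inf, gec50, h_gr):
    """Concentration at which the fitted curve crosses GR = 0.5.

    Returns infinity when the curve never drops to 0.5 (gr_inf >= 0.5).
    """
    if gr_inf >= 0.5:
        return math.inf
    return gec50 * ((1.0 - gr_inf) / (0.5 - gr_inf) - 1.0) ** (1.0 / h_gr)
```

In this sketch the curve parameters would come from a separate fitting step, which is not shown; GRmax and GRAOC would likewise be read from the measured or fitted values rather than from these helper functions.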
We found that response curves for technical replicates were nearly indistinguishable (Figure 1), showing that purely technical error resulting from inaccurate pipetting or errors in counting cells is small. For biological replicates, standard deviations in measurement of drug potency and efficacy (log10(GR50) values and GRmax values, respectively) were ~0.07, representing the repeatability of the assay at a single research site across multiple days.
To measure reproducibility across laboratories while controlling for anticipated sources of variability, a single LINCS Center distributed identical MCF 10A aliquots, drug stocks, and media additives, as well as a detailed experimental protocol optimized for cell line-drug pairs under study. This protocol included optimal dose-ranges and separation between doses for reliable curve fitting. When individual LINCS centers first performed these assays, up to 200-fold variability in GR50 values was observed between centers (Supplementary Figure 2). We therefore performed directed experiments in Center 1 to systematically investigate the origins of assay variability.
Technical drivers of variability
First, we examined inter-Center variability in estimation of GRmax focusing on the topoisomerase inhibitor Etoposide and CDK4/6 inhibitor Palbociclib. One LINCS Center used the CellTiter-Glo® ATP-based assay and a luminescence plate reader as a proxy for the number of viable cells; CellTiter-Glo is a common substitute for direct cell counting when a suitable microscope is not available. When we performed side-by-side experiments we found that dose-response curves and GR metrics computed from (image-based) direct cell counts and CellTiter-Glo® values were the same for the EGFR inhibitor Neratinib, differed marginally for the PI3K inhibitor Alpelisib, and exhibited significant differences for the topoisomerase inhibitor Etoposide or CDK4/6 inhibitor Palbociclib (Figure 2A,B). This finding explains, in part, inter-Center differences in drug response observed in preliminary experiments (Supplementary Figure 2). It is known that CellTiter-Glo® and direct cell counts are poorly correlated when drugs cause dramatic changes in cell size or alter ATP metabolism, thereby changing the relationship between ATP level in cell extracts and cell number (Figure 2C for Palbociclib)(Salani, Marini et al. 2013, Harris, Koh et al. 2016, Soliman, Steenson et al. 2016). Our data show that the degree of agreement between cell counting and CellTiter-Glo® depended on the drug being assayed (previous work has shown that cell line also impacts the correlation between the assays(Niepel, Hafner et al. 2017)). Thus, direct cell counting and CellTiter-Glo® measurements can be substituted for each other in some cases but not in others. More generally, a substitution of assays that appears to be justified by pilot studies on a limited number of cell lines and drugs can be problematic when the number and chemical diversity of drugs is increased.
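To make the mechanism concrete, the sketch below shows how a drug-induced change in ATP per cell could propagate into a GR value when an ATP readout is treated as a cell count. All numbers are hypothetical and chosen only for illustration; the GR formula is assumed from the cited GR publications, and the twofold change in ATP per cell is an arbitrary assumption, not a measured value for any drug in this study.

```python
import math

def gr_value(x_treated, x_control, x_start):
    """GR value from counts (assumed definition from the cited GR papers)."""
    rate_ratio = math.log2(x_treated / x_start) / math.log2(x_control / x_start)
    return 2.0 ** rate_ratio - 1.0

# Hypothetical well: 100 cells at drug addition, 800 in the untreated
# control at the endpoint, 200 in the treated well at the endpoint.
x_start, x_control, x_treated = 100.0, 800.0, 200.0

# Assumption for illustration: treated cells are larger and contain
# twice as much ATP per cell as untreated cells.
atp_per_cell_ratio = 2.0
apparent_treated = x_treated * atp_per_cell_ratio  # what an ATP readout implies

gr_counted = gr_value(x_treated, x_control, x_start)
gr_apparent = gr_value(apparent_treated, x_control, x_start)

print(f"GR from direct counts: {gr_counted:.2f}")
print(f"GR implied by ATP readout: {gr_apparent:.2f}")
```

Under these assumptions the ATP-based estimate understates the drug effect, because part of the signal reflects the change in ATP per cell rather than a change in cell number.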
Edge effects and non-uniform cell growth are a second and substantial source of variation in cell-based studies performed in microtiter plates (Coyle, Green et al. 1989, Bushway, Azimi et al. 2010) and are thought to arise in part from temperature gradients and uneven evaporation of media at the edges of plates. We have observed a variety of irregularities in plating and cell growth, at least some of which are dependent on the batch of microtiter plate; we now test all batches of plates for uniformity of cell growth (Niepel, Hafner et al. 2017). Because variation in growth is often confined to specific regions of a plate (Figure 2D), it can generate systematic error in dose-response curves. A variety of approaches can be taken to minimize such effects (e.g. placing plates in humidified chambers to reduce evaporation from edge wells), but we found that automated and randomized compound dispensing is particularly helpful. Using the HP D300e Digital Dispenser it is possible to “print” compounds directly into microtiter plates in an arbitrary pattern, randomizing the locations of control and technical replicate samples in the plate. In this way, systematic error arising from edge effects is converted into random error, which does not result in a directional bias of response metrics and is more easily modeled (Niepel, Hafner et al. 2017). We have found that the use of simple washing and dispensing robots also reduces errors that humans make during repetitive pipetting operations. Most of these robots are small, robust, and relatively inexpensive, and our experience suggests that they greatly improve the reproducibility of medium- and high-throughput cell-based and biochemical studies.
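The idea of a randomized layout can be sketched in a few lines. The example below is a generic illustration, not the dispenser's own software or the layout procedure used in this study: it assigns a list of treatment labels to randomly chosen wells of a plate, with a fixed seed so that the same layout can be regenerated at analysis time. The function name, the label format, and the default 16 x 24 geometry (a 384-well plate) are assumptions.

```python
import random

def randomized_layout(treatments, n_rows=16, n_cols=24, seed=0):
    """Map randomly chosen (row, column) wells to treatment labels.

    treatments is a list with one label per well to be used, e.g. one
    entry per dose and replicate plus one per control well.
    """
    wells = [(r, c) for r in range(n_rows) for c in range(n_cols)]
    if len(treatments) > len(wells):
        raise ValueError("more treatments than wells on the plate")
    rng = random.Random(seed)  # seeded so the layout is reproducible
    rng.shuffle(wells)
    # zip stops at the shorter list, so unused wells stay unassigned
    return dict(zip(wells, treatments))

# Hypothetical design: 13 doses in triplicate plus 24 vehicle controls.
labels = [f"dose_{d}_rep_{r}" for d in range(13) for r in range(3)]
labels += ["DMSO_control"] * 24
layout = randomized_layout(labels, seed=42)
```

Storing the seed (or the resulting map) alongside the raw data is what allows each well to be matched back to its treatment during analysis.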
A third source of error we explored involved the concentration range over which a drug was assayed and the impact of the range on curve fitting and parameter estimation. For example, if we follow general practice and assay Trametinib over a thousand-fold concentration range, growth is fully arrested at ~30 nM (Figure 3A, left plot, red arrow): phenotypic response does not change even when the dose is increased 100-fold to 1 µM, and increasing the dose-range has no effect on curve fitting and parameter estimation (Figure 3A, left plot). However, when Dasatinib (a poly-selective SRC-family kinase inhibitor) was assayed over a thousand-fold range, curve fitting identified a plateau in GR value between 0.3 and 1 µM, but when the dose-range was extended GR values became negative, demonstrating a shift from cytostasis to cell killing (Figure 3A, right plot, and 3B). The subtlety here is that a dose-range that is adequate for analysis of Trametinib is not adequate for Dasatinib. This sort of variation is difficult to spot in a high-throughput experiment and suggests that efficient procedures are needed to optimize dose ranges for specific compounds, together with careful quality control of curve fitting.
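One simple form of automated quality control is sketched below. It is an illustration under stated assumptions, not the procedure used in this study: the function name, the flag messages, and the 0.1 plateau tolerance are all hypothetical. It also catches only the simplest problems; a response like that of Dasatinib, where an apparent plateau is followed by a further drop at higher doses, cannot be detected from the original dose range alone and requires extending the range.

```python
def flag_dose_range(gr_values, plateau_tol=0.1):
    """Return warnings for a dose series of GR values.

    gr_values must be ordered from the lowest to the highest dose and
    contain at least two entries. plateau_tol is an arbitrary threshold
    on the change in GR between the two highest doses.
    """
    if len(gr_values) < 2:
        raise ValueError("need at least two doses")
    flags = []
    if min(gr_values) > 0.5:
        flags.append("GR50 not reached within tested range")
    if max(gr_values) < 0.5:
        flags.append("GR50 lies below lowest tested dose")
    if abs(gr_values[-1] - gr_values[-2]) > plateau_tol:
        flags.append("no plateau at highest doses")
    return flags
```

Flagged curves would then be candidates for a repeat assay over a shifted or extended dose range rather than for automatic exclusion.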
A fourth source of inter-Center variation involved over-estimation of cell number in the presence of high doses of Dasatinib and Neratinib when we compared Centers that used imaging-based assays (Figure 3C). Above 1 µM, GR values were negative for both drugs denoting cell death, but cell counts at one LINCS Center implied purely cytostatic effects. Follow-up studies showed that the discrepancy arose from the use of image processing algorithms that included dead cells in the “viable cell count” and from over-counting the number of cells when the drugs induced frequent multi-nucleation(Roytta, Laine et al. 1987, Orth, Kohler et al. 2011). Observed differences in drug response across Centers could be recapitulated in a single laboratory using two different image processing routines and were also evident by visual inspection of the segmented images (Figure 3C,D). In retrospect, all Centers should have processed images in the same way, but such routines are often built into manufacturer’s proprietary software making identical image analysis dependent on transfer of primary data. Furthermore, this level of harmonization is impossible to achieve when replicating published results. This example demonstrates the importance of locking down all steps in the data processing pipeline from raw measurements to final parameter estimation, as well as a relatively subtle interplay between biological and technical sources of variability.
Biological factors impacting repeatability
Variables that can change the biology of drug response, such as media composition, incubation conditions, microenvironment, media volume, and cell density, have been discussed elsewhere(Hafner, Niepel et al. 2016, Haverty, Lin et al. 2016) and were controlled to the greatest extent possible in this study through standardization of reagents and the use of GR metrics. In a truly independent repeat of the current study, experimental variables such as these would need to be considered as additional confounders, because it is difficult to fully standardize a reagent as complex as tissue culture media.
However, one Center performed a preliminary comparison of batches of horse serum, hydrocortisone, cholera toxin, and insulin and found that the effects on drug response were smaller than the sources of variation discussed above (data not shown). At the outset of the study, we had anticipated that the origin of the MCF 10A isolate would be an important determinant of drug response. MCF 10A cells have been grown for many years, and karyotyping reveals differences among isolates(Soule, Maloney et al. 1990, Caruso, Reiners et al. 2001, Cowell, LaDuca et al. 2005, Kim, Yang et al. 2008, Zientek-Targosz, Kunnev et al. 2008, Marella, Malyavantham et al. 2009), which is why we distributed aliquots of a single isolate to all Centers. However, we detected very little variability in drug sensitivity when three different MCF 10A isolates were compared directly to each other and also to a histone H2B-mCherry-tagged subclone of one of the isolates (Supplemental Figure 3). Variability among MCF 10A isolates assayed at Center 1 (including different sub-clones from the same master stock) was also smaller than what was observed when a single isolate was assayed at different Centers. However, in other settings, clonal variation is likely to have a much larger impact(Ramirez, Rajaram et al. 2016).
The duration of drug exposure is not generally explored in in vitro studies, and instead researchers typically assay cells at a fixed time point after treatment. Here, we monitored the responses of MCF 10A cells to drugs in a live-cell experiment in which cell number was measured every two hours using an automated high-throughput microscope. We then quantified the response by calculating GR values over a 12-hour moving window (instantaneous GR values) and found that the effects of time and dose were substantial in some cases and varied with drug. For example, instantaneous GR values for cells exposed to Etoposide were nearly constant at all drug doses throughout a 50-hour assay period (Figure 4, top left plot), whereas instantaneous GR values for 0.1 µM of Neratinib varied from 0 to 1 over the same period (Figure 4, bottom left plot), with the highest variability at intermediate drug doses. As a consequence, GR dose-response curves and metrics derived from these curves, such as GR50 and GRmax, varied with time (Figure 4 and Supplemental Figure 4). The temporal dependence of drug response is likely to reflect biological adaptation, drug export, and other factors important in understanding drug mechanism of action (Fletcher, Haber et al. 2010, Muranen, Selfors et al. 2012, Hafner, Niepel et al. 2016, Harris, Frick et al. 2016, Fallahi-Sichani, Becker et al. 2017), factors which, in a high-throughput assay, remain unexplored and can add to the difficulties in reproducing experimental results.
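A moving-window calculation of this kind can be sketched as follows. The sketch assumes the time-dependent GR definition from the cited GR publications, applied to counts taken at equally spaced time points; with measurements every 2 hours, a 12-hour window corresponds to 6 steps. The function name and the way the window is aligned to time points are assumptions, since those details are not specified here.

```python
import math

def instantaneous_gr(treated, control, window_steps=6):
    """GR values over a moving window of window_steps time points.

    treated and control are cell counts at the same equally spaced time
    points. Each output value compares growth of treated and control
    cells between time point i and time point i + window_steps.
    """
    if len(treated) != len(control):
        raise ValueError("time series must have the same length")
    values = []
    for i in range(len(treated) - window_steps):
        j = i + window_steps
        k_treated = math.log2(treated[j] / treated[i])
        k_control = math.log2(control[j] / control[i])
        values.append(2.0 ** (k_treated / k_control) - 1.0)
    return values

# Hypothetical series: control doubles every 12 h, treated cells do not grow.
control = [100.0 * 2.0 ** (i / 6.0) for i in range(13)]
arrested = [100.0] * 13
print(instantaneous_gr(arrested, control))  # every window gives 0.0
```

Because each window uses only counts within that window, a drug whose effect appears late or fades over time yields GR values that change along the series, which a single endpoint measurement would average out.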
Final results
When the factors described above were taken into account to the greatest extent possible, we found that variability in drug-response measurements across five LINCS Centers could be reduced to a standard deviation of about 2.4-fold (standard deviation of log10(GR50) values is ~0.38; see Figure 5). In the final dataset, one center still relied on manual compound dispensing because a robot was not available and a second center used CellTiter-Glo® assays rather than direct cell counting to estimate viable cell number. Despite such differences in procedure, which are typical of “real-world” conditions, inter-Center variability at the end of the study was dramatically lower than at the outset, while remaining about 5-fold higher than for biological repeats at a single center (Figure 1, fourth panel; 5A-B). Variability remained drug-dependent. For example, for Paclitaxel estimates of GRmax and Hill Coefficient (hGR) were variable, and GR50 and hGR values were variable for Palbociclib (Figure 5A-B). In contrast, for Neratinib, response was reproducibly measured (standard deviation for all metrics below 0.17 with a single outlier).
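The conversion between a standard deviation on the log10 scale and a fold value is simple arithmetic: a standard deviation of 0.38 in log10(GR50) corresponds to a factor of 10^0.38, or about 2.4, and the ratio to the single-site value of ~0.07 is 0.38/0.07, or about 5.4. The sketch below shows the same conversion starting from a list of GR50 values; the function name is hypothetical, and the use of the sample (n - 1) standard deviation is an assumption, since the estimator is not stated here.

```python
import math
import statistics

def log10_sd_and_fold(gr50_values):
    """Standard deviation of log10(GR50) and the equivalent fold value."""
    logs = [math.log10(v) for v in gr50_values]
    sd = statistics.stdev(logs)  # sample SD; an assumption, see lead-in
    return sd, 10.0 ** sd

inter_center_sd = 0.38  # value reported in the text
single_site_sd = 0.07   # value reported for biological replicates
print(round(10.0 ** inter_center_sd, 1))            # 2.4
print(round(inter_center_sd / single_site_sd, 1))   # 5.4
```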
DISCUSSION
In this study, we show that five geographically dispersed laboratories can, with effort, generate reasonably reproducible drug-response data in tissue culture cells. Inter-lab precision was three- to five-fold lower than that achievable within a single laboratory, likely reflecting persistent differences in experimental procedure. This level of precision required some effort to achieve and, in our judgement, exceeds the norm for this class of experiments in the current literature (although this is not easy to prove). Further improvements in inter-center precision and reproducibility would necessitate the implementation of identical and automated compound handling, pipetting, and cell counting procedures in all laboratories. This was not achieved, because of the expense of acquiring the necessary instrument and a belief—belied by subsequent experiments—that counting cells is such a simple procedure that different assays can be substituted for each other without consequence.
At the outset of the study we had hoped to identify the specific biological, experimental, and computational factors that had the largest impact on irreproducibility across individual Centers and thereby generate a ranked list of the most important factors in ensuring reliable data collection.
However, we discovered that in many cases irreproducibility was itself irreproducible and that the technical factors responsible for specific outlier datapoints were difficult to diagnose. We therefore undertook a systematic study of the assay itself, with an eye to identifying those variables with the greatest impact on data quality. We found that irreproducibility most commonly arises from unexpected interplay between experimental protocol and true biological variability. For example, estimating cell number from ATP levels using the CellTiter-Glo® assay produces results very similar to those from direct counting of cells in the case of Neratinib, but this is not true when cells are exposed to Etoposide or Palbociclib (Figure 2). The discrepancy most likely arises because ATP levels in lysates of drug-treated cells vary for reasons other than changes in cell number, including changes in cell size and metabolism. This observation has important implications for the design of experiments in which diverse compounds are screened. We have previously shown that the density at which cells are assayed also has a dramatic effect on drug response (Hafner, Niepel et al. 2016), but this too is context dependent. For some cell line-drug pairs, density has little or no effect, whereas for other pairs it increases drug sensitivity, and for yet others it decreases sensitivity.
Preliminary studies suggest that such context dependence reflects real changes in the underlying biology and not flaws in assay methodology itself. For example, cell density directly impacts media conditioning and the strength of autocrine signaling, which affects response to some drugs but not others(Yonesaka, Zejnullahu et al. 2008, Wilson, Fridlyand et al. 2012). As a consequence, changes in protocol that might seem unimportant based on a control study in one cell type or biological setting might, nonetheless, substantially affect results obtained with other cells or growth conditions. From this we conclude that there is no substitute for empirical analysis of even a seemingly simple assay. In general, the variables with the greatest effect on measurement of drug response differed from what we expected a priori. For example, isolate-to-isolate differences (evident in karyotypic changes) in MCF 10A cultures had less of an effect on drug response assays (Supplemental Figure 3) than the ways in which drugs and cells were plated into multi-well plates and counted (Figure 2D, 3). The only way to identify and control for such variation is to conduct comprehensive experiments aimed at empirically establishing the range of conditions over which data remain precise and exact for a significant number of cell lines and drugs used in a specific profiling effort.
Data processing routines are as important for reproducibility as well-controlled experiments(Sandve, Nekrutenko et al. 2013). Data and data analysis routines can interact in multiple ways, some of which are clear in retrospect but not necessarily anticipated. For example, collecting 9-point dose response curves generally represents good practice, but it is essential that the dose range effectively span the GR50 (the mid-point of the response) and account for the possibility of compound dose response curves (which is often ascribed to inhibition of different targets over different dose ranges). When this is not the case (as illustrated by Figure 3A), curve fitting is underdetermined and response metrics become unreliable. In many cases problems with dose range are not evident until an initial assay has been performed. Iterative design is straightforward in small scale studies, but substantially harder for large-scale screens; for a large dataset, data processing routines must be designed to automatically identify and flag problems with dose range. Another example involves image processing routines for automated cell counting: such routines must be optimized for cells that grow and respond to drugs in different ways (Figure 3C,D) and must be tested for performance at high and low cell densities. Processing pipelines for the type of data collected in this study are much less developed than the pipelines commonly used for genomics data(Lam, Clark et al. 2011, Bao, Huang et al. 2014, Ashley 2016), but much can be learned from this comparison. For example, computational platforms with provenance such as Galaxy(Goecks, Nekrutenko et al. 2010), Sage Bionetworks’ Synapse(Omberg, Ellrott et al. 2013), or Cancer Genomics Clouds, have been developed to support data sharing, reproducible analyses, and transparent pipelines, with a primary focus on genomics data. 
Galaxy also provides a shared platform on which to execute workflows, which serves to eliminate compute environment differences. With sufficient investment in pipeline development it should be possible to adapt such solutions to a wider range of assay types. Image processing algorithms present a unique challenge in that they are frequently proprietary software linked to a specific data acquisition microscope, which complicates common analysis across laboratories; open-source image analysis platforms are therefore preferable in principle(Carpenter, Jones et al. 2006). In the specific case of drug dose-response assays described here, we have developed an online tool at http://GRcalculator.org along with open source scripts for download(Clark, Hafner et al. 2017).
The elements of a workflow for reproducible collection of dose-response data are fairly simple: (i) Standardization of reagents including obtaining cell lines directly from repositories such as the ATCC, performing mass spectrometry-based quality control of small molecule drugs, and tracking lot numbers for all media additives; (ii) Standardized data processing starting with raw data and metadata through to reporting of final results; and (iii) Use of automation to improve reliability and enable experimental designs too complex or labor intensive for humans to execute reliably. The first two points are obvious, but not trivial to implement, because laboratories are not all equipped the same way and some data processing routines are embedded in a non-obvious way in instrument software. A major benefit of automation is that it makes random plate layouts feasible, thereby changing systematic edge effects into random error that has a reduced impact on the dose-response measures. Our data argue that small and relatively inexpensive bench-top dispensing and washing robots have an important role in the performance of reproducible cell-based assays.
The primary contribution of the current study is to show that future execution of reproducible drug dose-response assays in different cell types will require systematic experimentation aimed at establishing the robustness of assays over a full range of biological settings and cell types. Such robustness is distinct from conventional measures of assay performance such as precision or repeatability in a single biological setting (Figure 5C). Testing of this type is not routinely performed for the simple reason that establishing and maintaining robust and reproducible assays is time consuming and expensive: we estimate that reproducibility adds ~20% to the total cost of a large-scale study such as drug-response experiments in panels of cell lines (AlQuraishi and Sorger 2016). Iterative experimental design is also essential, even though several leading biomedical scientists have argued that this is not feasible for large-scale studies (Harris 2017). More generally, despite a push for adherence to the FAIR principles (Wilkinson, Dumontier et al. 2016) there is currently no consensus that the necessary investment is worthwhile, nor do incentives exist in the publication or funding processes for individual research scientists to meet FAIR standards (AlQuraishi and Sorger 2016, Goodspeed, Heiser et al. 2016). In developing these incentives, we must recognize that reproducible research is a public good whose costs are borne by individual investigators and whose benefits are conferred to the community as a whole.
A question raised by our analysis is whether, given their variability and context-dependence, drug response assays performed in vitro are useful for understanding drug response in other settings, especially in human patients(Wilding and Bodmer 2014). Concern about the translatability of in vitro experiments is long-standing, but we think the current work provides grounds for optimism rather than additional worry. Simply put, if in vitro data cannot be reproduced from one laboratory to the next, then it is no wonder that they cannot easily be reproduced in humans; conversely, paying greater attention to accurate and reproducible in vitro data is likely to improve translation. Moreover, many of the factors that appear to represent irreproducibility in fact arise from biologically meaningful variation. This includes the time-dependence of drug response, the impact of non-genetic heterogeneity at a single-cell level, and the influence of growth conditions and environmental factors(Cohen, Geva-Zatorsky et al. 2008, Yonesaka, Zejnullahu et al. 2008, Loewer and Lahav 2011, Muranen, Selfors et al. 2012, Wilson, Fridlyand et al. 2012). The simple assays of drug response in current use are unable to correct for such variability, and the problem is made worse by “kit-based science” in which technical validation of assays is left to vendors. However, if the challenge of understanding biological variability at a mechanistic level is embraced, it seems likely that we will improve our ability to conduct in vitro assays reproducibly and to apply data obtained in cell lines to human patients.
MATERIAL AND METHODS
Cell lines and drugs
Three isolates of MCF 10A, here referred to as MCF 10A-GM, MCF 10A-OHSU, and MCF 10A-HMS, were sourced independently at three different times from the ATCC. MCF 10A-H2B-mCherry cells were created by inserting an H2B-mCherry expression cassette into the AAVS1 safe harbor genomic locus of MCF 10A-HMS using CRISPR/Cas9 (Hafner, Niepel et al. 2016). All lines were cultured in DMEM/F12 base media (Invitrogen #11330-032) supplemented with 5% horse serum (Sigma-Aldrich #H1138), 0.5 μg/mL hydrocortisone (Sigma #H-4001), 20 ng/mL rhEGF (R&D Systems #236-EG), 10 μg/mL insulin (Sigma #I9278), 100 ng/mL cholera toxin (Sigma-Aldrich #C8052), and 100 units/mL penicillin and 100 μg/mL streptomycin (Invitrogen #15140148 or #15140122 or other sources) as described previously (Debnath, Muthuswamy et al. 2003). Base media, horse serum, hydrocortisone, rhEGF, insulin, and cholera toxin were purchased by the MEP-LINCS Center and distributed to the remaining experimental sites. MCF 10A-GM was expanded by Gordon Mills at MD Anderson Cancer Center and distributed to all experimental sites. Cell identity was confirmed at individual experimental sites by short tandem repeat (STR) profiling, and the cells were found to be free of mycoplasma prior to performing experiments.
Drugs were obtained from commercial vendors by HMS LINCS, tested for identity and purity in house as described in detail in the drug collection section of the HMS LINCS Database (http://lincs.hms.harvard.edu/db/sm/), and distributed as 10 mM stock solutions dissolved in DMSO to all experimental sites.
Drug response experiments and data analysis
The experimental and computational protocols to measure drug response are described in detail in two prior publications (Hafner, Niepel et al. 2017, Niepel, Hafner et al. 2017). The following protocol was suggested for this study: cells were plated at 750 cells per well in 60 µL of media in 384-well plates using automated plate fillers and incubated for 24 h prior to drug addition. Drugs were added at the indicated doses with a D300 Digital Dispenser (Hewlett-Packard), and cells were further incubated for 72 h. At the time of drug addition and at the endpoint of the experiment, cells were stained with Hoechst and LIVE/DEAD™ Fixable Red Dead Cell Stain (ThermoFisher Scientific) and cell numbers were determined by imaging as described (Hafner, Niepel et al. 2016, Niepel, Hafner et al. 2017) or by the CellTiter-Glo® assay (Promega). Some details of the experimental protocol differed across Centers and over time, e.g. manual dispensing of drugs or use of 96-well plates.
For live-cell experiments with MCF 10A-H2B-mCherry, cell counts were performed by imaging plates at 2 h intervals over the course of 96 hours (only the first 50 hours are shown) (Hafner, Niepel et al. 2016, Niepel, Hafner et al. 2017). Data analysis was performed as described previously (Hafner, Niepel et al. 2016, Hafner, Niepel et al. 2017).
Irregularities in growth across microtiter plates were assessed by plating MCF 10A cells at 750 cells per well in 60 µL of media in 384-well plates using automated plate fillers and determining cell numbers after 96 h through imaging as described (Hafner, Niepel et al. 2016, Niepel, Hafner et al. 2017).
DECLARATIONS
Competing interests
The authors declare that they have no competing interests.
Funding
This work was funded by grants U54-HL127365 to PKS; U54-HG008098 to RI, MRB, and EAS; R01-GM104184 to MRB; U54-HL127624 to MM and AM; U54-HG008100 to JWG, LMH, and JEK; U54-HG008097 to JDJ; U54-NS091046 to CNS; U54-HL127366 to TRG and AD. ADS and AMB were supported by a NIGMS-funded Integrated Pharmacological Sciences Training Program grant T32GM062754.
Authors' contributions
M.N., L.M.H., M.R.B., and E.H.W. designed the study and led its execution; P.K.S. supervised the systematic analysis of assay variability and oversaw manuscript preparation. M.H. contributed to experimental design and performed all data analysis. A.M.B. and A.D.S. participated in data collection and initial processing. All authors participated in editing the manuscript.
Materials & Correspondence
Data presented in this paper are included in the additional material and available on the GR browser at http://www.grcalculator.org/grbrowser/.
ADDITIONAL MATERIAL
Supplemental Data 1:
Final drug-response data generated by all LINCS Centers (Figure 5).
Supplemental Data 2:
Data generated by the HMS LINCS Center during follow-up experiments (Figures 2-4 and Supplemental Figure 3-4).
Acknowledgements
We thank K. Ward for arranging the HP D300 instrument trial at Mount Sinai.
Footnotes
†† Co-corresponding authors
† Contact author: Peter K. Sorger, 200 Longwood Avenue Boston, MA 02115, peter_sorger{at}hms.harvard.edu copying christopher_bird{at}hms.harvard.edu and heiserl{at}ohsu.edu Phone (617) 432-6902 Fax (617) 432-6990 ORCID ID: 0000-0002-3364-1838