Background: Docetaxel has a demonstrated survival benefit for metastatic castration-resistant prostate cancer (mCRPC). However, 10-20% of patients discontinue docetaxel prematurely because of toxicity-induced adverse events, and managing risk factors for toxicity remains an ongoing challenge for health care providers and patients. Prospective identification of patients at high risk of early discontinuation has the potential to assist clinical decision-making and to improve the efficiency of clinical trial design. In partnership with Project Data Sphere (PDS), a non-profit initiative facilitating clinical trial data-sharing, we designed an open-data, crowdsourced DREAM (Dialogue for Reverse Engineering Assessments and Methods) Challenge for developing models to predict early discontinuation of docetaxel.
Methods: Data from the comparator arms of four phase III clinical trials in first-line mCRPC were obtained from PDS, including 476 patients treated with docetaxel and prednisone in the ASCENT2 trial, 598 patients treated with docetaxel, prednisone/prednisolone, and placebo in the VENICE trial, 526 patients treated with docetaxel, prednisone, and placebo in the MAINSAIL trial, and 528 patients treated with docetaxel and placebo in the ENTHUSE 33 trial. Early discontinuation was defined as treatment stoppage within three months due to adverse treatment effects. More than 150 clinical features, including laboratory values, medical history, lesion measures, prior treatment, and demographic variables, were curated and made freely available for model building for all four trials. The ASCENT2, VENICE, and MAINSAIL trial data sets formed the training set, which also included patient discontinuation status. The ENTHUSE 33 trial, with patient discontinuation status hidden, was used as an independent validation set to evaluate model performance.
Prediction performance was assessed using the area under the precision-recall curve (AUPRC), and the Bayes factor was used to compare performance between prediction models.
Results: The frequency of early discontinuation was similar between the training (ASCENT2, VENICE, and MAINSAIL) and validation (ENTHUSE 33) sets: 12.3% versus 10.4% of docetaxel-treated patients, respectively. In total, 34 independent teams submitted predictions from 61 different models. AUPRC ranged from 0.088 to 0.178 across submissions, with a random model performance of 0.104. Seven models with comparable AUPRC scores (Bayes factor ≤ 3) outperformed all other models. A post-Challenge analysis of risk predictions generated by these seven models revealed three distinct patient subgroups: patients consistently predicted to be at high risk of early discontinuation, patients consistently predicted to be at low risk, and patients with discordant risk predictions. Early discontinuation events were two times higher in the high-risk than in the low-risk subgroup, and baseline clinical features such as the presence or absence of metastatic liver lesions and prior treatment with analgesics and ACE inhibitors differed significantly between the high- and low-risk subgroups (adjusted P < 0.05). An ensemble-based model constructed through a post-Challenge community collaboration achieved the best overall prediction performance (AUPRC = 0.230), a marked improvement over any individual Challenge submission. An online predictor can be found at: http://dream.web.tool.aicml.ca/
Findings: Our results demonstrate that routinely collected clinical features can be used to prospectively inform clinicians of mCRPC patients' risk of discontinuing docetaxel treatment early because of adverse events. To the best of our knowledge, this study is the first to establish performance benchmarks in this area.
This work also underscores the value of the "wisdom of crowds" approach by demonstrating that improved prediction of patient outcomes can be obtained by combining methods across an extended community. These findings were made possible because data from separate trials were made publicly available and centrally compiled through PDS.
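As a reading aid for the AUPRC values reported above, the following minimal sketch illustrates why a random model scores near the event prevalence (0.104 in the validation set). It is not the Challenge scoring code: it assumes average precision as the AUPRC estimator, which is one common choice, and it uses a synthetic cohort rather than trial data. The function names `average_precision` and `random_baseline` are hypothetical and introduced only for illustration.

```python
import random


def average_precision(labels, scores):
    """Average precision: the mean of precision@k over the ranks k of the positives.

    This is one common estimator of AUPRC (an assumption here, not the
    Challenge's documented method). Tied scores keep their input order.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("at least one positive label is required")
    # Rank patients from highest to lowest predicted risk.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            total += hits / rank  # precision at this positive's rank
    return total / n_pos


def random_baseline(n_patients, n_events, n_repeats=200, seed=0):
    """Mean average precision of uninformative (uniform random) risk scores."""
    rng = random.Random(seed)
    labels = [1] * n_events + [0] * (n_patients - n_events)
    runs = []
    for _ in range(n_repeats):
        scores = [rng.random() for _ in labels]
        runs.append(average_precision(labels, scores))
    return sum(runs) / n_repeats


# Synthetic cohort (not trial data): 1000 patients, 104 events, prevalence 0.104.
baseline = random_baseline(n_patients=1000, n_events=104)
print(f"mean random-score AUPRC: {baseline:.3f}")
```

With uninformative scores, the mean lands close to the event prevalence (slightly above it in finite samples), which is why AUPRC values should be read against the prevalence rather than against the 0.5 baseline familiar from ROC analysis.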