TY - JOUR
T1 - Algorithmic methods to infer the evolutionary trajectories in cancer progression
JF - bioRxiv
DO - 10.1101/027359
SP - 027359
AU - Giulio Caravagna
AU - Alex Graudenzi
AU - Daniele Ramazzotti
AU - Rebeca Sanz-Pamplona
AU - Luca De Sano
AU - Giancarlo Mauri
AU - Victor Moreno
AU - Marco Antoniotti
AU - Bud Mishra
Y1 - 2016/01/01
UR - http://biorxiv.org/content/early/2016/05/16/027359.abstract
N2 - The genomic evolution inherent to cancer relates directly to a renewed focus on the voluminous next generation sequencing (NGS) data and on machine learning for the inference of explanatory models of how (epi)genomic events are choreographed in cancer initiation and development. However, despite the increasing availability of multiple additional -omics data, this quest has been frustrated by various theoretical and technical hurdles, mostly stemming from the dramatic heterogeneity of the disease. In this paper, we build on our recent work on the “selective advantage” relation among driver mutations in cancer progression and investigate its applicability to the modeling problem at the population level. Here, we introduce PiCnIc (Pipeline for Cancer Inference), a versatile, modular and customizable pipeline to extract ensemble-level progression models from cross-sectional sequenced cancer genomes. The pipeline has many translational implications, as it combines state-of-the-art techniques for sample stratification, driver selection, identification of fitness-equivalent exclusive alterations and progression model inference. We demonstrate PiCnIc’s ability to reproduce much of the current knowledge on colorectal cancer progression, as well as to suggest novel experimentally verifiable hypotheses. Statement of Significance: A new causality-based machine learning Pipeline for Cancer Inference (PiCnIc) is introduced to infer the underlying somatic evolution of ensembles of tumors from next generation sequencing data.
PiCnIc combines techniques for sample stratification, driver selection and identification of fitness-equivalent exclusive alterations to exploit a novel algorithm based on Suppes’ probabilistic causation. The accuracy and translational significance of the results are studied in detail, with an application to colorectal cancer. The PiCnIc pipeline has been made publicly accessible for reproducibility, interoperability and future enhancements.
ER -