PT - JOURNAL ARTICLE
AU - Alexandre Drouin
AU - Sébastien Giguère
AU - Maxime Déraspe
AU - Mario Marchand
AU - Michael Tyers
AU - Vivian G. Loo
AU - Anne-Marie Bourgault
AU - François Laviolette
AU - Jacques Corbeil
TI - Predictive Computational Phenotyping and Biomarker Discovery Using Reference-Free Genome Comparisons
AID - 10.1101/045153
DP - 2016 Jan 01
TA - bioRxiv
PG - 045153
4099 - http://biorxiv.org/content/early/2016/03/27/045153.short
4100 - http://biorxiv.org/content/early/2016/03/27/045153.full
AB - The identification of genomic biomarkers is a key step towards improving diagnostic tests and therapies. We present a new reference-free method for this task that relies on a k-mer representation of genomes and a machine learning algorithm that produces intelligible models. The method is computationally scalable and well-suited for whole genome sequencing studies. The method was validated by generating models that predict the antibiotic resistance of C. difficile, M. tuberculosis, P. aeruginosa and S. pneumoniae. We show that the obtained models are accurate and that they highlight biologically relevant biomarkers, while providing insight into the process of antibiotic resistance acquisition.