PT - JOURNAL ARTICLE
AU - Sohrab Saeb
AU - Luca Lonini
AU - Arun Jayaraman
AU - David C. Mohr
AU - Konrad P. Kording
TI - Voodoo Machine Learning for Clinical Predictions
AID - 10.1101/059774
DP - 2016 Jan 01
TA - bioRxiv
PG - 059774
4099 - http://biorxiv.org/content/early/2016/06/19/059774.short
4100 - http://biorxiv.org/content/early/2016/06/19/059774.full
AB - The availability of smartphone and wearable sensor technology is leading to a rapid accumulation of human subject data, and machine learning is emerging as a technique to map that data into clinical predictions. As machine learning algorithms are increasingly used to support clinical decision making, it is important to reliably quantify their prediction accuracy. Cross-validation is the standard approach for evaluating the accuracy of such algorithms; however, several cross-validation methods exist and only some of them are statistically meaningful. Here we compared two popular cross-validation methods: record-wise and subject-wise. Using both a publicly available dataset and a simulation, we found that record-wise cross-validation often massively overestimates the prediction accuracy of the algorithms. We also found that this erroneous method is used by almost half of the retrieved studies that used accelerometers, wearable sensors, or smartphones to predict clinical outcomes. As we move towards an era of machine-learning-based diagnosis and treatment, using proper methods to evaluate the accuracy of these algorithms is crucial, as erroneous results can mislead both clinicians and data scientists.
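The difference between the two cross-validation methods named in the abstract can be illustrated with a minimal sketch. This is not the authors' simulation or dataset: the synthetic data, the 1-nearest-neighbour classifier, and every function name below (`simulate`, `record_wise_folds`, `subject_wise_folds`, `predict_1nn`, `cross_validate`) are assumptions made for illustration. Each simulated subject gets a random feature signature and a random label, so the features carry no real information about the label. A classifier can still look accurate under record-wise splitting, because records from the same subject appear in both the training and the test set.

```python
import random
from collections import defaultdict


def simulate(n_subjects=40, records_per_subject=10, n_features=5, seed=0):
    """Return a list of (subject_id, features, label) records.

    Hypothetical data: each subject has a random feature signature and a
    random binary label, so features say nothing about the label itself.
    """
    rng = random.Random(seed)
    records = []
    for subject in range(n_subjects):
        center = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        label = rng.randint(0, 1)
        for _ in range(records_per_subject):
            # Small within-subject noise around the subject's signature.
            features = [c + rng.gauss(0.0, 0.05) for c in center]
            records.append((subject, features, label))
    return records


def record_wise_folds(records, k, seed=0):
    """Shuffle individual records into k folds, ignoring subject identity."""
    rng = random.Random(seed)
    indices = list(range(len(records)))
    rng.shuffle(indices)
    return [indices[i::k] for i in range(k)]


def subject_wise_folds(records, k, seed=0):
    """Shuffle subjects into k groups; all records of a subject share a fold."""
    rng = random.Random(seed)
    by_subject = defaultdict(list)
    for index, (subject, _, _) in enumerate(records):
        by_subject[subject].append(index)
    subjects = sorted(by_subject)
    rng.shuffle(subjects)
    return [
        [i for s in subjects[j::k] for i in by_subject[s]]
        for j in range(k)
    ]


def predict_1nn(train, features):
    """Predict the label of the closest training record (squared Euclidean)."""
    def sq_dist(record):
        return sum((a - b) ** 2 for a, b in zip(record[1], features))
    return min(train, key=sq_dist)[2]


def cross_validate(records, folds):
    """Return the overall 1-NN accuracy when each fold is held out in turn."""
    correct = 0
    for test_indices in folds:
        held_out = set(test_indices)
        train = [r for i, r in enumerate(records) if i not in held_out]
        for i in test_indices:
            _, features, label = records[i]
            correct += predict_1nn(train, features) == label
    return correct / len(records)


if __name__ == "__main__":
    data = simulate()
    print("record-wise :", cross_validate(data, record_wise_folds(data, k=5)))
    print("subject-wise:", cross_validate(data, subject_wise_folds(data, k=5)))
```

In this sketch the record-wise estimate is inflated because the classifier only has to recognise a subject it has already seen, while the subject-wise estimate reflects performance on people the model has never seen. For real projects, scikit-learn's `GroupKFold` offers the same kind of grouped split when subject identifiers are passed as `groups`.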