TY - JOUR
T1 - I Tried a Bunch of Things: The Dangers of Unexpected Overfitting in Classification
JF - bioRxiv
DO - 10.1101/078816
SP - 078816
AU - Michael Skocik
AU - John Collins
AU - Chloe Callahan-Flintoft
AU - Howard Bowman
AU - Brad Wyble
Y1 - 2016/01/01
UR - http://biorxiv.org/content/early/2016/10/03/078816.abstract
N2 - Machine learning is a powerful set of techniques that has enhanced the ability of neuroscientists to interpret information collected through EEG, fMRI, MEG, and PET data. With these new techniques come new dangers of overfitting that are not well understood by the neuroscience community. In this article, we use Support Vector Machine (SVM) classifiers and genetic algorithms to demonstrate the ease with which overfitting can occur, despite the use of cross-validation. We demonstrate that comparable, non-generalizable results can be obtained on informative and non-informative (i.e., random) data by iteratively modifying hyperparameters in seemingly innocuous ways. We recommend a number of techniques for limiting overfitting, such as lock boxes, blind analyses, and pre-registration. These techniques, although uncommon in neuroscience applications, are common in many other fields that use machine learning, including computer science and physics. Adopting similar safeguards is critical for ensuring the robustness of machine-learning techniques.
ER -