PT   - JOURNAL ARTICLE
AU   - Michael Skocik
AU   - John Collins
AU   - Chloe Callahan-Flintoft
AU   - Howard Bowman
AU   - Brad Wyble
TI   - I Tried a Bunch of Things: The Dangers of Unexpected Overfitting in Classification
AID  - 10.1101/078816
DP   - 2016 Jan 01
TA   - bioRxiv
PG   - 078816
4099 - http://biorxiv.org/content/early/2016/10/03/078816.short
4100 - http://biorxiv.org/content/early/2016/10/03/078816.full
AB   - Machine learning is a powerful set of techniques that has enhanced the ability of neuroscientists to interpret EEG, fMRI, MEG, and PET data. With these new techniques come new dangers of overfitting that are not well understood by the neuroscience community. In this article, we use Support Vector Machine (SVM) classifiers and genetic algorithms to demonstrate the ease with which overfitting can occur, despite the use of cross-validation. We demonstrate that comparable, non-generalizable results can be obtained on both informative and non-informative (i.e., random) data by iteratively modifying hyperparameters in seemingly innocuous ways. We recommend a number of techniques for limiting overfitting, such as lock boxes, blind analyses, and pre-registration. These techniques, although uncommon in neuroscience applications, are common in many other fields that use machine learning, including computer science and physics. Adopting similar safeguards is critical for ensuring the robustness of machine-learning techniques.
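
Below is a minimal sketch of the failure mode the abstract describes, and of the lock-box remedy it recommends. This is not the authors' code; it assumes Python with NumPy and scikit-learn, and the data sizes and hyperparameter grid are illustrative choices. Repeatedly selecting SVM hyperparameters by their cross-validation score on purely random data inflates the apparent accuracy, while a lock box, a held-out set split off before any tuning and consulted exactly once, exposes the chance-level generalization.

# Sketch (assumes NumPy and scikit-learn): hyperparameter search on random
# data overfits the cross-validation estimate; a lock box reveals it.
import numpy as np
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Non-informative data: 40 samples, 200 features, random binary labels.
X = rng.standard_normal((40, 200))
y = rng.integers(0, 2, size=40)

# Split off the lock box BEFORE any analysis; it is opened exactly once.
X_work, X_lockbox, y_work, y_lockbox = train_test_split(
    X, y, test_size=0.25, random_state=0)

# "Trying a bunch of things": keep whichever hyperparameters score best
# under cross-validation. Each attempt leaks fold information into the
# analyst's choices, so the best score drifts above chance.
best_score, best_model = -np.inf, None
for C in [0.01, 0.1, 1.0, 10.0, 100.0]:
    for gamma in [1e-4, 1e-3, 1e-2, 1e-1]:
        model = SVC(C=C, gamma=gamma)
        score = cross_val_score(model, X_work, y_work, cv=5).mean()
        if score > best_score:
            best_score, best_model = score, model

print(f"Best cross-validation accuracy on random data: {best_score:.2f}")

# Final, one-time evaluation on the lock box: near chance (~0.50) is
# expected, since the labels carry no information about the features.
best_model.fit(X_work, y_work)
print(f"Lock-box accuracy: {best_model.score(X_lockbox, y_lockbox):.2f}")

The key design choice is ordering: because the lock-box split happens before the hyperparameter sweep and is scored only once, its estimate is uncontaminated by the search, which is what lets it contradict the inflated cross-validation figure.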