PT   - JOURNAL ARTICLE
AU   - Michael Skocik
AU   - John Collins
AU   - Chloe Callahan-Flintoft
AU   - Howard Bowman
AU   - Brad Wyble
TI   - I Tried a Bunch of Things: The Dangers of Unexpected Overfitting in Classification
AID  - 10.1101/078816
DP   - 2016 Jan 01
TA   - bioRxiv
PG   - 078816
4099 - http://biorxiv.org/content/early/2016/10/03/078816.short
4100 - http://biorxiv.org/content/early/2016/10/03/078816.full
AB   - Machine learning is a powerful set of techniques that has enhanced the ability of neuroscientists to interpret EEG, fMRI, MEG, and PET data. With these new techniques come new dangers of overfitting that are not well understood by the neuroscience community. In this article, we use Support Vector Machine (SVM) classifiers and genetic algorithms to demonstrate the ease with which overfitting can occur, despite the use of cross-validation. We demonstrate that comparable, non-generalizable results can be obtained on both informative and non-informative (i.e., random) data by iteratively modifying hyperparameters in seemingly innocuous ways. We recommend a number of techniques for limiting overfitting, such as lock boxes, blind analyses, and pre-registration. These techniques, although uncommon in neuroscience applications, are common in many other fields that use machine learning, including computer science and physics. Adopting similar safeguards is critical for ensuring the robustness of machine-learning techniques.
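
Below is a minimal sketch of the failure mode the abstract describes, and of the lock-box remedy it recommends. This is not the authors' code; it assumes Python with NumPy and scikit-learn, and the data sizes and hyperparameter grid are illustrative choices. Repeatedly selecting SVM hyperparameters by their cross-validation score on purely random data inflates the apparent accuracy, while a lock box, a held-out set split off before any tuning and consulted exactly once, exposes the chance-level generalization.

# Sketch (assumes NumPy and scikit-learn): hyperparameter search on random
# data overfits the cross-validation estimate; a lock box reveals it.
import numpy as np
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Non-informative data: 40 samples, 200 features, random binary labels.
X = rng.standard_normal((40, 200))
y = rng.integers(0, 2, size=40)

# Split off the lock box BEFORE any analysis; it is opened exactly once.
X_work, X_lockbox, y_work, y_lockbox = train_test_split(
    X, y, test_size=0.25, random_state=0)

# "Trying a bunch of things": keep whichever hyperparameters score best
# under cross-validation. Each attempt leaks fold information into the
# analyst's choices, so the best score drifts above chance.
best_score, best_model = -np.inf, None
for C in [0.01, 0.1, 1.0, 10.0, 100.0]:
    for gamma in [1e-4, 1e-3, 1e-2, 1e-1]:
        model = SVC(C=C, gamma=gamma)
        score = cross_val_score(model, X_work, y_work, cv=5).mean()
        if score > best_score:
            best_score, best_model = score, model

print(f"Best cross-validation accuracy on random data: {best_score:.2f}")

# Final, one-time evaluation on the lock box: near chance (~0.50) is
# expected, since the labels carry no information about the features.
best_model.fit(X_work, y_work)
print(f"Lock-box accuracy: {best_model.score(X_lockbox, y_lockbox):.2f}")

The key design choice is ordering: because the lock-box split happens before the hyperparameter sweep and is scored only once, its estimate is uncontaminated by the search, which is what lets it contradict the inflated cross-validation figure.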