Abstract
Statistical classification is a critical component of using metabolomics data to examine the molecular determinants of phenotypes and to furnish diagnostic and prognostic phenotype predictions in medicine. Despite this, a comprehensive and rigorous evaluation of classification techniques for phenotype discrimination given metabolomics data has not been conducted. We conducted such an evaluation using both simulated and real metabolomics data, comparing Partial Least Squares-Discriminant Analysis (PLS-DA), Sparse PLS-DA (sPLS-DA), Random Forest, Support Vector Machine (SVM), and Neural Network classification techniques for discriminating phenotype. We evaluated the techniques on simulated data generated to mimic global untargeted metabolomics data by incorporating realistic block-wise correlation and partial correlation structures that reflect the correlations and metabolite clustering generated by biological processes. Over the simulation studies, covariance structures, means, and effect sizes were randomly simulated to provide consistent estimates of classifier performance over a wide range of possible scenarios. The presence of non-normal error distributions and the effect of prior-significance filtering (dimension reduction) were evaluated. In each simulation, classifier parameters (such as the number of hidden nodes in a neural network) were tuned by cross-validation to minimize the probability of detecting spurious results due to poorly tuned classifiers. Classifier performance was then evaluated using real clinical metabolomics datasets of varying sample medium, sample size, and experimental design. We report that in the scenarios without a significant presence of non-normal error distributions over metabolite clusters, Neural Network and PLS-DA classifiers performed poorly relative to sPLS-DA, SVM, and Random Forest classifiers.
When non-normal error distributions were introduced, the performance of PLS-DA classifiers deteriorated further relative to the remaining techniques. At the same time, although the performance of Neural Network classifiers improved relative to PLS-DA classifiers, it remained poor compared to sPLS-DA, SVM, and Random Forest classifiers. Over the real datasets, a trend toward better performance of SVM and Random Forest classifiers was observed.