TY - JOUR T1 - The importance of study design for detecting differentially abundant features in high-throughput experiments JF - bioRxiv DO - 10.1101/007948 SP - 007948 AU - Luo Huaien AU - Li Juntao AU - Chia Kuan Hui Burton AU - Paul Robson AU - Niranjan Nagarajan Y1 - 2014/01/01 UR - http://biorxiv.org/content/early/2014/08/14/007948.abstract N2 - The use of high-throughput experiments, such as RNA-seq, to simultaneously identify differentially abundant entities across conditions has become widespread, but the systematic planning of such studies is currently hampered by the lack of general-purpose tools to do so. Here we demonstrate that there is substantial variability in performance across statistical tests, normalization techniques and study conditions, potentially leading to significant wastage of resources and/or missing information in the absence of careful study design. We present a broadly applicable experimental design tool called EDDA, and the first for single-cell RNA-seq, Nanostring and Metagenomic studies, that can be used to i) rationally choose from a panel of statistical tests, ii) measure expected performance for a study and iii) plan experiments to minimize mis-utilization of valuable resources. Using case studies from recent single-cell RNA-seq, Nanostring and Metagenomics studies, we highlight its general utility and, in particular, show a) the ability to correctly model single-cell RNA-seq data and do comparisons with 1/5th the amount of sequencing currently used and b) that the selection of suitable statistical tests strongly impacts the ability to detect biomarkers in Metagenomic studies. Furthermore, we demonstrate that a novel mode-based normalization employed in EDDA uniformly improves in robustness over existing approaches (10-20%) and increases precision to detect differential abundance by up to 140%. ER -