TY  - JOUR
T1  - RNA-seq mixology: designing realistic control experiments to compare protocols and analysis methods
JF  - bioRxiv
DO  - 10.1101/063008
SP  - 063008
AU  - Aliaksei Z. Holik
AU  - Charity W. Law
AU  - Ruijie Liu
AU  - Zeya Wang
AU  - Wenyi Wang
AU  - Jaeil Ahn
AU  - Marie-Liesse Asselin-Labat
AU  - Gordon K Smyth
AU  - Matthew E Ritchie
Y1  - 2016/01/01
UR  - http://biorxiv.org/content/early/2016/07/09/063008.abstract
N2  - Background: Carefully designed control experiments provide a gold standard for benchmarking new platforms, protocols and pipelines in genomics research. RNA profiling control studies frequently use the mixture design, which takes two distinct samples and combines them in known proportions to induce predictable expression changes for every gene. Current mixture experiments have low noise and simulate relatively large expression changes by comparing RNA from different tissues, making them atypical of regular experiments. Results: To generate a more realistic RNA-sequencing control data set, two cell lines of the same cancer type were mixed in various proportions. Noise was added by independently preparing, mixing and degrading a subset of the samples. The systematic gene-expression changes induced by this design were used to benchmark different library preparation kits (standard poly-A versus total RNA with Ribozero depletion) and analysis pipelines for differential gene expression, differential splicing and deconvolution analysis. More signal for introns and various RNA classes (ncRNA, snRNA, snoRNA) and less variability after degradation was observed using the total RNA kit. For differential expression analysis, voom with quality weights marginally outperformed other popular methods, while for differential splicing, the DEXSeq method was found to be the most sensitive but also the most inconsistent. For sample deconvolution analysis, DeMix outperformed IsoPure convincingly. Conclusions: RNA-sequencing control experiments such as this provide a valuable resource for benchmarking different sequencing protocols and data pre-processing workflows. We have demonstrated that with a few extra steps, data with noise characteristics much more similar to regular RNA-sequencing experiments can be obtained.
ER  -