TY  - JOUR
T1  - Improved Placement of Multi-Mapping Small RNAs
JF  - bioRxiv
DO  - 10.1101/044099
SP  - 044099
AU  - Nathan R. Johnson
AU  - Jonathan M. Yeoh
AU  - Ceyda Coruh
AU  - Michael J. Axtell
Y1  - 2016/01/01
UR  - http://biorxiv.org/content/early/2016/03/16/044099.abstract
N2  - High-throughput sequencing of small RNAs (sRNA-seq) is a popular method used to discover and annotate microRNAs (miRNAs), endogenous short interfering RNAs (siRNAs) and Piwi-associated RNAs (piRNAs). One of the key steps in sRNA-seq data analysis is alignment to a reference genome. sRNA-seq libraries often have a high proportion of reads that align to multiple genomic locations, which makes determining their true loci of origin difficult. Commonly used sRNA-seq alignment methods result in either very low precision (choosing an alignment at random) or very low sensitivity (ignoring multi-mapping reads). Here, we describe and test an sRNA-seq alignment strategy that uses local genomic context to guide decisions on the proper placement of multi-mapped sRNA-seq reads. Tests with simulated sRNA-seq data from three different plant genomes demonstrated that this local-weighting method outperforms other alignment strategies. Experimental analyses with real sRNA-seq data also indicate superior performance of local-weighting methods for both plant miRNAs and heterochromatic siRNAs. The local-weighting methods we have developed are implemented as part of the sRNA-seq analysis program ShortStack, which is freely available under a general public license. Improved genome alignments of sRNA-seq data should increase the quality of downstream analyses and genome annotation efforts. Article Summary: High-throughput sequencing of small RNAs (sRNA-seq) is a frequently used technique in the study of small RNAs. Alignment to a reference genome is a key step in processing sRNA-seq libraries, but it is hampered by very high rates of multi-mapping reads. Current methods for sRNA-seq alignment either place these reads randomly or ignore them, both of which distort downstream analyses. Here, we describe a locality-based weighting approach for making better decisions about the placement of multi-mapped sRNA-seq reads, and we test our implementation of this method. We find that our method outperforms current approaches at placing multi-mapped sRNA-seq reads. An implementation of our method is freely available within the ShortStack small RNA analysis program. Use of this method may dramatically improve genome-wide analyses of small RNAs.
ER  - 