RT Journal Article
SR Electronic
T1 Choice of reference genome can introduce massive bias in bisulfite sequencing data
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 076844
DO 10.1101/076844
A1 Phillip Wulfridge
A1 Ben Langmead
A1 Andrew P. Feinberg
A1 Kasper D. Hansen
YR 2016
UL http://biorxiv.org/content/early/2016/09/22/076844.abstract
AB Mapping bias can be introduced in analysis of short read sequencing data, if sequence reads are aligned to a different genome than the sample genome. Here we study mapping bias in whole-genome bisulfite sequencing using data from inbred mice. We show that the choice of reference genome used for alignment can profoundly impact the inferred methylation state, both for high and low resolution analyses. This bias can result in falsely identifying thousands of differentially methylated regions and hundreds of megabases of large-scale methylation differences. We show that the direction of these biased methylation differences can be reversed by changing the reference genome, clearly establishing mapping bias as a primary cause. We develop a strategy we call personalize-then-smooth for removing the bias by coupling alignment to personal genomes, with post-alignment smoothing. The smoothing step can be viewed as imputation, and allows a differential analysis to include methylation sites which are only present in some samples. Our results have important implications for analysis of bisulfite converted DNA. Abbreviations: WGBS, whole-genome bisulfite sequencing; DMR, differentially methylated regions.