PT - JOURNAL ARTICLE
AU - Phillip Wulfridge
AU - Ben Langmead
AU - Andrew P. Feinberg
AU - Kasper D. Hansen
TI - Choice of reference genome can introduce massive bias in bisulfite sequencing data
AID - 10.1101/076844
DP - 2016 Jan 01
TA - bioRxiv
PG - 076844
4099 - http://biorxiv.org/content/early/2016/09/22/076844.short
4100 - http://biorxiv.org/content/early/2016/09/22/076844.full
AB - Mapping bias can be introduced in analysis of short read sequencing data, if sequence reads are aligned to a different genome than the sample genome. Here we study mapping bias in whole-genome bisulfite sequencing using data from inbred mice. We show that the choice of reference genome used for alignment can profoundly impact the inferred methylation state, both for high and low resolution analyses. This bias can result in falsely identifying thousands of differentially methylated regions and hundreds of megabases of large-scale methylation differences. We show that the direction of these biased methylation differences can be reversed by changing the reference genome, clearly establishing mapping bias as a primary cause. We develop a strategy we call personalize-then-smooth for removing the bias by coupling alignment to personal genomes, with post-alignment smoothing. The smoothing step can be viewed as imputation, and allows a differential analysis to include methylation sites which are only present in some samples. Our results have important implications for analysis of bisulfite converted DNA. Abbreviations: WGBS, whole-genome bisulfite sequencing; DMR, differentially methylated regions.