Abstract
DNA methylation is an important gene regulatory mechanism that contributes to the genotype-phenotype relationship. Identifying genetic variants that are associated with methylation variation, an analysis commonly referred to as methylation quantitative trait locus (mQTL) mapping, is therefore important for understanding the biological mechanisms underlying genotype-trait associations and for investigating the potential causal or mediating effects of DNA methylation on phenotypic outcomes. However, existing approaches for mQTL mapping do not fare well in high-throughput sequencing-based data sets, because they do not directly model the count-generating process in sequencing studies and fail to take advantage of allele-specific methylation patterns. Here, we develop a new statistical method, IMAGE, together with a scalable computational inference algorithm, for mQTL mapping in sequencing-based studies. Our method properly accounts for the count nature of bisulfite sequencing data and incorporates allele-specific methylation patterns from heterozygous individuals to enable more powerful mQTL discovery. We compare IMAGE with existing approaches through extensive simulations. We also apply IMAGE to two large-scale bisulfite sequencing studies, one of wild baboons and one of wild wolves, in which IMAGE identifies 50%-64% more mQTL than existing approaches. In both studies, mQTL are significantly depleted in CpG islands but enriched in shelf and open sea regions, suggesting that genetic variation is most likely to contribute to DNA methylation variation in regions of the genome with dynamic methylation patterns.