TY  - JOUR
T1  - Measuring the Landscape of CpG Methylation of Individual Repetitive Elements
JF  - bioRxiv
DO  - 10.1101/018531
SP  - 018531
AU  - Yuta Suzuki
AU  - Jonas Korlach
AU  - Stephen W. Turner
AU  - Tatsuya Tsukahara
AU  - Junko Taniguchi
AU  - Wei Qu
AU  - Kazuki Ichikawa
AU  - Jun Yoshimura
AU  - Hideaki Yurino
AU  - Yuji Takahashi
AU  - Jun Mitsui
AU  - Hiroyuki Ishiura
AU  - Shoji Tsuji
AU  - Hiroyuki Takeda
AU  - Shinichi Morishita
Y1  - 2015/01/01
UR  - http://biorxiv.org/content/early/2015/09/14/018531.abstract
N2  - Determining the methylation state of regions with high copy numbers is challenging for second-generation sequencing, because the read length is insufficient to map reads uniquely, especially when repetitive regions are long and nearly identical to each other. Single-molecule real-time (SMRT) sequencing is a promising method for observing such regions, because it is not vulnerable to GC bias, it produces long reads, and its kinetic information is sensitive to DNA modifications. We propose a novel algorithm that combines the kinetic information for neighboring CpG sites and increases the confidence in identifying the methylation states of those sites. Both the sensitivity and precision of our algorithm were ∼93.7% on a per-CpG-site basis for the genome of an inbred medaka (Oryzias latipes) strain at a practical read coverage of ∼30-fold. The method is quantitatively accurate: we observed a high correlation coefficient (R = 0.884) between our method and bisulfite sequencing, and 92.0% of CpG sites were concordant to within 0.25. Using this method, we characterized the landscape of the methylation status of repetitive elements, such as LINEs, in the human genome, revealing a strong correlation between CpG density and unmethylation and detecting unmethylation hot spots of LTRs and LINEs. We uncovered the methylation states of nearly identical active transposons: two novel LINE insertions of identity ∼99% and length 6050 base pairs (bp) in the human genome, and sixteen Tol2 elements of identity >99.8% and length 4682 bp in the medaka genome.
ER  - 