RT Journal Article
SR Electronic
T1 Assembly of polymorphic Alu repeat sequences from whole genome sequence data in diverse humans
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 014977
DO 10.1101/014977
A1 Julia H. Wildschutte
A1 Alayna Baron
A1 Nicolette M. Diroff
A1 Jeffrey M. Kidd
YR 2015
UL http://biorxiv.org/content/early/2015/02/06/014977.abstract
AB Alu insertions have contributed to >11% of the human genome. Approximately 30-35 Alu subfamilies remain actively mobile and are recognized as major drivers of genetic variation and disease. Sophisticated computational methods permit identification of non-reference insertions based on specific signatures in whole genome sequencing (WGS) data, but reporting of entire insertion sequences is limited. We build on existing methods and develop an approach that combines Alu detection and de novo assembly of WGS data to reconstruct the full sequence of insertion events. Using this approach, we generate a highly accurate call set of 1,614 completely assembled Alu variants from 53 samples from the Human Genome Diversity Project panel. Experimental validation of 30 sites shows that this method accurately reconstructs the insertion sequence at 100% of the sites tested. We utilize the reconstructed alternative insertion haplotypes to genotype 1,010 fully assembled insertions, obtaining >99% accuracy. We find evidence of insertion by non-classical mechanisms and observe 5’ truncation in 16% of AluYa5 and AluYb8 insertions. The sites of truncation coincide with stem-loop structures and SRP9/14 binding sites in the Alu RNA, implicating L1 ORF2p pausing in the generation of 5’ truncations.