PT - JOURNAL ARTICLE
AU - Thomas Willems
AU - Melissa Gymrek
AU - G. David Poznik
AU - Chris Tyler-Smith
AU - The 1000 Genomes Project Y-Chromosome Working Group
AU - Yaniv Erlich
TI - Chromosome-Wide Characterization of Y-STR Mutation Rates
AID - 10.1101/036590
DP - 2016 Jan 01
TA - bioRxiv
PG - 036590
4099 - http://biorxiv.org/content/early/2016/02/14/036590.short
4100 - http://biorxiv.org/content/early/2016/02/14/036590.full
AB - Short Tandem Repeats (STRs) are mutation-prone loci that span nearly 1% of the human genome. Previous studies have estimated the mutation rates of highly polymorphic STRs using capillary electrophoresis and pedigree-based designs. While this work has provided insights into the mutational dynamics of highly mutable STRs, the mutation rates of most others remain unknown. Here, we harnessed whole-genome sequencing data to estimate the mutation rates of more than 4,500 Y-chromosome STRs (Y-STRs) with 2-6 base pair repeat units. To this end, we developed MUTEA, a new algorithm that infers STR mutation rates from population-scale high-throughput sequencing data using a high-resolution SNP-based phylogeny. After extensive intrinsic and extrinsic validations, we used MUTEA to estimate the mutation rates of STRs across the Y-chromosome using data from the 1000 Genomes Project and the Simons Genome Diversity Project. In total, we analyzed evolutionary data for over 222,000 meioses to yield the largest set of Y-STR mutation rate estimates to date. We found that the average mutation rate of polymorphic Y-STRs is an order of magnitude lower than estimates from prior studies. Using our ascertainment-free estimates, we identified determinants of STR mutation rates and built a model to predict rates for STRs across the genome. Our projection indicates that the load of de novo STR mutations exceeds the load of all other known variants. We also identified new Y-STRs for forensics and genetic genealogy, assessed the ability to differentiate between the Y-chromosomes of father-son pairs, and imputed Y-STR genotypes.