PT - JOURNAL ARTICLE
AU - Ari Löytynoja
AU - Nick Goldman
TI - Short template switch events explain mutation clusters in the human genome
AID - 10.1101/038380
DP - 2016 Jan 01
TA - bioRxiv
PG - 038380
4099 - http://biorxiv.org/content/early/2016/09/27/038380.short
4100 - http://biorxiv.org/content/early/2016/09/27/038380.full
AB - Resequencing efforts are uncovering the extent of genetic variation in humans and provide data to study the evolutionary processes shaping our genome. One recurring puzzle in both intra- and inter-species studies is the high frequency of complex mutations comprising multiple nearby base substitutions or insertion-deletions. We devised a generalized mutation model of template switching during replication that extends existing models of genome rearrangement, and used this to study the role of template switch events in the origin of such mutation clusters. Applied to the human genome, our model detects thousands of template switch events during the evolution of human and chimp from their common ancestor, and hundreds of events between two independently sequenced human genomes. While many of these are consistent with the template switch mechanism previously proposed for bacteria but not thought significant in higher organisms, our model also identifies new types of mutations that create short inversions, some flanked by paired inverted repeats. The local template switch process can create numerous complex mutation patterns, including hairpin loop structures, and explains multi-nucleotide mutations and compensatory substitutions without invoking positive selection, complicated and speculative mechanisms, or implausible coincidence. Clustered sequence differences are challenging for mapping and variant calling methods, and we show that detection of mutation clusters with current resequencing methodologies is difficult and many erroneous variant annotations exist in human reference data. Template switch events such as those we have uncovered may have been neglected as an explanation for complex mutations because of biases in commonly used analyses. Incorporation of our model into reference-based analysis pipelines and comparisons of de novo-assembled genomes will lead to improved understanding of genome variation and evolution.