PT - JOURNAL ARTICLE
AU - G. David Poznik
TI - Identifying Y-chromosome haplogroups in arbitrarily large samples of sequenced or genotyped men
AID - 10.1101/088716
DP - 2016 Jan 01
TA - bioRxiv
PG - 088716
4099 - http://biorxiv.org/content/early/2016/11/19/088716.short
4100 - http://biorxiv.org/content/early/2016/11/19/088716.full
AB - We have developed an algorithm to rapidly and accurately identify the Y-chromosome haplogroup of each male in a sample of one to millions. The algorithm, implemented in the yHaplo software package, does not rely on any particular genotyping modality or platform. Full sequences yield the most granular haplogroup classifications, but genotyping arrays can yield reliable calls, provided a reasonable number of phylogenetically informative variants has been assayed. The algorithm is robust to missing data, genotype errors, mutation recurrence, and other complications. We have tested the software on full sequences from phase 3 of the 1000 Genomes Project and on subsets thereof constructed by downsampling to SNPs present on each of four genotyping arrays. We have also run the software on array data from more than 600,000 males.