Abstract
Extensive hyperpolymorphism and sequence similarity between the HLA genes make HLA type inference from whole-genome sequencing data a challenging problem. We address these challenges by representing sequences from over 10,000 known alleles in a reference graph structure, enabling accurate read mapping. Our algorithm, HLA*PRG, outperforms existing methods by a wide margin and, for the first time, consistently achieves the accuracy of gold-standard reference methods, with one error across 158 alleles tested.
Copyright
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. All rights reserved. No reuse allowed without permission.