RT Journal Article
SR Electronic
T1 Selecting Reads for Haplotype Assembly
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 046771
DO 10.1101/046771
A1 Sarah O. Fischer
A1 Tobias Marschall
YR 2016
UL http://biorxiv.org/content/early/2016/04/06/046771.abstract
AB Haplotype assembly, or read-based phasing, is the problem of reconstructing both haplotypes of a diploid genome from next-generation sequencing data. This problem is formalized as the Minimum Error Correction (MEC) problem and can be solved using algorithms such as WhatsHap. The runtime of WhatsHap is exponential in the maximum coverage, which is hence controlled in a pre-processing step that selects reads to be used for phasing. Here, we report on a heuristic algorithm designed to choose beneficial reads for phasing, in particular to increase the connectivity of the phased blocks and the number of correctly phased variants compared to the random selection previously employed by WhatsHap. The algorithm we describe has been integrated into the WhatsHap software, which is available under the MIT licence from https://bitbucket.org/whatshap/whatshap.