ABSTRACT
Longitudinal deep sequencing of viruses can provide detailed information about intra-host evolutionary dynamics, including how viruses interact with and transmit between hosts. Many analyses require haplotype reconstruction, that is, identifying which variants are co-located on the same genomic element. Most current reconstruction methods rely on a high density of variants and therefore cannot be applied to slowly evolving viruses. We present a new approach, HaROLD (HAplotype Reconstruction Of Longitudinal Deep sequencing data), which reconstructs haplotypes by identifying co-varying variant frequencies within a probabilistic framework. We test this method on synthetic data sets of mixed cytomegalovirus and norovirus genomes, demonstrating high accuracy when longitudinal samples are available.
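The idea of co-varying variant frequencies can be illustrated with a toy sketch. This is not the HaROLD algorithm; it is a simplified, noise-free illustration in which all haplotype names, site positions, frequencies, and the tolerance `tol` are hypothetical. It assumes that the frequency of a variant in each sample equals the summed frequency of the haplotypes carrying it, so that variants on the same set of haplotypes share one trajectory across time points.

```python
# Toy illustration only (not HaROLD itself): variants carried by the same
# haplotypes share one frequency trajectory across longitudinal samples.

# Hypothetical haplotype frequencies at three time points.
haplotype_freqs = {
    "hapA": [0.70, 0.50, 0.20],
    "hapB": [0.20, 0.30, 0.30],
    "hapC": [0.10, 0.20, 0.50],
}

# Hypothetical map from variant site to the haplotypes carrying the variant.
variant_carriers = {
    101: {"hapA"},
    2050: {"hapA"},
    377: {"hapB"},
    4120: {"hapB", "hapC"},
    9000: {"hapC"},
}

def variant_trajectory(carriers):
    """Expected variant frequency per sample: sum over carrier haplotypes."""
    n_samples = len(next(iter(haplotype_freqs.values())))
    return [sum(haplotype_freqs[h][t] for h in carriers) for t in range(n_samples)]

def group_by_trajectory(trajectories, tol=0.02):
    """Greedily group sites whose trajectories agree within tol at every sample."""
    groups = []
    for site, traj in sorted(trajectories.items()):
        for group in groups:
            ref = trajectories[group[0]]
            if all(abs(a - b) <= tol for a, b in zip(traj, ref)):
                group.append(site)
                break
        else:
            # No existing group matches, so this site starts a new one.
            groups.append([site])
    return groups

trajectories = {s: variant_trajectory(c) for s, c in variant_carriers.items()}
groups = group_by_trajectory(trajectories)
print(groups)
```

In this sketch, sites 101 and 2050 end up in the same group because both follow the trajectory of the hypothetical `hapA`. Real read counts are noisy, so exact matching within a fixed tolerance would not be adequate in practice; that is presumably where a probabilistic treatment becomes necessary.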
Competing Interest Statement
The authors have declared no competing interest.
Footnotes
The method has been expanded to include a refinement process. Further work has been done to validate the method and characterise the results.