RT Journal Article
SR Electronic
T1 Shared genomic variants: identification of transmission routes using pathogen deep sequence data
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 032458
DO 10.1101/032458
A1 Colin J. Worby
A1 Marc Lipsitch
A1 William P. Hanage
YR 2015
UL http://biorxiv.org/content/early/2015/11/20/032458.abstract
AB While identifying routes of transmission during an infectious disease outbreak was traditionally conducted through exhaustive contact tracing efforts, the increasing availability of pathogen sequencing has provided a new resource with which one can identify plausible routes of infection. However, while transmission clusters can be identified using single genome sequences, individual transmission routes remain relatively uncertain. Deep sequence data may provide additional information where single genomes lack sufficient resolution – presence of shared minor variants can suggest epidemiological linkage when observed between multiple hosts. In this study we formalize shared variant methods to reconstruct the transmission tree in an outbreak, and using simulated outbreak data, we quantify the improved accuracy when compared with analogous single genome approaches. Furthermore we propose a hybrid approach, drawing information from both deep sequence and single genome data. Our simulation studies demonstrate the superior performance of transmission tree identification methods using shared variants in most settings. Application of these methods to deep sequence data collected during the 2014 Sierra Leone Ebola epidemic demonstrates the ability to identify plausible transmission routes without any additional data. The methods we describe should become a common step in outbreak investigations and epidemiological analyses once the collection of deep sequence data becomes increasingly widespread. Sequencing pathogen samples during a communicable disease outbreak is becoming an increasingly common procedure in epidemiological investigations. Identifying who infected whom sheds considerable light on transmission patterns, high-risk settings and subpopulations, and infection control effectiveness. Genomic data can shed new light on transmission dynamics, and can be used to identify clusters of individuals likely to be linked by direct transmission. However, identification of individual sources of infection typically remains uncertain. In this study, we investigate the potential of deep sequence data to provide greater resolution on transmission routes. We describe easily implemented methods to use such data, and demonstrate the remarkably improved performance when reconstructing transmission trees. Furthermore, we apply our methods to data collected during the 2014 Ebola outbreak in Sierra Leone, identifying several routes of transmission. Our study highlights the power of pathogen deep sequence data as a component of outbreak investigation and epidemiological analyses.