RT Journal Article SR Electronic T1 A natural encoding of genetic variation in a Burrows-Wheeler Transform to enable mapping and genome inference JF bioRxiv FD Cold Spring Harbor Laboratory SP 059170 DO 10.1101/059170 A1 Sorina Maciuca A1 Carlos del Ojo Elias A1 Gil McVean A1 Zamin Iqbal YR 2016 UL http://biorxiv.org/content/early/2016/07/25/059170.abstract AB We show how positional markers can be used to encode genetic variation within a Burrows-Wheeler Transform (BWT), and use this to construct a generalisation of the traditional “reference genome”, incorporating known variation within a species. Our goal is to support the inference of the closest mosaic of previously known sequences to the genome(s) under analysis. Our scheme results in an increased alphabet size, and by using a wavelet tree encoding of the BWT we reduce the performance impact on rank operations. We give a specialised form of the backward search that allows variation-aware exact matching. We implement this, and demonstrate the cost of constructing an index of the whole human genome with 8 million genetic variants is 25 GB of RAM. We also show that inferring a closer reference can close large kilobase-scale coverage gaps in P. falciparum.