RT Journal Article
SR Electronic
T1 A natural encoding of genetic variation in a Burrows-Wheeler Transform to enable mapping and genome inference
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 059170
DO 10.1101/059170
A1 Sorina Maciuca
A1 Carlos del Ojo Elias
A1 Gil McVean
A1 Zamin Iqbal
YR 2016
UL http://biorxiv.org/content/early/2016/06/15/059170.abstract
AB We show how positional markers can be used to encode genetic variation within a Burrows-Wheeler Transform (BWT), and use this to construct a generalisation of the traditional “reference genome”, incorporating known variation within a species. Our goal is to support the inference of the closest mosaic of previously known sequences to the genome(s) under analysis. Our scheme results in an increased alphabet size, and by using a wavelet tree encoding of the BWT we reduce the performance impact on rank operations. We give a specialised form of the backward search that allows variation-aware exact matching. We implement this, and demonstrate the cost of constructing an index of the whole human genome with 8 million genetic variants is 25GB of RAM. We also show that inferring a closer reference can close large kilobase-scale coverage gaps in P. falciparum.