TY  - JOUR
T1  - A natural encoding of genetic variation in a Burrows-Wheeler Transform to enable mapping and genome inference
JF  - bioRxiv
DO  - 10.1101/059170
SP  - 059170
AU  - Sorina Maciuca
AU  - Carlos del Ojo Elias
AU  - Gil McVean
AU  - Zamin Iqbal
Y1  - 2016/01/01
UR  - http://biorxiv.org/content/early/2016/06/15/059170.abstract
N2  - We show how positional markers can be used to encode genetic variation within a Burrows-Wheeler Transform (BWT), and use this to construct a generalisation of the traditional “reference genome”, incorporating known variation within a species. Our goal is to support the inference of the closest mosaic of previously known sequences to the genome(s) under analysis. Our scheme results in an increased alphabet size, and by using a wavelet tree encoding of the BWT we reduce the performance impact on rank operations. We give a specialised form of the backward search that allows variation-aware exact matching. We implement this, and demonstrate the cost of constructing an index of the whole human genome with 8 million genetic variants is 25GB of RAM. We also show that inferring a closer reference can close large kilobase-scale coverage gaps in P. falciparum.
ER  - 