Abstract
The high-throughput capacity of Illumina sequencing platforms and the ability to label samples individually have encouraged the widespread use of sample multiplexing. However, this practice results in read misassignment (usually <1%) across samples sequenced on the same lane. Alarmingly high rates of read misassignment of up to 10% have been reported for the latest generation of Illumina sequencing machines. This may make future use of the newest generation of platforms prohibitive, particularly in studies that rely on samples of low quantity and quality, such as historical and archaeological specimens. Here, we rely on barcodes, short sequences that are ligated to both ends of the DNA insert, to directly quantify the rate of index hopping in 100-year-old museum-preserved gorilla (Gorilla beringei) samples. Correcting for multiple sources of noise, we find that on average 0.470% of reads contain a hopped index. We show that the sample-specific quantity of misassigned reads depends on the number of reads that any given sample contributes to the total sequencing pool, so that samples with few sequenced reads receive the greatest proportion of misassigned reads. Ancient DNA samples are particularly affected, since they often differ widely in endogenous content. Through extensive simulations we show that even low index-hopping rates lead to biases in ancient DNA studies when multiplexing samples with different quantities of input material.
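
The dependence of misassignment on sample size can be illustrated with a toy calculation. The sketch below is not the simulation framework used in this study. It assumes that every read hops with the same probability and that a hopped read lands on any other sample in the lane with equal chance. The function name `misassigned_fraction` and the read counts are hypothetical; only the hop rate of 0.470% is taken from the text.

```python
def misassigned_fraction(read_counts, hop_rate):
    """Per sample, the fraction of assigned reads that originated elsewhere.

    Toy model (assumption): each read hops with probability `hop_rate`,
    and a hopped read is reassigned uniformly to one of the other samples.
    """
    k = len(read_counts)
    if k < 2:
        raise ValueError("need at least two multiplexed samples")
    total = sum(read_counts)
    fractions = []
    for n in read_counts:
        # Reads of this sample that keep their correct index
        retained = n * (1 - hop_rate)
        # Hopped reads from all other samples, split evenly over k - 1 destinations
        received = hop_rate * (total - n) / (k - 1)
        fractions.append(received / (retained + received))
    return fractions


# Hypothetical lane: one high-yield, one medium and one low-yield sample
counts = [10_000_000, 1_000_000, 10_000]
fractions = misassigned_fraction(counts, hop_rate=0.0047)
for n, f in zip(counts, fractions):
    print(f"{n:>10,d} reads contributed: {f:.4%} of assigned reads are misassigned")
```

Under these assumptions, equal read counts give every sample a misassigned fraction equal to the hop rate itself, whereas uneven pools concentrate the contamination in the samples that contribute the fewest reads.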