RT Journal Article
SR Electronic
T1 Effect of error and missing data on population structure inference using microsatellite data
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 080630
DO 10.1101/080630
A1 Patrick A. Reeves
A1 Cheryl L. Bowker
A1 Christa E. Fettig
A1 Luke R. Tembrock
A1 Christopher M. Richards
YR 2016
UL http://biorxiv.org/content/early/2016/10/13/080630.abstract
AB Missing data and genotyping errors are common in microsatellite data sets. We used simulated data to quantify the effect of these data aberrations on the accuracy of population structure inference. Data sets with complex, randomly-generated, population histories were simulated under the coalescent. Models describing the characteristic patterns of missing data and genotyping error in real microsatellite data sets were used to modify the simulated data sets. Accuracy of ordination, tree-based, and model-based methods of inference was evaluated before and after data set modifications. The ability to recover correct population clusters decreased as missing data increased. The rate of decrease was similar among analytical procedures, thus no single analytical approach was preferable. For every 1% of a data matrix that contained missing genotypes, 2–4% fewer correct clusters were found. For every 1% of a matrix that contained erroneous genotypes, 1–2% fewer correct clusters were found using ordination and tree-based methods. Model-based procedures that minimize the deviation from Hardy-Weinberg equilibrium in order to assign individuals to clusters performed better as genotyping error increased. We attribute this surprising result to the inbreeding-like nature of microsatellite genotyping error, wherein heterozygous genotypes are mischaracterized as homozygous. We show that genotyping error elevates estimates of the level of genetic admixture. Overall, missing data negatively impact population structure inference more than typical genotyping errors.