RT Journal Article
SR Electronic
T1 PCADAPT: An R Package to Perform Genome Scans for Selection Based on Principal Component Analysis
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 056135
DO 10.1101/056135
A1 Keurcien Luu
A1 Eric Bazin
A1 Michael G.B. Blum
YR 2016
UL http://biorxiv.org/content/early/2016/05/30/056135.abstract
AB We introduce the R package pcadapt, which performs genome scans to detect genes under selection based on population genomic data. The statistical method implemented in pcadapt assumes that markers excessively related to population structure are candidates for local adaptation. Because population structure is ascertained with principal component analysis (PCA), the package is fast and can handle large-scale data generated with next-generation sequencing technologies. It can also handle missing data as well as data obtained from pooled sequencing. In contrast to population-based approaches, the package can handle admixed individuals and does not require grouping individuals into predefined populations. Using data simulated under an island model, a divergence model, and a range expansion model, we compare pcadapt to other software performing genome scans (BayeScan, hapflk, OutFLANK, sNMF). For all software except BayeScan, the average proportion of false discoveries is close to the nominal false discovery rate of 10%; BayeScan generates 40% false discoveries. When comparing statistical power for a given realized proportion of false discoveries, we find that the power of BayeScan can be severely reduced by the presence of admixed individuals, whereas pcadapt is not affected. Finally, we show that pcadapt is the most powerful method in a model of range expansion where population structure is continuous. Because pcadapt can handle molecular data generated with next-generation sequencing technologies, we anticipate that it will be a valuable tool for modern analyses in molecular ecology.