PT - JOURNAL ARTICLE
AU - Keurcien Luu
AU - Eric Bazin
AU - Michael G.B. Blum
TI - PCADAPT: An R Package to Perform Genome Scans for Selection Based on Principal Component Analysis
AID - 10.1101/056135
DP - 2016 Jan 01
TA - bioRxiv
PG - 056135
4099 - http://biorxiv.org/content/early/2016/05/30/056135.short
4100 - http://biorxiv.org/content/early/2016/05/30/056135.full
AB - We introduce the R package pcadapt that performs genome scans to detect genes under selection based on population genomic data. The statistical method implemented in pcadapt assumes that markers excessively related with population structure are candidates for local adaptation. Because population structure is ascertained with principal component analysis (PCA), the package is fast and can handle large-scale data generated with next-generation technologies. It can also handle missing data as well as data obtained from pooled sequencing. In contrast to population-based approaches, the package can handle admixed individuals and does not require grouping individuals into predefined populations. Using data simulated under an island model, a divergence model and range expansion, we compare pcadapt to other software performing genome scans (BayeScan, hapflk, OutFLANK, sNMF). For the different software packages, the average proportion of false discoveries is around the nominal false discovery rate set at 10%, with the exception of BayeScan, which generates 40% false discoveries. When comparing statistical power for a realized percentage of false discoveries, we find that the power of BayeScan can be severely impacted by the presence of admixed individuals whereas pcadapt is not impacted. Last, we show that pcadapt is the most powerful method in a model of range expansion where population structure is continuous. Because pcadapt can handle molecular data generated with next-generation sequencing technologies, we anticipate that it will be a valuable tool for modern analysis in molecular ecology.