RT Journal Article
SR Electronic
T1 Bootstrat: Population Informed Bootstrapping for Rare Variant Tests
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 068999
DO 10.1101/068999
A1 Hailiang Huang
A1 Gina M. Peloso
A1 Daniel Howrigan
A1 Barbara Rakitsch
A1 Carl Johann Simon-Gabriel
A1 Jacqueline I. Goldstein
A1 Mark J. Daly
A1 Karsten Borgwardt
A1 Benjamin M. Neale
YR 2016
UL http://biorxiv.org/content/early/2016/08/11/068999.abstract
AB Recent advances in genotyping and sequencing technologies have made it possible to detect rare variants in large cohorts. Various analytic methods for associating rare variants with disease have been proposed, including burden tests, C-alpha, and SKAT. Most of these methods, however, assume that samples come from a homogeneous population, which is not realistic for analyses of large samples. Failing to correct for population stratification inflates test statistics and produces false-positive associations. Here we propose Bootstrat, a population-informed bootstrap resampling method that controls for population stratification in rare variant tests. In essence, the Bootstrat procedure uses genetic distance to create a phenotype probability for each sample. We show that this empirical approach can effectively correct for population stratification while maintaining statistical power comparable to established methods of controlling for population stratification. The Bootstrat scheme can be easily applied to existing rare variant testing methods with reasonable computational complexity.
Author Summary
Recent technological advances have enabled large-scale analysis of rare variants, but properly testing rare variants remains a significant challenge because most rare variant testing methods assume a sample of homogeneous ethnicity, an assumption that often does not hold for large cohorts. Failure to account for this heterogeneity increases the type I error rate.
Here we propose a bootstrap scheme, applicable to most existing rare variant testing methods, that controls for population heterogeneity. The scheme uses a randomization layer to establish a null distribution of the test statistic while preserving the genetic relationships among samples. This null distribution is then used to calculate an empirical p-value that accounts for population heterogeneity. We demonstrate that the scheme successfully controls the type I error rate without loss of statistical power.
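The scheme summarized above (a phenotype probability per sample derived from genetic distance, resampled phenotypes forming a null distribution, then an empirical p-value) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The Gaussian kernel over ancestry coordinates, the leave-one-out weighting, the simple carrier-rate burden statistic, and every function and parameter name below are assumptions made for illustration; the record does not specify how Bootstrat computes these quantities.

```python
import math
import random


def phenotype_probabilities(coords, phenotypes, bandwidth=1.0):
    """Per-sample case probability from genetically similar samples.

    Assumption: a Gaussian kernel on Euclidean distance between ancestry
    coordinates (e.g. principal components), leaving the sample itself out.
    """
    overall = sum(phenotypes) / len(phenotypes)
    probs = []
    for i, ci in enumerate(coords):
        num = den = 0.0
        for j, cj in enumerate(coords):
            if i == j:
                continue  # do not let a sample vote for its own phenotype
            d2 = sum((a - b) ** 2 for a, b in zip(ci, cj))
            w = math.exp(-d2 / (2.0 * bandwidth ** 2))
            num += w * phenotypes[j]
            den += w
        # Fall back to the overall case fraction if all weights underflow.
        probs.append(num / den if den > 0 else overall)
    return probs


def burden_statistic(carrier, phenotypes):
    """Hypothetical test statistic: absolute difference in carrier rate
    between cases (1) and controls (0)."""
    cases = [c for c, y in zip(carrier, phenotypes) if y == 1]
    controls = [c for c, y in zip(carrier, phenotypes) if y == 0]
    if not cases or not controls:
        return 0.0
    return abs(sum(cases) / len(cases) - sum(controls) / len(controls))


def empirical_p_value(coords, carrier, phenotypes,
                      n_boot=1000, bandwidth=1.0, seed=0):
    """Empirical p-value against a population-informed null distribution."""
    rng = random.Random(seed)
    probs = phenotype_probabilities(coords, phenotypes, bandwidth)
    observed = burden_statistic(carrier, phenotypes)
    exceed = 0
    for _ in range(n_boot):
        # Redraw each phenotype from its ancestry-informed probability, so
        # the null keeps the population structure but drops any variant effect.
        resampled = [1 if rng.random() < p else 0 for p in probs]
        if burden_statistic(carrier, resampled) >= observed:
            exceed += 1
    # Add-one correction keeps the estimate strictly positive.
    return (exceed + 1) / (n_boot + 1)


if __name__ == "__main__":
    # Toy data: two ancestry clusters; cases and carriers both sit in the
    # second cluster, so the association is driven purely by stratification.
    coords = [(0.0,), (0.1,), (5.0,), (5.1,)]
    phenotypes = [0, 0, 1, 1]
    carrier = [0, 0, 1, 1]
    print(empirical_p_value(coords, carrier, phenotypes, n_boot=200))
```

In this toy setting the resampled phenotypes follow ancestry almost exactly, so the observed statistic is typical under the null and the stratification-driven signal is not called significant. Any existing rare variant statistic could be substituted for `burden_statistic` without changing the resampling layer.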