TY  - JOUR
T1  - Bootstrat: Population Informed Bootstrapping for Rare Variant Tests
JF  - bioRxiv
DO  - 10.1101/068999
SP  - 068999
AU  - Hailiang Huang
AU  - Gina M. Peloso
AU  - Daniel Howrigan
AU  - Barbara Rakitsch
AU  - Carl Johann Simon-Gabriel
AU  - Jacqueline I. Goldstein
AU  - Mark J. Daly
AU  - Karsten Borgwardt
AU  - Benjamin M. Neale
Y1  - 2016/01/01
UR  - http://biorxiv.org/content/early/2016/08/11/068999.abstract
N2  - Recent advances in genotyping and sequencing technologies have made it possible to detect rare variants in large cohorts. Various analytic methods for associating disease with rare variants have been proposed, including burden tests, C-alpha and SKAT. Most of these methods, however, assume that samples come from a homogeneous population, which is not realistic for analyses of large samples. Failing to correct for population stratification inflates test statistics and produces false-positive associations. Here we propose a population-informed bootstrap resampling method that controls for population stratification (Bootstrat) in rare variant tests. In essence, the Bootstrat procedure uses genetic distance to create a phenotype probability for each sample. We show that this empirical approach can effectively correct for population stratification while maintaining statistical power comparable to established methods of controlling for population stratification. The Bootstrat scheme can be readily applied to existing rare variant testing methods at reasonable computational cost. Author Summary: Recent technological advances have enabled large-scale analysis of rare variants, but properly testing rare variants remains a significant challenge, because most rare variant testing methods assume a sample of homogeneous ethnicity, an assumption that often does not hold for large cohorts. Failure to account for this heterogeneity increases the type I error rate. Here we propose a bootstrap scheme, applicable to most existing rare variant testing methods, that controls for population heterogeneity. This scheme uses a randomization layer to establish a null distribution of the test statistic while preserving the genetic relationships among samples. The null distribution is then used to calculate an empirical p-value that accounts for population heterogeneity. We demonstrate that this scheme controls the type I error rate without loss of statistical power.
ER  - 