TY  - JOUR
T1  - Genotyping Allelic and Copy Number Variation in the Immunoglobulin Heavy Chain Locus
JF  - bioRxiv
DO  - 10.1101/042226
SP  - 042226
AU  - Shishi Luo
AU  - Jane A. Yu
AU  - Yun S. Song
Y1  - 2016/01/01
UR  - http://biorxiv.org/content/early/2016/03/03/042226.abstract
N2  - The study of genomic regions that contain gene copies and structural variation is a major challenge in modern genomics. Unlike variation involving single nucleotide changes, data on the variation of copy number is difficult to collect and few tools exist for analyzing the variation between individuals. The immunoglobulin heavy variable (IGHV) locus, which plays an integral role in the adaptive immune response, is an example of a genomic region that is known to vary in gene copy number. Lack of standard methods to genotype this region prevents it from being included in association studies and is holding back the growing field of antibody repertoire analysis. Here, we establish a convention of representing the locus in terms of a reference panel of operationally distinguishable segments defined by hierarchical clustering. Using this reference set, we develop a pipeline that identifies copy number and allelic variation in the IGHV locus from whole-genome sequencing reads. Tests on simulated reads demonstrate that our approach is feasible and accurate for detecting the presence and absence of gene segments using reads as short as 70 bp. With reads 100 bp and longer, coverage depth can also be used to determine copy number. When applied to a family of European ancestry, our method finds new copy number variants and confirms existing variants. This study paves the way for analyzing population-level patterns of variation in the IGHV locus in larger diverse datasets and for quantitatively handling regions of copy number variation in other structurally varying and complex loci.
ER  - 