PT - JOURNAL ARTICLE
AU - Mikhail Lipatov
AU - Komal Sanjeev
AU - Rob Patro
AU - Krishna R Veeramah
TI - Maximum Likelihood Estimation of Biological Relatedness from Low Coverage Sequencing Data
AID - 10.1101/023374
DP - 2015 Jan 01
TA - bioRxiv
PG - 023374
4099 - http://biorxiv.org/content/early/2015/07/29/023374.short
4100 - http://biorxiv.org/content/early/2015/07/29/023374.full
AB - The inference of biological relatedness from DNA sequence data has a wide array of applications, such as in the study of human disease, anthropology and ecology. One of the most common analytical frameworks for performing this inference is to genotype individuals for large numbers of independent genome-wide markers and use population allele frequencies to infer the probability of identity-by-descent (IBD) given observed genotypes. Current implementations of this class of methods assume genotypes are known without error. However, with the advent of 2nd generation sequencing data, there are now an increasing number of situations where the confidence attached to a particular genotype may be poor because of low coverage. Such scenarios may lead to biased estimates of the kinship coefficient, ϕ. We describe an approach that utilizes genotype likelihoods rather than a single observed best genotype to estimate ϕ, and demonstrate that we can accurately infer relatedness in both simulated and real 2nd generation sequencing data from a wide variety of human populations down to at least the third degree when coverage is as low as 2x for both individuals, while other commonly used methods such as PLINK exhibit large biases in such situations. In addition, the method appears to be robust when the assumed population allele frequencies have diverged from the true frequencies for realistic levels of genetic drift. This approach has been implemented in the C++ software lcMLkin.
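The idea in the abstract, replacing a single called genotype with genotype likelihoods when estimating ϕ, can be illustrated with a minimal Python sketch. This is not the lcMLkin code, and the abstract does not specify the optimizer. The sketch assumes biallelic, independent sites, Hardy-Weinberg proportions, no inbreeding, and a coarse grid search over the IBD proportions (k0, k1, k2). All function names (`ibd_joint`, `site_likelihood`, `log_likelihood`, `estimate_k`, `kinship`) are hypothetical. It relies on the standard relation ϕ = k1/4 + k2/2.

```python
import math
from itertools import product


def ibd_joint(z, p):
    """Return the 3x3 table P(g1, g2 | Z=z) for one biallelic site.

    Genotypes are coded as alternate-allele counts (0, 1, 2); p is the
    population frequency of the alternate allele; z is the number of
    alleles the pair shares identical-by-descent (0, 1, or 2).
    """
    freq = (1.0 - p, p)
    table = [[0.0] * 3 for _ in range(3)]
    # A pair sharing z alleles IBD carries 4 - z independent founder alleles.
    for alleles in product((0, 1), repeat=4 - z):
        prob = math.prod(freq[a] for a in alleles)
        if z == 0:
            g1, g2 = alleles[0] + alleles[1], alleles[2] + alleles[3]
        elif z == 1:
            # alleles[0] is the shared allele.
            g1, g2 = alleles[0] + alleles[1], alleles[0] + alleles[2]
        else:
            g1 = g2 = alleles[0] + alleles[1]
        table[g1][g2] += prob
    return table


def site_likelihood(gl1, gl2, p, k):
    """P(data at one site | k), summing over all unobserved genotype pairs.

    gl1 and gl2 are genotype likelihoods P(reads | genotype) for genotypes
    0, 1, 2 in each individual; k = (k0, k1, k2).
    """
    total = 0.0
    for z in range(3):
        if k[z] == 0.0:
            continue
        joint = ibd_joint(z, p)
        total += k[z] * sum(
            gl1[a] * gl2[b] * joint[a][b] for a in range(3) for b in range(3)
        )
    return total


def log_likelihood(sites, k):
    """Sum of per-site log-likelihoods; sites is a list of (gl1, gl2, p)."""
    total = 0.0
    for gl1, gl2, p in sites:
        lik = site_likelihood(gl1, gl2, p, k)
        if lik <= 0.0:
            return float("-inf")
        total += math.log(lik)
    return total


def estimate_k(sites, steps=20):
    """Grid search over the simplex k0 + k1 + k2 = 1 (an assumed optimizer)."""
    best_k, best_ll = None, float("-inf")
    for i in range(steps + 1):
        for j in range(steps + 1 - i):
            k = (i / steps, j / steps, (steps - i - j) / steps)
            ll = log_likelihood(sites, k)
            if ll > best_ll:
                best_k, best_ll = k, ll
    return best_k


def kinship(k):
    """Kinship coefficient from IBD proportions: phi = k1/4 + k2/2."""
    return k[1] / 4.0 + k[2] / 2.0
```

Because `site_likelihood` sums over every genotype pair weighted by its likelihood, a low-coverage site with an uncertain call contributes a flatter, less informative term instead of a possibly wrong hard genotype. A practical implementation would likely replace the grid search with a numerical optimizer or an EM scheme.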