PT - JOURNAL ARTICLE
AU - Adriana Sperlea
AU - Jason Ernst
TI - Systematic Discovery of Conservation States for Single-Nucleotide Annotation of the Human Genome
AID - 10.1101/262097
DP - 2018 Jan 01
TA - bioRxiv
PG - 262097
4099 - http://biorxiv.org/content/early/2018/04/17/262097.short
4100 - http://biorxiv.org/content/early/2018/04/17/262097.full
AB - Comparative genomics sequence data is an important source of information for interpreting genomes. Genome-wide annotations based on this data have largely focused on univariate scores or binary calls of evolutionary constraint. Here we present a complementary whole genome annotation approach, ConsHMM, which applies a multivariate hidden Markov model to learn de novo different ‘conservation states’ based on the combinatorial and spatial patterns of which species align to and match a reference genome in a multiple species DNA sequence alignment. We applied ConsHMM to a 100-way vertebrate sequence alignment to annotate the human genome at single nucleotide resolution into 100 different conservation states. These states have distinct enrichments for other genomic information including gene annotations, chromatin states, and repeat families, which were used to characterize their biological significance. Conservation states have greater or complementary predictive information than standard constraint based measures for a variety of genome annotations. Bases in constrained elements have distinct heritability enrichments depending on the conservation state assignment demonstrating their relevance to analyzing phenotypic associated variation. The conservation states also highlight similarities and differences between constrained bases identified based on inter and intra species approaches. The ConsHMM method and conservation state annotations provide a valuable resource for interpreting genetic variation.