Abstract
The accurate characterization of ancestry is essential for interpreting and integrating human genomics data, and for ensuring that individuals from all ancestral backgrounds benefit from advances in the field. However, there are no established guidelines for the consistent, unambiguous description of ancestry. To fill this gap and increase standardization, we developed a framework that is applicable to all human genomics studies and resources. In this report, we describe the framework and its use to curate all 2,854 NHGRI-EBI GWAS Catalog publications. We demonstrate its broader relevance by applying it to populations in projects such as HapMap and 1000 Genomes. We outline recommendations for authors on implementing our method and urge that, wherever possible, ancestry be determined using genomic methods. Finally, we present an analysis of the ancestry of individuals, studies and associations included in the Catalog. While the known bias towards inclusion of European ancestry individuals persists, African and Hispanic or Latin American ancestry populations contribute disproportionately more associations than expected. We thus encourage the scientific community to target future GWAS and other discovery studies to under-represented groups, which, in addition to being intrinsically merited, may also be more effective at identifying new associations. Widespread adoption of the framework presented here will enable improved analysis, interpretation and integration of data and, ultimately, further our understanding of disease.
Footnotes
Author information: Lucia A. Hindorff and Jacqueline A.L. MacArthur share joint last authorship of this manuscript.