RT Journal Article
SR Electronic
T1 Identifiers for the 21st century: How to design, provision, and reuse persistent identifiers to maximize utility and impact of life science data
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 117812
DO 10.1101/117812
A1 Julie A McMurry
A1 Nick Juty
A1 Niklas Blomberg
A1 Tony Burdett
A1 Tom Conlin
A1 Nathalie Conte
A1 Mélanie Courtot
A1 John Deck
A1 Michel Dumontier
A1 Donal K Fellows
A1 Alejandra Gonzalez-Beltran
A1 Philipp Gormanns
A1 Jeffrey Grethe
A1 Janna Hastings
A1 Henning Hermjakob
A1 Jean-Karim Hériché
A1 Jon C Ison
A1 Rafael C Jimenez
A1 Simon Jupp
A1 John Kunze
A1 Camille Laibe
A1 Nicolas Le Novère
A1 James Malone
A1 Maria Jesus Martin
A1 Johanna R McEntyre
A1 Chris Morris
A1 Juha Muilu
A1 Wolfgang Müller
A1 Philippe Rocca-Serra
A1 Susanna-Assunta Sansone
A1 Murat Sariyar
A1 Jacky L Snoep
A1 Natalie J Stanford
A1 Stian Soiland-Reyes
A1 Neil Swainston
A1 Nicole Washington
A1 Alan R Williams
A1 Sarala Wimalaratne
A1 Lilly Winfree
A1 Katherine Wolstencroft
A1 Carole Goble
A1 Christopher J Mungall
A1 Melissa A Haendel
A1 Helen Parkinson
YR 2017
UL http://biorxiv.org/content/early/2017/03/20/117812.abstract
AB In many disciplines, data is highly decentralized across thousands of online databases (repositories, registries, and knowledgebases). Wringing value from such databases depends on the discipline of data science and on the humble bricks and mortar that make integration possible; identifiers are a core component of this integration infrastructure. Drawing on our experience and on work by other groups, we outline ten lessons we have learned about the identifier qualities and best practices that facilitate large-scale data integration. Specifically, we propose actions that identifier practitioners (database providers) should take in the design, provision and reuse of identifiers; we also outline important considerations for those referencing identifiers in various circumstances, including by authors and data generators.
While the importance and relevance of each lesson will vary by context, there is a need for increased awareness about how to avoid and manage common identifier problems, especially those related to persistence and web-accessibility/resolvability. We focus strongly on web-based identifiers in the life sciences; however, the principles are broadly relevant to other disciplines.