TY  - JOUR
T1  - Routes for breaching and protecting genetic privacy
JF  - bioRxiv
DO  - 10.1101/000042
SP  - 000042
AU  - Yaniv Erlich
AU  - Arvind Narayanan
Y1  - 2013/01/01
UR  - http://biorxiv.org/content/early/2013/11/07/000042.1.abstract
N2  - We are entering the era of ubiquitous genetic information for research, clinical care, and personal curiosity. Sharing these datasets is vital for rapid progress in understanding the genetic basis of human diseases. However, one growing concern is the ability to protect the genetic privacy of the data originators. Here, we technically map threats to genetic privacy and discuss potential mitigation strategies for privacy-preserving dissemination of genetic data.
About the Authors
Yaniv Erlich is a Fellow at the Whitehead Institute for Biomedical Research. Erlich received his Ph.D. from Cold Spring Harbor Laboratory in 2010 and B.Sc. from Tel-Aviv University in 2006. Prior to that, Erlich worked in computer security and was responsible for conducting penetration tests on financial institutes and commercial companies. Dr. Erlich’s research involves developing new algorithms for computational human genetics.
Arvind Narayanan is an Assistant Professor in the Department of Computer Science and the Center for Information Technology and Policy at Princeton. He studies information privacy and security. His research has shown that data anonymization is broken in fundamental ways, for which he jointly received the 2008 Privacy Enhancing Technologies Award.
His current research interests include building a platform for privacy-preserving data sharing.
Broad data dissemination is essential for advancements in genetics, but also brings to light concerns regarding privacy.
Privacy breaching techniques work by cross-referencing two or more pieces of information to gain new, potentially undesirable knowledge on individuals or their families.
Broadly speaking, the main routes to breach privacy are identity tracing, attribute disclosure, and completion of sensitive DNA information.
Identity tracing exploits quasi-identifiers in the DNA data or metadata to uncover the identity of an unknown genetic dataset.
Attribute disclosure techniques work on known DNA datasets. They use the DNA information to link the identity of a person with a sensitive phenotype.
Completion techniques also work on known DNA data. They try to uncover sensitive genomic areas that were masked to protect the participant.
In the last few years, we have witnessed a rapid growth in the range of techniques and tools to conduct these privacy-breaching attacks. Currently, most of the techniques are beyond the reach of the general public, but can be executed by trained persons with varying degrees of effort.
There is considerable debate regarding risk management. One camp supports a pragmatic, ad-hoc approach of privacy by obscurity and the other supports a systematic, mathematically-backed approach of privacy by design.
Privacy by design algorithms include access control, differential privacy, and cryptographic techniques.
So far, data custodians of genetic databases mainly adopted access control as a mitigation strategy.
New developments in cryptographic techniques may usher in an additional arsenal of security by design techniques.
SAFE HARBOR: A standard in the HIPAA Rule for de-identification of protected health information by removing 18 types of quasi-identifiers.
HAPLOTYPES: A set of alleles along the same chromosome.
CRYPTOGRAPHIC HASHING: A procedure that yields a fixed length output from any size of input in a way that is hard to determine the input from the output.
DICTIONARY ATTACKS: A brute force approach to reverse cryptographic hashing by scanning the relatively small input space.
TYPE I ERROR: The probability to obtain a positive answer from a negative item.
LINKAGE EQUILIBRIUM: Absence of correlation between the alleles in two loci.
POWER: The probability to obtain a positive answer for a positive item.
SPECIFICITY: The probability to obtain a negative answer for a negative item.
EFFECT SIZES: In quantitative traits, the contribution of a certain allele to the value of the trait.
EXPRESSION QUANTITATIVE TRAIT LOCI: Genetic variants associated with variability in gene expression.
LINKAGE DISEQUILIBRIUM: The correlation between alleles in two loci.
ALICE AND BOB: Common placeholders in cryptography to denote party A and party B.
ER  - 