The complete sequence of a human Y chromosome
Abstract
The human Y chromosome has been notoriously difficult to sequence and assemble because of its complex repeat structure, including long palindromes, tandem repeats, and segmental duplications [1–3]. As a result, more than half of the Y chromosome is missing from the GRCh38 reference sequence, and it remains the last human chromosome to be finished [4, 5]. Here, the Telomere-to-Telomere (T2T) consortium presents the complete 62,460,029 base pair sequence of a human Y chromosome from the HG002 genome (T2T-Y) that corrects multiple errors in GRCh38-Y and adds over 30 million base pairs of sequence to the reference, revealing the complete ampliconic structures of the TSPY, DAZ, and RBMY gene families; 41 additional protein-coding genes, mostly from the TSPY family; and an alternating pattern of human satellite 1 and 3 blocks in the heterochromatic Yq12 region. We have combined T2T-Y with a prior assembly of the CHM13 genome [4] and mapped available population variation, clinical variants, and functional genomics data to produce a complete and comprehensive reference sequence for all 24 human chromosomes.
Competing Interest Statement
S.N. is now an employee of Oxford Nanopore Technologies; S.K. has received travel funds to speak at events hosted by Oxford Nanopore Technologies; A.F. is an employee of DNAnexus; C.-S.C. is an employee of GeneDX Holdings Corp.; N.-C.C. is an employee of Exai Bio; L.F.P. receives research support from Genentech; F.J.S. receives research support from Pacific Biosciences, Oxford Nanopore Technologies, Illumina, and Genentech; K.S. is an employee of Google LLC and owns Alphabet stock as part of the standard compensation package; W.T. has two patents (8,748,091 and 8,394,584) licensed to Oxford Nanopore Technologies; E.E.E. is a scientific advisory board member of Variant Bio, Inc. All other authors declare no competing interests.
Footnotes
# Retired
The manuscript has been revised to reflect updated gene annotations and to be more succinct.