RT Journal Article
SR Electronic
T1 SAM/BAM format v1.5 extensions for de novo assemblies
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 020024
DO 10.1101/020024
A1 Peter J. A. Cock
A1 James Bonfield
A1 Bastien Chevreux
A1 Heng Li
YR 2015
UL http://biorxiv.org/content/early/2015/05/29/020024.abstract
AB Summary: The plain text Sequence Alignment/Map (SAM) file format and its companion binary form (BAM) are a generic alignment format for storing read alignments against reference sequences (and unmapped reads) together with structured meta-data (Li et al., 2009). Driven by the needs of the 1000 Genomes Project, which sequenced many individual human genomes, early SAM/BAM usage focused on pairwise alignments of reads to a reference. However, through the CIGAR P operator, multiple sequence alignments can also be preserved. Herein we describe clarifications and additions in version 1.5 of the specification to facilitate storing de novo sequence alignments: padded reference sequences (with gap characters), annotation of reads or regions of the reference, and the option of embedding the reference sequence within the file. Availability: The latest public release of the specification is at http://samtools.sourceforge.net/SAM1.pdf, with in-development drafts at https://github.com/samtools/hts-specs/ under version control. Contact: peter.cock{at}hutton.ac.uk