TY  - JOUR
T1  - SAM/BAM format v1.5 extensions for de novo assemblies
JF  - bioRxiv
DO  - 10.1101/020024
SP  - 020024
AU  - Peter J. A. Cock
AU  - James Bonfield
AU  - Bastien Chevreux
AU  - Heng Li
Y1  - 2015/01/01
UR  - http://biorxiv.org/content/early/2015/05/29/020024.abstract
N2  - Summary: The plain text Sequence Alignment/Map (SAM) file format and its companion binary form (BAM) are a generic alignment format for storing read alignments against reference sequences (and unmapped reads) together with structured meta-data (Li et al., 2009). Driven by the needs of the 1000 Genomes Project, which sequenced many individual human genomes, early SAM/BAM usage focused on pairwise alignments of reads to a reference. However, through the CIGAR P operator, multiple sequence alignments can also be preserved. Herein we describe clarifications and additions in version 1.5 of the specification to facilitate storing de novo sequence alignments: padded reference sequences (with gap characters), annotation of reads or regions of the reference, and the option of embedding the reference sequence within the file. Availability: The latest public release of the specification is at http://samtools.sourceforge.net/SAM1.pdf, with in-development drafts at https://github.com/samtools/hts-specs/ under version control. Contact: peter.cock{at}hutton.ac.uk
ER  - 