PT - JOURNAL ARTICLE
AU - René L. Warren
AU - Benjamin P. Vandervalk
AU - Steven J.M. Jones
AU - Inanç Birol
TI - LINKS: Scaffolding genome assemblies with kilobase-long nanopore reads
AID - 10.1101/016519
DP - 2015 Jan 01
TA - bioRxiv
PG - 016519
4099 - http://biorxiv.org/content/early/2015/03/13/016519.short
4100 - http://biorxiv.org/content/early/2015/03/13/016519.full
AB - Motivation: Owing to the complexity of the assembly problem, we do not yet have complete genome sequences. The difficulty in assembling reads into finished genomes is exacerbated by sequence repeats and the inability of short reads to capture sufficient genomic information to resolve those problematic regions. Established and emerging long read technologies show great promise in this regard, but their current associated higher error rates typically require computational base correction and/or additional bioinformatics preprocessing before they can be of value. We present LINKS, the Long Interval Nucleotide K-mer Scaffolder algorithm, a solution that makes use of the information in error-rich long reads, without the need for read alignment or base correction. We show how the contiguity of an ABySS E. coli K-12 genome assembly could be increased over five-fold by the use of beta-released Oxford Nanopore Ltd. (ONT) long reads and how LINKS leverages long-range information in S. cerevisiae W303 ONT reads to yield an assembly with fewer than half the errors of competing applications. Re-scaffolding of the colossal white spruce assembly draft (PG29, 20 Gbp) and the scaling of LINKS to larger genomes are also presented. We expect LINKS to have broad utility in harnessing the potential of long reads in connecting high-quality sequences of small and large genome assembly drafts. Availability: http://www.bcgsc.ca/bioinformatics/software/links Contact: rwarren@bcgsc.ca