GetOrganelle: a simple and fast pipeline for de novo assembly of a complete circular chloroplast genome using genome skimming data

Jian-Jun Jin; Wen-Bin Yu; Jun-Bo Yang; Yu Song; Ting-Shuang Yi; De-Zhu Li

doi:10.1101/256479

Abstract

Background: Chloroplast genes and genomes are among the most important genomic data for plant phylogenetics and DNA barcoding. With the rapid development of high-throughput sequencing technologies, low-coverage whole-genome data can now be generated cheaply, and such data are sufficient to assemble a complete chloroplast genome. Many processes and pipelines for assembling a complete chloroplast genome have been described to date. In this study, we report a simple and fast procedure for assembling a circular chloroplast genome using the GetOrganelle pipeline.

Findings: The GetOrganelle pipeline consists of four steps: 1) recruiting plastid-like reads; 2) de novo assembly using SPAdes; 3) filtering plastid-like contigs; and 4) visualizing and editing the de novo assembly graph. The first three steps run automatically with a single combined command, and the fourth step is used to visualize and evaluate the assemblies. Of 57 tested species with public datasets, we directly reassembled the circular chloroplast genome for 47. The eight species whose assemblies remained non-circular had break points, which may have been caused by mononucleotide or dinucleotide repeats or by a small read pool. In addition, using this pipeline we successfully assembled circular chloroplast genomes for another 903 angiosperm species, representing 41 families and 358 genera.
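The abstract does not say how step 1 recruits plastid-like reads. One common way to do this is to match reads against seed sequences (for example, a related plastome) by shared k-mers, then extend the search iteratively using the k-mers of the reads already recruited. The Python sketch below illustrates that general idea only. It is not GetOrganelle's code, and the names `kmers`, `revcomp`, `recruit_reads`, `k` and `max_rounds` are hypothetical.

```python
from typing import List, Set

# Complement table for DNA; N stays N.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def kmers(seq: str, k: int) -> Set[str]:
    """Return every k-mer of seq on the forward strand (empty if len(seq) < k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def revcomp(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def recruit_reads(seed: str, reads: List[str], k: int = 21,
                  max_rounds: int = 10) -> List[str]:
    """Hypothetical seed-and-extend recruitment by shared k-mers.

    The word pool starts with the k-mers of the seed and its reverse
    complement. In each round, every read that shares at least one k-mer
    with the pool is recruited. Its k-mers (both strands) are added to the
    pool only after the round ends, so the next round can reach reads that
    lie beyond the seed. The loop stops when a round recruits nothing or
    when max_rounds is reached.
    """
    word_pool = kmers(seed, k) | kmers(revcomp(seed), k)
    recruited: Set[int] = set()
    for _ in range(max_rounds):
        newly = [i for i, read in enumerate(reads)
                 if i not in recruited and kmers(read, k) & word_pool]
        if not newly:
            break  # converged: no remaining read touches the pool
        for i in newly:
            recruited.add(i)
            word_pool |= kmers(reads[i], k) | kmers(revcomp(reads[i]), k)
    return [reads[i] for i in sorted(recruited)]
```

For example, with `seed = "GATTACAGG"` and `k=5`, the read `"TACAGGCTTC"` is recruited in the first round because it overlaps the seed. `"GCTTCAAGT"` and its reverse-strand counterpart `"ACTTGAAGC"` are only reached in the second round, through that first read. A read such as `"CCCCCCCCCC"` is never recruited. Real data would call for larger k, a minimum match threshold, and indexed lookups rather than a scan over all reads in every round.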

Conclusion: The GetOrganelle pipeline is an effective way to assemble circular chloroplast genomes of land plants, without the need for reference-guided scaffolding, gap filling, or start-end point closing. The pipeline can also be applied to assemble mitochondrial genomes and nuclear ribosomal DNA from genome skimming data.
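The conclusion lists start-end point closing among the steps the pipeline avoids. When a circular genome is reported as a single linear contig, the same sequence commonly appears at both ends, and closing means detecting and trimming that duplicated overlap. The minimal Python sketch below shows the idea using an exact suffix/prefix match. It is a generic illustration, not part of GetOrganelle, and `close_circular` and `min_overlap` are hypothetical names.

```python
from typing import Optional


def close_circular(contig: str, min_overlap: int = 20) -> Optional[str]:
    """Hypothetical start-end closing by exact end overlap.

    Looks for the longest suffix of the contig (at most half its length,
    at least min_overlap bases) that exactly equals its prefix. If one is
    found, the duplicated suffix is removed and the trimmed sequence is
    returned as one copy of the circular molecule. Otherwise returns None.
    """
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    # Try the longest candidate overlap first so the largest match wins.
    for olen in range(len(contig) // 2, min_overlap - 1, -1):
        if contig[-olen:] == contig[:olen]:
            return contig[:-olen]
    return None
```

For instance, `close_circular("ACGTTGCAGGATCACGTTGCA", min_overlap=5)` finds the 8-base overlap `ACGTTGCA` and returns `"ACGTTGCAGGATC"`. Real assemblies would also need to tolerate sequencing errors and the large inverted repeats of plastomes, which an exact match like this does not handle.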