PT - JOURNAL ARTICLE
AU - Milan Malinsky
AU - Jared T. Simpson
AU - Richard Durbin
TI - trio-sga: facilitating de novo assembly of highly heterozygous genomes with parent-child trios
AID - 10.1101/051516
DP - 2016 Jan 01
TA - bioRxiv
PG - 051516
4099 - http://biorxiv.org/content/early/2016/05/03/051516.short
4100 - http://biorxiv.org/content/early/2016/05/03/051516.full
AB - Motivation: Most DNA sequence in diploid organisms is found in two copies, one contributed by the mother and the other by the father. The high density of differences between the maternally and paternally contributed sequences (heterozygous sites) in some organisms makes de novo genome assembly very challenging, even for algorithms specifically designed to deal with these cases. Therefore, various approaches, most commonly inbreeding in the laboratory, are used to reduce heterozygosity in genomic data prior to assembly. However, many species are not amenable to these techniques. Results: We introduce trio-sga, a set of three algorithms designed to take advantage of mother-father-offspring trio sequencing to facilitate better quality genome assembly in organisms with moderate to high levels of heterozygosity. Two of the algorithms use haplotype phase information present in the trio data to eliminate the majority of heterozygous sites before the assembly commences. The third algorithm is designed to reduce sequencing costs by enabling the use of parents’ reads in the assembly of the genome of the offspring. We test these algorithms on a ‘simulated trio’ from four haploid datasets, and further demonstrate their performance by assembling three highly heterozygous Heliconius butterfly genomes. While the implementation of trio-sga is tuned towards Illumina-generated data, we note that the trio approach to reducing heterozygosity is likely to have cross-platform utility for de novo assembly.