The haplotype-resolved chromosome pairs and transcriptome of a heterozygous diploid African cassava cultivar

Weihong Qi; Yi-Wen Lim; Andrea Patrignani; Pascal Schläpfer; Anna Bratus-Neuenschwander; Simon Grüter; Christelle Chanez; Nathalie Rodde; Elisa Prat; Sonia Vautrin; Margaux-Alison Fustier; Diogo Pratas; Ralph Schlapbach; Wilhelm Gruissem

doi:10.1101/2021.11.16.468774

Abstract

Background Cassava (Manihot esculenta) is an important clonally propagated food crop in tropical and sub-tropical regions worldwide. Genetic gain by molecular breeding is limited because the cassava genome is highly heterozygous, repetitive and difficult to assemble.

Findings Here we demonstrate that Pacific Biosciences high-fidelity (HiFi) sequencing reads, in combination with the assembler hifiasm, produced genome assemblies at near-complete haplotype resolution with higher continuity and accuracy than conventional long sequencing reads. We present two chromosome-scale haploid genomes phased with Hi-C technology for the diploid African cassava variety TME204. Genome comparisons revealed extensive chromosome rearrangements and abundant intra-genomic and inter-genomic divergent sequences despite high gene synteny, with most large structural variations being LTR-retrotransposon related. Allele-specific expression analysis of different tissues based on the haplotype-resolved transcriptome identified both stable and inconsistent alleles with imbalanced expression patterns, while most alleles were expressed coordinately. Among tissue-specific differentially expressed transcripts, coordinately and biasedly regulated transcripts were functionally enriched for different biological processes. We use the reference-quality assemblies to build a cassava pan-genome and demonstrate its importance in representing the genetic diversity of cassava for downstream reference-guided omics analysis and breeding.

Conclusions The haplotype-resolved genome allows the first systematic view of the heterozygous diploid genome organization in cassava. The completely phased and annotated chromosome pairs will be a valuable resource for cassava breeding and research. Our study may also provide insights into developing cost-effective and efficient strategies for resolving complex genomes with high resolution, accuracy and continuity.

Competing Interest Statement

The authors have declared no competing interest.

Footnotes

• Equal contributions.
Author email addresses:
weihong.qi{at}fgcz.ethz.ch
yi-wen.lim{at}biol.ethz.ch
andrea.patrignani{at}fgcz.ethz.ch
pascal.schlaepfer{at}biol.ethz.ch
anna.bratus{at}fgcz.ethz.ch
simon.oliver.grueter{at}fgcz.ethz.ch
christelle.chanez{at}biol.ethz.ch
nathalie.rodde{at}inrae.fr
elisa.prat{at}inrae.fr
sonia.vautrin{at}inrae.fr
margaux.fustier{at}inrae.fr
diogo.pratas{at}helsinki.fi
ralph.schlapbach{at}fgcz.ethz.ch
wilhelm_gruissem{at}ethz.ch
https://www.ebi.ac.uk/ena/browser/view/PRJEB43673?show=reads
https://www.ncbi.nlm.nih.gov/nuccore/?term=TME204+Mes-B
https://data.mendeley.com/datasets/fr6g4tgnfh/1#folder-dbb00a94-9bc5-4dad-a2bc-8da65fe270a0
https://www.ncbi.nlm.nih.gov/bioproject/758616
https://www.ncbi.nlm.nih.gov/bioproject/758615

List of abbreviations

ASE: allele-specific expression

BAC: bacterial artificial chromosome

BP: biological process

CCS: circular consensus sequence

CDS: coding sequence

CLR: continuous long reads

CMD: Cassava Mosaic Diseases

DE: differentially expressed/differential expression

DET: differentially expressed transcript

ENA: European Nucleotide Archive

GO: gene ontology

HiFi: high-fidelity

HMW: high molecular weight

Indel: insertion and deletion

IPA: Improved Phased Assembler

MF: molecular function

NCBI: National Center for Biotechnology Information

numts: nuclear mitochondrial pseudogene regions

PacBio: Pacific Biosciences

PE: paired-end

QV: quality value

SMRT: Single Molecule Real-Time

SNP: single nucleotide polymorphism

SV: structural variation

TPM: transcripts per million

UDI: Unique Dual Indices

VGP: the Vertebrate Genomes Project

The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.