TY  - JOUR
T1  - A High Quality Assembly of the Nile Tilapia (Oreochromis niloticus) Genome Reveals the Structure of Two Sex Determination Regions
JF  - bioRxiv
DO  - 10.1101/099564
SP  - 099564
AU  - Matthew A Conte
AU  - William J Gammerdinger
AU  - Kerry L Bartie
AU  - David J Penman
AU  - Thomas D Kocher
Y1  - 2017/01/01
UR  - http://biorxiv.org/content/early/2017/01/10/099564.abstract
N2  - We report a high-quality assembly of the tilapia genome, a perciform fish important in aquaculture around the world. We sequenced a homozygous clonal XX female Nile tilapia (Oreochromis niloticus) to 44X coverage using Pacific Biosciences (PacBio) SMRT sequencing. We then developed 37 candidate assemblies using two different algorithms and a variety of parameter settings. The quality of these assemblies was evaluated using likelihood scores calculated from paired-end sequencing at several spatial scales. Principal component analysis was used to select an optimal assembly that had a contig NG50 of 3.3Mbp. We used physical and genetic maps to anchor this assembly to linkage groups (LGs) and to identify 34 likely misassemblies. Each putative misassembly showed a signature consisting of high sequence variation in the aligned PacBio reads, as well as low physical coverage in a complementary 40kbp-insert Illumina library. The sites of these misjoins contained long (>50kbp) stretches of nested transposable element (TE) repeats and were fixed in the final assembly. Several of these regions border large centromeric satellite repeats, which have now been partially assembled for the first time. The number of annotated genes in the new assembly increased by 27.3% compared to a previous O. niloticus assembly. The overall repeat landscape of the tilapia genome, including recent TE insertions, is now well represented. The final anchored assembly has a contig NG50 of 3.1Mbp, and a total size of 1.01Gbp. A total of 868.6Mbp of the assembly contigs has been anchored to LGs.
The new assembly provides insight into the structure of an ∼9Mbp XY sex-determination region on LG1 in O. niloticus, and a large (∼50Mbp) WZ sex-determination region on LG3 in the related species O. aureus. This study highlights new techniques for generating and validating high quality reference genome assemblies. Author Summary: Tilapias are the second most farmed fishes in the world and a sustainable source of food. Programs to genetically improve these species will benefit from an improved reference sequence for the genome of the Nile tilapia, O. niloticus. In this study, we developed a significantly improved assembly of the tilapia genome using the latest genome sequencing methods. We use a variety of statistics to demonstrate the quality and completeness of the new assembly. Finally, we use the new assembly to characterize genetic differences between males and females in two tilapia species that have different sex determination mechanisms. In each species, differences between the sexes extend over many megabases of DNA sequence. This study provides a framework for identifying sex-determining genes in tilapia and related fish species.
ER  -