Created on 11th January 2017
We report a high-quality assembly of the genome of tilapia, a perciform fish important in aquaculture around the world. We sequenced a homozygous clonal XX female Nile tilapia (Oreochromis niloticus) to 44X coverage using Pacific Biosciences (PacBio) SMRT sequencing. We then developed 37 candidate assemblies using two different algorithms and a variety of parameter settings. The quality of these assemblies was evaluated using likelihood scores calculated from paired-end sequencing at several spatial scales. Principal component analysis was used to select an optimal assembly that had a contig NG50 of 3.3Mbp. We used physical and genetic maps to anchor this assembly to linkage groups (LGs) and to identify 34 likely misassemblies. Each putative misassembly showed a signature consisting of high sequence variation in the aligned PacBio reads, as well as low physical coverage in a complementary 40kbp-insert Illumina library. The sites of these misjoins contained long (>50kbp) stretches of nested transposable element (TE) repeats; the misjoins were corrected in the final assembly. Several of these regions border large centromeric satellite repeats, which have now been partially assembled for the first time. The number of annotated genes in the new assembly increased by 27.3% compared to a previous O. niloticus assembly. The overall repeat landscape of the tilapia genome, including recent TE insertions, is now well represented. The final anchored assembly has a contig NG50 of 3.1Mbp and a total size of 1.01Gbp. A total of 868.6Mbp of the assembly contigs has been anchored to LGs. The new assembly provides insight into the structure of an ~9Mbp XY sex-determination region on LG1 in O. niloticus, and a large (~50Mbp) WZ sex-determination region on LG3 in the related species O. aureus. This study highlights new techniques for generating and validating high-quality reference genome assemblies.
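For readers unfamiliar with the contig NG50 statistic quoted in the abstract: it is the length of the contig at which the cumulative length of contigs, taken from longest to shortest, first reaches half of the estimated genome size (N50 uses half of the total assembly size instead). The following is a minimal sketch of that standard definition, not the authors' pipeline; the function name `ng50` and the toy contig lengths are assumptions made for illustration only.

```python
from typing import Iterable, Optional


def ng50(contig_lengths: Iterable[int], genome_size: int) -> Optional[int]:
    """Return the contig NG50, or None if the contigs cover < 50% of the genome.

    `genome_size` is the estimated genome size, not the assembly size.
    """
    half_genome = genome_size / 2
    running_total = 0
    # Walk the contigs from longest to shortest, accumulating length.
    for length in sorted(contig_lengths, reverse=True):
        running_total += length
        if running_total >= half_genome:
            # This contig is the one that crosses the 50% mark.
            return length
    return None


if __name__ == "__main__":
    # Toy data (hypothetical): five contigs, estimated genome size of 200.
    toy_contigs = [50, 40, 30, 20, 10]
    print(ng50(toy_contigs, genome_size=200))               # NG50
    print(ng50(toy_contigs, genome_size=sum(toy_contigs)))  # equals N50
```

Passing the total assembly length as `genome_size` reduces the calculation to the ordinary N50, which is why the two statistics diverge only when the assembly is smaller or larger than the estimated genome.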