Presented at Biology of Genomes 2017
Altmetric score 20.6 (top 5.4%)
Review content is open; signing reviews is optional.
Created on 27th January 2017
Motivation: Linked reads are a form of DNA sequencing commercialized by 10X Genomics that uses highly multiplexed barcoding within microdroplets to tag short reads to their progenitor molecules. The linked reads, spanning tens to hundreds of kilobases, offer an alternative to long-read sequencing for de novo assembly, haplotype phasing and other applications. However, no simulator is available, making it difficult to measure their capability or to develop new informatics tools. Results: Our analysis of 13 real linked-read datasets revealed the characteristics of their barcodes, molecules and partitions. Based on this, we introduce LRSim, which simulates linked reads by emulating the library preparation and sequencing process with fine control of 1) the number of simulated variants; 2) the linked-read characteristics; and 3) the Illumina read profile. From the phasing and genome assembly of multiple datasets, we derive recommendations on coverage, fragment length and partitioning when sequencing human and non-human genomes. Availability: LRSIM is released under the MIT license and is freely available at https://github.com/aquaskyline/LRSIM
This paper has 0 completed reviews and 0 reviews in progress.