Author: Ruibang Luo
Research area: bioinformatics

LRSim: a Linked Reads Simulator generating insights for better genome partitioning

Created on 27th January 2017

Ruibang Luo; Fritz J Sedlazeck; Charlotte Darby; Stephen Kelly; Michael Schatz

Motivation: Linked reads are a form of DNA sequencing commercialized by 10X Genomics that uses highly multiplexed barcoding within microdroplets to tag short reads to their progenitor molecules. Linked reads, which span tens to hundreds of kilobases, offer an alternative to long-read sequencing for de novo assembly, haplotype phasing and other applications. However, no simulator is available, which makes it difficult to measure their capability or to develop new informatics tools. Results: Our analysis of 13 real linked-read datasets revealed the characteristics of their barcodes, molecules and partitions. Based on this analysis, we introduce LRSim, which simulates linked reads by emulating the library preparation and sequencing process with fine control of 1) the number of simulated variants; 2) the linked-read characteristics; and 3) the Illumina read profile. From the phasing and genome assembly of multiple datasets, we derive recommendations on coverage, fragment length and partitioning for sequencing human and non-human genomes. Availability: LRSim is under the MIT license and is freely available at

