NPBSS: A new PacBio sequencing simulator for generating the continuous long reads with an empirical model

Abstract

Background: The PacBio sequencing platform offers longer read lengths than second-generation sequencing technologies. It has revolutionized de novo genome assembly and enabled the automated reconstruction of reference-quality genomes. Because of its wide range of applications, fast, high-fidelity sequencing simulators are in great demand to facilitate the development and comparison of downstream analysis tools. Although several simulators (e.g., PBSIM, SimLoRD and FASTQSim) target the generation of PacBio libraries, the error rate of their simulated sequences does not match the quality values of raw PacBio datasets well, especially for PacBio's continuous long reads (CLR).

Results: By analyzing the characteristic features of CLR data from PacBio SMRT (single molecule real time) sequencing, we developed a new PacBio sequencing simulator, NPBSS, for producing CLR reads. NPBSS first samples read sequences according to a log-normal read length distribution and chooses different base quality values in different proportions. It then computes the overall error probability of each base in the read with an empirical model, and derives the deletion, substitution and insertion probabilities from the overall error probability to generate the PacBio CLR reads. Alignment results demonstrate that NPBSS fits the error rate of PacBio CLR reads better than PBSIM and FASTQSim. In addition, assembly results show that sequences simulated by NPBSS more closely resemble real PacBio CLR data.

Conclusion: NPBSS is convenient to use, computationally efficient, and offers flexible parameter settings. The PacBio CLR reads it generates more closely resemble real PacBio datasets.
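The pipeline described in the Results (sample a read length, assign per-base quality values, convert each quality value to an overall error probability, then split that probability into deletion, substitution and insertion events) can be illustrated with a minimal Python sketch. This is not the NPBSS implementation. The abstract does not give the empirical model, so the standard Phred relation is used here as a stand-in, and every function name, the log-normal parameters, the quality-value proportions and the error-type fractions are hypothetical values chosen only for illustration.

```python
import random

BASES = "ACGT"


def sample_read_length(rng, mu=8.5, sigma=0.6, min_len=50):
    """Draw a read length from a log-normal distribution (mu, sigma assumed)."""
    return max(min_len, int(rng.lognormvariate(mu, sigma)))


def sample_qualities(rng, n, qv_values, qv_weights):
    """Pick n base quality values, each with its own proportion."""
    return rng.choices(qv_values, weights=qv_weights, k=n)


def error_probability(qv):
    """Stand-in for the empirical model: the standard Phred relation."""
    return 10 ** (-qv / 10)


def simulate_read(reference, rng,
                  qv_values=(5, 8, 10, 12, 15),          # hypothetical
                  qv_weights=(0.1, 0.2, 0.4, 0.2, 0.1),  # hypothetical
                  fractions=(0.3, 0.1, 0.6)):            # del, sub, ins (hypothetical)
    """Return (read, qualities) for one simulated read drawn from reference."""
    del_frac, sub_frac, _ins_frac = fractions
    length = min(sample_read_length(rng), len(reference))
    start = rng.randrange(len(reference) - length + 1)
    template = reference[start:start + length]
    quals = sample_qualities(rng, length, qv_values, qv_weights)

    out = []
    for base, qv in zip(template, quals):
        p = error_probability(qv)  # overall error probability of this base
        r = rng.random()
        if r < p * del_frac:
            continue  # deletion: the base is dropped
        if r < p * (del_frac + sub_frac):
            # substitution: emit a different base
            out.append(rng.choice([b for b in BASES if b != base]))
        elif r < p:
            # insertion: keep the base and add a random extra one
            out.append(base)
            out.append(rng.choice(BASES))
        else:
            out.append(base)  # no error
    return "".join(out), quals


if __name__ == "__main__":
    rng = random.Random(42)
    genome = "".join(rng.choice(BASES) for _ in range(20000))
    read, quals = simulate_read(genome, rng)
    print(len(read), len(quals))
```

Treating the three error types as fixed fractions of the overall per-base error probability keeps the sketch short. A simulator fitted to real CLR data would instead estimate those fractions, and the mapping from quality value to error probability, from aligned reads.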

Cite

Wei, Z. G., & Zhang, S. W. (2018). NPBSS: A new PacBio sequencing simulator for generating the continuous long reads with an empirical model. BMC Bioinformatics, 19(1). https://doi.org/10.1186/s12859-018-2208-0
