Leveraging long read sequencing from a single individual to provide a comprehensive resource for benchmarking variant calling methods



Abstract

A high-confidence, comprehensive human variant set is critical in assessing the accuracy of sequencing algorithms, which are crucial in precision medicine based on high-throughput sequencing. Although recent works have attempted to provide such a resource, they still do not encompass all major types of variants, including structural variants (SVs). Thus, we leveraged the massive high-quality Sanger sequences from the HuRef genome to construct by far the most comprehensive gold set of a single individual, which was cross-validated with deep Illumina sequencing, population datasets, and well-established algorithms. A complete reanalysis of the HuRef genome was necessary because its previously published variants were mostly reported five years ago and suffer from compatibility, organization, and accuracy issues that prevent their direct use in benchmarking. Our extensive analysis and validation resulted in a gold set with high specificity and sensitivity. In contrast to the current gold sets of the NA12878 or HS1011 genomes, our gold set is the first that includes small variants, deletion SVs, and insertion SVs up to a hundred thousand base-pairs. We demonstrate the utility of our HuRef gold set to benchmark several published SV detection tools.
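As a rough illustration of what benchmarking an SV detection tool against a gold set can look like, the sketch below scores a list of deletion calls against gold-set deletions. It is not the authors' evaluation procedure: the 50% reciprocal-overlap matching rule, the `(chromosome, start, end)` interval representation, and the function names `reciprocal_overlap` and `benchmark` are all assumptions chosen for the example. It reports sensitivity and precision rather than specificity, since true negatives are not well defined for interval calls.

```python
from typing import List, Tuple

# Hypothetical representation: (chromosome, start, end), half-open coordinates.
Interval = Tuple[str, int, int]


def reciprocal_overlap(a: Interval, b: Interval) -> float:
    """Return the smaller of the two overlap fractions, or 0.0 if disjoint."""
    if a[0] != b[0]:
        return 0.0
    overlap = min(a[2], b[2]) - max(a[1], b[1])
    if overlap <= 0:
        return 0.0
    return min(overlap / (a[2] - a[1]), overlap / (b[2] - b[1]))


def benchmark(
    calls: List[Interval], gold: List[Interval], min_ro: float = 0.5
) -> Tuple[float, float]:
    """Return (sensitivity, precision) under an assumed reciprocal-overlap rule.

    A gold deletion counts as recovered if any call overlaps it reciprocally
    by at least min_ro; a call counts as correct if it matches any gold entry.
    """
    recovered = sum(
        1 for g in gold if any(reciprocal_overlap(g, c) >= min_ro for c in calls)
    )
    correct = sum(
        1 for c in calls if any(reciprocal_overlap(c, g) >= min_ro for g in gold)
    )
    sensitivity = recovered / len(gold) if gold else 0.0
    precision = correct / len(calls) if calls else 0.0
    return sensitivity, precision


# Made-up toy intervals, purely for illustration (not data from the paper).
toy_gold = [("chr1", 100, 200), ("chr1", 1000, 2000), ("chr2", 50, 150)]
toy_calls = [("chr1", 110, 210), ("chr1", 1500, 1600), ("chr3", 0, 100)]
toy_sensitivity, toy_precision = benchmark(toy_calls, toy_gold)
```

The quadratic all-pairs comparison keeps the sketch short; for genome-scale call sets an interval index or a sorted sweep would be the more practical choice.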


Cite

Mu, J. C., Tootoonchi Afshar, P., Mohiyuddin, M., Chen, X., Li, J., Bani Asadi, N., … Lam, H. Y. K. (2015). Leveraging long read sequencing from a single individual to provide a comprehensive resource for benchmarking variant calling methods. Scientific Reports, 5. https://doi.org/10.1038/srep14493
