Anatomy of a hash-based long read sequence mapping algorithm for next generation DNA sequencing

Sanchit Misra; Ankit Agrawal; Wei Keng Liao; Alok Choudhary

Journal Article, Open Access

Bioinformatics (2011) 27(2) 189-195

DOI: 10.1093/bioinformatics/btq648

Abstract

Motivation: Recently, a number of programs have been proposed for mapping short reads to a reference genome. Many of them are heavily optimized for short-read mapping and hence are very efficient for shorter queries, but that makes them inefficient or not applicable for reads longer than 200 bp. However, many sequencers are already generating longer reads and more are expected to follow. For long read sequence mapping, there are limited options; BLAT, SSAHA2, FANGS and BWA-SW are among the popular ones. However, resequencing and personalized medicine need much faster software to map these long sequencing reads to a reference genome to identify SNPs or rare transcripts.

Results: We present AGILE (AliGnIng Long rEads), a hash table based high-throughput sequence mapping algorithm for longer 454 reads that uses diagonal multiple seed-match criteria, customized q-gram filtering and a dynamic incremental search approach among other heuristics to optimize every step of the mapping process. In our experiments, we observe that AGILE is more accurate than BLAT, and comparable to BWA-SW and SSAHA2. For practical error rates (<5%) and read lengths (200-1000 bp), AGILE is significantly faster than BLAT, SSAHA2 and BWA-SW. Even for the other cases, AGILE is comparable to BWA-SW and several times faster than BLAT and SSAHA2.

© The Author 2010. Published by Oxford University Press. All rights reserved.
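The general idea behind hash-based mapping with a diagonal multiple seed-match criterion can be illustrated with a small sketch. This is not AGILE's implementation: the function names (`build_index`, `candidate_diagonals`, `best_candidate`), the seed length `k` and the threshold `min_seeds` are all hypothetical, and the q-gram filtering and incremental search steps mentioned in the abstract are omitted. It only shows how exact k-mer hits from a hash table can be grouped by diagonal (reference position minus read position) so that a location is kept as a candidate only when several seeds agree on it.

```python
from collections import defaultdict


def build_index(reference, k):
    """Hash table: each k-mer of the reference -> list of its start positions."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def candidate_diagonals(read, index, k, min_seeds):
    """Count seed hits per diagonal (ref_pos - read_pos).

    Seeds from an ungapped match all fall on the same diagonal, so a
    diagonal with at least `min_seeds` hits is kept as a candidate.
    """
    hits = defaultdict(int)
    for read_pos in range(len(read) - k + 1):
        seed = read[read_pos:read_pos + k]
        # .get avoids inserting empty entries into the defaultdict
        for ref_pos in index.get(seed, ()):
            hits[ref_pos - read_pos] += 1
    return {d: n for d, n in hits.items() if n >= min_seeds}


def best_candidate(read, index, k, min_seeds=2):
    """Return the diagonal with the most seed hits, or None if none qualifies."""
    diagonals = candidate_diagonals(read, index, k, min_seeds)
    if not diagonals:
        return None
    # Break ties in favour of the leftmost diagonal (an arbitrary choice).
    return max(diagonals, key=lambda d: (diagonals[d], -d))
```

In a full mapper, each surviving diagonal would then be passed to a more expensive alignment step; requiring multiple agreeing seeds is what keeps the number of such alignments small. The choice of `k` trades sensitivity (shorter seeds tolerate more errors) against the number of spurious hits.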

Misra, S., Agrawal, A., Liao, W. K., & Choudhary, A. (2011). Anatomy of a hash-based long read sequence mapping algorithm for next generation DNA sequencing. Bioinformatics, 27(2), 189–195. https://doi.org/10.1093/bioinformatics/btq648
