Iterative error correction of long sequencing reads maximizes accuracy and improves contig assembly

This article is free to access.

Abstract

Next-generation sequencers such as Illumina can now produce reads up to 300 bp with high throughput, which is attractive for genome assembly. A first step in genome assembly is to computationally correct sequencing errors. However, correcting all errors in these longer reads is challenging. Here, we show that reads with remaining errors after correction often overlap repeats, where short erroneous k-mers occur in other copies of the repeat. We developed an iterative error correction pipeline that runs the previously published String Graph Assembler (SGA) in multiple rounds of k-mer-based correction with an increasing k-mer size, followed by a final round of overlap-based correction. By combining the advantages of small and large k-mers, this approach corrects more errors in repeats and minimizes the total number of erroneous reads. We show that higher read accuracy increases contig lengths two to three times. We provide SGA-Iteratively Correcting Errors (https://github.com/hillerlab/IterativeErrorCorrection/) that implements iterative error correction by using modules from SGA.
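
The idea of repeating k-mer-based correction with an increasing k-mer size can be illustrated with a toy sketch. This is not SGA's algorithm and not the authors' pipeline: the function names (`count_kmers`, `correct_read`, `iterative_correct`), the k-mer sizes, and the `min_count` threshold are all assumptions chosen for illustration, and the final overlap-based round described in the abstract is omitted.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer occurring in the read set (hypothetical helper)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def covering_counts(bases, pos, counts, k):
    """Return the counts of all k-mers in the read that cover position pos."""
    first = max(0, pos - k + 1)
    last = min(pos, len(bases) - k)
    return [counts["".join(bases[s:s + k])] for s in range(first, last + 1)]


def correct_read(read, counts, k, min_count=2):
    """Toy correction: a base is suspicious if every k-mer covering it is rare.

    A substitution is accepted only if exactly one alternative base makes
    all covering k-mers frequent (count >= min_count, an assumed threshold).
    """
    if len(read) < k:
        return read
    bases = list(read)
    for pos in range(len(bases)):
        if any(c >= min_count for c in covering_counts(bases, pos, counts, k)):
            continue  # at least one trusted k-mer supports this base
        original = bases[pos]
        fixes = []
        for alt in "ACGT":
            if alt == original:
                continue
            bases[pos] = alt
            if all(c >= min_count for c in covering_counts(bases, pos, counts, k)):
                fixes.append(alt)
        # Keep the original base unless the fix is unambiguous.
        bases[pos] = fixes[0] if len(fixes) == 1 else original
    return "".join(bases)


def iterative_correct(reads, k_values=(5, 7), min_count=2):
    """Run one correction round per k, smallest k first (assumed schedule)."""
    for k in k_values:
        counts = count_kmers(reads, k)  # recount on the partially corrected reads
        reads = [correct_read(r, counts, k, min_count) for r in reads]
    return reads


# Small demonstration: five error-free reads and one read with a substitution.
genome = "ACGTTGCATGGCTAAGCTTCGATCCGTAGA"
bad_read = genome[:15] + "T" + genome[16:]
corrected = iterative_correct([genome] * 5 + [bad_read])
assert corrected == [genome] * 6
```

Recounting k-mers before each round is the point of the sketch: later rounds with a larger k see reads that earlier rounds have already partly cleaned. Real data would call for much larger k-mer sizes and a coverage-dependent threshold.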

Sameith, K., Roscito, J. G., & Hiller, M. (2017). Iterative error correction of long sequencing reads maximizes accuracy and improves contig assembly. Briefings in Bioinformatics, 18(1), 1–8. https://doi.org/10.1093/bib/bbw003
