Sparc: a sparsity-based consensus algorithm for long erroneous sequencing reads
A peer-reviewed article of this Preprint also exists.
Abstract
Motivation: Third-generation sequencing (3GS) technology generates long sequences of thousands of bases. However, its error rates are estimated to be in the range of 15-40%, much higher than those of the previous generation (approximately 1%). Fundamental tasks such as genome assembly and variant calling require high-quality sequences to be obtained from these long, erroneous sequences.

Results: In this paper we describe Sparc, a versatile and efficient linear-complexity consensus algorithm that builds a sparse k-mer graph from a collection of sequences from the same genomic region. The heaviest path, which approximates the most likely genome sequence (the consensus), is sought through a sparsity-induced reweighted graph. Experiments show that our algorithm can efficiently provide high-quality consensus sequences with an error rate of <0.5% using both PacBio and Oxford Nanopore sequencing technologies. Compared with existing approaches, Sparc calculates the consensus with higher accuracy, uses roughly 80% less memory, and is roughly 5x faster.

Availability: The source code is available for download at http://sourceforge.net/p/sparc-consensus/code/ and a testing dataset is available at https://www.dropbox.com/sh/trng8vdaeqywx1e/AAASJesLVAJZcbORkU9f4LuBa?dl=0 (if clicking the link fails, please copy it into a browser).
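The idea summarized in the Results (build a k-mer graph from sequences covering the same region, penalize weakly supported edges, and take the heaviest path as the consensus) can be illustrated with a small sketch. This is not the authors' implementation. It assumes, for simplicity, that all reads have the same length and are already aligned without gaps, so that each node is a (position, k-mer) pair and the graph is acyclic. The function name `consensus` and the parameters `k` and `penalty` are hypothetical, and subtracting a constant from every edge weight is only one simple stand-in for the reweighting the abstract mentions.

```python
from collections import defaultdict


def consensus(reads, k=2, penalty=1):
    """Toy heaviest-path consensus over a position-specific k-mer graph.

    Assumption (not from the paper): reads are equal-length and gap-free,
    so position i of every read refers to the same genomic coordinate.
    """
    if not reads:
        raise ValueError("need at least one read")
    length = len(reads[0])
    if any(len(r) != length for r in reads):
        raise ValueError("this sketch requires equal-length reads")
    if not 1 <= k <= length:
        raise ValueError("k must be between 1 and the read length")

    # Count how many reads support each edge (pos, kmer) -> (pos + 1, kmer').
    edge_weight = defaultdict(int)
    for read in reads:
        for pos in range(length - k):
            src = (pos, read[pos:pos + k])
            dst = (pos + 1, read[pos + 1:pos + 1 + k])
            edge_weight[(src, dst)] += 1

    # Dynamic programming over positions: best[node] = (score, predecessor).
    best = {(0, read[:k]): (0, None) for read in reads}
    for (src, dst), weight in sorted(edge_weight.items(),
                                     key=lambda item: item[0][0][0]):
        # Reweighting: subtract a constant so rarely seen edges contribute
        # little or nothing to a path's score.
        score = best[src][0] + weight - penalty
        if dst not in best or score > best[dst][0]:
            best[dst] = (score, src)

    # Pick the best-scoring node at the final position and trace back.
    last_pos = length - k
    node = max((n for n in best if n[0] == last_pos),
               key=lambda n: best[n][0])
    path = []
    while node is not None:
        path.append(node[1])
        node = best[node][1]
    path.reverse()

    # Spell the sequence: first k-mer, then the last base of each later k-mer.
    return path[0] + "".join(kmer[-1] for kmer in path[1:])


reads = ["ACGTAC", "ACGTAC", "ACCTAC", "ACGTTC", "ACGTAC"]
print(consensus(reads))  # ACGTAC
```

In this toy input two of the five reads each carry one substitution error, and the heaviest path still recovers the sequence shared by the majority. Real long reads contain insertions and deletions, so a practical method needs an alignment step to place k-mers at consistent positions, which this sketch deliberately omits.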
Cite this as
2015. Sparc: a sparsity-based consensus algorithm for long erroneous sequencing reads. PeerJ PrePrints 3:e1401v1 https://doi.org/10.7287/peerj.preprints.1401v1
Author comment
This is a submission to PeerJ for review.
Additional Information
Competing Interests
The authors declare that they have no competing interests.
Author Contributions
Chengxi Ye conceived and designed the experiments, performed the experiments, analyzed the data, wrote the paper, prepared figures and/or tables, reviewed drafts of the paper.
Sam Ma conceived and designed the experiments, wrote the paper, reviewed drafts of the paper.
Funding
The research received funding from the following sources: NSFC (Grant No: 61175071 & 71473243) and “Exceptional Scientists Program of Yunnan Province, China.” The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.