Sparc: a sparsity-based consensus algorithm for long erroneous sequencing reads

Chengxi Ye; Sam Ma

doi:10.7287/peerj.preprints.1401v1

September 27, 2015

A peer-reviewed article of this Preprint also exists.

Abstract

Motivation: Third-generation sequencing (3GS) technology generates long sequences of thousands of bases. However, its error rates are estimated at 15-40%, much higher than those of the previous generation (approximately 1%). Fundamental tasks such as genome assembly and variant calling require high-quality sequences to be obtained from these long, erroneous sequences.

Results: In this paper we describe Sparc, a versatile and efficient linear-complexity consensus algorithm that builds a sparse k-mer graph from a collection of sequences from the same genomic region. The heaviest path approximates the most likely genome sequence (the consensus) and is sought through a sparsity-induced reweighted graph. Experiments show that our algorithm efficiently provides high-quality consensus sequences with an error rate of <0.5% using both PacBio and Oxford Nanopore sequencing technologies. Compared with existing approaches, Sparc calculates the consensus with higher accuracy, uses about 80% less memory, and is about 5x faster.

Availability: The source code is available for download at http://sourceforge.net/p/sparc-consensus/code/ and a testing dataset is available at https://www.dropbox.com/sh/trng8vdaeqywx1e/AAASJesLVAJZcbORkU9f4LuBa?dl=0 (if clicking the link fails, copy it into a browser).
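The idea of a sparse k-mer graph with a heaviest path can be illustrated with a toy sketch. This is not the Sparc implementation: it assumes the input sequences already share the same coordinates and contain substitution errors only (no insertions or deletions), and it stands in for the sparsity-induced reweighting by subtracting a constant from every edge weight, which is an assumption made here for illustration. The function names and the parameters `k` (k-mer size), `g` (spacing between sampled k-mers), and `penalty` are hypothetical.

```python
from collections import defaultdict


def build_sparse_graph(reads, k=2, g=3):
    """Count read support for edges between k-mers sampled every g bases.

    Assumption: reads share the same coordinates (no indels) and g >= k.
    A node is (position, k-mer); an edge also stores the bases that lie
    between two consecutive sampled k-mers.
    """
    edges = defaultdict(int)
    for read in reads:
        prev = None
        for pos in range(0, len(read) - k + 1, g):
            node = (pos, read[pos:pos + k])
            if prev is not None:
                link = read[prev[0] + k:pos]  # bases between the two k-mers
                edges[(prev, node, link)] += 1
            prev = node
    return edges


def heaviest_path(edges, penalty=1):
    """Return the sequence spelled by the heaviest path.

    Each edge weight is reduced by `penalty` (an assumed, simplified
    reweighting) so that weakly supported edges contribute little.
    Edges always point to a larger position, so processing them in order
    of target position is a valid dynamic-programming order.
    """
    best = {}  # node -> (score, (previous node, link bases))
    for (u, v, link), w in sorted(edges.items(), key=lambda e: e[0][1][0]):
        score = best.get(u, (0, None))[0] + (w - penalty)
        if v not in best or score > best[v][0]:
            best[v] = (score, (u, link))
    if not best:
        return ""
    # Trace back from the best-scoring node to a start node.
    node = max(best, key=lambda n: best[n][0])
    parts = [node[1]]
    while node in best:
        prev, link = best[node][1]
        parts.append(link)
        parts.append(prev[1])
        node = prev
    return "".join(reversed(parts))


def consensus(reads, k=2, g=3, penalty=1):
    """Build the sparse graph and spell out its heaviest path."""
    return heaviest_path(build_sparse_graph(reads, k, g), penalty)
```

For example, calling `consensus(["ACGTACGT", "ACGTACGT", "ACGAACGT", "ACGTACCT"])` follows the edges supported by most reads and returns `"ACGTACGT"`, discarding the two single-read substitutions. Because k-mers are stored only every `g` bases, the graph holds far fewer nodes than one k-mer per position, which is the intuition behind the memory savings described above.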

Cite this as

Ye C, Ma S. 2015. Sparc: a sparsity-based consensus algorithm for long erroneous sequencing reads. PeerJ PrePrints 3:e1401v1 https://doi.org/10.7287/peerj.preprints.1401v1

Note: This preprint is not peer-reviewed. You may wish to reference the subsequent peer-reviewed version of this article.

Author comment

This is a submission to PeerJ for review.

Additional Information

Competing Interests

The authors declare that they have no competing interests.

Author Contributions

Chengxi Ye conceived and designed the experiments, performed the experiments, analyzed the data, wrote the paper, prepared figures and/or tables, and reviewed drafts of the paper.

Sam Ma conceived and designed the experiments, wrote the paper, and reviewed drafts of the paper.

Funding

The research received funding from the following sources: NSFC (Grant Nos. 61175071 and 71473243) and the “Exceptional Scientists Program of Yunnan Province, China”. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
