Crossing the streams: a framework for streaming analysis of short DNA sequencing reads

Computer Science and Engineering, Michigan State University, East Lansing, Michigan, USA
Microbiology and Molecular Genetics, Michigan State University, East Lansing, Michigan, USA
Population Health and Reproduction, University of California, Davis, Davis, California, USA
DOI
10.7287/peerj.preprints.890v1
Subject Areas
Bioinformatics, Computational Biology
Keywords
Bioinformatics, streaming algorithm, k-mer counting
Copyright
© 2015 Zhang et al.
Licence
This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ PrePrints) and either DOI or URL of the article must be cited.
Cite this article
Zhang Q, Awad S, Brown CT. 2015. Crossing the streams: a framework for streaming analysis of short DNA sequencing reads. PeerJ PrePrints 3:e890v1

Abstract

We present a semi-streaming algorithm for k-mer spectral analysis of DNA sequencing reads, together with a derivative approach that is fully streaming. The approach can be applied to genomic, transcriptomic, and metagenomic data sets. Based on these approaches, we develop two tools for short-read analysis: a method for semi-streaming k-mer-based error trimming, and a method for analyzing error profiles in short reads using a streaming, sublinear approach. These tools are implemented in the khmer software package, which is freely available under the BSD License at github.com/ged-lab/khmer/.
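To make the semi-streaming idea concrete, here is a minimal Python sketch of k-mer-based error trimming. It rests on stated assumptions and is not khmer's implementation or API. The functions `kmers`, `trim_at_low_abundance` and `semi_streaming_trim` and the parameters `k`, `coverage` and `cutoff` are hypothetical. An exact `Counter` stands in for a memory-efficient counting structure, and the in-memory `saved` list stands in for reads set aside for a second pass. The assumed strategy works in two stages. In the first pass, a read whose median k-mer count has already reached `coverage` is trimmed at its first k-mer seen fewer than `cutoff` times. Every other read is counted and deferred, then trimmed once all counts are in. The input is read once and only the deferred subset is read again, which is what makes this semi-streaming rather than two full passes.

```python
from collections import Counter
from statistics import median
from typing import Iterable, Iterator, List


def kmers(seq: str, k: int) -> List[str]:
    """All overlapping k-mers of seq; empty if seq is shorter than k."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def trim_at_low_abundance(seq: str, counts: Counter, k: int, cutoff: int) -> str:
    """Keep the longest prefix of seq whose k-mers all have count >= cutoff."""
    for i, km in enumerate(kmers(seq, k)):
        if counts[km] < cutoff:  # Counter returns 0 for unseen k-mers
            # k-mers 0..i-1 are trusted and cover positions 0..i+k-2, so keep
            # seq[:i + k - 1]. If even the first k-mer is rare, keep nothing.
            return seq[:i + k - 1] if i > 0 else ""
    return seq


def semi_streaming_trim(reads: Iterable[str], k: int, coverage: int,
                        cutoff: int) -> Iterator[str]:
    """Hypothetical two-pass trimmer: trim well-covered reads immediately and
    defer the rest to a second pass over the saved low-coverage reads."""
    counts: Counter = Counter()
    saved: List[str] = []
    # Pass 1: a single pass over the full input stream.
    for read in reads:
        kms = kmers(read, k)
        if not kms:
            yield read  # shorter than k: nothing to evaluate, pass through
            continue
        if median(counts[km] for km in kms) >= coverage:
            # Enough evidence already: trim now. As an assumption of this
            # sketch (in the spirit of digital normalization), k-mers from
            # reads that are already at coverage are not counted.
            yield trim_at_low_abundance(read, counts, k, cutoff)
        else:
            # Too little evidence so far: record its k-mers and revisit later.
            counts.update(kms)
            saved.append(read)
    # Pass 2: only the deferred reads are read again, using the final counts.
    for read in saved:
        yield trim_at_low_abundance(read, counts, k, cutoff)
```

For example, with `k=4`, `coverage=3` and `cutoff=2`, take four copies of `ACGTTGCAAC`, then `ACGTTGCTAC` (an error at the eighth base), then one more copy. The first three copies are deferred. The error-containing read is trimmed to `ACGTTGC` during the first pass, and the deferred copies come out unchanged in the second pass. Reads are therefore emitted in a different order from the input.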

Author Comment

This is a submission to PeerJ Computer Science for review.