Crossing the streams: a framework for streaming analysis of short DNA sequencing reads
1 Computer Science and Engineering, Michigan State University, East Lansing, Michigan, USA
2 Microbiology and Molecular Genetics, Michigan State University, East Lansing, Michigan, USA
3 Population Health and Reproduction, University of California, Davis, Davis, California, USA
- Subject Areas
- Bioinformatics, Computational Biology
- Keywords
- Bioinformatics, streaming algorithm, k-mer counting
- © 2015 Zhang et al.
- This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ PrePrints) and either DOI or URL of the article must be cited.
- Cite this article
- 2015. Crossing the streams: a framework for streaming analysis of short DNA sequencing reads. PeerJ PrePrints 3:e890v1 https://doi.org/10.7287/peerj.preprints.890v1
We present a semi-streaming algorithm for k-mer spectral analysis of DNA sequencing reads, together with a derivative approach that is fully streaming. Both approaches can be applied to genomic, transcriptomic, and metagenomic data sets. We develop two tools for short-read analysis based on these approaches: a method for semi-streaming k-mer-based error trimming, and a method for analyzing error profiles in short reads using a streaming, sublinear approach. These tools are implemented in the khmer software package, which is freely available under the BSD License at github.com/ged-lab/khmer/.
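To make the idea of semi-streaming k-mer-based error trimming concrete, the following is a minimal sketch in plain Python, not the khmer implementation. It rests on several assumptions that the abstract does not state: an exact in-memory `Counter` stands in for a memory-efficient k-mer counting structure, a read counts as "well covered" once the median count of its k-mers reaches a hypothetical `coverage` threshold, and a k-mer seen fewer than `cutoff` times is treated as a likely error. All function names and parameter values are illustrative. Reads that are already well covered are trimmed on the first pass, and only the remaining reads are set aside for a second look, which is what makes the sketch semi-streaming rather than two-pass.

```python
from collections import Counter
from statistics import median


def kmers(seq, k):
    """Yield every overlapping k-mer of length k in seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def median_count(seq, counts, k):
    """Median abundance of the k-mers in seq, used as a coverage estimate."""
    return median(counts[kmer] for kmer in kmers(seq, k))


def trim_at_first_rare_kmer(seq, counts, k, cutoff):
    """Drop the last base of the first k-mer seen fewer than cutoff times,
    together with everything after it."""
    for i, kmer in enumerate(kmers(seq, k)):
        if counts[kmer] < cutoff:
            return seq[:i + k - 1]
    return seq


def semi_streaming_trim(reads, k=4, coverage=3, cutoff=2):
    """Yield trimmed reads; most reads are examined once, some twice.

    Assumption: k, coverage, and cutoff are toy values for illustration.
    """
    counts = Counter()   # exact counts; a stand-in for a sublinear-memory counter
    deferred = []        # reads set aside for the second pass (a temp file in practice)

    for read in reads:
        if len(read) < k:
            yield read   # too short to contain a k-mer; pass through unchanged
            continue
        # Only count k-mers from reads that are not yet well covered.
        if median_count(read, counts, k) < coverage:
            counts.update(kmers(read, k))
        if median_count(read, counts, k) >= coverage:
            yield trim_at_first_rare_kmer(read, counts, k, cutoff)
        else:
            deferred.append(read)

    # Second pass over the deferred subset only.
    for read in deferred:
        if median_count(read, counts, k) >= coverage:
            yield trim_at_first_rare_kmer(read, counts, k, cutoff)
        else:
            yield read   # still low coverage: rare k-mers cannot be called errors


good = "ATGGCATTCA"
bad = "ATGGCATTGA"  # same read with a substitution at the ninth base
result = list(semi_streaming_trim([good, good, good, bad, good]))
```

In this toy run the first two copies of `good` are deferred because coverage has not yet saturated, the third copy and later reads are handled immediately, and `bad` is cut just before its substituted base. Output order therefore differs from input order, with deferred reads emitted last.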
This is a submission to PeerJ Computer Science for review.