GenomePeek - An online tool for prokaryotic and metagenome analysis

Department of Computer Science, San Diego State University, San Diego, CA, USA
Department of Biology, San Diego State University, San Diego, CA, USA
Computational Sciences Research Center, San Diego State University, San Diego, CA, USA
Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL, USA
DOI
10.7287/peerj.preprints.525v1
Subject Areas
Bioinformatics, Computational Biology, Genomics, Microbiology
Keywords
genome, metagenome, taxonomic, bacteria, sequencing, population, distribution, archaea, abundance
Copyright
© 2014 McNair et al.
Licence
This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ PrePrints) and either DOI or URL of the article must be cited.
Cite this article
McNair K, Edwards R. 2014. GenomePeek - An online tool for prokaryotic and metagenome analysis. PeerJ PrePrints 2:e525v1

Abstract

As more and more prokaryotic sequencing takes place, a method to analyze these data quickly and accurately is needed. Previous tools are mainly designed for metagenomic analysis and have limitations, such as long runtimes and significant false positive error rates. The online tool GenomePeek (edwards.sdsu.edu/GenomePeek) was developed to analyze both single-genome and metagenome sequencing files quickly and with low error rates. GenomePeek uses a sequence assembly approach in which reads matching a set of conserved genes are extracted, assembled, and then aligned against a highly specific reference database. GenomePeek was found to be faster than traditional approaches while still keeping error rates low, and it offers unique data visualization options.
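The extract, assemble, and align steps described above can be illustrated with a toy sketch. This is not GenomePeek's implementation: shared k-mers stand in for the read-to-gene similarity search, a greedy overlap merge stands in for a real assembler, and k-mer sharing stands in for alignment against the reference database. All function names, parameters, and thresholds here are hypothetical.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def extract_reads(reads, markers, k=8):
    """Keep reads that share at least one k-mer with any marker gene.
    (Assumption: a stand-in for a proper similarity search.)"""
    marker_kmers = set()
    for marker in markers:
        marker_kmers |= kmers(marker, k)
    return [r for r in reads if kmers(r, k) & marker_kmers]


def overlap(a, b, min_len):
    """Length of the longest suffix of a equal to a prefix of b (0 if < min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len=5):
    """Repeatedly merge the two contigs with the longest exact overlap.
    (Assumption: a toy replacement for a real assembler.)"""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            return contigs
        merged = contigs[best_i] + contigs[best_j][best_n:]
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j)] + [merged]


def classify(contig, references, k=8):
    """Assign a contig to the reference sharing the most k-mers, or None.
    (Assumption: a stand-in for alignment against the reference database.)"""
    contig_kmers = kmers(contig, k)
    scores = {name: len(contig_kmers & kmers(seq, k))
              for name, seq in references.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


if __name__ == "__main__":
    # Made-up marker sequences for two hypothetical taxa.
    references = {
        "taxon_A": "ATGGCGTACGTTAGCCTAGGATCCGATTACAGG",
        "taxon_B": "TTGACCGGTTAACCGGAATTCTCTAGAGTCGAC",
    }
    a = references["taxon_A"]
    # Three overlapping reads from taxon_A plus one unrelated read.
    reads = [a[0:15], a[10:25], a[20:33], "CCCCCCCCCCCCCCC"]
    kept = extract_reads(reads, references.values())
    for contig in greedy_assemble(kept):
        print(classify(contig, references), len(contig))
```

Assembling before classifying means that each comparison against the reference set uses a longer sequence than a single short read, which is one plausible reason such an approach could reduce false positives. The sketch only shows the shape of the workflow and does not test that claim.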

Author Comment

This is a resubmission to PeerJ.

Supplemental Information

Supplementary Tables

The abundance data and run times for the various sequence files, as reported by the four different analysis tools.

DOI: 10.7287/peerj.preprints.525v1/supp-1