MerCat: a versatile k-mer counter and diversity estimator for database-independent property analysis obtained from metagenomic and/or metatranscriptomic sequencing data
1 Biological Sciences Division, Pacific Northwest National Laboratory (PNNL), Richland, Washington, USA
2 Information Technology, High Performance Computing (HPC) and Cloud Services, Pacific Northwest National Laboratory (PNNL), Richland, Washington, USA
3 Environmental and Molecular Sciences Laboratory (EMSL), Pacific Northwest National Laboratory (PNNL), Richland, Washington, USA
- Subject Areas: Bioinformatics, Computational Biology, Data Science, Scientific Computing and Simulation, Software Engineering
- Keywords: K-mer counting, Database-independent property analysis (DIPA), Metagenomic analysis, Metatranscriptomic analysis, Diversity estimation
- Copyright: © 2017 White III et al.
- Licence: This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Preprints) and either DOI or URL of the article must be cited.
- Cite this article: 2017. MerCat: a versatile k-mer counter and diversity estimator for database-independent property analysis obtained from metagenomic and/or metatranscriptomic sequencing data. PeerJ Preprints 5:e2825v1 https://doi.org/10.7287/peerj.preprints.2825v1
Abstract
MerCat (“Mer-Catenate”) is a parallel, highly scalable and modular software package for robust property analysis of features in next-generation sequencing data. Taking assembled contigs and raw sequence reads from any platform as input, MerCat counts k-mers of any length k and produces tables of feature abundance counts. This allows data properties to be analyzed directly, without depending on the reference sequence databases that search tools such as BLAST commonly rely on for compositional analysis of whole-community shotgun sequencing data (e.g., metagenomes and metatranscriptomes).
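To illustrate the k-mer counting step described above, the following is a minimal single-threaded sketch in Python. It is not MerCat's implementation: the function name `count_kmers`, the choice to skip windows containing ambiguous bases, and the toy reads are assumptions made for illustration only, and the sketch does not attempt the parallelism the abstract mentions.

```python
from collections import Counter


def count_kmers(sequences, k):
    """Count every overlapping k-mer across an iterable of DNA strings.

    Hypothetical helper for illustration; not part of MerCat's API.
    """
    if k < 1:
        raise ValueError("k must be a positive integer")
    valid = set("ACGT")
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        # Slide a window of width k along the sequence, one base at a time.
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            # Assumption: windows with ambiguous bases (e.g. N) are skipped.
            if set(kmer) <= valid:
                counts[kmer] += 1
    return counts


# Toy input standing in for reads or contigs.
reads = ["ACGTAC", "CGTNAC"]
table = count_kmers(reads, 3)

# Print a simple tab-separated abundance table, one k-mer per row.
for kmer, n in sorted(table.items()):
    print(f"{kmer}\t{n}")
```

The resulting mapping from k-mer to count is the kind of feature abundance table the abstract refers to; no reference database is consulted at any point.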
Author Comment
This is the initial version of our manuscript, submitted to Bioinformatics for peer review.