This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Preprints) and either DOI or URL of the article must be cited.
Background. Taxonomic classification of marker-gene sequences is an important step in microbiome analysis.
Results. We present q2-feature-classifier (https://github.com/qiime2/q2-feature-classifier), a QIIME 2 plugin containing several novel machine-learning and alignment-based taxonomy classifiers that meet or exceed classification accuracy of existing methods. We evaluated and optimized several commonly used taxonomic classification methods (RDP, BLAST, BLAST+, UCLUST) and several new methods (a scikit-learn naive Bayes machine-learning classifier, and VSEARCH and SortMeRNA alignment-based methods).
Conclusions. Our results illustrate the importance of parameter tuning for optimizing classifier performance, and we make explicit recommendations regarding parameter choices for a range of standard operating conditions. q2-feature-classifier and our evaluation framework, tax-credit, are both free, open-source, BSD-licensed packages available on GitHub.
This pre-print describes the development, optimization, and benchmarking of marker-gene taxonomy classification methods. This manuscript has been submitted to a peer-reviewed journal.