Optimizing taxonomic classification of marker gene sequences

The Pathogen and Microbiome Institute, Northern Arizona University, Flagstaff, AZ, USA
Research School of Biology, Australian National University, Canberra, Australia
Departments of Pediatrics and Computer Science & Engineering, and Center for Microbiome Innovation, University of California, San Diego, La Jolla, California, United States
Department of Biological Sciences, Northern Arizona University, Flagstaff, Arizona, United States
DOI
10.7287/peerj.preprints.3208v1
Subject Areas
Bioinformatics, Microbiology, Taxonomy
Keywords
microbiome, marker-gene sequencing, taxonomy, sequence classification
Copyright
© 2017 Bokulich et al.
Licence
This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Preprints) and either DOI or URL of the article must be cited.
Cite this article
Bokulich NA, Kaehler BD, Rideout JR, Dillon M, Bolyen E, Knight R, Huttley GA, Caporaso JG. 2017. Optimizing taxonomic classification of marker gene sequences. PeerJ Preprints 5:e3208v1

Abstract

Background. Taxonomic classification of marker-gene sequences is an important step in microbiome analysis.

Results. We present q2-feature-classifier (https://github.com/qiime2/q2-feature-classifier), a QIIME 2 plugin containing several novel machine-learning and alignment-based taxonomy classifiers that meet or exceed the classification accuracy of existing methods. We evaluated and optimized several commonly used taxonomic classification methods (RDP, BLAST, BLAST+, UCLUST) and several new methods (a scikit-learn naive Bayes machine-learning classifier, and alignment-based methods built on VSEARCH and SortMeRNA).

Conclusions. Our results illustrate the importance of parameter tuning for optimizing classifier performance, and we make explicit recommendations regarding parameter choices for a range of standard operating conditions. q2-feature-classifier and our evaluation framework, tax-credit, are both free, open-source, BSD-licensed packages available on GitHub.
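
To illustrate the kind of parameters being tuned, the following is a minimal sketch of a k-mer-based multinomial naive Bayes classifier with a confidence threshold. It is written in plain Python as a stand-in and is not the q2-feature-classifier implementation, which uses scikit-learn. The reference sequences, taxon labels, function names, and default values for k, alpha, and confidence are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter, defaultdict

# Toy reference sequences (hypothetical, for illustration only).
REFERENCES = {
    "Lactobacillus": ["ACGTACGTACGTTTGACC", "ACGTACGTACGATTGACC"],
    "Escherichia": ["GGCCTTAAGGCCTTAAGGCA", "GGCCTTAAGGCCTAAAGGCA"],
}


def kmers(seq, k):
    """Return all overlapping k-mers of a sequence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def train(references, k=4, alpha=1.0):
    """Fit a multinomial naive Bayes model on k-mer counts.

    k (k-mer length) and alpha (additive smoothing) are tunable parameters.
    """
    counts = defaultdict(Counter)
    n_seqs = Counter()
    vocab = set()
    for taxon, seqs in references.items():
        for seq in seqs:
            n_seqs[taxon] += 1
            for km in kmers(seq, k):
                counts[taxon][km] += 1
                vocab.add(km)
    total_seqs = sum(n_seqs.values())
    model = {"k": k, "alpha": alpha, "vocab": vocab, "taxa": {}}
    for taxon in references:
        model["taxa"][taxon] = {
            "log_prior": math.log(n_seqs[taxon] / total_seqs),
            "counts": counts[taxon],
            "total": sum(counts[taxon].values()),
        }
    return model


def classify(model, seq, confidence=0.7):
    """Return (taxon, posterior); the taxon is "Unassigned" below the threshold."""
    k, alpha, v = model["k"], model["alpha"], len(model["vocab"])
    # k-mers never seen in training are ignored.
    query = [km for km in kmers(seq, k) if km in model["vocab"]]
    log_post = {}
    for taxon, t in model["taxa"].items():
        score = t["log_prior"]
        for km in query:
            score += math.log((t["counts"][km] + alpha) / (t["total"] + alpha * v))
        log_post[taxon] = score
    # Normalise in log space to obtain posterior probabilities.
    top = max(log_post.values())
    norm = sum(math.exp(s - top) for s in log_post.values())
    best = max(log_post, key=log_post.get)
    posterior = math.exp(log_post[best] - top) / norm
    if posterior < confidence:
        return "Unassigned", posterior
    return best, posterior


model = train(REFERENCES, k=4, alpha=1.0)
label, posterior = classify(model, "ACGTACGTACGT", confidence=0.7)
print(label, round(posterior, 3))
```

In this sketch, k, alpha, and confidence are the knobs a benchmark would sweep: raising the confidence threshold trades assignments for fewer misclassifications, while k and alpha change how sharply the model separates taxa.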

Author Comment

This pre-print describes the development, optimization, and benchmarking of marker-gene taxonomy classification methods. This manuscript has been submitted to a peer-reviewed journal.