TaxaSE: Exploiting evolutionary conservation within 16S rDNA sequences for enhanced taxonomic annotation
- Subject Areas: Biodiversity, Bioinformatics, Computational Biology, Microbiology
- Keywords: Taxonomic Annotation, Shannon Entropy, Pipeline, 16S rDNA, Microbial, Bacterial, Next Generation Sequencing, QIIME
- Copyright: © 2017 Ijaz et al.
- Licence: This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Preprints) and either DOI or URL of the article must be cited.
- Cite this article: Ijaz et al. 2017. TaxaSE: Exploiting evolutionary conservation within 16S rDNA sequences for enhanced taxonomic annotation. PeerJ Preprints 5:e2941v1 https://doi.org/10.7287/peerj.preprints.2941v1
Abstract
Amplicon-based taxonomic analysis, which determines the presence of microbial taxa in different environments from marker gene annotations, often uses percentage identity as the main metric of sequence similarity against reference databases. These data are then used to study the distribution of biodiversity and the response of microbial communities to environmental conditions. However, the 16S rRNA gene displays varying degrees of sequence conservation along its length, and percentage identity does not fully utilize this information. Additionally, the prevalent use of Operational Taxonomic Units (OTUs) is not without its own issues and may reduce the annotation capability of the system. Hence, a novel approach to taxonomic annotation is needed. Here we introduce a new taxonomic annotation pipeline, TaxaSE, which uses Shannon entropy to quantify evolutionary conservation within 16S rDNA sequences for enhanced taxonomic annotation. Furthermore, the system can annotate individual sequences in order to improve fine-grained taxonomic annotation. We present both an in silico comparison of the new similarity metric with percentage identity and a comparison with the popular QIIME pipeline. The results demonstrate that the new similarity metric achieves better performance, especially at lower taxonomic levels. The pipeline also extracts more fine-grained taxonomic annotations than QIIME. These results not only demonstrate the effectiveness of the new pipeline but also highlight the need to move away from both percentage identity and OTU-based approaches in ecological projects.
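The abstract describes the core idea only in outline. Per-position Shannon entropy across aligned 16S sequences, H_i = -Σ_b p_{i,b} log2 p_{i,b} over the bases b in column i, is 0 for a fully conserved column and 2 bits when A, C, G and T are equally frequent. The Java 8 sketch below contrasts plain percentage identity with a hypothetical entropy-weighted identity. It is not the TaxaSE similarity metric, whose exact form is not given here. It assumes a gap-free, equal-length alignment over A/C/G/T, and the class name `ConservationWeightedIdentity` and the weighting w_i = 2 - H_i (matches in conserved columns count more) are illustrative choices; weighting variable columns more heavily is an equally plausible design.

```java
import java.util.Arrays;
import java.util.List;

/** Illustrative sketch only, not the TaxaSE implementation. Assumes a gap-free A/C/G/T alignment. */
final class ConservationWeightedIdentity {

    private static final String BASES = "ACGT";
    private static final double LN2 = Math.log(2.0);

    /** Shannon entropy in bits per alignment column (0 = fully conserved, 2 = uniform over ACGT). */
    static double[] columnEntropy(List<String> alignment) {
        int length = alignment.get(0).length();
        for (String seq : alignment) {
            if (seq.length() != length) {
                throw new IllegalArgumentException("alignment rows must have equal length");
            }
        }
        double[] entropy = new double[length];
        for (int col = 0; col < length; col++) {
            int[] counts = new int[BASES.length()];
            int total = 0;
            for (String seq : alignment) {
                int idx = BASES.indexOf(Character.toUpperCase(seq.charAt(col)));
                if (idx >= 0) { // symbols other than A/C/G/T are skipped
                    counts[idx]++;
                    total++;
                }
            }
            double h = 0.0;
            for (int count : counts) {
                if (count > 0) {
                    double p = (double) count / total;
                    h -= p * Math.log(p) / LN2; // -p * log2(p)
                }
            }
            entropy[col] = h;
        }
        return entropy;
    }

    /** Plain percentage identity over aligned positions. */
    static double percentIdentity(String query, String ref) {
        requireSameLength(query, ref);
        int matches = 0;
        for (int i = 0; i < query.length(); i++) {
            if (query.charAt(i) == ref.charAt(i)) {
                matches++;
            }
        }
        return 100.0 * matches / query.length();
    }

    /** Hypothetical weighted identity: each column weighs 2 - H bits, so conserved columns count more. */
    static double weightedIdentity(String query, String ref, double[] entropy) {
        requireSameLength(query, ref);
        double matched = 0.0;
        double total = 0.0;
        for (int i = 0; i < query.length(); i++) {
            double weight = 2.0 - entropy[i];
            total += weight;
            if (query.charAt(i) == ref.charAt(i)) {
                matched += weight;
            }
        }
        return total == 0.0 ? 0.0 : 100.0 * matched / total;
    }

    private static void requireSameLength(String a, String b) {
        if (a.length() != b.length()) {
            throw new IllegalArgumentException("sequences must be aligned to equal length");
        }
    }

    public static void main(String[] args) {
        // Toy reference alignment: columns 0-2 fully conserved, column 3 maximally variable.
        List<String> references = Arrays.asList("ACGT", "ACGA", "ACGC", "ACGG");
        double[] entropy = columnEntropy(references);
        System.out.println("column entropy (bits): " + Arrays.toString(entropy));
        for (String query : Arrays.asList("ACGA", "TCGT")) {
            System.out.printf("%s vs ACGT: identity %.1f%%, weighted %.1f%%%n",
                    query, percentIdentity(query, "ACGT"),
                    weightedIdentity(query, "ACGT", entropy));
        }
    }
}
```

With the four toy reference sequences in `main`, both queries have 75% identity with `ACGT`. The weighted score, however, is about 100% for `ACGA`, whose mismatch falls in the variable column, and about 66.7% for `TCGT`, whose mismatch falls in a conserved column. This illustrates how an entropy-aware score can separate hits that percentage identity ties.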
Author Comment
This is the first version of the TaxaSE pipeline, developed primarily in Java 8 and UNIX bash scripts.