TaxaSE: Exploiting evolutionary conservation within 16S rDNA sequences for enhanced taxonomic annotation
Abstract
Amplicon-based taxonomic analysis, which determines the presence of microbial taxa in different environments on the basis of marker gene annotations, often uses percentage identity as the main metric of sequence similarity against databases. These data are then used to study the distribution of biodiversity as well as the response of microbial communities to environmental conditions. However, the 16S rRNA gene displays varying degrees of sequence conservation along its length, and percentage identity does not fully utilize this information. Additionally, the prevalent use of Operational Taxonomic Units (OTUs) is not without its own issues and may reduce the annotation capability of the system. Hence, a novel approach to taxonomic annotation is needed. Here we introduce a new taxonomic annotation pipeline, TaxaSE, which utilizes Shannon entropy to quantify evolutionary conservation within 16S rDNA sequences for enhanced taxonomic annotation. Furthermore, the system can annotate individual sequences in order to improve fine-grained taxonomic annotation. We present both an in silico comparison of the new similarity metric with percentage identity and a comparison with the popular QIIME pipeline. The results demonstrate that the new similarity metric achieves better performance, especially at lower taxonomic levels. Furthermore, the pipeline is able to extract more fine-grained taxonomic annotations than QIIME. These results not only demonstrate the effectiveness of the new pipeline but also highlight the need to shift away from both percentage identity and OTU-based approaches for ecological projects.
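The abstract does not state how TaxaSE derives or applies its entropy values, so the following is only a minimal sketch of the general idea, not the pipeline's actual method. It assumes a pre-aligned set of reference sequences of equal length, computes per-column Shannon entropy, and then compares plain percentage identity with a hypothetical entropy-weighted score in which conserved (low-entropy) columns count more. The function names, the toy alignment, and the weighting scheme `1 - H/H_max` are all illustrative assumptions.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (in bits) of the symbols in one alignment column."""
    counts = Counter(column)
    total = len(column)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def percent_identity(a, b):
    """Percentage of aligned positions at which two sequences agree."""
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def entropy_weighted_similarity(a, b, entropies, max_entropy=2.0):
    """Hypothetical score: a match in a conserved column counts more.

    Assumption for illustration only: weight = 1 - H / H_max, where
    H_max = 2 bits is the maximum entropy of a four-letter alphabet.
    """
    weights = [1.0 - h / max_entropy for h in entropies]
    total = sum(weights)
    if total == 0:
        return 0.0
    matched = sum(w for x, y, w in zip(a, b, weights) if x == y)
    return 100.0 * matched / total


# Toy alignment (made-up data): columns 4 and 5 are variable, the rest conserved.
alignment = ["ACGTAC", "ACGTTC", "ACGAGC", "ACGCCC"]
entropies = [column_entropy(col) for col in zip(*alignment)]

query = "ACGTAC"
variable_mismatch = "ACGAGC"   # differs from the query only in variable columns
conserved_mismatch = "TCGTAC"  # differs from the query in a conserved column

for name, ref in [("variable", variable_mismatch), ("conserved", conserved_mismatch)]:
    print(name,
          round(percent_identity(query, ref), 2),
          round(entropy_weighted_similarity(query, ref, entropies), 2))
```

In this toy case percentage identity ranks the sequence with the conserved-column mismatch higher, while the weighted score ranks it lower, which illustrates the kind of positional information that percentage identity alone discards.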
Cite this as
2017. TaxaSE: Exploiting evolutionary conservation within 16S rDNA sequences for enhanced taxonomic annotation. PeerJ Preprints 5:e2941v1 https://doi.org/10.7287/peerj.preprints.2941v1
Author comment
This is the first version of the TaxaSE pipeline, developed primarily in Java 8 and UNIX bash scripts.
Additional Information
Competing Interests
The authors declare that they have no competing interests.
Author Contributions
Ali Z Ijaz conceived and designed the experiments, performed the experiments, analyzed the data, wrote the paper, prepared figures and/or tables.
Thomas Jeffries conceived and designed the experiments, reviewed drafts of the paper.
Christopher Quince conceived and designed the experiments, reviewed drafts of the paper.
Kelly Hamonts performed the sugarcane sampling and sequencing.
Brajesh Singh conceived and designed the experiments, reviewed drafts of the paper.
Data Deposition
The following information was supplied regarding data availability:
Dropbox location for TaxaSE pipeline and associated data
URL: https://www.dropbox.com/sh/rv7p9vd6ci5qf5x/AAD0gFwPnnpHeFk4FK8v-cl4a?dl=0
Funding
The authors received no funding for this work.