Venomix: A simple bioinformatic pipeline for identifying and characterizing toxin gene candidates from transcriptomic data
- Subject Areas
- Bioinformatics, Data Mining and Machine Learning
- Keywords
- Venom, Transcriptome, Python, Transdecoder, SignalP, Protein
- Copyright
- © 2018 Macrander et al.
- Licence
- This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Preprints) and either DOI or URL of the article must be cited.
- Cite this article
- 2018. Venomix: A simple bioinformatic pipeline for identifying and characterizing toxin gene candidates from transcriptomic data. PeerJ Preprints 6:e26733v1 https://doi.org/10.7287/peerj.preprints.26733v1
Abstract
The advent of next-generation sequencing has resulted in transcriptome-based approaches to investigate functionally significant biological components in a variety of non-model organisms. This has given rise to “venomics”: a rapidly growing field that uses combined transcriptomic and proteomic datasets to characterize toxin diversity in a variety of venomous taxa. The transcriptomic portion of these analyses follows very similar steps after transcriptome assembly: candidate toxin identification using BLAST, expression level screening, protein sequence alignment, gene tree reconstruction, and characterization of potential toxin function. Here we describe the Python package Venomix, which streamlines these processes using commonly used bioinformatic tools along with a public, annotated database of characterized venom proteins. In this study, we use the Venomix pipeline to characterize candidate venom diversity in four phylogenetically distinct organisms: a cone snail (Conidae; Conus sponsalis), a snake (Viperidae; Echis coloratus), an ant (Formicidae; Tetramorium bicarinatum), and a scorpion (Scorpionidae; Urodacus yaschenkoi). Data on these organisms were sampled from public databases, and thus different approaches to transcriptome assembly, toxin identification, or gene expression quantification were used for each. Venomix recovered more candidate toxin transcripts than the original analyses for three of the four transcriptomes. In all four organisms we identified new toxin candidates that were not reported in the original analysis. In summary, we show that the Venomix package is a useful tool to identify and characterize the diversity of toxin-like transcripts. Venomix is available at: https://bitbucket.org/JasonMacrander/Venomix/
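To make the first two steps named in the abstract concrete (candidate identification from BLAST hits, followed by expression level screening), the following is a minimal illustrative sketch. It is not taken from the Venomix code base. The function name `candidate_toxins`, the thresholds `max_evalue` and `min_tpm`, and the example records are all hypothetical, and the input is assumed to be standard 12-column BLAST tabular output (`-outfmt 6`) plus a per-transcript expression table in TPM.

```python
import csv
import io


def candidate_toxins(blast_tsv, expression, max_evalue=1e-5, min_tpm=1.0):
    """Return {transcript_id: best_hit_id} for transcripts passing both filters.

    blast_tsv  -- text in BLAST tabular format (assumed -outfmt 6 column order)
    expression -- dict mapping transcript id to an expression value (assumed TPM)
    Thresholds are illustrative defaults, not values used by Venomix.
    """
    best = {}  # transcript id -> (subject id, e-value) of the lowest e-value hit
    for row in csv.reader(io.StringIO(blast_tsv), delimiter="\t"):
        if not row:
            continue
        # Columns 0, 1 and 10 are qseqid, sseqid and evalue in -outfmt 6.
        query, subject, evalue = row[0], row[1], float(row[10])
        if evalue > max_evalue:
            continue  # hit too weak to count as a toxin candidate
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    # Expression screen: drop candidates below the minimum expression level.
    return {
        query: subject
        for query, (subject, _) in best.items()
        if expression.get(query, 0.0) >= min_tpm
    }


# Hypothetical example records (made up for illustration only).
EXAMPLE_BLAST = "\n".join([
    "tx1\ttoxA\t90.0\t100\t10\t0\t1\t100\t1\t100\t1e-30\t200",
    "tx1\ttoxB\t80.0\t100\t20\t0\t1\t100\t1\t100\t1e-10\t120",
    "tx2\ttoxC\t40.0\t50\t30\t0\t1\t50\t1\t50\t0.5\t30",
    "tx3\ttoxA\t95.0\t100\t5\t0\t1\t100\t1\t100\t1e-40\t250",
])
EXAMPLE_TPM = {"tx1": 12.5, "tx2": 40.0, "tx3": 0.2}

if __name__ == "__main__":
    print(candidate_toxins(EXAMPLE_BLAST, EXAMPLE_TPM))
```

In this toy input, `tx2` is dropped by the e-value filter and `tx3` by the expression filter, leaving `tx1` paired with its strongest hit. The later steps (alignment, gene tree reconstruction, functional characterization) rely on external tools and are not sketched here.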
Author Comment
This is a submission to PeerJ for review.
Supplemental Information
Venomix outputs for C. sponsalis used to write this manuscript
Venomix outputs for E. coloratus used to write this manuscript
Venomix outputs for T. bicarinatum used to write this manuscript
Venomix outputs for original T. bicarinatum assembly used to write this manuscript
Venomix outputs for U. yaschenkoi used to write this manuscript