TED toolkit: a comprehensive approach for convenient transcriptomic profiling as a clinically-oriented application

Hunter College of The City University of New York, Weill Cornell Medicine - Belfer Research Building, New York, NY, USA
Department of Biological Sciences, City University of New York, Hunter College, New York, New York, United States
Joan and Sanford I. Weill Department of Medicine, Weill Medical College of Cornell University, New York, New York, United States
The Department of Biochemistry & Molecular Medicine, The George Washington University Medical Center, George Washington University, Washington, DC, United States
The McCormick Genomic and Proteomic Center, George Washington University, Washington, DC, United States
Department of Physiology and Biophysics, Institute for Computational Biomedicine, Weill Medical College of Cornell University, New York, New York, United States
DOI
10.7287/peerj.preprints.3385v1
Subject Areas
Bioinformatics, Genomics, Translational Medicine, Computational Science, Data Science
Keywords
Transcriptome, Bioinformatics, Galaxy, RNA-sequencing, Workflow, Data Analysis
Copyright
© 2017 Ali et al.
Licence
This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Preprints) and either DOI or URL of the article must be cited.
Cite this article
Ali T, Kim B, Lijeron C, Ogunwobi OO, Mazumder R, Krampis K. 2017. TED toolkit: a comprehensive approach for convenient transcriptomic profiling as a clinically-oriented application. PeerJ Preprints 5:e3385v1

Abstract

In translational medicine, RNA sequencing (RNA-seq) continues to prove powerful, and transforming RNA-seq data into biological insight has become increasingly imperative. We present the Transcriptomics profiler for Easy Discovery (TED) toolkit, a comprehensive approach to processing and analyzing RNA-seq data. TED is divided into three major modules (data quality control, transcriptome data analysis, and data discovery) comprising eleven pipelines in total. These pipelines range from the preliminary steps of assessing and correcting the quality of the RNA-seq data, through the simultaneous analysis of five transcriptomic features (differentially expressed coding, non-coding, novel isoform genes, gene fusions, alternative splicing events, and genetic variants of somatic and germline mutations), to translating the findings of the RNA-seq analysis into actionable, clinically relevant reports. We evaluated TED using previously published prostate cancer transcriptome data, in which we observed previously studied outcomes, and we also created a knowledge database of highly integrated, biologically relevant reports, demonstrating that TED is well positioned for clinical applications. TED is implemented on an instance of the Galaxy platform (Galaxy page: http://galaxy.hunter.cuny.edu/u/bioitcore/p/transcriptomics-profiler-for-easy-discovery-ted-toolkit , Documentation Manual: http://ted.readthedocs.io/en/latest/index.html ) as intuitive and reproducible pipelines. It provides bioinformatics researchers and clinicians alike with a manageable strategy for conducting substantial transcriptome analysis in a routine and sustainable fashion.

Author Comment

This is a preprint submission to PeerJ Preprints.