A tool for the comparison of transcript differential expression analysis pipelines

Stefano Beretta; Yuri Pirola; Valeria Ranzani; Grazisa Rossetti; Raoul Bonnal; Raffaella Rizzi; Gianluca Della Vedova; Massimiliano Pagani; Paola Bonizzoni

doi:10.7287/peerj.preprints.2212v1

NOT PEER-REVIEWED

"PeerJ Preprints" is a venue for early communication or feedback before peer review. Data may be preliminary.

Highlighted in Bioinformatics Italian Society Meeting 2016

Stefano Beretta¹, Yuri Pirola², Valeria Ranzani³, Grazisa Rossetti³, Raoul Bonnal³, Raffaella Rizzi¹, Gianluca Della Vedova⁴, Massimiliano Pagani³, Paola Bonizzoni¹

1 University of Milan - Bicocca, Milan, Italy

2 DISCo, University of Milan - Bicocca, Milan, Italy

3 Istituto Nazionale di Genetica Molecolare (INGM), Milan, Italy

4 Dipartimento di Informatica, Sistemistica e Comunicazione, Università di Milano-Bicocca, Italy

DOI: 10.7287/peerj.preprints.2212v1

Published: 2016-07-02
Accepted: 2016-07-02

Subject Areas: Bioinformatics, Computational Science
Keywords: Differential Expression, RNA-seq data, Transcript Analysis, Long non-coding RNA, Pipeline Comparisons

Copyright: © 2016 Beretta et al.
Licence: This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Preprints) and either DOI or URL of the article must be cited.

Cite this article: Beretta S, Pirola Y, Ranzani V, Rossetti G, Bonnal R, Rizzi R, Della Vedova G, Pagani M, Bonizzoni P. 2016. A tool for the comparison of transcript differential expression analysis pipelines. PeerJ Preprints 4:e2212v1 https://doi.org/10.7287/peerj.preprints.2212v1

Abstract

MOTIVATION

Long non-coding RNAs (lncRNAs) have recently gained interest, especially for their involvement in controlling several cell processes, but a full understanding of their role is still lacking. Differential Expression (DE) analysis is one of the most important tasks in the analysis of RNA-seq data, since it can point out genes involved in the regulation of the condition under study. However, a classical gene-level analysis may disregard the role of Alternative Splicing (AS) in regulating cell conditions. This is the case, for example, when a given gene is expressed in all conditions, but the expressed isoform differs significantly between conditions (that is, an isoform switch). A transcript-level analysis may shed better light on this case, especially in studies whose goal is, for example, a better understanding of the behavior of lncRNAs in T lymphocytes, which are fundamental in studies of specific diseases, such as cancer. After Cufflinks/Cuffdiff, several approaches for DE analysis at the isoform/transcript level have been proposed. However, their results are often sensitive to the upstream analysis steps, such as read mapping, transcript reconstruction and quantification, and it is often hard to choose a priori the most appropriate combination of tools. This work presents a tool for assisting the user in this choice, and lays the groundwork for a study devoted to the characterization of lncRNAs and the identification of isoform switch events. Our tool includes a framework for the description and execution of a set of DE pipelines over the same input dataset, as well as a set of tools for reconciling and comparing the results.
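To make the isoform switch scenario concrete, the following is a minimal toy sketch in Python. The expression values, the condition names and the helper functions are all hypothetical and chosen only for illustration; they are not taken from the preprint, and a real analysis would rely on statistical testing over replicates rather than a direct comparison of totals.

```python
# Toy abundance values (made-up numbers) for one hypothetical gene
# with two isoforms, measured in two conditions.
expression = {
    "condition_A": {"isoform_1": 90.0, "isoform_2": 10.0},
    "condition_B": {"isoform_1": 10.0, "isoform_2": 90.0},
}


def gene_level(isoforms):
    """Gene-level abundance: the sum over all isoforms of the gene."""
    return sum(isoforms.values())


def dominant_isoform(isoforms):
    """Name of the most abundant isoform."""
    return max(isoforms, key=isoforms.get)


gene_totals = {cond: gene_level(iso) for cond, iso in expression.items()}
dominant = {cond: dominant_isoform(iso) for cond, iso in expression.items()}

# The gene-level totals are identical, so a gene-level comparison sees no
# change, while the dominant isoform differs between the two conditions.
is_switch = (
    gene_totals["condition_A"] == gene_totals["condition_B"]
    and dominant["condition_A"] != dominant["condition_B"]
)

print(gene_totals)
print(dominant)
print("isoform switch:", is_switch)
```

In this toy case the gene would look unchanged at gene level, whereas a transcript-level view exposes the change in the expressed isoform.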

METHOD

We designed an automated and easily customizable tool that executes a set of existing pipelines for DE analysis at transcript level, starting from RNA-seq data. Our method is built upon Snakemake, a workflow management system designed to reduce the complexity of creating workflows. This approach guarantees that the experimentation is fully replicable and easy to customize. Each considered pipeline is structured in three steps: (i) transcript assembly, (ii) quantification, and (iii) DE analysis. By default, our tool builds and compares 9 different pipelines, each taking the same set of RNA-seq reads as input. The pipelines are obtained by combining different state-of-the-art methods for transcript assembly (TA step) with different state-of-the-art methods for quantification and differential expression analysis (Q+DE step). More precisely, the 9 pipelines combine two tools (Cufflinks and StringTie) and a Reference Annotation (Ensembl annotated transcripts) for the TA step with three tools (Cuffquant+Cuffdiff, StringTie-B+Ballgown and Kallisto+Sleuth) for the Q+DE step.
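The 9 default pipelines described above are the Cartesian product of the three TA options and the three Q+DE options. The sketch below enumerates them in Python; the variable and function names are hypothetical and this is not the tool's actual Snakemake code, only an illustration of how the combinations arise.

```python
from itertools import product

# Options named in the abstract for each step.
TA_OPTIONS = ["Cufflinks", "StringTie", "Ensembl reference annotation"]
QDE_OPTIONS = ["Cuffquant+Cuffdiff", "StringTie-B+Ballgown", "Kallisto+Sleuth"]


def build_pipelines(ta_options, qde_options):
    """Pair every TA option with every Q+DE option (hypothetical helper)."""
    return list(product(ta_options, qde_options))


pipelines = build_pipelines(TA_OPTIONS, QDE_OPTIONS)

for index, (ta, qde) in enumerate(pipelines, start=1):
    print(f"pipeline {index}: TA={ta} | Q+DE={qde}")
```

Adding or removing an option in either list changes the set of compared pipelines accordingly, which mirrors the customizability the abstract attributes to the workflow-based design.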

Abstract truncated at 3,000 characters - the full version is available in the pdf file

Author Comment

This is an abstract which has been accepted for the BITS2016 Meeting.
