RNA sequencing data: hitchhiker's guide to expression analysis

Bioinformatics Institute and Department of Applied Mathematics, Computer Science and Statistics, University of Ghent, Ghent, Belgium
Institute of Molecular Life Sciences and SIB Swiss Institute of Bioinformatics, University of Zurich, Zurich, Switzerland
Friedrich Miescher Institute for Biomedical Research and SIB Swiss Institute of Bioinformatics, Basel, Switzerland
Department of Biostatistics and Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, United States
Department of Computer Science, Stony Brook University, Stony Brook, United States
DOI
10.7287/peerj.preprints.27283v2
Subject Areas
Bioinformatics, Computational Biology, Genomics, Data Science
Keywords
RNA sequencing, Statistical bioinformatics, Gene expression, Differential expression, Isoform quantification, Differential splicing
Copyright
© 2018 Van Den Berge et al.
Licence
This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Preprints) and either DOI or URL of the article must be cited.
Cite this article
Van Den Berge K, Hembach K, Soneson C, Tiberi S, Clement L, Love MI, Patro R, Robinson M. 2018. RNA sequencing data: hitchhiker's guide to expression analysis. PeerJ Preprints 6:e27283v2

Abstract

Gene expression is the fundamental level at which the results of various genetic and regulatory programs are observable. The measurement of transcriptome-wide gene expression has convincingly switched from microarrays to sequencing in a matter of years. RNA sequencing (RNA-seq) provides a quantitative and open system for profiling transcriptional outcomes on a large scale and therefore facilitates a large diversity of applications, ranging from basic science studies to agricultural and clinical settings. In the past 10 years or so, much has been learned about the characteristics of RNA-seq datasets as well as the performance of the myriad methods developed. In this review, we give an overview of the developments in RNA-seq data analysis, including experimental design, with an explicit focus on quantification of gene expression and statistical approaches for differential expression. We also highlight emerging data types, such as single-cell RNA-seq and gene expression profiling using long-read technologies.

Author Comment

Overall, the manuscript has received a minor revision. We have added a few important references, re-worded some text based on feedback and shortened the text in a few places.