Comprehensive comparison of large-scale tissue expression datasets

Alberto Santos; Kalliopi Tsafou; Christian Stolte; Sune Pletscher-Frankild; Seán I O’Donoghue; Lars Juhl Jensen

doi:10.7287/peerj.preprints.1072v1

Alberto Santos¹, Kalliopi Tsafou¹, Christian Stolte², Sune Pletscher-Frankild¹,³, Seán I O’Donoghue⁴,⁵, Lars Juhl Jensen⁶

1 Cellular Network Biology Group, NNF Center for Protein Research, University of Copenhagen, Copenhagen, Denmark

2 Digital Productivity, CSIRO, Sydney, NSW, Australia

3 Ferring Pharmaceuticals, Copenhagen, Denmark

4 Computational Informatics, CSIRO, Sydney, Australia

5 Garvan Institute of Medical Research, Sydney, Australia

6 Cellular Network Biology Group, NNF Center for Protein Research, University of Copenhagen, Copenhagen, Denmark

Published: 2015-05-14
Accepted: 2015-05-14

Subject Areas: Computational Biology, Genomics
Keywords: Immunohistochemistry, RNA sequencing, Tissue expression, Mass spectrometry, Microarrays, Databases, Tissue-specificity

Licence: This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ PrePrints) and either DOI or URL of the article must be cited.

Cite this article: Santos A, Tsafou K, Stolte C, Pletscher-Frankild S, O’Donoghue SI, Jensen LJ. 2015. Comprehensive comparison of large-scale tissue expression datasets. PeerJ PrePrints 3:e1072v1 https://doi.org/10.7287/peerj.preprints.1072v1

Abstract

To carry out their functions, tissues rely on the right proteins being present. Several high-throughput technologies have been used to map which proteins are expressed in which tissues; however, the resulting data have not previously been systematically compared and integrated. We present a comprehensive evaluation of tissue expression data from a variety of experimental techniques and show that these agree surprisingly well with each other and with results from literature curation and text mining. We further find that most datasets support the distinction between tissue-specific and ubiquitous expression, which has been assumed but not demonstrated. By developing comparable confidence scores for all types of evidence, we show that combining the datasets improves both quality and coverage. To facilitate use and visualization of our work, we have developed the TISSUES resource (http://tissues.jensenlab.org), which makes all the scored and integrated data available through a single user-friendly web interface.

Author Comment

This is a submission to PeerJ for review.