This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ PrePrints) and either DOI or URL of the article must be cited.
Cite this article
Ahrens M, Turewicz M, Marcus K, Meyer HE, May C, Eisenacher M, Rahnenführer J. 2015. FSOL - a workflow for the detection of patient subgroups and affected molecular features in high-throughput omics data. PeerJ PrePrints 3:e1305v1 https://doi.org/10.7287/peerj.preprints.1305v1
In personalized medicine, one major goal is the identification of yet unknown patient subgroups with specific gene or protein expression. Different subgroups can indicate different molecular subtypes of a disease. These subtypes might correlate with disease progression, prognosis or therapy response, and the subgroup-specific genes or proteins are potential drug targets. Using high-throughput molecular data, the aim is to characterize a patient subgroup by identifying both the set of samples that shows a distinct expression pattern and the set of features that are affected. We present FSOL, a new workflow for the identification of patient subgroups in two-sample comparisons (e.g. healthy vs. diseased). First, features are pre-filtered with the univariate score FisherSum (FS), which assesses the subgroup-specific expression of each feature; FS has been shown to outperform competing methods in several settings. Second, the selected features are compared with respect to the samples that form the affected subgroup. This step uses the OrderedList (OL) method, which was originally developed for comparing result lists from gene expression studies. We compare FSOL to a reference workflow based on biclustering, using both real-world and simulated data. On a leukemia data set, FSOL detects a true biological subgroup with higher stability. On simulated data, FSOL shows higher sensitivity and accuracy than biclustering, especially for small to moderate differences. The exploratory approach FSOL may help to identify yet unknown mechanisms in pathologic processes and may assist in generating new research hypotheses.
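To make the second step more concrete, the following is a minimal Python sketch of a weighted top-n overlap score between two ranked lists, in the spirit of OrderedList. As an illustrative assumption (not necessarily how FSOL applies OL), each selected feature is taken to induce a ranking of samples by expression, so two features affecting the same subgroup should place the same samples near the top of their rankings. The exponential weights exp(-alpha*n), the parameter `alpha`, and the helper names `rank_samples` and `weighted_overlap` are hypothetical choices for this sketch; the published OL method includes further elements that are omitted here.

```python
import math
from typing import Dict, List, Optional, Sequence


def rank_samples(values: Dict[str, float]) -> List[str]:
    """Hypothetical helper: order sample IDs by expression, highest first."""
    return sorted(values, key=values.get, reverse=True)


def weighted_overlap(
    list_a: Sequence[str],
    list_b: Sequence[str],
    alpha: float = 0.1,
    max_n: Optional[int] = None,
) -> float:
    """Weighted top-n overlap of two rankings (a sketch, not the OL package).

    score = sum over n = 1..max_n of exp(-alpha * n) * |top_n(a) & top_n(b)|
    Larger alpha puts more emphasis on agreement at the very top of the lists.
    """
    if alpha < 0:
        raise ValueError("alpha must be non-negative")
    if max_n is None:
        max_n = min(len(list_a), len(list_b))
    max_n = min(max_n, len(list_a), len(list_b))
    score = 0.0
    for n in range(1, max_n + 1):
        # Number of samples shared by the top-n entries of both rankings.
        overlap = len(set(list_a[:n]) & set(list_b[:n]))
        score += math.exp(-alpha * n) * overlap
    return score


if __name__ == "__main__":
    # Invented toy values: both features are high in samples p1 and p2.
    feature_1 = {"p1": 5.0, "p2": 4.8, "p3": 1.0, "p4": 0.9}
    feature_2 = {"p1": 6.1, "p2": 5.9, "p3": 0.5, "p4": 1.2}
    r1, r2 = rank_samples(feature_1), rank_samples(feature_2)
    print(r1, r2, weighted_overlap(r1, r2, alpha=0.5))
```

A pairwise score like this rewards features whose top-ranked samples coincide, which is one way to group features by the subgroup they affect; judging whether a score is larger than expected by chance would additionally require a reference distribution, for example from randomly permuted rankings.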
The last two authors, Martin Eisenacher and Jörg Rahnenführer, contributed equally. This work has been presented at the German Conference on Bioinformatics 2015.
Although this article has not undergone full peer review, it was reviewed by members of the program committee of the German Conference on Bioinformatics (2015), and most of their comments have been addressed.