A new GRASP metaheuristic for biclustering of gene expression data

Department of Mathematics and Applications "R. Caccioppoli", University of Napoli FEDERICO II, Napoli, Italia
Institute of Food Science, CNR, Avellino, Italia
Department of Chemistry and Biology, University of Salerno, Fisciano (SA), Italia
DOI: 10.7287/peerj.preprints.1679v1
Subject Areas: Bioinformatics, Computational Biology, Scientific Computing and Simulation
Keywords: Computational Biology, GRASP metaheuristic, Gene expression data, Combinatorial optimization
Copyright: © 2016 Ferone et al.
Licence: This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ PrePrints) and either DOI or URL of the article must be cited.
Cite this article: Ferone D, Facchiano A, Marabotti A, Festa P. 2016. A new GRASP metaheuristic for biclustering of gene expression data. PeerJ PrePrints 4:e1679v1

Abstract

The term biclustering refers to the simultaneous clustering of both genes and conditions. This task has generated considerable interest over the past few decades, particularly in the analysis of high-dimensional gene expression data for information retrieval, knowledge discovery, and data mining [1]. Since the problem has been shown to be NP-complete, we recently designed and implemented a GRASP (Greedy Randomized Adaptive Search Procedure) metaheuristic [2,3,4]. The greedy criterion of the construction phase relies on the Euclidean distance to build spanning trees of the graph representing the input data matrix. Once a complete solution has been obtained, the local search procedure tries both to enlarge the current solution and to improve its H-score by exchanging rows and columns. The proposed approach has been tested on five synthetic datasets [5]: 1) constant biclusters; 2) constant, upregulated biclusters; 3) shift-scale biclusters; 4) shift biclusters; and 5) scale biclusters. Compared with state-of-the-art competitors, its behaviour is excellent on shift datasets and very good on all other datasets except the scaled ones. To improve its behaviour on scaled data as well, and to reduce running times, we have designed and preliminarily tested a variant of the existing GRASP whose local search phase returns an approximate locally optimal solution. The resulting algorithm promises to be a more efficient, general, and robust method for biclustering all kinds of biological data.
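The abstract describes the two GRASP phases only in words, so the Python sketch below illustrates them under explicit assumptions; it is not the authors' implementation. It assumes that the H-score is the mean squared residue of Cheng and Church, H(I,J) = (1/(|I||J|)) Σ_{i∈I, j∈J} (a_ij − a_iJ − a_Ij + a_IJ)², where a_iJ, a_Ij and a_IJ are the row, column and bicluster means (lower is better, and a pure shift bicluster scores 0). The construction is shown as a randomized Prim's algorithm over genes with Euclidean edge weights and a standard GRASP restricted candidate list controlled by a parameter alpha; how the resulting trees become biclusters is not described in the abstract and is not shown. The local search is a plain first-improvement search with an acceptance threshold delta, not the approximate variant mentioned above. All function and parameter names (mean_squared_residue, randomized_prim, local_search, alpha, delta) are hypothetical.

```python
# Hypothetical sketch: names, the MSR-based H-score and the RCL rule are assumptions,
# not the code described in the abstract.
import math
import random
from typing import Iterable, List, Tuple

Matrix = List[List[float]]


def mean_squared_residue(a: Matrix, rows: Iterable[int], cols: Iterable[int]) -> float:
    """Cheng-Church mean squared residue H(I, J), assumed here to be the H-score."""
    rows, cols = sorted(rows), sorted(cols)
    row_mean = {i: sum(a[i][j] for j in cols) / len(cols) for i in rows}
    col_mean = {j: sum(a[i][j] for i in rows) / len(rows) for j in cols}
    all_mean = sum(row_mean.values()) / len(rows)
    total = sum((a[i][j] - row_mean[i] - col_mean[j] + all_mean) ** 2
                for i in rows for j in cols)
    return total / (len(rows) * len(cols))


def randomized_prim(a: Matrix, alpha: float,
                    rng: random.Random) -> List[Tuple[int, int, float]]:
    """Greedy randomized spanning tree over genes (rows), weighted by Euclidean distance.

    The next gene is drawn from a restricted candidate list (RCL) of outside genes whose
    connection cost is within alpha * (max - min) of the cheapest; alpha = 0 is plain Prim.
    """
    n = len(a)
    root = rng.randrange(n)
    # cheapest known connection (distance, tree endpoint) for every gene not yet in the tree
    best = {v: (math.dist(a[root], a[v]), root) for v in range(n) if v != root}
    edges = []
    while best:
        lo = min(d for d, _ in best.values())
        hi = max(d for d, _ in best.values())
        rcl = [v for v, (d, _) in best.items() if d <= lo + alpha * (hi - lo)]
        u = rng.choice(rcl)
        d_u, parent = best.pop(u)
        edges.append((parent, u, d_u))
        for v in best:  # only values change, so updating while iterating is safe
            d = math.dist(a[u], a[v])
            if d < best[v][0]:
                best[v] = (d, u)
    return edges


def local_search(a: Matrix, rows: Iterable[int], cols: Iterable[int],
                 delta: float) -> Tuple[List[int], List[int]]:
    """First-improvement search: enlarge while H <= delta, then swap rows/columns to lower H."""
    rows, cols = set(rows), set(cols)
    n, m = len(a), len(a[0])
    improved = True
    while improved:
        improved = False
        # 1) enlargement: add any outside row or column that keeps H within delta
        for i in range(n):
            if i not in rows and mean_squared_residue(a, rows | {i}, cols) <= delta:
                rows.add(i)
                improved = True
        for j in range(m):
            if j not in cols and mean_squared_residue(a, rows, cols | {j}) <= delta:
                cols.add(j)
                improved = True
        # 2) exchange: replace one inside row (then column) by an outside one if H drops
        h = mean_squared_residue(a, rows, cols)
        for is_row, size in ((True, n), (False, m)):
            inside = rows if is_row else cols
            for x_in in list(inside):
                for x_out in range(size):
                    if x_out in inside:
                        continue
                    cand = (inside - {x_in}) | {x_out}
                    new_rows, new_cols = (cand, cols) if is_row else (rows, cand)
                    h_new = mean_squared_residue(a, new_rows, new_cols)
                    if h_new < h:
                        rows, cols, h, inside, improved = new_rows, new_cols, h_new, cand, True
                        break
    return sorted(rows), sorted(cols)
```

With alpha = 0 the construction reduces to Prim's minimum spanning tree, and with alpha = 1 every outside gene is a candidate; intermediate values trade greediness for diversity across GRASP iterations. Recomputing H from scratch for every candidate move keeps the sketch short but costs O(|I||J|) per evaluation; a faster implementation would update the row and column means incrementally.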

Author Comment

This is the abstract of a presentation given at the BBCC2015 conference.