A new GRASP metaheuristic for biclustering of gene expression data

Daniele Ferone; Angelo Facchiano; Anna Marabotti; Paola Festa

doi:10.7287/peerj.preprints.1679v1

Daniele Ferone¹, Angelo Facchiano², Anna Marabotti³, Paola Festa¹

1 Department of Mathematics and Applications "R. Caccioppoli", University of Napoli FEDERICO II, Napoli, Italia

2 Institute of Food Science, CNR, Avellino, Italia

3 Department of Chemistry and Biology, University of Salerno, Fisciano (SA), Italia

Published: 2016-01-26
Accepted: 2016-01-26

Subject Areas: Bioinformatics, Computational Biology, Scientific Computing and Simulation
Keywords: Computational Biology, GRASP metaheuristic, Gene expression data, Combinatorial optimization

Licence: This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ PrePrints) and either DOI or URL of the article must be cited.

Cite this article: Ferone D, Facchiano A, Marabotti A, Festa P. 2016. A new GRASP metaheuristic for biclustering of gene expression data. PeerJ PrePrints 4:e1679v1 https://doi.org/10.7287/peerj.preprints.1679v1

Abstract

The term biclustering denotes the simultaneous clustering of both genes and conditions. This task has generated considerable interest over the past few decades, particularly in connection with the analysis of high-dimensional gene expression data in information retrieval, knowledge discovery, and data mining [1]. Since the problem has been shown to be NP-complete, we have recently designed and implemented a GRASP metaheuristic [2,3,4]. The greedy criterion of the construction phase uses the Euclidean distance to build spanning trees of the graph representing the input data matrix. Once a complete solution has been obtained, the local search procedure tries both to enlarge the current solution and to improve its H-score by exchanging rows and columns. The proposed approach has been tested on 5 synthetic datasets [5]: 1) constant biclusters; 2) constant, upregulated biclusters; 3) shift-scale biclusters; 4) shift biclusters; and 5) scale biclusters. Compared with state-of-the-art competitors, its behaviour is excellent on shift datasets and very good on all other datasets except the scale ones. To improve its behaviour on scale data as well and to reduce running times, we have designed and preliminarily tested a variant of the existing GRASP whose local search phase returns an approximate locally optimal solution. The resulting algorithm promises to be a more efficient, general, and robust method for biclustering all kinds of biological data.
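The abstract refers to the H-score without defining it. As a minimal sketch, assuming it is the mean squared residue commonly used in the biclustering literature (the average of the squared residues a_ij - a_iJ - a_Ij + a_IJ over the selected rows I and columns J), it could be computed as follows. The function name `h_score` and the toy matrices are illustrative assumptions, not taken from the authors' implementation.

```python
def h_score(matrix, rows, cols):
    """Mean squared residue of the submatrix selected by rows and cols.

    Assumption: this is the quantity the abstract calls the H-score.
    """
    n_r, n_c = len(rows), len(cols)
    # Mean of each selected row over the selected columns, and vice versa.
    row_mean = {i: sum(matrix[i][j] for j in cols) / n_c for i in rows}
    col_mean = {j: sum(matrix[i][j] for i in rows) / n_r for j in cols}
    # Mean of the whole submatrix.
    all_mean = sum(row_mean.values()) / n_r
    total = 0.0
    for i in rows:
        for j in cols:
            residue = matrix[i][j] - row_mean[i] - col_mean[j] + all_mean
            total += residue * residue
    return total / (n_r * n_c)


# Toy data (hypothetical): one base profile, varied per row.
base = [1.0, 2.0, 4.0]
shift = [[b + s for b in base] for s in (0.0, 3.0, 5.0)]  # additive row offsets
scale = [[b * s for b in base] for s in (1.0, 2.0, 3.0)]  # multiplicative row factors

h_shift = h_score(shift, range(3), range(3))  # zero up to rounding error
h_scale = h_score(scale, range(3), range(3))  # strictly positive
```

Under this definition a purely additive (shift) pattern has zero residue, whereas a purely multiplicative (scale) pattern does not. This is a property of the score itself; the abstract does not state whether it accounts for the weaker behaviour reported on scale datasets.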

Author Comment

This is an abstract of the presentation at the BBCC2015 conference.