Finding biologically significant biclusters: a new function for co-expression evaluation

Systems and Computation, Technological Institute of La Paz, La Paz, Baja California Sur, Mexico
Computer Sciences, Center for Scientific Research and Higher Education of Ensenada, Ensenada, Baja California, Mexico
DOI
10.7287/peerj.preprints.3110v2
Subject Areas
Bioinformatics, Computational Biology
Keywords
biclustering, gene expression, DNA microarray data, shifting and scaling patterns, biologically significant, genetic algorithms
Copyright
© 2017 Luna-Taylor et al.
Licence
This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Preprints) and either DOI or URL of the article must be cited.
Cite this article
Luna-Taylor JE, Brizuela CA, Alvarado IN. 2017. Finding biologically significant biclusters: a new function for co-expression evaluation. PeerJ Preprints 5:e3110v2

Abstract

Analysis of DNA microarray data has been very useful for experimental molecular biology, as it provides unprecedented opportunities to study a wide variety of biological processes. Within this analysis, biclustering has become established as one of the first steps in the discovery of new knowledge. Biclustering consists of identifying groups of genes that exhibit coherent behavior patterns across a subset of experimental conditions. The measure used to assess this coherence is a key factor in the quality of the discovered biclusters. In this paper, we propose a new function (VF) to evaluate the coherence of biclusters. This function recognizes shifting patterns, as well as positive and negative scaling patterns, more efficiently than well-known previously reported functions with a similar purpose. The VF function also identifies positive and negative scaling subpatterns, which may be of biological interest and have not previously been discussed in the literature. To assess the performance of the VF function, a biclustering genetic algorithm (BGA_VF) was also designed and tested on both synthetic and real data. The results show that the BGA_VF algorithm obtains high percentages of significant biclusters and recognizes all the analyzed combinations of coherence patterns.
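To make the notions of shifting and scaling patterns concrete, the following minimal sketch builds a small synthetic bicluster in which every gene row is a shifted and/or scaled copy of one base profile, and then scores its coherence. The score used here, the mean absolute Pearson correlation between row pairs, is a generic stand-in chosen for illustration only; it is not the VF function proposed in this paper. The names `make_row`, `mean_abs_correlation`, `alpha` and `beta`, as well as all numeric values, are assumptions introduced for this example.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # constant row: correlation is undefined, treated as 0 here
    return cov / (sx * sy)


def make_row(base, alpha, beta):
    """Hypothetical gene row: base profile scaled by alpha and shifted by beta."""
    return [alpha * v + beta for v in base]


def mean_abs_correlation(rows):
    """Illustrative coherence score (NOT the VF function): mean |r| over row pairs."""
    pairs = list(combinations(rows, 2))
    return sum(abs(pearson(a, b)) for a, b in pairs) / len(pairs)


# Base expression profile over five conditions (made-up values).
base = [1.0, 4.0, 2.0, 8.0, 5.0]

bicluster = [
    make_row(base, 1.0, 0.0),    # the base profile itself
    make_row(base, 1.0, 3.0),    # shifting pattern
    make_row(base, 2.5, 0.0),    # positive scaling pattern
    make_row(base, -1.5, 10.0),  # negative scaling combined with a shift
]

# Two unrelated rows (made-up values) that do not follow the base profile.
noise = [[3.0, 1.0, 7.0, 2.0, 6.0], [5.0, 5.5, 1.0, 4.0, 9.0]]

coherent_score = mean_abs_correlation(bicluster)
mixed_score = mean_abs_correlation(bicluster + noise)

print(f"coherent bicluster: {coherent_score:.3f}")
print(f"with unrelated rows: {mixed_score:.3f}")
```

Taking the absolute value of the correlation is what lets this stand-in score treat negatively scaled rows as coherent with the rest; the rows following the base profile score close to 1, and adding unrelated rows lowers the score.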

Author Comment

In some parts of the previous version of the document, including the figures, the algorithm and the appendices, the variable p was used instead of r. This has been corrected.

Supplemental Information

Code that implements the concept

The algorithm designed in this project was implemented in Microsoft Visual C# 2012.

DOI: 10.7287/peerj.preprints.3110v2/supp-1

Results obtained from the Yeast dataset

Results obtained by the algorithm from the Yeast dataset, including execution parameters, execution time and best discovered biclusters.

DOI: 10.7287/peerj.preprints.3110v2/supp-2

Statistical significance for the discovered biclusters from the Yeast dataset

Evaluation, using the AGO tool, of the statistical significance of the biclusters obtained by the algorithm from the Yeast dataset.

DOI: 10.7287/peerj.preprints.3110v2/supp-3

Statistical significance for the discovered biclusters without overlap from the Yeast dataset

Evaluation of the statistical significance of the biclusters discovered from the Yeast dataset, after filtering to retain only biclusters that overlap by no more than 25%.

DOI: 10.7287/peerj.preprints.3110v2/supp-4

Results obtained from the Steminal dataset

Results obtained by the algorithm from the Steminal dataset, including execution parameters, execution time and best discovered biclusters.

DOI: 10.7287/peerj.preprints.3110v2/supp-5

Statistical significance for the discovered biclusters from the Steminal dataset

Evaluation, using the g:Profiler software with the Bonferroni correction, of the statistical significance of the biclusters obtained by the algorithm from the Steminal dataset.

DOI: 10.7287/peerj.preprints.3110v2/supp-6

Results obtained from the Leukemia dataset

Results obtained by the algorithm from the Leukemia dataset, including execution parameters, execution time and best discovered biclusters.

DOI: 10.7287/peerj.preprints.3110v2/supp-7

Statistical significance for the discovered biclusters from the Leukemia dataset

Evaluation, using the g:Profiler software with the Bonferroni correction, of the statistical significance of the biclusters obtained by the algorithm from the Leukemia dataset.

DOI: 10.7287/peerj.preprints.3110v2/supp-8