Finding biologically significant biclusters: a new function for co-expression evaluation
- Subject Areas
- Bioinformatics, Computational Biology
- Keywords
- biclustering, gene expression, DNA microarray data, shifting and scaling patterns, biologically significant, genetic algorithms
- Copyright
- © 2017 Luna-Taylor et al.
- Licence
- This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Preprints) and either DOI or URL of the article must be cited.
- Cite this article
- 2017. Finding biologically significant biclusters: a new function for co-expression evaluation. PeerJ Preprints 5:e3110v2 https://doi.org/10.7287/peerj.preprints.3110v2
Abstract
Analysis of DNA microarray data has been very useful for experimental molecular biology, as it provides unprecedented opportunities to study a wide variety of biological processes. As part of this analysis, biclustering has become established as one of the first steps in the discovery of new knowledge. Biclustering consists of identifying groups of genes that show coherent behavior patterns across a subset of experimental conditions. The measure used to assess this coherence is a key factor in the quality of the discovered biclusters. In this paper, we propose a new function (VF) to evaluate the coherence of biclusters. This function recognizes shifting patterns, as well as positive and negative scaling patterns, more efficiently than well-known functions reported for a similar purpose. The VF function also identifies positive and negative scaling subpatterns, which may be of biological interest and have not previously been discussed in the literature. To assess the performance of the VF function, a biclustering genetic algorithm (BGA_VF) was also designed and tested on both synthetic and real data. The results show that the BGA_VF algorithm obtains high percentages of significant biclusters and recognizes all the analyzed combinations of coherence patterns.
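To make the coherence patterns named in the abstract concrete, the following minimal sketch builds a small synthetic bicluster in which every gene row is derived from one base profile by shifting (adding a constant), positive scaling, negative scaling, or a combination of these. It is not the VF function, whose definition is not given on this page. As a stand-in it uses the well-known property that rows related by shifting and scaling have a Pearson correlation of +1 or -1. The names `pearson`, `make_row`, `min_abs_correlation`, and the numeric values are illustrative assumptions.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def make_row(base, alpha, beta):
    """Derive a gene row from the base profile: alpha scales, beta shifts."""
    return [alpha * value + beta for value in base]


# Hypothetical base expression profile over four conditions.
base = [1.0, 4.0, 2.0, 6.0]

bicluster = [
    make_row(base, 1.0, 0.0),    # the base profile itself
    make_row(base, 1.0, 3.0),    # shifting pattern
    make_row(base, 2.5, 0.0),    # positive scaling pattern
    make_row(base, -1.5, 0.0),   # negative scaling pattern
    make_row(base, -0.5, 2.0),   # combined shifting and negative scaling
]


def min_abs_correlation(rows):
    """Smallest absolute pairwise Pearson correlation among the rows."""
    return min(
        abs(pearson(rows[i], rows[j]))
        for i in range(len(rows))
        for j in range(i + 1, len(rows))
    )


# Every pair of rows is perfectly (anti-)correlated, up to rounding.
print(round(min_abs_correlation(bicluster), 9))
```

The sign of the correlation separates positively from negatively scaled rows: `pearson(bicluster[0], bicluster[3])` is negative, while `pearson(bicluster[0], bicluster[2])` is positive.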
Author Comment
In some parts of the document, including the figures, the algorithm and the appendices, we used the variable p instead of r. This has been corrected.
Supplemental Information
Code that implements the concept
The algorithm designed in this project was implemented in Microsoft Visual C# 2012.
Results obtained from the Yeast dataset
Results obtained by the algorithm from the Yeast dataset, including execution parameters, execution time and best discovered biclusters.
Statistical significance for the discovered biclusters from the Yeast dataset
Evaluation of the statistical significance of the biclusters obtained by the algorithm from the Yeast dataset, using the AGO tool.
Statistical significance for the discovered biclusters without overlap from the Yeast dataset
Evaluation of the statistical significance of the biclusters discovered from the Yeast dataset, after retaining only biclusters that overlap by no more than 25%.
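The 25% overlap filter mentioned above could be sketched as follows. This page does not say how overlap is measured or in which order biclusters are considered, so the sketch makes two assumptions: overlap is the number of shared cells divided by the size of the smaller bicluster, and biclusters arrive sorted from best to worst so that a greedy pass keeps the better one of any overlapping pair. The names `overlap` and `filter_by_overlap` and the example data are hypothetical.

```python
def overlap(a, b):
    """Fraction of shared cells relative to the smaller bicluster (assumed measure).

    Each bicluster is a (genes, conditions) pair of sets.
    """
    genes_a, conds_a = a
    genes_b, conds_b = b
    shared = len(genes_a & genes_b) * len(conds_a & conds_b)
    smaller = min(len(genes_a) * len(conds_a), len(genes_b) * len(conds_b))
    return shared / smaller if smaller else 0.0


def filter_by_overlap(biclusters, max_overlap=0.25):
    """Greedily keep biclusters whose overlap with every kept one is at most max_overlap."""
    kept = []
    for candidate in biclusters:
        if all(overlap(candidate, other) <= max_overlap for other in kept):
            kept.append(candidate)
    return kept


# Hypothetical biclusters, assumed to be ordered from best to worst.
b1 = ({"g1", "g2", "g3", "g4"}, {"c1", "c2"})
b2 = ({"g1", "g2", "g3", "g5"}, {"c1", "c2"})  # shares 6 of 8 cells with b1
b3 = ({"g4", "g6", "g7", "g8"}, {"c2", "c3"})  # shares 1 of 8 cells with b1

selected = filter_by_overlap([b1, b2, b3])
print(len(selected))
```

With these inputs, `b2` is dropped because it overlaps `b1` by 75%, while `b3` is kept because its overlap with `b1` is 12.5%.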
Results obtained from the Steminal dataset
Results obtained by the algorithm from the Steminal dataset, including execution parameters, execution time and best discovered biclusters.
Statistical significance for the discovered biclusters from the Steminal dataset
Evaluation of the statistical significance of the biclusters obtained by the algorithm from the Steminal dataset, using the g:Profiler software with the Bonferroni correction.
Results obtained from the Leukemia dataset
Results obtained by the algorithm from the Leukemia dataset, including execution parameters, execution time and best discovered biclusters.
Statistical significance for the discovered biclusters from the Leukemia dataset
Evaluation of the statistical significance of the biclusters obtained by the algorithm from the Leukemia dataset, using the g:Profiler software with the Bonferroni correction.