Collecting reliable clades using the Greedy Strict Consensus Merger

Markus Fleischauer; Sebastian Böcker

doi:10.7287/peerj.preprints.1297v3

Lehrstuhl für Bioinformatik, Friedrich-Schiller Universität Jena, Jena, Thüringen, Germany

Published: 2015-09-10
Accepted: 2015-09-10

Subject Areas: Bioinformatics, Computational Biology, Algorithms and Analysis of Algorithms
Keywords: Consensus, Supertree, Supermatrix, Divide and Conquer, FlipCut, Phylogeny

Licence: This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ PrePrints) and either DOI or URL of the article must be cited.

Cite this article: Fleischauer M, Böcker S. 2015. Collecting reliable clades using the Greedy Strict Consensus Merger. PeerJ PrePrints 3:e1297v3 https://doi.org/10.7287/peerj.preprints.1297v3

Abstract

Supertree methods combine a set of phylogenetic trees into a single supertree. Similar to supermatrix methods, these methods provide a way to reconstruct larger parts of the Tree of Life, potentially evading the computational complexity of phylogenetic inference methods such as maximum likelihood. The supertree problem can be formalized in different ways to cope with contradictory information in the input, and many supertree methods have been developed. Some of them solve NP-hard optimization problems, such as the well-known Matrix Representation with Parsimony; others, such as FlipCut, have polynomial worst-case running time but work in a greedy fashion. Both kinds of methods can profit from a set of clades that are already known to be part of the supertree. The Superfine approach shows how the Greedy Strict Consensus Merger (GSCM) can be used as a preprocessing step to find these clades. We introduce different scoring functions for the GSCM, a randomization, and a combination of both, so that the GSCM finds more clades. This, in turn, helps to improve the resolution of the final supertree. We find that these modifications increase the number of true positive clades by 16% while decreasing the number of false positive clades by 3% compared to the currently used Overlap scoring.

Author Comment

This work was presented at the German Conference on Bioinformatics 2015. Minor edits have been made to the Abstract and the Figure 4 caption. An appendix with results for the larger data set has been added.