Cause of gene tree discord? Distinguishing incomplete lineage sorting and lateral gene transfer in phylogenetics
- Subject Areas
- Evolutionary Studies, Data Mining and Machine Learning
- Keywords
- phylogenomics, incomplete lineage sorting, lateral gene transfer, machine learning
- Copyright
- © 2017 Huang et al.
- Licence
- This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Preprints) and either DOI or URL of the article must be cited.
- Cite this article
- 2017. Cause of gene tree discord? Distinguishing incomplete lineage sorting and lateral gene transfer in phylogenetics. PeerJ Preprints 5:e3489v1 https://doi.org/10.7287/peerj.preprints.3489v1
Abstract
Despite recent efforts that have produced datasets with hundreds to thousands of gene regions to resolve parts of the tree of life, recalcitrant nodes persist, and disagreement among gene trees, as well as between individual gene trees and species trees, is common. A number of evolutionary processes contribute to these conflicts between gene trees and species trees, including deep coalescence (incomplete lineage sorting), horizontal gene transfer, and hybridization. For some of these processes we have powerful and sophisticated models, such as the multispecies coalescent (MSC), that use the conflict among gene trees as information that contributes materially to correctly inferring the species tree. However, using these models requires a priori recognition of the relevant processes, which are often unknown for empirical datasets. Here we propose a new perspective that not only identifies the cause of discord among gene trees, but also uses it to classify loci by the underlying cause of discord, so that subsets of loci can be selected for analysis with the goal of improving phylogenetic accuracy. This approach differs fundamentally from all other criteria used for deciding which loci to include in a phylogenetic analysis. In particular, loci are chosen in this framework by identifying those that reflect descent from a common ancestor (as opposed to other processes), which can minimize problems with model misspecification. We present preliminary results that demonstrate the potential of this framework to distinguish lateral gene transfer (LGT) from incomplete lineage sorting (ILS), as implemented in a new software package, CLASSIPHY, while also highlighting areas for further development and testing.
We discuss (i) why such methods are critical to improving phylogenetic accuracy given the increasing complexity of genomic and transcriptomic datasets, and (ii) why characterizing patterns of discordance, and the contribution of different processes to it, is itself of interest for generating hypotheses about the role of lateral gene transfer, gene duplication, and incomplete lineage sorting during the divergence of different taxa.
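The contrast between ILS and LGT can be illustrated with a standard three-taxon result from coalescent theory; this is a textbook illustration, not necessarily how CLASSIPHY works. Under the multispecies coalescent, a species tree ((A,B),C) with internal branch length t (in coalescent units) yields the concordant gene tree with probability 1 - (2/3)exp(-t) and each of the two discordant topologies with probability (1/3)exp(-t). ILS alone therefore predicts the two discordant topologies in roughly equal numbers, whereas LGT or hybridization between C and one of A or B can inflate one of them. The Python sketch below assumes rooted gene-tree topology counts pooled across loci. The function names and the chi-square symmetry test are hypothetical, illustrative choices. Unlike the framework described above, it does not classify individual loci.

```python
import math

# 5% critical value of the chi-square distribution with 1 degree of freedom.
CHI2_CRIT_1DF_05 = 3.841


def msc_triplet_probs(t):
    """Expected frequencies of the three rooted gene-tree topologies for a
    species tree ((A,B),C) under the multispecies coalescent.

    t is the internal branch length in coalescent units. Returns
    (P_concordant, P_discordant_1, P_discordant_2); under ILS alone the two
    discordant topologies are equally likely.
    """
    if t < 0:
        raise ValueError("internal branch length must be non-negative")
    p_disc = math.exp(-t) / 3.0
    return (1.0 - 2.0 * p_disc, p_disc, p_disc)


def estimate_internal_branch(p_concordant):
    """Invert P_concordant = 1 - (2/3) exp(-t) to estimate t, assuming ILS only.

    Frequencies at or below 1/3 carry no signal for a positive branch
    length, so they are clipped to t = 0 (a simplifying choice here).
    """
    if p_concordant <= 1.0 / 3.0:
        return 0.0
    return -math.log(1.5 * (1.0 - p_concordant))


def discordance_asymmetry(n_disc1, n_disc2):
    """Chi-square statistic (1 d.f.) for equal counts of the two discordant
    topologies, the pattern ILS alone predicts."""
    total = n_disc1 + n_disc2
    if total == 0:
        return 0.0
    expected = total / 2.0
    return ((n_disc1 - expected) ** 2 + (n_disc2 - expected) ** 2) / expected


def classify_counts(counts):
    """Label pooled topology counts (concordant, discordant_1, discordant_2).

    Hypothetical helper: an excess of one discordant topology is consistent
    with, but does not prove, LGT or hybridization.
    """
    _, d1, d2 = counts
    if discordance_asymmetry(d1, d2) < CHI2_CRIT_1DF_05:
        return "consistent with ILS"
    return "asymmetric discordance (possible LGT/hybridization)"


if __name__ == "__main__":
    # Hypothetical counts of gene trees across 1000 loci, for illustration only.
    for counts in [(700, 160, 140), (700, 250, 50)]:
        t_hat = estimate_internal_branch(counts[0] / sum(counts))
        print(counts, round(t_hat, 3), classify_counts(counts))
```

With these hypothetical counts, (700, 160, 140) gives a statistic of about 1.33, below the 5% critical value of 3.841, while (700, 250, 50) gives about 133.3. Gene tree estimation error can also distort these frequencies, so an asymmetry is a prompt for closer analysis rather than a diagnosis.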
Author Comment
This is a submission to PeerJ for review.