MIPhy: Identify and quantify rapidly evolving members of large gene families
- Subject Areas
- Bioinformatics, Computational Biology, Evolutionary Studies
- Keywords
- Phylogenetic clustering, Phylogenetic instability, Gene family evolution
- Copyright
- © 2018 Curran et al.
- Licence
- This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Preprints) and either DOI or URL of the article must be cited.
- Cite this article
- 2018. MIPhy: Identify and quantify rapidly evolving members of large gene families. PeerJ Preprints 6:e26593v1 https://doi.org/10.7287/peerj.preprints.26593v1
Abstract
After transitioning to a new environment, species often exhibit rapid phenotypic innovation. One of the fastest mechanisms for this is the duplication and subsequent specialization of existing genes, which leaves a phylogenetic signature of lineage-specific expansions and contractions. These can be identified by analyzing a gene family across several species and looking for patterns of gene duplication and loss that do not follow the known relationships between those species. This signature, termed phylogenetic instability, has previously been linked to adaptations that change the way an organism samples and responds to its environment; conversely, low phylogenetic instability has been linked to proteins with endogenous functions.
Here, we present MIPhy, a method that identifies and quantifies phylogenetic instability by measuring the incongruence of a gene's evolutionary history. MIPhy was developed as a tool to aid in interpreting phylogenetic trees. Working only from a gene tree and the relationships between the species under consideration, it can predict which members of a gene family are under adaptive evolution.
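MIPhy's own model is described in Article S1 and is not reproduced here. As a generic illustration of the kind of signal described above, the sketch below uses standard least-common-ancestor (LCA) reconciliation to count the duplications implied by a gene tree, given only that tree, a species tree, and a mapping from genes to species. A gene-tree node is inferred to be a duplication when it maps to the same species-tree node as one of its children. The nested-tuple tree format and all function and gene names are hypothetical; the sketch ignores losses and is not MIPhy's instability score.

```python
from typing import Union

# Hypothetical tree format: a leaf name, or a (left, right) pair of subtrees.
Tree = Union[str, tuple]


def leaves(tree: Tree) -> list:
    """Return the leaf names of a nested-tuple binary tree."""
    if isinstance(tree, str):
        return [tree]
    left, right = tree
    return leaves(left) + leaves(right)


def clades(tree: Tree) -> list:
    """Return the leaf set (clade) of every node in the tree."""
    if isinstance(tree, str):
        return [frozenset([tree])]
    left, right = tree
    return [frozenset(leaves(tree))] + clades(left) + clades(right)


def lca_clade(species: set, species_clades: list) -> frozenset:
    """Smallest species-tree clade containing all given species, i.e. their LCA."""
    return min((c for c in species_clades if species <= c), key=len)


def count_duplications(gene_tree: Tree, species_tree: Tree, gene_to_species: dict) -> int:
    """Count gene-tree nodes inferred as duplications by LCA reconciliation.

    Illustrative only: this is textbook LCA reconciliation, not MIPhy's model.
    """
    species_clades = clades(species_tree)

    def visit(node: Tree) -> tuple:
        # Returns (species-tree node this gene node maps to, duplications below it).
        if isinstance(node, str):
            return lca_clade({gene_to_species[node]}, species_clades), 0
        left, right = node
        m_left, d_left = visit(left)
        m_right, d_right = visit(right)
        m_node = lca_clade(set(m_left | m_right), species_clades)
        # Duplication: the node maps to the same species node as one of its children.
        is_dup = m_node == m_left or m_node == m_right
        return m_node, d_left + d_right + int(is_dup)

    return visit(gene_tree)[1]
```

For example, with species_tree = (("human", "mouse"), "chicken") and genes h1 and h2 from human, m1 from mouse and c1 from chicken, the congruent gene tree (("h1", "m1"), "c1") yields 0 duplications, while ((("h1", "h2"), "m1"), "c1"), in which the human lineage carries an extra copy, yields 1.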
We demonstrate the usefulness of MIPhy by accurately predicting which members of the mammalian cytochrome P450 gene superfamily metabolize xenobiotics and which metabolize endogenous compounds; our predictions correlate very well with the known substrate specificities of the human enzymes. We also analyze the Caenorhabditis collagen gene family, use MIPhy to predict which genes produce an observable phenotype when knocked down in C. elegans, and show that these predictions correlate well with existing knowledge. The software can be downloaded and installed from https://github.com/dave-the-scientist/miphy under a BSD 2-clause license. It is also available as an online web tool at http://miphy.wasmuthlab.org.
Author Comment
This is a submission to PeerJ for review.
Supplemental Information
Table S1: The nematode collagens identified in this work from species in the genus Caenorhabditis.
Article S1: The MIPhy model of gene family evolution.
Figure S1: Example species and gene trees.
Data from collagen analysis
The following files are contained in the tarball:
all_caen.col2.fas - collagen sequences from seven species of Caenorhabditis in fasta format.
MIPhy_scores.csv - comma-separated file generated by MIPhy, with four columns: (1) sequence identifier; (2) species identifier; (3) MIPhy cluster identifier; (4) instability score.
withPheno.txt - MIPhy clusters that contained a C. elegans collagen with a major RNAi phenotype. (1) MIPhy cluster identifier; (2) Instability score.
withoutPheno.txt - MIPhy clusters that contained no C. elegans collagens with a major RNAi phenotype. (1) MIPhy cluster identifier; (2) Instability score.
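The column layouts above are enough to load the collagen data for further analysis. The Python sketch below uses hypothetical helpers that are not part of MIPhy. It assumes that MIPhy_scores.csv is standard CSV, that every sequence in a cluster carries the same instability score, and that withPheno.txt and withoutPheno.txt hold one cluster per line with fields separated by whitespace or commas. prob_higher gives the probability that a randomly chosen score from one group exceeds one from the other (the Mann-Whitney U statistic scaled to [0, 1]), which is one simple way to compare the two groups of clusters.

```python
import csv


def read_scores_csv(handle) -> dict:
    """Group MIPhy_scores.csv rows by cluster (hypothetical helper).

    Assumed columns: sequence id, species id, cluster id, instability score.
    Returns {cluster_id: {"score": float, "members": [(seq_id, species_id), ...]}}.
    """
    clusters = {}
    for row in csv.reader(handle):
        if len(row) < 4:
            continue  # skip blank or malformed rows
        seq_id, species_id, cluster_id, score = (field.strip() for field in row[:4])
        try:
            value = float(score)
        except ValueError:
            continue  # skip a header row, if the file has one
        # Assumes all members of a cluster share one score; the first seen is kept.
        entry = clusters.setdefault(cluster_id, {"score": value, "members": []})
        entry["members"].append((seq_id, species_id))
    return clusters


def read_cluster_scores(handle) -> dict:
    """Parse 'cluster_id score' lines (withPheno.txt / withoutPheno.txt) into a dict."""
    scores = {}
    for line in handle:
        # Assumed delimiter: whitespace or commas.
        fields = line.replace(",", " ").split()
        if len(fields) < 2:
            continue
        try:
            scores[fields[0]] = float(fields[1])
        except ValueError:
            continue  # skip a header line, if present
    return scores


def prob_higher(a: list, b: list) -> float:
    """P(score from a > score from b) over all pairs, counting ties as 0.5.

    Equals the Mann-Whitney U statistic for `a` divided by len(a) * len(b).
    """
    if not a or not b:
        raise ValueError("both groups need at least one score")
    wins = sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)
    return wins / (len(a) * len(b))
```

To use it, pass open file handles for withPheno.txt and withoutPheno.txt to read_cluster_scores, then call prob_higher on the two lists of scores, with the withoutPheno.txt scores first; a value above 0.5 would indicate that clusters without a major RNAi phenotype tend to have higher instability scores.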