MIPhy: Identify and quantify rapidly evolving members of large gene families
A peer-reviewed article of this Preprint also exists.
Abstract
After transitioning to a new environment, species often exhibit rapid phenotypic innovation. One of the fastest mechanisms for this is duplication followed by specialization of existing genes, which leaves a phylogenetic signature of lineage-specific expansions and contractions. These can be identified by analyzing the gene family across several species and identifying patterns of gene duplication and loss that do not correlate with the known relationships between those species. This signature, termed phylogenetic instability, has been previously linked to adaptations that change the way an organism samples and responds to its environment; conversely, low phylogenetic instability has been previously linked to proteins with endogenous functions.
Here, we present MIPhy, a method that identifies and quantifies phylogenetic instability by measuring the incongruence between a gene's evolutionary history and the relationships among the species under consideration. We developed MIPhy as an aid to interpreting phylogenetic trees. Working only from a gene tree and the species relationships, it can predict which members of a gene family are under adaptive evolution.
We demonstrate the usefulness of MIPhy by accurately predicting which members of the mammalian cytochrome P450 gene superfamily metabolize xenobiotics and which metabolize endogenous compounds. Our predictions correlate very well with known substrate specificities of the human enzymes. We also analyze the Caenorhabditis collagen gene family and use MIPhy to predict genes that produce an observable phenotype when knocked down in C. elegans, and show that our predictions correlate well with existing knowledge. The software can be downloaded and installed from https://github.com/dave-the-scientist/miphy under a BSD 2-clause license. It is also available as an online web tool at http://miphy.wasmuthlab.org.
Cite this as
2018. MIPhy: Identify and quantify rapidly evolving members of large gene families. PeerJ Preprints 6:e26593v1 https://doi.org/10.7287/peerj.preprints.26593v1
Author comment
This is a submission to PeerJ for review.
Supplemental Information
Supporting information
Table S1: The nematode collagens identified in this work from species in the genus Caenorhabditis.
Article S1: The MIPhy model of gene family evolution.
Figure S1: Example species and gene trees.
Data from collagen analysis
The following files are contained in the tarball:
all_caen.col2.fas - collagen sequences from seven species of Caenorhabditis in FASTA format.
MIPhy_scores.csv - comma-separated file generated by MIPhy. Columns: (1) sequence identifier; (2) species identifier; (3) MIPhy cluster identifier; (4) instability score.
withPheno.txt - MIPhy clusters that contained a C. elegans collagen with a major RNAi phenotype. Columns: (1) MIPhy cluster identifier; (2) instability score.
withoutPheno.txt - MIPhy clusters that contained no C. elegans collagens with a major RNAi phenotype. Columns: (1) MIPhy cluster identifier; (2) instability score.
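The four-column layout described for MIPhy_scores.csv can be read with a few lines of Python. The sketch below is illustrative only and is not part of MIPhy: the function names load_scores and rank_by_instability are hypothetical, the sample rows are made up, and it assumes that the file has no header row (pass has_header=True otherwise) and that every row of a cluster carries the same instability score.

```python
import csv
import io
from collections import defaultdict


def load_scores(handle, has_header=False):
    """Group rows of a MIPhy-style scores file by cluster identifier.

    Assumed column order (from the file description above):
    sequence id, species id, cluster id, instability score.
    """
    clusters = defaultdict(lambda: {"score": None, "members": []})
    reader = csv.reader(handle)
    if has_header:  # assumption: the real file may or may not have a header row
        next(reader, None)
    for row in reader:
        if len(row) < 4:  # skip blank or malformed lines
            continue
        seq_id, species_id, cluster_id, score = (f.strip() for f in row[:4])
        clusters[cluster_id]["members"].append((seq_id, species_id))
        # Assumption: all rows of one cluster carry the same score,
        # so the last value read is kept.
        clusters[cluster_id]["score"] = float(score)
    return dict(clusters)


def rank_by_instability(clusters):
    """Return cluster ids ordered from highest to lowest instability score."""
    return sorted(clusters, key=lambda cid: clusters[cid]["score"], reverse=True)


# Made-up rows for illustration only; not taken from the published data.
example = io.StringIO(
    "seqA,species1,0,4.5\n"
    "seqB,species2,0,4.5\n"
    "seqC,species1,1,0.0\n"
)
clusters = load_scores(example)
ranking = rank_by_instability(clusters)
```

To apply the same functions to the real file, open it with open("MIPhy_scores.csv", newline="") and pass the handle to load_scores in place of the in-memory example.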
Additional Information
Competing Interests
The authors declare that they have no competing interests.
Author Contributions
David M Curran conceived and designed the experiments, performed the experiments, analyzed the data, contributed reagents/materials/analysis tools, prepared figures and/or tables, authored or reviewed drafts of the paper, approved the final draft.
John S Gilleard conceived and designed the experiments, authored or reviewed drafts of the paper, approved the final draft.
James D Wasmuth conceived and designed the experiments, analyzed the data, contributed reagents/materials/analysis tools, authored or reviewed drafts of the paper, approved the final draft.
Data Deposition
The following information was supplied regarding data availability:
MIPhy is freely available at https://github.com/dave-the-scientist/miphy under a BSD 2-clause license.
It is also available as an online tool at http://miphy.wasmuthlab.org.
Funding
This work was supported by the Natural Sciences and Engineering Research Council of Canada (NSERC) through a Discovery Grant (#06239-2015) to JDW and a Collaborative Research and Training Experience (CREATE) program in Host-Parasite Interactions (#413888-2012) to JDW and JSG (and others), and by Alberta Innovates – Technology Futures through a doctoral scholarship to DMC. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.