A simple and general method for accounting for phylogenetic uncertainty via Rubin’s rules in comparative analysis
- Published
- Accepted
- Subject Areas
- Computational Biology, Evolutionary Studies, Genetics, Taxonomy, Statistics
- Keywords
- Bayesian statistics, data augmentation, likelihood methods, phylogenetic comparative methods, missing data, model averaging, multiple imputation
- Copyright
- © 2015 Nakagawa et al.
- Licence
- This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ PrePrints) and either DOI or URL of the article must be cited.
- Cite this article
- 2015. A simple and general method for accounting for phylogenetic uncertainty via Rubin’s rules in comparative analysis. PeerJ PrePrints 3:e1216v1 https://doi.org/10.7287/peerj.preprints.1216v1
Abstract
Phylogenetic comparative methods (PCMs), especially ones based on linear models, have played a central role in understanding species’ trait evolution. These methods, however, usually assume that phylogenetic trees are known without error or uncertainty, an assumption that is most likely incorrect. So far, Markov chain Monte Carlo (MCMC)-based Bayesian methods have successfully been deployed to account for such phylogenetic uncertainty in PCMs. Yet the use of these methods seems to have been limited, probably because they are difficult to implement. Here, we propose an approach in which phylogenetic uncertainty is incorporated in a simple, readily implementable and reliable manner. Our approach uses Rubin’s rules, which are an integral part of the standard multiple imputation procedure often employed to recover missing data. In our case, we treat the true phylogenetic tree as a missing piece of data, and apply Rubin’s rules to combine parameter estimates from a number of models, each fitted using one tree from a set of phylogenetic trees (e.g. a Bayesian posterior distribution of phylogenetic trees). Using a simulation study, we demonstrate that our approach using Rubin’s rules performs better in accounting for phylogenetic uncertainty than alternative methods such as MCMC-based Bayesian and Akaike information criterion (AIC)-based model averaging approaches; that is, on average, our approach has the best 95% confidence/credible interval coverage among the methods compared. A unique property of the multiple imputation procedure is that an index named ‘relative efficiency’ can be used to quantify the number of trees required for incorporating phylogenetic uncertainty. Using the relative efficiency, we show that the required number of trees is surprisingly small (~50 trees), at least in our simulation. In addition to these advantages, our approach can be combined seamlessly with PCMs that use multiple imputation to recover missing data. Given the ubiquity of missing data, the multiple imputation procedure with Rubin’s rules is likely to become a popular way of dealing with phylogenetic uncertainty as well as missing data in comparative analyses.
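To make the pooling step concrete, the following is a minimal sketch of Rubin’s rules in Python. It is an illustration under stated assumptions, not the authors’ implementation: the function name `pool_rubin`, its arguments and the example numbers are hypothetical, each input pair is assumed to be the estimate and standard error of one coefficient from a model fitted on one tree, and the fraction of missing information is approximated by its large-sample form (the share of total variance due to between-tree variation).

```python
from statistics import mean, variance


def pool_rubin(estimates, std_errors):
    """Pool one parameter across m per-tree model fits using Rubin's rules.

    estimates  -- point estimates, one per tree (hypothetical input)
    std_errors -- matching standard errors, one per tree (hypothetical input)
    """
    m = len(estimates)
    if m < 2 or m != len(std_errors):
        raise ValueError("need at least two fits and matching lengths")

    pooled = mean(estimates)                     # pooled point estimate
    within = mean([se ** 2 for se in std_errors])  # average sampling variance
    between = variance(estimates)                # between-tree variance (n - 1)
    total = within + (1 + 1 / m) * between       # total variance

    # Large-sample approximation to the fraction of missing information
    # (an assumption of this sketch; finite-m corrections exist).
    frac_missing = (1 + 1 / m) * between / total if total > 0 else 0.0

    # Relative efficiency of using m trees instead of infinitely many.
    rel_eff = 1 / (1 + frac_missing / m)

    return {
        "estimate": pooled,
        "std_error": total ** 0.5,
        "frac_missing": frac_missing,
        "relative_efficiency": rel_eff,
    }


# Illustrative numbers only: slopes and standard errors from three trees.
result = pool_rubin([0.5, 0.7, 0.6], [0.1, 0.1, 0.1])
print(result)
```

Because the relative efficiency approaches 1 as the number of trees grows, recomputing it for increasing numbers of trees gives a simple way to judge when adding more trees stops paying off.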
Author Comment
This will be revised and submitted to Systematic Biology for review.