A simple and general method for simultaneously accounting for phylogenetic and species sampling uncertainty via Rubin’s rules in comparative analysis

Evolution & Ecology Research Centre and School of Biological, Earth and Environmental Sciences, University of New South Wales, Sydney, Australia
Laboratoire d’Écologie Alpine (LECA-UMR CNRS 5553), Université Joseph Fourier, Grenoble, France
DOI
10.7287/peerj.preprints.1216v2
Subject Areas
Computational Biology, Evolutionary Studies, Genetics, Taxonomy, Statistics
Keywords
Bayesian statistics, data augmentation, likelihood methods, phylogenetic comparative methods, missing data, model averaging, multiple imputation
Copyright
© 2018 Nakagawa et al.
Licence
This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Preprints) and either DOI or URL of the article must be cited.
Cite this article
Nakagawa S, de Villemereuil P. 2018. A simple and general method for simultaneously accounting for phylogenetic and species sampling uncertainty via Rubin’s rules in comparative analysis. PeerJ Preprints 6:e1216v2

Abstract

Phylogenetic comparative methods (PCMs), especially those based on linear models, have played a central role in understanding species’ trait evolution. These methods, however, usually assume that phylogenetic trees are known without error or uncertainty, an assumption that is most likely incorrect. So far, Bayesian methods based on Markov chain Monte Carlo (MCMC) have mainly been deployed to account for such ‘phylogenetic uncertainty’ in PCMs. Here, we propose an approach that incorporates phylogenetic uncertainty in a simple, readily implementable and reliable manner. Our approach uses Rubin’s rules, which are an integral part of the standard multiple imputation procedure often employed to recover missing data. Under this approach, we treat true phylogenetic trees as missing data. Further, unmeasured species in comparative data (i.e., missing trait data) can be seen as another source of uncertainty in PCMs, because the arbitrary sampling of species within a given taxon, or ‘species sampling uncertainty’, can affect estimation in PCMs. Using two simulation studies, we show that our method can account for phylogenetic uncertainty under many different scenarios (e.g., uncertainty in branching and branch lengths) and, at the same time, handle missing trait data (i.e., species sampling uncertainty). A unique property of the multiple imputation procedure is that an index named ‘relative efficiency’ can be used to quantify the number of trees required to incorporate phylogenetic uncertainty. Using relative efficiency, we show that the required number of trees is surprisingly small (~50 trees). The most notable advantage of our method, however, is that it can be combined seamlessly with PCMs that use multiple imputation, thereby handling phylogenetic uncertainty (i.e., missing true trees) and species sampling uncertainty (i.e., missing trait data) simultaneously.
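The pooling step behind Rubin’s rules, and the relative efficiency index, can be illustrated with a short sketch. This is not the authors’ implementation; it is a minimal Python version assuming that the same comparative model (e.g. a phylogenetic GLS) has already been fitted once per tree, giving one point estimate and its standard error for each of m trees. The function names `rubins_rules` and `relative_efficiency` are hypothetical, and the fraction of missing information is approximated by λ = (1 + 1/m)B/T, without the small-sample degrees-of-freedom adjustment.

```python
import math
from statistics import mean, variance


def rubins_rules(estimates, std_errors):
    """Pool per-tree estimates with Rubin's rules (illustrative sketch).

    estimates  -- one point estimate (e.g. a regression slope) per tree
    std_errors -- the matching standard errors, one per tree
    Returns (pooled estimate, pooled standard error, lambda), where lambda
    approximates the fraction of missing information as (1 + 1/m) * B / T
    (assumption: no small-sample degrees-of-freedom correction).
    """
    m = len(estimates)
    if m < 2 or len(std_errors) != m:
        raise ValueError("need at least two trees and one SE per estimate")
    q_bar = mean(estimates)                        # pooled point estimate
    u_bar = mean([se ** 2 for se in std_errors])   # within-tree variance
    b = variance(estimates)                        # between-tree variance (m - 1 denominator)
    t = u_bar + (1 + 1 / m) * b                    # total variance
    lam = (1 + 1 / m) * b / t                      # share of variance due to tree uncertainty
    return q_bar, math.sqrt(t), lam


def relative_efficiency(gamma, m):
    """Efficiency of m imputations (trees) relative to infinitely many: 1 / (1 + gamma / m)."""
    return 1 / (1 + gamma / m)


if __name__ == "__main__":
    # Hypothetical slopes and SEs from five trees (made-up numbers for illustration).
    q, se, lam = rubins_rules([0.50, 0.52, 0.48, 0.51, 0.49], [0.1] * 5)
    print(f"pooled estimate={q:.3f}, pooled SE={se:.4f}, lambda={lam:.3f}")
    for m in (5, 20, 50, 100):
        print(m, round(relative_efficiency(lam, m), 4))
```

Because relative efficiency approaches 1 as m grows, adding trees beyond the point where it is already close to 1 yields little gain; which threshold counts as "enough" is a choice the analyst makes, and the sketch does not reproduce the authors’ criterion.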

Author Comment

This version is a complete revision of the previous version and includes two new simulation studies.