Fast and accurate estimation of the covariance between pairwise maximum likelihood distances

Institute of Molecular Life Sciences, University of Zurich, Zurich, Switzerland
DOI
10.7287/peerj.preprints.387v1
Subject Areas
Computational Biology, Mathematical Biology, Molecular Biology, Computational Science, Statistics
Keywords
maximum likelihood, pairwise distance, covariance, correlation, alignment
Copyright
© 2014 Gil
Licence
This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ PrePrints) and either DOI or URL of the article must be cited.
Cite this article
Gil M. 2014. Fast and accurate estimation of the covariance between pairwise maximum likelihood distances. PeerJ PrePrints 2:e387v1

Abstract

Pairwise evolutionary distances are a model-based summary statistic for a set of molecular sequences. They represent the leaf-to-leaf path lengths of the underlying phylogenetic tree. Estimates of pairwise distances with overlapping paths covary because of shared mutation events. In any process that compares or combines distances, it is desirable to take this covariance structure into account to increase precision. In this paper, we present a fast estimator for the covariance of two pairwise distance estimates under general Markov models. The estimator is based on a conjecture (going back to Nei and Jin, 1989) which links the covariance to path lengths. We prove it here under a simple symmetric substitution model. In a simulation, we show that our estimator outperforms previously published ones in terms of the mean squared error.
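To make the notion of "overlapping paths" concrete, the following minimal Python sketch computes the length of the path segment shared by two leaf-to-leaf paths in a small tree. It is an illustration only and not the estimator presented in the paper: the four-leaf tree, its branch lengths, and the function names (build_adjacency, path_edges, shared_path_length) are all assumptions made for this example, and the exact relation between covariance and path lengths is not restated here.

```python
from collections import deque

# Hypothetical unrooted tree ((A,B),(C,D)) with made-up branch lengths.
# "u" and "v" are the two internal nodes.
EDGES = {
    ("A", "u"): 0.10,
    ("B", "u"): 0.20,
    ("u", "v"): 0.05,
    ("C", "v"): 0.30,
    ("D", "v"): 0.15,
}


def build_adjacency(edges):
    """Turn an edge-length dict into a symmetric adjacency map."""
    adj = {}
    for (a, b), length in edges.items():
        adj.setdefault(a, {})[b] = length
        adj.setdefault(b, {})[a] = length
    return adj


def path_edges(adj, start, end):
    """Return the set of edges on the unique tree path from start to end."""
    parent = {start: None}
    queue = deque([start])
    while queue:  # breadth-first search records each node's predecessor
        node = queue.popleft()
        if node == end:
            break
        for neighbour in adj[node]:
            if neighbour not in parent:
                parent[neighbour] = node
                queue.append(neighbour)
    edges = set()
    node = end
    while parent[node] is not None:  # walk back from end to start
        edges.add(frozenset((node, parent[node])))
        node = parent[node]
    return edges


def shared_path_length(adj, pair1, pair2):
    """Total length of the branches common to both leaf-to-leaf paths."""
    common = path_edges(adj, *pair1) & path_edges(adj, *pair2)
    return sum(adj[a][b] for a, b in (tuple(edge) for edge in common))


adj = build_adjacency(EDGES)
overlap_ac_bd = shared_path_length(adj, ("A", "C"), ("B", "D"))  # internal branch only
overlap_ab_cd = shared_path_length(adj, ("A", "B"), ("C", "D"))  # disjoint paths
```

In this toy tree the paths A-C and B-D share only the internal branch u-v, whereas A-B and C-D share no branch at all, so only the first pair of distance estimates would be expected to covary through shared mutation events.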

Author Comment

This submission is part of the Gaston Gonnet Festschrift.