Phylogenetic factorization of compositional data yields lineage-level associations in microbiome datasets

Nicholas School of the Environment, Duke University, Durham, North Carolina, United States
Program for Computational Biology and Bioinformatics, Duke University, Durham, North Carolina, United States
Medical Scientist Training Program, Duke University, Durham, North Carolina, United States
Center for Genomic and Computational Biology, Duke University, Durham, North Carolina, United States
Department of Molecular Genetics and Microbiology, Duke University, Durham, North Carolina, United States
Cooperative Institute for Research in Environmental Sciences, University of Colorado, Boulder, Colorado, United States
Department of Earth Science and Engineering, Imperial College London, London, United Kingdom
Institute of Zoology, Zoological Society of London, London, United Kingdom
Department of Ecology and Evolution, University of Colorado Boulder, Boulder, Colorado, United States
Department of Statistical Science, Mathematics, and Computer Science, Duke University, Durham, North Carolina, United States
DOI
10.7287/peerj.preprints.2685v1
Subject Areas
Computational Biology, Ecology, Mathematical Biology, Microbiology, Statistics
Keywords
Microbiome, community phylogenetics, compositional data, sequence-count data, microbial biogeography, factor analysis, phylofactorization
Copyright
© 2017 Washburne et al.
Licence
This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Preprints) and either DOI or URL of the article must be cited.
Cite this article
Washburne AD, Silverman JD, Leff JW, Bennett DJ, Darcy JL, Mukherjee S, Fierer N, David LA. 2017. Phylogenetic factorization of compositional data yields lineage-level associations in microbiome datasets. PeerJ Preprints 5:e2685v1

Abstract

Marker gene sequencing of microbial communities has generated large datasets of microbial relative abundances varying across environmental conditions, sample sites and treatments. These data often come with putative phylogenies, providing unique opportunities to investigate how shared evolutionary history affects microbial abundance patterns. Here, we present a method to identify the phylogenetic factors driving patterns in microbial community composition. We use the method, "phylofactorization", to re-analyze datasets from the human body and soil microbial communities, demonstrating that phylofactorization serves as a dimensionality-reducing tool, an ordination-visualization tool, and an inferential tool for identifying edges in the phylogeny along which putative functional ecological traits may have arisen.

Author Comment

This is a submission to PeerJ for review.

Supplemental Information

Feces/Tongue data and phylofactorization R scripts

DOI: 10.7287/peerj.preprints.2685v1/supp-3

Soil phylofactorization data and scripts

DOI: 10.7287/peerj.preprints.2685v1/supp-4

Phylofactor Tutorial

How to use, visualize and customize phylofactorizations with the R package 'phylofactor'

DOI: 10.7287/peerj.preprints.2685v1/supp-5