Automatic definition of robust microbiome sub-states in longitudinal data

Biological Informatics Group, Centro de Biotecnología y Genómica de Plantas UPM-INIA, Universidad Politécnica de Madrid, Madrid, Spain
Systems Biotechnology Group, Centro Nacional de Biotecnología, Spanish National Research Council, Madrid, Spain
DOI: 10.7287/peerj.preprints.26657v1
Subject Areas: Bioinformatics, Computational Biology, Microbiology, Data Mining and Machine Learning
Keywords: Microbiome, Sub-states, Clustering, Longitudinal dataset, Machine Learning, Metagenomics
Copyright: © 2018 García-Jiménez et al.
Licence: This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Preprints) and either DOI or URL of the article must be cited.
Cite this article: García-Jiménez B, Wilkinson MD. 2018. Automatic definition of robust microbiome sub-states in longitudinal data. PeerJ Preprints 6:e26657v1

Abstract

The analysis of microbiome dynamics would allow us to elucidate patterns in the evolution of microbial communities; however, microbiome state-transition dynamics have scarcely been studied. This is partly because a necessary first step in such analyses has not been well defined: how to deterministically describe a microbiome's “state”. Clustering samples into states has been widely studied, but no standard approach has yet emerged. We propose a generic, domain-independent and automatic procedure to determine a reliable set of microbiome sub-states within a specific dataset and with respect to the conditions of the study. The robustness of sub-state identification is established by combining diverse techniques for verifying cluster stability. We reuse four distinct longitudinal microbiome datasets to demonstrate the broad applicability of our method, analysing results with different subsets of taxa so that the procedure can be adjusted to the goal of the application, and we show that the methodology yields a set of robust sub-states suitable for downstream studies of microbiome dynamics.
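The abstract does not list which cluster-verification techniques are combined, so the following is only a minimal sketch of the general idea, assuming two commonly used criteria: the mean silhouette width of a partition computed on Bray-Curtis dissimilarities (compactness and separation), and a pair-counting Jaccard index between the original partition and one obtained for the same samples after perturbing the data (stability). The function names and the thresholds `min_sil=0.5` and `min_jac=0.75` are hypothetical choices for illustration, not the authors' pipeline; the clustering step itself (e.g. hierarchical clustering, as in the supplemental material) is assumed to have already produced the labels.

```python
from itertools import combinations


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two abundance profiles."""
    num = sum(abs(a - b) for a, b in zip(u, v))
    den = sum(a + b for a, b in zip(u, v))
    return num / den if den > 0 else 0.0


def mean_silhouette(samples, labels):
    """Average silhouette width of a partition under Bray-Curtis distance."""
    labels = list(labels)
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    n = len(samples)
    d = [[bray_curtis(samples[i], samples[j]) for j in range(n)] for i in range(n)]
    widths = []
    for i in range(n):
        own = [j for j in range(n) if labels[j] == labels[i] and j != i]
        if not own:
            widths.append(0.0)  # singleton cluster: s(i) = 0 by convention
            continue
        a = sum(d[i][j] for j in own) / len(own)  # mean intra-cluster distance
        b = min(  # mean distance to the nearest other cluster
            sum(d[i][j] for j in range(n) if labels[j] == c) / labels.count(c)
            for c in clusters
            if c != labels[i]
        )
        widths.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(widths) / n


def pair_jaccard(labels_a, labels_b):
    """Pair-counting Jaccard index between two partitions of the same samples."""
    both = either = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        both += same_a and same_b
        either += same_a or same_b
    return both / either if either else 1.0


def is_robust(samples, labels, perturbed_labels, min_sil=0.5, min_jac=0.75):
    """Hypothetical rule: accept a partition only if both criteria agree."""
    return (
        mean_silhouette(samples, labels) >= min_sil
        and pair_jaccard(labels, perturbed_labels) >= min_jac
    )


# Toy example: four samples over three taxa, forming two obvious groups.
samples = [[10, 0, 0], [9, 1, 0], [0, 0, 10], [0, 1, 9]]
labels = [0, 0, 1, 1]
print(round(mean_silhouette(samples, labels), 3))
print(is_robust(samples, labels, [1, 1, 0, 0]))  # same grouping, relabelled
```

The pair-counting Jaccard index is invariant to how clusters are numbered, which is why the relabelled partition `[1, 1, 0, 0]` counts as fully stable. In practice one would average the stability score over many perturbations (for example, bootstrap resamples or different taxa subsets) rather than rely on a single one.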

Author Comment

This is a submission to PeerJ for review.

Supplemental Information

Robust clustering evaluation with the HCLUST algorithm in different datasets

From top to bottom: (1) Human gut microbiome (David et al., 2014), (2) Chick gut (Ballou et al., 2016), (3) Vagina (Gajer et al., 2012), (4) Preterm infant gut (La Rosa et al., 2014).

DOI: 10.7287/peerj.preprints.26657v1/supp-1

Clusters in the chick gut dataset with different numbers of taxa, represented as Principal Coordinates Analysis (PCoA) plots

Top row: default taxonomic level (i.e., species); bottom row: aggregation at genus level. Columns, from left to right: all, dominant and non-dominant taxa.

DOI: 10.7287/peerj.preprints.26657v1/supp-2