Meta-SourceTracker: Application of Bayesian source tracking to shotgun metagenomics

Bioinformatics and Medical Informatics Program, San Diego State University, San Diego, California, United States
Department of Mathematics and Statistics, San Diego State University, San Diego, California, United States
Microbial Genomics and Bioinformatics Research Group, Max Planck Institute for Marine Microbiology, Bremen, Germany
Department of Biology, San Diego State University, San Diego, California, United States
DOI
10.7287/peerj.preprints.27869v1
Subject Areas
Bioinformatics, Microbiology
Keywords
Bioinformatics, Environmental microbiology, Software
Copyright
© 2019 McGhee et al.
Licence
This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Preprints) and either DOI or URL of the article must be cited.
Cite this article
McGhee JJ, Rawson N, Bailey BA, Fernandez-Guerra A, Kelley ST. 2019. Meta-SourceTracker: Application of Bayesian source tracking to shotgun metagenomics. PeerJ Preprints 7:e27869v1

Abstract

Background. Microbial source tracking methods are used to determine the origin of contaminating bacteria and other microorganisms, particularly in contaminated water systems. The Bayesian SourceTracker approach uses deeply sequenced marker-gene libraries (16S ribosomal RNA genes) to simultaneously estimate the proportional contributions of bacteria from many potential source environments to a given sink environment. Since its development, SourceTracker has been applied to a wide range of studies, from beach contamination to human behavior.
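To illustrate how a Bayesian estimate of source proportions can be computed, the following is a minimal, simplified sketch of a collapsed Gibbs sampler, assuming fixed source feature profiles. It is not the SourceTracker or SourceTracker2 implementation: it omits the "Unknown" source and does not update source profiles with sink reads. The function name gibbs_source_proportions and the parameters alpha, beta, burnin and draws are hypothetical. Each sink read is repeatedly reassigned to a source with probability proportional to (sink reads currently assigned to that source + alpha) times the smoothed frequency of the read's feature in that source.

```python
import numpy as np


def gibbs_source_proportions(sources, sink, alpha=1.0, beta=0.01,
                             burnin=50, draws=100, seed=0):
    """Simplified Gibbs estimate of source contributions to one sink.

    sources: (n_sources, n_features) feature counts for each source.
    sink:    (n_features,) non-negative integer feature counts in the sink.
    Returns the average fraction of sink reads assigned to each source
    over the post-burn-in sweeps. Hypothetical sketch, not SourceTracker.
    """
    rng = np.random.default_rng(seed)
    sources = np.asarray(sources, dtype=float)
    sink = np.asarray(sink, dtype=int)
    n_sources, n_features = sources.shape

    # P(feature | source), smoothed with pseudocount beta; held fixed here.
    profile = (sources + beta) / (sources.sum(axis=1, keepdims=True)
                                  + beta * n_features)

    # One token per sink read, labelled by the feature it belongs to.
    tokens = np.repeat(np.arange(n_features), sink)
    # Random initial assignment of every read to a source.
    z = rng.integers(n_sources, size=tokens.size)
    assigned = np.bincount(z, minlength=n_sources).astype(float)

    total = np.zeros(n_sources)
    for sweep in range(burnin + draws):
        for i, f in enumerate(tokens):
            assigned[z[i]] -= 1  # take read i out of its current source
            # Full conditional: (reads in source j + alpha) * P(f | j)
            weights = (assigned + alpha) * profile[:, f]
            z[i] = rng.choice(n_sources, p=weights / weights.sum())
            assigned[z[i]] += 1
        if sweep >= burnin:  # accumulate only after burn-in
            total += assigned / tokens.size
    return total / draws


if __name__ == "__main__":
    # Toy data: source 0 owns features 0-1, source 1 owns features 2-3.
    toy_sources = [[100, 100, 0, 0], [0, 0, 100, 100]]
    toy_sink = [15, 15, 5, 5]  # 30 reads look like source 0, 10 like source 1
    print(gibbs_source_proportions(toy_sources, toy_sink))
```

On this toy input the estimate should be close to 0.75 for the first source and 0.25 for the second, since the two sources share no features.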

Methods. Here, we developed metagenomic-SourceTracker (mSourceTracker), an extension of the SourceTracker approach to shotgun metagenomic datasets. We tested mSourceTracker using coastal marine metagenomes as sink samples and metagenomes collected from freshwater, marine, soil, sand and gut environments as sources. We also implemented features for assessing the stability of source proportion estimates, using new techniques that split metagenomic data for domain-specific analyses (i.e., Bacteria, Archaea, Eukarya and viruses). These features allow users to visualize the precision of mSourceTracker estimates and identify ways to optimize performance.
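The domain-specific splitting and stability assessment described above could be sketched as follows, assuming each feature (e.g., a taxon or gene family) carries a domain label. This is a hypothetical illustration, not the mSourceTracker code: to keep it short, a maximum-likelihood EM estimate of mixing proportions stands in for the Gibbs sampler, and stability is measured as the mean and standard deviation of estimates across repeated rarefactions of the sink. The names em_proportions, split_by_domain, rarefied_stability and DOMAINS are assumptions.

```python
import numpy as np

# Hypothetical domain labels attached to each feature.
DOMAINS = ("Bacteria", "Archaea", "Eukarya", "Viruses")


def em_proportions(sources, sink, pseudocount=0.01, iters=200):
    """Maximum-likelihood mixing proportions of fixed source profiles (EM)."""
    sources = np.asarray(sources, dtype=float) + pseudocount
    profile = sources / sources.sum(axis=1, keepdims=True)
    sink = np.asarray(sink, dtype=float)
    pi = np.full(profile.shape[0], 1.0 / profile.shape[0])
    for _ in range(iters):
        mix = pi[:, None] * profile                  # (n_sources, n_features)
        resp = mix / mix.sum(axis=0, keepdims=True)  # E-step: P(source | feature)
        pi = (resp * sink).sum(axis=1) / sink.sum()  # M-step: new proportions
    return pi


def split_by_domain(sources, sink, domain_of_feature):
    """Yield (domain, source_subtable, sink_subvector) for each domain present."""
    labels = np.asarray(domain_of_feature)
    for domain in DOMAINS:
        mask = labels == domain
        if mask.any():
            yield domain, np.asarray(sources)[:, mask], np.asarray(sink)[mask]


def rarefied_stability(sources, sink, depth, reps=20, seed=0):
    """Mean and SD of proportions over repeated rarefactions of the sink.

    depth must not exceed the total number of reads in the sink.
    """
    rng = np.random.default_rng(seed)
    sink = np.asarray(sink, dtype=int)
    reads = np.repeat(np.arange(sink.size), sink)  # one entry per read
    estimates = []
    for _ in range(reps):
        sub = rng.choice(reads, size=depth, replace=False)
        counts = np.bincount(sub, minlength=sink.size)
        estimates.append(em_proportions(sources, counts))
    estimates = np.array(estimates)
    return estimates.mean(axis=0), estimates.std(axis=0)


if __name__ == "__main__":
    # Toy table: 4 bacterial and 2 viral features, 2 sources.
    toy_sources = np.array([[50, 50, 0, 0, 30, 0], [0, 0, 50, 50, 0, 30]])
    toy_labels = ["Bacteria"] * 4 + ["Viruses"] * 2
    toy_sink = np.array([30, 30, 10, 10, 5, 15])
    for domain, src, snk in split_by_domain(toy_sources, toy_sink, toy_labels):
        mean, sd = rarefied_stability(src, snk, depth=int(snk.sum() * 0.8))
        print(domain, mean.round(2), sd.round(2))
```

On these toy data the bacterial features attribute most of the sink to the first source and the viral features attribute most of it to the second, a contrived example of how separate domains can give divergent pictures of source origin. A small standard deviation across rarefactions suggests a stable estimate at that sequencing depth.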

Results. Our results showed that mSourceTracker was highly effective at predicting the composition of known sources from shotgun metagenomic libraries. In addition, different taxonomic domains sometimes presented highly divergent pictures of source origin. These findings indicate that applying mSourceTracker to separate domains may provide a deeper understanding of the microbial origins of complex, mixed-source environments, and suggest that certain domains may be better suited for tracking specific sources of contamination.

Author Comment

This is a submission to PeerJ for review.

Supplemental Information

List of accession numbers

Accession numbers of the metagenomes used to test mSourceTracker, grouped by environment type.

DOI: 10.7287/peerj.preprints.27869v1/supp-1

Python code

Modified Python files for use with SourceTracker2.

DOI: 10.7287/peerj.preprints.27869v1/supp-2