Background. Microbial source tracking methods are used to determine the origin of contaminating bacteria and other microorganisms, particularly in contaminated water systems. The Bayesian SourceTracker approach uses deeply sequenced marker-gene libraries (16S ribosomal RNA gene) to estimate, simultaneously, the proportional contributions of bacteria from many potential source environments to a given sink environment. Since its development, SourceTracker has been applied in a wide range of studies, from beach contamination to human behavior.
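The following is a minimal sketch of how the proportional contributions of known sources can be estimated. It fits a mixture model by expectation-maximization (EM), assuming that each sink read comes from exactly one source in proportion to that source's taxon frequencies. This is not SourceTracker's Bayesian Gibbs sampler, and it omits SourceTracker's "unknown" source. The function names, the pseudocount, and the convergence settings are illustrative assumptions.

```python
from typing import List


def normalize(counts: List[int], pseudocount: float = 1e-6) -> List[float]:
    """Convert counts to relative frequencies; the pseudocount keeps every taxon possible."""
    smoothed = [c + pseudocount for c in counts]
    total = sum(smoothed)
    return [c / total for c in smoothed]


def estimate_proportions(
    sink: List[int],
    sources: List[List[int]],
    n_iter: int = 500,
    tol: float = 1e-10,
) -> List[float]:
    """EM estimate of how much each known source contributes to a sink.

    sink    -- read counts per taxon in the sink sample
    sources -- one count vector per source, in the same taxon order as sink
    Illustrative only (hypothetical helper): no 'unknown' source, no Gibbs sampling.
    """
    total = sum(sink)
    if total == 0:
        raise ValueError("sink has no reads")
    profiles = [normalize(s) for s in sources]
    k = len(profiles)
    props = [1.0 / k] * k  # start from equal contributions
    for _ in range(n_iter):
        expected = [0.0] * k
        for f, x in enumerate(sink):
            if x == 0:
                continue
            # E-step: share of the reads of taxon f attributed to each source.
            weights = [props[j] * profiles[j][f] for j in range(k)]
            z = sum(weights)
            for j in range(k):
                expected[j] += x * weights[j] / z
        # M-step: each proportion is that source's expected fraction of sink reads.
        new_props = [e / total for e in expected]
        converged = max(abs(a - b) for a, b in zip(new_props, props)) < tol
        props = new_props
        if converged:
            break
    return props
```

For example, `estimate_proportions([20, 50, 30], [[50, 50, 0], [0, 50, 50]])` should return values close to `[0.4, 0.6]`, because that sink is an exact 40:60 mix of the two source profiles. The small pseudocount prevents a division by zero when a sink taxon is absent from every source.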
Methods. Here, we developed metagenomic-SourceTracker (mSourceTracker), an expanded SourceTracker approach for shotgun metagenomic datasets. We tested mSourceTracker using sink samples from coastal marine environment metagenomes and source metagenomes collected from freshwater, marine, soil, sand and gut environments. We also implemented features for assessing the stability of source-proportion estimates, using new techniques that split metagenomic data into domain-specific analyses (i.e., Bacteria, Archaea, Eukarya and viruses). These features allow users to visualize the precision of mSourceTracker estimates and identify ways to optimize performance.
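The domain splitting and the stability assessment can be illustrated with a minimal sketch. It assumes a hypothetical feature table in which each taxon carries a domain label and counts ordered as [sink, source_1, ..., source_K]. Stability is summarized here as the mean and standard deviation of each source proportion across repeated random rarefactions of the sink, and `hard_assignment` is a deliberately crude placeholder estimator. None of these names come from mSourceTracker, and its actual procedure may differ.

```python
import random
import statistics
from typing import Callable, Dict, List, Tuple

# Hypothetical layout: taxon -> (domain label, counts per sample),
# with samples ordered as [sink, source_1, ..., source_K].
FeatureTable = Dict[str, Tuple[str, List[int]]]


def split_by_domain(table: FeatureTable) -> Dict[str, List[List[int]]]:
    """Return, per domain, one count vector per sample (taxa in insertion order)."""
    grouped: Dict[str, List[List[int]]] = {}
    for _taxon, (domain, counts) in table.items():
        grouped.setdefault(domain, []).append(counts)
    # Transpose taxon rows into sample columns.
    return {d: [list(col) for col in zip(*rows)] for d, rows in grouped.items()}


def rarefy(counts: List[int], depth: int, rng: random.Random) -> List[int]:
    """Subsample `depth` reads without replacement from a count vector."""
    reads = [i for i, c in enumerate(counts) for _ in range(c)]
    out = [0] * len(counts)
    for i in rng.sample(reads, depth):
        out[i] += 1
    return out


def hard_assignment(sources: List[List[int]]) -> Callable[[List[int]], List[float]]:
    """Placeholder estimator: credit each sink read to the source where its taxon
    has the highest relative abundance (ties go to the first such source)."""
    rel = [[c / max(sum(s), 1) for c in s] for s in sources]

    def estimate(sink: List[int]) -> List[float]:
        shares = [0.0] * len(sources)
        for f, x in enumerate(sink):
            best = max(range(len(sources)), key=lambda k: rel[k][f])
            shares[best] += x
        total = sum(sink)
        return [s / total for s in shares]

    return estimate


def stability(
    estimator: Callable[[List[int]], List[float]],
    sink: List[int],
    depth: int,
    n_reps: int = 20,
    seed: int = 0,
) -> List[Tuple[float, float]]:
    """Mean and standard deviation of each source proportion over rarefied
    replicates of the sink (n_reps must be at least 2)."""
    rng = random.Random(seed)
    runs = [estimator(rarefy(sink, depth, rng)) for _ in range(n_reps)]
    return [(statistics.mean(col), statistics.stdev(col)) for col in zip(*runs)]


if __name__ == "__main__":
    table: FeatureTable = {
        "taxonA": ("Bacteria", [30, 100, 0]),
        "taxonB": ("Bacteria", [35, 0, 50]),
        "taxonC": ("Bacteria", [35, 0, 50]),
        "taxonD": ("Archaea", [10, 0, 20]),
        "taxonE": ("Archaea", [10, 20, 0]),
    }
    # One independent analysis per domain, each with its own stability summary.
    for domain, (sink, *sources) in split_by_domain(table).items():
        summary = stability(hard_assignment(sources), sink, depth=15, n_reps=10)
        print(domain, [(round(m, 2), round(s, 2)) for m, s in summary])
```

Running each domain separately means a source can look prominent in one domain and negligible in another, and the per-source standard deviation gives a rough sense of how sensitive each estimate is to sequencing depth.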
Results. mSourceTracker was highly effective at predicting the composition of known sources from shotgun metagenomic libraries. In addition, different taxonomic domains sometimes presented highly divergent pictures of source origin. These findings indicate that applying mSourceTracker to individual domains may provide a deeper understanding of the microbial origins of complex, mixed-source environments, and suggest that certain domains may be better suited to tracking specific sources of contamination.