imGLAD: Accurate detection and quantification of target organisms in metagenomes

Center for Bioinformatics and Computational Genomics, Georgia Institute of Technology, Atlanta, Georgia, United States
School of Biological Sciences, Georgia Institute of Technology, Atlanta, Georgia, United States
School of Civil and Environmental Engineering, Georgia Institute of Technology, Atlanta, Georgia, United States
Centers for Disease Control and Prevention, Atlanta, Georgia, United States
Produce Safety and Microbiology, USDA-ARS Western Regional Research Center, U.S. Department of Agriculture, Albany, California, United States
DOI
10.7287/peerj.preprints.26515v1
Subject Areas
Bioinformatics, Genomics
Keywords
genomes, metagenomics, limit of detection
Licence
This is an open access article, free of all copyright, made available under the Creative Commons Public Domain Dedication. This work may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose.
Cite this article
Castro J, Rodriguez-R LM, Weigand MR, Hatt JK, Carter MQ, Konstantinidis KT. 2018. imGLAD: Accurate detection and quantification of target organisms in metagenomes. PeerJ Preprints 6:e26515v1

Abstract

Accurate detection of target microbial species in metagenomic datasets from environmental samples remains limited because the limit of detection of current methods is typically inaccessible and the frequency of false positives remains unknown. False positives result from inadequate identification of regions of the genome that are either too highly conserved to be diagnostic (e.g., rRNA genes) or prone to frequent horizontal genetic exchange (e.g., mobile elements). To overcome these limitations, we introduce imGLAD, which aims to detect target genomic sequences in metagenomic datasets. imGLAD achieves high accuracy for three reasons: it uses the sequence-discrete population concept to discriminate between metagenomic reads originating from the target organism and reads from co-occurring close relatives, it masks regions of the genome that are not informative using the MyTaxa engine, and it models both the sequencing breadth and depth to determine relative abundance and limit of detection. We validated imGLAD by analysing metagenomic datasets derived from spinach leaves inoculated with the enteric pathogen Escherichia coli O157:H7 and showed that its limit of detection is comparable to that of PCR-based approaches (~1 cell/gram).
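
To make the terms "sequencing depth" and "sequencing breadth" concrete, the following is a minimal sketch and is not taken from imGLAD's implementation. It assumes reads have already been mapped to the target genome and are given as 0-based, half-open (start, end) intervals. The function names are hypothetical, and the expected-breadth formula is the standard Poisson approximation for reads placed uniformly at random, used here only for illustration.

```python
import math


def depth_and_breadth(genome_length, intervals):
    """Return (mean depth, breadth) for read intervals on one genome.

    intervals: iterable of 0-based, half-open (start, end) alignments.
    Depth is the mean number of reads covering a position; breadth is
    the fraction of positions covered by at least one read.
    """
    coverage = [0] * genome_length
    for start, end in intervals:
        # Clip each alignment to the genome boundaries before counting.
        for pos in range(max(start, 0), min(end, genome_length)):
            coverage[pos] += 1
    depth = sum(coverage) / genome_length
    breadth = sum(1 for c in coverage if c > 0) / genome_length
    return depth, breadth


def expected_breadth(depth):
    """Breadth expected at a given mean depth if reads land uniformly
    at random along the genome (Poisson approximation)."""
    return 1.0 - math.exp(-depth)


if __name__ == "__main__":
    # Toy genome of 10 positions with two overlapping reads.
    depth, breadth = depth_and_breadth(10, [(0, 4), (2, 6)])
    print(depth, breadth, round(expected_breadth(depth), 3))
```

Under these assumptions, an observed breadth far below the value expected for the observed depth would suggest that reads pile up on a few regions (for example, conserved or mobile elements) rather than being spread across the whole target genome.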

Author Comment

This is a submission to PeerJ for review.

Supplemental Information

Supplementary Material

Supplementary Figures S1-S7.

DOI: 10.7287/peerj.preprints.26515v1/supp-1