mockrobiota: a public resource for microbiome bioinformatics benchmarking
Abstract
Mock communities are an important tool for validating, optimizing, and comparing bioinformatics methods for microbial community analysis. We present mockrobiota, a public resource for sharing, validating, and documenting mock community data resources, available at https://github.com/caporaso-lab/mockrobiota. The materials contained in mockrobiota include dataset and sample metadata; expected composition data, annotated based on one or more reference taxonomies; links to raw data (e.g., raw sequence data) for each mock community dataset; and optional reference sequences for mock community members. mockrobiota does not supply physical sample materials directly, but the dataset metadata included for each mock community indicate whether physical sample materials are available and provide associated contact information. At the time of this writing, mockrobiota contains 11 mock community datasets with known species compositions (including bacterial, archaeal, and eukaryotic mock communities), analyzed by high-throughput marker-gene sequencing. The availability of standard, public mock community data will facilitate ongoing methods optimization, enable comparisons across studies that share source data, increase transparency and access, and eliminate redundancy. This dynamic resource is intended to expand and evolve to meet the changing needs of the ’omics community.
Cite this as
2016. mockrobiota: a public resource for microbiome bioinformatics benchmarking. PeerJ Preprints 4:e2065v1 https://doi.org/10.7287/peerj.preprints.2065v1
Author comment
This is a preprint submission to PeerJ.
Supplemental Information
Fig 1. Example usage of the mockrobiota mock community (MC) resource for marker-gene sequencing pipelines. MC datasets are selected based on multiple input criteria, including dataset metadata, sample metadata, and represented taxa. Raw data (e.g., fastq) are demultiplexed, sequences are dereplicated or clustered as OTUs, and taxonomy is assigned to representative sequences. The observed taxonomic assignments and abundances are then compared to the expected composition of that MC (its expected taxonomic assignments and abundances), e.g., to generate precision and recall scores or correlations between observed and expected values.
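The final comparison step described in the caption can be illustrated with a minimal Python sketch. It is not the mockrobiota or pipeline implementation. It assumes that expected and observed compositions have already been reduced to dictionaries mapping a taxon label to a relative abundance. The function names `precision_recall` and `pearson_r`, the taxon labels, and the abundance values are all hypothetical and are not taken from any mockrobiota dataset. Precision and recall are computed here on taxon presence/absence, and the correlation is a plain Pearson coefficient over the union of taxa; a real evaluation may define these scores differently.

```python
from math import sqrt


def precision_recall(expected, observed):
    """Presence/absence precision and recall of observed taxa vs. expected taxa."""
    exp_taxa = {taxon for taxon, abundance in expected.items() if abundance > 0}
    obs_taxa = {taxon for taxon, abundance in observed.items() if abundance > 0}
    true_positives = len(exp_taxa & obs_taxa)
    precision = true_positives / len(obs_taxa) if obs_taxa else 0.0
    recall = true_positives / len(exp_taxa) if exp_taxa else 0.0
    return precision, recall


def pearson_r(expected, observed):
    """Pearson correlation of abundances over the union of taxa (missing = 0)."""
    taxa = sorted(set(expected) | set(observed))
    n = len(taxa)
    if n == 0:
        return float("nan")
    x = [expected.get(taxon, 0.0) for taxon in taxa]
    y = [observed.get(taxon, 0.0) for taxon in taxa]
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")  # correlation is undefined for constant vectors
    return cov / sqrt(var_x * var_y)


# Hypothetical compositions for illustration only (relative abundances).
expected = {
    "Escherichia coli": 0.4,
    "Bacillus subtilis": 0.3,
    "Staphylococcus aureus": 0.2,
    "Pseudomonas aeruginosa": 0.1,
}
observed = {
    "Escherichia coli": 0.5,
    "Bacillus subtilis": 0.3,
    "Staphylococcus aureus": 0.1,
    "Enterococcus faecalis": 0.1,  # false positive: not in the expected set
}

precision, recall = precision_recall(expected, observed)
correlation = pearson_r(expected, observed)
print(f"precision={precision:.2f} recall={recall:.2f} r={correlation:.3f}")
```

In this toy case three of the four observed taxa are expected and three of the four expected taxa are observed, so precision and recall are both 0.75. Treating absent taxa as zero abundance lets false positives and false negatives lower the correlation as well.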
Additional Information
Competing Interests
J. Gregory Caporaso is an Academic Editor for PeerJ.
Author Contributions
Nicholas A Bokulich conceived and designed the experiments, performed the experiments, analyzed the data, wrote the paper, prepared figures and/or tables, reviewed drafts of the paper.
Jai Ram Rideout conceived and designed the experiments, performed the experiments, analyzed the data, reviewed drafts of the paper.
William G Mercurio analyzed the data, reviewed drafts of the paper.
Benjamin Wolfe contributed reagents/materials/analysis tools, reviewed drafts of the paper.
Corinne F Maurice contributed reagents/materials/analysis tools, reviewed drafts of the paper.
Rachel J Dutton contributed reagents/materials/analysis tools, reviewed drafts of the paper.
Peter J Turnbaugh contributed reagents/materials/analysis tools, reviewed drafts of the paper.
Rob Knight contributed reagents/materials/analysis tools, reviewed drafts of the paper.
J. Gregory Caporaso conceived and designed the experiments, performed the experiments, analyzed the data, contributed reagents/materials/analysis tools, wrote the paper, reviewed drafts of the paper.
Data Deposition
The following information was supplied regarding data availability:
https://github.com/caporaso-lab/mockrobiota
Funding
The authors received no funding for this work.