
Denoising the Denoisers: An independent evaluation of microbiome sequence error-correction methods

NOT PEER-REVIEWED
"PeerJ Preprints" is a venue for early communication or feedback before peer review. Data may be preliminary.

A peer-reviewed article of this Preprint also exists.


Supplemental Information

Figure 1: Total number of ASVs identified by each denoising method for four different mock communities

Amplicon sequence variants (ASVs) were compared to a database of full-length amplicon sequences for just the microbes supposedly in the community (“Expected”) and against the full SILVA or ITS databases (“Database”) using BLASTN at 97% and 100% identity cutoffs. “Unmatched” sequences did not match an expected sequence or the SILVA/ITS databases at 97% identity or greater. Dotted lines indicate the total number of ASVs expected, accounting for 16S copy variation within genomes. A) Human Microbiome Project mock community; B) Extreme dataset; C) Fungal ITS1 mock community; D) Zymomock community.

DOI: 10.7287/peerj.preprints.26566v1/supp-1
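
The categories in Figure 1 can be illustrated with a minimal sketch. It assumes each ASV already has a best BLASTN percent identity against the expected sequences and against the full database. The function name `categorize_asv`, the label strings, and the precedence (expected before database, 100% before 97%) are assumptions made for illustration, not the authors' scripts, and alignment coverage is ignored.

```python
def categorize_asv(expected_identity=None, database_identity=None):
    """Assign one ASV to a match category.

    Both arguments are best-hit percent identities (0-100), or None when
    BLASTN reported no hit. Precedence is an assumption: expected sequences
    are checked before the full database.
    """
    sources = (("Expected", expected_identity), ("Database", database_identity))
    for source, identity in sources:
        if identity is None:
            continue
        if identity >= 100.0:
            return f"{source} 100%"
        if identity >= 97.0:
            return f"{source} 97%"
    # No hit at 97% identity or greater in either reference set.
    return "Unmatched"
```

For example, an ASV with a 98.2% hit to an expected sequence falls in the "Expected 97%" category under these assumptions, even if it also matches the full database perfectly.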

Figure 2: Relative abundances of taxa generated by each denoising method for four different mock communities

All ASVs that matched expected sequences at 97% or greater identity were assigned taxonomy using a BLASTN search against the expected sequences provided for each of the Extreme, Human Microbiome Project, and Zymomock mock communities. For the fungal ITS1 mock community, all ASVs that matched an expected species with 97% or greater identity to the UNITE database were classified as expected sequences. Non_reference refers to the abundance of ASVs that did not match expected sequences with 97% or greater identity. A) Human Microbiome Project mock community; B) Extreme dataset (some organisms are not displayed in this figure because of their low abundance); C) fungal ITS1 mock community; D) Zymomock community.

DOI: 10.7287/peerj.preprints.26566v1/supp-2

Figure 3: Intra-sample distances between denoising methods based on a real soil community

A) The weighted UniFrac distances between the same biological samples based on ASVs outputted by each of the different methods. B) The Bray-Curtis dissimilarity distances between the same biological samples based on genera outputted by the three methods after being classified with the RDP classifier. Deblur tends to be slightly more dissimilar when compared to the other two methods. C) Principal coordinates analysis of the weighted UniFrac distances of all the samples in the real soil dataset generated by each method. The three different profiles generated for each biological sample are colour-coded and are joined by an interconnecting line. D) Non-metric multidimensional scaling plot that displays the Bray-Curtis dissimilarity profiles of all the samples in the real soil dataset generated by each method. The three different profiles generated for each biological sample are colour-coded and are joined by an interconnecting line.

DOI: 10.7287/peerj.preprints.26566v1/supp-3
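
As a rough illustration of the intra-sample comparison in panel B, the Bray-Curtis dissimilarity between two genus-level profiles of the same sample can be sketched as below. The function name `bray_curtis` and the dictionary input format are assumptions for this example; the study's own scripts may compute it differently.

```python
def bray_curtis(profile_a, profile_b):
    """Bray-Curtis dissimilarity between two abundance profiles.

    Each profile maps a genus name to its abundance (counts or relative
    abundances). Genera missing from one profile count as zero.
    Returns a value between 0 (identical) and 1 (no shared genera).
    """
    genera = set(profile_a) | set(profile_b)
    numerator = sum(
        abs(profile_a.get(g, 0.0) - profile_b.get(g, 0.0)) for g in genera
    )
    denominator = sum(profile_a.values()) + sum(profile_b.values())
    if denominator == 0:
        raise ValueError("both profiles are empty")
    return numerator / denominator
```

Applied to the same biological sample processed by two denoisers, a value near 0 indicates that the two methods produced nearly the same genus-level profile.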

Figure 4: Run time and memory usage of each denoising method on a dataset of varying size

A) Time in seconds and B) memory in megabytes required to run varying numbers of reads through the three different methods. Note that time is on a log10 scale.

DOI: 10.7287/peerj.preprints.26566v1/supp-4

Supplemental Figure 1: Removal of low abundance ASVs removes many unmatched sequences from Deblur- and DADA2-generated ASVs

Amplicon sequence variants (ASVs) were filtered at a 0.1% abundance threshold and then compared to a database of full-length amplicon sequences for just the microbes supposedly in the community (“Expected”) and against the full SILVA or ITS databases (“Database”) using BLASTN at 97% and 100% identity cutoffs. “Unmatched” sequences did not match an expected sequence or the SILVA 16S rRNA gene database at 97% identity or greater. Dotted lines indicate the total number of ASVs expected, accounting for 16S gene-copy variations within genomes. A) Human Microbiome Project mock community; B) Extreme dataset; C) Fungal ITS1 mock community; D) Zymo mock community.

DOI: 10.7287/peerj.preprints.26566v1/supp-5
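
The 0.1% abundance filter described in this caption can be sketched as follows. This is only an illustration: it assumes the threshold is applied to each ASV's share of the total reads in one sample, which the caption does not state, and the name `filter_low_abundance` is hypothetical.

```python
def filter_low_abundance(asv_counts, min_fraction=0.001):
    """Keep ASVs whose share of the sample's reads is at least min_fraction.

    asv_counts maps an ASV identifier to its read count in one sample.
    min_fraction=0.001 corresponds to the 0.1% threshold; applying it
    per sample is an assumption of this sketch.
    """
    total = sum(asv_counts.values())
    if total == 0:
        return {}
    return {
        asv: count
        for asv, count in asv_counts.items()
        if count / total >= min_fraction
    }
```

With 10,000 reads in a sample, an ASV supported by 5 reads (0.05%) would be dropped under this assumption, while one supported by 20 reads (0.2%) would be kept.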

Supplemental Figure 2: DADA2 finds more rare organisms than Deblur or UNOISE3

Rank-abundance curves for ASVs (A) and classified species (B) generated from the soil dataset using the DADA2, Deblur and UNOISE3 methods. ASVs were classified using the RDP classifier against the Greengenes (13_8) database.

DOI: 10.7287/peerj.preprints.26566v1/supp-6

Supplemental Figure 3: Filter stringency does not affect relative abundance data drastically

The Human Microbiome Project mock community was run using DADA2, UNOISE3, and Deblur at varying stringency filters (low, medium and high). Resulting relative abundance profiles are shown for A) DADA2, B) Deblur and C) UNOISE3.

DOI: 10.7287/peerj.preprints.26566v1/supp-7

Supplemental Figure 4: Intra-sample distances between methods based on intestinal biopsy samples from pediatric Crohn’s disease patients and controls

A) The weighted UniFrac distances between the same biological samples based on ASVs outputted by each of the different methods. B) The Bray-Curtis dissimilarity distance between the same biological samples based on genera outputted by the three methods after being classified with the RDP classifier. C) Principal coordinates analysis of the weighted UniFrac distances of all the samples in the intestinal biopsy dataset generated by each method. The three different profiles generated for each biological sample are colour-coded and are joined by an interconnecting line. D) Non-metric multidimensional scaling plot that displays the Bray-Curtis dissimilarity profiles of all the samples in the intestinal biopsy dataset generated by each method. The three different profiles generated for each biological sample are colour-coded and are joined by an interconnecting line.

DOI: 10.7287/peerj.preprints.26566v1/supp-8

Supplemental Figure 5: Intra-sample distances between methods based on mouse exercise associated fecal samples

A) The weighted UniFrac distances between the same biological samples based on ASVs outputted by each of the different methods. B) The Bray-Curtis dissimilarity distance between the same biological samples based on genera outputted by the three methods after being classified with the RDP classifier. C) Principal coordinates analysis of the weighted UniFrac distances of all the samples in the mouse fecal dataset generated by each method. The three different profiles generated for each biological sample are colour-coded and are joined by an interconnecting line. D) Non-metric multidimensional scaling plot that displays the Bray-Curtis dissimilarity profiles of all the samples in the mouse fecal dataset generated by each method. The three different profiles generated for each biological sample are colour-coded and are joined by an interconnecting line.

DOI: 10.7287/peerj.preprints.26566v1/supp-9

Supplemental Figure 6: There are outlier genera that drastically differ in relative abundance between Deblur and the other denoising methods

ASVs were classified using the RDP classifier against the Greengenes (13_8) database. Relative abundances of each genus were then compared between methods and differences were plotted in a histogram. A) Relative abundance differences by genus between DADA2 and Deblur. B) Relative abundance differences by genus between DADA2 and UNOISE3. C) Relative abundance differences by genus between Deblur and UNOISE3.

DOI: 10.7287/peerj.preprints.26566v1/supp-10
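
The per-genus comparison behind these histograms can be sketched roughly as below, assuming each method's output has already been summarized as a mapping from genus to relative abundance. The function name `genus_differences` and the input format are assumptions for illustration only.

```python
def genus_differences(method_a, method_b):
    """Relative abundance difference (method_a minus method_b) per genus.

    Each argument maps a genus name to its relative abundance (0-1).
    Genera reported by only one method are treated as zero in the other.
    """
    genera = sorted(set(method_a) | set(method_b))
    return {g: method_a.get(g, 0.0) - method_b.get(g, 0.0) for g in genera}
```

The resulting values are what a histogram such as those in panels A to C would bin; genera far from zero are the outliers the figure title refers to.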

Supplemental Figure 7: Top 5 genera driving differences between Deblur and the other two denoising tools in the soil dataset

Boxplots of the relative abundances per sample of five of the classified genera that had relative abundance differences greater than 1% between Deblur and both DADA2 and UNOISE3. Deblur calls more reads that were unclassified at the kingdom and class levels than DADA2 or UNOISE3. A) ASVs only classified at the Bacteria kingdom level. Deblur tends to find higher abundances of these ASVs. B) ASVs only classified at the Verrucomicrobia phylum level. Deblur finds higher abundances of these ASVs. C) ASVs only classified at the Spartobacteria class level. DADA2 and UNOISE3 find more of these ASVs than Deblur. D) ASVs classified at the Gp1 order level of the Acidobacteria_Gp1 class. E) ASVs classified at the Granulicella order level of the Acidobacteria_Gp1 class. Strikingly, these two classifications show opposite relationships: Deblur finds more ASVs in the Gp1 order, whereas DADA2 and UNOISE3 find more ASVs in the Granulicella order.

DOI: 10.7287/peerj.preprints.26566v1/supp-11

Additional Information

Competing Interests

The authors declare that they have no competing interests.

Author Contributions

Jacob T Nearing conceived and designed the experiments, performed the experiments, analyzed the data, prepared figures and/or tables, authored or reviewed drafts of the paper, approved the final draft.

Gavin M Douglas conceived and designed the experiments, analyzed the data, authored or reviewed drafts of the paper, approved the final draft.

André M Comeau contributed reagents/materials/analysis tools, authored or reviewed drafts of the paper, approved the final draft.

Morgan G.I. Langille conceived and designed the experiments, authored or reviewed drafts of the paper, approved the final draft.

DNA Deposition

The following information was supplied regarding the deposition of DNA sequences:

The Human Microbiome Project mock community and Zymomock community sequences described here are accessible via ENA accession number PRJEB24409.

Data Deposition

The following information was supplied regarding data availability:

Scripts that were used to run all data analysis can be found at: https://github.com/nearinj/Denoiser-Comparison

Funding

JTN is a trainee in the Cancer Research Training Program of the Beatrice Hunter Cancer Research Institute, with funds provided by the Terry Fox Research Institute (TFRI). GMD is supported by an NSERC Alexander Graham Bell Canada Graduate Scholarship. MGIL is supported by an NSERC Discovery Grant. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

