Sorting things out - assessing effects of unequal specimen biomass on DNA metabarcoding
- Subject Areas
- Biodiversity, Conservation Biology, Environmental Sciences, Molecular Biology, Zoology
- Keywords
- Biomass bias, specimen sorting, metabarcoding, next generation sequencing, ecosystem assessment, metagenomics, DNA barcoding
- Copyright
- © 2017 Elbrecht et al.
- Licence
- This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Preprints) and either DOI or URL of the article must be cited.
- Cite this article
- 2017. Sorting things out - assessing effects of unequal specimen biomass on DNA metabarcoding. PeerJ Preprints 5:e2561v2 https://doi.org/10.7287/peerj.preprints.2561v2
Abstract
1) Environmental bulk samples often contain many taxa with biomass differences of several orders of magnitude. This can be problematic in DNA metabarcoding and metagenomic high-throughput sequencing approaches, as large specimens contribute disproportionate amounts of DNA template. Thus, a few specimens of high biomass will dominate the dataset, potentially leaving smaller specimens undetected. Sorting samples and balancing the amount of tissue used per size fraction should improve detection rates, but this approach has not been systematically tested.
2) Here we tested the effects of size sorting on taxa detection using freshwater macroinvertebrates. Kick sampling was performed at two locations in a low-mountain stream in western Germany. Specimens were morphologically identified and sorted into small, medium and large size classes (< 2.5x5, 5x10 and up to 10x20 mm). Tissue from the three size categories was extracted individually and then pooled in two ways: to simulate samples that were not sorted by biomass, and to simulate samples that were sorted and then pooled so that each specimen contributed approximately equal amounts of biomass. DNA from all five extractions of samples from both sites was amplified using four different DNA metabarcoding primer sets targeting the Cytochrome c oxidase I (COI) gene. The library was sequenced on an Illumina HiSeq sequencer.
3) Sorting taxa by size and pooling them proportionately according to their abundance led to more equal amplification than processing complete samples without sorting. The sorted samples recovered 30% more taxa than the unsorted samples at the same sequencing depth. Our results imply that sequencing depth can be decreased approximately five-fold when sorting the samples into three size classes.
4) Our results demonstrate that even coarse size sorting can substantially improve detection of taxa using DNA metabarcoding. While high-throughput sequencing will become more accessible and cheaper in the coming years, sorting bulk samples by specimen biomass is a simple yet efficient method to reduce current sequencing costs.
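The pooling logic described in 2) and 3) can be sketched in a few lines of Python. This is only an illustration, not the authors' protocol: the function name `pooling_volumes`, the specimen counts, the tissue weights and the 100 µl pool volume are all hypothetical, and the sketch assumes that the digests of the three size classes have comparable tissue concentrations. Under that assumption, weighting each size class by its specimen count approximates the sorted (So) pool, while weighting by total tissue mass approximates an unsorted (Un) sample.

```python
def pooling_volumes(weights, total_volume_ul):
    """Split a pool volume across size classes in proportion to `weights`.

    `weights` maps a size-class label to a non-negative number
    (e.g. a specimen count or a tissue mass). Hypothetical helper,
    not taken from the preprint.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return {label: total_volume_ul * w / total for label, w in weights.items()}


# Hypothetical example values (NOT data from the study).
specimen_counts = {"S": 300, "M": 80, "L": 20}     # specimens per size class
tissue_mg = {"S": 30.0, "M": 120.0, "L": 450.0}    # total tissue per size class

# Sorted pool (So): each class is weighted by how many specimens it holds,
# so every specimen contributes roughly the same share of the pool.
sorted_pool = pooling_volumes(specimen_counts, 100.0)

# Unsorted pool (Un): each class is weighted by its total biomass,
# so a few large specimens dominate the pool.
unsorted_pool = pooling_volumes(tissue_mg, 100.0)

for label in ("S", "M", "L"):
    print(f"{label}: sorted {sorted_pool[label]:.1f} ul, "
          f"unsorted {unsorted_pool[label]:.1f} ul")
```

With these made-up numbers, the small size class makes up three quarters of the sorted pool but only one twentieth of the unsorted pool, which illustrates why small specimens are more likely to be missed when a sample is not sorted.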
Author Comment
The manuscript was rejected at MME: the study was found to be well done, but the results too obvious. Minor points in the manuscript and its figures were improved based on reviewer suggestions. Resubmission to "Ecology and Evolution" is planned.
Supplemental Information
Figure S1. Pictures of sorted specimens
Pictures of the specimens sorted into small, medium and large individuals. Also provides information on how S, M and L tissue was pooled to generate the proportionally sorted (So) and unsorted (Un) samples.
Figure S2. Flowchart detailing laboratory processing
Overview of the steps carried out for sample sorting and processing in the laboratory.
Figure S3. DNA extraction protocol
Shows the step where the digested buffers of S, M and L were pooled to generate unsorted (Un) and sorted (So) samples.
Figure S4. Sequencing depth and sequences discarded in bioinformatic processing
Barplot showing the number of total reads and proportion of sequences discarded in subsequent bioinformatic processing steps for all samples.
Figure S5. Flowchart detailing the bioinformatic pipeline
Figure giving an overview of the metabarcoding pipeline applied to this dataset.
Figure S6. Reproducibility between HiSeq lanes
Comparison of relative OTU abundances between both HiSeq lanes.
Figure S7. Plot of OTU table
Visualisation of taxa detected within S, M, L, Un, So DNA extractions, with 4 different primer combinations. Data is also compared to morphological identifications and number of specimens of each morphologically identified taxon.
Figure S8. Database completeness
Plot showing the percent match of each OTU to the reference database, taking read abundance into account.
Figure S9. Taxa identification with metabarcoding and morphology
Comparison of the number of taxa identified with morphology and DNA metabarcoding at different taxonomic resolutions.
Figure S10. Taxa detection in sorted and unsorted samples
Comparison of the amount of diversity and taxa detected in sorted samples (So) and unsorted samples (Un).
Table S1. OTU table
Detailed OTU table giving the number of reads for each sample, including assigned taxonomy and OTU sequence.
Table S2. Morphologically identified taxa
Table giving an overview of morphologically identified taxa and abundance of specimens in S, M and L for both sample locations.
Manuscript File
Please use this file to provide feedback (with track changes). Thank you.