Sorting things out - assessing effects of unequal specimen biomass on DNA metabarcoding
- Published
- Accepted
- Subject Areas
- Biodiversity, Conservation Biology, Environmental Sciences, Molecular Biology, Zoology
- Keywords
- Biomass bias, specimen sorting, metabarcoding, next generation sequencing, ecosystem assessment, metagenomics, DNA barcoding
- Copyright
- © 2017 Elbrecht et al.
- Licence
- This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Preprints) and either DOI or URL of the article must be cited.
- Cite this article
- 2017. Sorting things out - assessing effects of unequal specimen biomass on DNA metabarcoding. PeerJ Preprints 5:e2561v3 https://doi.org/10.7287/peerj.preprints.2561v3
Abstract
Environmental bulk samples often contain many taxa that vary several orders of magnitude in biomass. This can be problematic in DNA metabarcoding and metagenomic high-throughput sequencing approaches, as large specimens contribute disproportionately high amounts of DNA template. Thus, a few specimens of high biomass will dominate the dataset, potentially leading to smaller specimens remaining undetected. Sorting of samples by specimen size and balancing the amounts of tissue used per size fraction should improve detection rates, but this approach has not been systematically tested.
Here we explored the effects of size sorting on taxa detection using two freshwater macroinvertebrate monitoring samples collected from a low-mountain stream in Germany. Specimens were morphologically identified and sorted into three size classes (body size < 2.5 × 5 mm, 5 × 10 mm and up to 10 × 20 mm). Tissue from each size class was extracted separately, and the extracts were pooled to simulate samples that were not sorted by biomass ("Unsorted"). Additionally, size fractions were pooled so that each specimen contributed approximately equal amounts of biomass ("Sorted"). These mock samples were amplified using four different DNA metabarcoding primer sets targeting the cytochrome c oxidase I (COI) gene.
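The difference between the two pooling schemes can be illustrated with a minimal Python sketch. It is not the authors' protocol: the biomass values and specimen counts are invented for illustration, the function names are hypothetical, and it assumes that "Unsorted" pooling weights each size class by its total tissue while "Sorted" pooling weights each class by its number of specimens.

```python
# Hypothetical inputs for one sample; these numbers are NOT from the study.
biomass_mg = {"S": 50.0, "M": 200.0, "L": 750.0}   # total tissue per size class
specimens = {"S": 300, "M": 80, "L": 20}            # specimen counts per size class


def pooling_fractions(weights):
    """Normalise per-class weights so that the pooled shares sum to 1."""
    total = sum(weights.values())
    return {size: value / total for size, value in weights.items()}


def share_per_specimen(mix, counts):
    """Average share of the pool contributed by one specimen of each class."""
    return {size: mix[size] / counts[size] for size in mix}


# "Unsorted": each class contributes in proportion to its biomass (assumption).
unsorted_mix = pooling_fractions(biomass_mg)
# "Sorted": each class contributes in proportion to its specimen count (assumption).
sorted_mix = pooling_fractions(specimens)

for label, mix in (("Unsorted", unsorted_mix), ("Sorted", sorted_mix)):
    per_specimen = share_per_specimen(mix, specimens)
    print(label, {size: round(value, 5) for size, value in per_specimen.items()})
```

With these invented numbers, an average large specimen contributes 225 times the share of an average small specimen in the unsorted pool, whereas in the sorted pool every specimen contributes the same average share. The equalisation holds only on average within a size class, since individual specimens in a class still differ in biomass.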
Sorting taxa by size and pooling them proportionately according to their abundance led to a more equal amplification of taxa compared to the processing of complete samples without sorting. The sorted samples recovered 30% more taxa than the unsorted samples at the same sequencing depth. Our results imply that sequencing depth can be decreased approximately five-fold when sorting the samples into three size classes and pooling by specimen abundance.
Our study demonstrates that even a coarse size sorting can substantially improve taxa detection using DNA metabarcoding. While high-throughput sequencing will become more accessible and cheaper in the coming years, sorting bulk samples by specimen biomass is a simple yet efficient method to reduce current sequencing costs.
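Why a larger read share for small specimens translates into lower sequencing depth can be sketched with a simple sampling model. This is an illustration under strong assumptions, not the analysis used in the study: reads are treated as independent random draws, primer bias is ignored, and the two read shares below are hypothetical values chosen only to differ five-fold.

```python
import math


def reads_needed(read_share, detect_prob=0.95):
    """Reads required so that a taxon holding `read_share` of all reads is
    sampled at least once with probability `detect_prob`, assuming each
    read is an independent random draw (a simplification)."""
    if not 0.0 < read_share < 1.0:
        raise ValueError("read_share must lie strictly between 0 and 1")
    # P(at least one read) = 1 - (1 - read_share) ** n >= detect_prob
    return math.ceil(math.log(1.0 - detect_prob) / math.log(1.0 - read_share))


# Hypothetical read shares of one small-bodied taxon; NOT measured values.
share_unsorted = 0.0001
share_sorted = 0.0005

depth_unsorted = reads_needed(share_unsorted)
depth_sorted = reads_needed(share_sorted)
print(depth_unsorted, depth_sorted, round(depth_unsorted / depth_sorted, 2))
```

Under this model the required depth scales roughly inversely with a taxon's read share, so raising the share of the rarest taxa lowers the depth needed to detect them by about the same factor.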
Author Comment
Revised manuscript, currently in revision at Ecology and Evolution. Large sections of the methods were rewritten, and improved figures were added to better explain how the mock communities were generated. Many further minor improvements were made throughout the manuscript.
Supplemental Information
Figure S1. Pictures of sorted specimens
Pictures of the specimens sorted into small, medium and large individuals. Also provides information on how S, M and L tissue was pooled to generate the proportionally sorted (So) and unsorted (Un) samples.
Figure S2. Flowchart detailing laboratory processing
Overview of the steps carried out for sample sorting and processing in the laboratory.
Figure S3. DNA extraction protocol
Shows the step where the digested buffers of S, M and L were pooled to generate unsorted (Un) and sorted (So) samples.
Figure S4. Sequencing depth and sequences discarded in bioinformatic processing
Barplot showing the number of total reads and proportion of sequences discarded in subsequent bioinformatic processing steps for all samples.
Figure S5. Flowchart detailing the bioinformatic pipeline
Figure giving an overview of the metabarcoding pipeline applied to this dataset.
Figure S6. Reproducibility between HiSeq lanes
Comparison of relative OTU abundances between both HiSeq lanes.
Figure S7. Plot of OTU table
Visualisation of taxa detected within the S, M, L, Un and So DNA extractions, with four different primer combinations. Data are also compared to morphological identifications and the number of specimens of each morphologically identified taxon.
Figure S8. Database completeness
Plot showing the percent match of each OTU to the reference database, taking read abundance into account.
Figure S9. Taxa identification with metabarcoding and morphology
Comparison of the number of taxa identified with morphology and DNA metabarcoding at different taxonomic resolutions.
Figure S10. Taxa detection in sorted and unsorted samples
Comparison of the amount of diversity and taxa detected in sorted samples (So) and unsorted samples (Un).
Table S1. Identification keys used
Literature used for taxa identification.
Table S2. OTU table
Detailed OTU table giving the number of reads for each sample, including assigned taxonomy and OTU sequence.
Table S3. Morphologically identified taxa
Table giving an overview of morphologically identified taxa and abundance of specimens in S, M and L for both sample locations.