Strengths and weaknesses of DNA-based monitoring: Assessing macroinvertebrates in 18 Finnish streams with metabarcoding and morphology
- Subject Areas
- Biodiversity, Environmental Sciences, Molecular Biology, Zoology, Science Policy
- Keywords
- next generation sequencing, Biomass bias, metabarcoding, ecological status, DNA barcoding, macroinvertebrates
- Copyright
- © 2017 Elbrecht et al.
- Licence
- This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Preprints) and either DOI or URL of the article must be cited.
- Cite this article
- 2017. Strengths and weaknesses of DNA-based monitoring: Assessing macroinvertebrates in 18 Finnish streams with metabarcoding and morphology. PeerJ Preprints 5:e2759v1 https://doi.org/10.7287/peerj.preprints.2759v1
Abstract
1) DNA metabarcoding holds great promise for assessment of stream ecosystems with macroinvertebrates. However, few large-scale studies have compared the performance of DNA metabarcoding with that of routine morphological identification.
2) We tested metabarcoding with four primer sets on 18 macroinvertebrate samples from Finland. The samples were collected in 2013 and identified based on morphology as part of a Finnish stream monitoring program. Following standardized protocols, morphological identification was performed to the lowest taxonomic level at which it was reliable.
3) Using DNA metabarcoding, we identified more than twice as many taxa as with morphology-based identification, and with greater species-level resolution. For each sample, we detected more taxa by metabarcoding than by the previous morphological identification, and all four primer sets performed similarly well. There was a significant linear correlation between sequence abundance and the number of specimens per taxon in each sample, but the scatter was up to two orders of magnitude. Ecological status assessment indices calculated from the morphological and DNA metabarcoding datasets were mostly similar, with a few exceptions. With the recent drop in sequencing costs per sample, both identification methods are currently equally expensive.
4) We used actual monitoring samples to demonstrate that DNA metabarcoding can achieve similar results and better taxonomic resolution than current morphological identification methods. Metabarcoding has thus already become a viable and reliable invertebrate identification method for stream assessment. However, to unlock the full potential of DNA metabarcoding for ecosystem assessment, key problems in current laboratory protocols and reference databases, specified in this work, will require further attention.
Author Comment
Initial "readable" version of our paper. Additional improvements to the text and flow will follow, based on co-author feedback and suggestions by bioedit. We plan to submit this work to Methods in Ecology and Evolution.
Supplemental Information
Figure S1. Map of sample locations
Map showing the locations of the 18 macroinvertebrate samples analysed in this study. Sample IDs indicate the stream type: Sa = clayish catchments, K = mineral land catchments, T = peatland catchments.
Figure S2. Flow chart detailing bioinformatics steps in our metabarcoding pipeline
Detailed overview of the bioinformatic processing of the Illumina high-throughput sequencing data. Raw sequence data (A) are demultiplexed and preprocessed (paired-end merging, primer removal, trimming, reverse complementing, removal of low-quality reads) (B). The processed sequences are then pooled and dereplicated with a minimum size of 3, to reduce the noise that sequencing errors introduce into clustering (C). Reads from all samples are then compared against the generated OTUs, and OTUs that do not reach a minimum of 0.003% of assigned sequences in at least one replicate are discarded (D). All reads are again mapped against the remaining OTU subset to generate the final OTU table, with taxonomy assigned to each centroid using NCBI and BOLD (E). For the statistical analysis of individual samples, only OTUs with >0.003% abundance in both replicates of a sample are kept; OTUs below this threshold are set to 0% (F).
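The replicate filter in step (F) can be illustrated with a minimal Python sketch. It is not the authors' pipeline code: the function names, the dictionary layout, and the read counts are hypothetical, and only the 0.003% threshold and the "both replicates" rule are taken from the caption.

```python
THRESHOLD_PCT = 0.003  # minimum relative abundance in percent (from the caption)


def relative_abundance(counts):
    """Convert raw read counts per OTU into percent of the replicate total."""
    total = sum(counts.values())
    if total == 0:
        return {otu: 0.0 for otu in counts}
    return {otu: 100.0 * n / total for otu, n in counts.items()}


def filter_replicates(rep_a, rep_b, threshold=THRESHOLD_PCT):
    """Keep an OTU only if it exceeds the threshold in BOTH replicates.

    OTUs that fail in either replicate are set to 0% in both,
    mirroring step (F) of the caption.
    """
    rel_a = relative_abundance(rep_a)
    rel_b = relative_abundance(rep_b)
    filtered = {}
    for otu in sorted(set(rel_a) | set(rel_b)):
        a = rel_a.get(otu, 0.0)
        b = rel_b.get(otu, 0.0)
        keep = a > threshold and b > threshold
        filtered[otu] = (a, b) if keep else (0.0, 0.0)
    return filtered


# Hypothetical read counts for the two replicates of one sample.
rep_a = {"OTU_1": 90000, "OTU_2": 9998, "OTU_3": 2}
rep_b = {"OTU_1": 80000, "OTU_2": 20000}

result = filter_replicates(rep_a, rep_b)
for otu, (a, b) in result.items():
    print(f"{otu}: {a:.3f}% / {b:.3f}%")
```

In this made-up example, OTU_3 reaches only 0.002% in the first replicate and is absent from the second, so it is set to 0% in both, while OTU_1 and OTU_2 are retained.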
Figure S3. Scatterplot showing the number of reads obtained for the samples
Number of read pairs obtained for each of the 144 samples (= 18 DNA extractions * 2 replicates * 4 primer pairs) plotted against the concentration of each sample.
Figure S4. Matrix indicating potential tag switching
Matrix showing the number of sequences for all possible primer combinations. Combinations that were used for tagging samples are highlighted in green (6 samples marked with asterisks were excluded from the dataset, as they belong to another project). Other combinations with matching tags are highlighted in blue based on relative sequence abundance. Combinations in orange and red highlight mismatching tags, which are likely due to sequencing errors or, in the case of NA+NA, to PhiX reference sequences that were spiked into the Illumina run to increase sequence diversity.
Figure S5. Plot showing the numbers of shared OTUs between primer sets
Bar plot showing which of the 750 OTUs are detected with which primer sets. Hypothesized OTU reliability is shown with a gradient of reds, assuming that OTUs detected with only one or few primer sets are more likely to be false positives. Plot generated with UpSetR (Lex et al. 2014).
Figure S6. Plot showing reproducibility between replicates
Differences in OTU abundance between replicates, with OTUs sorted by read abundance (indicated by color). If the ratio of maximum to minimum read abundance exceeds 10, the data point is plotted as an “x”. The total number of OTUs per sample is given in brackets, followed by the mean ratio. The mean ratio for rows and columns is given below the sample ID or to the right of the primer combinations.
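The ratio described in this caption can be sketched in a few lines of Python. This is an illustration only: the function names and read counts are hypothetical, and treating an OTU with zero reads in one replicate as an infinite ratio that is left out of the mean is an assumption, not something stated in the caption.

```python
def replicate_ratio(reads_a, reads_b):
    """Ratio of the larger to the smaller read abundance of one OTU."""
    hi, lo = max(reads_a, reads_b), min(reads_a, reads_b)
    if lo == 0:
        # Assumption: an OTU missing from one replicate has no finite ratio.
        return float("inf")
    return hi / lo


def summarize(otus, cutoff=10):
    """Return per-OTU ratios, the OTUs exceeding the cutoff, and the mean ratio."""
    ratios = {otu: replicate_ratio(a, b) for otu, (a, b) in otus.items()}
    flagged = sorted(otu for otu, r in ratios.items() if r > cutoff)
    finite = [r for r in ratios.values() if r != float("inf")]
    mean_ratio = sum(finite) / len(finite) if finite else float("nan")
    return ratios, flagged, mean_ratio


# Hypothetical read counts (replicate 1, replicate 2) for three OTUs.
otus = {"OTU_1": (1000, 800), "OTU_2": (50, 600), "OTU_3": (30, 30)}

ratios, flagged, mean_ratio = summarize(otus)
print(ratios)      # per-OTU max/min ratios
print(flagged)     # OTUs that would be drawn as "x"
print(mean_ratio)  # mean ratio for this sample and primer combination
```

Here OTU_2 differs twelvefold between replicates and would be plotted as an "x"; the other two OTUs stay below the cutoff of 10.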
Figure S7. Morphotaxa presence across samples
Occurrence of taxa across all 18 samples.
Figure S8. Comparison of taxonomic resolution between morphology and DNA metabarcoding
Taxonomic resolution of DNA metabarcoding and morphology-based taxon determination across all 18 sample sites. Bars show the number of morphotaxa detected in each category, with relative abundance [%] given above individual bars.
Figure S9. Correlation between sequence abundance and morphotaxon abundance
Relative logarithmic sequence abundance plotted against the logarithmic number of specimens in each morphologically identified taxon. The four primer combinations are indicated by color, with a linear regression line plotted if significant (p ≤ 0.05). The values after the primer names give the adjusted R-squared.