Assessing strengths and weaknesses of DNA metabarcoding based macroinvertebrate identification for routine stream monitoring

Vasco Elbrecht; Edith Vamos; Kristian Meissner; Jukka Aroviita; Florian Leese

doi:10.7287/peerj.preprints.2759v2

February 5, 2017

A peer-reviewed article of this Preprint also exists.

Abstract

1) DNA metabarcoding holds great promise for the assessment of macroinvertebrates in stream ecosystems. However, few large-scale studies have compared the performance of DNA metabarcoding with that of routine morphological identification.

2) We performed metabarcoding using four primer sets on macroinvertebrate samples from 18 stream sites across Finland. The samples were collected in 2013 and identified based on morphology as part of a Finnish stream monitoring program. Specimens were morphologically classified, following standardized protocols, to the lowest taxonomic level for which identification was feasible in the routine national monitoring.

3) DNA metabarcoding identified more than twice as many taxa as the morphology-based protocol, and also yielded higher taxonomic resolution. For each sample, we detected more taxa by metabarcoding than by the morphological method, and all four primer sets exhibited comparably good performance. Sequence read abundance and the number of specimens per taxon (a proxy for biomass) were significantly correlated in each sample, although the adjusted R² values were low. With a few exceptions, the ecological status assessment metrics calculated from morphological and DNA metabarcoding datasets were similar. Given the recent reduction in sequencing costs, metabarcoding currently costs approximately the same per sample as morphology-based identification.

4) Using samples obtained in the field, we demonstrated that DNA metabarcoding can achieve assessment results similar to those of current protocols for morphological identification. Thus, metabarcoding represents a feasible and reliable method to identify macroinvertebrates in stream bioassessment, and offers a powerful advantage over morphological identification by providing identifications for taxonomic groups that cannot feasibly be identified in routine protocols. To unlock the full potential of DNA metabarcoding for ecosystem assessment, however, it will be necessary to address key problems with current laboratory protocols and reference databases.

Cite this as

Elbrecht V, Vamos E, Meissner K, Aroviita J, Leese F. 2017. Assessing strengths and weaknesses of DNA metabarcoding based macroinvertebrate identification for routine stream monitoring. PeerJ Preprints 5:e2759v2 https://doi.org/10.7287/peerj.preprints.2759v2

Note: This preprint is not peer-reviewed. You may wish to reference the subsequent peer-reviewed version of this article.

Author comment

improved language / flow of the manuscript

Supplemental Information

Figure S1. Map of sample locations

Map showing the locations of the 18 macroinvertebrate samples analysed in this study. Sample IDs indicate the stream type: Sa = clayish catchments, K = mineral land catchments, T = peatland catchments.

DOI: 10.7287/peerj.preprints.2759v2/supp-1

Figure S2. Flow chart detailing bioinformatics steps in our metabarcoding pipeline

Detailed overview of the bioinformatic processing of the Illumina high-throughput sequencing data. Raw sequence data (A) are demultiplexed and preprocessed (paired-end merging, primer removal, trimming, reverse complementing, removal of low-quality reads) (B). The processed sequences are then pooled and dereplicated with a minimum size of 3, to reduce the noise that sequencing errors introduce into clustering (C). Reads from all samples are then compared against the generated OTUs, and OTUs that do not reach a minimum of 0.003% of assigned sequences in at least one replicate are discarded (D). All reads are again mapped against the remaining OTU subset to generate the final OTU table, with taxonomy assigned to each centroid using NCBI and BOLD (E). Only OTUs with >0.003% abundance in both replicates of a sample are kept for statistical analysis; OTUs below this threshold are set to 0% (F).

DOI: 10.7287/peerj.preprints.2759v2/supp-2
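
The replicate filter in step (F) can be illustrated with a minimal sketch. This is not the code of the pipeline itself; the function names `to_percent` and `filter_replicates` and the read counts in the usage note are hypothetical, and it assumes that relative abundance is calculated per replicate as a percentage of all reads in that replicate.

```python
from typing import List, Tuple

THRESHOLD_PCT = 0.003  # minimum relative abundance in percent, as given in the caption


def to_percent(counts: List[int]) -> List[float]:
    """Convert raw read counts of one replicate to relative abundance in percent."""
    total = sum(counts)
    if total == 0:
        return [0.0] * len(counts)
    return [100.0 * c / total for c in counts]


def filter_replicates(
    rep_a: List[int], rep_b: List[int], threshold: float = THRESHOLD_PCT
) -> Tuple[List[float], List[float]]:
    """Keep an OTU only if it exceeds the threshold in BOTH replicates.

    OTUs failing the check in either replicate are set to 0% in both.
    Assumption: both lists hold the same OTUs in the same order.
    """
    pct_a, pct_b = to_percent(rep_a), to_percent(rep_b)
    kept_a, kept_b = [], []
    for a, b in zip(pct_a, pct_b):
        if a > threshold and b > threshold:
            kept_a.append(a)
            kept_b.append(b)
        else:
            kept_a.append(0.0)
            kept_b.append(0.0)
    return kept_a, kept_b
```

For example, with hypothetical counts `[99990, 8, 2]` and `[99980, 1, 19]` for three OTUs, only the first OTU is retained: the second falls below the threshold in replicate B and the third in replicate A.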

Figure S3. Scatterplot showing the number of reads obtained for the samples

Number of read pairs obtained for each of the 144 samples (= 18 DNA extractions * 2 replicates * 4 primer pairs) plotted against the concentration of each sample.

DOI: 10.7287/peerj.preprints.2759v2/supp-3

Figure S4. Matrix indicating potential tag switching

Matrix showing the number of sequences for all possible primer combinations. Combinations that were used for tagging samples are highlighted in green (6 samples marked with asterisks were excluded from the dataset, as they belong to another project). Other combinations with matching tags are highlighted in blue based on relative sequence abundance. Combinations in orange and red highlight mismatching tags, which are likely due to sequencing errors or, in the case of NA+NA, to PhiX reference sequences that were spiked into the Illumina run to increase sequence diversity.

DOI: 10.7287/peerj.preprints.2759v2/supp-4

Figure S5. Plot showing the numbers of shared OTUs between primer sets

Bar plot showing which of the 750 OTUs are detected with which primer sets. Hypothesized OTU reliability is shown with a gradient of reds, assuming that OTUs detected with only one or few primer sets are more likely to be false positives. Plot generated with UpSetR (Lex et al. 2014).

DOI: 10.7287/peerj.preprints.2759v2/supp-5

Figure S6. Plot showing reproducibility between replicates

Differences in OTU abundance between replicates, sorted by read abundance (indicated by color). If the ratio of maximum to minimum read abundance exceeds 10, the data point is plotted as an “x”. The total number of OTUs per sample is given in brackets, followed by the mean ratio. The mean ratio for rows and columns is given below the sample ID or to the right of the primer combinations.

DOI: 10.7287/peerj.preprints.2759v2/supp-6
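
The ratio used in this figure can be sketched as follows. This is an illustrative reading of the caption, not the plotting code behind the figure: the function names, the `"o"` marker for ordinary points, and the handling of zero reads are assumptions.

```python
from typing import List, Tuple


def replicate_ratio(reads_a: float, reads_b: float) -> float:
    """Ratio of maximum to minimum read abundance of one OTU in two replicates."""
    hi, lo = max(reads_a, reads_b), min(reads_a, reads_b)
    if hi == 0:
        return 1.0  # assumption: an OTU absent from both replicates counts as identical
    if lo == 0:
        return float("inf")  # assumption: detected in only one replicate
    return hi / lo


def marker(reads_a: float, reads_b: float, cutoff: float = 10.0) -> str:
    """Return "x" if the ratio exceeds the cutoff, otherwise "o" (hypothetical default marker)."""
    return "x" if replicate_ratio(reads_a, reads_b) > cutoff else "o"


def mean_ratio(pairs: List[Tuple[float, float]]) -> float:
    """Mean max/min ratio across all OTUs of a sample."""
    ratios = [replicate_ratio(a, b) for a, b in pairs]
    return sum(ratios) / len(ratios)
```

Under these assumptions, an OTU with 100 reads in one replicate and 5 in the other has a ratio of 20 and would be drawn as an “x”, whereas a ratio of exactly 10 would not.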

Figure S7. Morphotaxa presence across samples

Occurrence of taxa across all 18 samples.

DOI: 10.7287/peerj.preprints.2759v2/supp-7

Figure S8. Comparison of taxonomic resolution between morphology and DNA metabarcoding

Taxonomic resolution of DNA metabarcoding and morphology-based taxon determination across all 18 sample sites. Bars show the number of morphotaxa detected in each category, with the relative abundance [%] given above each bar.

DOI: 10.7287/peerj.preprints.2759v2/supp-8

Figure S9. Correlation between sequence abundance and morphotaxon abundance

Logarithmic relative sequence abundance plotted against the logarithmic number of specimens in each morphologically identified taxon. The four primer combinations are indicated by color, with a linear regression line plotted if significant (p < 0.05). The values after the primer names give the adjusted R-squared value.

DOI: 10.7287/peerj.preprints.2759v2/supp-9
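
As an illustration of the statistic reported in this figure, the sketch below fits an ordinary least-squares line to log-transformed values and returns the adjusted R-squared. It is not the analysis code used for the figure: the function name is hypothetical, base-10 logarithms are an assumption (the caption does not state the base, and the base does not change R-squared), and the significance test is omitted.

```python
import math
from typing import List, Tuple


def loglog_fit(specimens: List[float], read_share: List[float]) -> Tuple[float, float, float]:
    """Fit log10(read_share) = intercept + slope * log10(specimens).

    Returns (slope, intercept, adjusted R-squared). All inputs must be > 0.
    """
    x = [math.log10(v) for v in specimens]
    y = [math.log10(v) for v in read_share]
    n = len(x)
    if n < 3:
        raise ValueError("adjusted R-squared needs at least 3 data points")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each contain at least two distinct values")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    r2 = 1.0 - ss_res / syy
    # Adjustment for a single predictor: (n - 1) / (n - 1 - 1)
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - 2)
    return slope, intercept, adj_r2
```

Because the adjustment penalises small sample sizes, a taxon-poor sample can show a noticeably lower adjusted R-squared than its plain R-squared, which is worth keeping in mind when comparing values between samples.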

Additional Information

Competing Interests

The authors declare that they have no competing interests.

Author Contributions

Vasco Elbrecht conceived and designed the experiments, analyzed the data, wrote the paper, prepared figures and/or tables, reviewed drafts of the paper.

Edith Vamos performed the experiments, wrote the paper, reviewed drafts of the paper.

Kristian Meissner conceived and designed the experiments, contributed reagents/materials/analysis tools, wrote the paper, reviewed drafts of the paper.

Jukka Aroviita analyzed the data, contributed reagents/materials/analysis tools, wrote the paper, prepared figures and/or tables, reviewed drafts of the paper.

Florian Leese conceived and designed the experiments, wrote the paper, reviewed drafts of the paper.

DNA Deposition

The following information was supplied regarding the deposition of DNA sequences:

NCBI - SRA: SRR4112287 (HiSeq raw data)

Data Deposition

The following information was supplied regarding data availability:

Metabarcoding pipeline is available on GitHub:

https://github.com/VascoElbrecht/JAMP

Funding

FL and VE were supported by a grant of the Kurt Eberhard Bode Foundation to FL. KM was supported by the Academy of Finland project DETECT (289104). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Supplemental Information

Table S1. Sample site coordinates and calculated assessment indices

Table S2. Tagging combination used in the metabarcoding library

Table S3. Overview of morphotaxa identified based on morphology across samples

Table S4. OTU table
