Validation and development of COI metabarcoding primers for freshwater macroinvertebrate bioassessment
- Subject Areas
- Biodiversity, Bioinformatics, Ecology, Genetics, Molecular Biology
- Keywords
- DNA Barcoding, primer evaluation, in silico PCR, ecosystem assessment, primer bias, Primer development
- Copyright
- © 2017 Elbrecht et al.
- Licence
- This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Preprints) and either DOI or URL of the article must be cited.
- Cite this article
- Elbrecht et al. 2017. Validation and development of COI metabarcoding primers for freshwater macroinvertebrate bioassessment. PeerJ Preprints 5:e2044v5 https://doi.org/10.7287/peerj.preprints.2044v5
Abstract
A central challenge in the present era of biodiversity loss is to assess and manage human impacts on freshwater ecosystems. Macroinvertebrates are an important group for bioassessment because many taxa respond specifically to environmental conditions. However, generating accurate macroinvertebrate inventories based on larval morphology is difficult and error-prone. DNA metabarcoding offers new opportunities here: several case studies have demonstrated its potential to accurately identify invertebrates in bulk samples to species level. However, DNA-based identification is often limited by primer bias, which can leave taxa in a sample undetected. The success of DNA metabarcoding as an emerging bioassessment technique therefore relies critically on careful primer evaluation.
We used the R package PrimerMiner to obtain and process cytochrome c oxidase I (COI) sequence data for the 15 freshwater invertebrate groups most relevant to stream assessment worldwide. Using these sequence alignments, we developed four primer combinations optimized for freshwater macrozoobenthos. All primers were evaluated by sequencing ten mock community samples, each consisting of 52 freshwater invertebrate taxa. Additionally, the developed primers and popular metabarcoding primers from the literature were tested in silico against these 15 invertebrate groups.
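To illustrate what an in silico primer test checks, the following minimal Python sketch counts mismatches between a degenerate primer and an aligned target binding site using IUPAC ambiguity codes. It is not PrimerMiner's scoring scheme; the function names and the example primer `GCNCCNGAYAT` are hypothetical, and the sketch assumes that primer and binding site are already aligned, of equal length and on the same strand.

```python
from __future__ import annotations

# IUPAC ambiguity codes and the nucleotides each one matches.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def mismatch_positions(primer: str, site: str) -> list[int]:
    """Mismatch positions counted from the primer's 3' end (1 = terminal base).

    Hypothetical helper: assumes `primer` and `site` are aligned, equal in
    length and written 5'->3' on the same strand. Gaps ('-') or other
    characters in `site` count as mismatches.
    """
    if len(primer) != len(site):
        raise ValueError("primer and binding site must have equal length")
    n = len(primer)
    return [
        n - i
        for i, (p, s) in enumerate(zip(primer.upper(), site.upper()))
        if s not in IUPAC[p]
    ]


def count_mismatches(primer: str, site: str) -> int:
    """Number of positions where the target base is not covered by the primer code."""
    return len(mismatch_positions(primer, site))


if __name__ == "__main__":
    primer = "GCNCCNGAYAT"  # hypothetical degenerate primer
    for site in ("GCACCTGATAT", "GCACCTGAAAT"):
        print(site, count_mismatches(primer, site), mismatch_positions(primer, site))
```

Positions are reported from the 3' end because mismatches close to the primer's 3' terminus are generally more detrimental to amplification than those near the 5' end.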
The developed primers varied in amplification efficiency and the number of detected taxa, yet all detected more taxa than standard ‘Folmer’ barcoding primers. Two new primer combinations showed more consistent amplification than a previously tested ribosomal marker (16S) and detected all 42 insect taxa present in the mock community samples. In silico evaluation revealed critical design flaws in some commonly used primers from the literature.
We demonstrate a reliable strategy for developing optimized primers with the tool PrimerMiner. The developed primers detected almost all taxa present in the mock samples, and we argue that high base degeneracy is necessary to reduce primer bias, as confirmed by both experimental results and in silico primer evaluation. We further demonstrate that some primers currently used in metabarcoding studies may not be suitable for amplifying freshwater macroinvertebrates. Therefore, careful primer evaluation and more region- or ecosystem-specific primers are needed before DNA metabarcoding can be used for routine bioassessment of freshwater ecosystems.
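Base degeneracy can be quantified as the number of distinct oligonucleotide sequences a degenerate primer encodes, i.e. the product of the number of bases each IUPAC code stands for. The following is a minimal Python sketch; the degenerate example primer is hypothetical, while `GGTCAACAAATCATAAAGATATTGG` is the Folmer forward primer LCO1490, which contains no ambiguity codes.

```python
from math import prod

# Number of nucleotides represented by each IUPAC code.
CODE_SIZE = {
    "A": 1, "C": 1, "G": 1, "T": 1,
    "R": 2, "Y": 2, "S": 2, "W": 2, "K": 2, "M": 2,
    "B": 3, "D": 3, "H": 3, "V": 3, "N": 4,
}


def degeneracy(primer: str) -> int:
    """Number of distinct sequences encoded by a (possibly degenerate) primer."""
    return prod(CODE_SIZE[base] for base in primer.upper())


print(degeneracy("GGTCAACAAATCATAAAGATATTGG"))  # LCO1490: 1
print(degeneracy("GCNCCNGAYAT"))  # hypothetical primer: 4 * 4 * 2 = 32
```

A higher degeneracy lets a single primer mix match more target sequence variants, at the cost of a lower effective concentration of each individual oligonucleotide in the mix.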
Author Comment
Updated revision with language and grammar improvements. Soon to be published in Frontiers in Environmental Science.
Supplemental Information
Fig. S1: Overview of the number of spots obtained per sample and of sequences lost during bioinformatic processing
A: Number of paired-end (PE) reads obtained for each sample, and the proportions of PhiX reads and of COI sequences without matching tags. Numbers above the bars give the proportion of reads in percent. B: Number of sequences excluded in each major bioinformatic processing step for each sample. The size of the amplified region (not the fragment size) is given in boxes below for each primer combination.
Fig. S2: Overview of the base composition of the COI Folmer region for the 15 most important freshwater groups
The plot of base composition per group was generated with PrimerMiner and used to develop the BF / BR primers and to manually evaluate other common metabarcoding primers. Sequences for the Folmer primer binding regions (opaque colours) were downloaded in February 2015 and clipped by 26 bp, as many clusters were affected by sequences that still contained the primer sequences. Sequences from the Folmer region were downloaded in April 2015 and not trimmed, as only the region amplified by the primers was used.
Fig. S3: Overview of used fusion sequencing primers
They contain standard Illumina flow cell binding sites and primer binding sites, as well as inline tags to distinguish between multiplexed samples. They can be used to amplify the target COI barcoding region, and PCR products can be sequenced directly after PCR cleanup. We recommend the parallel sequencing strategy outlined in Elbrecht & Leese 2015, which maximises sequence diversity during sequencing and doubles the number of samples that can be tagged (up to 288). See Fig. S4 for ideal tagging combinations.
Fig. S4: Matrix of similarities between all possible primer combinations using 5 bp inline tags
For some primers, several tag sequences are shown because of nucleotide degeneracy in the tag sequences. Primer combinations that are identical at 4 sites (orange background) should be avoided for tagging, as a single read error could lead to mistagging (blue squares). Because we use a parallel sequencing approach, combinations such as BF22+BR11 should also be avoided, as both forward and reverse reads can occur in sequencing read 1 or read 2, possibly leading to mistagging. With the presented primer sets, a total of 276 samples can be securely tagged (excluding the problematic primer sequences in blue squares). Number of good tagging combinations for each primer set (tagging possibilities are doubled when using parallel sequencing; see Elbrecht & Leese 2015).
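The mistagging risk described above can be screened by computing pairwise Hamming distances between tags: two 5 bp tags that are identical at 4 sites differ at only one position, so a single read error can convert one into the other. The Python sketch below flags such pairs. It assumes "identical at 4 sites" refers to a single 5 bp tag, ignores degenerate tag bases and the read 1 / read 2 swapping issue of parallel sequencing, and uses hypothetical tag sequences.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length tags differ."""
    if len(a) != len(b):
        raise ValueError("tags must have equal length")
    return sum(x != y for x, y in zip(a, b))


def risky_tag_pairs(tags, min_distance=2):
    """Tag pairs that differ at fewer than `min_distance` positions.

    With 5 bp tags, pairs identical at 4 sites have distance 1, so a single
    read error can turn one tag into the other.
    """
    return [
        (a, b) for a, b in combinations(tags, 2) if hamming(a, b) < min_distance
    ]


tags = ["AAGCT", "AAGCA", "CTTGA", "GATCC"]  # hypothetical 5 bp inline tags
print(risky_tag_pairs(tags))  # [('AAGCT', 'AAGCA')]
```

A minimum distance of 2 only ensures that a single error cannot silently produce another valid tag; a minimum distance of 3 would additionally allow single errors to be corrected.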
Fig. S5: Overview of products obtained after one-step PCR and magnetic bead cleanup
Fragment concentrations were measured using the Qubit HS kit.
Fig. S6: Overview of “missing” base pairs at the primer 3' end for sequencing datasets from this study as well as Elbrecht & Leese 2015 and Elbrecht et al. 2016
After library demultiplexing, a random subset of 5,000 reads was extracted from each sample, and the sequences were aligned in Geneious 8 using MAFFT. The primer sequence plus 10 bp (5 bp for 16S) was extracted from each alignment, and the mean deviation from the expected primer length was plotted for each sample. The proportion of sequences with the expected length is given to the right of each plot. Error bars show the standard deviation (N = 10).
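The per-sample summary statistics described above can be sketched in Python as follows. This is an illustration under assumptions, not the authors' pipeline: the function name and all lengths are hypothetical, and it assumes the sample standard deviation (`statistics.stdev`) is used for the error bars.

```python
from statistics import mean, stdev


def primer_length_summary(lengths, expected_length):
    """Mean deviation from the expected length and proportion of exact-length reads.

    `lengths` are the observed lengths of the extracted primer (+ flanking)
    region for one sample; negative deviations indicate missing bases.
    """
    deviations = [n - expected_length for n in lengths]
    proportion_exact = sum(d == 0 for d in deviations) / len(deviations)
    return mean(deviations), proportion_exact


# Hypothetical data: expected length 30 bp (20 bp primer + 10 bp flank).
mean_dev, prop_exact = primer_length_summary([30, 30, 29, 28, 30], 30)
print(round(mean_dev, 2), prop_exact)

# Spread across replicate samples (hypothetical per-sample means;
# sample standard deviation assumed).
per_sample_means = [-0.6, -0.4, -0.5]
print(round(mean(per_sample_means), 2), round(stdev(per_sample_means), 2))
```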
Fig. S7: Length distribution and abundance of individual sequences assigned to each OTU for all primer combinations
The percentage value indicates the proportion of sequences which are at least 10 bp longer or shorter than the expected amplicon length.
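The percentage can be computed as an abundance-weighted share of reads whose length deviates by at least 10 bp from the expected amplicon length. The Python sketch below assumes dereplicated sequences summarised as a mapping from length to read count; the function name, the 400 bp expected length and the counts are hypothetical.

```python
def proportion_off_length(length_counts, expected_length, tolerance=10):
    """Abundance-weighted share of reads at least `tolerance` bp off the expected length.

    `length_counts` maps sequence length (bp) to the number of reads of that length.
    """
    total = sum(length_counts.values())
    off = sum(
        count
        for length, count in length_counts.items()
        if abs(length - expected_length) >= tolerance
    )
    return off / total


# Hypothetical read-length counts for one OTU, expected amplicon length 400 bp.
counts = {400: 90, 399: 5, 385: 3, 412: 2}
print(f"{proportion_off_length(counts, 400):.1%}")  # 5.0%
```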
Fig. S8: Evaluation of primer combinations
Preliminary data: error penalties are subject to change, and the type of mismatch is not yet taken into account. Primer pairs with a combined penalty score below 100 were considered working (green), and pairs above that were considered to show no or only poor amplification (red). If primer binding sites in the sequences contained terminal gaps, they were counted as missing data (gray).
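The classification rule can be written as a small Python function. This sketch takes per-primer penalty scores as given and does not reproduce how PrimerMiner computes them; the function name, the use of `None` for binding sites with terminal gaps, and summing forward and reverse penalties into the combined score are assumptions. A combined score of exactly 100 is treated as not working here, since the caption leaves that case open.

```python
from typing import Optional


def classify_primer_pair(
    forward_penalty: Optional[float],
    reverse_penalty: Optional[float],
    threshold: float = 100.0,
) -> str:
    """Classify a primer pair for one sequence group.

    None marks a binding site with terminal gaps (missing data). The combined
    score is assumed to be the sum of the forward and reverse penalties.
    """
    if forward_penalty is None or reverse_penalty is None:
        return "missing"  # gray
    combined = forward_penalty + reverse_penalty
    return "working" if combined < threshold else "poor"  # green / red


print(classify_primer_pair(30, 40), classify_primer_pair(60, 50), classify_primer_pair(None, 10))
```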
Tab. S1: Primers evaluated in silico
Metabarcoding and barcoding primers from the literature.
Tab. S2: The 15 taxa (and subgroups, e.g. excluding terrestrial Coleoptera) downloaded for development of the BF / BR primers with PrimerMiner
Tab. S3: OTU tables and taxonomic assignments
These tables were used to generate Figure 3
Scripts S1: Metabarcoding pipeline scripts
Scripts for analysing the raw sequence data (including scripts for generating figures); requires Usearch and/or Vsearch.
Scripts S2: Scripts to extract haplotype data and haplotype sequences
For each replicate, 52 different taxa were extracted in bulk. Haplotypes were extracted for each of the 10 samples and assembled; see the file 160609_Haplotypes.fasta in this folder.