Development and validation of DNA metabarcoding COI primers for aquatic invertebrates using the R package "PrimerMiner"

Vasco Elbrecht; Florian Leese

doi:10.7287/peerj.preprints.2044v1

Vasco Elbrecht^1,2, Florian Leese^2,3

1 Dept. Animal Ecology, Evolution and Biodiversity, Ruhr University Bochum, Bochum, NRW, Germany

2 Aquatic Ecosystem Research, University of Duisburg-Essen, Essen, NRW, Germany

3 Centre for Water and Environmental Research (ZWU) Essen, University of Duisburg-Essen, Essen, NRW, Germany

Published: 2016-05-15
Accepted: 2016-05-15

Subject Areas: Biodiversity, Bioinformatics, Ecology, Genetics, Molecular Biology
Keywords: Primer Development, DNA metabarcoding, ecosystem assessment, data mining, primer bias, in silico PCR

Copyright: © 2016 Elbrecht et al.
Licence: This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Preprints) and either DOI or URL of the article must be cited.

Cite this article: Elbrecht V, Leese F. 2016. Development and validation of DNA metabarcoding COI primers for aquatic invertebrates using the R package "PrimerMiner". PeerJ Preprints 4:e2044v1 https://doi.org/10.7287/peerj.preprints.2044v1

Abstract

1) DNA metabarcoding is a powerful tool to assess biodiversity by amplifying and sequencing a standardized gene marker region. However, commonly used barcoding genes, such as the cytochrome c oxidase subunit I (COI) region for animals, are highly variable. Thus, different taxa in the communities under study are often not amplified equally well, and some might even remain undetected due to primer bias. To reduce these problems, optimized region- and/or ecosystem-specific metabarcoding primers are necessary.

2) We developed the R package PrimerMiner, which batch downloads DNA barcode gene sequences from the BOLD and NCBI databases for specified target taxa and then applies sequence clustering to reduce biases introduced by differing numbers of available sequences per species. To design primers targeted at freshwater invertebrates, we downloaded COI data for the 15 most important invertebrate groups relevant for stream ecosystem assessment. Four primer sets with high base degeneracy were developed, and their performance was tested by sequencing ten mock community samples, each consisting of 52 freshwater invertebrate taxa. Additionally, we evaluated the developed primers against other metabarcoding primers in silico using PrimerMiner.

3) Amplification and sequencing were successful for all ten mock community samples with the four different primer combinations. The developed primers varied in amplification efficiency and number of taxa detected, but all primer sets detected more taxa than the standard Folmer barcoding primers. Additionally, the BF / BR primers amplified taxa very consistently; the BF2+BR2 and BF2+BR1 primer combinations did so even more consistently than a previously tested ribosomal marker (16S). Except for the BF1+BR1 primer combination, all BF / BR primers detected all 42 insect taxa present in the mock samples. In silico evaluation of the developed primers showed that they are also likely to work very well on other, non-aquatic invertebrate samples.
4) With PrimerMiner, we provide a useful tool to obtain relevant sequence data for targeted primer development and evaluation. Our sequence datasets generated with the newly developed metabarcoding primers demonstrate that the design of optimized primers with high base degeneracy is superior to classical markers and enables us to detect almost 100% of animal taxa present in a sample using the standard COI barcoding gene. Therefore, the PrimerMiner package and primers developed using this tool are useful beyond the assessment of biodiversity in aquatic ecosystems.
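High base degeneracy means that a single primer string encodes many concrete oligonucleotides. The variants can be counted from the standard IUPAC ambiguity codes. The minimal Python sketch below is not part of PrimerMiner, which is an R package. The primer `ACWGGRTGNAC` and the function names `degeneracy` and `expand` are hypothetical and are used only for illustration.

```python
from itertools import product

# Standard IUPAC nucleotide codes mapped to the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer: str) -> int:
    """Number of distinct oligonucleotides encoded by a degenerate primer."""
    count = 1
    for base in primer.upper():
        count *= len(IUPAC[base])  # raises KeyError for non-IUPAC characters
    return count


def expand(primer: str) -> list[str]:
    """List every concrete oligonucleotide encoded by a degenerate primer."""
    choices = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*choices)]


example = "ACWGGRTGNAC"  # hypothetical primer: W = 2, R = 2, N = 4 options
print(degeneracy(example))  # 16
print(expand("AR"))  # ['AA', 'AG']
```

Degeneracy multiplies across positions, so each additional ambiguity code multiplies the number of oligonucleotides in the primer mix.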

Author Comment

This manuscript is currently being prepared for submission to Methods in Ecology and Evolution. It is an early draft and needs further improvement; in particular, the in silico primer evaluations are preliminary and will change slightly in the final version. Please provide feedback and criticism! You can download the manuscript file as a Word document and provide comments and corrections with track changes if you like. We appreciate any constructive criticism! - Vasco & Florian

Supplemental Information

Overview of obtained spots per sample and number of sequences lost in bioinformatic processing

A: Number of PE reads obtained for each sample, and proportion of PhiX and COI sequences without matching tags. Numbers above bars give the proportion of reads as a percentage. B: Number of sequences excluded in each major bioinformatic processing step for each sample. The size of the amplified region (not the fragment size) is given in boxes below for each primer combination.

DOI: 10.7287/peerj.preprints.2044v1/supp-1

Overview of the base composition of the COI Folmer region for the 15 most important freshwater groups

The plot of group base composition was generated with PrimerMiner and used to develop the BF / BR primers and to manually evaluate other common metabarcoding primers. The sequences for the Folmer binding regions (opaque colours) were downloaded in February 2015 and clipped by 26 bp, as many clusters were affected by sequences that still contained the primer sequences. Sequences from the Folmer region were downloaded in April 2015 and not trimmed, as only the region amplified by the primers was used.

DOI: 10.7287/peerj.preprints.2044v1/supp-2
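A base composition plot of this kind summarises how often each nucleotide occurs at each alignment position within a group. The minimal sketch below assumes that the sequences of one group are already aligned to equal length. It counts only A, C, G and T and ignores gaps ('-') and ambiguity codes. The function name `base_composition` and the example sequences are hypothetical, and the sketch is not PrimerMiner's own implementation.

```python
from collections import Counter


def base_composition(aligned: list[str]) -> list[dict[str, float]]:
    """Per-position frequencies of A, C, G and T in a set of aligned sequences.

    Gaps ('-') and ambiguity codes are not counted; a position without any
    A, C, G or T gets frequencies of 0.0.
    """
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to equal length")
    profile = []
    for i in range(length):
        counts = Counter(seq[i].upper() for seq in aligned)
        total = sum(counts[b] for b in "ACGT")
        profile.append({b: counts[b] / total if total else 0.0 for b in "ACGT"})
    return profile


group = ["ACGT", "ACGA", "AC-A"]  # hypothetical aligned sequences of one group
for position, freqs in enumerate(base_composition(group), start=1):
    print(position, freqs)
```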

Overview of the fusion sequencing primers used

The primers contain standard Illumina flow cell binding sites and primer binding sites, as well as inline tags to distinguish between multiplexed samples. They can be used to amplify the target COI barcoding region, and PCR products can be sequenced directly after PCR clean-up. We recommend using the parallel sequencing strategy outlined in Elbrecht & Leese 2015, which maximises sequence diversity during sequencing and doubles the number of samples that can be tagged (up to 288). See Figure S4 for ideal tagging combinations.

DOI: 10.7287/peerj.preprints.2044v1/supp-3

Matrix of similarities between all possible primer combinations using 5 bp inline tags

For some primers, several tag sequences are shown due to nucleotide degeneracy in the tag sequences. Primer combinations that are similar at 4 sites (orange background) should be avoided for tagging, as a single read error could lead to mistagging (blue squares). Because we use a parallel sequencing approach, combinations such as BF22+BR11 should also be avoided, as both forward and reverse reads could occur together in sequencing read 1 or 2, possibly leading to mistagging. With the presented primer sets, a total of 276 samples can be securely tagged (excluding the problematic primer sequences in blue squares). The number of good tagging combinations is given for each primer set (tagging possibilities are doubled when using parallel sequencing, see Elbrecht & Leese 2015).

DOI: 10.7287/peerj.preprints.2044v1/supp-4
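The similarity matrix rests on a simple count. With 5 bp inline tags, two tags that are identical at 4 sites differ by a single base, so one read error can turn one tag into the other. The minimal sketch below implements that check and assumes plain (non-degenerate) tags of equal length. The function names and the tag names and sequences are hypothetical and are not the tags used in this study.

```python
from itertools import combinations


def similarity(tag_a: str, tag_b: str) -> int:
    """Number of positions at which two equal-length tags carry the same base."""
    if len(tag_a) != len(tag_b):
        raise ValueError("tags must have equal length")
    return sum(a == b for a, b in zip(tag_a, tag_b))


def risky_pairs(tags: dict[str, str], max_shared: int = 3) -> list[tuple[str, str, int]]:
    """Tag pairs that share more than `max_shared` identical positions.

    For 5 bp tags, max_shared=3 flags pairs identical at 4 or 5 sites, where a
    single read error could convert one tag into the other.
    """
    flagged = []
    for (name_a, seq_a), (name_b, seq_b) in combinations(tags.items(), 2):
        shared = similarity(seq_a, seq_b)
        if shared > max_shared:
            flagged.append((name_a, name_b, shared))
    return flagged


tags = {"tag1": "AGTCA", "tag2": "AGTCT", "tag3": "CTAGG"}  # hypothetical tags
print(risky_pairs(tags))  # [('tag1', 'tag2', 4)]
```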

Overview of obtained products after one-step PCR and magnetic bead clean-up

Fragment concentrations were measured using the Qubit HS kit.

DOI: 10.7287/peerj.preprints.2044v1/supp-5

Overview of “missing” base pairs at the primer 3' end for sequencing datasets from this study as well as Elbrecht & Leese 2015 and Elbrecht et al. 2016

After library demultiplexing, a random subset of 5,000 reads was extracted from each sample and the sequences were aligned in Geneious 8 using MAFFT. The primer sequence + 10 bp (5 bp for 16S) was extracted from each alignment, and the mean deviation from the expected primer length was plotted for each sample. The proportion of sequences with the expected length is given for each sample to the right of each plot. The error bars show the standard deviation (N=10).

DOI: 10.7287/peerj.preprints.2044v1/supp-6
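Each per-sample value in this figure comes from two simple quantities. The first is the mean deviation of the extracted region from its expected length. The second is the share of reads with exactly the expected length. The minimal sketch below assumes that the length of the extracted region (primer + 10 bp) is already known for each read. Negative deviations indicate shorter regions, such as missing bases. The function `primer_end_summary`, the expected length of 30 bp and the read lengths are hypothetical.

```python
from statistics import mean


def primer_end_summary(observed: list[int], expected: int) -> tuple[float, float]:
    """Return the mean deviation (bp) from `expected` and the share of reads at exactly that length.

    Negative deviations indicate regions shorter than expected, e.g. missing bases.
    """
    deviations = [length - expected for length in observed]
    share_exact = sum(d == 0 for d in deviations) / len(deviations)
    return mean(deviations), share_exact


# Hypothetical lengths of the extracted region (primer + 10 bp) in two samples.
samples = {
    "sample_1": [30, 30, 29, 30],
    "sample_2": [30, 28, 30, 30],
}
summaries = {name: primer_end_summary(lengths, expected=30) for name, lengths in samples.items()}
print(summaries)  # {'sample_1': (-0.25, 0.75), 'sample_2': (-0.5, 0.75)}
```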

Length distribution and abundance of individual sequences assigned to each OTU for all primer combinations

The percentage value indicates the proportion of sequences which are at least 10 bp longer or shorter than the expected amplicon length.

DOI: 10.7287/peerj.preprints.2044v1/supp-7
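The percentage in this figure is the share of sequences whose length differs from the expected amplicon length by at least 10 bp. The minimal sketch below assumes that read counts per sequence length are available for an OTU. It also weights the share by read abundance, which the caption does not specify. The function `off_length_share` and the example values are hypothetical.

```python
def off_length_share(length_counts: dict[int, int], expected: int, tolerance: int = 10) -> float:
    """Share of reads whose length differs from `expected` by at least `tolerance` bp.

    `length_counts` maps a sequence length to the number of reads with that length.
    """
    total = sum(length_counts.values())
    if total == 0:
        raise ValueError("no reads given")
    off = sum(n for length, n in length_counts.items() if abs(length - expected) >= tolerance)
    return off / total


# Hypothetical OTU with an expected amplicon length of 300 bp.
otu = {300: 90, 299: 5, 290: 3, 315: 2}
print(f"{off_length_share(otu, expected=300):.1%}")  # 5.0%
```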

Evaluation of primer combinations

Preliminary data: error penalties are subject to change, and the kind of mismatch is not yet implemented! Primer pairs with a combined penalty score below 100 were considered working (green), while pairs above that were considered to give no or only poor amplification (red). If primer binding sites in the sequences contained terminal gaps, they were counted as missing data (gray).

DOI: 10.7287/peerj.preprints.2044v1/supp-8
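The colour coding amounts to a three-way rule for each primer pair and reference sequence. The sketch below makes two assumptions: that the combined penalty is the sum of the forward and reverse primer penalties, and that a score of exactly 100 counts as poor. The penalty calculation itself is still preliminary and is not reproduced here. `classify_pair` is a hypothetical name, and `None` stands for a binding site with terminal gaps.

```python
from typing import Optional


def classify_pair(forward_penalty: Optional[float], reverse_penalty: Optional[float],
                  threshold: float = 100.0) -> str:
    """Classify one primer pair against one reference sequence.

    None marks a primer binding site with terminal gaps, treated as missing data.
    """
    if forward_penalty is None or reverse_penalty is None:
        return "missing"  # gray
    combined = forward_penalty + reverse_penalty  # assumption: penalties are summed
    return "working" if combined < threshold else "poor"  # green / red


# Hypothetical (forward, reverse) penalties for three reference sequences.
scores = [(20, 30), (80, 40), (None, 10)]
print([classify_pair(f, r) for f, r in scores])  # ['working', 'poor', 'missing']
```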

15 taxa (and subgroups, e.g. excluding terrestrial Coleoptera) downloaded for development of the BF / BR primers with PrimerMiner

DOI: 10.7287/peerj.preprints.2044v1/supp-9

Manuscript file as a word document

Feel free to use this file with track changes if you would like to provide feedback on this manuscript. We appreciate your feedback and support! =)

DOI: 10.7287/peerj.preprints.2044v1/supp-10
