"PeerJ Preprints" is a venue for early communication or feedback before peer review. Data may be preliminary.

Supplemental Information

Fig. S1: Overview of obtained spots per sample and amount of sequences lost in bioinformatic processing

A: Number of PE reads obtained for each sample, and proportion of PhiX and COI sequences without matching tags. Numbers above bars give the proportion of reads in percentage. B: Amount of sequences excluded in each major bioinformatics processing step for each sample. Size of the amplified region (not the fragment size) is given below in boxes for each primer combination.

DOI: 10.7287/peerj.preprints.2044v3/supp-1

Fig. S2: Overview of the base composition of the COI Folmer region for the 15 most important freshwater groups

The plot of group base composition was generated with PrimerMiner and used to develop the BF / BR primers and to manually evaluate other common metabarcoding primers. The sequences for the Folmer binding regions (opaque colours) were downloaded in February 2015 and had 26 bp clipping applied, as many clusters were affected by sequences that still contained the primer sequences. Sequences from the Folmer region were downloaded in April 2015 and not trimmed, as only the region amplified by the primers was used.

DOI: 10.7287/peerj.preprints.2044v3/supp-2

Fig. S3: Overview of used fusion sequencing primers

The primers contain standard Illumina flow cell binding and primer binding sites, as well as inline tags to distinguish between multiplexed samples. They can be used to amplify the target COI barcoding region, and PCR products can be sequenced directly after PCR clean-up. We recommend the parallel sequencing strategy outlined in Elbrecht & Leese 2015, which maximises sequence diversity for sequencing and doubles the number of samples that can be tagged (up to 288). See Figure S4 for ideal tagging combinations.

DOI: 10.7287/peerj.preprints.2044v3/supp-3

Fig. S4: Matrix of similarities between all possible primer combinations using 5 bp inline tags

For some primers, several tagging sequences are shown because of nucleotide degeneracy in the tag sequences. Primer combinations that are similar at 4 sites (orange background) should be avoided for tagging, as a single read error could lead to mistagging (blue squares). Because we use a parallel sequencing approach, combinations like BF22+BR11 should also be avoided, as both forward and reverse reads could occur together in sequencing read 1 or 2, possibly leading to mistagging. With the presented primer sets, a total of 276 samples can be securely tagged (excluding the problematic primer sequences in blue squares). The number of good tagging combinations is given for each primer set (tagging possibilities are doubled when using parallel sequencing, see Elbrecht & Leese 2015).
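The similarity check described for Fig. S4 can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used to build the figure: the function names and the example tags are hypothetical, the tags are invented rather than taken from Fig. S3, and degenerate bases are not expanded.

```python
from itertools import combinations


def shared_positions(tag_a: str, tag_b: str) -> int:
    """Count positions at which two equal-length tags carry the same base."""
    if len(tag_a) != len(tag_b):
        raise ValueError("tags must have equal length")
    return sum(a == b for a, b in zip(tag_a, tag_b))


def risky_pairs(tags: dict, min_shared: int = 4) -> list:
    """Return name pairs whose tags agree at min_shared or more positions.

    For 5 bp tags, agreement at 4 positions means a single read error
    can turn one tag into the other.
    """
    return [
        (name_a, name_b)
        for (name_a, seq_a), (name_b, seq_b) in combinations(sorted(tags.items()), 2)
        if shared_positions(seq_a, seq_b) >= min_shared
    ]


# Hypothetical 5 bp tags, invented for illustration only.
example_tags = {"tagA": "ACGTA", "tagB": "ACGTC", "tagC": "TGCAT"}
print(risky_pairs(example_tags))
```

Here tagA and tagB differ at only one position, so the pair is flagged, while tagC shares no position with either and is left alone.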

DOI: 10.7287/peerj.preprints.2044v3/supp-4

Fig. S5: Overview of obtained products after one-step PCR and magnetic bead clean-up

Fragment concentrations were measured using the Qubit HS kit.

DOI: 10.7287/peerj.preprints.2044v3/supp-5

Fig. S6: Overview of “missing” base pairs at the primer 3' end for sequencing datasets from this study as well as Elbrecht & Leese 2015 and Elbrecht et al. 2016

After library demultiplexing, a random subset of 5,000 reads was extracted from each sample and the sequences were aligned in Geneious 8 using MAFFT. The primer sequence + 10 bp (5 bp for 16S) was extracted from each alignment, and the mean deviation from the expected primer length was plotted for each sample. The proportion of sequences with the expected length is given for each sample on the right of each plot. The error bars show the standard deviation (N=10).
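The two per-sample statistics named in this caption (mean deviation from the expected length, and proportion of sequences with the expected length) could be computed roughly as follows. This is a sketch under assumptions: the function name and the input lengths are hypothetical, the alignment and extraction steps done in Geneious are not reproduced, and the standard deviation shown as error bars is not computed here.

```python
from statistics import mean


def length_summary(observed_lengths, expected_length):
    """Return (mean deviation, proportion with expected length) for one sample."""
    if not observed_lengths:
        raise ValueError("no sequences supplied")
    deviations = [length - expected_length for length in observed_lengths]
    proportion_expected = sum(d == 0 for d in deviations) / len(deviations)
    return mean(deviations), proportion_expected


# Hypothetical extracted lengths for one sample, expected length 30 bp.
mean_dev, prop_ok = length_summary([30, 30, 29, 28], expected_length=30)
print(mean_dev, prop_ok)
```

A negative mean deviation indicates that, on average, bases are missing relative to the expected primer length.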

DOI: 10.7287/peerj.preprints.2044v3/supp-6

Fig. S7: Length distribution and abundance of individual sequences assigned to each OTU for all primer combinations

The percentage value indicates the proportion of sequences which are at least 10 bp longer or shorter than the expected amplicon length.
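As a rough illustration of the percentage value, the sketch below weights each unique sequence by its abundance and reports the share that is at least 10 bp off the expected amplicon length. The function name, the (length, abundance) input layout, and the example numbers are assumptions made for this example, not part of the published scripts.

```python
def off_length_fraction(records, expected_length, tolerance=10):
    """Abundance-weighted fraction of sequences at least `tolerance` bp off target.

    `records` is an iterable of (sequence_length, abundance) pairs.
    """
    records = list(records)
    total = sum(abundance for _, abundance in records)
    if total == 0:
        raise ValueError("total abundance is zero")
    off_target = sum(
        abundance
        for length, abundance in records
        if abs(length - expected_length) >= tolerance
    )
    return off_target / total


# Hypothetical data: expected amplicon length 217 bp.
example = [(217, 50), (217, 30), (207, 10), (230, 10)]
print(f"{100 * off_length_fraction(example, 217):.1f} %")
```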

DOI: 10.7287/peerj.preprints.2044v3/supp-7

Fig. S8: Evaluation of primer combinations

Preliminary data: error penalties are subject to change, and the kind of mismatch is not yet implemented. Primer pairs with a combined penalty score below 100 were considered working (= green), while pairs above that were considered to show no or only poor amplification (= red). If primer binding sites in the sequences contained terminal gaps, they were counted as missing data (= gray).
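The colour coding can be expressed as a small classification rule. The sketch below is not the PrimerMiner implementation. It assumes that the combined score is the sum of the forward and reverse primer penalties, that a score of exactly 100 counts as red, and that a binding site with terminal gaps is represented by `None`; all names are hypothetical.

```python
from typing import Optional


def classify_pair(
    fw_penalty: Optional[float],
    rv_penalty: Optional[float],
    threshold: float = 100.0,
) -> str:
    """Classify a primer pair for one sequence as green, red, or gray."""
    # Assumption: None marks a binding site with terminal gaps (missing data).
    if fw_penalty is None or rv_penalty is None:
        return "gray"
    # Assumption: the combined score is the sum of both primer penalties.
    combined = fw_penalty + rv_penalty
    return "green" if combined < threshold else "red"


print(classify_pair(20.0, 35.5))
print(classify_pair(80.0, 60.0))
print(classify_pair(None, 10.0))
```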

DOI: 10.7287/peerj.preprints.2044v3/supp-8

Tab. S1: Primers evaluated in silico

Metabarcoding and barcoding primers from the literature.

DOI: 10.7287/peerj.preprints.2044v3/supp-9

Tab. S2: 15 taxa (and subgroups, e.g. excluding terrestrial Coleoptera) downloaded for development of the BF / BR primers with PrimerMiner

DOI: 10.7287/peerj.preprints.2044v3/supp-10

Tab. S3: OTU tables and taxonomic assignments

These tables were used to generate Figure 3

DOI: 10.7287/peerj.preprints.2044v3/supp-11

Scripts S1: Metabarcoding pipeline scripts

Scripts to analyse the raw sequence data (includes scripts for making figures etc.; requires Usearch and/or Vsearch)

DOI: 10.7287/peerj.preprints.2044v3/supp-12

Scripts S2: Scripts to extract haplotype data and haplotype sequences

For each replicate 52 different taxa were extracted in bulk. Haplotypes were extracted for each of the 10 samples and assembled: see file 160609_Haplotypes.fasta in this folder.

DOI: 10.7287/peerj.preprints.2044v3/supp-13

Manuscript file as a Word document

Feel free to use this file with track changes if you would like to provide feedback on this manuscript. We appreciate your feedback and support! = )

DOI: 10.7287/peerj.preprints.2044v3/supp-14

Additional Information

Competing Interests

The authors declare that they have no competing interests.

Author Contributions

Vasco Elbrecht conceived and designed the experiments, performed the experiments, analyzed the data, wrote the paper, prepared figures and/or tables, reviewed drafts of the paper.

Florian Leese conceived and designed the experiments, wrote the paper, reviewed drafts of the paper.

DNA Deposition

The following information was supplied regarding the deposition of DNA sequences:

Raw HiSeq sequence data are available in the NCBI SRA archive under the accession number SRX1619153.

Data Deposition

The following information was supplied regarding data availability:

Used scripts are available as supplementary information. The PrimerMiner R package can be downloaded from GitHub

Funding


VE and FL are supported by a grant of the Kurt Eberhard Bode foundation to FL. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
