Fig. S1: Overview of obtained spots per sample and number of sequences lost in bioinformatic processing
A: Number of PE reads obtained for each sample, and proportion of PhiX and COI sequences without matching tags. Numbers above bars give the proportion of reads as a percentage. B: Number of sequences excluded in each major bioinformatic processing step for each sample. The size of the amplified region (not the fragment size) is given below in boxes for each primer combination.
Fig. S2: Overview of the base composition of the COI Folmer region for the 15 most important freshwater groups
The plot of group base composition was generated with PrimerMiner and used to develop the BF / BR primers and to manually evaluate other common metabarcoding primers. The sequences for the Folmer binding regions (opaque colours) were downloaded in February 2015 and had 26 bp clipped, as many clusters were affected by sequences that still contained the primer sequences. Sequences from the Folmer region were downloaded in April 2015 and not trimmed, as only the region amplified by the primers was used.
Fig. S3: Overview of used fusion sequencing primers
They contain standard Illumina flow cell binding and primer binding sites as well as inline tags to distinguish between multiplexed samples. They can be used to amplify the target COI barcoding region, and PCR products can be sequenced directly after PCR cleanup. We recommend using the parallel sequencing strategy outlined in Elbrecht & Leese 2015, which maximises sequence diversity for sequencing and doubles the number of samples that can be tagged (up to 288). See Figure S4 for ideal tagging combinations.
Fig. S4: Matrix of similarities between all possible primer combinations using 5 bp inline tags
For some primers, several tagging sequences are shown, due to nucleotide degeneracy in the tag sequences. Primer combinations which are similar at 4 sites (orange background) should be avoided for tagging, as a single read error could lead to mistagging (blue squares). As we are using a parallel sequencing approach, combinations like BF22+BR11 should also be avoided, as both forward and reverse reads could occur together in sequencing read 1 or 2, possibly leading to mistagging. With the presented primer sets, a total of 276 samples can be securely tagged (excluding the problematic primer sequences in blue squares). Number of good tagging combinations for each primer set (tagging possibilities are doubled when using parallel sequencing, see Elbrecht & Leese 2015).
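The similarity check described above can be sketched as a pairwise Hamming distance comparison. This is a minimal illustration only: the function names and the example tags are hypothetical (they are not the BF / BR tags from Figure S3), and degenerate nucleotides are not handled.

```python
from itertools import combinations


def hamming(a, b):
    """Count the positions at which two equal-length tags differ."""
    if len(a) != len(b):
        raise ValueError("tags must have the same length")
    return sum(x != y for x, y in zip(a, b))


def risky_pairs(tags, min_distance=2):
    """Return (name_a, name_b, distance) for tag pairs closer than min_distance.

    With 5 bp tags, a distance of 1 means the tags are identical at 4 sites,
    so a single read error could turn one tag into the other.
    """
    flagged = []
    for (name_a, tag_a), (name_b, tag_b) in combinations(sorted(tags.items()), 2):
        distance = hamming(tag_a, tag_b)
        if distance < min_distance:
            flagged.append((name_a, name_b, distance))
    return flagged


# Hypothetical 5 bp tags for illustration only.
example_tags = {"tagA": "ACGTA", "tagB": "ACGTC", "tagC": "TTGCA"}
print(risky_pairs(example_tags))
```

Here only tagA and tagB would be flagged, because they differ at a single site.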
Fig. S5: Overview of obtained products after one-step PCR and magnetic bead cleanup
Fragment concentrations were measured using the Qubit HS kit.
Fig. S6: Overview of “missing” base pairs at the primer 3' end for sequencing datasets from this study as well as Elbrecht & Leese 2015 and Elbrecht et al. 2016
After library demultiplexing, a random subset of 5,000 reads was extracted from each sample and the sequences were aligned in Geneious 8 using MAFFT. The primer sequence + 10 bp (5 bp for 16S) was extracted from each alignment, and the mean deviation from the expected primer length was plotted for each sample. The proportion of sequences with the expected length is given for each sample on the right of each plot. The error bars show the standard deviation (N=10).
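The per-sample summary described above could be computed along the following lines. This is a rough sketch, not the procedure used for the figure (which relied on Geneious 8 and MAFFT): it assumes the ungapped length of the primer region has already been measured for each read, and all function names and example values are hypothetical.

```python
import random
from statistics import mean


def subsample(reads, n=5000, seed=0):
    """Draw a random subset of up to n reads (all reads if fewer are available)."""
    reads = list(reads)
    if len(reads) <= n:
        return reads
    return random.Random(seed).sample(reads, n)


def primer_length_summary(observed_lengths, expected_length):
    """Mean deviation from the expected length and share of reads matching it."""
    if not observed_lengths:
        raise ValueError("no reads to summarise")
    deviations = [length - expected_length for length in observed_lengths]
    return {
        "mean_deviation": mean(deviations),
        "proportion_expected": sum(d == 0 for d in deviations) / len(deviations),
    }


# Hypothetical primer-region lengths (bp) for five reads of one sample.
lengths = [36, 36, 35, 36, 34]
summary = primer_length_summary(subsample(lengths), expected_length=36)
print(summary)
```

A negative mean deviation in this sketch corresponds to bases missing from the primer region.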
Fig. S7: Length distribution and abundance of individual sequences assigned to each OTU for all primer combinations
The percentage value indicates the proportion of sequences which are at least 10 bp longer or shorter than the expected amplicon length.
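As a minimal sketch of how such a percentage could be derived, assuming a plain list of sequence lengths and counting each sequence once (no abundance weighting); the function name and example lengths are hypothetical:

```python
def off_length_percentage(lengths, expected_length, tolerance=10):
    """Percentage of sequences at least `tolerance` bp longer or shorter than expected."""
    if not lengths:
        raise ValueError("no sequences given")
    off = sum(abs(length - expected_length) >= tolerance for length in lengths)
    return 100.0 * off / len(lengths)


# Hypothetical lengths (bp); one of four sequences is 16 bp too short.
example_lengths = [421, 421, 421, 405]
print(off_length_percentage(example_lengths, expected_length=421))
```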
Fig. S8: Evaluation of primer combinations
Preliminary data: error penalties are subject to change and the kind of mismatch is not yet implemented! Primer pairs with a combined penalty score below 100 were considered working (= green), while pairs above that were considered to show no or only poor amplification (= red). If primer binding sites in the sequences contained terminal gaps, they were counted as missing data (= gray).
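The colour coding can be illustrated with a small classifier. This sketch rests on assumptions that the caption does not confirm: the combined score is taken to be the sum of the forward and reverse primer penalties, a score of exactly 100 is treated as poor, and `None` stands for a binding site with terminal gaps. The function name is hypothetical.

```python
def classify_primer_pair(penalty_forward, penalty_reverse, threshold=100):
    """Map a primer pair to the figure's colour categories.

    Assumption: the combined score is the sum of both penalties, and None
    marks a binding site with terminal gaps (treated as missing data).
    """
    if penalty_forward is None or penalty_reverse is None:
        return "missing (gray)"
    combined = penalty_forward + penalty_reverse
    if combined < threshold:
        return "working (green)"
    return "no or poor amplification (red)"


print(classify_primer_pair(30, 40))    # combined 70, below the threshold
print(classify_primer_pair(80, 45))    # combined 125, above the threshold
print(classify_primer_pair(None, 10))  # terminal gaps in one binding site
```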
Tab. S1: Primers evaluated in silico
Metabarcoding and barcoding primers from the literature.
Tab. S2: 15 taxa (and subgroups, e.g. excluding terrestrial Coleoptera) downloaded for development of the BF / BR primers with PrimerMiner
Tab. S3: OTU tables and taxonomic assignments
These tables were used to generate Figure 3.
Scripts S1: Metabarcoding pipeline scripts
Scripts to analyse the raw sequence data (includes scripts for making figures etc.; requires Usearch and/or Vsearch).
Scripts S2: Scripts to extract haplotype data and haplotype sequences
For each replicate 52 different taxa were extracted in bulk. Haplotypes were extracted for each of the 10 samples and assembled: see file 160609_Haplotypes.fasta in this folder.