Figure S1: Schematic overview of errors affecting metabarcoding data and clustering / denoising strategies to reduce them
Overview of the metabarcoding process, with key biases potentially affecting sequence accuracy (shown in red). In the bulk sample (A) several species with different biomass (indicated by circle size) and distinct haplotypes (indicated by colour) are present. After tissue homogenization and DNA extraction the COI marker is amplified using PCR (B), which can not only skew sequence abundance but also fail to amplify taxa due to primer bias (Elbrecht & Leese, 2015) or insufficient sequencing depth in the case of underrepresented / rare taxa (Elbrecht, Peinert & Leese, 2017). In the process of HTS (C) many new false sequence variants are generated due to sequencing errors (Schirmer et al., 2015), chimera formation (Edgar et al., 2011) and mixing of multiplexed samples (Esling, Lejzerowicz & Pawlowski, 2015; Schnell, Bohmann & Gilbert, 2015). The impact of these errors is usually reduced by strict quality filtering and clustering of similar sequences into operational taxonomic units (OTUs). Normally, only the most abundant sequence in an OTU is considered and used to identify the respective species, which in turn means that information on genetic diversity is lost (Callahan, McMurdie & Holmes, 2017) (D). Recently alternative denoising strategies have been developed to remove sequences affected by errors from a dataset and retain the actual haplotype sequences present in a sample (Eren et al., 2015; Edgar & Flyvbjerg, 2015; Callahan et al., 2016; Amir et al., 2017). Figure based on Figure S1 in Callahan et al. 2016.
Figure S2: Overview of the haplotyping strategy used here and their implementation in the JAMP R package
Detailed bioinformatic processing of metabarcoding to extract haplotype sequences using the JAMP R package. A) Metabarcoding raw data is processed and quality filtered. These steps are integrated in JAMP, but most other standard metabarcoding pipelines could be used as well. B) The processed and quality filtered samples from step A would be usually clustered into operational taxonomic units, but are here additionally filtered (retaining reads of only the expected amplicon length and discarding reads of low abundance) and then denoised. C) In denoising with usearch unoise3 the strictness of denoising is controlled by the alpha value (low alpha = less noise, however more true haplotypes get discarded). D) The denoised reads (=haplotypes) are clustered into OTUs grouped by similarity and the abundance of each haplotype for each sample is exported in a table. E) The haplotype table is additionally filtered using different thresholds, to reduce the presence of low abundant OTUs and haplotypes and increase data reliability. F) The final filtered haplotype table can be used for phylogeographic and population genetic analysis.
Figure S3: Effect of different quality filtering (max ee) on reads of the single species mock sample
Effect of different expected error filtering thresholds on haplotype recovery (no denoising applied). All filtered reads are mapped against the expected haplotypes (black circles). Not all reads are shared between both replicates (indicated by A or B instead of a circle). The 15 expected haplotypes are shown in black, while unexpected ones are highlighted in gray or blue. Error bars show the standard deviation of relative read abundance between both replicates, for the respective haplotype.
Figure S4: Effect of different alpha values in read denoising of the single-species mock sample
Effect of different haplotype recovery of in the single species mock sample, when using different alpha values with Unoise3 (as integrated in the JAMP package). Not all reads are shared between both replicates (indicated by A or B instead of a circle). The 15 expected haplotypes are shown in black, while unexpected ones are highlighted in gray or blue. Error bars show the standard deviation of relative read abundance between both replicates, for the respective haplotype.
Figure S5: Bar plots of haplotype distribution within each OTU
Bar plots showing the haplotype composition of all 199 OTUs obtained with the BF2+BR2 primer combination. The OTU number is indicated above each bar, with the four taxa shown in Figure 2 being highlighted. Haplotypes are shown in different colours, with white bars indicating the proportion of sites where the respective OTU was not detected. Most OTUs were only present at a few sample sites.
Figure S6: Detailed plots of four taxa from the denoised multi-species monitoring samples, showing haplotype maps & networks, similarity between replicates and sequence alignments
Figure S6: Detailed haplotype maps, networks and sequence alignment for all 4 primer combinations and replicates of selected taxa. a) Haplotype maps for both replicates for each of the four primer combinations. For A. aquaticus only the 10 most common haplotypes are shown in different colours (remaining ones in white). For each primer combination, the haplotypes in the map and network have the same corresponding colours. b) Haplotype networks for each primer pair. Each cross line represents one base pair difference between the respective haplotypes. Haplotypes present in just one replicate are indicated by A or B next to the network node. Dashed lines around a circle indicate novel haplotypes that were not available in the BOLD reference database. c) Quantification of similarity between both replicates, by plotting abundance of individual haplotypes of each sampling point against each other. The red line indicates the best fit (with significance and adjusted R2 value given in each plot). d) Sequence alignment of all haplotypes, with mismatching nucleotides between sequences highlighted (green = T, red = A, yellow = G and blue = C). See the following pages for example plots of: Page 2: Taeniopteryx nebulosa Page 3: Hydropsyche pellucidula Page 4: Oulimnius tuberculatus Page 5: Asellus aquaticus