Testing the potential of a ribosomal 16S marker for DNA metabarcoding of insects

Dept. Animal Ecology, Evolution and Biodiversity, Ruhr University Bochum, Bochum, Germany
Aquatic Ecosystem Research, University of Duisburg-Essen, Essen, Germany
Laboratoire d'Ecologie Alpine (LECA), CNRS, Grenoble, France
Laboratoire d'Ecologie Alpine (LECA), Univ. Grenoble Alpes, Grenoble, France
SPYGEN, Savoie Technolac, Le Bourget du Lac, France
Lab Interdisciplinaire des Environnements Continentaux (LIEC), Université de Lorraine, Metz, France
Ecole Natl Genie Eau & Environm Strasbourg, Strasbourg, France
UMR CNRS 7362, Univ Strasbourg, Strasbourg, France
Centre for Water and Environmental Research (ZWU) Essen, University of Duisburg-Essen, Essen, Germany
DOI
10.7287/peerj.preprints.1855v1
Subject Areas
Biodiversity, Conservation Biology, Genetics, Molecular Biology, Zoology
Keywords
Biodiversity assessment, stream monitoring, small ribosomal subunit, high throughput sequencing
Copyright
© 2016 Elbrecht et al.
Licence
This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Preprints) and either DOI or URL of the article must be cited.
Cite this article
Elbrecht V, Taberlet P, Dejean T, Valentini A, Usseglio-polatera P, Beisel J, Coissac E, Boyer F, Leese F. 2016. Testing the potential of a ribosomal 16S marker for DNA metabarcoding of insects. PeerJ Preprints 4:e1855v1

Abstract

Cytochrome c oxidase I (COI) is a powerful marker for DNA barcoding of animals, with good taxonomic resolution and a large reference database. However, when used for DNA metabarcoding, estimation of taxa abundances and species detection are limited due to primer bias caused by highly variable primer binding sites across the COI gene. Therefore, we explored the potential of the 16S ribosomal DNA gene as an alternative metabarcoding marker for species-level assessments. Ten bulk samples, each containing equal amounts of tissue from 52 freshwater invertebrate taxa, were sequenced with the Illumina NextSeq 500 system. In comparison to COI, the 16S marker amplified more insect species and amplified them more equally, probably due to decreased primer bias. Rough estimation of biomass might thus be less biased with 16S than with COI. According to these results, the marker choice depends on the scientific question. If the goal is to obtain a taxonomic identification at the species level, then COI is more appropriate due to established reference databases and known taxonomic resolution of this marker, with the caveat that a greater proportion of species will be missed using COI Folmer primers. If the goal is to obtain a more comprehensive survey in a context where it is possible to build a local reference database, the 16S marker could be more appropriate.

Author Comment

First preprint of our 16S metabarcoding manuscript. It will be submitted for review and publication at PeerJ shortly. However, feel free to provide feedback and your thoughts in the comments; we really appreciate it and will try to incorporate your additional reviews and comments!

Supplemental Information

16S fusion primers used in this study

Figure S1. 16S fusion primers developed in this study. They include flow cell and sequencing primer binding regions for current Illumina sequencers. The amplified fragment has a size of ~157 bp and can be sequenced directly after purification (one-step PCR). Up to 10 samples can be uniquely tagged from forward and reverse direction and pooled in one NextSeq run. The bases used for shifting on Ins_F and Ins_R can be used to uniquely tag samples (inline barcodes). It is recommended that all 10 primer pairs are used in the following combination to maximize sequence diversity and reduce effects of tag switching by uniquely tagging samples from both sides: P5_Ins_R0+P7_Ins_F4, P5_Ins_R1+P7_Ins_F3, P5_Ins_R2+P7_Ins_F2, P5_Ins_R3+P7_Ins_F1, P5_Ins_R4+P7_Ins_F0, P5_Ins_F0+P7_Ins_R4, P5_Ins_F1+P7_Ins_R3, P5_Ins_F2+P7_Ins_R2, P5_Ins_F3+P7_Ins_R1, P5_Ins_F4+P7_Ins_R0

DOI: 10.7287/peerj.preprints.1855v1/supp-1
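
The recommended primer combinations in Figure S1 follow a regular pattern: each P5 primer with shift i is paired with the P7 primer of the opposite orientation with shift 4 - i. The following is a minimal Python sketch that regenerates the ten pairs from that pattern, for example when building a sample sheet. The function name `primer_pairs` and its parameter `n_shifts` are hypothetical and are not part of the scripts supplied with this study.

```python
from collections import Counter


def primer_pairs(n_shifts: int = 5) -> list[tuple[str, str]]:
    """Return (P5 primer, P7 primer) name pairs following the Figure S1 pattern.

    Hypothetical helper for illustration: shift i on the P5 side is paired
    with shift (n_shifts - 1 - i) of the opposite orientation on the P7 side.
    """
    pairs = []
    # First the pairs that start with Ins_R on P5, then those starting with Ins_F.
    for p5_dir, p7_dir in (("R", "F"), ("F", "R")):
        for i in range(n_shifts):
            p5 = f"P5_Ins_{p5_dir}{i}"
            p7 = f"P7_Ins_{p7_dir}{n_shifts - 1 - i}"
            pairs.append((p5, p7))
    return pairs


pairs = primer_pairs()
combos = [f"{p5}+{p7}" for p5, p7 in pairs]

# Count how often each individual primer occurs across all pairs.
primer_usage = Counter(name for pair in pairs for name in pair)

for combo in combos:
    print(combo)
```

In this pattern every one of the 20 fusion primers appears in exactly one pair, so each sample carries a tag combination that is unique on both sides.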

Distribution of reads obtained by NextSeq and number of reads discarded throughout the different bioinformatics processing steps

Figure S2. Number of sequences obtained per sample after library demultiplexing (A) and percentage of sequences excluded in different bioinformatic analysis steps (B). A: Library demultiplexing. Numbers above bars indicate the relative contribution (in percent) to the total number of sequences obtained for each sample. Bar color indicates whether sequencing started with Ins_F (white) or Ins_R (black). B: Number of reads excluded in data processing steps. The mean percentage of sequence abundance in each processing step is given in brackets. Ins_F / Ins_R primer bias was tested with a t-test.

DOI: 10.7287/peerj.preprints.1855v1/supp-2

Sequence of each OTU with abundance of assigned reads and assigned taxonomy

DOI: 10.7287/peerj.preprints.1855v1/supp-3

Distribution of OTUs across the 52 taxa

DOI: 10.7287/peerj.preprints.1855v1/supp-4

Raw number of reads assigned to each of the 52 taxa for 16S and COI across the 10 replicates

DOI: 10.7287/peerj.preprints.1855v1/supp-5

R scripts used in this study to process sequence data and create plots

DOI: 10.7287/peerj.preprints.1855v1/supp-6