De novo species delimitation in metabarcoding datasets using ecology and phylogeny
- Subject Areas
- Biodiversity, Ecology, Molecular Biology
- Keywords
- bioinformatics, OTU clustering, metabarcoding, biodiversity, ecological co-occurrence
- Copyright
- © 2017 Potter et al.
- Licence
- This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Preprints) and either DOI or URL of the article must be cited.
- Cite this article
- 2017. De novo species delimitation in metabarcoding datasets using ecology and phylogeny. PeerJ Preprints 5:e3121v1 https://doi.org/10.7287/peerj.preprints.3121v1
Abstract
Background: Metabarcoding studies allow a wide variety of taxa to be analysed simultaneously in a fraction of the time taken by morphological identification, but they currently rely on sequence similarity-based methods to delimit operational taxonomic units (OTUs). Similarity-based OTU clustering can lead to inaccurate estimates of diversity, species’ distributions or responses to change, so there is a critical need for methods that delimit species in metabarcoding datasets.
Methods: We introduce SNAPhy (Species delimitation using Niche And PHYlogeny), a novel approach that uses ecological and phylogenetic information to delimit de novo OTUs in metabarcoding datasets and avoids the problems associated with current OTU clustering methods. Sequencing reads are first divided into ecological groups based on co-occurrence. This reduces data complexity and makes it feasible to apply evolutionary and phylogenetic models (e.g. BEAST and GMYC) to delimit species-level groupings within discrete, ecologically informed phylogenies. The utility of SNAPhy is demonstrated using an 18S rDNA nuclear small subunit (nSSU) dataset representing replicated samples taken along the entire length of an estuarine salinity gradient, and SNAPhy is then compared to existing OTU clustering methods.
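The first step described above, dividing sequences into ecological groups by co-occurrence, can be illustrated with a minimal sketch. The abstract does not state how SNAPhy measures co-occurrence, so everything here is an assumption made for illustration: the Jaccard similarity on presence/absence, the single-linkage grouping, the `min_similarity` cut-off, the function names and the toy read counts are all hypothetical and are not taken from the SNAPhy implementation.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two sets of sample IDs."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cooccurrence_groups(counts, min_similarity=0.5):
    """Group sequences whose sample distributions overlap.

    counts maps sequence ID -> {sample ID: read count}.
    Two sequences are linked when the Jaccard similarity of the samples
    they occur in is at least min_similarity (an assumed cut-off); groups
    are the connected components of those links (single linkage).
    """
    # Reduce read counts to presence/absence per sequence.
    presence = {
        seq: {sample for sample, n in samples.items() if n > 0}
        for seq, samples in counts.items()
    }

    # Union-find structure to merge linked sequences.
    parent = {seq: seq for seq in presence}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(presence, 2):
        if jaccard(presence[a], presence[b]) >= min_similarity:
            parent[find(a)] = find(b)

    groups = {}
    for seq in presence:
        groups.setdefault(find(seq), []).append(seq)
    return sorted(sorted(members) for members in groups.values())


# Hypothetical read counts along a salinity gradient (invented toy data).
EXAMPLE_COUNTS = {
    "seq1": {"low_salinity_1": 10, "low_salinity_2": 4},
    "seq2": {"low_salinity_1": 3, "low_salinity_2": 7},
    "seq3": {"high_salinity_1": 5, "high_salinity_2": 2},
    "seq4": {"high_salinity_1": 1, "high_salinity_2": 9, "low_salinity_2": 0},
}

if __name__ == "__main__":
    for group in cooccurrence_groups(EXAMPLE_COUNTS):
        print(group)
```

In a workflow of this kind, each resulting group would then be aligned and passed separately to the phylogenetic stage (e.g. BEAST and GMYC, as named above), so that each tree is inferred from a much smaller set of sequences than the full dataset.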
Results: Each of the OTU clustering methods compared yielded a different number of OTUs and a different taxonomic distribution of OTUs, which we attribute to known differences among taxa in the degree of intraspecific divergence. SNAPhy and UCLUST (with a 98% similarity threshold) gave the most plausible numbers of OTUs, especially within the Nematoda. Additionally, the degree of variation within nematode OTUs delimited by SNAPhy lies within the range of variation observed in deeply metabarcoded individuals.
Discussion: SNAPhy avoids the problems of the static clustering thresholds used by current OTU clustering methods and instead focuses on genuine biological diversity delimited according to a general lineage species concept. We suggest that the SNAPhy approach should play a crucial role in future sequencing-based biodiversity assessment by providing more accurate estimates of species diversity and distributions than current methods, thereby enabling more accurate impact assessments and better informing management decisions.
Author Comment
This version of the manuscript was previously submitted to PeerJ for review, and is currently undergoing major revisions.