Swarm: robust and fast clustering method for amplicon-based studies

Department of Ecology, Technische Universität Kaiserslautern, Kaiserslautern, Germany
CNRS, UMR 7144, EPEP -- Évolution des Protistes et des Écosystèmes Pélagiques, Station Biologique de Roscoff, Roscoff, France
Sorbonne Universités, UPMC Univ Paris 06, UMR 7144, Station Biologique de Roscoff, Roscoff, France
Department of Microbiology, Oslo University Hospital, Rikshospitalet, Oslo, Norway
Department of Informatics, University of Oslo, Oslo, Norway
School of Engineering, University of Glasgow, Glasgow, United Kingdom
DOI
10.7287/peerj.preprints.386v1
Subject Areas
Biodiversity, Bioinformatics, Ecology, Microbiology, Molecular Biology
Keywords
environmental diversity, barcoding, molecular operational taxonomic units
Copyright
© 2014 Mahé et al.
Licence
This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ PrePrints) and either DOI or URL of the article must be cited.
Cite this article
Mahé F, Rognes T, Quince C, de Vargas C, Dunthorn M. 2014. Swarm: robust and fast clustering method for amplicon-based studies. PeerJ PrePrints 2:e386v1

Abstract

Popular de novo amplicon clustering methods suffer from two fundamental flaws: arbitrary global clustering thresholds, and input-order dependency induced by centroid selection. Swarm was developed to address these issues in two steps: it first clusters nearly identical amplicons iteratively using a local threshold, and then uses the clusters' internal structure and amplicon abundances to refine its results. This fast, scalable, and input-order independent approach reduces the influence of clustering parameters and produces robust operational taxonomic units, increasing the amount of meaningful biological information that can be extracted from amplicon-based studies.
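
The first step described above, iterative clustering with a local threshold, can be illustrated with a minimal Python sketch. This is not the Swarm implementation: the function names (`edit_distance`, `grow_clusters`), the parameter `d`, the choice of Levenshtein distance, and the seeding by decreasing abundance are all assumptions made for illustration, and the refinement step based on internal structure and abundances is left out. The pairwise comparison is also naive and would not scale to real datasets.

```python
from collections import deque


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a two-row dynamic programme."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (char_a != char_b)
            current.append(min(previous[j] + 1, current[j - 1] + 1, substitution))
        previous = current
    return previous[-1]


def grow_clusters(amplicons: dict, d: int = 1) -> list:
    """Group sequences linked by chains of neighbours at most d differences apart.

    `amplicons` maps each unique sequence to its abundance (hypothetical input
    format). The threshold d is local: it applies between neighbouring
    sequences, not between a sequence and the cluster seed.
    """
    unassigned = set(amplicons)
    clusters = []
    # Assumption: seeds are taken by decreasing abundance, ties broken
    # alphabetically, so the result does not depend on the input order.
    for seed in sorted(amplicons, key=lambda s: (-amplicons[s], s)):
        if seed not in unassigned:
            continue
        unassigned.discard(seed)
        cluster, queue = {seed}, deque([seed])
        while queue:
            current = queue.popleft()
            # Collect every unassigned sequence within d differences of `current`.
            neighbours = {s for s in unassigned if edit_distance(current, s) <= d}
            unassigned -= neighbours
            cluster |= neighbours
            queue.extend(neighbours)  # newly added members are explored in turn
        clusters.append(cluster)
    return clusters
```

With the toy input `{"ACGT": 10, "ACGA": 3, "ACTA": 1, "TTTT": 5, "TTTA": 2}` and `d = 1`, the sequence `ACTA` lies two differences away from the seed `ACGT` yet joins its cluster through the intermediate `ACGA`. A single global threshold measured from a centroid would not capture this chaining.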

Supplemental Information

Supplementary File 1 (code and commands used to perform the analyses)

DOI: 10.7287/peerj.preprints.386v1/supp-1