A new unsupervised binning method for metagenomic dataset with automated estimation of number of species
- Subject Areas
- Bioinformatics, Computational Biology, Computational Science
- Keywords
- metagenomics, unsupervised binning, improved fuzzy c-means, partition coefficient
- Copyright
- © 2015 Liu et al.
- Licence
- This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ PrePrints) and either DOI or URL of the article must be cited.
- Cite this article
- 2015. A new unsupervised binning method for metagenomic dataset with automated estimation of number of species. PeerJ PrePrints 3:e839v1 https://doi.org/10.7287/peerj.preprints.839v1
Abstract
A necessary step in metagenome analysis is assigning sequences to classes according to their taxonomic origins, a task known as binning. Binning methods fall into two categories, one of which is unsupervised binning. However, existing unsupervised binning methods fail to estimate the number of species automatically and accurately. In this paper, we present a new unsupervised binning method for metagenomic datasets based on an improved fuzzy c-means algorithm (iFCM). First, a range for the number of bins is derived from the relationship among sequencing depth, number of reads, and average read length. Second, the iFCM algorithm is run several times, each time with an initial number of bins taken from this range. Finally, the number of bins is selected using a cluster validity index, the modified partition coefficient. Experimental results show that the method is effective for unsupervised binning of metagenomic datasets and estimates the number of species more accurately than MetaCluster3.0 and AbundanceBin.
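The last two steps of the abstract (running the clustering for each candidate number of bins, then choosing among the results with a validity index) can be illustrated with a minimal sketch. Several assumptions apply. The abstract does not describe the improvements that make up iFCM, so the sketch uses textbook fuzzy c-means instead. The modified partition coefficient is assumed to be the common form 1 - c/(c-1) * (1 - PC), where PC is the mean of the squared memberships. The candidate range `c_min..c_max` is taken as given, since the abstract does not state the formula that derives it from sequencing depth, number of reads, and average read length. The function names, the parameters, and the representation of each sequence as a numeric feature vector are all hypothetical.

```python
import math
import random


def fuzzy_c_means(points, c, m=2.0, iters=200, tol=1e-6, seed=0):
    """Textbook fuzzy c-means (NOT the paper's iFCM). Returns (centers, u)."""
    rng = random.Random(seed)
    dim = len(points[0])
    # Assumption: start from c data points chosen at random.
    centers = [list(p) for p in rng.sample(points, c)]
    u = []
    for _ in range(iters):
        # Membership update: u_ik is proportional to d_ik^(-2/(m-1)).
        u = []
        for p in points:
            d2 = [sum((a - b) ** 2 for a, b in zip(p, ck)) for ck in centers]
            if min(d2) == 0.0:
                # Point sits exactly on one or more centers: split membership among them.
                hits = [1.0 if d == 0.0 else 0.0 for d in d2]
                u.append([h / sum(hits) for h in hits])
            else:
                inv = [d ** (-1.0 / (m - 1.0)) for d in d2]
                total = sum(inv)
                u.append([v / total for v in inv])
        # Center update: mean of the points weighted by u_ik^m.
        new_centers = []
        for k in range(c):
            w = [row[k] ** m for row in u]
            total = sum(w)
            new_centers.append(
                [sum(wi * p[d] for wi, p in zip(w, points)) / total for d in range(dim)]
            )
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, u


def modified_partition_coefficient(u):
    """Assumed form: 1 - c/(c-1) * (1 - PC), with PC the mean squared membership."""
    n, c = len(u), len(u[0])
    pc = sum(v * v for row in u for v in row) / n
    return 1.0 - (c / (c - 1.0)) * (1.0 - pc)


def estimate_bin_count(points, c_min, c_max, restarts=3):
    """Run the clustering for each candidate c and keep the best-scoring one."""
    best_c, best_score = None, -math.inf
    for c in range(max(2, c_min), c_max + 1):
        score = max(
            modified_partition_coefficient(fuzzy_c_means(points, c, seed=s)[1])
            for s in range(restarts)
        )
        if score > best_score:
            best_c, best_score = c, score
    return best_c, best_score
```

Calling `estimate_bin_count(features, c_min, c_max)` on a list of feature vectors returns the candidate number of bins with the highest score together with that score. The index is 1 for a crisp partition and 0 when every membership equals 1/c, which makes scores comparable across different values of c.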
Author Comment
This paper has been submitted to PeerJ for review.