A new unsupervised binning method for metagenomic dataset with automated estimation of number of species

College of Communication Engineering, Jilin University, Changchun, Jilin Province, China
DOI
10.7287/peerj.preprints.839v1
Subject Areas
Bioinformatics, Computational Biology, Computational Science
Keywords
metagenomics, unsupervised binning, improved fuzzy c-means, partition coefficient
Copyright
© 2015 Liu et al.
Licence
This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ PrePrints) and either DOI or URL of the article must be cited.
Cite this article
Liu Y, Hou T, Fu L. 2015. A new unsupervised binning method for metagenomic dataset with automated estimation of number of species. PeerJ PrePrints 3:e839v1

Abstract

One necessary step of metagenome analysis is to assign sequences to classes according to their taxonomic origins. Unsupervised binning is one of the two categories of binning methods. However, existing unsupervised binning methods fail to estimate the number of species automatically and accurately. In this paper, a new unsupervised binning method based on an improved fuzzy c-means algorithm (iFCM) is presented for metagenomic datasets. First, a range for the number of bins is derived from the relationship among sequencing depth, number of reads and average read length. Second, the iFCM algorithm is run several times, each time with an initial number of bins taken from this range. Finally, the number of bins is determined by a cluster validity index, the modified partition coefficient. Experimental results show that this method is an effective unsupervised binning method for metagenomic datasets and can estimate the number of species more accurately than MetaCluster3.0 and AbundanceBin.
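
The final selection step can be illustrated with a minimal sketch. It assumes the standard definition of the modified partition coefficient, MPC = 1 - c/(c-1) * (1 - PC), where PC is the mean of the squared fuzzy memberships; the abstract does not state the exact formula used with iFCM, so this is an assumption. The function names and the `memberships_by_c` input (one membership matrix per candidate number of bins, as a fuzzy clustering run might produce) are hypothetical and are not taken from the paper.

```python
def partition_coefficient(u):
    """Mean squared membership. u[i][k] is the membership of sequence i in bin k."""
    n = len(u)
    return sum(m * m for row in u for m in row) / n


def modified_partition_coefficient(u):
    """Rescale PC so the score no longer drifts with the number of bins c.

    Assumed standard form: MPC = 1 - c / (c - 1) * (1 - PC).
    It is 1 for a crisp partition and 0 for uniform memberships.
    """
    c = len(u[0])
    if c < 2:
        raise ValueError("MPC is undefined for fewer than two bins")
    return 1.0 - (c / (c - 1.0)) * (1.0 - partition_coefficient(u))


def select_bin_count(memberships_by_c):
    """Return the candidate number of bins whose membership matrix scores highest.

    memberships_by_c (hypothetical input) maps each candidate c to the
    membership matrix obtained from one clustering run with c bins.
    """
    return max(
        memberships_by_c,
        key=lambda c: modified_partition_coefficient(memberships_by_c[c]),
    )
```

In this sketch a crisp partition scores 1 and a completely uniform partition scores 0, so the candidate with the most clearly separated bins is preferred.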

Author Comment

This paper has been submitted to PeerJ for review.