A new unsupervised binning method for metagenomic dataset with automated estimation of number of species
Abstract
A necessary step in metagenome analysis is assigning sequences to classes according to their taxonomic origins. Unsupervised binning is one of the two categories of binning methods. However, existing unsupervised binning methods fail to estimate the number of species automatically and accurately. In this paper, a new unsupervised binning method for metagenomic datasets is presented, based on an improved fuzzy c-means method (iFCM). First, a range for the number of bins is obtained from the relationship among sequencing depth, number of reads, and average read length. Second, the iFCM algorithm is run several times, each time with an initial number of bins taken from this range. Finally, the number of bins is determined by a clustering validity index, the modified partition coefficient. Experimental results show that this method is an effective unsupervised binning method for metagenomic datasets and estimates the number of species more accurately than MetaCluster3.0 and AbundanceBin.
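The second and third steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses standard fuzzy c-means rather than iFCM (whose modifications the abstract does not describe), a deterministic farthest-point initialization chosen here only for reproducibility, and the usual form of the modified partition coefficient, MPC = 1 - c/(c-1) * (1 - PC), where PC is the mean of the squared memberships. The candidate range is supplied by hand rather than derived from sequencing depth, and the 2-D points are hypothetical stand-ins for per-read feature vectors. All function names are illustrative.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def farthest_point_init(points, c):
    """Assumed initialization: start at points[0], then repeatedly add
    the point farthest from all centers chosen so far."""
    centers = [points[0]]
    while len(centers) < c:
        centers.append(max(points, key=lambda p: min(sq_dist(p, q) for q in centers)))
    return [list(p) for p in centers]

def memberships(points, centers, m):
    """Standard FCM membership update for fuzzifier m > 1."""
    u = []
    for p in points:
        inv = [max(sq_dist(p, q), 1e-12) ** (-1.0 / (m - 1.0)) for q in centers]
        total = sum(inv)
        u.append([x / total for x in inv])
    return u

def fcm(points, c, m=2.0, iters=200, tol=1e-8):
    """Standard fuzzy c-means (not iFCM); returns (centers, memberships)."""
    dim = len(points[0])
    centers = farthest_point_init(points, c)
    for _ in range(iters):
        u = memberships(points, centers, m)
        new_centers = []
        for k in range(c):
            w = [row[k] ** m for row in u]
            total = sum(w)
            new_centers.append(
                [sum(wi * p[d] for wi, p in zip(w, points)) / total for d in range(dim)]
            )
        shift = max(sq_dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, memberships(points, centers, m)

def modified_partition_coefficient(u):
    """MPC = 1 - c/(c-1) * (1 - PC), defined for c >= 2."""
    n, c = len(u), len(u[0])
    pc = sum(x * x for row in u for x in row) / n
    return 1.0 - (c / (c - 1.0)) * (1.0 - pc)

def choose_bin_count(points, candidates):
    """Run FCM once per candidate bin count and keep the count with the highest MPC."""
    scores = {c: modified_partition_coefficient(fcm(points, c)[1]) for c in candidates}
    return max(scores, key=scores.get), scores

# Hypothetical data: three tight, well-separated groups of five points each.
offsets = [(0.0, 0.0), (0.1, 0.0), (-0.1, 0.0), (0.0, 0.1), (0.0, -0.1)]
group_centers = [(0.0, 0.0), (10.0, 0.0), (5.0, 9.0)]
demo_points = [(gx + dx, gy + dy) for gx, gy in group_centers for dx, dy in offsets]

best, scores = choose_bin_count(demo_points, range(2, 6))
```

Here `best` holds the candidate bin count with the highest score and `scores` maps every candidate to its MPC, so the whole validity curve can be inspected rather than only its maximum.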
Cite this as
2015. A new unsupervised binning method for metagenomic dataset with automated estimation of number of species. PeerJ PrePrints 3:e839v1 https://doi.org/10.7287/peerj.preprints.839v1
Author comment
This paper has been submitted to PeerJ for review.
Additional Information
Competing Interests
The authors declare they have no competing interests.
Author Contributions
Yun Liu conceived and designed the experiments, performed the experiments, analyzed the data, contributed reagents/materials/analysis tools, wrote the paper.
Tao Hou performed the experiments, contributed reagents/materials/analysis tools, prepared figures and/or tables.
Liu Fu conceived and designed the experiments, analyzed the data, prepared figures and/or tables, reviewed drafts of the paper.
Funding
This work was supported by the National Natural Science Foundation of China (NSFC), grant No. 51105170. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.