Prediction of deleterious nonsynonymous SNPs by integrating multiple classifiers – An application to neurodegenerative diseases
Abstract
In this study, we propose a logistic regression model to identify deleterious missense mutations among nonsynonymous SNPs (nsSNPs). The model combines multiple features, namely the rank scores of 18 classifiers from dbNSFP v2.5 (e.g. SIFT, PolyPhen2, MutationTaster, MutationAssessor, FATHMM, VEST 3.0, RadialSVM, LR, and CADD), for 44,702 UniProt human polymorphisms and disease mutations (19,033 disease and 25,669 neutral). The model is trained and validated on 80% of the data (15,226 disease + 20,535 neutral nsSNPs), tested on the remaining 20% (3,807 disease + 5,134 neutral nsSNPs), and finally applied to a neurodegenerative disease-specific dataset (NeuroTest) from UniProt. The ROC AUC of the model is 0.97 on the test set and 0.92 on the NeuroTest dataset, with accuracies of 0.91 and 0.86, respectively. Our model outperforms SIFT, PolyPhen2, MutationTaster, MutationAssessor, and the two ensemble classifiers of dbNSFP v2.5 on both test sets.
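The workflow summarized in the abstract (combine per-variant rank scores, fit a logistic regression on 80% of the data, report ROC AUC and accuracy on the held-out 20%) can be illustrated with a minimal sketch. This is not the author's implementation. The data are synthetic, only three hypothetical rank-score features stand in for the 18 classifiers, and all function names, the learning rate, the epoch count and the 0.5 decision threshold are assumptions made for illustration.

```python
import math
import random

N_FEATURES = 3  # hypothetical; the study combines rank scores of 18 classifiers


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_variant(label, rng):
    # Synthetic rank scores in [0, 1], drawn around 0.7 (disease) or 0.3 (neutral),
    # then shifted by 0.5 so that each feature is roughly centred on zero.
    centre = 0.7 if label == 1 else 0.3
    return [min(1.0, max(0.0, rng.gauss(centre, 0.15))) - 0.5
            for _ in range(N_FEATURES)]


def stratified_split(y, test_fraction, rng):
    # Hold out the same fraction of each class, as in the 80/20 split described above.
    train, test = [], []
    for label in (0, 1):
        idx = [i for i, t in enumerate(y) if t == label]
        rng.shuffle(idx)
        cut = int(round(len(idx) * test_fraction))
        test += idx[:cut]
        train += idx[cut:]
    return train, test


def predict(weights, bias, x):
    # Predicted probability that a variant is deleterious.
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


def fit_logistic(X, y, lr=0.5, epochs=200):
    # Plain batch gradient descent on the mean logistic loss (no regularization).
    n, d = len(X), len(X[0])
    weights, bias = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, t in zip(X, y):
            err = predict(weights, bias, x) - t
            grad_b += err
            for j in range(d):
                grad_w[j] += err * x[j]
        bias -= lr * grad_b / n
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
    return weights, bias


def roc_auc(labels, scores):
    # Probability that a random positive scores above a random negative (ties count 0.5).
    pos = [s for s, t in zip(scores, labels) if t == 1]
    neg = [s for s, t in zip(scores, labels) if t == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


rng = random.Random(0)
y = [1] * 200 + [0] * 200                      # 1 = disease, 0 = neutral (synthetic)
X = [make_variant(t, rng) for t in y]

train_idx, test_idx = stratified_split(y, 0.2, rng)
weights, bias = fit_logistic([X[i] for i in train_idx], [y[i] for i in train_idx])

test_labels = [y[i] for i in test_idx]
test_scores = [predict(weights, bias, X[i]) for i in test_idx]
auc = roc_auc(test_labels, test_scores)
accuracy = sum((s >= 0.5) == (t == 1)
               for s, t in zip(test_scores, test_labels)) / len(test_idx)
print(f"test ROC AUC = {auc:.3f}, accuracy = {accuracy:.3f}")
```

Because the synthetic classes are deliberately well separated, the numbers printed by this sketch say nothing about the reported results; it only shows how the individual scores enter the model as features and how the two evaluation metrics are computed on a held-out set.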
Cite this as
2015. Prediction of deleterious nonsynonymous SNPs by integrating multiple classifiers – An application to neurodegenerative diseases. PeerJ PrePrints 3:e994v1 https://doi.org/10.7287/peerj.preprints.994v1
Author comment
This is a submission to PeerJ PrePrints for review.
Additional Information
Competing Interests
The author declares they have no competing interests.
Author Contributions
Md Mesbah-Uddin conceived and designed the experiments, performed the experiments, analyzed the data, contributed reagents/materials/analysis tools, wrote the paper, prepared figures and/or tables, and reviewed drafts of the paper.
Funding
The author declares there was no funding for this work.