Prediction of deleterious nonsynonymous SNPs by integrating multiple classifiers – An application to neurodegenerative diseases

Md Mesbah-Uddin

doi:10.7287/peerj.preprints.994v1

Department of Biochemistry, Faculty of Science, King Abdulaziz University, Jeddah, Kingdom of Saudi Arabia

Published: 2015-04-21
Accepted: 2015-04-21

Subject Areas: Bioinformatics, Computational Biology
Keywords: Neurodegenerative Disease, Nonsynonymous SNP, logistic regression, Mendelian disease, UniProt, missense mutation, ROC AUC, deleterious SNP

Licence: This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ PrePrints) and either DOI or URL of the article must be cited.

Cite this article: Mesbah-Uddin M. 2015. Prediction of deleterious nonsynonymous SNPs by integrating multiple classifiers – An application to neurodegenerative diseases. PeerJ PrePrints 3:e994v1 https://doi.org/10.7287/peerj.preprints.994v1

Abstract

In this study, we propose a logistic regression model that identifies deleterious missense mutations among nonsynonymous SNPs (nsSNPs). The model combines multiple features, namely the rank scores of 18 classifiers from dbNSFP v2.5 (e.g. SIFT, PolyPhen2, MutationTaster, MutationAssessor, FATHMM, VEST 3.0, RadialSVM, LR and CADD), for 44,702 UniProt human polymorphisms and disease mutations (19,033 disease and 25,669 neutral). The model is trained and validated on 80% of the data (15,226 disease + 20,535 neutral nsSNPs), tested on the remaining 20% (3,807 disease + 5,134 neutral nsSNPs), and finally applied to a neurodegenerative disease-specific dataset (NeuroTest) from UniProt. The ROC AUC of the model is 0.97 on the test set and 0.92 on the NeuroTest dataset, with accuracies of 0.91 and 0.86, respectively. Our model outperforms SIFT, PolyPhen2, MutationTaster, MutationAssessor, and the two ensemble classifiers of dbNSFP v2.5 on both the test set and the NeuroTest dataset.
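
The workflow summarised in the abstract (combine per-variant rank scores in a logistic regression, hold out 20% of the data, report ROC AUC and accuracy) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the data are synthetic rather than taken from dbNSFP or UniProt, three hypothetical features stand in for the 18 rank scores, and the plain gradient-descent fit and the helper names (`make_variant`, `fit_logistic`, `predict_proba`, `roc_auc`) are not the author's implementation.

```python
import math
import random

random.seed(0)  # fixed seed so the synthetic example is reproducible

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def make_variant(label):
    """Synthetic variant: three hypothetical rank scores clipped to [0, 1]."""
    centre = 0.7 if label == 1 else 0.35
    return [min(1.0, max(0.0, random.gauss(centre, 0.15))) for _ in range(3)]

# Synthetic stand-in for labelled nsSNPs (1 = disease, 0 = neutral).
labels = [1 if random.random() < 0.43 else 0 for _ in range(1000)]
data = [(make_variant(y), y) for y in labels]

# 80/20 split, mirroring the proportions described in the abstract.
random.shuffle(data)
cut = int(0.8 * len(data))
train, test = data[:cut], data[cut:]

def fit_logistic(rows, lr=0.5, epochs=300):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    d = len(rows[0][0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in rows:
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) - y
            for j in range(d):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wj - lr * g / len(rows) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(rows)
    return w, b

def predict_proba(x, w, b):
    """Predicted probability that a variant is deleterious."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

def roc_auc(y_true, scores):
    """ROC AUC via the Mann-Whitney rank-sum statistic (ties get average ranks)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    rank_sum = sum(r for r, y in zip(ranks, y_true) if y == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)

w, b = fit_logistic(train)
y_test = [y for _, y in test]
p_test = [predict_proba(x, w, b) for x, _ in test]
auc = roc_auc(y_test, p_test)
acc = sum((p >= 0.5) == (y == 1) for p, y in zip(p_test, y_test)) / len(test)
print(f"ROC AUC = {auc:.3f}, accuracy = {acc:.3f}")
```

The printed values describe only this synthetic data and are not comparable to the figures reported above. In practice a library implementation of logistic regression and ROC analysis would normally replace the hand-written functions.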

Author Comment

This is a submission to PeerJ PrePrints for review.