All reviews of published articles are made public. This includes manuscript files, peer review comments, author rebuttals and revised materials. Note: This was optional for articles submitted before 13 February 2023.
Peer reviewers are encouraged (but not required) to provide their names to the authors when submitting their peer review. If they agree to provide their name, then their personal profile page will reflect a public acknowledgment that they performed a review (even if the article is rejected). If the article is accepted, then reviewers who provided their name will be associated with the article itself.
Thank you for addressing the editorial suggestions. Your manuscript has been accepted in PeerJ.
[# PeerJ Staff Note - this decision was reviewed and approved by Paula Soares, a PeerJ Section Editor covering this Section #]
Thanks for addressing all the technical issues pointed out by the reviewers. I propose several editorial suggestions in the attached PDF as tracked changes; please review them and consider including them in a final revised version.
Please attend to the suggested improvements to the manuscript.
Li et al. curated a balanced dataset and developed a cancer driver missense mutation predictor (CDMPred) employing feature selection through the ensemble learning technique XGBoost. The AUC values of CDMPred on the training and independent test sets are over 0.8, and the predictor showed superior performance compared to various state-of-the-art methods.
The dataset and model construction are meaningful and could help dig out the features behind the principles of mutations.
The results are statistically reasonable, the manuscript is well written and the conclusion is consistent with their results.
1. Some important earlier methods for predicting driver mutations are missing from the introduction (Am J Hum Genet. 2017, 100(1): 5-20; Nucleic Acids Res. 2019, 47: 315-321; Nucleic Acids Res. 2023, 51: 129-133).
2. Prospective prediction is absent from the manuscript; it would be better to show or discuss how the method performs in real cases.
1. The paper should include more references on the most recent studies.
2. The font style is not consistent throughout the paper. It can be fixed for better readability.
1. It is better to provide more details on the feature engineering process and the rationale behind the selected features.
2. The description of data preprocessing, such as how confounding variables are controlled, could be elaborated further.
1. The paper should discuss the potential limitations of the current study, such as the possibility of biases in the curated datasets, or the lack of exploration of machine learning techniques.
In this paper, the authors present CDMPred, a new predictor of cancer-associated mutations, which, unlike previous predictors, was trained to differentiate cancer-associated from "passenger" (non-cancer-related) mutations, using the Cancer Passenger Mutation database (dbCPM) as a source of natural passenger mutations (they do discuss the fact that "passenger mutations" might not be a clear-cut concept).
I found the paper well written, succinct, useful (the model compares favorably or equivalently to previous models) and interesting (in particular the identification of key mutation features that determine the predictor's calling). The Supplementary sections provide detailed descrip
My only remark is that the paper is focused on model building and comparison, and could have benefited from a discussion of the top 10 features identified during the predictor's training/analysis. Are these top 10 features surprising? Do they correspond to known cancer mechanisms?
The experimental design seems sound to me. The authors tested many different ML models to ensure they built the best possible model (they do mention ensembling as a potential next step in the discussion), then compared CDMPred to pre-existing methods using a well-established classification metric (AUC).
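For readers unfamiliar with the metric, the kind of AUC comparison described above can be sketched as follows. This is an illustration only, not the authors' code: the function name `roc_auc`, the labels, and both score lists are made up for the example, and AUC is computed directly from its definition as the probability that a randomly chosen driver mutation receives a higher score than a randomly chosen passenger mutation.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical data: 1 = driver mutation, 0 = passenger mutation.
labels = [1, 1, 0, 0, 1, 0]
scores_a = [0.9, 0.8, 0.3, 0.4, 0.35, 0.1]  # assumed output of predictor A
scores_b = [0.6, 0.2, 0.5, 0.4, 0.7, 0.3]   # assumed output of predictor B

auc_a = roc_auc(labels, scores_a)
auc_b = roc_auc(labels, scores_b)
print(f"predictor A AUC = {auc_a:.3f}, predictor B AUC = {auc_b:.3f}")
```

The pairwise loop is quadratic in the number of samples, which is fine for a small illustration; a rank-based implementation would be the usual choice for large test sets.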
The findings seem valid.
All text and materials provided via this peer-review history page are made available under a Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.