All reviews of published articles are made public. This includes manuscript files, peer review comments, author rebuttals and revised materials. Note: This was optional for articles submitted before 13 February 2023.
Peer reviewers are encouraged (but not required) to provide their names to the authors when submitting their peer review. If they agree to provide their name, then their personal profile page will reflect a public acknowledgment that they performed a review (even if the article is rejected). If the article is accepted, then reviewers who provided their name will be associated with the article itself.
As you can see, the reviewer was satisfied with your responses to the critiques and with the revisions. Therefore, I am pleased to let you know that your amended manuscript is now acceptable.
[# PeerJ Staff Note - this decision was reviewed and approved by Paula Soares, a PeerJ Section Editor covering this Section #]
Background provided, literature referenced, clear writing
Recommendations were addressed and the methods are described in detail
The conclusions are supported by the results
Recommendations have been addressed.
As you can see, the reviewer still thinks that the manuscript has some linguistic issues and requires additional editorial work. Please address these remaining concerns and make sure that the manuscript is edited by professional editors or fluent English speakers.
[# PeerJ Staff Note: The Academic Editor has identified that the English language must be improved. PeerJ can provide language editing services - please contact us at [email protected] for pricing (be sure to provide your manuscript number and title) #]
Background context and literature references have been provided.
The added details and the flowchart presented help the reader's understanding
The conclusions are supported by the results
The manuscript is more structured and the earlier issues have been addressed. However, a few sections of the manuscript still lack correct sentence formation and are sometimes confusing. Improving these is recommended.
Please address the critiques of both reviewers and revise the manuscript accordingly.
[# PeerJ Staff Note: Please ensure that all review comments are addressed in a response letter and any edits or clarifications mentioned in the letter are also inserted into the revised manuscript where appropriate. It is a common mistake to address reviewer questions in the response letter but not in the revised manuscript. If a reviewer raised a question then your readers will probably have the same question so you should ensure that the manuscript can stand alone without the response letter. Directions on how to prepare a response letter can be found at: https://peerj.com/benefits/academic-rebuttal-letters/ #]
No comment
1. For the survey method part, I do not think the comparison of the mentioned tools is fair: the benchmark set CAFA3 was published in 2019, DeepGOplus was trained on more recent data released in 2020, and other tools, such as PFP and PANNZER2, were trained on different datasets (their training data are not mentioned in the manuscript). The results for DEEPred were taken from the DEEPred paper, so its training data are also unclear. Because deep-learning models are prone to overfitting, the sequence similarity between the training data and the benchmark set strongly affects measured performance. It is not clear whether the sequences in CAFA3 were used to train these deep-learning methods, so the performance of DeepGOplus and DEEPred may be overestimated. Since DeepGOplus and DEEPred both provide source code, they should be re-trained on the same dataset (a minimal sketch of one such leakage check appears after these points). Because of this problem, the conclusions are not well supported. I suggest the authors run stricter experiments or discuss the performance evaluation in more depth.
2. For the machine-learning-based methods, a clear problem formulation is lacking. For example, is it a multi-class classification problem over all GO terms, or a binary classification problem for each GO term? This also bears on the performance evaluation: the metrics used to evaluate prediction performance, such as precision and recall, are designed for binary predictions, so how is a single score assigned to each method?
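For context on point 2: CAFA-style evaluations resolve this by treating each GO term as a binary decision and aggregating protein-centric precision and recall into a single Fmax over score thresholds. The Python sketch below is a generic illustration of that metric, not code from the manuscript or from any of the reviewed tools; the data structures and names are assumptions.

```python
def fmax(predictions, truth, thresholds):
    """Protein-centric Fmax, CAFA-style (full evaluation mode).

    predictions: {protein_id: {go_term: score}}
    truth:       {protein_id: set of true go_terms}, all sets non-empty
    """
    best = 0.0
    for t in thresholds:
        prec_sum, m = 0.0, 0            # m(t): proteins with >=1 prediction at score >= t
        rec_sum, n = 0.0, len(truth)    # n: all benchmark proteins
        for prot, annots in truth.items():
            pred = {g for g, s in predictions.get(prot, {}).items() if s >= t}
            if pred:
                prec_sum += len(pred & annots) / len(pred)
                m += 1
            rec_sum += len(pred & annots) / len(annots)
        if m == 0:
            continue
        pr, rc = prec_sum / m, rec_sum / n
        if pr + rc > 0:
            best = max(best, 2 * pr * rc / (pr + rc))
    return best
```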
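Regarding point 1, here is a minimal, hedged sketch of the kind of leakage check suggested there: drop training sequences that are too similar to any benchmark sequence before re-training. `pairwise_identity` is a hypothetical stand-in for a real aligner (e.g. identities parsed from BLAST output); all names here are illustrative, not from the manuscript.

```python
def filter_training_set(train_seqs, benchmark_seqs, pairwise_identity, cutoff=0.3):
    """Keep only training proteins below `cutoff` identity to every benchmark protein.

    train_seqs, benchmark_seqs: {protein_id: sequence string}
    pairwise_identity: callable(seq_a, seq_b) -> fraction identity in [0, 1]
    """
    return {
        tid: tseq
        for tid, tseq in train_seqs.items()
        if all(pairwise_identity(tseq, bseq) < cutoff
               for bseq in benchmark_seqs.values())
    }
```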
1. Why were only sequence-based methods selected and compared in this paper? Readers would expect to see comparisons of as many deep-learning methods as possible.
2. As pointed out by the authors, one challenge of protein function prediction is the imbalance of GO classes together with the multi-label problem, but what is the current state of the field regarding these challenges? Do the authors have any insights into coping with them?
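For context on coping strategies: one common mitigation for imbalanced multi-label GO prediction is per-term positive weighting of the loss. The PyTorch sketch below is a generic illustration with toy shapes and random data, not taken from the manuscript or from any reviewed method.

```python
import torch

# Illustrative only: per-term positive weighting for imbalanced multi-label
# GO prediction. `labels` is an (n_proteins, n_terms) 0/1 annotation matrix.
labels = torch.randint(0, 2, (1000, 500)).float()   # toy annotation matrix
pos = labels.sum(dim=0)                              # positives per GO term
neg = labels.shape[0] - pos
pos_weight = neg / pos.clamp(min=1)                  # upweight rare terms
loss_fn = torch.nn.BCEWithLogitsLoss(pos_weight=pos_weight)

logits = torch.randn(1000, 500)                      # stand-in model outputs
loss = loss_fn(logits, labels)
```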
In this review paper, the authors briefly reviewed the conventional approaches and focused on recent deep-learning-based methods for protein function prediction. They presented an overview of current automated protein function prediction methods, conducted a small comparison of several available tools, and finally highlighted the challenges of the field. I have two other major concerns, listed below:
1. Beyond the deep-learning methods reviewed in this paper, several more advanced, recently published methods should be covered, such as the transformer-based method at https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btab198/6182677 and the GNN-based method at https://arxiv.org/abs/2007.12804.
2. The data used as the benchmark are not clearly explained in the Data section. I still do not understand what NK (no-knowledge) and LK (limited-knowledge) mean. Do they mean that the sequence had no or limited public annotations but was first fully labeled in the benchmark dataset? What do partial mode and full mode mean? The authors should make these two terms explicit. I cannot understand the explanation given in the current version, in this sentence: “partial mode, for a set of proteins with at least one prediction, and full mode, computed for all benchmark proteins.”
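For context (not an answer on the authors' behalf): in CAFA terminology, no-knowledge (NK) targets had no experimental GO annotations at submission time and gained some during the benchmark's growth period, while limited-knowledge (LK) targets had experimental annotations in some but not all three ontologies. The sketch below gives one hedged reading of the partial/full distinction, continuing the notation of the Fmax sketch above; it is an interpretation, not the manuscript's definition.

```python
def averaged_recall(rec_sum, n_benchmark, m_with_predictions, mode="full"):
    # full mode: average over all n benchmark proteins, so proteins a method
    #   never predicts on count as zero recall.
    # partial mode: average only over the m proteins the method predicted on,
    #   so limited coverage is not penalized.
    denom = n_benchmark if mode == "full" else m_with_predictions
    return rec_sum / denom if denom else 0.0
```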
1) Professional English used throughout
2) Background context and literature references have been provided
Investigation into the methods has been performed and the results presented.
However, the explanation of the methods needs more structured detail; some passages were confusing to read.
The conclusions are supported by the results and the overall review.
The review is well written, giving an overview of conventional and new methods for protein function prediction.
Recommendation: more structured detail is required in the explanation of the methods.
All text and materials provided via this peer-review history page are made available under a Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.