Review History


All reviews of published articles are made public. This includes manuscript files, peer review comments, author rebuttals and revised materials. Note: This was optional for articles submitted before 13 February 2023.

Peer reviewers are encouraged (but not required) to provide their names to the authors when submitting their peer review. If they agree to provide their name, then their personal profile page will reflect a public acknowledgment that they performed a review (even if the article is rejected). If the article is accepted, then reviewers who provided their name will be associated with the article itself.

Summary

  • The initial submission of this article was received on May 13th, 2025 and was peer-reviewed by 2 reviewers and the Academic Editor.
  • The Academic Editor made their initial decision on July 14th, 2025.
  • The first revision was submitted on August 10th, 2025 and was reviewed by 2 reviewers and the Academic Editor.
  • The article was Accepted by the Academic Editor on September 5th, 2025.

Version 0.2 (accepted)

Sep 5, 2025 · Academic Editor

Accept

Thank you for your valuable contribution.

**PeerJ Staff Note:** This decision was reviewed and approved by Shawn Gomez, a PeerJ Section Editor covering this Section.

Reviewer 1

Basic reporting

All of my concerns have been addressed in this revised version.

Experimental design

no further comments

Validity of the findings

no further comments

Additional comments

no further comments

Reviewer 2

Basic reporting

With the revisions to the Introduction, the research background and motivation of this study have become clearer. My concern about the necessity of combining RFs and GNNs has been resolved.

Experimental design

The additional supplemental figure demonstrates the robustness of the proposed method. My concerns have been addressed.

Validity of the findings

no comment

Version 0.1 (original submission)

Jul 14, 2025 · Academic Editor

Major Revisions

Please follow the requests and comments of the reviewers strictly.

**PeerJ Staff Note:** Please ensure that all review, editorial, and staff comments are addressed in a response letter and that any edits or clarifications mentioned in the letter are also inserted into the revised manuscript where appropriate.

Reviewer 1

Basic reporting

The authors propose SGCL-DPI, a structure-guided curriculum learning framework that initially uses global molecular insights captured by random forest (RF) models to guide and gradually refine the structural pattern learning of graph neural networks (GNNs), improving the accuracy of drug-protein interaction (DPI) predictions. SGCL-DPI adopts a curriculum learning strategy in which the RF teacher model provides initial high-level prediction guidance for the GNN student model, with the training focus gradually shifting from RF to GNN. The training objective integrates three components: binary cross-entropy for correct interaction classification, knowledge distillation to align GNN outputs with RF predictions, and a structural consistency term to maintain similarity-based relation patterns in the learned representations. SGCL-DPI was evaluated on two benchmark datasets: BindingDB and a challenging suture-derived dataset. On the BindingDB dataset, SGCL-DPI effectively combines classical machine learning and deep learning approaches for DPI prediction. By integrating knowledge based on global descriptors with graph-based structural learning, the proposed framework significantly improves prediction accuracy and generalization capability on challenging interaction prediction tasks, pointing to a promising direction for future DPI prediction research.
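For readers of this review history, the three-part objective summarized above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The linear schedule in `curriculum_weight`, the weighting of the classification term by `1 - alpha`, the use of cosine similarity for the structural term, and the name and default value of `lambda_struct` are all assumptions made for illustration only.

```python
import numpy as np

def bce(targets, probs, eps=1e-7):
    """Mean binary cross-entropy; targets may be hard labels or soft teacher outputs."""
    probs = np.clip(probs, eps, 1 - eps)
    return float(-np.mean(targets * np.log(probs) + (1 - targets) * np.log(1 - probs)))

def cosine_similarity_matrix(x):
    """Pairwise cosine similarity between the rows of x."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    x = x / np.clip(norms, 1e-12, None)
    return x @ x.T

def curriculum_weight(epoch, total_epochs):
    """Hypothetical linear schedule: 1.0 at the first epoch, 0.0 at the last."""
    return max(0.0, 1.0 - epoch / max(1, total_epochs - 1))

def combined_loss(y_true, p_gnn, p_rf, embeddings, target_sim,
                  epoch, total_epochs, lambda_struct=0.1):
    """Assumed combination of classification, distillation, and structural terms."""
    alpha = curriculum_weight(epoch, total_epochs)   # weight on RF teacher guidance
    l_cls = bce(y_true, p_gnn)                       # fit the true interaction labels
    l_kd = bce(p_rf, p_gnn)                          # align GNN outputs with RF predictions
    sim = cosine_similarity_matrix(embeddings)
    l_struct = float(np.mean((sim - target_sim) ** 2))  # preserve similarity patterns
    return (1 - alpha) * l_cls + alpha * l_kd + lambda_struct * l_struct
```

Under this assumed schedule, early epochs are dominated by the distillation term and late epochs by the classification term, which is one way to read "training focus gradually shifting from RF to GNN".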

Regarding related work, many recent publications on drug-protein interaction prediction are not discussed. The authors should consult recent work to present a more comprehensive literature review.

Experimental design

1. The paper proposes a structure-guided curriculum learning framework (SGCL-DPI), but lacks sufficient theoretical explanation as to why this particular curriculum learning strategy is suitable for drug-protein interaction prediction tasks. It does not clearly articulate the theoretical basis for the knowledge transfer between the Random Forest (RF) model and the Graph Neural Network (GNN), or why there should be a gradual transition from RF guidance to structural learning as training progresses. The authors should provide a more in-depth theoretical analysis, explaining the connection between the curriculum learning strategy and the characteristics of the drug-protein interaction domain, and demonstrate the necessity and effectiveness of this step-by-step learning approach.

2. Despite the authors emphasizing the simplicity and potential interpretability advantages of their method compared to multi-modal approaches, the paper does not provide substantial model interpretation analysis. For the field of drug discovery, understanding the reasons behind model predictions is crucial. The authors should add visualization analyses to demonstrate how the model utilizes graph structures for information propagation, as well as the identification of important substructures or interaction patterns. This would enhance its credibility in practical drug discovery applications.

3. The current graph construction method relies on pre-computed similarity measures and fixed K-nearest neighbor graph structures, which may limit the model's ability to capture complex drug-protein relationships. The paper does not explore the impact of different graph construction strategies on model performance. The authors should conduct ablation experiments to compare the effects of different graph construction methods and consider introducing learnable graph structure generation mechanisms that allow the model to dynamically adjust connections between nodes, better reflecting functional similarities.
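To make the third point concrete, a fixed K-nearest-neighbor graph built from a pre-computed similarity matrix might look like the sketch below. This is a generic construction, not the one used in the manuscript. The function name `knn_graph`, the exclusion of self-loops, and the symmetrization step are assumptions.

```python
import numpy as np

def knn_graph(similarity, k):
    """Return a symmetric boolean adjacency matrix linking each node to its k most similar nodes."""
    sim = np.array(similarity, dtype=float)   # copy so the caller's matrix is untouched
    n = sim.shape[0]
    np.fill_diagonal(sim, -np.inf)            # a node is never its own neighbor
    neighbors = np.argsort(-sim, axis=1)[:, :k]   # k highest-similarity columns per row
    adj = np.zeros((n, n), dtype=bool)
    rows = np.repeat(np.arange(n), k)
    adj[rows, neighbors.ravel()] = True
    return adj | adj.T                        # keep an edge if either endpoint selected it
```

Because `k` and the similarity measure are fixed before training, the resulting edges cannot adapt to the prediction task, which is the limitation the reviewer contrasts with learnable graph structure generation.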

Validity of the findings

1. The multi-component loss function proposed in the paper has a complex dynamic weight adjustment mechanism. Although ablation experiments indicate that each component contributes, the paper does not deeply analyze the necessity of this complex design, or potential redundancies or conflicts between components. The authors should provide a more detailed analysis of the loss function, including the impact of different weight configurations, and consider whether the loss function design could be simplified without significantly affecting performance, thereby improving model interpretability and training stability.

2. The paper lacks a detailed assessment of computational efficiency and resource requirements, which is crucial for practical applications. The authors should add a discussion about the application of the model in real drug discovery environments, including possible adaptation strategies and limitations, and provide more detailed computational performance analysis to enable readers to evaluate its practicality.

Additional comments

The limitations of the proposed model should be discussed. Regarding future work, the authors should discuss the possibility of applying more advanced graph neural networks to improve performance.

Reviewer 2

Basic reporting

The Introduction mainly focuses on recent developments in GNNs, without addressing DPI prediction using Random Forest or the comparison between Random Forest and deep GNNs. It should more clearly explain why it is necessary to combine Random Forest with GNN, and why Random Forest alone is not sufficient.

Experimental design

Methods are described with sufficient detail. Experimental settings are adequately described as well.

Validity of the findings

The proposed method claims to achieve a good balance between precision and recall, but this seems to be a rather weak contribution. Given that the proposed method is outperformed by the RF baseline and Watanabe et al.'s Molecular model in terms of AUC-PR, it is possible that even with threshold adjustment, the F1 score of the proposed method would still be lower. As a defense against this point, the authors state in line 650 that "the proposed graph-based model delivers more robust and comprehensive predictions, identifying many more true interactions while maintaining reasonable precision." However, to substantiate this claim, it would be necessary to demonstrate that the proposed method achieves a similarly good balance of precision and recall on other dataset sources or different splits, or to show that simple threshold adjustments do not generalize well across different dataset settings.
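The threshold adjustment mentioned above can be sketched as follows: choose the decision threshold that maximizes F1 on a validation split, then apply that same threshold to a different split to check whether it generalizes. The function names and the choice to scan every observed score as a candidate threshold are assumptions for illustration, and no data from the manuscript is used.

```python
import numpy as np

def f1_at_threshold(y_true, scores, threshold):
    """F1 score when every score at or above the threshold is predicted positive."""
    pred = scores >= threshold
    tp = np.sum(pred & (y_true == 1))
    fp = np.sum(pred & (y_true == 0))
    fn = np.sum(~pred & (y_true == 1))
    denom = 2 * tp + fp + fn
    return 0.0 if denom == 0 else float(2 * tp / denom)

def best_threshold(y_true, scores):
    """Scan each observed score as a candidate threshold and keep the one with the highest F1."""
    candidates = np.unique(scores)
    f1_values = [f1_at_threshold(y_true, scores, t) for t in candidates]
    best = int(np.argmax(f1_values))
    return float(candidates[best]), f1_values[best]
```

A threshold returned by `best_threshold` on one split can be passed to `f1_at_threshold` on another split. A large drop in F1 between the two would support the argument that simple threshold adjustment does not generalize across dataset settings.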

Additional comments

1. In Table 1, "AUC (%)" should be "AUC-ROC (%)".

2. The caption of Table 2 should state that this evaluation was performed on the hard set.

All text and materials provided via this peer-review history page are made available under a Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.