All reviews of published articles are made public. This includes manuscript files, peer review comments, author rebuttals and revised materials. Note: This was optional for articles submitted before 13 February 2023.
Peer reviewers are encouraged (but not required) to provide their names to the authors when submitting their peer review. If they agree to provide their name, then their personal profile page will reflect a public acknowledgment that they performed a review (even if the article is rejected). If the article is accepted, then reviewers who provided their name will be associated with the article itself.
Thank you for this fascinating work. Very much appreciated!
[# PeerJ Staff Note - this decision was reviewed and approved by Mehmet Cunkas, a PeerJ Section Editor covering this Section #]
The author has solved all my questions. I have no other questions.
The author has solved all my questions. I have no other questions.
The author has solved all my questions. I have no other questions.
The author has solved all my questions. I have no other questions.
no comment
no comment
no comment
no comment
Please address the comments and requests of one reviewer.
The authors have not accurately addressed the reviewer’s comment. I would like to reiterate the following suggestions:
1. Conduct a precise comparison with miRNA–disease association extraction models published in 2023–2024. This should involve comparative experiments with models reported in other studies, rather than only performance evaluations of the proposed module.
2. To investigate the generalization ability of the model, the authors need to evaluate the performance of their proposed model in more datasets.
3. The results section of the paper lacks a discussion of biological significance.
4. The novelty of the proposed methodology is not sufficiently highlighted in the paper. Please ensure it is properly emphasized.
5. The literature review is incomplete. The authors should discuss all the latest RNA research.
The author has solved all my questions. I have no other questions.
The author has solved all my questions. I have no other questions.
The author has solved all my questions. I have no other questions.
The author has solved all my questions. I have no other questions.
The author has addressed my concern.
no comment
no comment
Please address the requests and comments of Reviewer 2 thoroughly.
While the authors have thoroughly argued that the GIP kernel similarity is only interpolated for a very small fraction of missing values and that the overall leakage ratio is extremely low, the analysis remains largely qualitative.
To quantitatively and directly validate the impact of “using the complete MDA matrix to compute GIP” on cross-validation results, we recommend adding a leakage-free control experiment: within each cross-validation fold, temporarily mask (set to zero) all positive-sample associations in the validation fold’s MDA matrix, recompute the GIP kernel similarity, and then evaluate model performance on that fold.
By comparing the AUC/AUPRC differences between the “masked-recomputed GIP” and “original GIP” settings, the authors can precisely quantify the performance inflation caused by leakage, thereby providing stronger evidence that GIP-induced leakage has a negligible effect on the final metrics.
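The control experiment described above could be sketched as follows. This is a minimal illustration and not the authors' implementation: it assumes the common Gaussian interaction profile definition K(i, j) = exp(-γ‖A_i − A_j‖²) with γ = γ′ / mean(‖A_i‖²), and the names `gip_kernel`, `masked_gip`, `gamma_prime`, and the toy matrix are hypothetical. The manuscript's exact GIP formulation and fold construction may differ.

```python
import numpy as np

def gip_kernel(profiles, gamma_prime=1.0):
    """Gaussian interaction profile kernel over the rows of `profiles`."""
    profiles = np.asarray(profiles, dtype=float)
    sq_norms = (profiles ** 2).sum(axis=1)
    mean_sq = sq_norms.mean()
    n = profiles.shape[0]
    if mean_sq == 0:
        # All profiles are zero, so every pairwise distance is zero.
        return np.ones((n, n))
    gamma = gamma_prime / mean_sq
    # Squared Euclidean distances between all pairs of rows.
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * profiles @ profiles.T
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))

def masked_gip(mda, held_out_pairs):
    """Zero the held-out positive pairs, then recompute both GIP kernels."""
    masked = np.array(mda, dtype=float, copy=True)
    pairs = np.asarray(held_out_pairs, dtype=int).reshape(-1, 2)
    masked[pairs[:, 0], pairs[:, 1]] = 0.0
    mirna_kernel = gip_kernel(masked)      # rows are miRNA profiles
    disease_kernel = gip_kernel(masked.T)  # rows are disease profiles
    return mirna_kernel, disease_kernel, masked

# Toy association matrix (4 miRNAs x 3 diseases), for illustration only.
mda = np.array([[1, 0, 1],
                [0, 1, 0],
                [1, 1, 0],
                [0, 0, 1]])
fold_positives = [(0, 2), (2, 1)]  # positives held out in this fold

km_full = gip_kernel(mda)
km_masked, kd_masked, masked = masked_gip(mda, fold_positives)
max_shift = np.abs(km_full - km_masked).max()
print("largest change in miRNA GIP similarity:", max_shift)
```

Training and evaluating the model once with the full-matrix kernels and once with the per-fold masked kernels, then reporting the AUC/AUPRC difference for each fold, would give the quantitative comparison requested here.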
While your manuscript introduces a promising and novel deep learning framework for miRNA–disease association prediction, several important issues must be addressed before it can be considered for publication. First, reviewer concerns regarding terminology and clarity are valid—terms such as “heterogeneous graph transformation” and “heterogeneity map matrix” must be clearly defined, and all abbreviations (e.g., GAT, HGT) introduced properly. The biological rationale for fusing diverse data types (e.g., drugs, microbes) must be explicitly justified, ideally supported by additional ablation studies to demonstrate their individual and combined contributions. Reviewers 2 and 3 both raise the critical issue of potential label leakage when using GIP kernel similarities in cross-validation; this must be transparently addressed and, if necessary, corrected. You are not required to respond to Reviewer 3's objection regarding lack of biological novelty, as your work appropriately focuses on method development and validation. Additionally, Reviewer 2’s point about comparison with more recent models (MHGTMDA, MHCLMDA) is valid and should be incorporated to fairly position your model. Finally, please ensure that your figures are clearer and that your code is publicly available to support reproducibility.
**PeerJ Staff Note:** Please ensure that all review, editorial, and staff comments are addressed in a response letter and that any edits or clarifications mentioned in the letter are also inserted into the revised manuscript where appropriate.
**PeerJ Staff Note:** It is PeerJ policy that additional references suggested during the peer-review process should only be included if the authors agree that they are relevant and useful.
After the authors clearly respond to all comments, the reviewer will reconsider acceptance.
1. Describe the hyperparameter details thoroughly in the paper.
2. Prove the superiority of the proposed model through performance evaluation with other models.
3. To investigate the generalization ability of the model, the authors need to evaluate the performance of their proposed model on more datasets.
4. The results section of the paper lacks a discussion of biological significance.
5. The novelty of the proposed methodology is not sufficiently highlighted in the paper. Please ensure it is properly emphasized.
6. The literature review is incomplete. The authors should discuss all the latest RNA research.
7. The authors should revise their English writing carefully and eliminate small errors in the paper to make the paper easier to understand.
1. The paper claims to fuse multisource data (e.g., miRNA sequences, disease semantics, GIP kernels), but lacks clarity on how these heterogeneous features are weighted or combined. For instance, lines 108-121 describe a "dual channel" approach but omit technical details on feature fusion (e.g., concatenation, attention-based merging).
2. While the model outperforms older methods (e.g., EDTMDA, ABMDA), it does not compare with recent state-of-the-art models like MHGTMDA (Zou et al., 2024) or MHCLMDA (Peng et al., 2024), which also use multisource data.
References:
[1] Zou H, Ji B, Zhang M, et al. MHGTMDA: Molecular heterogeneous graph transformer based on biological entity graph for miRNA-disease associations prediction. Molecular Therapy - Nucleic Acids, 2024, 35(1).
[2] Peng W, He Z, Dai W, et al. MHCLMDA: multihypergraph contrastive learning for miRNA–disease association prediction. Briefings in Bioinformatics, 2024, 25(1): bbad524.
3. Negative samples are drawn at random (line 328) from "unknown" associations, but no evidence is provided that these pairs are truly non-interacting. False negatives may therefore contaminate the training data and lead to inflated performance metrics. Have the authors considered the problem of false negative samples and how to alleviate its impact on prediction performance? It is recommended that the authors select reliable negative samples by considering miRNA and disease similarity instead of random selection.
4. The GAT module uses 10 attention heads (line 264), but the rationale for this choice is missing. It is recommended that the authors conduct a performance evaluation of hyperparameter selection to determine the optimal number of attention heads.
5. Did the authors encounter a label leakage issue when using the MDA matrix to calculate GIP kernel similarity during cross-validation? Specifically, were the associations of the positive sample pairs in the test fold removed before GIP kernel similarity was recalculated in each round of cross-validation?
6. It is suggested that the authors provide complete, reproducible code and upload it to GitHub so that the accuracy and soundness of the experiments can be evaluated.
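To illustrate point 3, one possible way to pick more reliable negatives from similarity information is sketched below. It is only a heuristic under stated assumptions, not a method taken from the manuscript or from the cited papers: an unknown pair is scored by how similar the miRNA is to miRNAs already linked to that disease, and how similar the disease is to diseases already linked to that miRNA, and the lowest-scoring unknown pairs are kept. The function name `reliable_negatives`, the equal weighting of the two scores, and the toy matrices are all hypothetical.

```python
import numpy as np

def reliable_negatives(mda, mirna_sim, disease_sim, n_samples):
    """Return the n_samples unknown pairs least supported by similarity."""
    mda = np.asarray(mda, dtype=float)
    mirna_sim = np.asarray(mirna_sim, dtype=float)
    disease_sim = np.asarray(disease_sim, dtype=float)
    # m_score[i, j]: highest similarity between miRNA i and any miRNA
    # already associated with disease j.
    m_score = (mirna_sim[:, :, None] * mda[None, :, :]).max(axis=1)
    # d_score[i, j]: highest similarity between disease j and any disease
    # already associated with miRNA i.
    d_score = (mda[:, :, None] * disease_sim[None, :, :]).max(axis=1)
    score = 0.5 * (m_score + d_score)  # equal weighting is an assumption
    unknown = np.argwhere(mda == 0)
    order = np.argsort(score[unknown[:, 0], unknown[:, 1]], kind="stable")
    return [(int(i), int(j)) for i, j in unknown[order[:n_samples]]]

# Toy data (3 miRNAs x 2 diseases), for illustration only.
mda = np.array([[1, 0],
                [0, 0],
                [0, 1]])
mirna_sim = np.array([[1.0, 0.9, 0.1],
                      [0.9, 1.0, 0.2],
                      [0.1, 0.2, 1.0]])
disease_sim = np.array([[1.0, 0.3],
                        [0.3, 1.0]])

picks = reliable_negatives(mda, mirna_sim, disease_sim, n_samples=2)
print(picks)
```

In this toy case the pair (1, 0) is skipped because miRNA 1 closely resembles miRNA 0, which is already associated with disease 0, so the pair is a plausible hidden positive. The broadcasted intermediates use memory proportional to (miRNAs)² × diseases, so a real dataset might need a blocked or looped version.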
This paper presents a method for predicting miRNA-disease associations based on LSTM and GAT. However, several concerns diminish the Reviewers' interest in this work:
1. The manuscript contains numerous technical inaccuracies and undefined terminology. For example, terms like "heterogeneous graph transformation" and "heterogeneity map matrix" appear to be incorrectly used, and essential abbreviations such as "HGT" and "GAT" are not defined, despite potentially being familiar to the field.
2. The authors incorporate multiple data types to construct their heterogeneous graph, so what types of edges are included in this heterogeneous graph? More critically, the utility of fusing these diverse data sources is not demonstrated through ablation studies.
3. Does it make sense to fuse these data at the level of biological mechanisms? For example, why do we need to fuse drug data?
4. The flowcharts and diagrams are poorly designed and convey little meaningful information. Key details such as input specifications for each processing channel are unclear. In particular, the diagrams are very vague.
5. The use of LSTMs for concatenated features lacks proper justification, as LSTMs are typically designed for sequential data. The contribution of this component is not validated through ablation experiments.
6. Authors should describe relevant methods in the context of a specific MDA prediction task rather than simply describing the algorithmic process.
7. The authors' ablation experiments show that many modules provide only minimal improvement (approximately 0.005). What is the point of stacking these modules when they add considerable computational time?
8. The authors calculate miRNA/disease GIPK similarities using known MDA associations, but it remains unclear whether the test-fold associations were properly excluded when the similarities were recalculated during 5-fold cross-validation, which is needed to prevent information leakage.
9. The advantages of multi-source data integration are not convincingly demonstrated, as the proposed method shows only marginal improvements over baseline approaches.
no comment
no comment
no comment
All text and materials provided via this peer-review history page are made available under a Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.