Review History


All reviews of published articles are made public. This includes manuscript files, peer review comments, author rebuttals and revised materials. Note: This was optional for articles submitted before 13 February 2023.

Peer reviewers are encouraged (but not required) to provide their names to the authors when submitting their peer review. If they agree to provide their name, then their personal profile page will reflect a public acknowledgment that they performed a review (even if the article is rejected). If the article is accepted, then reviewers who provided their name will be associated with the article itself.


Summary

  • The initial submission of this article was received on June 22nd, 2025 and was peer-reviewed by 2 reviewers and the Academic Editor.
  • The Academic Editor made their initial decision on July 30th, 2025.
  • The first revision was submitted on August 13th, 2025 and was reviewed by 1 reviewer and the Academic Editor.
  • The article was Accepted by the Academic Editor on October 3rd, 2025.

Version 0.2 (accepted)

· Oct 3, 2025 · Academic Editor

Accept

Thank you for your valuable contribution.

[# PeerJ Staff Note - this decision was reviewed and approved by Xiangjie Kong, a PeerJ Section Editor covering this Section #]

Reviewer 2 ·

Basic reporting

-

Experimental design

-

Validity of the findings

-

Version 0.1 (original submission)

· Jul 30, 2025 · Academic Editor

Major Revisions

**PeerJ Staff Note:** Please ensure that all review, editorial, and staff comments are addressed in a response letter and that any edits or clarifications mentioned in the letter are also inserted into the revised manuscript where appropriate.

**Language Note:** When you prepare your next revision, please either (i) have a colleague who is proficient in English and familiar with the subject matter review your manuscript, or (ii) contact a professional editing service to review your manuscript. PeerJ can provide language editing services - you can contact us at [email protected] for pricing (be sure to provide your manuscript number and title). – PeerJ Staff

Reviewer 1 ·

Basic reporting

I thank the author for providing the raw data with comprehensive results. However, several corrections should be made to improve reporting.

1. The first paragraph of the literature review largely repeats the Introduction. Please remove this redundancy from the manuscript.

2. Tables and figures should be cited in the text before they appear in the manuscript, to aid the reader's understanding. Please cite each table and figure in the text (e.g., "Table 1 presents the existing studies on KOA classification using radiography images.").

3. Table 1 should include all studies discussed in the literature review section. Define the abbreviations used in the table in a footnote, e.g., Support Vector Machine (SVM).

4. Overall reporting needs improvement for readability. The tables are unclear, especially those with long explanations below them; I suggest the author move these explanations into the main text rather than placing them below the tables.

Experimental design

1. Avoid using the term “novel” in the study if the methodology only adapts existing techniques.

2. Table 4 should provide a detailed description of the original dataset: how many images are there for each Kellgren–Lawrence grade? The statement, “The dataset maintains balanced class distribution across grades 0–4, with each grade represented by approximately 2,650 samples after augmentation,” appears inconsistent with the information in the table. Please reconcile the text with the table.

3. The sentence, “The trained DenseNet201+ feature extractor generates 512-dimensional deep feature vectors for each input,” should be accompanied by a diagram or a clearer description showing how these features are combined with the hybrid ML classifiers (a sketch of such a pipeline is given after this list).

4. The Data Augmentation section should be moved under “Dataset Preparation and Augmentation Strategy” for better structure.
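
To illustrate what item 3 asks for, here is a minimal sketch of a DenseNet-to-classical-classifier hand-off. The Keras backbone, the 512-unit projection layer, the 224×224 input size, and the RBF SVM are illustrative assumptions, not the author's actual architecture.

```python
import numpy as np
import tensorflow as tf
from sklearn.svm import SVC

# DenseNet201 backbone with global average pooling (1920-d output),
# projected down to 512 dimensions. In the paper the extractor is
# described as trained; here the projection layer is untrained.
backbone = tf.keras.applications.DenseNet201(
    include_top=False, weights="imagenet", pooling="avg",
    input_shape=(224, 224, 3),
)
feat = tf.keras.layers.Dense(512, activation="relu", name="feat_512")(backbone.output)
extractor = tf.keras.Model(backbone.input, feat)

def extract_features(images: np.ndarray) -> np.ndarray:
    """Map radiographs of shape (N, 224, 224, 3) to (N, 512) deep features."""
    x = tf.keras.applications.densenet.preprocess_input(images.astype("float32"))
    return extractor.predict(x, verbose=0)

# Hybrid stage: a classical SVM trained on the extracted features.
# X_train, y_train would be the radiographs and their KL grades (0-4):
# clf = SVC(kernel="rbf", C=1.0).fit(extract_features(X_train), y_train)
# y_pred = clf.predict(extract_features(X_test))
```

A block diagram of exactly this hand-off (images in, 512-dimensional features out, classical predictions at the end) would answer the reviewer's question directly.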

Validity of the findings

1. For Table 9: how do you determine which comparisons to perform? The significance-level asterisk (*) is redundant. For the reliability metrics, which model's predictions are being compared? The findings of Table 9 are not explained in the text. Please rewrite this section with better clarity.

2. The information presented in Table 10 does not align with Figure 3. Please check for consistency and provide context in the text.

3. The section “ROC Curve analysis and Discriminative Performance” should be moved under “Per-class Performance Analysis.” Currently, it appears oddly placed after all result sections, despite discussing only the best classical model.

4. Table 5: what does “DenseNet201 + Direct” mean? Ensure such labels are understandable to readers from different backgrounds. It appears that both a single hold-out test and cross-validation were used; clarify what “CV Mean” represents: is it accuracy, F1-score, or another metric? (An explicit way to report this is sketched after these comments.)

5. In the Feature Importance Analysis and Interpretability section, the analysis is performed with a Random Forest rather than the SVM. Please clarify or justify this choice (a model-agnostic alternative is sketched immediately after this list).

6. In Class Distribution Analysis and Model Bias Assessment: which model is being assessed? Why does each class contain exactly 600 true samples? Are these original or augmented (synthetic) data?
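
Regarding item 5 above: if interpretability of the SVM itself is the goal, a model-agnostic method such as permutation importance can be applied directly to the fitted SVM instead of a separate Random Forest. A minimal scikit-learn sketch on synthetic stand-in data (shapes, labels, and settings are assumptions):

```python
import numpy as np
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 512))   # stand-in for the 512-d deep features
y = rng.integers(0, 5, size=500)  # stand-in for KL grades 0-4

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
svm = SVC(kernel="rbf").fit(X_tr, y_tr)

# Shuffle each feature column in turn and measure the drop in test accuracy;
# larger drops indicate features the SVM actually relies on.
result = permutation_importance(svm, X_te, y_te, n_repeats=10, random_state=0)
top10 = np.argsort(result.importances_mean)[::-1][:10]
print("Top-10 features by permutation importance:", top10)
```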

The flow of methodology and results is vague and unclear. Please improve clarity and cohesion.
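
On item 4 above: a “CV Mean” is only interpretable once the scoring metric is named. A minimal sketch of an explicit report using scikit-learn's cross_val_score with accuracy, again on stand-in data (shapes and settings are assumptions):

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 512))   # stand-in for the 512-d deep features
y = rng.integers(0, 5, size=500)  # stand-in for KL grades 0-4

# Five-fold cross-validation with the metric stated explicitly.
scores = cross_val_score(SVC(kernel="rbf"), X, y, cv=5, scoring="accuracy")
print(f"CV Mean (accuracy) = {scores.mean():.3f} +/- {scores.std():.3f}")
```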

Additional comments

The impact and novelty of this work are not well described. There are many existing studies on similar topics. Please clarify the specific research contributions and what gap in the literature this study addresses.

Reviewer 2 ·

Basic reporting

This article examines whether machine learning can accurately assess the degree of arthritis on the Kellgren–Lawrence scale, which is a rather subjective task. The scale is based on three different aspects: sclerosis, osteophytes, and joint space assessment. This problem is difficult for machine learning; for example, the paper "Automated determination of hip arthritis on the Kellgren–Lawrence scale in pelvic digital radiograph scans using machine learning" indicated that neural networks accurately assess only the extreme grades, relying mainly on joint space. It is worth discussing why this issue was not addressed in the present paper.

Experimental design

Another aspect worth clarifying is whether the group of patients was imaged in a homogeneous manner, i.e., all in a lying or all in a standing position (differences due to loading are important for the joint space).

Validity of the findings

-

Additional comments

Minor remarks:
- There is only one author, a representative of computer science; it would be better to use the impersonal form rather than 'we'.
- Lines 153–154: what do you mean by "five clinically relevant preprocessing configurations"?
- Eq. 10: what does A stand for?
- Fig. 2 is illegible. Could you please enlarge the font?
- What do f_i and n stand for in Eqs. 13–16 and 18?
- Line 264: the shear value is given in Celsius? Presumably an angle in degrees is intended.
- Eqs. 21–23: could you define precision, recall, TPR, FPR, and d()? (Standard definitions of the first four are given below for reference.)
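
For reference, the standard definitions of the first four quantities, with TP, FP, TN, FN denoting true/false positive/negative counts; d() is specific to the manuscript and cannot be reconstructed here:

```latex
\mathrm{Precision} = \frac{TP}{TP+FP}, \qquad
\mathrm{Recall} = \mathrm{TPR} = \frac{TP}{TP+FN}, \qquad
\mathrm{FPR} = \frac{FP}{FP+TN}
```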

All text and materials provided via this peer-review history page are made available under a Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.