Review History


All reviews of published articles are made public. This includes manuscript files, peer review comments, author rebuttals and revised materials. Note: This was optional for articles submitted before 13 February 2023.

Peer reviewers are encouraged (but not required) to provide their names to the authors when submitting their peer review. If they agree to provide their name, then their personal profile page will reflect a public acknowledgment that they performed a review (even if the article is rejected). If the article is accepted, then reviewers who provided their name will be associated with the article itself.

View examples of open peer review.

Summary

  • The initial submission of this article was received on June 24th, 2025 and was peer-reviewed by 3 reviewers and the Academic Editor.
  • The Academic Editor made their initial decision on August 11th, 2025.
  • The first revision was submitted on September 16th, 2025 and was reviewed by 3 reviewers and the Academic Editor.
  • A further revision was submitted on October 17th, 2025 and was reviewed by 1 reviewer and the Academic Editor.
  • The article was Accepted by the Academic Editor on October 27th, 2025.

Version 0.3 (accepted)

Academic Editor

Accept

Thank you for addressing the requests and suggestions of the reviewers.

[# PeerJ Staff Note - this decision was reviewed and approved by Mehmet Cunkas, a PeerJ Section Editor covering this Section #]

Reviewer 1

Basic reporting

Addressed all comments.

Experimental design

Addressed all comments.

Validity of the findings

Addressed all comments.

Additional comments

None.

Version 0.2

Academic Editor

Minor Revisions

Please address the points raised by Reviewers 1 & 3.

Reviewer 1

Basic reporting

Overall, the paper is promising but needs clearer writing, a sharper introduction, better synthesis of related work, and more detailed figure/table explanations to fully meet basic reporting standards.

Experimental design

The study is well designed and methodologically sound, but it needs clearer justification of dataset splits, stronger discussion of preprocessing choices, inclusion of statistical validation, and broader evaluation to fully meet experimental design standards.

Validity of the findings

The findings are well supported, but the paper needs statistical validation, broader dataset testing, clearer discussion of limitations, and stronger emphasis on generalizability to fully meet validity standards.

Additional comments

The study is timely and relevant, with strong potential impact. Improving clarity, expanding figure captions, refining language, and positioning the contribution more clearly against recent state-of-the-art work would make the paper more accessible and impactful.

Reviewer 2

Basic reporting

The suggested review points have been addressed by the authors.

Experimental design

The suggested review points have been addressed by the authors.

Validity of the findings

The suggested review points have been addressed by the authors.

Additional comments

The suggested review points have been addressed by the authors.

Reviewer 3

Basic reporting

The manuscript demonstrates a significant improvement in clarity and structure compared to the initial submission. The language is generally clear and professional, though minor grammatical inconsistencies persist in a few places. The introduction now effectively sets the context, clearly states the motivation, and outlines the paper's contributions. The related work section has been successfully restructured into thematic groups, which greatly enhances its coherence and readability. The overall organization conforms to the journal's standards, and the logical flow from introduction to conclusion is sound.

Experimental design

The article's content is well within the aims and scope of PeerJ Computer Science. The experimental design is rigorous, and the methodology is described with sufficient detail to allow for replication, supported by the provided code repository. The data splitting strategy (majority-class label-skew) is now clearly explained, and the use of a fixed random seed enhances reproducibility. The data augmentation techniques are appropriate for medical imaging, and the evaluation metrics are well-chosen. A point for further clarification is the specific rationale behind the chosen augmentation parameter ranges (e.g., shear range ±20%); linking these choices more explicitly to common practices in medical image analysis would strengthen this section.
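For concreteness, the kind of majority-class label-skew partition with a fixed seed described above could look like the following minimal sketch; the function name, the majority_fraction parameter, and the client assignment rule are illustrative assumptions, not the authors' code.

```python
import numpy as np

# Illustrative sketch of a majority-class label-skew partition with a
# fixed random seed for reproducibility. `labels` is assumed to be one
# integer class label per image; all names and parameters are hypothetical.
def label_skew_split(labels, n_clients, majority_fraction=0.7, seed=42):
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    client_indices = [[] for _ in range(n_clients)]
    for c in np.unique(labels):
        idx = rng.permutation(np.where(labels == c)[0])
        majority_client = int(c) % n_clients      # one dominant class per client
        cut = int(majority_fraction * len(idx))   # bulk of class c goes there
        client_indices[majority_client].extend(idx[:cut])
        # spread the remaining samples of class c over the other clients
        others = [k for k in range(n_clients) if k != majority_client]
        for k, part in zip(others, np.array_split(idx[cut:], len(others))):
            client_indices[k].extend(part)
    return [np.array(ci) for ci in client_indices]
```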

Validity of the findings

The experiments are well-executed, and the results robustly support the conclusions. The findings demonstrate that the proposed framework, particularly with FedProx (μ=1.0), effectively addresses the challenges of extreme non-IID data. The use of Jensen-Shannon divergence to quantify heterogeneity is a strong point. The conclusions are appropriately stated and limited to the supporting results. The authors have adequately discussed the performance gap between IID and non-IID settings and have identified relevant limitations and future directions. It is accurate to state that μ=1.0 performed best in this specific experimental setup; the text correctly avoids presenting it as a universal optimum.
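As a point of reference for the heterogeneity measure mentioned above, the Jensen-Shannon divergence between a client's label distribution and the pooled (global) distribution is commonly computed as in the sketch below; the example distributions are hypothetical, not the authors' data.

```python
import numpy as np
from scipy.stats import entropy  # entropy(p, q) returns the KL divergence

def js_divergence(p, q):
    """Jensen-Shannon divergence between two discrete distributions (in nats)."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    p, q = p / p.sum(), q / q.sum()
    m = 0.5 * (p + q)
    return 0.5 * entropy(p, m) + 0.5 * entropy(q, m)

# Hypothetical 4-class label proportions: one skewed client vs. the global pool.
client_dist = [0.70, 0.10, 0.10, 0.10]
global_dist = [0.25, 0.25, 0.25, 0.25]
print(round(js_divergence(client_dist, global_dist), 4))
```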

Additional comments

This is a timely and technically sound study that makes a valuable contribution to the field of federated learning for medical image analysis. The combination of FedProx with a targeted data augmentation strategy is practical and well-validated. The authors have been highly responsive to the reviewers' feedback, and the revisions have substantially improved the manuscript.
Minor suggestions for improvement:
1. Consider adding a sentence or two to justify the chosen ranges for the data augmentation parameters based on preserving clinical relevance in MRI images.
2. The caption for Figure 4 (proposed architecture) could be slightly more descriptive.
3. A final careful proofread is recommended to catch any remaining minor language issues.

Version 0.1 (original submission)

Academic Editor

Major Revisions

Please thoroughly address all concerns and criticisms of the reviewers.

**PeerJ Staff Note:** Please ensure that all review, editorial, and staff comments are addressed in a response letter and that any edits or clarifications mentioned in the letter are also inserted into the revised manuscript where appropriate.

**Language Note:** The review process has identified that the English language must be improved. PeerJ can provide language editing services - please contact us at [email protected] for pricing (be sure to provide your manuscript number and title). Alternatively, you should make your own arrangements to improve the language quality and provide details in your response letter. – PeerJ Staff

Reviewer 1

Basic reporting

Thank you for submitting this manuscript. The topic is timely, and I appreciate the focus on federated learning with non-IID data in a medical imaging context. However, there are several areas that need improvement under basic reporting.

1. The manuscript needs a careful language and grammar check. Some sentences are long and awkward, making it difficult to follow the flow of ideas. A native English speaker or professional editing service could help improve clarity and readability.

2. The introduction covers useful background, but it feels a bit too lengthy and at times unfocused. It would help to sharpen the motivation and clearly state what the paper adds beyond existing work. Right now, it’s not entirely clear what gap this study is filling.

3. The related work section includes many references, but the way they are presented reads more like a list than a synthesis. It would be stronger if the authors grouped the studies thematically and discussed how their approach compares to or builds on those methods.

4. The structure of the manuscript mostly follows expected norms, but some sections feel repetitive. For example, parts of the methods section repeat information already mentioned in the introduction. This could be streamlined for better flow.

5. Figures and tables are adequate in number, but their captions need to be more informative. Some of them lack enough explanation for a reader to fully understand what is being shown without going back to the main text. Also, a few tables appear without being clearly discussed in the text.

6. Terminology and notation are mostly clear, but some mathematical expressions, especially the use of the proximal term and µ, could be explained more intuitively (a standard form of this objective is sketched after this list). A brief reminder of what each term represents, especially after long sections, would improve clarity.

7. The technical idea is promising, but the paper needs significant improvements in writing, organization, and clarity before it can be recommended for publication. I hope the authors will take this feedback constructively and revise the manuscript accordingly.
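For reference on point 6, the standard FedProx local objective to which the proximal term and µ belong is reproduced below in the notation of the original FedProx formulation (it may differ slightly from the manuscript's notation): client k minimizes its local loss F_k plus a proximal penalty that keeps the local model w close to the current global model w^t.

```latex
\min_{w}\; h_k(w; w^t) \;=\; F_k(w) \;+\; \frac{\mu}{2}\,\lVert w - w^t \rVert^{2}
```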

Experimental design

The experimental design is generally solid and falls within the scope of the journal. The authors have done a good job of explaining the FL setup, including client distribution, training rounds, and model architecture. However, a few aspects need further clarification to improve reproducibility and transparency.

1. The dataset source is clearly mentioned, and the class-wise distribution is reported, which is helpful. That said, the paper would benefit from more clarity on how exactly the IID data was split into non-IID clients. Were the splits randomized, or were certain classes manually assigned to simulate imbalance? A clearer explanation would help readers replicate the setup more accurately.

2. The data preprocessing and augmentation steps are described, and the choice of augmentation techniques seems appropriate for MRI image tasks. Still, it would be helpful to clarify if these augmentations were applied uniformly across all clients or only to the minority classes.

3. The authors introduce FedAvg and FedProx with sufficient detail and equations, which is appreciated. The algorithm implementation and training strategy are well structured, but details about hyperparameter tuning are quite brief. How were the µ values selected? Were they tuned manually, or through some validation approach? A brief justification for this choice would strengthen the methodology (one possible validation-based protocol is sketched after this list).

4. The choice of evaluation metrics is appropriate for a multi-class classification task, and the presentation of accuracy, precision, recall, and F1 scores over multiple rounds is clear. However, the paper could benefit from including standard deviation or confidence intervals to give an idea of result variability.
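On point 3, one simple validation-based protocol for choosing µ is sketched below; run_fedprox is a hypothetical helper (returning validation accuracy for a given µ), not part of the manuscript's code.

```python
# Illustrative grid search over the FedProx proximal weight µ.
# `run_fedprox` is assumed to train the federated model with the given µ
# and return accuracy on a held-out validation split.
def select_mu(run_fedprox, mu_grid=(0.001, 0.01, 0.1, 1.0)):
    scores = {mu: run_fedprox(mu) for mu in mu_grid}
    best_mu = max(scores, key=scores.get)
    return best_mu, scores
```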

Validity of the findings

The experiments appear to be well executed, and the evaluation strategy is sound. The results are reported clearly across various µ values, and the performance improvements with FedProx under non-IID conditions are convincingly shown. The use of metrics like accuracy, precision, recall, and F1 score provides a comprehensive view of model behavior.

1. The paper does a good job of tying the results back to the problem statement in the introduction. The challenges of non-IID data in federated settings are consistently discussed throughout the paper, and the findings align well with the goals set out early on.

2. The conclusion section is fairly strong and does identify some limitations, including the trade-off between performance and complexity. The mention of potential future directions, such as blockchain-based secure FL frameworks and incentive mechanisms, is appreciated and adds depth to the discussion. However, the authors could expand a bit more on the specific practical implications of their findings for real-world deployments, especially in medical settings.

3. It would also be helpful to include a more detailed reflection on why performance is significantly better in the IID setting. While this is expected, a short discussion on how to bridge the gap between IID and realistic non-IID scenarios in future work would add more value.

4. The findings are well supported by the data, but a slightly more reflective discussion on practical constraints and limitations would further strengthen the manuscript.

Additional comments

The manuscript introduces a meaningful application of the FedProx algorithm for brain tumor classification under non-IID federated learning conditions. The inclusion of Jensen–Shannon divergence as a measure of heterogeneity is a valuable addition and supports the validity of the non-IID setup. However, the manuscript would benefit from clearer explanations around certain design choices, such as client data partitioning logic and hyperparameter tuning methods. Additionally, the authors should carefully revise the manuscript for language accuracy and consistency, particularly in the introduction and methods sections. Improving the organization of the related work section and providing more concise transitions between sections would enhance readability.

Reviewer 2

Basic reporting

In my opinion, the introduction should present the detailed background, followed by the challenges identified in the previous literature, the motivation for the work, the objectives of the paper, the contributions, and the paper's organization. In its current form, this structure is missing. The background and challenges should also be supported by proper citations; at present, the discussion is vague.

Experimental design

1. The literature review is not adequate; please elaborate it.

2. In its current form, the abstract is not systematic. It is suggested to arrange it in the following order: background, methods used, results achieved, and concluding remarks.

Validity of the findings

1. Explain the proposed model architecture for brain tumor classification (Figure 5).

2. The paper is interesting, but the methods and results should be elaborated in more detail. A proper explanation and justification of the methods used is needed.

Reviewer 3

Basic reporting

(1) Clarity and Professional English Expression
The overall English expression of the manuscript is clear and professional, but minor adjustments can be made to improve precision. For example, in the "Data Splitting" section, the description of "artificially splitting IID data into non-IID subsets" is too general. Precise details should be supplemented, such as the specific parameters of the Dirichlet distribution-based splitting strategy and relevant code snippets, if applicable (one possible form of such a split is sketched after this list). Unify the terminology by changing all "Non-IID" to lowercase "non-IID" to maintain style consistency.
(2) Introduction and Research Background
The "RELATED WORK" section is not comprehensive enough. When comparing FedAvg and FedProx, recent algorithms such as FedNova and FedOPT are omitted, and the adaptability of data augmentation in the federated learning setting is not deeply explored. Supplementing these points would make the research background more relevant and comprehensive.
(3) Structure and Logical Coherence
Although the structure conforms to academic norms, its clarity can be improved. For example, the "Model Selection" section only mentions the performance differences between VGG19 and ResNet50 but fails to explain why such differences occur (such as gradient divergence caused by model complexity). It is recommended to add a sub-section explicitly relating the model architecture to its suitability for federated learning, making the logic more coherent.
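To make the request in point (1) concrete, a Dirichlet-based label-skew split of the kind referred to above is often implemented roughly as follows; the function name, alpha value, and seed are illustrative assumptions rather than a prescription for the authors.

```python
import numpy as np

# Illustrative Dirichlet label-skew partition. Smaller `alpha` values
# produce more heterogeneous (more strongly non-IID) client label mixes.
def dirichlet_split(labels, n_clients, alpha=0.5, seed=42):
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    client_indices = [[] for _ in range(n_clients)]
    for c in np.unique(labels):
        idx = rng.permutation(np.where(labels == c)[0])
        # fraction of class c assigned to each client, drawn from Dir(alpha)
        proportions = rng.dirichlet(alpha * np.ones(n_clients))
        cut_points = (np.cumsum(proportions)[:-1] * len(idx)).astype(int)
        for client, part in enumerate(np.split(idx, cut_points)):
            client_indices[client].extend(part)
    return [np.array(ci) for ci in client_indices]
```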

Experimental design

(1) Method Details for Reproducibility
The "Data Splitting" method lacks sufficient detail. To meet the reproducibility requirement, the following should be clearly defined:
1) The splitting strategy (such as Dirichlet distribution parameters and key code).
2) The quantification method for non-IID heterogeneity (such as category proportions and JS divergence indicators during the splitting process, rather than only post-hoc results).
(2) Discussion of Data Preprocessing
The "Data Augmentation" section describes techniques such as cropping and rotation but lacks key analysis:
1) How do these augmentations specifically alleviate non-IID bias? (For example, compare the JS divergence before and after augmentation to show how heterogeneity is reduced.)
2) Are there trade-offs (for example, could over-augmentation distort the features of medical images)? Exploring these would add depth to the method.
(3) Ethical and Technical Rigor
Although there are no issues with sensitive data, it is necessary to confirm whether the best practices of medical AI are followed (such as data anonymization; if clinical data is used, add a statement of Institutional Review Board (IRB) approval, if applicable).

Validity of the findings

(1) Conclusions and Scope of Application
The conclusions are supported by data, but they must strictly correspond to the results. For example, the conclusion that "μ = 1.0 is optimal" does not explore its universality (is μ = 1.0 still optimal for datasets with low heterogeneity? Experiments or explanations could be added to define the boundaries of the conclusion).
(2) Adequacy of Experiments
Having tested FedProx + ResNet50, the validity of the findings could be further strengthened in the following ways:
1) Compare with more models (such as the lightweight MobileNet) to clarify the advantages of ResNet50 in this federated learning scenario.
2) Conduct data augmentation ablation experiments (remove one augmentation method at a time to separately observe its effect on alleviating non-IID bias).

Additional comments

To improve the quality of the manuscript, the following revisions are recommended:
(1) Expand the "RELATED WORK" Section
Add 2–3 paragraphs to introduce recent federated learning algorithms and the research on the adaptability of data augmentation in the federated learning scenario.
(2) Clarify Method Details
Provide detailed explanations of data splitting parameters, the impact of augmentation on non-IID indicators, and the basis for selecting the model architecture.
(3) Strengthen Conclusions
Clearly define the scope of application of conclusions such as “μ = 1.0 is optimal” and explicitly relate the conclusions to the experimental scope.

All text and materials provided via this peer-review history page are made available under a Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.