All reviews of published articles are made public. This includes manuscript files, peer review comments, author rebuttals and revised materials. Note: This was optional for articles submitted before 13 February 2023.
Peer reviewers are encouraged (but not required) to provide their names to the authors when submitting their peer review. If they agree to provide their name, then their personal profile page will reflect a public acknowledgment that they performed a review (even if the article is rejected). If the article is accepted, then reviewers who provided their name will be associated with the article itself.
The authors have revised the manuscript and addressed the reviewers' comments. It should be ready for publication.
[# PeerJ Staff Note - this decision was reviewed and approved by Xiangjie Kong, a PeerJ Section Editor covering this Section #]
The authors have addressed the reviewers' comments in the revised manuscript. However, one reviewer still has some comments that need to be considered before final acceptance.
**PeerJ Staff Note:** It is PeerJ policy that additional references suggested during the peer-review process should only be included if the authors agree that they are relevant and useful.
No problem.
No problem.
No problem.
The manuscript is well written in clear, professional English, and recent literature references from 2022–2024 are incorporated.
To further strengthen the background and contextual framework, the authors are encouraged to cite recent comprehensive reviews on AI-based breast cancer detection systems that discuss machine learning, deep learning, and vision transformers (e.g., https://link.springer.com/article/10.1007/s11042-024-19620-y). This would better situate MIME-ViT within the current state-of-the-art.
**PeerJ Staff Note:** It is PeerJ policy that additional references suggested during the peer-review process should only be included if the authors agree that they are relevant and useful.
Additionally, the motivation behind selecting Vision Transformer (ViT) as the baseline for MIME-ViT should be explicitly clarified in the revised manuscript to highlight the rationale and expected advantages guiding the architectural choices.
We appreciate the authors for carefully considering the previous suggestion and adding Table 1 summarizing the key hyperparameters of MIME-ViT.
The study presents novel findings with appropriate statistical robustness. The conclusions are well-supported by the results.
The authors have appropriately addressed the concern regarding the bounding box generation method for IoU calculation. The added discussion on how their approach may underestimate detection performance, especially for irregularly shaped lesions, is well noted. Acknowledging this limitation and indicating directions for future improvement strengthens the validity and transparency of the findings.
The discussion could be enhanced by integrating recent advances in explainability and interpretability of AI models in medical imaging. Notably, recent work employing Explainable AI (XAI) techniques with CNN-based breast cancer detection systems (e.g., https://link.springer.com/article/10.1007/s42979-025-04170-3) highlights the importance of transparency to build clinical trust.
No further issues noted.
The authors are commended for thoroughly addressing prior reviewer comments and improving the manuscript. The integration of Vision Transformers with multiscale analysis is a valuable contribution. For further enhancement, explicit motivation for using ViT as the baseline model for MIME-ViT should be added, clarifying why this architecture was chosen over alternatives. This will help readers appreciate the design philosophy and contextualize MIME-ViT’s performance gains. Including recent references related to AI-assisted breast cancer detection and explainability strengthens the manuscript’s scientific foundation and clinical relevance.
The reviewers have raised some issues with the current version of the manuscript. Please revise accordingly and address their concerns in the revised version.
**PeerJ Staff Note:** Please ensure that all review, editorial, and staff comments are addressed in a response letter and that any edits or clarifications mentioned in the letter are also inserted into the revised manuscript where appropriate.
Q1. The authors noted in the results that DETR-S did not recognize any lesions, masses, or calcifications, as shown in Table 1. However, Table 2 reports a specificity of 100%. Please explain this result using the formula for specificity.
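For context, the standard confusion-matrix definition (an assumption about the authors' convention, not confirmed by the manuscript) suggests one likely explanation: a model that flags nothing produces no false positives, so its specificity is trivially perfect.

```latex
\mathrm{Specificity} = \frac{TN}{TN + FP}, \qquad
FP = 0 \;\Longrightarrow\; \mathrm{Specificity} = \frac{TN}{TN} = 100\%
```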
Q1. In describing model training and implementation, the authors did not give the total number of training epochs or the per-epoch training metrics (such as loss and IoU for the training and validation sets). I hope the authors can add loss and IoU line charts from training, so that readers can follow the process and see the comparison more effectively.
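As an illustration of the kind of chart requested (a minimal matplotlib sketch with placeholder numbers, not data from the manuscript):

```python
import matplotlib.pyplot as plt

# Placeholder per-epoch histories for illustration only; in practice these
# would come from the training loop's logs (e.g., a Keras History object).
train_loss = [0.90, 0.62, 0.48, 0.40, 0.35]
val_loss   = [0.95, 0.71, 0.58, 0.52, 0.50]
train_iou  = [0.30, 0.44, 0.53, 0.59, 0.63]
val_iou    = [0.27, 0.40, 0.47, 0.51, 0.53]
epochs = range(1, len(train_loss) + 1)

fig, (ax_loss, ax_iou) = plt.subplots(1, 2, figsize=(10, 4))

ax_loss.plot(epochs, train_loss, label="training")
ax_loss.plot(epochs, val_loss, label="validation")
ax_loss.set(xlabel="epoch", ylabel="loss", title="Loss")
ax_loss.legend()

ax_iou.plot(epochs, train_iou, label="training")
ax_iou.plot(epochs, val_iou, label="validation")
ax_iou.set(xlabel="epoch", ylabel="IoU", title="IoU")
ax_iou.legend()

fig.tight_layout()
fig.savefig("training_curves.png", dpi=150)
```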
Q2. In addition to specificity, sensitivity is also a critical indicator for comparing medical imaging models, but the authors have not reported sensitivity. I hope the authors can add a comparison of sensitivity.
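Under the same standard convention, sensitivity captures the complementary failure mode: a detector that finds nothing has no true positives, so its sensitivity collapses to zero even while its specificity is perfect, which is why both metrics need to be reported together.

```latex
\mathrm{Sensitivity} = \frac{TP}{TP + FN}, \qquad
TP = 0 \;\Longrightarrow\; \mathrm{Sensitivity} = 0\%
```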
Q3. Did the authors conduct comparison and ablation experiments when constructing the MIME-ViT model (e.g., varying the CNN convolution kernel size and the specific scale ranges of the ViT)? If so, please report them.
Q4. DETR is a transformer-based detection model proposed in 2020, which is somewhat dated. Are there newer models relevant to your research subject? If so, I hope you will include a comparison to demonstrate that your research remains valuable.
No comment.
The English is acceptable.
Recent literature references are missing.
No comment.
The method of generating bounding boxes around the exterior of segmented patches for IoU calculation may have negatively impacted detection scores. This approach likely introduced excess background area, particularly for irregularly shaped lesions, potentially underestimating the model's true detection performance. Future work should explore alternative bounding box generation methods to provide a more accurate evaluation.
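To make the concern concrete, here is a minimal NumPy sketch (illustrative only, not the authors' evaluation code): for an elongated, diagonal lesion, IoU computed from enclosing bounding boxes comes out lower than IoU computed directly on the masks, because the boxes are dominated by background.

```python
import numpy as np

def bbox(mask):
    """Tight axis-aligned box (r0, r1, c0, c1), inclusive, around True pixels."""
    rows, cols = np.any(mask, axis=1), np.any(mask, axis=0)
    r0, r1 = np.where(rows)[0][[0, -1]]
    c0, c1 = np.where(cols)[0][[0, -1]]
    return r0, r1, c0, c1

def box_iou(a, b):
    """IoU of two inclusive (r0, r1, c0, c1) boxes."""
    r0, r1 = max(a[0], b[0]), min(a[1], b[1])
    c0, c1 = max(a[2], b[2]), min(a[3], b[3])
    inter = max(0, r1 - r0 + 1) * max(0, c1 - c0 + 1)
    area = lambda x: (x[1] - x[0] + 1) * (x[3] - x[2] + 1)
    return inter / (area(a) + area(b) - inter)

def mask_iou(a, b):
    return np.logical_and(a, b).sum() / np.logical_or(a, b).sum()

# Irregular (diagonal) lesion; the prediction recovers 15 of its 20 pixels.
gt = np.eye(20, dtype=bool)
pred = np.eye(20, dtype=bool)
pred[15:, 15:] = False                 # miss the last 5 diagonal pixels

print(f"mask IoU: {mask_iou(gt, pred):.2f}")             # 0.75
print(f"box  IoU: {box_iou(bbox(gt), bbox(pred)):.2f}")  # 0.56
```

Here recovering 75% of the lesion's pixels yields a mask IoU of 0.75 but a box IoU of only about 0.56, consistent with the underestimation described above.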
The manuscript presents a novel and well-executed study on breast cancer detection using the Multiscale Image Morphological Extraction Vision Transformer (MIME-ViT). The integration of Vision Transformers with CNNs to enhance mammographic imaging analysis is particularly noteworthy. The study is well-structured, precise, and clearly reported, and it adds significant value to the field of medical imaging. The authors’ effort in developing an advanced model for breast cancer detection is commendable.
However, I have the following suggestions for improvement:
1. While the study is well-researched, most of the cited references are relatively old, with limited citations from 2022, 2023, and 2024. To enhance the relevance and timeliness of the study, the authors are encouraged to incorporate more recent literature.
2. A parametric table summarizing the MIME-ViT model’s key attributes, together with a comparative analysis against other model variants, would strengthen the manuscript. This would give readers a clearer understanding of the model’s architectural choices, hyperparameters, and performance relative to existing methods. Small details, such as the batch size and the patience used for the early-stopping callback to prevent overfitting, could also be added (see the sketch after this list).
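For illustration only, this is the kind of detail the table or text could pin down, shown as a sketch assuming a Keras-style training setup (the values and setup are hypothetical, not the authors' actual configuration):

```python
import tensorflow as tf

# Hypothetical values for illustration; the manuscript should report the
# actual batch size and early-stopping patience used for MIME-ViT.
BATCH_SIZE = 16

early_stopping = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss",         # quantity watched for improvement
    patience=10,                # epochs without improvement before stopping
    restore_best_weights=True,  # roll back to the best validation epoch
)

# model.fit(train_ds, validation_data=val_ds,
#           batch_size=BATCH_SIZE, epochs=200,
#           callbacks=[early_stopping])
```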
Overall, this study represents a significant advancement in the application of deep learning for breast cancer detection. Addressing these points would further enhance its impact and comprehensiveness.
All text and materials provided via this peer-review history page are made available under a Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.