All reviews of published articles are made public. This includes manuscript files, peer review comments, author rebuttals and revised materials. Note: This was optional for articles submitted before 13 February 2023.
Peer reviewers are encouraged (but not required) to provide their names to the authors when submitting their peer review. If they agree to provide their name, then their personal profile page will reflect a public acknowledgment that they performed a review (even if the article is rejected). If the article is accepted, then reviewers who provided their name will be associated with the article itself.
Thank you for your valuable contribution.
[# PeerJ Staff Note - this decision was reviewed and approved by Shawn Gomez, a PeerJ Section Editor covering this Section #]
no comment
no comment
no comment
no comment
Very good.
Very good.
Very good.
Most of the points raised in my first review have been satisfactorily addressed in the revised manuscript.
-
-
-
Thank you for your thorough and thoughtful revisions in response to the prior review. I acknowledge and appreciate the substantial improvements made in the current version of the manuscript, especially in the following areas:
- Enhanced clarity and conciseness in the Introduction and Methodology.
- Addition of hyperparameter ranges and stratified k-fold cross-validation.
- Inclusion of ablation studies and comparisons with lightweight models.
- Clarified novelty claims and practical positioning of the proposed approach.
- More transparent reporting of external validation performance and error analysis, including per-class confusion matrix and case study examples.
These changes address the remaining concerns from the previous round, and the manuscript now demonstrates greater methodological rigor and practical relevance. Although the contributions remain incremental, they are meaningful and presented with appropriate modesty.
I therefore support the acceptance of this revised manuscript for publication.
There are a few remaining items that need to be addressed.
no comment
no comment
no comment
no comment
Most of the initial review comments have been satisfactorily addressed and the suggested changes have been incorporated into the revised manuscript.
Very good.
Very good.
The manuscript shows clear improvement in language and structure compared to the previous version. Professional editing has enhanced readability, and redundant justifications about transfer learning have been minimized. The novelty is now emphasized more specifically (multi-stream resizing, tailored augmentation, domain-aware fine-tuning).
However, some expressions in the Introduction and Methodology remain verbose or repetitive, and the novelty is still somewhat overstated relative to the actual technical contribution. Further tightening of the writing style would strengthen clarity and avoid exaggeration.
The authors have expanded the methodology substantially, now providing:
(1) Systematic hyperparameter tuning (grid search with reported ranges).
(2) Ablation studies demonstrating the contribution of resizing, augmentation, and multi-stream approaches.
(3) Comparative experiments with lightweight models (MobileNetV3, ShuffleNet, EfficientNet-Lite).
These are valuable improvements. Nevertheless:
(1) The multi-stream fusion approach, while better explained (with pseudocode), still represents an architectural variation rather than a clear methodological innovation.
(2) The overall contribution remains incremental, largely refining existing pipelines rather than introducing novel algorithmic advances.
The addition of statistical testing (paired t-tests, confidence intervals) and stratified k-fold cross-validation improves the reliability of results. The authors also now clarify that the high accuracy (99.2%) applies only to internal validation, with a tempered interpretation in the conclusions.
That said, two concerns remain:
(1) The reported external dataset performance (AUC 0.95–0.98) appears unusually high and somewhat inconsistent with typical literature trends, where external validation often shows significant performance drops. This requires stronger explanation and transparency regarding dataset composition and evaluation protocols.
(2) Error analysis has been improved but remains limited; a per-class performance breakdown and a deeper discussion of common misclassifications would add credibility.
Overall, the manuscript has improved in clarity, methodology transparency, and robustness of reporting. It is closer to publication readiness. However, the work still represents an incremental advance, with limited methodological novelty, and the external dataset results warrant further scrutiny.
I suggest acceptance after the following points are addressed:
(1) Provide more detailed error analysis (per-class confusion, case studies of misclassification).
(2) Clarify and justify the very high external validation results to avoid the perception of over-reporting.
(3) Streamline the novelty claims so they are proportional to the actual contribution.
Please address all of the reviewers' requests and criticisms.
**PeerJ Staff Note:** Please ensure that all review and editorial comments are addressed in a response letter and that any edits or clarifications mentioned in the letter are also inserted into the revised manuscript where appropriate.
**Language Note:** The review process has identified that the English language must be improved. PeerJ can provide language editing services - please contact us at [email protected] for pricing (be sure to provide your manuscript number and title). Alternatively, you should make your own arrangements to improve the language quality and provide details in your response letter. – PeerJ Staff
No Comments
The manuscript does not provide specific hyperparameters (e.g., batch size, optimizer settings, learning rate schedules) used during training. This omission hinders reproducibility.
The dataset split (70% train, 15% validation, 15% test) lacks justification. Uneven splits or small test sets may skew performance metrics. Justify the split ratio and consider stratified sampling to maintain class balance.
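A minimal sketch of the stratified 70/15/15 split suggested here, using scikit-learn; the arrays and sizes are placeholders, not the authors' code:

```python
import numpy as np
from sklearn.model_selection import train_test_split

np.random.seed(0)
indices = np.arange(1000)                         # placeholder image indices
labels = np.random.binomial(1, 0.02, size=1000)   # placeholder labels, ~2% malignant

# Carve out 70% for training, preserving the class ratio in every partition.
train_idx, rest_idx, train_y, rest_y = train_test_split(
    indices, labels, test_size=0.30, stratify=labels, random_state=42)

# Split the remaining 30% evenly into validation and test, again stratified.
val_idx, test_idx, val_y, test_y = train_test_split(
    rest_idx, rest_y, test_size=0.50, stratify=rest_y, random_state=42)

print(len(train_idx), len(val_idx), len(test_idx))   # 700 150 150
```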
While data augmentation is used to address class imbalance (malignant samples <2%), the manuscript does not quantify its effectiveness. Metrics like F1-score or AUC for minority classes are missing. Report per-class metrics and consider techniques like SMOTE or weighted loss functions.
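Similarly, a sketch of class-weighted training and per-class reporting with scikit-learn utilities; the labels are synthetic placeholders, not the authors' data:

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight
from sklearn.metrics import classification_report

np.random.seed(0)
y_train = np.random.binomial(1, 0.02, size=700)   # placeholder labels, ~2% malignant

# 'balanced' weights up-weight the rare malignant class in the loss.
weights = compute_class_weight('balanced', classes=np.array([0, 1]), y=y_train)
class_weight = {0: weights[0], 1: weights[1]}     # e.g., pass to Keras model.fit(..., class_weight=...)

# Report per-class precision/recall/F1 instead of accuracy alone.
y_true = np.random.binomial(1, 0.02, size=150)    # placeholder test labels
y_pred = np.random.binomial(1, 0.02, size=150)    # placeholder predictions
print(classification_report(y_true, y_pred,
                            target_names=['benign', 'malignant'], zero_division=0))
```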
The manuscript notes that augmented models perform worse on external datasets (AUC drops from 0.96 to 0.41), suggesting augmentation may introduce artificial patterns. Analyze which augmentations harm generalization (e.g., brightness shifts) and validate on diverse external datasets.
The "multi-stream" approach (resizing, cropping) is described vaguely. It is unclear how these streams are integrated (e.g., late fusion, attention mechanisms). Provide a diagram or pseudocode detailing the fusion strategy.
The reported 99.2% accuracy is suspiciously high for medical imaging tasks, especially given that the ROC curve (AUC = 0.42) shows poor discrimination (worse than random). Reconcile the discrepancy between the accuracy and AUC metrics. Highlight potential overfitting or dataset biases.
The manuscript mentions k-fold cross-validation but does not report results. Single-split validation risks optimistic bias. Include k-fold results (mean ± std) for robustness.
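A minimal sketch of reporting stratified k-fold results as mean ± std, with a stand-in scikit-learn classifier and synthetic data in place of the authors' pipeline:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold

np.random.seed(0)
X = np.random.rand(500, 32)                  # placeholder features
y = np.random.binomial(1, 0.1, size=500)     # placeholder labels

# Report mean ± std over stratified folds rather than a single split.
aucs = []
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
for train_idx, test_idx in skf.split(X, y):
    clf = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])  # stand-in model
    aucs.append(roc_auc_score(y[test_idx], clf.predict_proba(X[test_idx])[:, 1]))

print(f"AUC = {np.mean(aucs):.3f} ± {np.std(aucs):.3f} (5-fold)")
```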
Performance drops significantly on external data (AUC 0.41), indicating poor generalization. Discuss domain shift (e.g., skin tone diversity) and propose mitigation strategies.
Good.
Good.
Good.
None.
The manuscript is improved in organization and language compared to the first submission. The authors have attempted to clarify the novelty, include clinical relevance, and justify the use of EfficientNetV2 B0. The addition of detailed pseudocode for both the EfficientNetV2 B0 and ImageNet implementations, along with expanded explanations of preprocessing and augmentation, enhances replicability. However, redundancy and unclear expressions remain in several sections (e.g., Introduction and Methodology), and the novelty still appears overstated relative to the content.
Recommendation:
(1) Conduct professional English editing for clarity and conciseness.
(2) Avoid repeated general justifications (e.g., about the usefulness of transfer learning) and focus on how this work is novel.
The authors now provide a more comprehensive explanation of the preprocessing steps, augmentation, architecture, and training process, including the dataset split (70/15/15), model parameters, and augmentation details. They include ablation studies and report both internal validation and external dataset testing. These additions are significant improvements.
However:
(1) The hyperparameter tuning is still not systematically explained (e.g., how learning rate or batch size was selected); see the sketch after this list for the kind of systematic search expected.
(2) The justification for architectural choices (e.g., resizing + multi-stream approach) lacks comparative baseline evaluation.
(3) The methodology still does not provide sufficient technical depth or innovation to distinguish it from numerous previous works.
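For point (1), a minimal grid-search sketch of systematic hyperparameter selection; the tiny network, synthetic data, and search ranges are illustrative assumptions, not the authors' configuration:

```python
import itertools
import numpy as np
import tensorflow as tf

np.random.seed(0)
X = np.random.rand(200, 64, 64, 3).astype('float32')      # placeholder images
y = np.random.binomial(1, 0.5, size=200).astype('float32') # placeholder labels

def build_model(lr):
    # Tiny stand-in network; the actual backbone (e.g., EfficientNetV2 B0) would go here.
    model = tf.keras.Sequential([
        tf.keras.layers.Conv2D(8, 3, activation='relu', input_shape=(64, 64, 3)),
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(1, activation='sigmoid'),
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=lr),
                  loss='binary_crossentropy',
                  metrics=[tf.keras.metrics.AUC(name='auc')])
    return model

# Exhaustive grid over learning rate and batch size, scored on a fixed validation split.
results = {}
for lr, bs in itertools.product([1e-3, 1e-4], [16, 32]):
    history = build_model(lr).fit(X, y, batch_size=bs, epochs=2,
                                  validation_split=0.2, verbose=0)
    results[(lr, bs)] = history.history['val_auc'][-1]

best = max(results, key=results.get)
print(f"best (lr, batch_size) = {best}, val AUC = {results[best]:.3f}")
```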
Although the authors now acknowledge overfitting and address external validation, several concerns remain:
(1) The cross-validation AUC drops drastically from 0.96 to 0.41 on external data, suggesting serious overfitting.
(2) Figure 10 and Figure 11 show contradictory model performance (e.g., EfficientNetV2 performing worse than random on external data).
(3) Claims of 99.2% accuracy remain unsupported given poor generalization; this must be contextualized and downplayed in the conclusions.
(4) There is still no statistical significance testing, no confidence intervals, and limited error analysis.
Recommendation:
(1) Include statistical comparisons to baseline models (e.g., p-values or CI); see the sketch after this list.
(2) Clearly state that high accuracy is on internal validation only and temper the interpretation.
(3) Discuss causes and mitigation of overfitting.
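For recommendation (1), a sketch of a paired t-test and a 95% confidence interval over matched per-fold AUCs using SciPy; all numbers are illustrative:

```python
import numpy as np
from scipy import stats

# Illustrative per-fold AUCs for the proposed model and a baseline on the same folds.
proposed = np.array([0.95, 0.96, 0.94, 0.97, 0.95])
baseline = np.array([0.93, 0.95, 0.91, 0.94, 0.94])

# Paired t-test: are the per-fold differences significantly different from zero?
t_stat, p_value = stats.ttest_rel(proposed, baseline)

# 95% confidence interval for the mean difference (t distribution, n - 1 dof).
diff = proposed - baseline
ci_low, ci_high = stats.t.interval(0.95, df=len(diff) - 1,
                                   loc=diff.mean(), scale=stats.sem(diff))

print(f"t = {t_stat:.2f}, p = {p_value:.4f}, 95% CI = ({ci_low:.4f}, {ci_high:.4f})")
```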
The revised manuscript demonstrates improvements in reporting and structure, with added discussions of clinical relevance and methodology. However, the work remains limited in terms of scientific contribution and methodological innovation. The use of EfficientNetV2 B0 and ImageNet for skin cancer classification is well-established, and the current study does not introduce substantial novel approaches, optimization strategies, or insights into clinical deployment challenges. Furthermore, the significant drop in external dataset performance undermines the robustness of the model.
To strengthen the manuscript:
(1) Clarify novelty beyond use of EfficientNetV2 and standard preprocessing.
(2) Conduct comparative experiments with other lightweight models.
(3) Expand the error analysis and provide statistical validation.
(4) Discuss deployment constraints, e.g., performance on different skin types and resource-constrained environments.
Please address in detail the reviewers' requests and criticisms.
**PeerJ Staff Note:** It is PeerJ policy that additional references suggested during the peer-review process should only be included if the authors agree that they are relevant and useful.
**PeerJ Staff Note:** Please ensure that all review and editorial comments are addressed in a response letter and that any edits or clarifications mentioned in the letter are also inserted into the revised manuscript where appropriate.
**Language Note:** The review process has identified that the English language must be improved. PeerJ can provide language editing services - please contact us at [email protected] for pricing (be sure to provide your manuscript number and title). Alternatively, you should make your own arrangements to improve the language quality and provide details in your response letter. – PeerJ Staff
The manuscript suffers from awkward phrasing, typos (e.g., “Everyone offers a transfer learning-based approach…”), and inconsistent tense usage.
Examples:
Line 29: “Everyone offers...” should likely be “The authors propose...”
Line 277: "The training set has been significantly improved using..." is vague and passive.
Redundant phrases are common (e.g., the benefits of EfficientNetV2 B0 and transfer learning are repeated across sections).
Overuse of the phrase “deep learning” without concise definitions or differentiation among models.
Figures are referenced (e.g., Figure 4, 7), but descriptions are vague, and some images seem low-resolution or not well-integrated into the narrative.
The pseudocode tables (e.g., Table 5 and Table 6) are poorly formatted and difficult to interpret.
Model configuration (e.g., batch size, learning rate) is inconsistently mentioned and lacks justification.
The rationale for using three resized image streams (100%, 75%, 50%) lacks empirical comparison or ablation testing.
The model is only compared with other CNN-based methods, omitting traditional machine learning or hybrid techniques for contrast.
Although the dataset split is mentioned (70% training, 30% test, 50% validation, which seems inconsistent), no cross-validation or external dataset testing is performed.
What are the major contributions of this work? Novelty is lacking. Compare your approach with recent approaches.
A reported accuracy of 99.2% is unusually high given the class imbalance and dataset variability: no metrics like AUC-ROC or confidence intervals are provided.
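For reference, a confidence interval around AUC can be obtained by bootstrapping the test set; a minimal sketch with synthetic placeholder scores:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
y_true = rng.binomial(1, 0.02, size=2000)                            # placeholder labels, ~2% positive
y_score = np.clip(0.3 * y_true + rng.normal(0.3, 0.2, 2000), 0, 1)   # placeholder scores

# Resample the test set with replacement to estimate a 95% CI around the AUC.
boot_aucs = []
for _ in range(1000):
    idx = rng.integers(0, len(y_true), len(y_true))
    if y_true[idx].min() == y_true[idx].max():   # skip resamples missing a class
        continue
    boot_aucs.append(roc_auc_score(y_true[idx], y_score[idx]))

lo, hi = np.percentile(boot_aucs, [2.5, 97.5])
print(f"AUC = {roc_auc_score(y_true, y_score):.3f}, "
      f"95% bootstrap CI = ({lo:.3f}, {hi:.3f})")
```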
Results suggest overfitting, yet no regularization methods are discussed.
The confusion matrix is referenced but not analyzed. There is no mention of false positives or false negatives, which are critical for medical diagnostics.
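A minimal sketch of the requested false-positive/false-negative breakdown, using scikit-learn with synthetic placeholder labels:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

np.random.seed(1)
y_true = np.random.binomial(1, 0.02, size=1000)   # placeholder: 1 = malignant
y_pred = np.random.binomial(1, 0.02, size=1000)   # placeholder predictions

# Unpack the binary confusion matrix so missed melanomas (FN) are explicit.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
specificity = tn / (tn + fp) if (tn + fp) else float("nan")
print(f"FP={fp}, FN={fn}, sensitivity={sensitivity:.3f}, specificity={specificity:.3f}")
```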
The superiority claim over other methods lacks statistical validation or p-value reporting.
Revise the English for clarity, technical tone, and grammar.
Provide a more comprehensive evaluation, including:
(1) Cross-validation.
(2) ROC curves (see the sketch after this list).
(3) External dataset testing.
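For item (2), a minimal ROC-curve sketch with scikit-learn and matplotlib; the scores are synthetic placeholders:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, auc

rng = np.random.default_rng(1)
y_true = rng.binomial(1, 0.1, size=500)                             # placeholder labels
y_score = np.clip(0.3 * y_true + rng.normal(0.3, 0.2, 500), 0, 1)   # placeholder scores

# Plot the ROC curve with the chance diagonal for reference.
fpr, tpr, _ = roc_curve(y_true, y_score)
plt.plot(fpr, tpr, label=f"model (AUC = {auc(fpr, tpr):.2f})")
plt.plot([0, 1], [0, 1], "k--", label="chance")
plt.xlabel("False positive rate")
plt.ylabel("True positive rate")
plt.legend()
plt.savefig("roc_curve.png")
```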
Include ablation studies to demonstrate the contribution of resizing, augmentation, or multi-stream inputs.
Improve the formatting of the pseudocode and architecture diagrams, which are currently unclear and not standard-compliant.
Discuss clinical relevance, limitations (false negatives in real-world settings), and the need for explainability in AI diagnostics.
The paper proposes a transfer learning approach using EfficientNetV2 B0 and ImageNet for melanoma classification on the ISIC 2020 dataset. It highlights the model's potential for early skin cancer detection, though further validation across diverse populations is needed.
General comments:
1. While augmentation techniques (e.g., rotation, zoom, flipping) increase dataset size and variability, they may introduce artificial patterns that do not fully represent real-world variations in dermoscopy images. The paper does not provide a detailed analysis of how these augmentations impact the model's ability to generalize to unaugmented, real-world images, which could affect its clinical reliability.
2. The paper lacks a thorough discussion of the computational trade-offs involved in training three independent CNNs at different scales.
3. The related work section provides a deficient analysis of relevant methods for skin cancer classification, such as [R1] "MoNuSAC2020: A multi-organ nuclei segmentation and classification challenge," IEEE Transactions on Medical Imaging, 2021; and [R2] "Melanoma detection using adversarial training and deep transfer learning," Physics in Medicine & Biology, 2020.
**PeerJ Staff Note:** It is PeerJ policy that additional references suggested during the peer-review process should only be included if the authors agree that they are relevant and useful.
4. The paper does not adequately discuss the limitations of the proposed method. A critical discussion on potential weaknesses should be included.
The experimental design effectively utilizes the ISIC 2020 dataset, robust preprocessing, and a multi-scale CNN approach to achieve high melanoma classification accuracy, but it is limited by the absence of cross-validation and by limited generalizability across diverse populations.
The findings, with a reported 99.2% accuracy in melanoma classification using EfficientNetV2 B0 and ImageNet on the ISIC 2020 dataset, are promising, but their validity is limited by the lack of cross-validation and testing on diverse populations, raising concerns about overfitting and generalizability.
The manuscript is written in generally understandable English, though it contains several redundant phrases and lacks clarity in parts of the methodology section. The introduction and related work provide an overview of existing studies, but the manuscript fails to clearly define a research gap or justify the novelty of the proposed approach. The use of transfer learning with EfficientNet and ImageNet for skin lesion classification is already well-established in the literature, and no unique motivation or innovation is convincingly presented.
The experimental setup involves standard preprocessing, data augmentation, and training of pre-trained CNN models using the ISIC 2020 dataset. However, there is no clear rationale for design choices such as data splitting proportions or hyperparameter settings. Furthermore, the model training strategy and evaluation process lack detail and do not follow best practices in experimental reproducibility. The work does not introduce any novel architecture, optimization, or data handling techniques beyond common practices.
The study claims high classification accuracy (up to 99.2%), but these results are presented without rigorous validation or comparison against current state-of-the-art methods. The manuscript does not include statistical testing, confidence intervals, or error analysis, which are essential to verify the significance of findings. Overall, the conclusions are not well-supported by the evidence provided, and the reported performance raises concerns about potential overfitting or data leakage.
The manuscript under review is devoid of both originality and significant contributions. The employment of transfer learning with ImageNet datasets in CNN models, often referred to as pre-trained CNN, is a prevalent practice among researchers aimed at enhancing the performance of their models.
All text and materials provided via this peer-review history page are made available under a Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.